I was looking for a project that could help me begin to learn R, and the one I settled on was to try and recreate the Marcel projections from baseball, but for hockey. The Marcels are one of the most basic projection systems around and were created by Tom Tango to act as a reference point for all other projections. When I googled it I wasn't able to find any hockey examples, so I thought this would be a good project to try.
Without going into too much detail, the Marcels use three seasons of data, with more recent seasons weighted more heavily; regress those weighted numbers towards a league average; and then apply an age adjustment. All rookies are assumed to provide league-average production for whatever position they play. This blog post by Tango goes more in-depth on the process and will explain it better than I could.
I decided that the best way to project a player's stats was not to use TOI, but TOI/GP. In previous research I found there was little predictive value in using a player's past GP to predict a future number of GP. For that reason, I felt the projections would be more accurate if the GP portion of the equation was removed and all players were assumed to play 82 games.
The steps to calculate the hockey Marcels are the same; the difference is in some of the values used.
1) The 5/4/3 weights for each season were kept the same and I continued to use league averages as the regression component, though I did split the league averages into F & D categories. I also decided to require a minimum of 15 GP for a season to count as played. The regression-towards-the-mean component was changed from 1200 PA to 31 TOI/GP for forwards and 37 TOI/GP for defencemen.
2) The original projected PA formula called for PA = 0.5*N-1PA + 0.1*N-2PA + 200. I couldn't find a reason why those specific coefficients were used or why it was +200 PA; I assumed the 200 PA was there to provide a baseline and act as a floor (it also works out that 1/3 of 600 PA is 200). I initially tried the base formula, but found the projected TOI values were far too low to be reasonable. I settled on this formula through trial and error:
- TOI(F) = 0.5*N-1 + 0.2*N-2 + 5
- TOI(D) = 0.5*N-1 + 0.2*N-2 + 6
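Assuming an 82-game season, the TOI formulas above translate into a small helper. A Python sketch of the logic (the project itself is in R, and the function and argument names here are mine):

```python
def projected_toi_per_gp(toi_n1, gp_n1, toi_n2, gp_n2, pos="F"):
    """Project next season's TOI/GP from the last two seasons.

    toi_n1, gp_n1: total TOI (minutes) and GP in season N-1;
    toi_n2, gp_n2: the same for season N-2. The +5 (forwards) /
    +6 (defencemen) constant is the floor found by trial and error.
    """
    base = 5 if pos == "F" else 6
    return 0.5 * (toi_n1 / gp_n1) + 0.2 * (toi_n2 / gp_n2) + base
```

With McDavid's numbers (1399 TOI in 64 GP, then 1781 in 78), this comes out to roughly 20.50 minutes per game.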
3) For the age adjustment I lowered the peak to 28 and changed the adjustment values to 0.01 (if under 28) and 0.005 (if over 28). I used 28 because it's around the age where both F and D see their TOI peak, and I adjusted the values because I felt there should be a slightly larger increase/decrease for aging.
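That adjustment can be written as a single multiplier. A Python sketch, assuming the over-28 penalty is applied the same way as the under-28 boost, just at the 0.005 rate:

```python
def age_multiplier(age, peak=28):
    """Multiplier applied to the pre-age-adjustment projection.

    Below the peak: +1% per year under 28.
    Above the peak: -0.5% per year over 28 (my reading of the
    adjustment described above).
    """
    if age < peak:
        return 1 + (peak - age) * 0.01
    return 1 - (age - peak) * 0.005
```

So a 24-year-old gets a 1.04 multiplier and a 32-year-old gets 0.98.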
As an example let's calculate McDavid's projected goals for the 2021 season.
1) First we must calculate McDavid's weighted goals per gp:
Weighted Goals/GP: ((34G/64GP) * 5) + ((41G/78GP)*4) + ((41G/82GP)*3) = 6.2588
2) We calculate the league wide rate of goals per minute and multiply it by McDavid's TOI:
2018: (6290G/456,259 TOI)*(1767/82)*3 = 0.891
2019: (6452G/456,802 TOI)*(1781/78)*4 = 1.290
2020: (5460G/388,451 TOI)*(1399/64)*5 = 1.536
This gives a total of 3.717 expected goals in 265.27 weighted TOI, where the weighted TOI comes from (1767/82)*3 + (1781/78)*4 + (1399/64)*5.
3) We convert the 3.717 expected goals in 265.27 weighted TOI to a rate per 31 minutes, then combine the weighted goals and the expected goals per 31 minutes to get a goals-per-minute rate:
Goals per 31 minutes: (3.717/265.27)*(31) = 0.4343
We then use this number to calculate McDavid's goals per minute:
Goals per min: (6.2588 + 0.4343)/(31 + 265.27) = 0.02259122
4) Then multiply the goals/toi by McDavid's proj TOI and by 82:
Projected TOI: 0.5*(1399/64) + 0.2*(1781/78)+ (5) = 20.50
Proj Goals (Pre Age adj): 0.02259122*20.50*82 = 37.98
5) Apply the aging value:
((28 - (2021-1997))*0.01 + 1) * 37.98 = 39.50 ~ 40
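The five steps above can be strung together end to end. A minimal Python sketch (the project itself is in R; the tuple layout and variable names are mine) that reproduces the McDavid numbers:

```python
# One tuple per season, oldest first, paired with the Marcel weights 3/4/5:
# (player_goals, player_gp, player_toi, league_goals, league_toi, weight)
seasons = [
    (41, 82, 1767, 6290, 456_259, 3),  # 2017-18
    (41, 78, 1781, 6452, 456_802, 4),  # 2018-19
    (34, 64, 1399, 5460, 388_451, 5),  # 2019-20
]

# 1) Weighted goals per GP
w_goals = sum(g / gp * w for g, gp, toi, lg, ltoi, w in seasons)

# 2) Expected goals at league scoring rates over the weighted TOI/GP
w_toi = sum(toi / gp * w for g, gp, toi, lg, ltoi, w in seasons)
exp_goals = sum(lg / ltoi * toi / gp * w for g, gp, toi, lg, ltoi, w in seasons)

# 3) Regress towards the mean: add 31 minutes of league-average
#    production for a forward (37 for a defenceman)
reg_minutes = 31
goals_per_min = (w_goals + exp_goals / w_toi * reg_minutes) / (w_toi + reg_minutes)

# 4) Scale to projected TOI/GP over an assumed full 82-game season
proj_toi = 0.5 * (1399 / 64) + 0.2 * (1781 / 78) + 5
pre_age = goals_per_min * proj_toi * 82

# 5) Age adjustment: McDavid is 24 in 2021, four years under the peak of 28
proj_goals = (1 + (28 - 24) * 0.01) * pre_age  # roughly 39.5, i.e. ~40 goals
```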
These were the only notable changes to the Marcels. With the projections made, the only thing left to do was to compare them with how the season actually played out. I decided to use RMSE, R^2, and MAE to evaluate the accuracy of the model; I believe all three have their merits and can be used to draw helpful conclusions.
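For reference, all three measures are easy to compute from paired projected and actual values. A minimal Python sketch (my evaluation was done in R):

```python
import math

def mae(actual, pred):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error: like MAE, but punishes large misses more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def r_squared(actual, pred):
    """Share of the variance in the actuals explained by the projections."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```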
The error for the Marcels in the 2018-2019 season:
And the error for the 2019-2020 season:
In this case I included two different samples: one with all the players who played in the season and had a projection, and one with just those who had played in each of the past 3 seasons. The Marcel is really designed to project players with 3 seasons of data, and expecting it to accurately project players without them is unfair.
At first glance the Marcel seems to have been decent at projecting most of the categories, as the R^2 values are greater than 0.60 and some are over 0.70. Those numbers improve when we limit the sample to players with 3 seasons' worth of data, with most falling in the 0.70 to 0.80 range. That's very promising and shows that overall the Marcels were able to explain quite a bit of the variation. I'll also add that I conducted an F-test for each category and confirmed that the predictions were useful. It's a little harder to tell what the RMSE and MAE mean without comparing them to other projection models. Unfortunately, I wasn't able to find many public projection models that were easily downloadable, so I used a post from Dom Luszczyszyn as my reference point.
The picture compares two models for the 2018-2019 season, the Left-Wing Lock and Dom's, and is based on a full 82-GP season for all players. While it's never specified whether Dom used MAE or RMSE in his comparison, I assume he used MAE due to the lower error values. The Marcel is beaten handily in some categories by Dom's model; however, it holds its own against the LWL model in goals, assists, powerplay points, penalty minutes, and TOI. While it's clear that overall the other models are better than the Marcel, I think it's a positive sign that it came close in some categories and wasn't too far behind in others.
For the 2019-2020 season, while the overall error scores are lower, suggesting more accuracy, the R^2 is also lower. I believe this is because the regular season was cut short, leaving a large number of games unplayed for all players. When I compare the MAE and RMSE values to my own projection model, the values are again quite similar and not too far off in most categories.
The Marcel model clearly has its limitations. It's a 3-season weighted model, which means players without those full 3 seasons will likely have projections that are all over the place. Not only that, it also regresses players towards a league average, which means its projections will naturally be more conservative than other models'. All that said, I think the hockey Marcels are still an excellent baseline and reference point for a projection model. If your model is having a hard time beating the Marcel, you should probably re-evaluate it.
I should also reference the links that were quite helpful to me in creating this project. Thanks to Evolving-Hockey, for help with setting up the process; Marcel the Matrix, for help with the R code and a guide I could loosely follow; Beyond the Box Score, for help with setting up the methodology and a process to follow; and Tom Tango for keeping his blog post up so I could have something to reference.
*I have the 2021 projections all ready. I'll attach those in a post below.