Using Regression to Adjust "Adjusted Points" for Top Tier Players '68-12

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
I added a variable (Xc), along the lines of what barney suggested, to capture the "concentration effect" of the top ~1-3 teams in GF. Xc is defined as the % of players in the top 1N who were on the top 0.1N teams in GF.
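
For anyone who wants to reproduce Xc, here is a minimal sketch of one way to compute it. The function name, the toy data and the rounding rule for 0.1N (nearest whole team, at least one) are my assumptions for illustration; the post does not say how a fractional team count is handled.

```python
def concentration_share(top_scorer_teams, team_gf, frac=0.1):
    """Share of the top-1N scorers who played for the top frac*N teams in GF.

    top_scorer_teams: team of each player in the top 1N (one entry per player).
    team_gf: mapping of team -> goals for.
    """
    n_teams = len(team_gf)
    # Assumption: 0.1N is rounded to a whole number of teams, minimum one.
    k = max(1, round(frac * n_teams))
    # Teams ranked by goals for, highest first; keep the top k.
    top_teams = set(sorted(team_gf, key=team_gf.get, reverse=True)[:k])
    on_top = sum(1 for team in top_scorer_teams if team in top_teams)
    return on_top / len(top_scorer_teams)


# Made-up 10-team league (N = 10), so the top 0.1N is a single team.
team_gf = {"A": 350, "B": 310, "C": 300, "D": 290, "E": 280,
           "F": 270, "G": 260, "H": 250, "I": 240, "J": 230}
# Teams of the 10 leading scorers (top 1N), also made up.
top_scorer_teams = ["A", "A", "A", "B", "C", "A", "D", "E", "B", "F"]

xc = concentration_share(top_scorer_teams, team_gf)
print(xc)
```

In the toy league four of the ten leading scorers play for the top GF team, so Xc comes out as their share of the top 1N.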

More recently, I added variables that measure expansion (% of new teams in the past 1 or 2 seasons) and the effect of non-Canadians on the top 1N scoring average.

1968-2012
=========
R^2 = .691
SEy = 2.48 (avg. Y = 89.9)


Coeff: value, t-score (values in parentheses are negative)
B0 = 81.4, 65
Bn = (0.39), 17
Bh = (6.45), 7
Bi = 7.56, 9
Bg = (0.15), 1.4
Bp = 2.47, 20
Bf = 420, 24
Ba = (118), 10
Bt = 1.93, 4
Bc = (9.66), 8
Be = 40.3, 19

Y: avg. simple adjusted points (gms, GPG, A/G) of top 1N players (N=number of teams)
B0: Y-intercept (constant)
Xn: Number of teams
Xh: Fraction of new teams vs. previous season
Xi: Fraction of new teams vs. two seasons previous
Xg: League GPG
Xp: PP opportunities/game
Xf: Standard deviation of teams' GF, divided by avg. team GF
Xa: Standard deviation of teams' GA, divided by avg. team GA
Xt: Excess above avg. GF of top 0.2N teams in GF, divided by std dev of team GF
Xc: Ratio of players in top 1N which were on teams in the top 0.1N in GF
Xe: Fractional increase in avg. of top 1N due to non-Canadian players
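
As a rough sketch of how the coefficients and variables fit together, the model is an ordinary linear combination: Y = B0 + Bn*Xn + Bh*Xh + ... + Be*Xe. The code below assumes the parenthesized coefficients are negative (the same convention the Diff column uses later in the thread); the function name and the sample inputs are hypothetical and are not data from any real season.

```python
# Coefficients as posted; values shown in parentheses are entered as negatives
# (an assumption based on the accounting-style notation used in the thread).
B0 = 81.4
COEF = {
    "Xn": -0.39,   # number of teams
    "Xh": -6.45,   # fraction of new teams vs. previous season
    "Xi": 7.56,    # fraction of new teams vs. two seasons previous
    "Xg": -0.15,   # league GPG
    "Xp": 2.47,    # PP opportunities per game
    "Xf": 420.0,   # stdev of team GF / avg team GF
    "Xa": -118.0,  # stdev of team GA / avg team GA
    "Xt": 1.93,    # excess GF of top 0.2N teams, in stdevs
    "Xc": -9.66,   # share of top 1N on the top 0.1N teams in GF
    "Xe": 40.3,    # fractional lift in top 1N avg from non-Canadians
}


def predict_1n(x):
    """Predicted avg. adjusted points of the top 1N for one season."""
    missing = COEF.keys() - x.keys()
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return B0 + sum(COEF[name] * x[name] for name in COEF)


# Hypothetical inputs for illustration only.
season = {"Xn": 21, "Xh": 0.0, "Xi": 0.0, "Xg": 7.0, "Xp": 4.0,
          "Xf": 0.10, "Xa": 0.10, "Xt": 1.5, "Xc": 0.15, "Xe": 0.02}
print(round(predict_1n(season), 1))
```

Keeping the coefficients in one dictionary makes it easy to drop or zero out a single term (for example Xe) when comparing versions of the model.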

One important factor that may still be missing is the presence/absence of some of the very top Canadian players (i.e., Gretzky and/or Lemieux). It will probably take a lot of trial and error to determine how to best define the proper variable to capture this causality.
 

Czech Your Math

Here are the predicted Y values and the difference between actual & predicted Y values, using the model & coefficients in the previous post.

Act1N = actual average of simple adjusted points of top 1N players in scoring (N= # of teams in league)
Pred1N = predicted value for Act1N based on variables in model
Diff = (Act1N) - (Pred1N); a positive value means the actual value was higher than predicted (negative values are shown in parentheses)
%Diff = % difference in comparison to Act1N

Year | Act1N | Pred1N | Diff | %Diff
1968 | 90.7 | 91.5 | (0.8) | -0.9%
1969 | 99.7 | 99.4 | 0.3 | 0.3%
1970 | 91.0 | 91.7 | (0.7) | -0.8%
1971 | 95.0 | 96.9 | (1.9) | -2.0%
1972 | 98.8 | 95.3 | 3.4 | 3.5%
1973 | 92.5 | 90.7 | 1.8 | 1.9%
1974 | 90.5 | 91.3 | (0.8) | -0.9%
1975 | 92.1 | 92.3 | (0.2) | -0.2%
1976 | 92.6 | 92.0 | 0.6 | 0.6%
1977 | 87.7 | 90.5 | (2.8) | -3.2%
1978 | 88.2 | 89.2 | (1.0) | -1.2%
1979 | 90.0 | 88.4 | 1.6 | 1.8%
1980 | 89.3 | 85.3 | 4.0 | 4.5%
1981 | 86.4 | 87.8 | (1.4) | -1.6%
1982 | 87.7 | 87.2 | 0.5 | 0.6%
1983 | 84.8 | 87.6 | (2.8) | -3.3%
1984 | 86.3 | 88.8 | (2.5) | -2.9%
1985 | 88.4 | 87.1 | 1.2 | 1.4%
1986 | 88.1 | 88.6 | (0.5) | -0.6%
1987 | 82.6 | 85.6 | (3.0) | -3.6%
1988 | 89.3 | 91.3 | (2.0) | -2.3%
1989 | 90.6 | 88.9 | 1.6 | 1.8%
1990 | 88.6 | 85.6 | 3.1 | 3.5%
1991 | 91.0 | 89.5 | 1.5 | 1.7%
1992 | 88.2 | 90.6 | (2.4) | -2.7%
1993 | 94.5 | 93.7 | 0.8 | 0.8%
1994 | 89.2 | 89.0 | 0.1 | 0.2%
1995 | 91.6 | 92.1 | (0.5) | -0.6%
1996 | 99.9 | 94.3 | 5.7 | 5.7%
1997 | 91.2 | 88.1 | 3.2 | 3.5%
1998 | 89.4 | 90.9 | (1.5) | -1.7%
1999 | 93.5 | 93.4 | 0.2 | 0.2%
2000 | 84.6 | 88.0 | (3.4) | -4.0%
2001 | 93.0 | 93.7 | (0.7) | -0.8%
2002 | 85.2 | 87.9 | (2.8) | -3.2%
2003 | 92.1 | 91.0 | 1.2 | 1.3%
2004 | 86.1 | 88.8 | (2.7) | -3.1%
2006 | 88.7 | 91.6 | (2.9) | -3.2%
2007 | 92.0 | 88.0 | 3.9 | 4.3%
2008 | 90.8 | 87.1 | 3.7 | 4.1%
2009 | 86.6 | 87.5 | (0.9) | -1.1%
2010 | 88.3 | 88.0 | 0.3 | 0.4%
2011 | 83.5 | 84.2 | (0.7) | -0.8%
2012 | 84.9 | 84.7 | 0.2 | 0.3%
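
The two derived columns can be recomputed from Act1N and Pred1N with a few lines. This is only a sketch with a helper name of my own choosing. Because it works from the rounded table values, a few rows can differ from the posted Diff by 0.1 (e.g., 1996 shows 5.7 while 99.9 - 94.3 is 5.6), presumably because the posted column was computed before rounding.

```python
def diff_columns(act, pred):
    """Return (Diff, %Diff) for one season, both rounded to one decimal.

    Diff is actual minus predicted; %Diff expresses Diff relative to Act1N.
    """
    diff = act - pred
    pct = 100.0 * diff / act
    return round(diff, 1), round(pct, 1)


# Two rows taken from the table above.
print(1968, diff_columns(90.7, 91.5))
print(1980, diff_columns(89.3, 85.3))
```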
 

Czech Your Math

The variations are relatively small, but what likely caused the larger variations between the model and actual measurements? Sometimes extreme values for one or more of the variables aren't fully captured by the model, or it could be another factor that is difficult to quantify at all. Let's briefly examine the largest differences:

1972 (+3.5%): The GAG line of Ratelle-Hadfield-Gilbert finished 3-4-5 behind Espo & Orr. Xt (which captures offensive powerhouse teams) has its second highest value during the 44 seasons since O6 expansion, despite Xf (stdev of team GF, which is the denominator for Xt) being at its third highest level in the study. Basically, it's difficult to fully capture just how top-heavy the league was that year.

1977 (-4.2%): It appears there was a real lack of depth in the top 1N. There's Lafleur and Dionne at the top, but no longer Espo & Orr & Co. and not yet Trottier & Bossy. Some indications of the weak depth are Shutt, MacLeish and Tim Young 3-4-5, and the top 1N containing Ratelle & Espo in their mid-30s and d-men Robinson & Potvin.

1980 (+4.0%): While the WHA teams were only 4/21 of the new NHL, 4 of the top 11 point producers were from the former WHA (including of course Gretzky, who tied for the lead).

1983 (-3.1%): Tough one to explain. PPO/game were at the lowest level for the period '81-'09, so that may not have been fully captured in the variable.

1987 (-3.2%): Previously elite players like Dionne, Trottier, Bossy and Stastny were no longer near the top, while Lemieux and Yzerman were yet to hit their peaks. It was also a time when parity hit its heights, as Xf (stdev of team GF) was the lowest and Xa (stdev of team GA) was the second lowest value in the 44 seasons since O6 expansion.

1990 (+3.2%): The only thing that strikes me is that the old guard (Gretzky, Lemieux, Yzerman, Messier) were still strong, while the new guard (Hull & Oates, Turgeon, Lafontaine, Sakic) emerged.

1992 (-3.6%): This is one of the toughest variations to explain. Basically, Gretzky had finally passed his peak, Lemieux missed 20% of the season (but that was typical), and the other stars didn't really step up and have career years (as they would in '93). The American players had really become a force, but the non-N.A. players were not yet a factor in the top 1N (Fedorov snuck in at #22 and Mogilny was just outside at #24). With only one team added since the WHA merger, both groups were providing more depth outside the top 1N, which probably caused league GPG to be higher than it otherwise would have been (which lowers the adjusted numbers for the top players).

1996 (+5.7%) and 1997 (+3.7%): None of the values of the variables stands out, but what does is the number of superstars who were entering or still in their prime during these years. Just look at the top 10 in total points in '96+'97: Lemieux, Jagr, Selanne, Francis, Kariya, Forsberg, Gretzky, LeClair, Lindros and Sakic. Francis was playing with Jagr at ES (and with Lemieux at ES in '97 and on the PP both years) and LeClair was playing with Lindros. Rounding out the top 1N was a mix of players from the US (Weight, Tkachuk, Hull? and Modano), overseas (Mogilny, Palffy, Sundin, Fedorov, Nedved) and the usual Canadians (Messier, Turgeon, Yzerman, Damphousse, Oates, Shanahan and Fleury). Just missing the cut were players such as Recchi, Bondra, Gilmour, Kamensky, Brind'Amour, Amonte and Roenick. So it's no surprise that the top 1N outperformed expectations in these years.

2000 (-4.1%) & 2002 (-3.4%): Power plays were the lowest and third lowest, respectively, for the period '86-'09. Lemieux and Gretzky were gone, Messier and Hull were no longer factors. Injuries started to take their toll on Lindros, Forsberg, Mogilny, Palffy, etc. There wasn't yet a strong crop of younger players to take the places of all these retired, aging and injured stars.

2006 (-3.3%): The only extreme value among our variables is the historically high level of power plays. This may have exaggerated expectations, at least in part for the following reasons (which had various effects on their own as well): This was a very dynamic season, as it followed a lockout season and there was a dramatic change in rules enforcement. Many players either retired during the lockout, did so during the season, were with new teams, or were rusty from playing little or no hockey during the lost season (and what hockey they did play was with different players in a different environment). When conditions change so drastically overnight, it's neither surprising nor concerning that there would be a variation between predicted and actual performances.

2007 (+4.6%) & 2008 (+4.1%): Power plays were at a more moderate level, especially by '08, yet play was probably still more open due to the crackdown in '06. While there were new stars (Ovechkin & Malkin) at the top, there were also many players in their prime (Datsyuk, Thornton, Lecavalier, Spezza, Zetterberg, Kovalchuk, Gaborik) and some older 30+ players having very good seasons (Iginla, Alfredsson, Kovalev, St. Louis).

I believe the variations are generally surprisingly small, given the randomness inherent in such data and the many effects that are very difficult or impossible to properly quantify. It seems that most of the relatively larger variations between predicted and actual have reasonable, logical explanations. I'm satisfied with the results of this study at this point and believe it provides solid support that adjusted points are very practical for comparing offense across seasons in the post-expansion era.
 

Czech Your Math

These are index numbers for each of the 44 seasons, with a mean of 1.00. The "predicted 1N" has been used, except Xe (which measures the influence of non-Canadians) has been omitted. This gives us a predicted value which reflects the conditions in that season, free from the influence of non-Canadians bringing up the average for the very top tier of players.

Year | Pred1N* | Index
1968 | 91.2 | 1.04
1969 | 99.8 | 1.14
1970 | 91.6 | 1.05
1971 | 95.8 | 1.10
1972 | 95.3 | 1.09
1973 | 91.2 | 1.04
1974 | 91.0 | 1.04
1975 | 92.8 | 1.06
1976 | 92.0 | 1.05
1977 | 91.3 | 1.05
1978 | 89.0 | 1.02
1979 | 88.5 | 1.01
1980 | 85.5 | 0.98
1981 | 85.2 | 0.98
1982 | 86.4 | 0.99
1983 | 85.7 | 0.98
1984 | 87.3 | 1.00
1985 | 86.3 | 0.99
1986 | 86.2 | 0.99
1987 | 84.6 | 0.97
1988 | 88.9 | 1.02
1989 | 88.1 | 1.01
1990 | 85.5 | 0.98
1991 | 88.8 | 1.02
1992 | 89.0 | 1.02
1993 | 90.2 | 1.03
1994 | 87.3 | 1.00
1995 | 88.2 | 1.01
1996 | 88.3 | 1.01
1997 | 82.4 | 0.94
1998 | 84.9 | 0.97
1999 | 84.7 | 0.97
2000 | 82.7 | 0.95
2001 | 86.2 | 0.99
2002 | 83.9 | 0.96
2003 | 84.5 | 0.97
2004 | 82.1 | 0.94
2006 | 86.6 | 0.99
2007 | 84.1 | 0.96
2008 | 82.2 | 0.94
2009 | 82.9 | 0.95
2010 | 81.7 | 0.94
2011 | 80.2 | 0.92
2012 | 81.4 | 0.93

'68-'77: avg. 1.07, median 1.05, range 1.04-1.14
'78-'96: avg. 1.00, median 1.00, range 0.97-1.03
'97-'12: avg. 0.95, median 0.95, range 0.92-0.99
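
The index column appears to be each season's Pred1N* divided by the mean Pred1N* over all 44 seasons. The sketch below follows that reading (the function names are mine); it matches the posted two-decimal values for the seasons I spot-checked.

```python
from statistics import fmean

# Pred1N* values copied from the table above (there was no 2005 season).
PRED_STAR = {
    1968: 91.2, 1969: 99.8, 1970: 91.6, 1971: 95.8, 1972: 95.3, 1973: 91.2,
    1974: 91.0, 1975: 92.8, 1976: 92.0, 1977: 91.3, 1978: 89.0, 1979: 88.5,
    1980: 85.5, 1981: 85.2, 1982: 86.4, 1983: 85.7, 1984: 87.3, 1985: 86.3,
    1986: 86.2, 1987: 84.6, 1988: 88.9, 1989: 88.1, 1990: 85.5, 1991: 88.8,
    1992: 89.0, 1993: 90.2, 1994: 87.3, 1995: 88.2, 1996: 88.3, 1997: 82.4,
    1998: 84.9, 1999: 84.7, 2000: 82.7, 2001: 86.2, 2002: 83.9, 2003: 84.5,
    2004: 82.1, 2006: 86.6, 2007: 84.1, 2008: 82.2, 2009: 82.9, 2010: 81.7,
    2011: 80.2, 2012: 81.4,
}


def index_numbers(pred):
    """Scale each season by the all-season mean so the mean index is 1.00."""
    mean = fmean(pred.values())
    return {year: value / mean for year, value in pred.items()}


def era_average(index, first, last):
    """Average index over the seasons first..last, inclusive."""
    return fmean([v for year, v in index.items() if first <= year <= last])


INDEX = index_numbers(PRED_STAR)
for label, first, last in [("'68-'77", 1968, 1977),
                           ("'78-'96", 1978, 1996),
                           ("'97-'12", 1997, 2012)]:
    print(label, round(era_average(INDEX, first, last), 2))
```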
 

Czech Your Math

I've added a key variable, which is a vast improvement in measuring the effect of non-Canadian players. The variable (Xe) is the increase in the scoring avg. of the top 1N due to the presence of non-Canadian players. Players such as Mikita, Hodge, Nolan, Thomas, and Heatley were considered Canadian.

This had some important effects on the model and study in general:

1. It increased the R^2 from .61 to .69. I figured the limits of a model such as this would be to explain close to 70% of the dependent variable, and that has now been accomplished.

2. It reduced Xg (league GPG), which was previously an important and significant variable in this model, to possibly insignificant. I ran the new model with and without Xg, and it didn't really seem to matter. Effects which were previously captured by Xg have now been captured by more accurate and refined variables, such as the newest variable Xe.

3. It allows us to back out this variable when calculating the index numbers, so that the presence of a stronger player pool does not increase the index number (i.e., a larger number of better players doesn't make it look like it was easier for top players to score adjusted points).
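
Here is a hedged sketch of how Xe might be computed and then backed out of a prediction. Both steps are my reading of the description and not a confirmed formula. I assume Xe is the top-1N average with everyone included, divided by the top-1N average with only Canadians eligible, minus 1. I also assume "backing out" means subtracting the Be*Xe term. All names and the toy numbers are hypothetical.

```python
BE = 40.3  # coefficient on Xe, as posted


def xe_fraction(all_points, canadian_points, n_teams):
    """Fractional lift in the top-1N average from including non-Canadians.

    Assumed definition: (avg of top N overall) / (avg of top N Canadians) - 1.
    """
    top_all = sorted(all_points, reverse=True)[:n_teams]
    top_cdn = sorted(canadian_points, reverse=True)[:n_teams]
    return (sum(top_all) / n_teams) / (sum(top_cdn) / n_teams) - 1.0


def back_out_xe(pred_1n, xe):
    """Prediction with the non-Canadian term removed (assumed: minus Be*Xe)."""
    return pred_1n - BE * xe


# Toy 3-team league: adjusted points for all players and for Canadians only.
all_points = [100, 95, 90, 85, 80]
canadian_points = [100, 90, 85, 80]

xe = xe_fraction(all_points, canadian_points, n_teams=3)
print(round(xe, 4), round(back_out_xe(90.0, xe), 1))
```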
 

unknown33

Registered User
Dec 8, 2009
3,942
150
Sorry for my lack of understanding, but could you explain in a few words what the results of this study tell us?
 

Czech Your Math

Sorry for my lack of understanding, but could you explain in a few words what the results of this study tell us?

There are numerous things that the results can tell us. I consider some of the most important things as follows:

A) Since expansion in '68, adjusted points have become progressively more difficult to score. As shown in the post with index numbers, in the first decade after expansion it was rather easy to score adjusted points. From the time shortly before the WHA merger until the mid-90s, it was more difficult (but typically about average for the entire post-expansion period to date). Since the mid-90s (often referred to as the "dead puck era") it's been more difficult still to score adjusted points.

B) The reasons for the increasing difficulty in scoring adjusted points appear to have been identified and quantified to a large degree. For instance, let's compare the three main eras as identified by the index numbers: the first decade after expansion ('68-'77), the typical non-expansion period surrounding the '80s ('78-'93), and the last two decades after the fall of the Berlin Wall ('94-'12).

The post-expansion period had an average predicted 1N (which is avg. adjusted scoring of top N players, where N is number of teams) of over 93. "The '80s" period had an average predicted 1N of over 88, a decrease of almost 5 points. The main reasons for the decrease were as follows:

- Parity increased significantly (variables Xf & Xa), which caused a drop of over 4 points.

- Expansion slowed significantly (variables Xh & Xi), which caused a drop of almost 1 point.

- The number of teams was larger (generally more difficult for larger group of players to maintain same average), which caused a drop of over 2 points.

- Power play opportunities increased substantially, which caused an increase of almost 2 points.

- The increased presence of non-Canadian players in the top 1N (variable Xe) caused an increase of over 1 point.

Those factors sum to a total decrease of almost 5 points (it may appear more like 4 points due to rounding errors). It was mainly expansion-related factors (new teams and lack of parity) which made it so much easier to score in the post-expansion period.

Now let's compare "the '80s" period to the "dead puck era". The predicted 1N actually increased from over 88 to over 89, almost a 1 point increase. Let's again look at the various factors:

- The larger number of teams caused a drop of 3 points.

- The increased presence of non-Canadian players in the top 1N caused an increase of 4 points.

- Other factors were rather minor, causing offsetting changes of about 1/2 point or less.

In this case the better talent pool, including substantially more non-Canadian stars, obscured the fact that it became increasingly hard to score adjusted points.
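
Each "caused a drop/increase of X points" figure above is presumably the coefficient times the change in that variable's era average. Here is a minimal sketch of that arithmetic for the number of teams (Xn). The per-season team counts are not from this thread; they are NHL team counts as I recall them (seasons labeled by ending year, no 2005), so treat them as an assumption worth double-checking.

```python
from statistics import fmean

BN = -0.39  # coefficient on number of teams, as posted


def contribution(coef, era_a, era_b):
    """Change in predicted Y from era_a to era_b attributable to one variable."""
    return coef * (fmean(era_b) - fmean(era_a))


# Assumed NHL team counts per season (not taken from the thread).
teams_68_77 = [12, 12, 12, 14, 14, 16, 16, 18, 18, 18]
teams_78_93 = [18, 17] + [21] * 12 + [22, 24]
teams_94_12 = [26] * 5 + [27, 28] + [30] * 11

print(round(contribution(BN, teams_68_77, teams_78_93), 1))
print(round(contribution(BN, teams_78_93, teams_94_12), 1))
```

With these counts the two results come out close to the "over 2 points" and "3 points" drops quoted above for the larger number of teams. The same function applies to any other variable once its per-season values are available.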

C) Once the index numbers are more firmly established (I have to give more thought over time to whether/which other factors, besides presence of non-Canadians, should be factored out), then they can be used to calculate "adjusted adjusted" numbers, which we should have more confidence in using when comparing across seasons.
 

unknown33

So basically the goal is to improve the adjusted points method.
Very interesting, thanks.
 
