I ran another regression covering 1968-2012.
Y was the average adjusted points of the top N players (N = number of teams).
Xn = number of teams
Xf = standard deviation of teams' GF
Xa = standard deviation of teams' GA
Xe = percent of top N forwards which were non-Canadian
Xg = league GPG
Xp = ratio of special teams goals to all goals
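For anyone who wants to rebuild the inputs, here is a rough sketch of how one season's X values could be computed. The helper name and its arguments are hypothetical. Two choices are my assumptions, not something stated above. First, Xf and Xa are taken as the standard deviation divided by the league average, so they read as percentages, which matches the "% point" wording further down. Second, special teams goals are power-play plus short-handed goals.

```python
import statistics


def season_features(team_gf, team_ga, top_n_non_canadian, games, st_goals):
    """Build one season's X values (hypothetical helper; see assumptions above).

    team_gf, team_ga: goals for / against, one entry per team
    top_n_non_canadian: number of non-Canadians among the top N forwards
    games: games played in the season, each game counted once
    st_goals: league-wide power-play plus short-handed goals (assumed definition)
    """
    n = len(team_gf)  # N = number of teams
    total_goals = sum(team_gf)
    mean_gf = total_goals / n
    mean_ga = sum(team_ga) / n
    return {
        "Xn": n,
        # Population sd across all teams, relative to the league average (assumption)
        "Xf": statistics.pstdev(team_gf) / mean_gf,
        "Xa": statistics.pstdev(team_ga) / mean_ga,
        # Share of the top N forwards who were non-Canadian
        "Xe": top_n_non_canadian / n,
        # League goals per game
        "Xg": total_goals / games,
        # Special teams goals as a share of all goals
        "Xp": st_goals / total_goals,
    }
```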
The new results (B = intercept, M = coefficient on the matching X; values in parentheses are negative):
B = 82
Mn = (0.44)
Mf = 87
Ma = (18)
Me = 8.1
Mg = (.74)
Mp = 42
R^2 = .56
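Put together as one equation, with the parenthesized coefficients read as negative (which is how the plain-English list further down treats them), the fitted model is:

```latex
\hat{Y} = 82 - 0.44\,X_n + 87\,X_f - 18\,X_a + 8.1\,X_e - 0.74\,X_g + 42\,X_p
```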
All variables appear significant, with Xa having the lowest t-score at ~4.8 (Ma/SEa = .73, times N^.5 = 6.6).
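If anyone wants to re-check the t-scores, here is a minimal pure-Python OLS sketch (`ols` and `invert` are my own hypothetical helper names, not from any library). It uses the textbook definitions: coefficients from the normal equations, residual variance with n - k degrees of freedom, each coefficient's standard error from the diagonal of (X'X)^-1, and t = coefficient / standard error. If SEa in the line above is the coefficient's standard error from the regression output, note that in this standard form the sample size already enters through the standard error, so no extra N^.5 factor is applied. For this model, `rows` would be the 44 seasons of (Xn, Xf, Xa, Xe, Xg, Xp) and `y` the matching Y values.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity: [A | I]
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear regressors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds A^-1
    return [row[n:] for row in aug]


def ols(rows, y):
    """Fit y = B + sum(M_j * X_j) by ordinary least squares.

    rows: one list of X values per season (the intercept column is added here).
    Returns (coefficients, standard_errors, t_stats, r_squared), intercept first.
    """
    X = [[1.0] + list(map(float, r)) for r in rows]
    n, k = len(X), len(X[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    # Normal equations: beta = (X'X)^-1 X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Residual variance with n - k degrees of freedom (k includes the intercept)
    s2 = sse / (n - k)
    se = [math.sqrt(s2 * xtx_inv[j][j]) for j in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t, 1.0 - sse / sst
```

statsmodels reports the same quantities from `OLS(y, X).fit()` as `.params`, `.bse`, `.tvalues` and `.rsquared` (X needs a constant column, e.g. via `add_constant`). Inverting X'X directly is fine for six regressors but is less numerically robust than a QR-based solver.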
I think this model holds a lot of promise: R^2 is above .5 and all variables are significant. I also thought this was interesting:
Standard deviation of Y (avg. adjusted points of top N players) was 3.86, and only 3 of the 44 predicted values of Y differed from the actual value by more than this (the largest deviation was ~1.6 std dev). All three of those predicted values were lower than the actual values, possibly in part because some of the game's best players were in the league and having strong seasons (Orr, Espo, etc. in '72... Lemieux, Jagr, etc. in '96 & '97).
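The check in the paragraph above (how many seasons miss by more than one standard deviation of Y, and in which direction) is easy to automate. A minimal sketch, assuming `actual` and `predicted` are lists of the seasonal Y values in the same order. `residual_check` is a hypothetical helper, and I've used the sample standard deviation, which may not be the one behind the 3.86 figure.

```python
import statistics


def residual_check(actual, predicted):
    """Compare prediction errors with the standard deviation of actual Y.

    Returns (sd_of_y, number of misses larger than one sd,
             largest absolute miss in sd units, how many large misses were under-predictions).
    """
    sd_y = statistics.stdev(actual)  # sample sd; use statistics.pstdev for population sd
    errors = [p - a for a, p in zip(actual, predicted)]
    big = [e for e in errors if abs(e) > sd_y]
    worst = max(abs(e) for e in errors) / sd_y
    # A negative error means the model predicted lower than the actual value
    under = sum(1 for e in big if e < 0)
    return sd_y, len(big), worst, under
```

A negative error means the model came in under the actual value, which is the direction of all three misses described above.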
Some of the variables may improve with further refinement. It may be useful to define a variable that somehow captures the effect of having so much top talent in the league, but I'm not sure what the best and fairest way to do that would be. Any suggestions welcome.
For those not familiar with regression, this is what the model suggests at this stage (each effect assumes the other variables are held constant):
For each additional team, Y decreases by .44
For each 1 percentage point increase in standard deviation of teams' GF, Y increases by .87
For each 1 percentage point increase in standard deviation of teams' GA, Y decreases by .18
For each 10 percentage point increase in % of non-Canadians in top N, Y increases by .81
For each .10 increase in league GPG, Y decreases by ~.07
For each 1 percentage point increase in special teams goals as % of total goals, Y increases by .42
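Each line in the list above is just a coefficient multiplied by the size of the step. A small sketch that reproduces the list, assuming (as the list implies) that Xf, Xa, Xe and Xp are entered as fractions, so one percentage point is 0.01. The names `COEFFS`, `STEPS` and `marginal_effects` are mine, for illustration only.

```python
# Coefficients as posted (parenthesized values are negative)
COEFFS = {"Xn": -0.44, "Xf": 87.0, "Xa": -18.0, "Xe": 8.1, "Xg": -0.74, "Xp": 42.0}

# Step sizes used in the list; percentage points written as fractions (assumption)
STEPS = {"Xn": 1, "Xf": 0.01, "Xa": 0.01, "Xe": 0.10, "Xg": 0.10, "Xp": 0.01}


def marginal_effects(coeffs, steps):
    """Change in predicted Y for each step, holding the other variables fixed."""
    return {name: coeffs[name] * steps[name] for name in coeffs}


for name, effect in marginal_effects(COEFFS, STEPS).items():
    print(f"{name}: {effect:+.3f}")
```

The Xg line works out to -0.074, which the list rounds to .07.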