Ideas for Future Studies

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
I'm not sure if any of you have seen this before...

http://web.williams.edu/go/math/sjmiller/public_html/math/papers/PythagWonLoss_Paper.pdf

It is a nice derivation of the Pythagorean won-loss formula from baseball. It would be interesting to see how it would adapt to hockey. (Of course the OT loss issue does complicate things somewhat.)

I haven't seen that, but will look at it at some point. This research on the matter was very good I believe:

http://www.hockeyanalytics.com/Research_files/Win_Probabilities.pdf
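
For anyone who hasn't seen the formula being discussed, here is a minimal sketch of the Pythagorean expectation applied to goals. The exponent of 2 is the classic baseball value and is used here only as a placeholder; the right exponent for hockey would have to be fitted from data, and the function name is made up for illustration. It also ignores the OT loss point entirely.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.0):
    """Expected winning percentage from goals for and against.

    exponent=2.0 is the classic baseball value; treat it as an
    assumption to be tuned for hockey, not a known constant.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals must be non-negative")
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        raise ValueError("at least one goal total must be positive")
    return gf / (gf + ga)


# Hypothetical team: 300 goals for, 200 against.
print(pythagorean_win_pct(300, 200))
```

Comparing this number with a team's actual points percentage is one simple way to see how much the extra point for OT losses distorts things.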
 

Mathletic

Registered User
Feb 28, 2002
15,777
407
Ste-Foy
It's generally accepted that the level of play in the NHL has increased over the years; where people differ is in the answer of "how much"?

Some claim that Guy Lafleur would look like a beer leaguer against today's players, while some claim that the effect is minimal.

How much is the effect? What impacts do expansion, the increase of the population pool, the breaking of the color barrier, and other external factors have on the effect?

I'd suggest you take a look at this study ... video at least ... from the MIT conference. It's on baseball but could easily be applied to hockey I think.

http://www.sloansportsconference.com/?p=671
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
I'd suggest you take a look at this study ... video at least ... from the MIT conference. It's on baseball but could easily be applied to hockey I think.

http://www.sloansportsconference.com/?p=671

The main problems with expansion are that the talent is rarely, perhaps never, equally distributed between the established and expansion teams, and it often occurs too rapidly. The larger the expansion in % terms and the more inequality between the old/new teams, the bigger the effect.

Taco wants to know how good Lafleur would be today compared to other players. It was easier to dominate in the 70s, no doubt. Not only massive expansion, but repeated expansion, and WHA drained off some talent as well. He and Espo also had real sweet spots in time. Howe, Hull, Mikita and Beliveau had passed their peaks and started (or already had been) declining before these guys hit their primes. They played for O6 powerhouses that could almost dominate at will it seems over most of their competition. Lafleur may actually have been as good or better at producing points in '79 & '80 as '75 & '76, but by then, Trottier, Bossy, Gretzky, etc. were having big years as well, so he didn't appear as dominant. I see him as more at the upper end of the group including Dionne, Bossy, Trottier, Stastny... not a much better peak point producer than Forsberg, Lindros, Sakic, Selanne, Kariya, only more consistent and consistently healthy in his peak years during the era in which he played... and he'd have his work cut out for him beating the current "big 3" for scoring titles.
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
If someone has a goalie database, I would like to see a study of this type:

Regress/correlate the SV% of individual goalies from season to season, using the variable (or separate studies) of whether the goalie changed teams or not. Other variables could be SA/60 min., team PPOA/gm, etc. This might help separate the three main components of SV% (ability, team, and luck).
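
A rough sketch of the simplest version of this, assuming you already have pairs of consecutive-season SV% for each goalie plus a flag for whether he changed teams. The numbers below are made up purely to show the input shape, and the function names are hypothetical. It only computes the year-to-year correlation within each group; the other variables (SA/60, PPOA/gm) would need a proper regression.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def repeatability_by_group(pairs):
    """pairs: iterable of (sv_pct_year1, sv_pct_year2, changed_team)."""
    groups = {False: ([], []), True: ([], [])}
    for year1, year2, moved in pairs:
        groups[bool(moved)][0].append(year1)
        groups[bool(moved)][1].append(year2)
    labels = {False: "stayed", True: "moved"}
    return {
        labels[key]: pearson(xs, ys)
        for key, (xs, ys) in groups.items()
        if len(xs) >= 2
    }


# Made-up illustrative numbers, NOT real goalie data.
sample = [
    (0.912, 0.915, False),
    (0.905, 0.903, False),
    (0.920, 0.918, False),
    (0.910, 0.900, True),
    (0.899, 0.912, True),
    (0.915, 0.907, True),
]
print(repeatability_by_group(sample))
```

If SV% repeats noticeably better for goalies who stayed put than for those who moved, that gap would be a first hint at the size of the team component.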
 

seventieslord

Student Of The Game
Mar 16, 2006
36,113
7,179
Regina, SK
when will you post the results?

I'm working on it.

I think the player rankings just need to be tweaked. I rated every player drafted from 1969 to 2001 based on how they turned out on the HF scale. I had help from three oldtimers including an ex-Nordiques scout to make sure I got the 70s and 80s guys that I didn't see, correct. But my depth of knowledge has been expanded greatly in the five years since, thanks to my participation in so many ATDs, MLDs, and deeper drafts. I think I should go over these player ratings again and make sure I have them in the best order possible. It won't take too long, because they are really about 90% "correct" to begin with. Also, I could probably do the 2002-2005 drafts by now.
 

Shrimper

Trick or ruddy treat
Feb 20, 2010
104,192
5,268
Essex
Currently going through something to do with team rankings to define who really is the best each month and each season. Will post up when ready.
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
I saw a talk in 2007 at the New England Statistical Symposium in Sports (or some such) suggest that one could relate "probabilistic excitement" to gambling. The idea generally, if I recall correctly, is the amount of activity can relate to the instantaneous shifts in probability as a proxy for perceived change of chance of result (or expected result). Their proxy then, in the end, was something like the accumulated absolute difference of the probabilities... mathematically it'd be something like \sum_t |P(W_t)-P(W_{t-1})| where t represents a time increment indexing t=1,...,T.
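
As a sketch of that sum, assuming you already have some series of in-game win probabilities (where those come from is the hard part and isn't covered here; the function name is made up):

```python
def excitement_index(win_probs):
    """Sum of |P(W_t) - P(W_{t-1})| over consecutive time steps."""
    if any(p < 0 or p > 1 for p in win_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return sum(abs(curr - prev) for prev, curr in zip(win_probs, win_probs[1:]))


# Hypothetical win-probability path for one team over a game.
print(excitement_index([0.5, 0.6, 0.4, 1.0]))
```

A game that swings back and forth racks up a large total, while a wire-to-wire blowout moves from roughly 0.5 to 1 once and stays near 0.5.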

As a social phenomenon (as in, I don't expect this to be answered in the next month or year or years) can we use this idea and relate this to other activity within the arena or other functions? Have those original researchers moved beyond this and if so, how are they using it? Can one elicit from observations the value placed by the observer?

This is just stuff I'm throwing out there for further study. I don't have time or desire to follow up, but if somebody else wants to run with it... go for it.
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
This is something I would like to do in "theory"...

Can one obtain shift data for every game in the NHL? If so, can we use Poisson regression to obtain some form of measure of individual positive or negative contribution?

In this we admit that the model will need to involve SEVERAL simplifications owing back to many data factors. For instance, analysis cannot be done without player movement and any solution will be severely confounded. Likewise, we cannot rule out the possibility of ridiculous results and any type of interaction model will have to be ignored just due to the difficulty and complexity of such an assumption.

However, if we set this up "simple" enough can such a model lift out anything that may be communicative? Does mere "presence" mean anything? Can we move beyond +/- as a means of communicating relative usefulness?

Anything pulled out of this may need to feed into subsequent analysis. Anybody can imagine a world of things this cannot take into account. But you need to start somewhere.
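
To make the "simple enough" version concrete, here is a toy sketch of a Poisson regression of goals per shift on player-presence indicators, with shift length as the exposure. Everything in it is an assumption for illustration: the data are invented, there is a single hypothetical player indicator, and the fit is a bare Newton iteration written out by hand rather than a library routine.

```python
from math import exp


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: effects are not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_poisson(X, goals, minutes, n_iter=100, tol=1e-10):
    """Newton's method for log(E[goals]) = log(minutes) + X @ beta."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        mu = [m * exp(sum(b * x for b, x in zip(beta, row)))
              for row, m in zip(X, minutes)]
        grad = [sum(row[j] * (g - u) for row, g, u in zip(X, goals, mu))
                for j in range(k)]
        hess = [[sum(row[i] * row[j] * u for row, u in zip(X, mu))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented data: column 0 is the intercept, column 1 is 1 when a
# hypothetical player is on the ice for that stretch of play.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
goals = [0, 1, 1, 1, 2]
minutes = [10, 20, 30, 15, 25]
beta = fit_poisson(X, goals, minutes)
print("baseline goals/min:", exp(beta[0]))
print("rate multiplier with player on ice:", exp(beta[1]))
```

With real shift data there would be one indicator column per player, and the `solve` step failing with a singular system is exactly the confounding problem mentioned above: players who always share the ice cannot be separated.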

edit: I'm also writing this late at night... so it makes 100% sense in my head because I've been thinking about some of these things but the language isn't well thought out because it's late.
 

johnnybbadd

Registered User
Mar 29, 2011
981
893
How about a chart of PPG for players by age in the playoffs? It may be a much smaller sample size, but it would be interesting to see if the old adage that veterans are needed for their playoff experience is really that much of a factor.
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
I'll chip in a couple every once in awhile. I was talking to a gentleman who applies his own Bayesian model for predicting scores (he applies it to women's college hockey)... he was presenting a poster in the sports stats session at a recent discipline-wide conference... anyhow, this isn't about that... but we discussed the topic below and I thought it would be a good make-work project for an advanced undergrad.

For those who don't know, in American (NCAA) college hockey the rule for who makes the NCAA tournament is a straight-up formula which is more or less known before the beginning of the season.

There have ALWAYS been questions about the stability of the procedure. Now, don't get me wrong, I don't want to hate on the procedure... some do, but I think any deterministic system is better than a few school presidents getting together and setting a field. That being said, "we" (college fans) tend to view the rankings produced by this formula as the season goes along. From our viewpoint it generally has a fair amount of instability due to "cliff rules" and other issues. NCAA hockey also "suffers" from a degree of insularity... different topic, different day.

Anyhow, the question posed is this:

1.) How sensitive are the NCAA ranking procedures?
1a.) How do we quantify sensitivity? How should we?
1b.) Multiple modes of evaluating sensitivity... eliminate k "observations" (games)? Delete and impute? Etc.

2.) There are multiple modes and models proposed as an alternative. The obvious would be to try to compare them.
2a.) Can you use this information along with cross-validation techniques to obtain a suitable value set for certain procedural parameters?
2b.) Is the goal to minimize sensitivity? Is this appropriate unto itself? When may this fail?

3.) Wherein I recall a number 3.
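
One way to make 1a and 1b concrete is a leave-one-game-out check. The sketch below uses plain winning percentage as a stand-in ranking, which is NOT the actual NCAA formula; any ranking function with the same interface could be swapped in. Function names and the game list are made up for illustration.

```python
def rank_by_win_pct(games):
    """games: list of (winner, loser). Returns {team: rank}, 1 = best.

    Stand-in ranking only; ties are broken alphabetically.
    """
    wins, played = {}, {}
    for winner, loser in games:
        for team in (winner, loser):
            played[team] = played.get(team, 0) + 1
            wins.setdefault(team, 0)
        wins[winner] += 1
    order = sorted(played, key=lambda t: (-wins[t] / played[t], t))
    return {team: i + 1 for i, team in enumerate(order)}


def leave_one_out_sensitivity(games, rank_fn=rank_by_win_pct):
    """Delete each game in turn and record the largest rank change.

    Returns (mean, max) of the per-deletion largest absolute rank shift.
    """
    base = rank_fn(games)
    shifts = []
    for i in range(len(games)):
        new = rank_fn(games[:i] + games[i + 1:])
        # Teams that vanish when their only game is deleted are skipped.
        shifts.append(max(abs(new[t] - base[t]) for t in new))
    return sum(shifts) / len(shifts), max(shifts)


# Hypothetical three-team schedule.
schedule = [("A", "B"), ("B", "A"), ("A", "C")]
print(leave_one_out_sensitivity(schedule))
```

Deleting k games at a time, or imputing a flipped result instead of deleting, would follow the same pattern with a different inner loop.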

This topic causes me some trepidation because of some of the latent politics of denigrating the current system.
 

Chocas

Registered User
Jul 13, 2011
156
0
Sherbrooke
Hi, hum, people!

I was wondering if there was stats about the probability to get NHL drafted and the round/rank you are drafted in each of the 3 leagues of the CHL.

I happen to know somebody drafted in the second round of the QMJHL.

Thanks in advance!
 

seventieslord

Student Of The Game
Mar 16, 2006
36,113
7,179
Regina, SK
Hi, hum, people!

I was wondering if there was stats about the probability to get NHL drafted and the round/rank you are drafted in each of the 3 leagues of the CHL.

I happen to know somebody drafted in the second round of the QMJHL.

Thanks in advance!

what an interesting study that would be!

although not much more than a curiosity satisfier or at best an indicator of a junior player's "all things being equal" odds of getting drafted based on their junior draft position.

I doubt it exists, but if it does, I'd like to see it.
 

TheDevilMadeMe

Registered User
Aug 28, 2006
52,271
6,981
Brooklyn
adjusting save percentage based on home arena bias

It should be well-established by now that different shot recorders have a different definition of what constitutes a "shot." For the statistical data see:

http://www.puckprospectus.com/article.php?articleid=351

From the study:

One well-known and impressive statistical nugget about the Devils is how well, historically, they have managed to keep shots against to a minimum; the last season where the Devils allowed more shots than the league average was 1989-90. Since then, over the course of 18 seasons, they have averaged about 330 shots against less than the league average per season, despite going through different coaches and massive personnel changes (Brodeur notwithstanding). I believe we now have one explanation for this phenomenon: by my calculations, New Jersey's home shots against have been underreported by about 100 a year. This would still leave them as a defensive juggernaut (preventing 230 shots against per season is no joke) without making them such a statistical outlier.

If this is true, it could further legitimize Brodeur's claim of being one of the all-time greats. Brodeur's career all-time shots against (25,481 as of this writing) could be too low by as much as 1,000 shots, which would boost his career save percentage from 0.914 to 0.918. This would boost his career GVT by roughly 80, which would place him #3 or #4 on the all-time list, neck-and-neck with Jacques Plante and only trailing Dominik Hasek and Patrick Roy.

Further studies confirmed that NJ and St Louis were undercounting shots and Nashville was over counting shots.

Some other studies
http://objectivenhl.blogspot.com/2009/03/in-previous-posts-it-was-shown-how-some.html?m=1
http://brodeurisafraud.blogspot.com/2011/01/goalie-performance-on-road.html?m=1


I would like to see an analysis done of all arenas for each season (or for each season for which we have data). The analysis should be season by season since arena recorders change. The ultimate goal would be to adjust each goalie's save percentage to a "true" number that neutralizes arena bias.

Here's a brief look at Roy and Hasek
http://nhlnumbers.com/2012/5/31/did-home-scorer-bias-affect-roy-and-hase
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
How I'd try to answer the previous question... (Devil made me)

1) Need to decide if there is a "scorekeeper" effect (neutral scorekeeper with the same "bad" rule) and a home effect (and/or an interaction).

2) Stating a baseline. Each situation will be relational to each other. I don't think you'll ever get an "ideal shots" but you could, I suppose, neutralize it to location. [[This will not affect the end analysis as any result will just be a multiple of each other... essentially scaling. But, people like to normalize to a baseline on a scale that they understand.]]

3) What can you chalk up to the team itself... important when you realize that of the 41 games played at a venue, one team plays in all 41 of them.

There can always be all kinds of features and quibbles, but this would be the base effect.

I, as a matter of personal policy, like to neutralize for game duration in the case of overtime... this only affects the modeling and parameter estimation stage. Not the adjustments at the application stage.

Other than that, given data, I think I can propose a rough model. Having data available, however, is usually the issue and the name of the game.
 

Jumptheshark

Rebooting myself
Oct 12, 2003
99,866
13,848
Somewhere on Uranus
I think looking at the stats of the players who played before and after the last lockout could be interesting, split between those who played somewhere during the lockout and those who did not.
I know the Sedins went nuts the year after the last lockout, but many players lost a step.
 

Jonathan17

Trollface!
Nov 19, 2005
4,328
60
Oakville
what an interesting study that would be!

although not much more than a curiosity satisfier or at best an indicator of a junior player's "all things being equal" odds of getting drafted based on their junior draft position.

I doubt it exists, but if it does, I'd like to see it.

What might add another level to it is the question of whether the three CHL leagues are viewed equally by the NHL. Maybe there is no "all things being equal" across the CHL as a whole. The impression I have is that the Q has inflated scoring but better goaltenders, so do NHL teams discount Q forwards and d-men more? The WHL seems to have the bias that their D-men are better and their goalies worse. So I might rather be a 2nd-round pick in the Q as a goalie than a 1st-round pick in the W as a goalie. Just thinking about goalies taken in the 1st-round the last 15 years, it seems a lot are from the Q and Europe and few from the W.

It would be interesting to see a) if there is an actual bias between leagues and b) if that bias is actually justified based on post-draft performance.
 

2faded

Registered User
Jul 3, 2009
4,484
675
Torrance, CA
Would just like to throw this out there. I spend a lot of time manipulating data in Excel for work. I've been wanting to play around with hockey stats for a while; the problem is obtaining the data. If anyone can provide me with the data and what they want analyzed, I'd be glad to do it. It'll give me something to do at work. haha

Feel free to send me a pm and I'll post my analysis in whatever thread is appropriate.
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
If someone who understands LINEST in Excel and/or linear regression can help me with this, that would be great:

I cannot get coefficients to generate for a particular model using LINEST. I have successfully used LINEST for other models, so it's likely a problem with the model (and its variables). The model is as follows:

one group of X variables are discrete variables for season (let's say there are 32 seasons, so it's either 0 or 1 for each possible season... it will only have a value of 1 for 1/32 of those for each player-season)

the next group of X variables are discrete variables for the player's age (if we use age ranges of 18-40, then the value is either 0 or 1 for each of 23 possible ages... and again it will only have a value of 1 for 1/23 of those for each player-season)

the next group of X variables are discrete and are for the player himself (the value is either 0 or 1 for each of the Q players in the study... and again will only have a value of 1 for 1/Q of those for each player-season)

there are possible variables that I would like to add, but if I do, will wait until I am able to successfully generate coefficients for the model as it already stands

I stopped at ~ a dozen players with a total of ~170 player-seasons. I thought since the degrees of freedom are df = N - k - 1 = 170 - (32 + 23 + 12) - 1 = 170 - 67 - 1 = 102, that coefficients should generate, but I'm obviously missing something and my linear regression knowledge is relatively basic and quite rusty.

Can anyone tell me why the coefficients won't generate? I don't want to put substantial time into this if it's not going to work. Any help would be appreciated.
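
One likely culprit, offered as a guess since the actual spreadsheet isn't shown: with a full set of 0/1 columns for every season, every age and every player, each group of columns sums to a column of ones, which is exactly the intercept. That makes the columns perfectly collinear, so the degrees-of-freedom count above overstates how many coefficients can be estimated. On top of that, if each player's age is just season minus birth year, the season, age and player effects are tied together by one more exact linear relationship. The sketch below checks both on a made-up toy panel (three hypothetical players, four seasons) by computing the exact rank of the design matrix; all names and data are invented.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a small matrix via Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def dummies(value, levels, drop_first):
    """0/1 indicator columns for one categorical value."""
    cols = [1 if value == level else 0 for level in levels]
    return cols[1:] if drop_first else cols


def design_matrix(player_seasons, drop_first):
    """Intercept plus season, age and player dummies for each row."""
    seasons = sorted({s for s, _, _ in player_seasons})
    ages = sorted({a for _, a, _ in player_seasons})
    players = sorted({p for _, _, p in player_seasons})
    return [
        [1]
        + dummies(s, seasons, drop_first)
        + dummies(a, ages, drop_first)
        + dummies(p, players, drop_first)
        for s, a, p in player_seasons
    ]


# Hypothetical toy panel: age = season - birth year for each made-up player.
birth_years = {"P1": 1980, "P2": 1981, "P3": 1982}
player_seasons = [
    (season, season - born, player)
    for player, born in birth_years.items()
    for season in range(2000, 2004)
]

for drop_first in (False, True):
    X = design_matrix(player_seasons, drop_first)
    print(f"drop_first={drop_first}: {len(X[0])} columns, rank {matrix_rank(X)}")
```

If the rank comes out below the number of columns, there is no unique set of coefficients. The usual remedy for the first problem is to drop one season, one age and one player column as baselines; the age/season/birth-year tie needs one more restriction, for example treating two adjacent ages as a single level.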
 

pdd

Registered User
Feb 7, 2010
5,572
4
There are lots of stats the NHL could provide over and above those it does show on its website...


Certainly I'd like to see "splits" points tracking for each player, in relation to his position and ranking, both in absolute terms and relative to TOI... Also broken down for ES, PK, and PP, with all this shown in graph/chart form so you can instantly see the progression or regression over a whole season... The splits could be 11 segments of 7 games each plus a final 5-game segment, or 8 segments of 10 games each plus a final 2-game segment that won't provide much meaning, or 7 segments of 10 games each plus a final stretch segment of the last 12 games played by that player in the regular season.
Injuries and consequent missed games obviously could affect the comparisons and skew some of the segments, due to no or fewer games played in those segments...

BUT assuming a player did not get injured or missed only a few games in a segment or two, you could get valuable information on how a player trended through a season...

FOR instance, take my pet peeve of last season, Patrick Kane. We all know he regressed in total points (just 66), but it would be interesting to see his "splits" over the course of the 82 games he played... MY hunch was that he started off ahead of a point-a-game pace for a time (about the first 15 games or so), slacked off badly till about the end of February, then picked up the scoring pace a bit at the end... SO it would be interesting to see what actually happened in, say, each 10-game split (say we go with the model of 7 ten-game splits and a final 12-game segment)...
This would SHOW what happened with him over the course of a season, and you could see it clearly if shown, say, in bar chart form... YOU might compare to prior seasons, again showing splits for each with different coloured bars, and maybe some pattern emerges, or if not, that too could be useful information for the coaching staff to evaluate... IF these are compared to league averages for his POSITION, or to, say, just the top 30 scorers at forward, those too would be further valuable info in assessing his season... For instance, if his pattern showed a big difference from the league average for the position or from the top 30 forward scorers, then very valuable information is revealed... If, barring injury reasons that affected the pattern, there is some big skew away from the top positional scorers or top 30 forward scorers in a certain player's pattern, THEN perhaps coaches could make adjustments for the scoring dips below those pattern averages in a certain segment or segments of a season.
ALSO you could adjust these charts for TOI to see if that would make any difference... You could also do all this not just for total points scored per segment by that player but also by tracking multi-point games per segment.

Again, my perception was that Kane had very few multi-point games last year. It was mostly one-point-and-out type games with just a few exceptions... WAS this off from his pattern of prior years? Which segments had more multi-point games, and is there a pattern over several years, or is the strength of opponents in a segment more determinative for this?

So with such deeper info tracking and analysis perhaps important patterns are revealed, and certainly you could perhaps make stronger conclusions about a player and changes (good or bad) in his game, both relative to others and to himself in prior years...


Is there any consistent pattern during a season or where is he trending and is there some huge change over prior years?

So IF a player, say, shows a consistent pattern of more multi-point games in the first 2 and last 2 segments, and if that pattern is different from the "normal" average for his position or for the top 30 forward scorers in the league (29 if that player is a top 30 scorer himself), THEN the coaches would use that info to try to get the player on the more "normal" track against the peer group he is measured against...

Perhaps some interesting segmental patterns will be revealed... For instance (not saying it is true), what if the normal average pattern of multi-point games from the top 30 forward scorers is a consistent level of multi-point scoring across all the segments of a season, but your player skewed his multi-point games heavily toward the beginning and ending portions of the season? Then as a coach you would want to get him to be more "normal" in multi-point game consistency over segments...
IF a "fall-off" in multi-point games becomes a pattern in the middle 5 ten-game segments each season, THEN that player must correct this, or else as a coach you have to reduce his TOI in those middle 5 segments of ten games each, because your player has proven he won't be as effective in those segments as in earlier or later segments... BUT you also have to see whether the fall-off pattern in the middle portion of the season was because you reduced that player's TOI. If not, there is some other cause to his pattern over the seasons (assuming there is such a pattern)...

Anyway, all this is valuable information for both coaches AND fans...

We have been told for years that after the all-star break, and certainly after the trade deadline, games get more competitive down the home stretch as teams jockey for playoff positions... so we intuitively expect tighter defenses, less scoring and fewer multi-point games by players... BUT if a player bucks that expectation and scores at a faster pace again, with more multi-point efforts, AND also did that to start the year in the earliest segments of the season, you wonder how such a player could slack off in scoring and multi-point games in the middle segments, say from mid-November till the all-star break, when presumably opposing defenses should slack off. That should be the time "offensive players" do the most damage in scoring, because this segment of the season SHOULD be the expected "easy" part where defenses are playing less meaningful games. (To start the year there should be enthusiastic energy and a desire to get off to a good start and pile up a lead in points in the standings. Then as the 2nd quarter arrives you settle into "routine" games where you are just punching the clock in another city, another game to clear off the schedule, then looking forward to the Christmas break, then after that more routine "grind" till the all-star break. Only after the all-star break is it supposed to get tighter and more contested as teams try to hold playoff spots and get the advantage in those spots.) So IF an offensive player cannot take advantage of this long stretch of games from mid-November till the all-star break or the trade deadline to elevate his scoring pace against less contesting defenses (on average), there is something drastically wrong with such a player... IT SHOULD be easier to score in that long stretch of routine games... IF he slacks off then but ups his pace again when the games get harder in March and April, then it is clear that it is not ability that was slacking in the middle portion of the season, but rather simply motivation and will (i.e. effort)... IF this is indeed a pattern with such a player, then no coach can live with that... for a player not performing to his ability when it should be easier is a double slap in the face to his team...
If he instead showed up more in this period, then that should help get more scoring and more wins...

SO i) is there such a persistent year-to-year pattern of teams slacking off defensively in the 2nd and 3rd quarters of the season? ii) Is there a consistent prior pattern, or a recent change of pattern, in a player such that his scoring and multi-point contribution has gotten worse precisely when you expect it least (IF there is a defensive falling-off in the 2nd and 3rd quarters of seasons, or the 4-5 middle segments of ten games each during a season)? iii) IF so, what can be done to get such a player to perform to his ability in a time we expect scoring to come easier?

Whether my intuition about a middle-of-the-season defensive slack-off and the highest offensive scoring on average is true or false, I do not know... I do know that last year Pat Kane was ahead of a point-a-game pace till about mid-November, then started regressing below PPG pace, only picking it up again late down the home stretch... SO IF the middle parts of the season should be the easiest parts to score in, he certainly took no advantage of that easier opportunity to score... Is this just a 1-year aberration? Did it happen to the average of his peers by position or to the top 30 scorers in the league too? Was last year an aberration just for Kane, or was there some big change going on league-wide that affected the pace of scoring and multi-point games to significantly lower them for almost everyone? If so, we can criticize Kane less; if not, he should be criticized even more for his slack-off in that portion of the season last year...

TO get such proper critical analysis, we need better data than we currently get from the NHL. They SHOULD be providing this more detailed segmental scoring data for each player and for the leaders by position or the overall top 30 scorers in the league.

www.hockey-reference.com has scoring logs (a list of every game a player scored in, with the "A, from B and C") from 2005-06 on, and game logs going back to 87-88 with box scores. So those Kane stats you want are perfectly available there.

Here's his player page:
http://www.hockey-reference.com/players/k/kanepa01.html
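
Once the per-game points are pulled from a game log like that, the splits described in the quoted post are only a few lines. A minimal sketch, assuming the input is simply a list of points per game in schedule order (the function name and the sample list are made up, and nothing here fetches data from the site):

```python
def segment_points(game_points, segment_len=10, n_full=7):
    """Split per-game points into n_full segments of segment_len games
    plus one final stretch segment holding whatever games remain."""
    cut = segment_len * n_full
    segments = [
        game_points[i:i + segment_len]
        for i in range(0, min(cut, len(game_points)), segment_len)
    ]
    if len(game_points) > cut:
        segments.append(game_points[cut:])
    return [
        {
            "games": len(seg),
            "points": sum(seg),
            "ppg": sum(seg) / len(seg),
            "multi_point_games": sum(1 for p in seg if p >= 2),
        }
        for seg in segments
    ]


# Hypothetical 82-game season: hot start, quiet middle, late pickup.
season = [2, 1, 1, 0, 2, 1, 1, 2, 0, 1] + [0, 1] * 30 + [1, 2, 0, 1] * 3
for number, split in enumerate(segment_points(season), start=1):
    print(number, split)
```

The defaults give the 7 ten-game segments plus a final 12-game stretch; `segment_len=7, n_full=11` gives the 11-by-7 plus 5 variant.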
 

Grind

Stomacheache AllStar
Jan 25, 2012
6,539
127
Manitoba
Something I'd like to look at someday (but honestly, it's not goaltender-related and therefore it always falls to the bottom of my list) is a comparison of the actual value of NHL draft picks (measured by some metric) versus the perceived value of NHL draft picks (measured by the value that NHL GMs impute when they trade those picks on or before draft day).

I'm throwing it up for grabs, and would love to be involved if things get moving.

On my phone and searching is a pain, has this been done?

I've got the start of this going and wanted to check. Also looking for classifications of types of players, as I've only been following hockey for five years and have been using a rough "general opinion" of what makes a meaningful NHL player (top 6/4, offensive forward or high-level penalty killer / x-minute defenceman). But I would prefer something a little more concrete/scientific.
 

Bad News Bears

All goalies be trash
May 22, 2009
4,612
2
Australia
This thread is incredible, can't believe I've never seen it before.

I have a bit of free time coming up, so I'm going to devote it to looking at creating a PER/WAR style system for NHLers. Pretty excited about starting it up. If something becomes of it, I'll post some ideas.
 

DT0X

Registered User
Jan 28, 2013
6
0
Miami, FL
Power Rankings

I've always wondered if there's a true formula capable of determining a solid power ranking. Not just one built by feeling out the teams and their respective strengths, but one based on their actual numbers, including the basics such as goal differential and games played, possibly with point values assigned to a number of statistics. Any thoughts?
 
