Is there an equivalent of a "Moneyball" for the NHL?

Mathletic

Registered User
Feb 28, 2002
15,777
407
Ste-Foy
What data did you use to create the model? If you used a sample of historical data to create a model that "predicts" the success of players within that sample, then great, you've fit some equations to a set of numbers. "Predicting" St. Louis, Markov, and Franzen is a lot easier when you've created the model on a sample that includes their numbers.

How well does the model predict results outside of the data sample that created it? Have you run any tests on that? That's the true test of the model's usefulness.
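
For what it's worth, here is a minimal sketch of the out-of-sample check being asked about. It assumes a toy one-variable linear model; the function names `fit_line`, `mean_abs_error` and `holdout_check` are made up for illustration and are not anyone's actual draft model. The idea is simply to fit on one slice of the data and measure the error on the slice the fit never saw.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of a fitted (slope, intercept) pair."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def holdout_check(xs, ys, n_train):
    """Fit on the first n_train points only, then report
    (in-sample error, out-of-sample error)."""
    model = fit_line(xs[:n_train], ys[:n_train])
    in_sample = mean_abs_error(model, xs[:n_train], ys[:n_train])
    out_of_sample = mean_abs_error(model, xs[n_train:], ys[n_train:])
    return in_sample, out_of_sample
```

If the out-of-sample error is much larger than the in-sample error, the model has mostly memorized the sample that built it.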

Yes, I've used regressions up to certain thresholds, if I can put it that way, to fit historical data, and used that data to predict St. Louis, Markov and Franzen, as well as others who didn't pan out or didn't get to play in the NHL.

Right now I'm working on not "overfitting" the curve to existing data at the expense of future data. In other words, a model might work well on past data but not on new data. I can still evaluate this as things progress and correct it.

This is only the beginning of my model; I'm trying to gather data to give it a general direction. Right now I'm working on a logical syllogism of hockey, if you will, in the manner of Euclid's Elements, where you start with your axioms and from there build a logical sequence that leads to a unique solution.

I can't tell you more about the Sabres' model. However, although you'd think small offensive players are getting an advantage, my model tells you to draft Jeff Carter, Mike Richards, Brenden Morrow, Hartnell and Armstrong in the first rounds, and Ryan Craig, Stastny, Brad Richards, Franzen and Latendresse, among others, in the later rounds.

The way I have it right now, you would draft more forwards than not and use those guys as trade bait for stay-at-home d-men, or use your extra money while your forwards are still underpaid to sign the Brendan Witts of this world. But then again, my actual model for the NHL isn't great; there's still plenty of work to do there. I've mainly worked on drafting players.
 

Mathletic

Registered User
Feb 28, 2002
15,777
407
Ste-Foy
Could an argument be made that basketball would be easier to model than football? From a play-by-play standpoint, considering the position types involved in each play.

A discussion for another thread perhaps.

Modeling hockey may indeed be the most difficult by comparison; however, this is about using the output, the stats, to gain a value-to-skill advantage.

No, I don't think it is. It only means that you have to start from the beginning and put the information you have together. Personally, I'm working with basic ideas that can't be wrong.

I know this sounds stupid, but offense is trying to score the greatest number of goals possible and defense is trying to stop the opponent from scoring; tactics is the coordination of players with the objective of taking advantage of the play, whether on offense or defense.

From various definitions you make your statements, and from there you can make your logical inferences and build your structure of hockey. All along you use your data and see what fits and what doesn't, and for each new piece of data that contradicts your model you review your premises and build it up all over again.

Also, basketball wouldn't be easier. Basketball, just like handball and soccer, is very similar to hockey. A lot of strategies can be derived from these sports ... at least I suspect so; I haven't done much work in that regard, but I think it's very likely ... a rectangular surface, a projectile, 2 goals in which you try to put that projectile, 2 teams that can play offense or defense at any time, forwards, d-men; on offense you can carry the ball or not, and on defense you can cover a player from close or far; techniques like handling a ball or a puck, faking a shot and so on. By comparing with other sports you can isolate the fundamentals of your own and establish what they have in common. So I don't think you'd do well by thinking of hockey as an entirely different sport, although it has its quirks ... and that's how it works for most things in life: at some point down the road you generally find a lot of similarities between things that seem independent, like gravity and the movement of celestial bodies in Newton's day.
 
Enstrom39

Registered User
Apr 1, 2006
2,174
0
www.birdwatchersanonymous.com
And look where that's got them?

Dominated with mentally and physically soft players and two missed playoff years in a row.

The empirical question is not "make/miss playoffs" but "do the Sabres find more NHL players in the draft than the average team" and the answer is "yes" to the 2nd question.

Missing the playoffs is driven by other factors, such as a budget that is lower than many other teams', injuries, or massive underperformance (Afinogenov). You could be the smartest scout in the league and still have your NHL team afflicted with bad luck or poor budget management.
 

Snoil11

Registered User
Aug 30, 2006
3,336
0
Germany
For those interested in basketball, the economists Berri, Schmidt & Brook have put quite some effort into modelling basketball.

They have summarized their research in that field (along with some on baseball and football) in a book called "The Wages of Wins", which resembles Levitt and Dubner's "Freakonomics" and explains their methodology and major findings in layman's terms.

They also launched a blog where they publish their recent analysis of basketball teams: http://dberri.wordpress.com/
 

Grandpabuzz

Registered User
Oct 13, 2003
910
0
Dallas, Texas
I've already been working on a project for over a year with some of my "quant" friends on the NHL. We can get really accurate values for most players who are already in the NHL, but getting the stats from outside leagues is the hard part.

But it has shown that players like Svatos, Montador, and Ribeiro are really underrated compared to players like Afinogenov or Jokinen.

Edit: And I'm pretty sure the Sharks employ the same kind of method that we are looking at. Some of their new guys like Pavelski, Clowe, and Michalek all follow the same type of pattern.
 

TheProspector

Registered User
Oct 18, 2007
5,339
1,697
Orlando
Interesting stuff. I have created crude models in an attempt to model games and seasons, but they are far too chaotic and dependent on external variables. Additionally, the betting sites I was looking at had spreads which were far too wide to trade effectively. What I would love to find is a good site for trading binary options on various hockey outcomes. I don't suppose anyone knows of one?

-M
 

golfortennis

Registered User
Oct 25, 2007
1,878
291
The empirical question is not "make/miss playoffs" but "do the Sabres find more NHL players in the draft than the average team" and the answer is "yes" to the 2nd question.

Missing the playoffs is driven by other factors, such as a budget that is lower than many other teams', injuries, or massive underperformance (Afinogenov). You could be the smartest scout in the league and still have your NHL team afflicted with bad luck or poor budget management.

The argument the other guy used was also used to criticize Oakland for not getting to the World Series under Billy Beane. As has been mentioned, it is about finding undervalued assets, and Oakland was up against the Yankees in the playoffs while spending about 1/4 of what the Yankees spent. A five-game series in baseball is essentially random, and I put a lot more stock in the 162-game result than in the following week. Oakland's methods worked. They had no money to spend (comparatively), and yet were right there in the thick of things.

One concept I really like in baseball when evaluating players is VORP, or Value Over Replacement Player. What it says is that if Albert Pujols goes down, there is not going to be a black hole where nothing exists. There is going to be someone in there who has a modicum of skill and will produce something. So you're not paying Pujols for his 40 home runs and 110 RBIs; you're paying him for the 30 home runs and 75 RBIs (hypothetical numbers) more that he would produce than the replacement player would put up given the same number of at-bats. People go gaga over the 20-home-run guy and say they need to re-sign him... well, is that really worth $6 million a season when a $750,000 player will get you 10? Other factors come into play, but the point is to get thinking about things like that.
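
To make that arithmetic concrete, here is a rough sketch using the hypothetical numbers from the paragraph above. The function names are made up for illustration, and real VORP is expressed in runs rather than raw home runs, so treat this only as the replacement-level idea in miniature.

```python
def value_over_replacement(player_stat, replacement_stat):
    """Production above what a replacement-level player would supply."""
    return player_stat - replacement_stat


def cost_per_marginal_unit(player_salary, replacement_salary,
                           player_stat, replacement_stat):
    """Extra salary paid per unit of production above replacement.
    Returns None when the player adds nothing over the replacement."""
    marginal = value_over_replacement(player_stat, replacement_stat)
    if marginal <= 0:
        return None
    return (player_salary - replacement_salary) / marginal


# Hypothetical numbers from the post: 40 HR star vs. a 10 HR replacement,
# and a 20 HR player at $6 million vs. a 10 HR player at $750,000.
star_marginal_hr = value_over_replacement(40, 10)
price_per_extra_hr = cost_per_marginal_unit(6_000_000, 750_000, 20, 10)
```

With these inputs the 20-home-run player costs $525,000 for each home run beyond what the cheap replacement provides, which is the number the "is that really worth it?" question is about.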

To bring it to hockey, one discussion I keep having is about Jordan Staal in Pittsburgh and what they should do with him. He is signed for 4 years for $4 million, which on a team that doesn't have Malkin and Crosby is a great contract. But Pittsburgh will likely face a dilemma: keep Staal and have a third-line center of his caliber, or get better wingers for the top two centers. Staal's work on the PK, etc. is always put forward as the reason he should be kept.

What I've tried to get them to do is recognize the tradeoff. The opinion seems to be that the PK will never stop the other team without him. Well, just how much less effective will a guy from the minors on a $500,000 deal be? I'm sure he will be less effective, but to what degree? How much more does Staal bring to the table? And is that difference worth maybe having lesser wingers for Malkin/Crosby? I have one opinion, others have another, which is fine, but given the limited resources under the cap, that is the thought process that needs to be undertaken IMHO.

VORP was around long before Moneyball, but it's a concept that I believe is long overdue for hockey.
 

Enstrom39

Registered User
Apr 1, 2006
2,174
0
www.birdwatchersanonymous.com
The argument the other guy used was also used to criticize Oakland for not getting to the World Series under Billy Beane. As has been mentioned, it is about finding undervalued assets, and Oakland was up against the Yankees in the playoffs while spending about 1/4 of what the Yankees spent. A five-game series in baseball is essentially random, and I put a lot more stock in the 162-game result than in the following week. Oakland's methods worked. They had no money to spend (comparatively), and yet were right there in the thick of things.

One concept I really like in baseball when evaluating players is VORP, or Value Over Replacement Player. What it says is that if Albert Pujols goes down, there is not going to be a black hole where nothing exists. There is going to be someone in there who has a modicum of skill and will produce something. So you're not paying Pujols for his 40 home runs and 110 RBIs; you're paying him for the 30 home runs and 75 RBIs (hypothetical numbers) more that he would produce than the replacement player would put up given the same number of at-bats. People go gaga over the 20-home-run guy and say they need to re-sign him... well, is that really worth $6 million a season when a $750,000 player will get you 10? Other factors come into play, but the point is to get thinking about things like that.

To bring it to hockey, one discussion I keep having is about Jordan Staal in Pittsburgh and what they should do with him. He is signed for 4 years for $4 million, which on a team that doesn't have Malkin and Crosby is a great contract. But Pittsburgh will likely face a dilemma: keep Staal and have a third-line center of his caliber, or get better wingers for the top two centers. Staal's work on the PK, etc. is always put forward as the reason he should be kept.

What I've tried to get them to do is recognize the tradeoff. The opinion seems to be that the PK will never stop the other team without him. Well, just how much less effective will a guy from the minors on a $500,000 deal be? I'm sure he will be less effective, but to what degree? How much more does Staal bring to the table? And is that difference worth maybe having lesser wingers for Malkin/Crosby? I have one opinion, others have another, which is fine, but given the limited resources under the cap, that is the thought process that needs to be undertaken IMHO.

VORP was around long before Moneyball, but it's a concept that I believe is long overdue for hockey.

Tom Awad has given this a shot. It's called "Goals Versus Threshold" (the threshold being a replacement player). He has values for every player going back to the 1950s over at the Hockey Analysis Group on Yahoo Groups, where you can download the file.
 

golfortennis

Registered User
Oct 25, 2007
1,878
291
Tom Awad has given this a shot. It's called "Goals Versus Threshold" (the threshold being a replacement player). He has values for every player going back to the 1950s over at the Hockey Analysis Group on Yahoo Groups, where you can download the file.

This sounds cool. Too bad I am unable to get into the group. I keep going around in circles with trying to get in there....
 

Moobles

Registered User
Mar 15, 2009
2,555
0
Not sure if this has been posted, mods can delete this post if it has.

http://hockeyanalytics.com/Research.htm

Link to some hockey studies regarding Poisson models. Unfortunately, hockey seems to be relatively understudied compared to sports like baseball, so there is less information for these guys to work with.

Extremely interesting nonetheless.
 

william_adams

Registered User
Aug 3, 2005
1,942
0
Kyushu
Back to the baseball vs hockey modeling. I think you're bypassing the question. Is hockey a sport that can be modeled?

I hope this doesn't get lost, but hockey is much more a game of random events. There's simply too much going on on the ice, with too many variables (number of players, conditions, etc.). Baseball is somewhat static in comparison to hockey.

Baseball is a game absolutely suited for statistical analysis as it's just a set of discrete events. I'm not sure that a continuous sport such as hockey (or basketball) is as easily analyzed. This obviously makes it tougher for moneyball-type analysis...

(This is not to say that the current set of stats we can easily access has been explored enough -- there are tons more we can read into just PIM as it is...) Alan Ryder (occasionally in the Globe and Mail) does some neat things.
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
Take it for what it's worth, but I'm actually onto something that's "Moneyball"-like, or rather sabermetrics-like.

I don't treat it the same way Bill James does, although I'm working on similar ideas.

My model uses some nonlinear equations, some linear equations and some statistical theorems. My problem right now is that I don't have access to a lot of statistics; I spoke about it to a stats professor and he said he'd see what he can do to help me out.

By the way, a lot can be done with only games played and points, as the previous poster pointed out; and by a lot, I mean most of the work.

The obvious issue is that the statistics we have are inefficient at summarizing the contribution of the individual... otherwise the "moneyball" ideal of finding weak points within the market would be applicable.

Either way, good luck. I've got a few ideas myself... but as you said, NHL data is very hard to work with and I need to write my dissertation. You need a good computer science background to work with the NHL data. If I were you, I'd ping any friends who have a good CS background, provided you have a specific idea for which you know the data is available but in rough form.

Supposedly the Baseball Prospectus people are trying to branch into hockey but I'm not sure what they're trying to do with it.

If Bill James did anything (going by my reading of Moneyball), it seems that he committed the highest heresy of baseball... he questioned all the first principles of the understanding of the game. I think that may need to be done for hockey and other sports before going any further. By questioning the principles of value, James was able to re-order the baseball world.

The real issue over the long haul is that hockey is a spatio-temporal sport. Even if one has that data it will be difficult. I think for the short term it's better to see if one can find some latent behavior hiding in plain sight. Of course that's where I want to go with it.

Right now, if I could get it I'd want:

1) Complete shift data. All 12 players on the ice are identified as well as the status of goal/not-goal. From a regression standpoint I would want a response of "goal/no goal" and the covariates would be the players themselves. A unique shift is defined by the unique combination of players on the ice for both teams. If #8 comes off for Hamilton and #12 comes on, that means a new shift for all players on the ice. I'd want the duration of each shift to go with it. NHL shift data exists but I don't have the wherewithal to extract it.

Goal: See if I can tease out contributions of individuals. Conceptually replaces +/-.
Down side: Likely dependencies through "chemistry" factors.
Issues?: Computation. Public education of model.
Plus Side: Replaces +/-.

2) Empty Net/Extra Attacker Information. I'm trying to see if I can get a ball-park rate on scoring during EN/EA situations. From that I'm curious as to what the statistically "best" decision would be.

Goal: Use probability model to assess optimum empty net time within the context of the game.
Down side: A few issues using the probability model. Can a team really play 10 minutes of 6-on-5 without a breakdown, even if they can for 1 minute?
Upside: Could turn the tactics of the game on its head.
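
As a rough illustration of the shift definition in item 1, the sketch below cuts a game into segments during which the set of skaters on the ice does not change, and counts goals in each one. The input layout (tuples of player id, start second, end second) and the name `build_segments` are assumptions made for this example, not the NHL's actual shift-chart format, and goals are not split by team here.

```python
def build_segments(stints, goal_times=()):
    """Split ice time into segments with a constant set of players.

    stints: iterable of (player_id, start_sec, end_sec) tuples.
    goal_times: game-clock seconds at which any goal was scored.
    Returns a list of dicts, one per segment, in time order.
    """
    stints = list(stints)
    # Every shift start or end is a potential boundary between segments.
    cuts = sorted({t for _, start, end in stints for t in (start, end)})
    segments = []
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        # A player belongs to the segment if his stint covers all of it.
        on_ice = frozenset(
            player for player, start, end in stints
            if start <= seg_start and end >= seg_end
        )
        if not on_ice:
            continue
        # A goal on a boundary is credited to the segment it ends.
        goals = sum(1 for g in goal_times if seg_start < g <= seg_end)
        segments.append({
            "start": seg_start,
            "end": seg_end,
            "duration": seg_end - seg_start,
            "players": on_ice,
            "goals": goals,
        })
    return segments


# Toy example: #8 comes off for Hamilton at 0:40 and #12 comes on.
example = build_segments(
    [("HAM8", 0, 40), ("HAM12", 40, 90), ("HAM4", 0, 90), ("OPP9", 0, 90)],
    goal_times=[55],
)
```

Each segment then becomes one row of the regression described above: the players are indicator covariates, the duration is the exposure, and the goal count is the response.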

I hate to lay out my ideas like this for fear of theft but on the other hand I lack the data processing ability.
 

Mathletic

Registered User
Feb 28, 2002
15,777
407
Ste-Foy
I'm in mathematics and I'm taking the core computer science courses as options: C programming, algorithms, data structures and file management. I've already built myself a C program to compile the stats I needed for drafting, which gave me a very good model to start with. As for questioning the basis of hockey, that's what I'm doing to generalize my model. I'm doing it in a Euclidean way, if you will, starting from axioms and building it all up. I've found several good hockey books to work with, one by Gérard Gagnon, among others, which I think is pretty good to start with (not great, but good enough for now) for questioning the whole basis of hockey.
 

pitseleh

Registered User
Jul 30, 2005
19,164
2,613
Vancouver
2) Empty Net/Extra Attacker Information. I'm trying to see if I can get a ball-park rate on scoring during EN/EA situations. From that I'm curious as to what the statistically "best" decision would be.

Goal: Use probability model to assess optimum empty net time within the context of the game.
Down side: A few issues using the probability model. Can a team really play 10 minutes of 6-on-5 without a breakdown, even if they can for 1 minute?
Upside: Could turn the tactics of the game on its head.

I hate to lay out my ideas like this for fear of theft but on the other hand I lack the data processing ability.

They looked at scoring rates with/without ENs at Behind the Net.

http://www.behindthenet.ca/blog/2007/12/what-happens-when-you-pull-your-goalie.html
 

Blackjack

Registered User
Feb 13, 2003
18,164
14,975
keyjhboardd +bro]ke
There are a lot of good ideas in this thread. Here are a few of my own that I'd like to throw out there for comment:

1. I think it would generally be a mistake to try to come up with a "Theory of Everything" (TOE). The United States, Canada, Russia, Sweden, Finland, and the Czech Republic all have different philosophies on player development, playing styles that translate to the NHL in different ways, and varying amounts of available information. Many prospects come over from Europe after not scoring a whole heck of a lot at the top levels (SEL for example) and proceed to have perfectly good NHL careers.

It might make more sense to focus on a tightly defined group of prospects. For example, look at 16 and 17 year old Canadian OHL forwards with at least 100 games played. Otherwise you may be overwhelmed trying to normalize for all the different development paths that players can take.

2. You don't have to find major inefficiencies to have a big impact. For example, looking at the aforementioned group, if you discovered that players that score under 1 ppg, but are 3rd or better on their teams in ppg are consistently but slightly undervalued, that is a pretty big deal.

3. Before even looking at U18 players, it might make sense to do a deep dive with NHL statistics to try to get some direction; just because there are probably more NHL stats available than for any other sport. Worst case is that you end up with a kick-ass fantasy team.

Thoughts?
 

Enstrom39

Registered User
Apr 1, 2006
2,174
0
www.birdwatchersanonymous.com
Right now, if I could get it I'd want:

1) Complete shift data. All 12 players on the ice are identified as well as the status of goal/not-goal. From a regression standpoint I would want a response of "goal/no goal" and the covariates would be the players themselves. A unique shift is defined by the unique combination of players on the ice for both teams. If #8 comes off for Hamilton and #12 comes on, that means a new shift for all players on the ice. I'd want the duration of each shift to go with it. NHL shift data exists but I don't have the wherewithal to extract it.

Goal: See if I can tease out contributions of individuals. Conceptually replaces +/-.
Down side: Likely dependencies through "chemistry" factors.
Issues?: Computation. Public education of model.
Plus Side: Replaces +/-.

2) Empty Net/Extra Attacker Information. I'm trying to see if I can get a ball-park rate on scoring during EN/EA situations. From that I'm curious as to what the statistically "best" decision would be.

Goal: Use probability model to assess optimum empty net time within the context of the game.
Down side: A few issues using the probability model. Can a team really play 10 minutes of 6-on-5 without a breakdown, even if they can for 1 minute?
Upside: Could turn the tactics of the game on its head.

I hate to lay out my ideas like this for fear of theft but on the other hand I lack the data processing ability.

Puck Prospectus started publishing about a month ago.

re: #1 has already been done to some extent and it's out there on the net. The problem is that you frequently end up with very small slices of shared ice time between two players. Now throw in the fact that goals are relatively rare events and that goals/not goals are affected by randomness as well as by patterns. If you have a small sample size with a rare variable that is partly random--you're going to have huge confidence interval problems--trying to sort out line combinations.

The EDM/CGY bloggers have focused on SF/SA and Shift End/Shift Begin data precisely because shots and faceoffs occur MUCH more frequently than Goals For/Goals Against. Sure, there will be some randomness in shots and faceoffs, but because of the large number of data points the pattern element will be more pronounced and the random/luck element less problematic than it is in GF/GA.

Of course Shots are not a perfect measure either, but they are a basic indicator of offensive pressure. Faceoff location isn't a perfect measure of which players advance or retreat with the puck--but it does give you a rough sense of that pattern.
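
A back-of-the-envelope sketch of why the larger counts help: if an event count behaves roughly like a Poisson variable, its standard deviation is about the square root of the count, so the relative width of a confidence interval shrinks like one over that square root. The function name and the example counts (16 goals, 400 shots) are illustrative assumptions, and the normal approximation used here is crude for small counts.

```python
import math


def relative_ci_halfwidth(count, z=1.96):
    """Approximate half-width of a 95% interval for a Poisson rate,
    expressed as a fraction of the observed rate (normal approximation)."""
    if count <= 0:
        raise ValueError("need at least one observed event")
    return z / math.sqrt(count)


# Illustrative counts for one player over the same ice time.
goal_uncertainty = relative_ci_halfwidth(16)    # on-ice goals
shot_uncertainty = relative_ci_halfwidth(400)   # on-ice shots
```

Under these assumptions the goal-based rate is only pinned down to within roughly plus or minus half of its own value, while the shot-based rate is good to about ten percent.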
 

Patman

Registered User
Feb 23, 2004
330
0
www.stat.uconn.edu
re: #1 has already been done to some extent and it's out there on the net. The problem is that you frequently end up with very small slices of shared ice time between two players. Now throw in the fact that goals are relatively rare events and that goals/not goals are affected by randomness as well as by patterns. If you have a small sample size with a rare variable that is partly random--you're going to have huge confidence interval problems--trying to sort out line combinations.

This is why I want to break it down player by player... CI issues are inevitable IMO... so is multicollinearity. I would think that there could be some separability to be gained. I'm not sure what you mean by "two players", though I do worry that, naturally, players will be unidentifiable from each other... BUT that'd also be true of +/-... let's remember that nobody finds CIs for +/-. This would be the BCS ranking counter-argument... sure, W-L is bad... but those CIs for the rankings cover 80 teams after 13 games?!?! (which is 100% true... look at the estimated variances on computer rankings in football... nasty.)

As for "relatively rare"... sure... a player who only sees 5 min of ice time per game is only on the ice for about 15-20 goals in a season. Needless to say, go even lower than that and there are similar problems.

If anybody has a link on what has been done I'd be interested.

My thing is that what has been done is based on various tweaks of linear model theory and not much beyond that: y = f(x, \beta) + \epsilon. The underlying behavior is rarely linear in this fashion (normal dist theory).

---

pitseleh,

Thanks. The problem at hand, as far as I can tell, is non-trivial since one can only simulate the probabilities for a given "empty-net rule". Needless to say, the point where you pull a keeper (more or less) is when you would gain more by pulling the goalie than by keeping him in there. If you keep the goalie in, that's an easy calculation (for a computer). Unfortunately, the other value is only available through Monte Carlo. I'm going to see if I can get this thing along enough to coax two other grad students into doing this for NESSIS09. If things go to plan I won't have any vacation days for this conference.
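
A minimal sketch of what simulating one "empty-net rule" could look like. Everything here is an assumption for illustration: the rule is "pull the goalie whenever trailing with `pull_at` seconds or fewer left", the game is stepped one second at a time, and the per-second scoring probabilities are placeholder values, not measured NHL rates.

```python
import random


def tie_probability(pull_at, seconds_left=600, trials=2000, seed=0,
                    p_for_5v5=0.0007, p_against_5v5=0.0007,
                    p_for_6v5=0.0017, p_against_6v5=0.0050):
    """Monte Carlo estimate of the chance that a team trailing by one
    goal is level or ahead at the final buzzer under a given pull rule."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        deficit = 1
        for remaining in range(seconds_left, 0, -1):
            # The goalie is out only while the team still trails.
            pulled = deficit > 0 and remaining <= pull_at
            if pulled:
                p_for, p_against = p_for_6v5, p_against_6v5
            else:
                p_for, p_against = p_for_5v5, p_against_5v5
            u = rng.random()
            if u < p_for:
                deficit -= 1
            elif u < p_for + p_against:
                deficit += 1
        if deficit <= 0:
            successes += 1
    return successes / trials


def compare_rules(pull_times, **kwargs):
    """Estimate the tie probability for several candidate pull times."""
    return {t: tie_probability(t, **kwargs) for t in pull_times}
```

Calling something like `compare_rules([0, 60, 120, 300])` gives one estimate per rule; with real scoring rates plugged in, the rule with the highest estimate would be the candidate "best" pull time, subject to the fatigue caveat about long stretches of 6-on-5.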

---

FSU, I'm a Ph.D. student in Stats... I should be wrapping that up this summer. My perspective is from the ideas of distributions and likelihood models.

Personally, I think the real jumps will be analyzing cognition and reaction of players in regards to certain game-based stimuli. Tracking, reaction time, efficient decisions. The game has to be broken down before it gets built back up. Part of me believes that the eyes will tell everything.

It's either this, or somebody advances some of the notions that David Brillinger put forth on the World Cup with his potential function, or somebody does something fanciful with that tank model proposal I read 2-3 years ago.
 

jaydub*

Guest
Anything out there that does a better job of evaluating goalies than GAA/Save %?
 

pitseleh

Registered User
Jul 30, 2005
19,164
2,613
Vancouver
Anything out there that does a better job of evaluating goalies than GAA/Save %?

There are people who tabulate Shot Quality (I'm not well versed in how exactly it works, but it's based on the distance from the goal at which shots are taken and the probabilities of scoring from different areas), and Hockey Numbers calculates a shot quality neutral SV% which accounts for the differences in shot quality faced.
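
Here is one plausible way such an adjustment could be computed, offered only as a sketch: I'm not certain this is the formula Hockey Numbers uses. The zone names, the per-zone scoring probabilities and the function names are all assumptions for illustration. The idea is to compare the goals a goalie allowed with the goals an average goalie would be expected to allow on the same mix of shots.

```python
def expected_goals(shots_by_zone, goal_prob_by_zone):
    """Goals an average goalie would be expected to allow on these shots."""
    return sum(n * goal_prob_by_zone[zone] for zone, n in shots_by_zone.items())


def shot_quality_neutral_sv(goals_against, shots_by_zone,
                            goal_prob_by_zone, league_sv):
    """Save percentage rescaled as if the goalie had faced shots of
    league-average difficulty (assumed formulation, see lead-in)."""
    total_shots = sum(shots_by_zone.values())
    league_goal_rate = 1.0 - league_sv
    # Shot quality against: above 1 means harder-than-average shots.
    sqa = expected_goals(shots_by_zone, goal_prob_by_zone) / (
        total_shots * league_goal_rate
    )
    neutral_goals = goals_against / sqa
    return 1.0 - neutral_goals / total_shots
```

A goalie who allows exactly the expected number of goals comes out at the league-average save percentage, whatever the difficulty of the shots he faced; allowing fewer than expected pushes him above it.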
 

Freezerburn

Registered User
Mar 20, 2003
7,157
16
I don't really have anything to add to this discussion specifically. I'm very interested in this aspect of sports though and have found the thread a very good read. I hadn't seen that Shane Battier article before; it was great. Also, with regards to the Sabres (of which I am a fan), I have suspected that their process for deciding which players to draft utilizes some sort of statistical analysis. They'd never tell anyone though haha
 

TheRumble

Registered User
Feb 19, 2009
1,465
2,285
I don't really have anything to add to this discussion specifically. I'm very interested in this aspect of sports though and have found the thread a very good read.

Agreed.

If anyone is a football fan, Lewis also wrote a book called "The Blind Side", chronicling the rise of the left tackle from just another faceless, low-paid offensive lineman to one of the most coveted positions in all of pro football. It's not as stat-heavy as Moneyball and much more readable for the casual football fan. Incidentally, the guy he profiles in the book, Michael Oher, was projected to go early in the first round of this year's NFL draft had he not withdrawn his eligibility to attend another year of school.
 

gardenfaithful44

Registered User
Apr 2, 2007
185
0
This is a subject that interests me greatly. I've read many books about stat analysis, specifically in baseball. The problem I see with using a model in hockey is the current argument in baseball management (Stats vs Scouts). We can study, analyze, and even create stats/models to determine player values, but hockey to me will always be a sport that relies on scouts. There are too many variables that don't show up in stats. For example, players that crash the net and pay the price to score "garbage" goals. When you look at goals scored, it doesn't say where the goal was scored from or how. Player chemistry is another huge variable. Look at Cheechoo a few years back. Comparing that 1 season to his next 3, he would be looked at as an underperformer, when in reality Cheechoo and Thornton just meshed well together for a period of time, which created stat inflation. I can come up with a bunch of these situations but I think you get the point.

Also, what is that book by Gérard Gagnon that studies the basis of hockey? I'd be interested in reading that.
 
