# Statistics in Hockey; What is Their Place?

Discussion in 'By The Numbers' started by Sensfanman, Jan 24, 2011.

Hello HF, this is a subject I've been hesitant to bring up due to possible lack of interest but I'll give it a go.

I am a numbers guy, I can really only think in numbers. I've recently gotten into hockey analytics and I find it absolutely fascinating albeit a budding area of study. That being said, I believe stats can be a better indicator of a player's abilities than even the well trained eye and at the very least a worthwhile supplement or alternative to viewing a player.

Consider this: do you really see everything when you watch a game? What about on TV? Do you see all the plays away from the puck or contributions at the start of a shift that lead to a goal 30 seconds later? I'm not saying statistics capture all of this but they do give you an unbiased, quantitative look at a player based on results, under the assumption that good results are generated by good play. No home team bias or hating on a rival player. If used properly, they can give you a nearly complete picture of a player's value, beyond what we can see with our eyes.

Some stats to make my case:
GVT by Tom Awad, hosted on behindthenet.
http://hockeyprospectus.com/article.php?articleid=233
http://www.behindthenet.ca/2010/gvt.php?sort=14&mingp=10&team=ALL&pos=ALL

This stat is split into various parts and the derivation is beyond complex. It works like VORP in baseball and gauges a player's impact beyond a threshold (borderline) player. It uses points, TOI, team scoring, Sv%, a modified (better) plus/minus and does the best attempt at defensive quantification I've ever seen. It also factors in total team contributions to defense rather than isolating it. Check out the list on behindthenet and I'm sure the rank order would be similar to an expert consensus.

The only thing missing from GVT is who you play with and against, something behindthenet attempts to quantify with QualComp and QualTeam:

http://www.behindthenet.ca/nhl_stat...0_s&f2=5v5&f7=10-&c=0+1+3+5+11+12+13+14+15+16

It looks at CORSI (developed by Sabres goaltending coach Jim Corsi), which does a +/- based on shots allowed and taken while on the ice instead of goals. It's proven to be a solid approach. What behindthenet does is take a look at who a player plays with and against and factor in CORSI (or other stats) to quantify who plays against harder opposition with how much help. It does quite a good job of adding context to any stats.
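For anyone who wants to see the idea in code, here is a minimal sketch of a CORSI-style differential as described above: shots taken minus shots allowed while a player is on the ice. The event format, the `ShotEvent` and `corsi` names, and the players "A", "B" and "C" are all made up for illustration; this is not how behindthenet stores or computes its numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotEvent:
    shooting_team: str   # team that took the shot
    on_ice: frozenset    # every skater (both teams) on the ice for the shot


def corsi(player, team, events):
    """Shots taken minus shots allowed while `player` (on `team`) is on the ice."""
    shots_for = sum(
        1 for e in events if player in e.on_ice and e.shooting_team == team
    )
    shots_against = sum(
        1 for e in events if player in e.on_ice and e.shooting_team != team
    )
    return shots_for - shots_against


# Hypothetical shot log: A and B play for HOME, C plays for AWAY.
events = [
    ShotEvent("HOME", frozenset({"A", "B"})),
    ShotEvent("HOME", frozenset({"A"})),
    ShotEvent("AWAY", frozenset({"A", "C"})),
    ShotEvent("AWAY", frozenset({"B", "C"})),
]

print(corsi("A", "HOME", events))  # 2 for, 1 against
print(corsi("B", "HOME", events))  # 1 for, 1 against
```

The same on-ice bookkeeping is what would let you average the ratings of a player's opponents or linemates, which is roughly the idea behind the QualComp and QualTeam columns.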

For example, Lidstrom is #3 in GVT for defensemen but has the highest CORSI Rel QualComp in the league with below average CORSI Rel QualTeam. Compare to Byfuglien (#2 dman in GVT) who has a lot of help and faces average competition.

Together, I think these three stats, GVT + CORSI Rel QoC + CORSI Rel QoT form a very informed picture without watching a game.

What is your take?

Herein lies my biggest problem with these types of stats. Stats should be used to supplement your argument, not to form an argument because of a stat. Put it this way, I wouldn't want someone to look at that and say "well, Lidstrom has the best CORSI Rel QualComp therefore he should win the Norris".

While I can appreciate stats (or sabermetrics) in a sport like baseball, I truly don't believe those types of stats are the best indicators in the sport of hockey. Ironically, hockey is such a fast paced game that it's difficult for a viewer to form an opinion as there are so many different things going on (as you've mentioned). Maybe I'm just old-school but I will never believe stats can give you the fullest and most complete analysis of a game that you've never watched.

This mindset is where baseball was 30 years ago and sabermetrics are still learning and growing. Hockey is only in the early stages of developing these types of advanced stats so they often do not tell the entire story, but they will become extremely valuable in the future.

My take is that numbers are a great tool for informing (supporting) a point of view. I follow stats, but the depth to which some today (including on this very board) drill down numbers is truly admirable.

All that said, statistics are not a de facto substitute for an opinion. Unfortunately, one sees that misperception played out here regularly - sterile numbers presented, bereft of context and perspective.

Likewise, investing in a pair of skates and a stick, as well as mastering the ability to observe, formulate and articulate thought beyond simply reciting numbers, are wonderful things from which everyone, number cruncher or otherwise, can benefit greatly when analyzing and discussing the game.

I'm familiar with sabermetrics and this mindset within baseball. However, I think that baseball is a sport where this makes a lot of sense. It's a lot of 1-on-1 based stats that can translate well, and do play an important factor in preparation and in-game adjustments. I just don't see it being as transferable in hockey. Do you see teams actually using this to form game plans in the future? I'm not sure I do. I'm not saying it won't be valuable, but IMO, it'll never be the same as it is with baseball.

I never actually thought about your last point and it's a good one IMO.

As for stats being better suited to baseball, that's true to a degree. It's much, much easier to break down baseball as it's very static. Recently, ground is being gained in basketball, which is much more fluid. Breaking down the game into possessions, as every action either maintains or turns over possession, is a modern approach to studying basketball: http://en.wikipedia.org/wiki/APBRmetrics. It's hardly 20 years old and probably closer to 10.

Hockey is even more fluid, and very hard to analyze. It doesn't mean it can't be done. I've always wanted to see how much time a player spends controlling the puck and how they direct control. Factor in the number of times with the puck, passes attempted, pass interceptions, takeaways, giveaways, etc and I think you can model a game in a similar way to basketball.

As of now, we can only monitor averages and results. It's very hard to get into a game statistically which is the biggest limitation IMO.

The nature of the game of hockey doesn't necessarily lend itself to using advanced stats to make in-game adjustments, I agree with that. However, it doesn't undermine the usefulness that statistics can have in evaluating players over larger sample sizes of games. I am not saying that stats should be the end-all-be-all of player evaluation but I think they can become extremely valuable and not simply something to support your argument when you want it to. Ultimately, I agree it is just another tool in evaluation, not THE tool. It is certainly a complementary aspect and may not wield the influence that sabermetrics has on baseball, but it certainly shouldn't be ignored.

Much in the same way that scouts, GMs, coaches and players will often have different views on a player or different solutions to a problem, statistics will simply add another voice to the argument. Hopefully, advanced statistics will progress at a rate that allows its voice to be heard with equal weight to the conventional evaluation tools.

Basketball is beginning to embrace this movement a little as well and Daryl Morey (GM of the Rockets) is doing a lot for it. He is also the chairman of the annual MIT Sloan Sports Analytics Conference. Michael Lewis (author of Moneyball) actually wrote a piece for the New York Times detailing the use of such statistics and evaluating more heart-and-soul type players (and their imminent importance) like Shane Battier. http://www.nytimes.com/2009/02/15/magazine/15Battier-t.html?_r=1&pagewanted=2

I'm going this year. It's going to be awesome but given that it's called "Dorkapalooza" by Bill Simmons, how could it not be amazing haha.

Statistics can be useful, but you have to be extremely careful with their application. Numbers only measure the values you deem to be important in the formula, after all. And there is a huge danger in overapplying statistics that measure very specific things to categories that are far too broad.

Heck, take CORSI. If you're a team that runs a counter-attack system and does not mind getting outshot 35-20 every game, you won't stack up favorably with teams that really limit the shots against. But your team might be winning 60 percent of its games while their team wins 40 percent. Who is the better player? CORSI would probably say the player on the losing team. It's limited because it places a value on a stat that is not necessarily predictive.

And there's a huge danger of assuming there will never be outliers. For instance, there was a terrible post on an SBNation Oilers blog recently basically saying the Dallas Stars were lucky to be at their current points totals because they had won a lot against the East and because they won a lot of one-goal games. He had all sorts of numbers to back this up.

But both points were fairly ludicrous. For one, all Western Conference teams have basically equal opportunity to rack up points against the East (and vice versa). That the Stars are more effective than most tells you nothing other than... the Stars are more effective at that than most. Nothing further. The second point was even dumber. The Stars do win a disproportionate number of one-goal games, but that doesn't mean it will equalize or that they're "lucky." Being an outlier sometimes just means you're an outlier.

If a team has a 38 percent power play, at this point of the year, I'd say chances are greater than not that it will stay much, much higher than the statistics will tell you it should because we're well more than halfway through the sample set. True numbers nerds, however, will try to make an argument that their PP numbers say that team's power play will drop dramatically over the last half of the season because no team has ever had an average that high or whatever. Outliers will always exist, no matter how much the numbers might say things will equalize.
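To put rough numbers on the sample-size point, here's a minimal sketch of the arithmetic. Every figure is made up for illustration (57 goals on 150 chances so far, 100 chances left, and an assumed 18 percent conversion rate over the rest of the season), and `projected_rate` is just a hypothetical helper, not anyone's actual projection model.

```python
def projected_rate(goals_so_far, chances_so_far, chances_left, expected_rate_left):
    """Season-end power-play rate if the remaining chances convert at expected_rate_left."""
    projected_goals = goals_so_far + chances_left * expected_rate_left
    return projected_goals / (chances_so_far + chances_left)


# Hypothetical team: 57 goals on 150 chances (38 percent) with 100 chances left.
# Assume the rest of the season is played at a flat 18 percent.
season_end = projected_rate(57, 150, 100, 0.18)
print(f"{season_end:.1%}")  # 30.0%
```

Under those assumptions, even a complete collapse to 18 percent over the remaining chances still leaves the season figure around 30 percent, because the chances already banked make up most of the sample.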

Particularly in a fluid sport like hockey, which does not break down well into a finite number of one-on-one scenarios like baseball, numbers are a great addition to an eye test, but they do need to be used in conjunction with informed observation, and too many stats-heads don't do that to the necessary extent to make their numbers have any real meaning.

I think statistics in hockey are at least an order of magnitude behind the quality and relevance of statistics in baseball.

For one thing, there is a common critical measure between offense and defense in baseball that allows for the two factors to be reliably combined..."outs." What's more, we can translate outs (as well as any other "positive" offensive events-like a single-that takes place on the diamond) directly into context neutral "runs" thanks to things like linear weights (which have been empirically derived). These context neutral "runs" can also be further adjusted for things like park factors, season/era played, etc. On defense, outs are measured both by opportunity (how many "out" opportunities a player generates relative to average for his position adjusted for total number of chances) and efficiency (how often an "out" opportunity is actually converted into an out).

Combine the ability of a player to generate "out" chances relative to his positional peers with his ability to be efficient with the "out" chances he generates (which are then empirically weighted according to relative importance to other defensive positions), with his ability to avoid making outs on offense (which can also be measured relative to his peers and weighted against other positions) and his ability to generate high run scoring outcomes with his non-outs (measured by linear weights, etc.) and you have a fairly advanced measure of a player's individual contributions to his team's success, and of his contributions relative to other players.
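As a rough illustration of how linear weights turn a batting line into context neutral "runs," here's a minimal sketch. The run values in `RUN_VALUES` are placeholders chosen only to show the mechanics, not the empirically derived weights, and the `runs_above_average` helper and the sample hitter are hypothetical.

```python
# Placeholder run values per event, for illustration only.
# Real linear weights are derived empirically from play-by-play data.
RUN_VALUES = {
    "single": 0.45,
    "double": 0.75,
    "triple": 1.00,
    "home_run": 1.40,
    "walk": 0.30,
    "out": -0.25,
}


def runs_above_average(event_counts, run_values=RUN_VALUES):
    """Sum each event count times its run value to get one context neutral number."""
    return sum(run_values[event] * count for event, count in event_counts.items())


# Hypothetical season line for one hitter.
hitter = {"single": 100, "double": 30, "triple": 4, "home_run": 20, "walk": 60, "out": 400}

print(f"{runs_above_average(hitter):.1f}")  # about 17.5 with these placeholder weights
```

The point is that every event, including the out, lands on the same "runs" scale, which is what lets offense and defense be added together. Hockey has no equivalent common currency yet.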

There is some variation in the defensive measurements depending on how they are done, but not as much as there once was. Balls in play are already subjectively categorized for defensive metrics that use that data with a fairly high degree of correlation, but they're currently working on a system that empirically tracks ball velocity/trajectory off the bat (the same way that Pitch F/X data tracks ball velocity/trajectory/movement) that will make it even more sophisticated/objective/reliable, and it's only a matter of time before they start empirically measuring the players' movement throughout the play as well.
-----------

In these statistics, he attempts to combine offensive contributions and defensive contributions without a common measure between the two. Offensive contributions are based off (non-empirically) weighted counting stats. Obviously, there's much more to creating offense than goals and assists. What about transition play? Puck battles? The ability to generate turnovers? Faceoffs? Etc. There's also no differentiation between points scored at even strength and points scored on special teams.

Defensive contributions are measured in shots against (without any ability to distinguish between shot quality, or even shot position). That might be the best team defense indicator we have, but it's still a fairly weak way to analyze individual defensive prowess (especially when Plus/Minus is also being used to help derive those contributions), or the number of quality scoring chances that a team is generating against. What about the ability of a forward to backcheck (or rotate back to cover a pinching defenseman) to eliminate odd-man rushes? The ability to limit in-zone turnovers? The ability to win puck battles? Faceoffs? Etc. What about considerations for special teams play since good PP players are obviously rewarded for their abilities? There's no accounting for who one plays with, or against (QualComp/QualFor is an attempt, but it's just an inferential crutch. It doesn't actually quantify how much those things affect the production of a specific player.) There's no way to distinguish between shots and scoring chances, no way to empirically measure individual contributions, and no way to equate shots against to goals so the two measurements can be reliably combined. It also doesn't help that the relative weighting of the defensive contributions (between forwards and defensemen) is completely arbitrary. It doesn't even take into account the different defensive responsibilities between a winger and a center.

Looking at the second link you posted, the top offensive contributors have numbers three times higher than the top defensive contributors. There are 59 offensive scores (non-goalie) and 23 goalie scores higher than the highest defense score. If allowing a goal is just as damaging as scoring a goal is favorable, then there needs to be some adjustment in the relative scoring weights (assuming they weren't inherently flawed and completely incompatible to begin with).

Speaking of amiss, why are the goalies lumped in with the skaters? It's hard enough to combine forwards and defensemen...and they even contribute in similar ways on the ice. I agree that save percentage is the best measure of a goalie's on-ice contributions, but once again there's no real accounting for the number of quality scoring chances faced on a rate basis relative to other goaltenders. Regardless, the role of a goalie is so dramatically outside the realm of skaters that they should never be combined.

The bottom line is that what hockey has is very primitive indeed, even these "advanced" statistics. At best they are incomplete. At worst, they are round-about approximations based largely on inferences and arbitrary determinations. They have a long way to go before they can be relied upon to accurately/reliably augment what a trained eye sees, much less replace it completely.

### Dado (Guest)

According to the F rankings, Stamkos and D.Sedin make a much stronger pairing than D.Sedin and H.Sedin.

Which is a long way of saying I can appreciate the value it provides in assessing how a team is put together, but I'm not convinced it provides any predictive value in figuring out *how* to put a team together.

Stats are there to give you a very broad briefing of what happened if you missed it.

They also help answer the questions you ask during the game like "man, Datsyuk took the puck again... I think that's his 5th time this game" then you look it up and it's 6 and you feel good cause you noticed the play that much. Besides that, I don't think stats really define the value/ability of a HOCKEY player.

This game is way too dynamic to have stats be a true baseline of which player is better than the other. There's no pressuring-the-dman stat, is there? Little stuff like that matters a huge amount. You'll notice that watching a game, not looking at a stat sheet.

Any statistic that gives an individual player a rating based on a team stat is complete garbage. +/- and Corsi fall into the category of complete garbage. Giving a player a - on a goal that was not his fault is like giving the passenger in a car a speeding ticket.

Fact is, few statistics give you significant insight into a hockey player's performance. Goals and assists are very good measures of one's offensive ability. The number of hits tells you about the physicality of a player. Outside of those numbers, there aren't a lot that are very helpful. You need to watch a player to understand how well he plays defensively; there is no statistic currently to measure defensive play.

All goaltending statistics are skewed based on the team in front of the goaltender so none are truly useful to definitively tell us how good a goaltender is. This again must be observed.

Some stats are excellent tools; unfortunately, many people try to take the flawed stats and use them to provide flawed insight.

@EastonBlues22

Great insight. I fully agree with the major differences in baseball vs hockey analysis but I do think it's possible to eventually get hockey into a granular form. You could even do something outrageous like tracking the distance of the puck from the opposition's net and how much a player contributes to that (with weightings for passing back which ends up ahead via a stretch pass and whatnot).

As for the GVT, I think you have to read the derivation more. He does factor in PPTOI and adjusts the offensive weighting for it, which ends up affecting Offensive Rating. Furthermore, do we even need to measure puck battles and the like? If a player wins a puck battle and the team scores as a result, that's reflected in the Relative +/- (the defensive factor which you can argue is misappropriating). If the puck battle win doesn't result in a goal, then who cares?

While you disagree with the scaling, looking at the numbers, Forwards have just over 1.5 times as much offense score as defense on average and Defensemen have just over 1.5 times as much defense score as offense. So the direction is valid.

The offensive/goalie combination is a behindthenet way of collapsing the columns I think.

So while I agree with your points, I would argue that directional stats that can make relative comparisons are as valuable as watching a game and judging. Are you making perfect judgments? Probably not, but you get a directional or relative sense as to who is good.

@Dado
GVT does not factor in chemistry between players but that's damn near impossible. I don't even think they measure chemistry in baseball but someone can answer that better than I.

@blueberrydanish
We think blueline pressure is important but what does it do? Defend against point shots? Generate breakaways? If blueline pressure does indeed do this, it will be reflected in the Relative Plus Minus or CORSI. Perhaps it's more valid to say that current hockey stats are too broad and lose information but they reflect the results of value add plays.

Without stats, the number of posts on HF goes down to 8

It's funny because I'm a big Toews fan...but it's not for the intangibles. It's because he's a strong and tenacious skater that stays busy by engaging in 1 on 1 battles all over the ice, winning them, and then making good decisions with the puck. This gives his teammates more time with the puck and probably gets them the puck in better situations with more room and space. Creates chances off the turnover.

If you can find me stats that can tell me which players are the best at engaging in battles and winning them, and which players make the best decisions with the puck and which players benefit their linemates the most...I'd love to see it. It would be valuable. It is after all, a big part in winning hockey games. You could spend all your time trying to make this stat but it is probably unrealistic. Watch hockey, evaluate players. To this day scouts still do this.

I think Corsi is a good stat though, I think it provides some light into what I was just touching on above.

### Dado (Guest)

Keep a running tally of how close they are to the nearest opponent at any given time, and whether they have the puck or not at that time.

I'd bet that distribution would show some really interesting things.

I just went off what he said in the primer...never made it to the formula followup.
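For what it's worth, the tally itself would be simple to keep if the tracking data existed. A minimal sketch, assuming hypothetical position data: each frame is a tuple of the player's (x, y), a list of opponent (x, y) positions, and whether the player has the puck. The frame format, the 5-foot buckets, and the `tally` helper are all invented for illustration; no such feed is being claimed here.

```python
import math
from collections import Counter


def nearest_opponent_distance(player_xy, opponents_xy):
    """Straight-line distance from the player to the closest opponent."""
    return min(math.dist(player_xy, opp) for opp in opponents_xy)


def tally(frames, bucket_ft=5):
    """Count frames by (has_puck, distance bucket), where the bucket is its lower edge in feet."""
    counts = Counter()
    for player_xy, opponents_xy, has_puck in frames:
        d = nearest_opponent_distance(player_xy, opponents_xy)
        bucket = int(d // bucket_ft) * bucket_ft
        counts[(has_puck, bucket)] += 1
    return counts


# Hypothetical tracking frames for one player.
frames = [
    ((0, 0), [(3, 4), (20, 0)], True),    # nearest opponent 5 ft away, with the puck
    ((0, 0), [(1, 1), (30, 0)], True),    # nearest opponent about 1.4 ft away, with the puck
    ((10, 10), [(10, 22)], False),        # nearest opponent 12 ft away, without the puck
    ((0, 0), [(6, 8)], False),            # nearest opponent 10 ft away, without the puck
]

counts = tally(frames)
for key in sorted(counts):
    print(key, counts[key])
```

Comparing the with-puck and without-puck histograms for different players is where the interesting differences would presumably show up.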

Puck battles and faceoffs are important because they directly contribute to puck possession time, which suppresses opposing chances against and can bolster offensive opportunities. You can argue that these things are reflected in plus/minus or shots disparity...but the difference is that these are individual plays, and so they can be used to directly measure individual contributions (as opposed to trying to derive individual contributions from a collective measure). If the goal is to measure individual contributions, the data should come from individual plays. If there's no practical way to measure these things yet, then we should stick to metrics that reflect teams as a whole.

I'm not sure exactly which part of which metric you're referring to here. I'm going to assume it's the offensive contribution to GVT vs the defensive contribution.

It might be valid within sets (I'm not conceding this at all, but for the sake of argument), but it's not necessarily valid if you're going to combine the two sets. If you're going to combine the two sets, then the scales should be equal...not just in equal proportion. After all, a goal against is just as damaging as a goal for is beneficial. (As a tangent, in baseball there's a trend away from thinking that a run is a run is a run. After all, if a team scores a run in the top of the ninth to make the score 10-1 they haven't significantly added to their probability of winning the game at all. Scoring a run to make the score 1-0 in the same situation, on the other hand, increases their probability a great deal...but I'm ignoring that sort of thinking for the moment just for the sake of conversation).

Anyway, according to this metric, if there was a universal draft then approximately 60 skaters and 23 goalies should be drafted before the first defenseman is taken because they provide more value over replacement level than a defenseman does. Also according to this metric the top defensive players in the league are worth about 5 goals more than a replacement level player (to this point in the season, anyway) on the defensive side of the puck. I'm sorry, but that doesn't pass the smell test at all.

There's an interesting study that TangoTiger does every year. He asks a great many baseball fans to rate their players on a scale for multiple abilities. There is some selection bias because these are typically fans that discuss their team often and watch a lot of games, as they are primarily recruited from high traffic message boards. However, they are almost never professional opinions. The result is that the collective opinion almost always correlates well with what the defensive metrics "see."

In other words, for people who watch the team regularly, their eyes are often (collectively) a fairly reliable indicator for what's actually taking place. What statistics can help illuminate are things that we can't necessarily see (you can only focus on so many things at a time, especially on TV), and it can help illuminate the relative importance of one thing to another (another thing we are not necessarily good at doing...ascribing meaning to what we see). Right now hockey statistics are not at a point where you can tell much about an individual's play from the numbers. It's barely better than looking at a box score (since much of the input comes directly from there anyway), which makes it not very good at all.

What hockey needs first and foremost is some statistical measures that reliably record the results of meaningful individual plays (takeaways in the offensive zone, giveaways in the defensive zone, blocked shots, quality scoring chances created via pass or shot, quality scoring chances eliminated via backcheck or by defensive play on an odd man rush, etc.)...whether in real-time, or after the fact. The quality of the analysis directly depends on the quality of the data being analyzed...so, until they have meaningful individual data to analyze, no truly meaningful individual analysis can be done.

I understand your point about the scaling now. I'll have to look into the stats more (they are still quite complex) to see if there's a reason for that.

You also raise an interesting point on TangoTiger which extends beyond this discussion but is worth talking about. If these defensive sabermetrics match up to the collective opinion but, at least in Moneyball anyway, scouts viewed it differently, then is it possible professional scouts had trained themselves out of being able to evaluate guys like B.J. Upton?

Tango's project is a post-Moneyball activity, and is generally unrelated to that discussion. It focuses solely on how accurately fans "see" the game...with the relatively surprising result that they see it pretty darn well on average. However, there's no attempt for either the fans, or the statistical data their observations are being compared with, to project performance moving forward.

The scout vs stat debate revolves around trying to take the observed/statistical performance of relatively undeveloped players and translate that into five to ten year projections for that player's future. As I mentioned before, fans (in general) can be relatively accurate at rating what they observe...but they're generally much less apt at assigning meaning to it. A fan can tell you that "X" is hitting the hell out of the ball, but it's the scout who's going to be able to give you an educated opinion on whether that power will translate professionally, or whether it's, say, just a product of aluminum/composite bats and inferior pitching.

Obviously, scouts can watch the same player (even at the same game) and draw wildly different conclusions. On the surface, that suggests that statistical analysis might be a better way to go. The truth is, though, that stats don't do a very good job of projecting the future performance of relatively undeveloped amateur talent. For one thing, there simply isn't much data available on those players to crunch. For another, many of the players are going to drastically change from a physical standpoint over the next five years. Statistical projection models do much better with relatively finished products who have a long historical data trail accumulated. The difference between a high school hitter and a hitter in his third professional year in the minors is astronomical from a data/projection standpoint.

Most of the very good scouts I know today acknowledge that observation and number crunching need to work hand-in-hand for the best results. It's rare that I find a number cruncher who doesn't agree. Some organizations focus more on one or the other, but I can't think of any that don't think that both are important.

Without statistics to argue over would HF even exist?

On a different note, I'd love to see a PIMs drawn stat.

This already exists.

I go a pretty even 50/50 split between stats and first-hand observations.
Example: Messier is ridiculously overrated if you go by stats alone, but to ignore the prevailing attitude of observers at the time is also a mistake - I think the true answer lies somewhere in between.

