News Article: "Fun With Numbers" - Advanced Stats Talk Here

Knave

Registered User
Mar 6, 2007
21,627
2,226
Ottawa
True story for you. During the interwar era it was not uncommon for people to be baseball fans and to track their teams even if they never actually went to a game. And this was before the age of television. Some listened on the radio, while others just went by newspaper stories and tracking stats.

Never underestimate the power of tracking numbers in sports.

They were happy their team was winning... they weren't trying to make judgements about the stats beyond "yay we're winning and we are a good team".

Modern analytics is "Clarkson has great puck possession numbers!". He turned out to be a steal.
 

Cosmix

HFBoards Sponsor
Sponsor
Jul 24, 2011
17,797
6,436
Ottawa
In the last 18 months I've given up watching the games entirely and now rely solely on analytics to enjoy and understand the game. I've never enjoyed hockey more.

:). You made me laugh!

You should become a mathematics PhD.
 

dumbdick

Galactic Defender
May 31, 2008
11,294
3,700
Some pet peeves for this stuff.

Not a big fan of people using shot differential stats like Corsi as a proxy for puck possession, and then going on to say that puck possession is a strong indication of team success. (shot differential drives goal differential pretty directly - don't understand the need to go to puck possession as an intermediary causal step).

I guess team Corsi could demonstrate that a team is having unsustainable success. But what about team playing styles? The Sens for a long time took a lot of junk shots; that doesn't mean they deserved to win more games. Other teams have stud goalies, which would also probably skew the stats. I'm sure it can all be weighted like crazy, but at the end of the day there's a lot of noise in there.

I do agree though that OZ and DZ starts are pretty good for things like player usage, mostly because they're not the kind of thing the average fan pays attention to when watching.
 

dumbdick

Galactic Defender
May 31, 2008
11,294
3,700
Here's a better idea...why not just do a Nielsen rating type of setup? Poll some of the fans after every game with a few standard questions, like "how easily did team A beat team B"? You'd get a quantitative measure that would encompass all of the little nuances that team stats like shot differential are never going to capture very well.
 

BonkTastic

ಠ_ಠ
Nov 9, 2010
30,901
10,092
Parts Unknown
why not just do a Nielsen rating type of setup? Poll some of the fans after every game with a few standard questions

The problem is a matter of unreliability.

Polling some of the fans will get you too broad a data grouping to do anything with. Some fans will say the Sens did well because of X, others because of Y, others because of Z, and others still because of "that fight in the 2nd period".

At the end of the day, there will be no real consensus, and the data will be fairly useless.
 

dumbdick

Galactic Defender
May 31, 2008
11,294
3,700
The problem is a matter of unreliability.

Polling some of the fans will get you too broad a data grouping to do anything with. Some fans will say the Sens did well because of X, others because of Y, others because of Z, and others still because of "that fight in the 2nd period".

At the end of the day, there will be no real consensus, and the data will be fairly useless.

Had another post ready to go, but actually agree with most of this.
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
Some pet peeves for this stuff.

Not a big fan of people using shot differential stats like Corsi as a proxy for puck possession, and then going on to say that puck possession is a strong indication of team success. (shot differential drives goal differential pretty directly - don't understand the need to go to puck possession as an intermediary causal step).

I think this is a fair criticism. The logic behind using Corsi is that goal scoring is actually a relatively rare event in a hockey game. Rare events do not easily lend themselves to analysis because they end up having huge margins for error. Counting shot attempts is a pretty decent replacement because there are tons of such events in each game, and because each shot attempt also indicates that play was in the O zone (shot attempts against mean you were on the defensive).
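To put rough numbers on the "rare events" point, here is a minimal sketch using the textbook standard error of a proportion. The per-game counts (5 goals, 100 shot attempts) and the 55% "true" share are illustrative assumptions, not figures from this thread, and treating events as independent is a simplification.

```python
import math


def share_standard_error(true_share: float, n_events: int) -> float:
    """Standard error of an observed share after n_events independent events."""
    return math.sqrt(true_share * (1.0 - true_share) / n_events)


# Assumed, illustrative values (not taken from real game data).
goals_per_game = 5
attempts_per_game = 100
true_share = 0.55  # a team that "deserves" 55% of the events

se_goals = share_standard_error(true_share, goals_per_game)
se_attempts = share_standard_error(true_share, attempts_per_game)

print(f"goal share:    0.55 +/- {se_goals:.3f}")
print(f"attempt share: 0.55 +/- {se_attempts:.3f}")
```

Under these assumptions the single-game goal share is several times noisier than the shot-attempt share, simply because there are far fewer goals to count.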

If you apply this to an actual game situation, imagine a team being outshot and out-attempted by a wide margin, but winning 1-0 or 2-1 because of great goaltending. There is almost nothing there to analyze if you are strictly looking at goals. In fact, the majority of players on each team would not have been on the ice for any of the goals. If you track shot attempt ratios you can get a sense of who had a strong or not-so-strong game regardless of whether they scored. The bonus of doing this is that you can logically conclude that a team winning the Corsi battle but losing the game is likely to bounce back in the future because they are not playing badly to begin with.

To me the interesting leap in logic revolves around setting up possession as a latent variable. At this point I start to have reservations because the consistency of the logic, IMO, appears to break down.
 

Benjamin

Differently Financed
Jun 14, 2010
31,118
438
yes
My problem with using Corsi for a player is that it's a bit too team-influenced.

Correct me if I'm wrong: Corsi is on-ice team shot attempts for vs. on-ice shot attempts against.

Wouldn't it make sense, when using it for an individual, to compare iCorsi with on-ice shot attempts against?
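For anyone who wants the distinction spelled out, here is a small sketch of on-ice Corsi versus individual Corsi (iCorsi). The `ShotAttempt` record, the player names, and the three sample events are all made up for illustration; real play-by-play data looks different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAttempt:
    shooter: str          # player who took the attempt
    team: str             # team that took the attempt
    on_ice: frozenset     # every skater on the ice at the time


def on_ice_corsi(attempts, player, team):
    """Return (attempts for, attempts against) while the player is on the ice."""
    with_player = [a for a in attempts if player in a.on_ice]
    corsi_for = sum(1 for a in with_player if a.team == team)
    corsi_against = len(with_player) - corsi_for
    return corsi_for, corsi_against


def individual_corsi(attempts, player):
    """Return only the attempts the player took himself (iCorsi)."""
    return sum(1 for a in attempts if a.shooter == player)


# Hypothetical events: players A and B are teammates on OTT, X plays for TOR.
skaters = frozenset({"A", "B", "X"})
attempts = [
    ShotAttempt("A", "OTT", skaters),
    ShotAttempt("B", "OTT", skaters),
    ShotAttempt("X", "TOR", skaters),
]

cf, ca = on_ice_corsi(attempts, "A", "OTT")
print(cf, ca, individual_corsi(attempts, "A"))
```

In this toy data, player A is credited with his teammate's attempt in the on-ice count but not in the individual count, which is the team influence being described.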
 

DrEasy

Out rumptackling
Oct 3, 2010
10,911
6,569
Stützville
To me the interesting leap in logic revolves around setting up possession as a latent variable.
It can be captured statistically though. You have a "hidden" random variable (in this case puck possession, which we do not measure... yet), and it presumably influences Corsi, which you can measure. Then, presumably the same hidden variable also is a predictor for W/L, which you can also obviously measure.

You train your model with your evidence, i.e. Corsi and W/L for a significant number of games, and the result of the training is to determine how much influence/correlation there truly is between puck possession and those variables, and how much of it is noise. I think this is a more accurate and realistic model than trying to directly correlate Corsi with W/L, and it can be easily expanded with other things you can measure as well.

Once you have that model, you can try predicting W/L using only Corsi. I wonder if that would yield different results than skipping the puck possession hidden variable altogether.

Basically what I'm saying is: instead of:

Corsi -> W/L (where "->" only means "influences")

you go with:

Puck Possession -> Corsi

+

Puck Possession -> W/L
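A toy simulation of that two-arrow structure, as a sketch only: the hidden "possession" value is drawn at random for each game, and it separately feeds a noisy Corsi number and a win probability. The normal distributions, the logistic link, and the noise levels are all assumptions chosen for illustration, not a fitted model.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_games(n_games, seed=0):
    """Simulate games where a hidden variable drives both Corsi and W/L."""
    rng = random.Random(seed)
    corsi, wins = [], []
    for _ in range(n_games):
        possession = rng.gauss(0.0, 1.0)                 # hidden, never observed
        corsi.append(possession + rng.gauss(0.0, 1.0))   # Possession -> Corsi (plus noise)
        win_probability = 1.0 / (1.0 + math.exp(-possession))
        wins.append(1.0 if rng.random() < win_probability else 0.0)  # Possession -> W/L
    return corsi, wins


corsi, wins = simulate_games(5000)
corsi_win_correlation = pearson(corsi, wins)
print(f"correlation between Corsi and W/L: {corsi_win_correlation:.2f}")
```

Corsi never enters the win calculation here, yet the two still come out correlated because they share the hidden cause. Whether a fitted latent-variable model would predict W/L any differently from a direct Corsi model is the open question, and this sketch does not answer it.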
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
It can be captured statistically though. You have a "hidden" random variable (in this case puck possession, which we do not measure... yet), and it presumably influences Corsi, which you can measure. Then, presumably the same hidden variable also is a predictor for W/L, which you can also obviously measure.

You train your model with your evidence, i.e. Corsi and W/L for a significant number of games, and the result of the training is to determine how much influence/correlation there truly is between puck possession and those variables, and how much of it is noise. I think this is a more accurate and realistic model than trying to directly correlate Corsi with W/L, and it can be easily expanded with other things you can measure as well.

Once you have that model, you can try predicting W/L using only Corsi. I wonder if that would yield different results than skipping the puck possession hidden variable altogether.

Basically what I'm saying is: instead of:

Corsi -> W/L (where "->" only means "influences")

you go with:

Puck Possession -> Corsi

+

Puck Possession -> W/L

Yeah, your logic is sound. My specific issues with using possession as a latent variable are: 1) Corsi measures only a certain type of possession (where you are generating shots) rather than all possession, 2) when SportsVu and other tech enter into the picture they will measure possession more accurately, and I am guessing that the match between Corsi and actual possession is there but short of what people assume, which leads to 3) Corsi, as a type of possession where you get pucks to the net, is probably a better predictor of W/L than actual possession.

The fourth issue I have with possession as a latent variable is a consistency issue, where other potential latent variables such as leadership, work ethic, etc, are discounted out of hand as being non-existent. In my view this is a modeling issue, where Corsi-based stats are thought to determine skill. I have read comments where members of the analytics crowd say something to the effect of "if leadership is a real thing it would show up in the numbers." However, if it did show up in the numbers it would be interpreted as reflecting a player who is simply "good at hockey." At this stage the models used are far too simple, and often one dimensional. Current analysis, IMO, is really prone to misinterpretation and false positives.
 

DrEasy

Out rumptackling
Oct 3, 2010
10,911
6,569
Stützville
The fourth issue I have with possession as a latent variable is a consistency issue, where other potential latent variables such as leadership, work ethic, etc, are discounted out of hand as being non-existent. In my view this is a modeling issue, where Corsi-based stats are thought to determine skill. I have read comments where members of the analytics crowd say something to the effect of "if leadership is a real thing it would show up in the numbers." However, if it did show up in the numbers it would be interpreted as reflecting a player who is simply "good at hockey." At this stage the models used are far too simple, and often one dimensional. Current analysis, IMO, is really prone to misinterpretation and false positives.
I agree... The thing with a latent variable though is that you can call it anything you want. We called it "puck possession", but you could also call it "puck possession + leadership + whatever else influences both Corsi and W/L".
 

Caeldan

Whippet Whisperer
Jun 21, 2008
15,459
1,046
Article by Yost in Sporting News taking on the question: Are the Ottawa Senators too Reliant on Erik Karlsson?
http://www.sportingnews.com/nhl/story/2014-08-07/too-reliant-on-erik-karlsson?

That IPP number, I'm not sure I'm drawing the same conclusion that Yost is?

If I understand the definition correctly based on the graph title, it's the % of goals that Ottawa scores, while Karlsson is on the ice in which he receives a point.

So basically saying that if and when Ottawa scores while Karlsson is on the ice... 55% of the time he's receiving a point for that. It has no bearing on any offense production for situations when he's not on the ice though. When you consider the only other 'playmakers' we have had is either Spezza or Alfredsson, to me it makes sense that over half the goals scored when Karlsson is on the ice - he's one of the last three people to touch the puck.
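As a sanity check on that reading of the definition, here is the arithmetic as a short sketch. The function name and the goal counts (20 on-ice goals, points on 11 of them) are made up to reproduce a 55% figure; they are not Karlsson's actual totals.

```python
def individual_point_percentage(points_on_those_goals: int, on_ice_goals_for: int) -> float:
    """Share of the team's goals scored with the player on the ice on which he earned a point."""
    if on_ice_goals_for <= 0:
        raise ValueError("player must have been on the ice for at least one goal for")
    return points_on_those_goals / on_ice_goals_for


# Hypothetical counts chosen only to illustrate a 55% IPP.
ipp = individual_point_percentage(points_on_those_goals=11, on_ice_goals_for=20)
print(f"IPP: {ipp:.0%}")
```

Goals scored while the player is on the bench appear in neither the numerator nor the denominator, so by this definition the stat says nothing directly about off-ice production.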
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
That IPP number, I'm not sure I'm drawing the same conclusion that Yost is?

If I understand the definition correctly based on the graph title, it's the % of goals that Ottawa scores, while Karlsson is on the ice in which he receives a point.

So basically saying that if and when Ottawa scores while Karlsson is on the ice... 55% of the time he's receiving a point for that. It has no bearing on any offense production for situations when he's not on the ice though. When you consider the only other 'playmakers' we have had is either Spezza or Alfredsson, to me it makes sense that over half the goals scored when Karlsson is on the ice - he's one of the last three people to touch the puck.

It looks that way at first blush, but then you have to factor in instances where there is only one assist, or no assists at all.

I am not a huge fan of this stat in particular. However, when you compile these numbers for D or for forwards as a group, an interesting thing happens. A lot of great Corsi players are not the same guys as the ones we identify as great from watching games. IPP is the first fancy stat I have seen where the names on the lists correspond to the players I think are the best.

There is a lot of discussion about IPP. This link provides a good overview plus IPP for forwards, and includes a few links to other IPP articles and discussions. It is all really interesting stuff, and I highly recommend giving it a read:

http://hockeyanalysis.com/2012/10/17/breaking-apart-individual-point-percentage/
 

Caeldan

Whippet Whisperer
Jun 21, 2008
15,459
1,046
It looks that way at first blush, but then you have to factor in instances where there is only one assist, or no assists at all.

I am not a huge fan of this stat in particular. However, when you compile these numbers for D or for forwards as a group, an interesting thing happens. A lot of great Corsi players are not the same guys as the ones we identify as great from watching games. IPP is the first fancy stat I have seen where the names on the lists correspond to the players I think are the best.

There is a lot of discussion about IPP. This link provides a good overview plus IPP for forwards, and includes a few links to other IPP articles and discussions. It is all really interesting stuff, and I highly recommend giving it a read:

http://hockeyanalysis.com/2012/10/17/breaking-apart-individual-point-percentage/

I'm not saying that IPP doesn't make sense as an indicator of the 'better' players on a team.

I'm saying that Yost's argument (I think) is that because Karlsson's number is so high relative to his peers, it may be coming at a detriment to the overall offensive production of the rest of Ottawa, and that doesn't make sense to me.

It's obvious that most of the offense generated comes through Karlsson from watching games and this stat just shows that quantifiably. But I don't think you can draw a link between that and overall (when Karlsson isn't on the ice) Ottawa production.
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
I'm not saying that IPP doesn't make sense as an indicator of the 'better' players on a team.

I'm saying that Yost's argument (I think) is that because Karlsson's number is so high relative to his peers, it may be coming at a detriment to the overall offensive production of the rest of Ottawa, and that doesn't make sense to me.

It's obvious that most of the offense generated comes through Karlsson from watching games and this stat just shows that quantifiably. But I don't think you can draw a link between that and overall (when Karlsson isn't on the ice) Ottawa production.

Right, I misunderstood (although the link I shared is still really good).

You make a really good point. I can't speak for Yost, and I can't think of a direct answer to this that I can post with any certainty.
 

Yost

Registered User
Apr 27, 2009
206
0
I'm not saying that IPP doesn't make sense as an indicator of the 'better' players on a team.

I'm saying that Yost's argument (I think) is that because Karlsson's number is so high relative to his peers, it may be coming at a detriment to the overall offensive production of the rest of Ottawa, and that doesn't make sense to me.

It's obvious that most of the offense generated comes through Karlsson from watching games and this stat just shows that quantifiably. But I don't think you can draw a link between that and overall (when Karlsson isn't on the ice) Ottawa production.

The point of the IPP and individual shot attempt generation is to try and catch other instances where guys may be involved in shot generation (i.e., through passing). It's my theory that Erik Karlsson's shouldering an ungodly load (which manifests in the raw shot attempts), and to the extent that he's not getting shots off, he's passing to immediately get shots off.

This all makes sense -- we see Karlsson dominate games, we see him dominate control of play in the OZ, we see him rack up a ton of points. The problem I think is that he's almost too responsible for shot generation. Defenders take an absurd portion of low percentage shots due to the nature of their position; a heat map for Karlsson sort of exhibits that's the case.

[Image: Karlsson shot heat map, 4vc0vEu.png]


A lot of people point to the disparity in Ottawa's elite Corsi% w/ Karlsson on the ice and average Goal% w/ Karlsson on the ice and attribute it to his 'defensive woes', which I'm not sure is entirely the case. It's possibly playing a role. But there's also part of me that sees that 'Defenseman Taking Massive Number of Shots in OZ' would probably artificially deflate on-ice shooting percentages in Ottawa's favor. (http://stats.hockeyanalysis.com/rat...&teamid=21&type=goals&sort=ShPct&sortdir=DESC kind of lends credence to that theory).

So, to answer your question: I'm not really venturing into on-ice v. off-ice stuff. I'm kind of more wondering why Karlsson's on-ice numbers don't see more favorable goal percentages, despite the fact that we know he's an elite possession driver by every stretch of the definition. And it's obviously not tied into qual teammate -- he generally plays with top lines, year after year.

What I think happens is Karlsson's boosting his line's Corsi% through more OZ time, but the team's experiencing a bit of a shot quality drag because of it. Which isn't necessarily a bad thing -- so long as you have a five-man unit that can keep pucks out of their own net on the transition back. That, of course, has been a problem.
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
The point of the IPP and individual shot attempt generation is to try and catch other instances where guys may be involved in shot generation (i.e., through passing). It's my theory that Erik Karlsson's shouldering an ungodly load (which manifests in the raw shot attempts), and to the extent that he's not getting shots off, he's passing to immediately get shots off.

This all makes sense -- we see Karlsson dominate games, we see him dominate control of play in the OZ, we see him rack up a ton of points. The problem I think is that he's almost too responsible for shot generation. Defenders take an absurd portion of low percentage shots due to the nature of their position; a heat map for Karlsson sort of exhibits that's the case.

[Image: Karlsson shot heat map, 4vc0vEu.png]


A lot of people point to the disparity in Ottawa's elite Corsi% w/ Karlsson on the ice and average Goal% w/ Karlsson on the ice and attribute it to his 'defensive woes', which I'm not sure is entirely the case. It's possibly playing a role. But there's also part of me that sees that 'Defenseman Taking Massive Number of Shots in OZ' would probably artificially deflate on-ice shooting percentages in Ottawa's favor. (http://stats.hockeyanalysis.com/rat...&teamid=21&type=goals&sort=ShPct&sortdir=DESC kind of lends credence to that theory).

So, to answer your question: I'm not really venturing into on-ice v. off-ice stuff. I'm kind of more wondering why Karlsson's on-ice numbers don't see more favorable goal percentages, despite the fact that we know he's an elite possession driver by every stretch of the definition. And it's obviously not tied into qual teammate -- he generally plays with top lines, year after year.

What I think happens is Karlsson's boosting his line's Corsi% through more OZ time, but the team's experiencing a bit of a shot quality drag because of it. Which isn't necessarily a bad thing -- so long as you have a five-man unit that can keep pucks out of their own net on the transition back. That, of course, has been a problem.

Thanks for taking the time to respond directly to this. By the way, love the heat map chart in your reply.

One thing that I have often wondered about when it comes to big-minute players like EK is the degree to which we can say they are playing with the top 5-man unit. If our top forwards are in the mid-20-minute range and Karlsson is on the ice for roughly the same amount of time, then it makes sense to view them as a 5-man unit. But when EK often cracks the 30-minute barrier and the top forward line's minutes are not increasing, then he is playing with the second line or worse. If there is a talent dropoff from the first line down, then I would expect the extra minutes to create a drag effect on some of EK's numbers. E.g. the Spezza line was the 2nd line last year, and that line was on the ice for an ungodly number of goals against.

Just a thought.
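A back-of-the-envelope version of that ice-time argument, as a sketch: the 30 and 25 minute figures are round numbers assumed for illustration, and the function only gives a lower bound, since real deployment will rarely line up a defenseman with the top line for every one of its shifts.

```python
def minimum_minutes_without_top_line(defender_minutes: float, top_line_minutes: float) -> float:
    """Lower bound on the minutes a defender must play with lines other than the top one."""
    return max(0.0, defender_minutes - top_line_minutes)


# Assumed round numbers, not actual time-on-ice data.
defender_minutes = 30.0
top_line_minutes = 25.0

minutes_away = minimum_minutes_without_top_line(defender_minutes, top_line_minutes)
share_away = minutes_away / defender_minutes
print(f"at least {minutes_away:.0f} minutes ({share_away:.0%}) without the top line")
```

Even in the best case under these assumptions, a meaningful slice of his ice time comes with weaker linemates, and the true share is likely higher.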
 

dingbatz

Registered User
Apr 20, 2013
3,113
29
Thanks for taking the time to respond directly to this. By the way, love the heat map chart in your reply.

One thing that I have often wondered about when it comes to big-minute players like EK is the degree to which we can say they are playing with the top 5-man unit. If our top forwards are in the mid-20-minute range and Karlsson is on the ice for roughly the same amount of time, then it makes sense to view them as a 5-man unit. But when EK often cracks the 30-minute barrier and the top forward line's minutes are not increasing, then he is playing with the second line or worse. If there is a talent dropoff from the first line down, then I would expect the extra minutes to create a drag effect on some of EK's numbers. E.g. the Spezza line was the 2nd line last year, and that line was on the ice for an ungodly number of goals against.

Just a thought.

Relevant to your point: http://www.hockeybuzz.com/blog/Travis-Yost/Isolating-For-Performance/134/61384
 

Micklebot

Moderator
Apr 27, 2010
53,145
30,369
Yeah, your logic is sound. My specific issues with using possession as a latent variable are: 1) Corsi measures only a certain type of possession (where you are generating shots) rather than all possession, 2) when SportsVu and other tech enter into the picture they will measure possession more accurately, and I am guessing that the match between Corsi and actual possession is there but short of what people assume, which leads to 3) Corsi, as a type of possession where you get pucks to the net, is probably a better predictor of W/L than actual possession.

The fourth issue I have with possession as a latent variable is a consistency issue, where other potential latent variables such as leadership, work ethic, etc, are discounted out of hand as being non-existent. In my view this is a modeling issue, where Corsi-based stats are thought to determine skill. I have read comments where members of the analytics crowd say something to the effect of "if leadership is a real thing it would show up in the numbers." However, if it did show up in the numbers it would be interpreted as reflecting a player who is simply "good at hockey." At this stage the models used are far too simple, and often one dimensional. Current analysis, IMO, is really prone to misinterpretation and false positives.

Just to the bold point, there has already been a lot of work done to show the correlation of Corsi and possession; Vic Ferrari showed it has an extremely strong correlation to offensive zone time, and Pension Plan Puppets did a similar study focusing on the Leafs, using a stopwatch to manually track it.

The one thing many tend to assume is that when people say it correlates to possession, they think it means anywhere on the ice, whereas it's specific to OZ time. I'm not sure how relevant possession is when you're waiting behind the net for your forward lines to change, so I for one am not too broken up about Corsi or Fenwick not necessarily correlating to that.
 

Caeldan

Whippet Whisperer
Jun 21, 2008
15,459
1,046
Just to the bold point, there has already been a lot of work done to show the correlation of Corsi and possession; Vic Ferrari showed it has an extremely strong correlation to offensive zone time, and Pension Plan Puppets did a similar study focusing on the Leafs, using a stopwatch to manually track it.

The one thing many tend to assume is that when people say it correlates to possession, they think it means anywhere on the ice, whereas it's specific to OZ time. I'm not sure how relevant possession is when you're waiting behind the net for your forward lines to change, so I for one am not too broken up about Corsi or Fenwick not necessarily correlating to that.

Well I guess you could argue that any time you have the puck on your stick, even if it is behind your net - the opponent isn't scoring on you.
 

StefanW

Registered User
Mar 13, 2013
6,286
0
Ottawa
www.storiesnumberstell.com
Just to the bold point, there has already been a lot of work done to show the correlation of Corsi and possession; Vic Ferrari showed it has an extremely strong correlation to offensive zone time, and Pension Plan Puppets did a similar study focusing on the Leafs, using a stopwatch to manually track it.

The one thing many tend to assume is that when people say it correlates to possession, they think it means anywhere on the ice, whereas it's specific to OZ time. I'm not sure how relevant possession is when you're waiting behind the net for your forward lines to change, so I for one am not too broken up about Corsi or Fenwick not necessarily correlating to that.

Yup, I'm aware of that work. That is why I said that Corsi measures a certain kind of possession rather than measuring possession writ large.

My point about SportsVu was that when actual possession is measured league-wide over a long period of time, I believe the outcome will be that the correlation between possession and Corsi is stronger with some players than others. I won't know if I am right, of course, until that type of data becomes readily available.
 
