The Advanced Stats Thread Episode IX

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
Oh yeah, I completely agree.

On a game-by-game basis, any additions or subtractions I would make would be entirely subjective and worth very little anyway.

I'm thinking in terms of larger sample sizes. Like, if Hayes' Corsica xGF% is 50 by Xmas, and his PDO is in line with his average since 2015 (102.3), then perhaps it would be more accurate to estimate that his GF% will be 55.22% as opposed to the 51.15% (?) that those numbers suggest.
I'm not sure I follow.
 

nyr__1994

Registered User
Apr 4, 2006
709
172
Raleigh, NC




The outputs are only as good as the inputs. The NHL is listing the Kreider goal as a 10-foot wrist shot that isn't a rush attempt. If you take a magnifying glass to xG, you'll be able to find countless items like this that are "wrong" (for lack of a better term). Its use is in viewing it as a whole, not as individual data points.


Can you then explain how, if you have crappy data going into a model, you are getting good data out of it?

Also I see above that you have not read the article on the data that Vally has produced. He and his team have tracked thousands upon thousands of shots, on their own, not using NHL PbP to produce their product. It really provides a lot of insight into how to look at numbers and what they mean as far as different shot types and the 'danger' of a scoring chance. It is worth the time to read.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
Can you then explain how, if you have crappy data going into a model, you are getting good data out of it?

Also I see above that you have not read the article on the data that Vally has produced. He and his team have tracked thousands upon thousands of shots, on their own, not using NHL PbP to produce their product. It really provides a lot of insight into how to look at numbers and what they mean as far as different shot types and the 'danger' of a scoring chance. It is worth the time to read.
I have not read the article, and I won't. Vally's team subjectively throws data out. It's bad practice, IMO, and I won't support it.

Not all of the data is crappy, and some of the inconsistencies in the data are consistent enough to adjust for, e.g., shot location data in certain arenas is known to be shoddy and is therefore adjusted in every xG model for the shot location input.

There might be a few bad data points out of every 100, but to throw everything out on that would be wrong.

But there really isn't anything wrong with the Kreider goal. It was a wrist shot, in terms of classification categories, and it was from 11 or so ft out. What's the problem? That it didn't track that it was a royal road pass? Sure, you want that info, but that's not something you're going to get from NHL PBP.
 

nyr__1994

Registered User
Apr 4, 2006
709
172
Raleigh, NC
I have not read the article, and I won't. Vally's team subjectively throws data out. It's bad practice, IMO, and I won't support it.

Not all of the data is crappy, and some of the inconsistencies in the data are consistent enough to adjust for, e.g., shot location data in certain arenas is known to be shoddy and is therefore adjusted in every xG model for the shot location input.

There might be a few bad data points out of every 100, but to throw everything out on that would be wrong.

But there really isn't anything wrong with the Kreider goal. It was a wrist shot, in terms of classification categories, and it was from 11 or so ft out. What's the problem? That it didn't track that it was a royal road pass? Sure, you want that info, but that's not something you're going to get from NHL PBP.

But isn't adjusting data kinda the same thing as selectively eliminating it? In both cases you are changing the input. This is the biggest issue I have with all of the new stat packages. They all assume, to some extent, that a shot is a shot is a shot. There is no context to each shot: was there a screen, was the goalie moving, who was shooting? Those all make a difference. Some teams are better at generating higher-danger chances than others. Look at the Hurricanes. They average over 40 SOG a game, yet can't score... They are #1 in xGF yet only 18th in goals scored. Do you really think the Canes will be top 5 in goals scored this year?

I definitely think there is value in some of these models and numbers, I am just not sure how much. And then you have someone who has invested the resources to try and filter this data to get more reliable numbers, and you simply dismiss it....
 

Irishguy42

Mr. Preachy
Sep 11, 2015
26,791
19,038
NJ
Can one of you fine gentlemen show me a comparison of Vesey's advanced stats this year compared to last year? Eye test, he's a lot better this year, and I'm curious if the stats tell the same story.
I have not forgotten you.

Just remember, the stats we have this year are a smaller sample size, so take that with a grain of salt.

The following stats are at 5v5 and adjusted (per Corsica)

17-18:
TOI: 979.56
P/60: 1.35
P1/60: 1.23
CF%: 44.32
relCF%: -1.18
GF%: 43.03
relGF%: -2.48
xGF%: 46.41
relxGF%: -1.06
iCF/60: 12.43
ixGF/60: 0.87

18-19:
TOI: 190.15
P/60: 1.58
P1/60: 1.26
CF%: 46.82
relCF%: -0.04
GF%: 48.63
relGF%: 0.6
xGF%: 52.03
relxGF%: 1.96
iCF/60: 15.78
ixGF/60: 1.1

So, yes, in this short sample size, he is performing better than last year. Still not great, but better. But let's see where this ends up at the end of the season, or even halfway through the season.
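
For anyone who wants the year-over-year change at a glance, here is a minimal Python sketch that subtracts the 17-18 line from the 18-19 line. The numbers are the ones listed above; the dictionary and variable names are only illustrative. TOI is left out of the subtraction, and the gap in ice time (979.56 vs. 190.15 minutes) is the reason to read the result loosely.

```python
# 5v5 adjusted numbers as posted above (per Corsica).
season_1718 = {
    "TOI": 979.56, "P/60": 1.35, "P1/60": 1.23,
    "CF%": 44.32, "relCF%": -1.18,
    "GF%": 43.03, "relGF%": -2.48,
    "xGF%": 46.41, "relxGF%": -1.06,
    "iCF/60": 12.43, "ixGF/60": 0.87,
}
season_1819 = {
    "TOI": 190.15, "P/60": 1.58, "P1/60": 1.26,
    "CF%": 46.82, "relCF%": -0.04,
    "GF%": 48.63, "relGF%": 0.6,
    "xGF%": 52.03, "relxGF%": 1.96,
    "iCF/60": 15.78, "ixGF/60": 1.1,
}

# Change from 17-18 to 18-19 for every rate stat (TOI is a sample size, not a rate).
deltas = {
    stat: round(season_1819[stat] - season_1718[stat], 2)
    for stat in season_1718
    if stat != "TOI"
}

for stat, change in deltas.items():
    print(f"{stat:>8}: {change:+.2f}")
```

Every stat in the list moves in the positive direction, which matches the read above, but the 18-19 sample is roughly a fifth of the 17-18 ice time.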
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
But isn't adjusting data kinda the same thing as selectively eliminating it? In both cases you are changing the input. This is the biggest issue I have with all of the new stat packages. They all assume, to some extent, that a shot is a shot is a shot. There is no context to each shot: was there a screen, was the goalie moving, who was shooting? Those all make a difference. Some teams are better at generating higher-danger chances than others. Look at the Hurricanes. They average over 40 SOG a game, yet can't score... They are #1 in xGF yet only 18th in goals scored. Do you really think the Canes will be top 5 in goals scored this year?

I definitely think there is value in some of these models and numbers, I am just not sure how much. And then you have someone who has invested the resources to try and filter this data to get more reliable numbers, and you simply dismiss it....
If there's proof that the input is faulty and you can adjust it, then no, I wouldn't agree that it's selectively eliminating it.

I'm really glad Vally is getting the recognition he is. I'm hopeful what he's doing will help bring a lot of the objectively better things other people are doing into the mainstream, as well.
 

Mac n Gs

Gorton plz
Jan 17, 2014
22,580
12,822
FWIW, I see nothing wrong with using exclusionary criteria in data analysis. I work in clinical research and have to do so when appropriate for various physiological recordings.

The big if here is Vally following this strictly and applying it without selective bias.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
The big if here is Vally following this strictly and applying it without selective bias.
The problem is that this is impossible. If he has 5 trackers, you're going to get 5 different variations of what a "No shot" is. You poll 5 different NHL coaches, and you're going to get 5 different examples of what is or is not a scoring chance.
 

Mac n Gs

Gorton plz
Jan 17, 2014
22,580
12,822
The problem is that this is impossible. If he has 5 trackers, you're going to get 5 different variations of what a "No shot" is. You poll 5 different NHL coaches, and you're going to get 5 different examples of what is or is not a scoring chance.
Eh, not necessarily. If he strictly says only offensive-zone shot attempts will be used in analysis, that’s a pretty easy criterion to follow. Will it exclude some purposeful attempts at slappers from the neutral zone? Sure, but I’d bet that it would exclude events like Stepan’s 150-foot shot on goal more frequently.

I don’t like doing it, but I can support it if it’s done in a statistically justified manner. Six sigma is another easy method to spot outliers that I’ve used before and is applied in physics regularly.
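
As a rough illustration of a sigma-style cutoff, here is a minimal Python sketch that flags any value more than k standard deviations from the mean. This is an assumption about what such a rule could look like, not a description of what Vally's team or any xG model actually does. The function name `flag_outliers`, the default `k=3.0`, and the shot distances are all made up for the example, apart from the 150-foot shot that echoes the Stepan case.

```python
from statistics import mean, stdev


def flag_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > k * sigma]


# Hypothetical shot distances in feet: twenty ordinary attempts plus one 150-footer.
shot_distances_ft = [
    12, 25, 33, 41, 18, 55, 22, 37, 48, 15,
    29, 60, 35, 20, 44, 27, 31, 52, 16, 39,
    150,
]

flagged = flag_outliers(shot_distances_ft)
print(flagged)
```

The appeal is that the rule is written down once and applied the same way to every shot, so no tracker's judgment is involved. The catch is that in a small sample one extreme value inflates the standard deviation itself, so the choice of k is still a judgment call.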
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
Eh, not necessarily. If he strictly says only offensive-zone shot attempts will be used in analysis, that’s a pretty easy criterion to follow. Will it exclude some purposeful attempts at slappers from the neutral zone? Sure, but I’d bet that it would exclude events like Stepan’s 150-foot shot on goal more frequently.

I don’t like doing it, but I can support it if it’s done in a statistically justified manner. Six sigma is another easy method to spot outliers that I’ve used before and is applied in physics regularly.
I think it's much more broad. He's throwing out like 14% of shots a game. That's absurdly high. I'd love transparency of what is or is not a shot, but we'll never get that.
 

Mac n Gs

Gorton plz
Jan 17, 2014
22,580
12,822
I think it's much more broad. He's throwing out like 14% of shots a game. That's absurdly high. I'd love transparency of what is or is not a shot, but we'll never get that.
Unless he provides a rationale, that’s lunacy. Is he suggesting NHL PBP guys incorrectly record shot attempts that miss the net as actual shots that frequently? I don't buy it.
 

1Knee1T

Registered User
Jun 29, 2008
3,401
126
I think it's much more broad. He's throwing out like 14% of shots a game. That's absurdly high. I'd love transparency of what is or is not a shot, but we'll never get that.

Unless he provides a rationale, that’s lunacy. Is he suggesting NHL PBP guys incorrectly record shot attempts that miss the net as actual shots that frequently? I don't buy it.

From The Athletic article:

"No shots is interesting. Those are the shots that the NHL says hit the net but either they hit shin pads or just didn’t get to the goalie. The biggest sample size we have in our database is clear-sighted shots. The second biggest is no shots. What we’re doing for teams is basically cleansing the numbers that they can see and refining them.

What I’ve really come to understand from doing this project is that we can’t live by what’s counted for as shots on goal in an official capacity. The only way you can be 100 percent accurate is if you have two sets of eyes on every game — live on the game and then another person on the computer reconciling what’s coming in. We have first, second and third looks at every chance. Our numbers will never be the same as the NHL’s. And we’re OK with that because we’re doing the due diligence that needs to be done. This affects shooting percentages, save percentages, things that are coming into play when contracts are being discussed."
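
To put rough numbers on that last point, here is a quick Python sketch of how dropping shots moves a save percentage. The 30 shots and 3 goals are invented for the example. The 14% share is the figure mentioned earlier in the thread, and the sketch assumes none of the dropped shots were goals.

```python
def save_pct(shots, goals):
    """Save percentage as a fraction: share of shots that did not go in."""
    return 1 - goals / shots


official_shots = 30      # hypothetical official shots-on-goal total
goals_against = 3        # hypothetical goals allowed
removed_share = 0.14     # share of shots reclassified as "no shots" (figure from the thread)

# Shots left after the reclassification; goals are assumed untouched.
kept_shots = official_shots * (1 - removed_share)

official = save_pct(official_shots, goals_against)
adjusted = save_pct(kept_shots, goals_against)

print(f"official: {official:.3f}, adjusted: {adjusted:.3f}")
```

Under those assumptions the same night goes from roughly .900 to roughly .884, which is the kind of gap that matters when these numbers come up in contract talks.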
 

SA16

Sixstring
Aug 25, 2006
13,287
12,578
Long Island
Unless he provides a rationale, that’s lunacy. Is he suggesting NHL PBP guys incorrectly record shot attempts that miss the net as actual shots that frequently? I don't buy it.

I don't think he's saying they incorrectly record shots (they do though and it varies a lot by arena).

I think he is saying that he disagrees with what the definition of a SOG should be and he is basing all his stats on Vally-Shots. I think what he is going for is that a vSOG has to actually be an attempted shot at the net: not a pass that ends up hitting the goalie, not a dump-in from long range that ends up hitting the goalie, not a random spin-around at the blue line to throw the puck in deep that ends up at the net. Something where the actual intent of the play is to shoot on the goalie, trying to score or get a rebound. I don't really have a problem with that.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
I need to see exact instances of what the NHL is recording as a shot that Vally is throwing out. There is no way 14% of shots are crap. Until then, I'm going to believe that he's doing it subjectively and to his liking. I feel like it's fully within my rights to be critical/skeptical, but I could be wrong :dunno:
 

nyr__1994

Registered User
Apr 4, 2006
709
172
Raleigh, NC
The problem is that this is impossible. If he has 5 trackers, you're going to get 5 different variations of what a "No shot" is. You poll 5 different NHL coaches, and you're going to get 5 different examples of what is or is not a scoring chance.

I agree with you here, and also this is where my skepticism regarding the shot metrics and xG models comes from. They take a part of the game that is very subjective (gray) and try to paint it black or white. You can do a lot of things with numbers, just not sure these numbers are there yet....

It started with +/-, moved to Corsi/Fenwick, and has now moved to xG models. Each one is better than the last, but IMHO they are still nowhere close to where they need to be for building a team around them, the way stats are used in baseball.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
I agree with you here, and also this is where my skepticism regarding the shot metrics and xG models comes from. They take a part of the game that is very subjective (gray) and try to paint it black or white. You can do a lot of things with numbers, just not sure these numbers are there yet....

It started with +/-, moved to Corsi/Fenwick, and has now moved to xG models. Each one is better than the last, but IMHO they are still nowhere close to where they need to be for building a team around them, the way stats are used in baseball.
I'm re-reading this post and the tone that's coming off it is terrible. I promise that isn't my intention. Maybe it's just the way I'm reading it, but if this comes off annoyingly terrible, it's not my intention.

But it falls back to the same thing. Just because they aren't perfect does not mean that we throw them out completely. xG is the best thing we have right now. It's not perfect, but it is scientifically, objectively, the best predictor metric we have. So. Let's use it. Right?

e: It's the same conversation. No singular metric is perfect. A lot of metrics working together, including whatever eye-test inputs you use, is probably the best thing to do. No one actually analyzes hockey only off metrics and nothing else.

I'm not watching the games this year, because I ****ing hate this team. I'm also not posting in any threads but this one, because I have no right to speak about this team, because I haven't been watching.
 

nyr__1994

Registered User
Apr 4, 2006
709
172
Raleigh, NC
I'm re-reading this post and the tone that's coming off it is terrible. I promise that isn't my intention. Maybe it's just the way I'm reading it, but if this comes off annoyingly terrible, it's not my intention.

But it falls back to the same thing. Just because they aren't perfect does not mean that we throw them out completely. xG is the best thing we have right now. It's not perfect, but it is scientifically, objectively, the best predictor metric we have. So. Let's use it. Right?

e: It's the same conversation. No singular metric is perfect. A lot of metrics working together, including whatever eye-test inputs you use, is probably the best thing to do. No one actually analyzes hockey only off metrics and nothing else.

I'm not watching the games this year, because I ****ing hate this team. I'm also not posting in any threads but this one, because I have no right to speak about this team, because I haven't been watching.

Use it? Absolutely! But to preach about it like it is gospel is where I have the issue. You don't do this, and you take the time to explain your thought process. I don't always agree with it/you, but appreciate the time you take to talk about these numbers. There are just too many times where it comes across that people are speaking in absolutes about some of these numbers when in reality the world of analytics in hockey has a long way to go.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
Use it? Absolutely! But to preach about it like it is gospel is where I have the issue. You don't do this, and you take the time to explain your thought process. I don't always agree with it/you, but appreciate the time you take to talk about these numbers. There are just too many times where it comes across that people are speaking in absolutes about some of these numbers when in reality the world of analytics in hockey has a long way to go.
Yeah, I feel, but at the same time it's a double-standard.

You can't expect a thesis in every post, so someone might just boil down their thought process to like: "Pionk was bad tonight, had a -20% relxGF%". There's nothing wrong with that post, but people will attack it because it has the number in it. Obviously, or at least it should be obvious, this poster is not only saying Pionk had a bad game because of that, but they're using it as validation.

Meanwhile, there are hundreds of posts that are like "Pionk was exceptional tonight, he looked explosive" - and no one gives a shit to call it out because it doesn't have a number in it.
 

SA16

Sixstring
Aug 25, 2006
13,287
12,578
Long Island
I need to see exact instances of what the NHL is recording as a shot that Vally is throwing out. There is no way 14% of shots are crap. Until then, I'm going to believe that he's doing it subjectively and to his liking. I feel like it's fully within my rights to be critical/skeptical, but I could be wrong :dunno:

I'm not sure about this. I've done research into how different arenas handle shots and blocks for DFS purposes and there's an absolutely huge difference in how different teams score them. I think it's over a 14% difference from the most active to the least, but I don't have my numbers with me.
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
I'm not sure about this. I've done research into how different arenas handle shots and blocks for DFS purposes and there's an absolutely huge difference in how different teams score them. I think it's over a 14% difference from the most active to the least, but I don't have my numbers with me.
Seems crazy to me.
 

nyr2k2

Can't Beat Him
Jul 30, 2005
45,660
32,732
Maryland
Yeah, I feel, but at the same time it's a double-standard.

You can't expect a thesis in every post, so someone might just boil down their thought process to like: "Pionk was bad tonight, had a -20% relxGF%". There's nothing wrong with that post, but people will attack it because it has the number in it. Obviously, or at least it should be obvious, this poster is not only saying Pionk had a bad game because of that, but they're using it as validation.

Meanwhile, there are hundreds of posts that are like "Pionk was exceptional tonight, he looked explosive" - and no one gives a **** to call it out because it doesn't have a number in it.
There is a perception among many that posters check the numbers after the game and arrive at their conclusions on how a player played not because of what the poster actually watched, but because of a specific statistic they see afterwards. As in, someone watches the game, enjoys it by just generally watching what's going on and not really focusing independently on who is having a good or bad game at controlling/preventing shot generation, and then afterwards will review the statistical performances of players and base their evaluation on that. So the feeling is that some posters aren't using the numbers as validation of their observations; they're just using the numbers, and that's it.

I can't count how many times I've seen a situation where most posters in the PGT seem to think so-and-so played great, only to have a handful of posters near-simultaneously come in with, "He didn't play great his xxxxx% sucked ass." And then the board lets out a giant sigh and we start accusing each other of watching games or not watching games or whatever. I just wish we could get to a point where people could acknowledge that what we see with our eyes does not always correlate with the numerical outputs, and that's okay, and not condemn each other for being overly-reliant on our eyes or overly-reliant on the numbers.

The whole thing is tiring. And this isn't a condemnation of the stats community, as the "other side" is just as guilty of being unreasonably stubborn.
 

sbjnyc

Registered User
Jun 28, 2011
5,933
1,997
New York
I don't think he's saying they incorrectly record shots (they do though and it varies a lot by arena).

I think he is saying that he disagrees with what the definition of a SOG should be and he is basing all his stats on Vally-Shots. I think what he is going for is that a vSOG has to actually be an attempted shot at the net: not a pass that ends up hitting the goalie, not a dump-in from long range that ends up hitting the goalie, not a random spin-around at the blue line to throw the puck in deep that ends up at the net. Something where the actual intent of the play is to shoot on the goalie, trying to score or get a rebound. I don't really have a problem with that.
The issue with this is if that dump in actually winds up in the net, it's a goal and therefore a shot on goal. I don't think there is a problem removing those attempts but then you just have to make sure that any resulting goals also get excluded (there can't be too many of them).
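
A small Python sketch of what I mean, using a made-up five-event log. The field names (`deliberate`, `goal`) and the events themselves are hypothetical, not anything from Vally's data or NHL PBP. The point is only that the filter has to be applied to the goals and the attempts together.

```python
# Hypothetical event log: each entry is one puck that the NHL counted as a shot on goal.
shots = [
    {"kind": "wrist shot", "deliberate": True, "goal": True},
    {"kind": "slap shot", "deliberate": True, "goal": False},
    {"kind": "snap shot", "deliberate": True, "goal": False},
    {"kind": "dump-in", "deliberate": False, "goal": True},   # the fluke that goes in
    {"kind": "dump-in", "deliberate": False, "goal": False},
]


def shooting_pct(events):
    """Goals divided by attempts, both counted over the same set of events."""
    return sum(e["goal"] for e in events) / len(events)


# Keep only deliberate attempts; the fluke goal leaves along with its attempt.
kept = [s for s in shots if s["deliberate"]]

official = shooting_pct(shots)     # every NHL-counted shot
consistent = shooting_pct(kept)    # goals and attempts filtered together

# The mistake to avoid: filtered attempts in the denominator, all goals in the numerator.
inconsistent = sum(s["goal"] for s in shots) / len(kept)

print(official, consistent, inconsistent)
```

In this toy log the inconsistent version is double the consistent one, which is why the excluded goals have to go too, even if there are only a handful of them in a season.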
 

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
There is a perception among many that posters check the numbers after the game and arrive at their conclusions on how a player played not because of what the poster actually watched, but because of a specific statistic they see afterwards. As in, someone watches the game, enjoys it by just generally watching what's going on and not really focusing independently on who is having a good or bad game at controlling/preventing shot generation, and then afterwards will review the statistical performances of players and base their evaluation on that. So the feeling is that some posters aren't using the numbers as validation of their observations; they're just using the numbers, and that's it.

I can't count how many times I've seen a situation where most posters in the PGT seem to think so-and-so played great, only to have a handful of posters near-simultaneously come in with, "He didn't play great his xxxxx% sucked ass." And then the board lets out a giant sigh and we start accusing each other of watching games or not watching games or whatever. I just wish we could get to a point where people could acknowledge that what we see with our eyes does not always correlate with the numerical outputs, and that's okay, and not condemn each other for being overly-reliant on our eyes or overly-reliant on the numbers.

The whole thing is tiring. And this isn't a condemnation of the stats community, as the "other side" is just as guilty of being unreasonably stubborn.
I talk to a lot of these posters outside of HF and I guarantee you they're watching the game because I see the real-time thoughts that they have.

I think this perception that there are people who only use the numbers is because there are posters who use the numbers a lot. But I know that they're watching.

And I agree with you, it's tiring. But I bet if any of these guys stopped watching the games, they'd stop posting about the games and the players, like I have, because watching the game is an input that holds weight.
 

nyr2k2

Can't Beat Him
Jul 30, 2005
45,660
32,732
Maryland
I talk to a lot of these posters outside of HF and I guarantee you they're watching the game because I see the real-time thoughts that they have.

I think this perception that there are people who only use the numbers is because there are posters who use the numbers a lot. But I know that they're watching.

And I agree with you, it's tiring. But I bet if any of these guys stopped watching the games, they'd stop posting about the games and the players, like I have, because watching the game is an input that holds weight.
I wasn't saying they don't watch--to the contrary, they do watch. I have no doubt. I just get the sense that there are quite a few people who watch the games, come out of it without a strong opinion on the performance of a given player, and then develop that opinion after the fact only once they consult the numbers. It's like, "I don't really know what I saw regarding Player X, seemed pretty uneventful. Numbers, tell me what I saw!"

If people are livestreaming their disdain for Neal Pionk or Jimmy Vesey or whoever throughout the game and then support that with some statistics afterwards, great. No complaints for me. But I know quite a few people who actively participate in the GDT, say nothing about a particular guy, then start trashing him with everything they can find from NaturalStatTrick as soon as the game ends.

That's all. I'm not a "watch the games, nerd" guy, you know? I know everyone is watching.
 
