2017-18 stats and underlying metrics thread [Mod: updated season]

Status
Not open for further replies.

Whileee

Registered User
May 29, 2010
46,083
33,165
Aw, you made me go to Corsica and hit F5 a hundred times... :)

Jets 5v5 xGF% 47.94 (22nd)
Jets 5v5 Adjusted xGF%: 47.93 (21st)

Looking at the numbers the Jets do appear to be limiting dangerous chances - their xGA/60 is 11th best at 2.18. But their xGF/60 is 27th at 2.00 (5v5 adjusted).

On the PK the Jets' xGA/60 is worst in the league by a mile: 4.04. That's 24% worse than the next-worst team at 3.26, and that 0.78 gap is the largest between any two adjacent teams. In fact there are 15 teams between the 2nd worst team and the next team that's 0.78/60 below them... sorry if that's a little Garretish, so let me rephrase: the gap between the Jets PK and the 30th ranked xGA/60 PK is the same as the gap between the 30th worst PK and the 17th worst.
Some residual adjustment for score effects might be warranted considering the remarkable amount of time the Jets have played with leads, and especially 2+ goal leads.
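For anyone who wants to check the PK arithmetic, here is a quick sketch in Python. The two rates are the ones quoted in this post; the variable names are only illustrative.

```python
# PK xGA/60 figures quoted in the post (lower is better).
jets_xga60 = 4.04
next_worst_xga60 = 3.26

# Absolute gap in expected goals against per 60 minutes.
absolute_gap = jets_xga60 - next_worst_xga60

# Relative gap, expressed as a percentage of the next-worst team's rate.
relative_gap_pct = 100 * absolute_gap / next_worst_xga60

print(f"absolute gap: {absolute_gap:.2f} xGA/60")
print(f"relative gap: {relative_gap_pct:.0f}% worse than the next-worst PK")
```

The percentage is taken relative to the next-worst team's rate, which is how the "24% worse" figure works out.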
 

SensibleGuy

Registered User
Nov 26, 2011
12,283
8,404
Jeez. There's obviously a LOT of randomness in hockey. However, it isn't ALL random. If it were, it wouldn't matter what players you put on the ice... just grab any number of guys who can skate and let them go at it. Even if hockey is 95% random chance and 5% a result of measurable stuff, the only way you can get an advantage over everyone else is to try to figure out that measurable stuff and make use of it. That's the point of statistical analysis... so stating that stats can't be applied to random chance totally misses the point, since the stats aren't concerned with the random stuff. They are concerned with the stuff (however tiny the effect of that stuff may be) that occurs as a result of non-random factors such as PLAYER ABILITIES! :D
 

Jimby

Reformed Optimist
Nov 5, 2013
1,428
441
Winnipeg
Very interesting comment by Maurice today in response to a question about the shots against that the Jets have been giving up. He said that what he looks at is the shot quality and the expected goals against "for those interested in analytics". Also interesting that most of the Central division teams including Winnipeg are in the group with the best xGA numbers. So, clearly he is running the team to compete in the division and is aware of the analytics.
 

Jetfaninflorida

Southernmost Jet Fan
Dec 13, 2013
15,698
18,988
Florida
Very interesting comment by Maurice today in response to a question about the shots against that the Jets have been giving up. He said that what he looks at is the shot quality and the expected goals against "for those interested in analytics". Also interesting that most of the Central division teams including Winnipeg are in the group with the best xGA numbers. So, clearly he is running the team to compete in the division and is aware of the analytics.

Wow. This post has validated my motto for this season. That being

Screw the Corsi, win the Scorsi!
 

mcpw

WPG
Jan 13, 2015
10,024
2,072
Note that xG, unlike Corsi/Fenwick/etc, doesn't have a universal definition. The idea of xG that we know is to weigh shot attempts (Corsi) by context -- shot location, shot type, maybe shooter talent or maybe not depending on the version, possibly something else. There were two public versions of xG (DTMaboutHeart and Corsica), now one of them is gone, and Corsica remains. It's absolutely possible that the Jets have their own version of xG and they don't just look at the Corsica numbers. Maybe they only use available play by play data (like Corsica does), maybe they have trackers who do extra work on top of that, maybe they pay an external company to do extra work on top of that, maybe they don't do any work and just pay an external company to provide their version of xG. We don't know. We now know that they might be familiar with the concept of xG (but not whether they use it as a predictive metric). Nothing more.
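To make the "weigh shot attempts by context" idea concrete, here is a minimal toy sketch in Python. It is not Corsica's model, DTMaboutHeart's model, or anything a team uses: the features, the `ShotAttempt` class and every coefficient are hypothetical, picked only to show how Corsi (a raw count) and an xG-style total (a weighted sum) differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ShotAttempt:
    distance_ft: float  # distance from the net
    angle_deg: float    # absolute angle off the centre line
    is_rebound: bool


# Hypothetical coefficients for illustration only (not fitted to any data).
INTERCEPT = -1.0
COEF_DISTANCE = -0.05
COEF_ANGLE = -0.02
COEF_REBOUND = 0.8


def shot_xg(shot: ShotAttempt) -> float:
    """Logistic weight in (0, 1); closer, more central and rebound shots score higher."""
    z = (INTERCEPT
         + COEF_DISTANCE * shot.distance_ft
         + COEF_ANGLE * shot.angle_deg
         + COEF_REBOUND * shot.is_rebound)
    return 1.0 / (1.0 + math.exp(-z))


def team_xg(shots: list[ShotAttempt]) -> float:
    """xG-style total: each attempt counts as its estimated scoring chance."""
    return sum(shot_xg(s) for s in shots)


shots = [
    ShotAttempt(distance_ft=10, angle_deg=5, is_rebound=True),
    ShotAttempt(distance_ft=55, angle_deg=30, is_rebound=False),
    ShotAttempt(distance_ft=25, angle_deg=15, is_rebound=False),
]

corsi = len(shots)   # every attempt counts as 1
xg = team_xg(shots)  # every attempt counts as its weight

print(f"Corsi: {corsi}, toy xG: {xg:.2f}")
```

Swap in different features or coefficients and you get a different "xG" from the same shots, which is why two versions of xG need not agree.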

Points% isn't "noise", it's the key result. GF% is just a predictor of results.
And using 5v5CF% to try to determine Points% is fundamentally flawed. Special teams as noise is bad.
 

Whileee

Registered User
May 29, 2010
46,083
33,165
Note that xG, unlike Corsi/Fenwick/etc, doesn't have a universal definition. The idea of xG that we know is to weigh shot attempts (Corsi) by context -- shot location, shot type, maybe shooter talent or maybe not depending on the version, possibly something else. There were two public versions of xG (DTMaboutHeart and Corsica), now one of them is gone, and Corsica remains. It's absolutely possible that the Jets have their own version of xG and they don't just look at the Corsica numbers. Maybe they only use available play by play data (like Corsica does), maybe they have trackers who do extra work on top of that, maybe they pay an external company to do extra work on top of that, maybe they don't do any work and just pay an external company to provide their version of xG. We don't know. We now know that they might be familiar with the concept of xG (but not whether they use it as a predictive metric). Nothing more.


And using 5v5CF% to try to determine Points% is fundamentally flawed. Special teams as noise is bad.
I think it's entirely possible that the Jets and other teams have different models to assess xG, and deploy more measurements to augment their models. Corsica is constrained in terms of input variables, more so than teams are. Chayka's analytics company includes a lot of variable measurement, in addition to the statistical analyses.

The point of my analyses was to compare the association between Corsi and results over time. Special teams play a role and create noise, but there's no reason to believe that they create more noise recently than they did several years ago. The gist of the observation is supported with GF%. The strength of the correlation between team CF% and results (points or GF%) is much weaker in 2014-16 than it was in 2008-10, an era when many of the initial assessments of Corsi were done.
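A rough sketch of the kind of era-by-era comparison described here, in Python. The rows below are made-up placeholders, NOT real team data, and they say nothing about the actual correlations; the real analysis would use one row per team-season from Corsica.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder rows: (era, team CF%, Points%). Made-up numbers for illustration.
team_seasons = [
    ("2008-10", 55.1, 0.67), ("2008-10", 52.3, 0.60), ("2008-10", 49.8, 0.52),
    ("2008-10", 47.2, 0.45), ("2008-10", 45.0, 0.41),
    ("2014-16", 54.6, 0.55), ("2014-16", 52.0, 0.66), ("2014-16", 50.1, 0.48),
    ("2014-16", 47.9, 0.59), ("2014-16", 45.5, 0.47),
]

# Group the CF% and Points% columns by era.
by_era = defaultdict(lambda: ([], []))
for era, cf_pct, points_pct in team_seasons:
    by_era[era][0].append(cf_pct)
    by_era[era][1].append(points_pct)

# One correlation per era, so the two eras can be compared side by side.
r_by_era = {era: pearson_r(cf, pts) for era, (cf, pts) in by_era.items()}

for era, r in r_by_era.items():
    print(f"{era}: r = {r:.2f} over {len(by_era[era][0])} team-seasons")
```

The same grouping works with GF% in place of Points% as the outcome column.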
 

bebob

HFBoards Sponsor
Sponsor
Apr 8, 2015
19
44
Winnipeg

As with all predictive statistics, your degree of certainty is relative to both sample size.

What happens when you include last season?

But what if a young player is still developing and improving from season to season, or an older player is declining? In those cases wouldn’t an increased sample size, i.e. including the prior year, reduce the correlation to future results?
 

Aavco Cup

"I can make you cry in this room"
Sep 5, 2013
37,630
10,440
5on5 CF% would be a perfectly fine win prediction tool if everyone's special teams and goaltending were the same.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Very interesting. Do these relationships hold at the same level of correlation for later samples (these are from the 2007-2012 seasons)? Some have indicated more parity (at the team level) in shot metrics. I wonder if things are changing at the individual level, too.

Here's another interesting tidbit on the Jets. According to this article, by far the strongest variable associated with individual goal production (iG/60) is expected goals (ixG/60). Expected Goals are a better predictor of future scoring than Corsi, Goals

Here are the Jets' current top players in ixG/60 (according to Corsica Hockey) (I left out Perreault due to very small sample size).

Jets 5v5 ixG/60...

Laine 0.91
Wheeler 0.87
Connor 0.67
Tanev 0.63 :eek:
Armia 0.60
Scheifele 0.59
Matthias 0.58 (lol)
Ehlers 0.52
Copp 0.47
Little 0.42 :help:
Lowry 0.37

[Image: newplot-1.png]

1) It varies year to year because you are predicting something highly volatile: goals. We’ve had some recent strong years and recent weak years. It does not appear to be a trend... yet.

2) Corsica’s xGoal model is not Dawson’s model.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Hmmmm....

Why has the correlation between xGF% and Points% increased so dramatically between 2008-10 and 2014-16? This mirrors a substantial decline in the correlation between CF% and Points%. Is this just poor measurement, or have NHL teams started to focus more on the quality of shots rather than shot volume as a pathway to success?

(data from Corsica Hockey, and xGF% data are 5v5 and adjusted for venue and score).

[Attachment 82741]

I have, yes.
Corsi’s strongest year was 2013-14.
Corsi’s weakest year was 2009-10.
It’s highly volatile year to year.

I would say you are seeing heads come up three times in a row, despite it being even with tails before that, and calling shenanigans.

No offense, but your analysis on this seems more confirmation bias than actual confirmation of your hypothesis.

Ex: you could just be focusing on the xG seasons Manny trained his model on.

NOTE FOR EVERYONE:
Dawson’s (DTMaboutHeart and Hockey Graphs), the old xG on Corsica, and the new xG on Corsica 2.0 are 3 unique xG models.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
I don’t know what’s so hard... Corsi is a part of the picture but not the whole picture...

It’s better than a lot that’s out there because that part of the game is real and important.

It’s not everything because it was never intended or used as everything.

It’s not a war but you can’t ignore it either.
 

Whileee

Registered User
May 29, 2010
46,083
33,165
I have, yes.
Corsi’s strongest year was 2013-14.
Corsi’s weakest year was 2009-10.
It’s highly volatile year to year.

I would say you are seeing heads come up three times in a row, despite it being even with tails before that, and calling shenanigans.

No offense, but your analysis on this seems more confirmation bias than actual confirmation of your hypothesis.

Ex: you could just be focusing on the xG seasons Manny trained his model on.

NOTE FOR EVERYONE:
Dawson’s (DTMaboutHeart and Hockey Graphs), the old xG on Corsica, and the new xG on Corsica 2.0 are 3 unique xG models.
I didn't try to "confirm" anything. I raised a hypothesis. Maybe you could point in the direction of your work, or others', that indicate that there is constancy in the relationship between Corsi and results. It has nothing to do with "shenanigans". A bit surprised at the defensive tone, to be honest.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
I didn't try to "confirm" anything. I raised a hypothesis. Maybe you could point in the direction of your work, or others', that indicate that there is constancy in the relationship between Corsi and results. It has nothing to do with "shenanigans". A bit surprised at the defensive tone, to be honest.

You are making a potentially dangerous mistake of making a trend off of a short streak of something highly volatile.

No defensive tone. I am merely here to help.
 

castle

Registered User
Dec 2, 2011
2,263
922
Australia
Garret, you might know this, or at least I presume you'd have a clue. Are most models still garden variety linear models on presumed normally distributed outcome variables? I tended to discount or ignore many of the older models because of this. Using a generalized model with the proper link function and distribution might perform better. It's also been unclear whether folks are looking at model diagnostics, even things as simple as residual plots or q-q plots, or heteroskedasticity. Is this where the proprietary models have gone or is the difference with them mostly about quality of input variables?

Are people looking into spline functions, or at least polynomials to test curvilinear relationships? How about accounting for clustering of observations with generalized estimating equations, or cross classified mixed models?

To me, these have always been obvious things, but I've come to think that the majority of the hockey stats guys are really working with rather blunt tools. From what I've seen the use of the word 'advanced' is wildly unwarranted but I'm hoping some of the secret stuff going on is really thinking about the true nature of the distribution(s) they are trying to model and how the values on the input observations are generated.
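As a small illustration of the link-function point, here is a sketch in Python that fits the same toy goal/no-goal data two ways: ordinary least squares on the 0/1 outcome, and a binomial model with a logit link fitted by plain gradient ascent. The data are hypothetical and the fitting loop is deliberately bare-bones; this is not how any public or proprietary hockey model is actually built.

```python
import math

# Hypothetical toy data: shot distance (ft) and whether it scored (1) or not (0).
distances = [5, 10, 15, 20, 30, 40, 50, 60]
goals = [1, 1, 0, 1, 0, 0, 0, 0]
n = len(distances)

# 1) Ordinary least squares on the 0/1 outcome (identity link, normal errors).
mean_x = sum(distances) / n
mean_y = sum(goals) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, goals))
         / sum((x - mean_x) ** 2 for x in distances))
intercept = mean_y - slope * mean_x


def linear_prob(x):
    """Linear fit: nothing stops this from leaving the [0, 1] range."""
    return intercept + slope * x


# 2) Binomial model with a logit link, fitted by gradient ascent on the
#    log-likelihood. Distance is rescaled to tens of feet to keep steps stable.
b0, b1 = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    grad0 = grad1 = 0.0
    for x, y in zip(distances, goals):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x / 10)))
        grad0 += y - p
        grad1 += (y - p) * x / 10
    b0 += learning_rate * grad0 / n
    b1 += learning_rate * grad1 / n


def logistic_prob(x):
    """Logit-link fit: predictions always stay strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x / 10)))


for x in (5, 40, 80):
    print(f"{x:>2} ft  linear: {linear_prob(x):+.3f}  logistic: {logistic_prob(x):.3f}")
```

On data like this the straight-line fit hands out negative "probabilities" for long shots, while the logit-link version cannot, which is the basic case for matching the link and distribution to the outcome.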
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Garret, you might know this, or at least I presume you'd have a clue. Are most models still garden variety linear models on presumed normally distributed outcome variables? I tended to discount or ignore many of the older models because of this. Using a generalized model with the proper link function and distribution might perform better. It's also been unclear whether folks are looking at model diagnostics, even things as simple as residual plots or q-q plots, or heteroskedasticity. Is this where the proprietary models have gone or is the difference with them mostly about quality of input variables?

Are people looking into spline functions, or at least polynomials to test curvilinear relationships? How about accounting for clustering of observations with generalized estimating equations, or cross classified mixed models?

To me, these have always been obvious things, but I've come to think that the majority of the hockey stats guys are really working with rather blunt tools. From what I've seen the use of the word 'advanced' is wildly unwarranted but I'm hoping some of the secret stuff going on is really thinking about the true nature of the distribution(s) they are trying to model and how the values on the input observations are generated.

The short version of the answer is:
1) I cannot speak for every model but I can for some.
2) Some of this stuff has been looked at but typically it isn't what you talk about on blog posts due to the audience. To take Hockey Graphs as an example, we tend to talk about this in the Slack group before we publish our work, but keep things simple for the actual blog posts.
3) You are right that this kind of stuff is generally looked at a lot more in proprietary work (like with HockeyData), although some of the public win probability models do have some of this. I will admit that we could do a lot better here (I guess for me this is more talking in past tense as I'm out of the public sphere). I honestly have been fighting for a while just to use confidence intervals more... but the more mathy you make things seem, the more scared/nervous people seem to get (both public and also decision makers in hockey)... and unfortunately people are afraid of not being accepted more than being slightly worse... as stupid as that seems.
4) There are real marginal and significant (in statistical sense) gains made by using better methods.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Oops forgot to also answer this part of your question:

I would say the improved quality and granularity of data has improved things (ex: passing data for improved shot quality model), but I feel methodology has been the larger difference between public and proprietary.

A great deal of public analysis is what I call "eye testing the numbers" where they look at different, yet very important inputs --like shot quantity (Corsi), quality (xGoals), finishing (qualitative/eyetest and p/60), etc.-- and then mix as they feel is best weighting in their gut.

Take the same methods and input the more granular information and you don't get much better. Take the superior methods and input the less granular information and you don't get a huge step back.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
I think Manny's write ups are probably the most "sophisticated" (I just don't like that word) write ups you'd probably find in non-academic public work...

The Art Of WAR | Corsica
http://www.corsica.hockey/misc/K_Manuscript.pdf
On Salad And Predicting Hockey Games | Corsica
Probabilistic Forecasting And Weighting Of Three Star Selections | Corsica

Dawson's write ups dipped a bit into sophisticated methodologies but didn't go as in depth into them (and are also gone from the interwebs).

^ This isn't an answer to previous questions... but I thought nerds may enjoy reading this content.
 

Gm0ney

Unicorns salient
Oct 12, 2011
14,682
13,610
Winnipeg
Thanks Gomney!

Not that I want to make you F5 yourself to death but I'm curious about our PP.

I'm guessing our numbers look better excluding games 1 and 2 which at this point I consider outliers.

Not all doom and gloom but the PK #'s are terrifying.
Jets 5v4 xGF/60: 6.9 (17th)

I think other than the troubles the Jets have had getting into the zone on the powerplay it's looking decent.
 

Aavco Cup

"I can make you cry in this room"
Sep 5, 2013
37,630
10,440
Thanks for dropping by Garret. You're always welcome here. It's always nice to have some guidance from the "inside". Exciting times for the world you're in, but it is getting harder for the masses to keep up with the changes and trust the "formulas".
 

Whileee

Registered User
May 29, 2010
46,083
33,165
You are making a potentially dangerous mistake of making a trend off of a short streak of something highly volatile.

No defensive tone. I am merely here to help.
If 3 seasons is a "short streak" for a metric's performance, then "volatile" might be an understatement. I'll take your word for it that the relationship between Corsi and results is unchanging, but I always prefer to see empirical results.

I think the term "confirmation bias" is overused as a thinly disguised argumentum ad hominem. Use data to address data. I'm completely open to evidence.

The author of this article is a bit agnostic about the changing relationships between Corsi and goals, with goal parity appearing to be on the increase and Corsi parity appearing to be on the decrease. I don't think the hypothesis is outlandish.

Stop Worrying About Shot Parity
 

Whileee

Registered User
May 29, 2010
46,083
33,165
You are making a potentially dangerous mistake of making a trend off of a short streak of something highly volatile.

No defensive tone. I am merely here to help.
...by the way, the sample size isn't 3 vs. 3 coin flips, it's 90 team-seasons vs. 90 team-seasons. Still, I agree that one legitimate hypothesis is that the difference in correlations is due to random error. I didn't do the math to test the statistical significance of the difference between correlations, but just eyeballing the numbers I wouldn't be surprised if the p values were pretty small.
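For what it's worth, the math for that test is short. Below is a sketch in Python of the standard Fisher z-test for the difference between two independent correlations. The two r values are hypothetical placeholders (the actual correlations aren't quoted in this post); only the 90 team-seasons per era comes from the thread. One caveat: team-seasons from the same franchise are not fully independent, so the p value from this test is likely optimistic.

```python
import math


def fisher_z_diff(r1, n1, r2, n2):
    """Two-sided test that two independent Pearson correlations are equal."""
    # Fisher transform: atanh(r) is approximately normal with variance 1/(n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p value under the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical correlations between CF% and Points% in each era.
r_early, r_late = 0.60, 0.35
n_early = n_late = 90  # team-seasons per era, as stated in the post

z, p = fisher_z_diff(r_early, n_early, r_late, n_late)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Plug in the real correlations from the two eras to see whether the eyeballed difference holds up.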
 

Jimby

Reformed Optimist
Nov 5, 2013
1,428
441
Winnipeg
An observation from a fan, prompted by some here on HF recently stressing over the Jets CF% and wondering how much stock to put in CF% being predictive... It is probably reasonable that if CF% is USEFULLY predictive that a team's CF% on Jan 1 should give some reliable idea how the team will finish the year. Over the last 3 years fully one third of NHL teams end up in or out of the playoffs contrary to what their CF% suggested. The Rangers made the playoffs all 3 years despite having poor CF%. Maybe they do that every year - I dunno as I only went back 3 years.

Now, to me being wrong one third of the time is huge. I would like the margin of error to be more like a poll where the margin of error is "plus or minus 3% 95% of the time". We wouldn't pay much attention to a polling company that was dead wrong one third of the time. As many have stated, the analytics community has moved on and for good reason. There are new stats like xGF% in town. I will pay more attention to those and less to CF%.
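To put the polling comparison in numbers, here is a sketch in Python of a 95% Wilson interval for the miss rate. The counts are assumptions read off the post, not looked-up data: 30 teams over 3 seasons gives 90 team-seasons, and "one third" wrong gives 30 misses.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


team_seasons = 90  # assumption: 30 teams x 3 seasons
misses = 30        # assumption: "one third" of teams called wrong by Jan 1 CF%

low, high = wilson_interval(misses, team_seasons)
print(f"miss rate: {misses / team_seasons:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```

With only 90 team-seasons the interval around that one-third miss rate is much wider than a pollster's plus or minus 3%, so the true miss rate could be noticeably better or worse than what three seasons show.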
 