Advanced Stat Correlations

Solmors

<3 Data
May 3, 2010
2,052
795
San Jose
People have been talking a lot about advanced stats this offseason, and as a fan of them I want to help shed more light on how much faith you should put in them.

What I've done is put all of the major advanced stats (for teams) I could find into one spreadsheet, then run correlation algorithms across them to see how significant they are. I've included corsi, fenwick, goals for and against, zone starts, PP/PK%, etc. I'll put this warning at the top too: I am not a statistician or anything like that. I just enjoy statistics, Excel, and hockey.

What is a Correlation Coefficient?
For those of you who don't know what a correlation coefficient is, I will explain it briefly. It is an algorithm that compares two sets of data and gives you a number between -1 and 1. A value of 1 means the two data sets are perfectly correlated. Wins and win% are an example: when win% goes up, so does the number of wins. A value of -1 means the data sets are perfectly negatively correlated. Wins and losses are an example: when a team has more wins, its losses go down. A value of 0 means there is no correlation between the data at all.
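To make that concrete, here is a minimal sketch of the calculation, assuming the coefficient meant here is Pearson's r (which is what Excel's CORREL returns). The five teams and their win totals are made up for illustration, and losses are simplified to 82 minus wins, which ignores overtime losses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up data: wins for five hypothetical teams over an 82-game season.
wins = [30, 38, 41, 45, 52]
win_pct = [w / 82 for w in wins]
losses = [82 - w for w in wins]  # simplification: ignores overtime losses

r_pos = pearson(wins, win_pct)   # wins vs. win%: moves together perfectly
r_neg = pearson(wins, losses)    # wins vs. losses: moves in opposite directions
print(r_pos, r_neg)
```

The first value comes out at 1 and the second at -1 (up to floating-point rounding), matching the two examples above.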

Generally, a value between -0.10 and 0.10 is considered statistically insignificant, i.e. no correlation. 0.25 is slight correlation, 0.5 is medium correlation, 0.75 is strong correlation, and 0.90+ is near perfect.
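One caveat worth sketching: whether a coefficient counts as statistically significant depends on the sample size as well as on its magnitude. The snippet below uses the standard t-test for a Pearson coefficient. The sample size of 30 (one row per team in a 30-team league) is an assumption, since the thread does not say how many rows the spreadsheet has, and 2.048 is the two-tailed 5% critical value of Student's t for 28 degrees of freedom.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson r from n pairs differs from 0."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def critical_r(t_crit, n):
    """Smallest |r| whose t statistic reaches t_crit when there are n pairs."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

N_TEAMS = 30    # assumption: one row per team, 30-team league
T_CRIT = 2.048  # two-tailed 5% critical t value for 28 degrees of freedom

r_needed = critical_r(T_CRIT, N_TEAMS)
print(round(r_needed, 3))
```

Under those assumptions, a coefficient has to be roughly 0.36 or larger in absolute value before it clears the usual 5% bar, so the labels above are best read as descriptions of strength rather than as significance tests.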

Findings

Remember that correlation does not equal causation!
The standard example is this: Shark attacks and ice cream sales are positively correlated. In the weeks/months when ice cream sales go up, so do shark attacks. One might wrongly assume causation, but that is not the case. Both are caused by another factor, warm weather, but the stats don't tell us that.
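The shark example can be simulated. This is a rough sketch with entirely made-up numbers: temperature drives both series, and neither series affects the other. The raw correlation comes out strongly positive, but once the temperature trend is removed from each series (a simple partial correlation), what is left is close to zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)

# Hypothetical weekly data: temperature is the hidden common cause.
temps = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [5.0 * t + rng.gauss(0, 15) for t in temps]  # sales depend on temperature only
sharks = [0.2 * t + rng.gauss(0, 1) for t in temps]      # attacks depend on temperature only

raw_r = pearson(ice_cream, sharks)
partial_r = pearson(residuals(ice_cream, temps), residuals(sharks, temps))
print(raw_r, partial_r)
```

The catch for hockey stats is that you can only control for a hidden factor like this if you know what it is and have it in your spreadsheet.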

First, my spreadsheet: https://docs.google.com/spreadsheets/d/1rMNFoELTUTwe7cqFLs7LLcW_XTECH1Rg5m2xtOQtCz8/edit?usp=sharing

I think the most important is correlating to win percentage, so I'll post those findings. For others, look at the sheet.

The strongest correlations to win% are goals for (0.853), goals against (-0.817), goals for % (0.974), shots for (0.517), shooting % (0.799), save % (0.704), PDO (0.884), power play % (0.601), penalty kill % (-0.462), and power play goals for and against (0.513 and -0.512). An interesting note: for all of the stats with an offensive and a defensive side, the offensive side correlates a little higher. This means that in the NHL today, offense correlates more with wins than defense. It's also amazing how high PDO, sh%, and sv% are.
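For anyone who wants to redo this kind of ranking outside a spreadsheet, here is a small sketch. The column names and every number in the table are invented for illustration and are not taken from the linked sheet. It correlates each stat with win% and sorts by the absolute value of the coefficient, so strong negative correlations rank alongside strong positive ones.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented numbers for five hypothetical teams (one list entry per team).
win_pct = [0.40, 0.50, 0.55, 0.60, 0.65]
stats = {
    "goals_for_pct": [45.0, 49.0, 51.0, 53.0, 55.0],
    "goals_against": [260, 240, 235, 220, 210],
    "penalty_minutes": [10.0, 8.0, 11.0, 9.0, 10.0],
}

# Correlate every stat with win%, then rank by strength regardless of sign.
correlations = {name: pearson(values, win_pct) for name, values in stats.items()}
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name:16s} {r:+.3f}")
```

Sorting on the absolute value is a design choice: goals against is just as informative as goals for, it simply points the other way.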

yOMS51j.png


As far as corsi and fenwick go, fenwick (FF% 0.440) does correlate higher to win% than corsi does (CF% 0.346). But there are some interesting things going on. FF/60 is 0.402 and CF/60 is 0.422, fairly close but with an edge to corsi. On the against side, however, FA/60 is -0.325 (again we see defensive stats having a smaller impact) and CA/60 is -0.095, or statistically insignificant. This suggests that total shot attempts against don't matter that much, but teams that block more win a higher percentage of games.
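In case the difference between the two is unclear: corsi counts every shot attempt (on goal, missed, and blocked), while fenwick leaves out the blocked ones. A quick sketch with made-up single-game numbers shows how a team that blocks a lot of shots can look better by FF% than by CF%. The class and function names are just for illustration.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempts:
    on_goal: int
    missed: int
    blocked: int  # attempts by this side that the other side blocked

    @property
    def corsi(self):
        """All shot attempts: on goal + missed + blocked."""
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self):
        """Unblocked shot attempts: on goal + missed."""
        return self.on_goal + self.missed

def share_pct(attempts_for, attempts_against):
    """Percentage of all attempts that belong to our team (CF% or FF%)."""
    return 100.0 * attempts_for / (attempts_for + attempts_against)

# Made-up game: our team blocks 18 of the opponent's attempts.
ours = ShotAttempts(on_goal=30, missed=12, blocked=10)
theirs = ShotAttempts(on_goal=28, missed=11, blocked=18)

cf_pct = share_pct(ours.corsi, theirs.corsi)
ff_pct = share_pct(ours.fenwick, theirs.fenwick)
print(f"CF% {cf_pct:.1f}  FF% {ff_pct:.1f}")
```

In this example the team is under 50% by corsi but over 50% by fenwick, purely because of the blocks.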

Other stats that are not particularly important are: zone starts (0.251 at the highest for neutral, only 0.162 for offensive), average age (0.124, interesting because there is a slight bias for older teams), total goals per game (0.186), power plays for and against (-0.073 and -0.158), shorthanded goals (0.089), and penalty minutes per game for and against (-0.108 and 0.028). I think the ones that would surprise people most are the ones about penalties. The number of penalties taken and drawn is statistically insignificant to winning games; what is important is capitalizing on your opportunities and killing your opponents'.

PDO
I have long been a fan of PDO, and this solidifies it. PDO's correlation coefficient to wins is an astounding 0.884. It has a higher correlation than all other stats except one (goals for percentage). So the question becomes, how do teams increase their PDO? Well, let's look at stats that correlate highly with it. The one that stands out to me is shots for and against. Shots for (and SF/60) are 0.294, which is a slight to medium correlation, but shots against is 0.016, which is essentially no correlation at all. This means that teams that shoot more also have higher shooting percentages (sh% to shots for is 0.267), but teams that allow fewer shots don't necessarily give up a lower percentage of goals (in fact the opposite is true: sv% to SA is 0.134, meaning more shots against is very slightly correlated with higher sv%).
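For reference, PDO is just shooting percentage plus save percentage. Here is a minimal sketch on the 100-point scale, using season totals that are made up for illustration.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO on a 100-point scale: shooting % plus save %."""
    sh_pct = 100.0 * goals_for / shots_for
    sv_pct = 100.0 * (1.0 - goals_against / shots_against)
    return sh_pct + sv_pct

# Made-up season totals for one hypothetical team.
team_pdo = pdo(goals_for=240, shots_for=2500, goals_against=220, shots_against=2500)

# Across a whole league, every goal for is someone else's goal against and
# every shot for is someone else's shot against, so the league-wide figure is 100.
league_pdo = pdo(goals_for=6800, shots_for=74000, goals_against=6800, shots_against=74000)

print(round(team_pdo, 1), round(league_pdo, 1))
```

That second calculation is why 100 is the baseline people compare a team's PDO against.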

There is also a strong link between shooting percentage and save percentage (0.453). This is where we have to look at correlation not being causation. Is it because teams with high shooting percentage can play more defensively and help eliminate rebounds? Is it because teams with good goalies (and therefore higher sv%) can lean on the goalies and take more risks offensively? This is where advanced stats let us down and you need people to actually watch the games. The "eye test" as it were.

TL;DR:
  • Advanced stats can tell you a lot, when you understand how to use them.
  • Corsi is a little overused, and some people probably trust it too much.
  • Total number of pp and pk opportunities matters a lot less than the percentage of those opportunities you convert (or stop).
  • PDO is the best stat when it comes to predicting wins.
 

Solmors

PDO is a luck stat.

Being lucky correlates with winning? Groundbreaking.

I thought that as well before, but the more I look into it the less I believe it, at least over an entire 82 game season. There is a case to be made for that if a team is significantly above or below the norm, CBJ the first half of last season with a 105 PDO for example, but over the course of a full season it tends to average out.

Are you saying it is just happenstance that it has a 0.88+ correlation coefficient? Or are you saying that the teams that have more wins are just luckier than the teams with less?

IMO good teams create their own luck. For example it is well known that rebound shots have a higher shooting percentage. Good teams will be able to pick up more rebounds, and then have higher shooting percentage, which raises PDO.
 

Aladyyn

they praying for the death of a rockstar
Apr 6, 2015
18,113
7,235
Czech Republic
Um, is this looking at wins that already happened? That's kind of pointless, isn't it? We already know which teams won in the past.
 

Machinehead

GoAwayTrouba
Jan 21, 2011
142,212
112,228
NYC
PDO has a high correlation to winning, yes, but it has low repeatability. Sure, good teams create their own luck to an extent, but PDO falls within a range, usually 98-102, that's a fact. Nobody "creates" a 105.

You've found correlation to winning, but correlation to winning isn't the thing to be looking for. The thing to look for is repeatability.

Outscoring the opponent has a 1.00 correlation to winning. The team with more goals won every single game last year. If that was repeatable, everyone would go 82-0.

People like corsi and xG because they're extremely repeatable, not because they correlate highly with winning.

The higher something correlates with winning, the less control you tend to have over it, which again, is why nobody goes 82-0.
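The repeatability argument can be illustrated with a toy simulation. This is only a sketch under deliberately extreme assumptions: teams differ in true shot-attempt share (anywhere from 45% to 55%), but every team has the identical true shooting percentage of 9%, so any spread in observed sh% is pure luck. All of the numbers, including attempts and shots per half-season, are invented. Repeatability is measured as the correlation between a team's first-half and second-half values.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def observed_rate(p, n, rng):
    """Fraction of n independent events that succeed with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(42)
N_TEAMS = 100             # simulated teams (more than a real league, to reduce noise)
ATTEMPTS_PER_HALF = 4000  # assumed shot attempts (both teams) per half-season
SHOTS_PER_HALF = 1000     # assumed shots on goal per half-season

share_h1, share_h2, sh_h1, sh_h2 = [], [], [], []
for _ in range(N_TEAMS):
    true_share = rng.uniform(0.45, 0.55)  # assumption: real talent gap in shot share
    true_sh = 0.09                        # assumption: no talent gap in shooting %
    share_h1.append(observed_rate(true_share, ATTEMPTS_PER_HALF, rng))
    share_h2.append(observed_rate(true_share, ATTEMPTS_PER_HALF, rng))
    sh_h1.append(observed_rate(true_sh, SHOTS_PER_HALF, rng))
    sh_h2.append(observed_rate(true_sh, SHOTS_PER_HALF, rng))

repeat_share = pearson(share_h1, share_h2)  # first half vs. second half
repeat_sh = pearson(sh_h1, sh_h2)
print(f"shot share repeatability: {repeat_share:.2f}")
print(f"shooting % repeatability: {repeat_sh:.2f}")
```

In this setup the shot-share metric carries over from one half to the next and the percentage metric does not. How much real teams differ in shooting talent is exactly what is being argued about in this thread, so the simulation shows what the argument looks like, not which side is right.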
 

Solmors

PDO has a high correlation to winning, yes, but it has low repeatability. Sure, good teams create their own luck to an extent, but PDO falls within a range, usually 98-102, that's a fact. Nobody "creates" a 105.

You've found correlation to winning, but correlation to winning isn't the thing to be looking for. The thing to look for is repeatability.

Outscoring the opponent has a 1.00 correlation to winning. The team with more goals won every single game last year. If that was repeatable, everyone would go 82-0.

People like corsi and xG because they're extremely repeatable, not because they correlate highly with winning.

The higher something correlates with winning, the less control you tend to have over it, which again, is why nobody goes 82-0.

Which is why I not only said that it correlates with win percentage but also went on to look for other things that correlate with PDO. Because like you said, you can't directly change it, but maybe you can change something that can in turn raise PDO, which will raise your wins.

Maybe try reading what is written before commenting. You commented 1 minute after it was posted. Unless you can read 1000 words a minute, you went to the TL;DR only.
 

Solmors

Um, is this looking at wins that already happened? That's kind of pointless, isn't it? We already know which teams won in the past.

No ****, Nostradamus. It's hard to get advanced stats for games that haven't happened yet. Unless you hold the power of seeing into the future. If so, let me know and I'll make a trip to Vegas...
 

Machinehead

Which is why I not only said that it correlates with win percentage but also went on to look for other things that correlate with PDO. Because like you said, you can't directly change it, but maybe you can change something that can in turn raise PDO, which will raise your wins.

Maybe try reading what is written before commenting. You commented 1 minute after it was posted. Unless you can read 1000 words a minute, you went to the TL;DR only.

But statistical randomness correlates with PDO more strongly than anything. We already know this.

I feel like the idea of manufacturing PDO is already disproven and archaic.
 

me2

Go ahead foot
Jun 28, 2002
37,903
5,595
Make my day.
You could probably get a similar correlation looking at team goalie SV%. This year 12 of the 13 best sv% teams all made the playoffs. 13 of the 16 playoff teams were in the top 15 team sv%, only 3 in the bottom 15 made it.

And that's half of PDO.

TLDR: strong goaltending helps you beat teams with bad goaltending.
 
