People have been talking a lot about advanced stats this offseason, and as a fan of them I want to help shed more light on how much faith you should put in them.
What I've done is put all of the major advanced stats (for teams) I could find into one spreadsheet, then run correlations across them to see how strongly they are related. I've included Corsi, Fenwick, goals for and against, zone starts, PP/PK%, etc. I'll put this warning at the top too: I am not a statistician or anything like that. I just enjoy statistics, Excel, and hockey.
Remember that correlation does not equal causation! The standard example is this: shark attacks and ice cream sales are positively correlated. In the weeks/months when ice cream sales go up, so do shark attacks. One might wrongly assume that one causes the other, but that is not the case. Both are caused by another factor, warm weather, but the stats don't tell us that.
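For anyone who wants to see the shark/ice cream effect in numbers, here is a rough Python sketch. All of the data is made up: temperature is generated at random and then drives both ice cream sales and shark attacks, which never influence each other. The function names (`pearson`, `partial_corr`) and the coefficients are just illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Made-up data: temperature drives both series; neither affects the other.
rng = random.Random(0)
temperature = [rng.uniform(0, 30) for _ in range(500)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
shark_attacks = [0.1 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream_sales, shark_attacks)
controlled_r = partial_corr(ice_cream_sales, shark_attacks, temperature)
print(f"raw correlation: {raw_r:.2f}")
print(f"controlling for temperature: {controlled_r:.2f}")
```

The raw correlation comes out clearly positive, while the version that controls for temperature sits close to zero. A table that only contains sales and attacks can't show you that, which is the whole point of the warning.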
What is a Correlation Coefficient?
For those of you who don't know what a correlation coefficient is, I will explain it briefly. It is a calculation that compares two sets of data and gives you a number between -1 and 1. A value of 1 means the two data sets are perfectly correlated. An example is wins and win%: when win% goes up, so does the number of wins. A value of -1 means the data sets are perfectly negatively correlated. An example of this is wins and losses: when a team has more wins, its losses go down. A value of 0 means there is no correlation between the data at all.
Generally, a value between -0.10 and 0.10 is considered insignificant, effectively no correlation. Around 0.25 is a slight correlation, 0.5 is a medium correlation, 0.75 is a strong correlation, and 0.90+ is near perfect. The same scale applies to negative values.
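If it helps to see the calculation itself, here is a minimal Python sketch of the Pearson correlation coefficient, which is the usual meaning of "correlation coefficient" and my assumption about what the spreadsheet computes. The five-team league below is hypothetical, and it ignores overtime losses so that wins and losses add up to the games played.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical five-team league, 82 games each, no overtime losses.
games = 82
wins = [50, 45, 41, 36, 30]
win_pct = [w / games for w in wins]
losses = [games - w for w in wins]

r_wins_pct = pearson(wins, win_pct)
r_wins_losses = pearson(wins, losses)
print(round(r_wins_pct, 3))     # 1.0
print(round(r_wins_losses, 3))  # -1.0
```

Real stats never line up this neatly, which is why everything in the findings lands somewhere between these two extremes.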
Findings
Remember that correlation does not equal causation!
First, my spreadsheet: https://docs.google.com/spreadsheets/d/1rMNFoELTUTwe7cqFLs7LLcW_XTECH1Rg5m2xtOQtCz8/edit?usp=sharing
I think the most important correlations are the ones with win percentage, so I'll post those findings here. For the others, look at the sheet.
The strongest correlations with win% are goals for (0.853), goals against (-0.817), goals for % (0.974), shots for (0.517), shooting % (0.799), save % (0.704), PDO (0.884), power play % (0.601), penalty kill % (-0.462), and power play goals for and against (0.513 and -0.512). An interesting note: for every stat with an offensive and a defensive side, the offensive side correlates a little more strongly. This means that in the NHL today, offense correlates more with wins than defense does. It's also amazing how high PDO, sh%, and sv% are.
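To make the offense versus defense comparison easier to check, here is a small Python sketch that ranks the numbers above by strength (ignoring sign) and tests each offensive/defensive pair. The values are copied from the list above, with signs as printed; the dictionary keys and the pairings are my own labels.

```python
# Correlations with win% as reported in the post (signs as printed).
win_pct_corr = {
    "goals for": 0.853,
    "goals against": -0.817,
    "goals for %": 0.974,
    "shots for": 0.517,
    "shooting %": 0.799,
    "save %": 0.704,
    "PDO": 0.884,
    "power play %": 0.601,
    "penalty kill %": -0.462,
    "power play goals for": 0.513,
    "power play goals against": -0.512,
}

# Rank by strength of the relationship, ignoring direction.
ranked = sorted(win_pct_corr.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, r in ranked:
    print(f"{name:26s} {r:+.3f}")

# Offense/defense pairs: is the offensive side stronger every time?
pairs = [
    ("goals for", "goals against"),
    ("shooting %", "save %"),
    ("power play %", "penalty kill %"),
    ("power play goals for", "power play goals against"),
]
offense_stronger = all(
    abs(win_pct_corr[off]) > abs(win_pct_corr[dfn]) for off, dfn in pairs
)
print("offense stronger in every pair:", offense_stronger)
```

Note how thin some of the margins are: the power play goals pair differs by only 0.001, so "a little higher" is the right way to put it.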
As far as Corsi and Fenwick go, Fenwick (FF% 0.440) does correlate more strongly with win% than Corsi does (CF% 0.346). But there are some interesting things going on. FF/60 is 0.402 and CF/60 is 0.422, fairly close but with an edge to Corsi. On the against side, however, FA/60 is -0.325 (again we see defensive stats having a smaller impact) and CA/60 is -0.095, which is insignificant. This suggests that total shot attempts against don't matter that much, but teams that block more shots win a higher percentage of games.
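For readers who are fuzzy on the difference: Corsi counts every shot attempt (on goal, missed, and blocked), while Fenwick leaves out the blocked ones. Here is a rough Python sketch of how the numbers are built. The game totals are hypothetical, and the names `ShotAttempts`, `share`, and `per_60` are mine.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    on_goal: int  # shots on goal (saved or scored)
    missed: int   # attempts that missed the net
    blocked: int  # attempts blocked before reaching the net

    @property
    def corsi(self) -> int:
        # Corsi: all shot attempts.
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Fenwick: unblocked shot attempts only.
        return self.on_goal + self.missed


def share(for_count, against_count):
    """Percentage of the total that belongs to the 'for' side (CF%, FF%)."""
    return 100.0 * for_count / (for_count + against_count)


def per_60(count, minutes):
    """Rate per 60 minutes of ice time (CF/60, FA/60, ...)."""
    return 60.0 * count / minutes


# Hypothetical single-game totals.
team = ShotAttempts(on_goal=30, missed=12, blocked=10)
# The opponent's 'blocked' attempts are the ones our team blocked.
opponent = ShotAttempts(on_goal=28, missed=11, blocked=17)

cf_pct = share(team.corsi, opponent.corsi)      # 52 for, 56 against
ff_pct = share(team.fenwick, opponent.fenwick)  # 42 for, 39 against
ca_60 = per_60(opponent.corsi, minutes=60)
fa_60 = per_60(opponent.fenwick, minutes=60)
print(f"CF% {cf_pct:.1f}, FF% {ff_pct:.1f}, CA/60 {ca_60:.1f}, FA/60 {fa_60:.1f}")
```

In this made-up game the team is below 50% by Corsi but above 50% by Fenwick, purely because it blocked 17 attempts. The gap between CA and FA is exactly the team's blocked shots, which is why the CA/60 versus FA/60 difference points at shot blocking.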
Other stats that are not particularly important: zone starts (0.251 at the highest, for neutral zone; only 0.162 for offensive zone), average age (0.124, interesting because there is a slight bias toward older teams), total goals per game (0.186), power plays for and against (-0.073 and -0.158), shorthanded goals (0.089), and penalty minutes per game for and against (-0.108 and 0.028). I think the ones that would surprise people most are the ones about penalties. The number of penalties taken and drawn is insignificant to winning games. What is important is capitalizing on your own power plays and killing off your opponents'.
PDO
I have long been a fan of PDO (shooting percentage plus save percentage), and this solidifies it. PDO's correlation coefficient with wins is an astounding 0.884. It has a higher correlation than every other stat except one (goals for percentage). So the question becomes: how do teams increase their PDO? Well, let's look at the stats that correlate highly with it. The ones that stand out to me are shots for and against. Shots for (and SF/60) is 0.294, which is a slight to medium correlation, but shots against is 0.016, which is essentially no correlation at all. This means that teams that shoot more also tend to have higher shooting percentages (sh% to shots for is 0.267), but teams that allow fewer shots don't necessarily give up a lower percentage of goals (in fact the opposite is true: sv% to SA is 0.134, meaning more shots against is very slightly correlated with a higher sv%).
There is also a medium-strength link between shooting percentage and save percentage (0.453). This is where we have to remember that correlation is not causation. Is it because teams with a high shooting percentage can play more defensively and help eliminate rebounds? Is it because teams with good goalies (and therefore a higher sv%) can lean on their goalies and take more risks offensively? This is where advanced stats let us down and you need people to actually watch the games. The "eye test," as it were.
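In case the formula is unfamiliar, PDO is just a team's shooting percentage plus its save percentage. Here is a minimal Python sketch with hypothetical season totals. PDO is often computed at 5-on-5 only, and whether a given data set uses 5-on-5 or all situations is something to check against the source; the function name `pdo` is mine.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO on a 100-point scale: shooting % plus save %."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1 - goals_against / shots_against)
    return shooting_pct + save_pct


# Hypothetical season totals (shots mean shots on goal).
team_pdo = pdo(goals_for=240, shots_for=2500,
               goals_against=200, shots_against=2500)
print(f"PDO: {team_pdo:.1f}")  # 9.6% shooting + 92.0% saves

# Across a whole league, every goal for is someone else's goal against,
# so league-wide PDO is 100 by construction.
league_pdo = pdo(goals_for=7000, shots_for=75000,
                 goals_against=7000, shots_against=75000)
print(f"League PDO: {league_pdo:.1f}")
```

Because PDO is built directly from goals per shot at both ends of the ice, it is worth keeping in mind how closely it is tied to goals for and against when reading that 0.884.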
TL;DR:
- Advanced stats can tell you a lot when you understand how to use them.
- Corsi is a little overused, and some people probably trust it too much.
- The total number of PP and PK opportunities matters a lot less than the percentage of those opportunities you convert (or kill).
- PDO correlates with winning more strongly than any other stat here except goals for %.