Warning: long post.
So I spent some time working on NHL analysis as a whole, not just the Rangers, but wanted to share since it's stats. In particular, it's what I call "Next Game Analysis": what margin do NHL teams post in the game right after a particular result? I used two methods: simple cross tabs built with pivot tables, and, to get into the nitty gritty, regression.
Cross Tab Analysis: I grouped wins and losses by 4+ goals as "blowout wins and losses" and wins and losses by 1-3 goals as "non-blowout wins and losses". I then built a cross tab with the 4 scenarios for the current game's margin in the rows and the same 4 scenarios for the next game in the columns, showing each cell as a percentage of its row. Throughout the analysis I excluded every team's final game of the season, since it has no regular-season next game. In the aggregate, the pattern was: a blowout loss was most likely after a blowout loss, less likely after a non-blowout loss, less likely still after a non-blowout win, and least likely after a blowout win.
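The pivot-table step can be sketched in Python for anyone who wants to reproduce it outside a spreadsheet. This is a minimal sketch, not the spreadsheet's actual method: it assumes each game is a dict with hypothetical keys `team`, `season`, `game_no`, and `margin` (own goals minus opponent goals), and that shootout wins show up as 1-goal margins, so a margin of 0 never occurs. Final games drop out automatically because they have no next game to pair with.

```python
from collections import Counter, defaultdict

# Row/column order for the 4 scenarios.
ORDER = ["blowout win", "non-blowout win", "non-blowout loss", "blowout loss"]


def bucket(margin: int) -> str:
    """Classify a goal margin: 4+ goals is a blowout, 1-3 goals is not.

    Assumes shootout wins are recorded as 1-goal margins, so 0 never occurs.
    """
    if margin == 0:
        raise ValueError("margin of 0: ties should not appear in final scores")
    if margin >= 4:
        return "blowout win"
    if margin > 0:
        return "non-blowout win"
    if margin > -4:
        return "non-blowout loss"
    return "blowout loss"


def next_game_crosstab(games):
    """Row-percentage cross tab: current-game scenario vs next-game scenario.

    `games` is a list of dicts with hypothetical keys: team, season, game_no, margin.
    """
    by_team_season = defaultdict(list)
    for g in games:
        by_team_season[(g["team"], g["season"])].append(g)

    counts = {row: Counter() for row in ORDER}
    for schedule in by_team_season.values():
        schedule.sort(key=lambda g: g["game_no"])
        # Pair each game with the team's next game; the season's final game
        # has no partner, so it is excluded automatically.
        for cur, nxt in zip(schedule, schedule[1:]):
            counts[bucket(cur["margin"])][bucket(nxt["margin"])] += 1

    # Convert counts to shares of each row (each non-empty row sums to 1).
    table = {}
    for row in ORDER:
        total = sum(counts[row].values())
        table[row] = {col: counts[row][col] / total if total else 0.0 for col in ORDER}
    return table
```

Reading a row of the result, e.g. `table["blowout loss"]["blowout loss"]`, gives the share of blowout losses that were followed by another blowout loss, which is the comparison described above.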
This is interesting, and I had two hypotheses. 1) Momentum and injuries. Maybe teams gain confidence from winning by 5 and then win the next game by 6, and the opposite happens with blowout losses. Or teams lose big in consecutive games because of injuries. 2) Team quality. Bad teams are the ones that tend to lose big, and they're also more likely to lose big in the next game simply because they're bad. The opposite would be true of good teams.
I decided to control for record by breaking the league into 4 quarters based on season point totals. Over the last 10 years, point totals range from 52 to 132 (the 132 is the 2012-13 Blackhawks, prorated to 82 games from the lockout-shortened 48-game season). I split that range into four roughly equal ranges: 52-71 (Q1), 72-91 (Q2), 92-111 (Q3), and 112-132 (Q4). Then I created the same cross tabs, this time per quarter. The relationship didn't hold as cleanly as in the aggregate, but it seemed to me that it held somewhat in every quarter.
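The quarter assignment can be sketched as below. This is a hypothetical helper, not taken from the attached spreadsheet: it assumes the point ranges listed above, prorates shortened seasons to an 82-game pace, rounds to the nearest whole point before binning (an assumption, since the post doesn't say how fractional prorated totals were handled), and expects each game record to carry a hypothetical `points` key with the team's season total.

```python
from collections import defaultdict

# Point ranges from the post (inclusive on both ends).
QUARTERS = [(52, 71, "Q1"), (72, 91, "Q2"), (92, 111, "Q3"), (112, 132, "Q4")]


def prorate(points: float, games_played: int, full_season: int = 82) -> float:
    """Scale a shortened season's point total to an 82-game pace."""
    return points * full_season / games_played


def point_quarter(points: float) -> str:
    """Return Q1-Q4 for a season point total, rounded to the nearest point."""
    pts = round(points)
    for low, high, label in QUARTERS:
        if low <= pts <= high:
            return label
    raise ValueError(f"{points} is outside the 52-132 range")


def split_by_quarter(games):
    """Group game records by their team's quarter.

    Each record is a dict with a hypothetical 'points' key holding the team's
    (prorated) season point total.
    """
    groups = defaultdict(list)
    for game in games:
        groups[point_quarter(game["points"])].append(game)
    return dict(groups)
```

Each group can then be fed through the same cross tab (or regression) separately, which is what "per quarter" means here.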
Next, I wanted a more precise approach, so I used regression. First I pooled all teams regardless of point total, with the next game's margin as the dependent variable and the current game's margin as the independent variable. The slope was 0.031: every 1-goal increase in the current game's margin goes with a 0.031-goal increase in the next game's margin. Because margin is signed (positive for wins, negative for losses), a positive slope means bigger wins tend to be followed by better margins and bigger losses by worse ones. The p-value for the slope was well below 0.05 (it rounds to 0), so the slope is statistically significant: roughly speaking, if there were no real relationship, a slope this far from 0 would almost never show up by chance.
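For reference, the simple regression can be sketched with only the standard library. This is a sketch, not the tool used in the post: `x` would be the current-game margins and `y` the next-game margins. One assumption to flag: the p-value here uses a normal approximation, whereas a standard regression tool would use the t distribution; with thousands of games the two are practically identical. Also note that a p-value is never exactly 0; a reported 0 just means it is smaller than the displayed precision.

```python
import math
from statistics import NormalDist, fmean


def simple_ols(x, y):
    """OLS of y on x: returns (slope, intercept, two-sided p-value for the slope).

    The p-value uses a normal approximation to the t distribution, which is
    only reasonable for large samples (assumption).
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    z = slope / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, intercept, p_value
```

Under this setup, a slope of 0.031 means a team that wins by 5 instead of 1 would be expected to do about 4 × 0.031 ≈ 0.12 goals better in its next game.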
So once again, this may just reflect how good or bad a team is. To control for that, I ran the same regression separately for each of the 4 point-total quarters. Once differences in points are taken out, the results are all over the place and none is significant. The least significant results come from the middle-tier teams and the most significant from the best and worst teams (which makes sense, since they're heavily biased toward winning or losing). The most significant of all is the top tier (Q4), with a slope of 0.034 and a p-value of 0.1471. Still not significant.
Finally, I ran a multiple regression with two independent variables: the current game's margin and the team's point total. Now the slope for Margin actually turns negative, but its p-value is way too high (0.5623) for it to be significant. Points, on the other hand, have a positive slope (0.0319) and are significant (the p-value rounds to 0).
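A two-variable regression like this can also be sketched without a spreadsheet. This is a minimal sketch under stated assumptions, not the author's tool: `x1` would be the current-game margin, `x2` the team's point total, and `y` the next-game margin; it solves the normal equations on mean-centered data, and the p-values again use a large-sample normal approximation rather than the t distribution.

```python
import math
from statistics import NormalDist, fmean


def _two_sided_p(estimate: float, std_err: float) -> float:
    """Two-sided p-value from a normal approximation (large-sample assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(estimate / std_err)))


def two_predictor_ols(x1, x2, y):
    """OLS of y on x1 and x2 (plus an intercept) via centered normal equations.

    Returns ((b1, p1), (b2, p2)).
    """
    n = len(y)
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    # Residuals on centered data equal residuals of the full model with intercept.
    sse = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = sse / (n - 3)  # three fitted parameters: b1, b2, intercept
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    return (b1, _two_sided_p(b1, se1)), (b2, _two_sided_p(b2, se2))
```

Because margin and point total are correlated (better teams tend to win by more), each slope here is estimated while holding the other variable fixed. That is how the Margin slope can shrink or even change sign once Points enters the model: most of what looked like a margin effect in the simple regression is carried by team strength instead.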
In conclusion, while the cross tabs seemed to point to possible momentum swings or injury effects, it appears that any pattern in the following game comes down to the strength of the team. Honestly, that was probably the safe bet even before I began the analysis.
If you made it this far, thanks for reading. My spreadsheet is attached.