Ideas for Future Studies

PepsiCenterMagic

Food is Great
Jul 17, 2013
651
44
Did you end up doing the ML model in the end? :)
Unfortunately, no.

At least not yet.

Like I said above, it's hard to justify the time needed. I found some data from somewhere and did some poking around, trying to see if I could create a proof of concept, and realized the tedious nature of what I wanted to do.

It became tedious quickly. And I question whether the data I need is even available to the general public. I say this because if this is a model that is going to update itself every day, I need not only season stats per player but a snapshot of each player's stats after every single game. That way, the model would hopefully be able to find complex relationships between features and time of season, and could adaptively train and predict according to the trends of a progressing season, having been trained on how all past seasons progressed.

And really, that is the concept that intrigues me. You want to turn to these models because they pick up complex relationships much more easily than we ever could, and iterate through them with impressive speed. This model would be able to pick up the trends of a progressing season, though of course it all depends on what features we could provide for traction, e.g., interactions of injuries ~ recuperation time ~ player production behaviors (fast vs. slow starters) ~ a team's intra-season production, etc.
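A minimal sketch of the "snapshot after every single game" idea, assuming you only have a per-game log (one row per player per game). The keys `player`, `game_no`, `goals` and `assists` and the three sample rows are made-up placeholders; the point is that running season totals can be rebuilt from game logs rather than needing stored snapshots.

```python
from collections import defaultdict

def season_snapshots(game_log):
    """Turn per-game rows into cumulative stat snapshots.

    game_log: iterable of dicts with hypothetical keys
    'player', 'game_no', 'goals', 'assists'.
    Returns {player: [(game_no, goals_to_date, assists_to_date), ...]}.
    """
    totals = defaultdict(lambda: [0, 0])  # running [goals, assists] per player
    snapshots = defaultdict(list)
    # Sort by player, then game number, so each player's totals accumulate in order.
    for row in sorted(game_log, key=lambda r: (r["player"], r["game_no"])):
        t = totals[row["player"]]
        t[0] += row["goals"]
        t[1] += row["assists"]
        snapshots[row["player"]].append((row["game_no"], t[0], t[1]))
    return dict(snapshots)

# Made-up rows; note they do not need to arrive in game order.
log = [
    {"player": "A", "game_no": 2, "goals": 0, "assists": 2},
    {"player": "A", "game_no": 1, "goals": 1, "assists": 0},
    {"player": "B", "game_no": 1, "goals": 0, "assists": 1},
]
print(season_snapshots(log))
```

Each `(game_no, goals_to_date, assists_to_date)` tuple is one training row tied to a point in the season, which is the kind of feature/time-of-season pairing described above.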
 
Last edited:


Mashed Potatoes

Registered User
Feb 14, 2015
514
3
Unfortunately, no.

At least not yet.

Like I said above, it's hard to justify the time needed. I found some data from somewhere and did some poking around, trying to see if I could create a proof of concept, and realized the tedious nature of what I wanted to do.

It became tedious quickly. And I question whether the data I need is even available to the general public. I say this because if this is a model that is going to update itself every day, I need not only season stats per player but a snapshot of each player's stats after every single game. That way, the model would hopefully be able to find complex relationships between features and time of season, and could adaptively train and predict according to the trends of a progressing season, having been trained on how all past seasons progressed.

And really, that is the concept that intrigues me. You want to turn to these models because they pick up complex relationships much more easily than we ever could, and iterate through them with impressive speed. This model would be able to pick up the trends of a progressing season, though of course it all depends on what features we could provide for traction, e.g., interactions of injuries ~ recuperation time ~ player production behaviors (fast vs. slow starters) ~ a team's intra-season production, etc.

I see. So what you need is how each player's stats progressed historically? Who else do you think would have access to the data, teams?

If you change your mind/get free time to do it, I would be interested to see how you do it. I'm going to be taking some ML courses this coming semester and plan on learning more afterwards by doing projects in areas I'm interested in (hockey is of course one).
 

PepsiCenterMagic

Food is Great
Jul 17, 2013
651
44
I see. So what you need is how each player's stats progressed historically? Who else do you think would have access to the data, teams?

If you change your mind/get free time to do it, I would be interested to see how you do it. I'm going to be taking some ML courses this coming semester and plan on learning more afterwards by doing projects in areas I'm interested in (hockey is of course one).
I'm sure the data exists somewhere, whether accessible or not. It may not be too accessible, at least in the form I described.

However, I'm sure there are other ways of accomplishing the same task than the way I outlined above.

About you taking courses, that sounds great! I'm curious to see what they teach and how they teach it.
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
617
296
Columbus
morehockeystats.com
I see. So what you need is how each player's stats progressed historically? Who else do you think would have access to the data, teams?

If you change your mind/get free time to do it, I would be interested to see how you do it. I'm going to be taking some ML courses this coming semester and plan on learning more afterwards by doing projects in areas I'm interested in (hockey is of course one).

The historical data for simple stats, and for stats that can be derived from them (e.g. Corsi, Fenwick), is right there at NHL.com.
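As a rough illustration of "derived from simple stats", here is a minimal sketch using the commonly used public definitions: Corsi counts all shot attempts (shots on goal, goals included, plus missed and blocked attempts) and Fenwick counts unblocked attempts. The sample counts are made up, and choices such as counting only 5v5 play are left to the user.

```python
def corsi(shots_on_goal, missed, blocked):
    """All shot attempts: on goal (goals included) + missed + blocked by the opponent."""
    return shots_on_goal + missed + blocked

def fenwick(shots_on_goal, missed):
    """Unblocked shot attempts: on goal + missed."""
    return shots_on_goal + missed

def share(for_, against):
    """For-percentage, e.g. CF% = CF / (CF + CA); NaN if there were no attempts."""
    total = for_ + against
    return for_ / total if total else float("nan")

# Made-up team totals for one game.
cf = corsi(shots_on_goal=30, missed=12, blocked=10)
ca = corsi(shots_on_goal=25, missed=10, blocked=15)
ff = fenwick(shots_on_goal=30, missed=12)
print(cf, ca, ff, round(share(cf, ca), 3))
```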
 

abo9

Registered User
Jun 25, 2017
9,091
7,184
I should have asked way earlier but I'm still going to try my luck:

Any ideas, using existing data, for a machine learning classification* project?

It's for school and I already have a topic but I would be more invested if it was something interesting and hockey-related!

*Classification as in predicting different categories of outcomes. Win/loss prediction is one example, but it does not have to be binary.
 

atrud66

Tank Tabarnack
Aug 5, 2014
1,370
1,995
Montreal
I should have asked way earlier but I'm still going to try my luck:

Any ideas, using existing data, for a machine learning classification* project?

It's for school and I already have a topic but I would be more invested if it was something interesting and hockey-related!

*Classification as in predicting different categories of outcomes. Win/loss prediction is one example, but it does not have to be binary.

I'm a master's student studying AI. For an ML course, I took FIFA player and team attributes and used them to create a classifier that predicted wins, losses, and draws. I just quickly browsed Kaggle and I think this is an equivalent dataset for the NHL. I'm sure you could create a fun project with a dataset like this: NHL Game Data
 

abo9

Registered User
Jun 25, 2017
9,091
7,184
I'm a master's student studying AI. For an ML course, I took FIFA player and team attributes and used them to create a classifier that predicted wins, losses, and draws. I just quickly browsed Kaggle and I think this is an equivalent dataset for the NHL. I'm sure you could create a fun project with a dataset like this: NHL Game Data

Thank you! In the end we tried to predict NBA missed/made shots using classification and now I am free so I will play around with this dataset! Maybe try to predict goals or wins.

Using FIFA in-game attributes sounds really interesting! Did you find anything surprising? Or were you able to attain a greater accuracy with this model than others using existing data? Video games simulating precision/attributes always fascinated me!

Similar to your FIFA idea, I tried to use Eastside Hockey Manager to predict Stanley Cup winners by simulating multiple seasons and extracting the players' and teams' stats. I ran a bunch of sims but then realized that I could only do so much, since the rosters acted as the "model" for the NHL...
 

Chriscftb97

Registered User
Feb 6, 2016
21
8
Toronto
Have there been any studies on whether blocking shots is a good idea or not? I.e., tracking where the puck goes after a block and what the aftermath is?

Also, is there any publicly available info on breaking up zone entries?
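One way to start on the shot-block question is to look at the next recorded event after each block. This is a minimal sketch assuming a chronologically sorted play-by-play list of `(time_s, event_type)` tuples; the event names (`BLOCK`, `SHOT_AGAINST`, `TAKEAWAY`, ...) and the 10-second window are hypothetical choices, and because public play-by-play does not track possession continuously, "next recorded event" is only a proxy for where the puck went.

```python
from collections import Counter

def block_aftermath(events, window_s=10):
    """For each blocked shot, record the first event that follows within window_s seconds.

    events: chronologically sorted (time_s, event_type) tuples (hypothetical format).
    Returns a Counter of follow-up event types ('none' if nothing happens in the window).
    """
    outcomes = Counter()
    for i, (t, kind) in enumerate(events):
        if kind != "BLOCK":
            continue
        follow = "none"
        if i + 1 < len(events) and events[i + 1][0] - t <= window_s:
            follow = events[i + 1][1]
        outcomes[follow] += 1
    return outcomes

# Made-up sequence: one block leads to a shot against, one to nothing nearby, one to a takeaway.
sample = [(10, "BLOCK"), (13, "SHOT_AGAINST"), (50, "BLOCK"), (80, "ZONE_EXIT"),
          (100, "BLOCK"), (106, "TAKEAWAY")]
print(block_aftermath(sample))
```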

Merrrlin

Grab the 9 iron, Barry!
Jul 2, 2019
6,768
6,925
Is there a model out there that predicts how many points a player may be able to generate on a different team?

I would be really curious to try to evaluate what kind of numbers a player like Barzal could put up on another team.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,155
14,477
I was very quickly able to come up with a simple formula that predicted 13 of the past 30 Lady Byng winners. Not great, but not bad for a simple two-step calculation. (It was what I was suggesting before - the voters appear to just look at points over a certain threshold, and PIM under a certain threshold).

For 7 of the misses, the actual winner was predicted to finish 2nd, and in four more cases, 3rd or 4th. So a simple formula has the actual winner finishing in the top four 24 times over the past 30 seasons.

The biggest misses were Wayne Gretzky in 1999 (undeserved, probably a retirement trophy), Brian Campbell in 2012 (no way of predicting they'd pick a defenseman for the only time in the past 60 years), and Ryan O'Reilly in 2014 (for once it looks like they took into account a player's defensive responsibilities, not just his scoring totals).

I'll see if I can refine this a bit further before posting full details.
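The exact formula isn't given above, so the following is only one hypothetical reading of "points over a certain threshold, PIM under a certain threshold": filter, then rank the survivors by points. The thresholds `min_points` and `max_pim` and the sample players are made-up placeholders, not the values used in the post. The tally at the end just reproduces the post's arithmetic (13 + 7 + 4 = 24 top-four finishes out of 30).

```python
def byng_ranking(players, min_points=60, max_pim=20):
    """Hypothetical two-step rule.

    Step 1: keep players at or above a points threshold and at or below a PIM threshold.
    Step 2: rank the survivors by points (ties broken by fewer PIM).
    """
    eligible = [p for p in players if p["points"] >= min_points and p["pim"] <= max_pim]
    ranked = sorted(eligible, key=lambda p: (-p["points"], p["pim"]))
    return [p["name"] for p in ranked]

# Made-up season: Y scores more but fails the PIM cut; W fails the points cut.
sample = [
    {"name": "Player X", "points": 90, "pim": 10},
    {"name": "Player Y", "points": 95, "pim": 40},
    {"name": "Player Z", "points": 70, "pim": 4},
    {"name": "Player W", "points": 50, "pim": 2},
]
print(byng_ranking(sample))

# Predicted finish of the actual winner, grouped as in the post (30 seasons).
finishes = {"1st": 13, "2nd": 7, "3rd or 4th": 4}
top_four = sum(finishes.values())
print(top_four, "of 30")  # prints: 24 of 30
```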

Hockey Outsider

Registered User
Jan 16, 2005
9,155
14,477
One project that I've been working on for a while is looking at what type of offensive player (i.e., primarily a goal-scorer, primarily a playmaker, or someone who's balanced) performs better in the playoffs.

Let's say you have three players who score 80 points (in 80 games). One of them has 45 goals and 35 assists. Another has 30 goals and 50 assists (very close to average split between goals and assists). And another has 15 goals and 65 assists. Can we tell, looking at the data, whether one type of player is better able to maintain his production in the playoffs?

The results that I have so far suggest that playmakers have the smallest drop in production (but everyone's production drops in the postseason). This appears to hold true going back to 1980, no matter how I slice and dice the data. That surprised me - I would have thought that the balanced scorer, on average, would be more likely to maintain his production, since it's tougher to cover a balanced offensive threat. (As opposed to knowing that it's highly likely that the player will take the shot, or make the pass).

Has anyone else looked into this? Any preliminary theories as to why playmakers hold up better in the postseason?

(EDIT - there's one other theory. I suspect, but don't have any hard data to back this up, that teams rely more on PP scoring in the playoffs. There are more assists per goal on the PP than at ES. So this might skew the results - the data might appear to say that playmakers do better in the playoffs, but it might be due to the fact that powerplay scoring, where there are more assists on average, becomes more important).
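The comparison described above can be sketched in two pieces: label each player by the goals share of his points, then measure playoff points per game as a fraction of regular-season points per game. The cutoffs `scorer_cut` and `playmaker_cut` are hypothetical (the post only says 30 G / 50 A, i.e. 37.5% goals, is close to an average split), and the retention example uses made-up numbers.

```python
def player_type(goals, assists, scorer_cut=0.45, playmaker_cut=0.30):
    """Classify by goals' share of points; the cutoffs are assumed, not from the post."""
    share = goals / (goals + assists)
    if share >= scorer_cut:
        return "goal-scorer"
    if share <= playmaker_cut:
        return "playmaker"
    return "balanced"

def retention(reg_points, reg_gp, po_points, po_gp):
    """Playoff points per game as a fraction of regular-season points per game."""
    return (po_points / po_gp) / (reg_points / reg_gp)

# The three 80-point players from the example above.
for g, a in [(45, 35), (30, 50), (15, 65)]:
    print(g, a, player_type(g, a))

# Made-up playoff line: 14 points in 20 games after 80 in 80.
print(retention(80, 80, 14, 20))
```

To probe the PP theory in the EDIT, one could run the same retention calculation separately on even-strength and power-play points, so that a shift toward PP scoring in the playoffs doesn't masquerade as a playmaker effect.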
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
One project that I've been working on for a while is looking at what type of offensive player (i.e., primarily a goal-scorer, primarily a playmaker, or someone who's balanced) performs better in the playoffs.

Let's say you have three players who score 80 points (in 80 games). One of them has 45 goals and 35 assists. Another has 30 goals and 50 assists (very close to average split between goals and assists). And another has 15 goals and 65 assists. Can we tell, looking at the data, whether one type of player is better able to maintain his production in the playoffs?

The results that I have so far suggest that playmakers have the smallest drop in production (but everyone's production drops in the postseason). This appears to hold true going back to 1980, no matter how I slice and dice the data. That surprised me - I would have thought that the balanced scorer, on average, would be more likely to maintain his production, since it's tougher to cover a balanced offensive threat. (As opposed to knowing that it's highly likely that the player will take the shot, or make the pass).

Has anyone else looked into this? Any preliminary theories as to why playmakers hold up better in the postseason?

(EDIT - there's one other theory. I suspect, but don't have any hard data to back this up, that teams rely more on PP scoring in the playoffs. There are more assists per goal on the PP than at ES. So this might skew the results - the data might appear to say that playmakers do better in the playoffs, but it might be due to the fact that powerplay scoring, where there are more assists on average, becomes more important).

If it's true that playmakers do better than balanced scorers (which would surprise me as well), then it could be that because space is tougher to come by and possession more contested in the playoffs, players that aren't good at making space, keeping possession, and creating plays have a tougher go of it compared to the regular season. Didn't help Thornton much though.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,155
14,477
Question for statisticians.

Until 92-93 the head of the NHL was called the President. In the spring of 93 the Montreal Canadiens won the Stanley Cup. Gil Stein retired on July 1, 1993

Gary Bettman became the first Commissioner in 93-94. As of this spring, American teams have won 26 consecutive Stanley Cups since then. Given the varying number of Canadian teams in the league over the period, what is the probability of this happening?

I realize I'm responding to a post from two years ago, but here's an interesting article from FiveThirtyEight.com - Why Can’t Canada Win The Stanley Cup?

My main takeaways are:
  1. Yes, it's highly improbable that no Canadian team has won the Stanley Cup since 1993
  2. It's somewhat less improbable than it first appears because winning the Stanley Cup isn't a random draw - better teams are more likely to win, and (for various reasons - the article goes into a few of them) there have been few really strong Canadian teams since 1993.
  3. Don't underestimate randomness in a seven-game series. A Canadian team has lost in game 7 of the Stanley Cup finals four times since then (1994, 2004, 2006, 2011). If the outcome of just four games had changed (out of tens of thousands of games played since 1993), we'd be pretty much in line with expectations. (Plus there's the Nordiques/Avalanche franchise, which won a Stanley Cup in its first year after moving to the US.)
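For the original question, the simplest baseline treats each season as a random draw in which every team is equally likely to win, so the chance that no Canadian team wins is the product over seasons of (1 - Canadian teams / total teams). This is a minimal sketch of that naive model; the per-season team counts are inputs you'd have to fill in, and the 7-of-30 figure below is an illustrative placeholder, not the real league history.

```python
import math

def prob_no_canadian_cup(seasons):
    """Naive 'random draw' model: every team is equally likely to win each season.

    seasons: list of (canadian_teams, total_teams) pairs, one per season.
    Returns the probability that no Canadian team wins in any of those seasons.
    """
    return math.prod(1 - c / n for c, n in seasons)

# Illustrative only: pretend 7 of 30 teams were Canadian in each of 26 seasons.
p = prob_no_canadian_cup([(7, 30)] * 26)
print(f"{p:.4%}")
```

With those made-up inputs the result is about 0.1%. As point 2 says, the equal-odds assumption makes the drought look more improbable than it is; a fairer model would weight each team by its strength.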

overpass

Registered User
Jun 7, 2007
5,271
2,808
I’d like to look at shorthanded scoring in general and some of the leading shorthanded scorers in history to see:

1. How frequently SHGs are scored over the course of a two-minute penalty. I suspect they may be more common later in the penalty, as the opponents’ best players tend to start the PP, but I’d like to see the data to be sure.

2. For the top SH scorers in history, when did they score their SH goals and points over the course of the penalty? Did they tend to be out at the start of the penalty or out near the end?

I could do this work manually with box scores, but if anyone else has already run these numbers, feel free to post them.
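If the box-score work gets done, the tabulation for question 1 is straightforward. This is a minimal sketch assuming each shorthanded goal has been recorded as a `(penalty_start_s, goal_s)` pair in game seconds (a hypothetical format: box scores give period and time, so those need converting first). A shorthanded goal does not end a minor penalty, so each goal is simply binned by elapsed time; overlapping penalties (5-on-3) would need extra handling not shown here.

```python
from collections import Counter

def shg_timing(events, bucket_s=20, penalty_s=120):
    """Bucket shorthanded goals by seconds elapsed since the penalty started.

    events: iterable of (penalty_start_s, goal_s) pairs in game seconds (hypothetical format).
    Returns a Counter keyed by bucket start (0, 20, ..., 100 for a two-minute minor).
    """
    counts = Counter()
    for start, goal in events:
        elapsed = goal - start
        if 0 <= elapsed < penalty_s:  # ignore anything outside the 2:00 window
            counts[(elapsed // bucket_s) * bucket_s] += 1
    return counts

# Made-up goals at 0:15, 1:50 and 1:45 into penalties; the 2:10 one is outside the window.
sample = [(600, 615), (1200, 1310), (2000, 2105), (3000, 3130)]
print(shg_timing(sample))
```

For question 2, the same function can be fed only one player's shorthanded goals (or points) to see when in the penalty he tended to be on the ice for them.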

billingtons ghost

Registered User
Nov 29, 2010
10,576
6,835
So... I'm trying to get an idea of how we could eventually come up with a statistic for defensive defensemen. Plus/minus is so maligned these days for its flaws, but when you look at the list of players with the top plus/minus historically, it's a pretty good representation of top defensive defensemen, and it got me thinking:
What could we actually measure that makes sense in defending?

With Corsi/Fenwick and others we get some clue about zone exits, but that can be pretty far removed from actually defending.

What would be interesting to measure is:
1. How well defensemen partition the ice, i.e., force players wide, keep players pinned on the boards, and keep the puck from moving side to side, either behind the net or across the crease/royal road.
2. How much symbiosis there is between defensive pairings.

#2 I think I can get from line statistics elsewhere, but my main question is: what about #1? How can we effectively turn Corsi sideways and use lines (circle to circle and net to net) to see how defensemen prevent players and the puck from moving across the ice.
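"Turning Corsi sideways" could start with counting how often the puck crosses the net-to-net axis while a given pair is on the ice. This is a minimal sketch assuming puck tracking positions in feet with center ice at (0, 0), x along the length of the rink and the defending goal line at x = 89 (the usual NHL layout: 200 ft rink, goal line 11 ft from the end boards). The tracking input is hypothetical; public play-by-play does not provide continuous puck positions.

```python
def lateral_crossings(path, goal_line_x=89.0):
    """Count times the puck crosses the net-to-net axis (y = 0).

    path: ordered (x, y) puck positions in feet, center ice at (0, 0),
    x along the length of the rink (assumed NHL-style coordinates).
    Returns (all_crossings, crossings_below_goal_line).
    """
    total = below = 0
    prev_side = None
    for x, y in path:
        if y == 0:
            continue  # exactly on the axis: wait until the puck commits to a side
        side = y > 0
        if prev_side is not None and side != prev_side:
            total += 1
            if abs(x) > goal_line_x:  # between the goal line and the end boards
                below += 1
        prev_side = side
    return total, below

# Made-up possession: one swing across the slot, one around behind the net.
path = [(70, 20), (80, 5), (85, -10), (92, -15), (95, 8), (75, 25)]
print(lateral_crossings(path))
```

A crossing is attributed to the first sample on the new side, so coarse tracking data will blur exactly where it happened. Normalizing the counts per minute of defensive-zone time on ice would make pairs comparable.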

I have been suffering through watching the NJ Devils struggle in our own zone this year with largely the same team, except that we're missing Ryan Graves (now in Pittsburgh), who played top-pairing minutes with Marino, and Damon Severson from our third pair.

These two were replaced by rookies Luke Hughes and Simon Nemec who have both been very good - and there's often the statement that Hughes and Nemec are 'better' than the players we lost - a statement that is true of talent in an Erik Karlsson kinda way - but certainly not for making life easy for goaltenders and keeping the puck out of the net.

I started out thinking about what made last year's defense so much better than this year's. It's easy to chalk it up to rookie mistakes, and there have been those, but the reality is that there haven't been THAT many catastrophic mistakes and breakdowns (and both Severson and Graves were historically prone to brain farts that would bring the fanbase's ire upon them).

So what are the traits that separate the two?

Also - Graves has historically been a +/- wonder. Love or hate the stat - it clearly shows SOMETHING otherwise the guy wouldn't be quite so successful pretty much everywhere he's been. (+96 career, +40 with the Avs, +34 last year with NJD, tied for 3rd with +12 this year on the Pens)

So I started watching Graves's Penguins games. There are some obvious traits (size, good stick, good body position, blocks shots and passes, ties players up, stay-at-home type), but also some others that I've recognized in other players as well:
He seems to keep the puck and players from moving side-to-side - especially below the goal line - either with his body or by cutting passes or tying people up.

It seems like when Graves goes behind the net or into the corner, he ties players up completely: stick on stick, body against the boards. And that side of the ice becomes static. He keeps the puck from going around behind the net to the other side with his stick and his feet. It's almost like setting up a concrete road barrier: the puck can move up the boards, but not around behind the net unless he moves it to his partner himself (or his center picks it up and does).

What is distinctly lacking from this year's Devils team is John Marino getting the puck with time and looking up ice. I think this is directly attributable to his work with Graves. I'm not saying Graves is a terrific defenseman, but I am saying that his skillset is perfect for keeping a John Marino free from pressure, and for keeping goalies sane by limiting the time the puck spends going from side to side.

I don't know if any of this makes sense, but I'm wondering if anyone has seen anything like this or if it has any merit.

Michael Farkas

Celebrate 68
Jun 28, 2006
13,487
8,059
NYC
www.hockeyprospect.com
First of all, Ryan Graves is all yours...just say the word. I'll be travelling from Pittsburgh back to NYC in mid-February, would be happy to bring him back with me. He is repugnant defensively, always has been.

Second of all, you need to make sure that you're agnostic to team tactics enough so that you don't unfairly punish players that play in different systems. It's the old thing about defending lines vs defending areas.

These days, a lot of d-men pinch by design. So, is this going to be tracked "locally", per play...or is it going to be an on-ice situation?

Shot deflections/blocks into harmless areas vs danger areas. Essentially, a version of "rebound control" for defensemen.
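A "rebound control" number for defensemen could be sketched as the share of their blocks and deflections that end up outside a danger area. This is a minimal sketch assuming tracked puck end locations in feet, flipped so the defending net is at x = +89; the danger area here is a made-up rectangle running from the goal line out 20 ft and 22 ft either side of center (roughly the end-zone faceoff dots), not any established definition.

```python
def in_danger_area(x, y, goal_line_x=89.0, depth=20.0, half_width=22.0):
    """Hypothetical danger area: rectangle from the goal line out `depth` ft, within `half_width` ft of center."""
    return goal_line_x - depth <= x <= goal_line_x and abs(y) <= half_width

def rebound_control(end_points):
    """Share of blocks/deflections whose puck ends up outside the danger area.

    end_points: list of (x, y) puck locations after each block (hypothetical tracking data).
    """
    if not end_points:
        return float("nan")
    safe = sum(1 for x, y in end_points if not in_danger_area(x, y))
    return safe / len(end_points)

# Made-up blocks: two land in the slot, two are pushed to harmless areas.
blocks = [(80, 5), (60, 30), (88, -10), (40, 0)]
print(rebound_control(blocks))
```

Because some systems deliberately steer pucks to certain areas, comparing each defenseman against teammates in the same system would help keep this tactic-agnostic.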

Attack time against, zone entry denials (this could be team tactic dependent in some cases), high quality danger shots/chances against, rebound recovery...then depending on how far you go into "the best defense is offense", you get into puck retrievals, partner support, zone exits, touches/zone exit, etc.

There are a lot of ways this can go. But this is a worthwhile endeavor. In part, this is more in line with legitimate/proprietary analytics work...Corsi and all that noise only became public because clubs were done with it.