What are the best advanced stats projects?

silverfish

got perma'd
Jun 24, 2008
34,644
4,353
under the bridge
There are a couple of points to make here:

1. I don't think anyone will say zone starts don't help. What people do tend to suggest is the effect is exaggerated by many.

2. Roughly half of the NHL falls into what you're calling noise and only 20% into what you call the true signal. While there is a correlation, that isn't causation; other factors could be the cause. Some players are deployed defensively because they add nothing offensively, and vice versa. A guy like Gaustad is there purely for defense and doesn't even try to score. The task his coach sets for him is to shut the other team down until our scorers get on the ice.

3. Further to point two, guys who get OZ starts tend to be the offensive guys, so it stands to reason they would outshoot the opposition. Guys deployed defensively tend to be there to shut down scoring chances; they tend to be the Grybas, Gorges and Smids of the world. Sure, they might get better Corsi by being deployed in the OZ, but if you swapped their zone starts with the Karlssons of the world, it would just weaken the correlation you're seeing between rel OZ starts and Corsi.

I guess what I'm saying is that zone starts seem to me to be a pretty minor factor. They certainly help, but all the other uncontrolled variables are likely a lot more important.

1. Yes and no. I'd argue that it is necessary to adjust relCF% for outlying players.

2. Understood, and agreed, but even so, wouldn't you say their relCF% should be adjusted accordingly due to the abhorrent zone starts they get?

3. Also understood and agreed.

A lot of the work I do is on the Rangers, naturally. Dan Boyle gets a **** load of O-zone starts, not because he's sheltered, but because starting Dan Boyle in the D-zone would be a waste of Dan Boyle. What that does, though, is exacerbate Dan Girardi's usage, which I believe, among other things, is part of why Girardi's metrics are as bad as they are.

Even so, I think there should be proper adjustments for guys like Gaustad and Malhotra.
 

Micklebot

Moderator
Apr 27, 2010
53,777
30,976
This is getting a touch off topic, but I agree that zone starts can have a significant effect for the outliers. However, when you look at the research, the advantage (or disadvantage) of an OZ or DZ start is mostly gained only if you win the draw (or lose it, for the disadvantage). So right there, you cut any difference roughly in half as far as its effect goes. Add to that, the advantage from an OZ faceoff win lasts all of about 10 seconds before any gained increase in shot attempts disappears.
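A back-of-envelope sketch of the "cut it in half" argument, assuming an OZ start only helps after a won draw and a DZ start only hurts after a lost one. The function name, the faceoff win rate and the per-win boost are hypothetical parameters for illustration, not measured values.

```python
def net_zone_start_effect(oz_starts: int, dz_starts: int,
                          win_rate: float, boost_per_win: float) -> float:
    """Net shot-attempt differential attributable to zone starts.

    Hypothetical model: an OZ start pays off only when the draw is won,
    and a DZ start costs only when the draw is lost. boost_per_win is the
    (assumed) extra attempts gained in the short window after a won draw.
    """
    oz_gain = oz_starts * win_rate * boost_per_win
    dz_loss = dz_starts * (1 - win_rate) * boost_per_win
    return oz_gain - dz_loss


# Hypothetical: 200 zone starts at 54% OZ vs. 50% OZ, 50% faceoff wins.
print(net_zone_start_effect(108, 92, 0.5, 1.0))   # 8.0
print(net_zone_start_effect(100, 100, 0.5, 1.0))  # 0.0
```

With a 50% win rate, each surplus OZ start is worth only half the per-win boost, which is the halving described above.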

I guess what I'm getting at is that for the vast majority of players, there is no value in trying to adjust for zone starts. Guys like Gaustad, sure. But looking at a guy like Karlsson getting around 53-55% OZ starts and saying he has some crazy advantage over Subban, who gets 50% OZ starts, is grasping at straws.

I don't really see much point in adjusting for a guy like Gaustad or Malhotra, though. They play a very specific role and probably shouldn't be compared to players in more normal roles.
 

silverfish

Of course not. Karlsson and Subban are guys who I believe fall in what I've designated as the noise range, -5 to +5, where no adjustment is necessary.

But on your last paragraph: if we're not adjusting for guys like Gaustad or Malhotra, then where do we draw the line?

I bring this up because Dan Girardi is incessantly torched on the Rangers board for being a bad d-man because his Corsi is bad, and all the adjustment factors, like dCorsi, say his Corsi is bad. But he was the 7th most 'abused' d-man in terms of rel zone starts among d-men with 350 or more minutes. Yet, because of 'teammate strength', his dCorsi CF% is something like 2% lower than that of Dan Boyle, who had the second-easiest zone starts in the league this year.

It just doesn't sit right with me, especially when relZS% does explain some of the variance in relCF% once you get to those outliers.

If you're saying we shouldn't use things like CF% on Gaustad and Malhotra, again, where do you draw the line? Maybe Dan Girardi shouldn't be judged on his corsi either?
 

Micklebot

I guess my point is we shouldn't be comparing a guy used purely in defensive roles against offensive guys, or vice versa. My point is more that when an adjustment is needed, we probably shouldn't be making the comparison in the first place. I get why someone might want to compare Girardi to, say, Stralman, but imo there are too many variables (deployment being just one of them) to gain value in doing so.
 

silverfish

So then deployment and usage do matter? And significantly enough that, when comparing two players with different deployment and usage, it needs to be specified?
 

Micklebot

As I said, in extreme cases, yes, it matters, but at least partially (and imo a large part) because guys on the extreme ends of deployment are also asked to play differently to achieve very specific goals.

More importantly, I've seen no proof that moving one guy from DZ deployment to OZ deployment would have the same effect on performance as doing so for another player. Just because you move Girardi to more offensive deployment doesn't mean he'll get the same boost in offensive numbers as, say, Phaneuf would if you changed his role. So if you can't show that deployment has the same effect on everyone, how can you adjust for it?
 

silverfish

I would suggest Burtch hit the reset button on dCorsi: make the zone-start adjustment conditional and run dCorsi three times, depending on each player's zone starts:

-5 to +5
-10 through -5 and +5 through +10
Everyone else

Because what you can show is that as you move further from the mean in zone starts, in either direction, zone starts explain more of the variance in individual players' relCF%.
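A minimal sketch of the three-bin idea, assuming each player is reduced to a (relZS%, relCF%) pair. It groups players by the magnitude of their relZS% and reports, per bin, the R² of a straight-line fit of relCF% on relZS%. The bin labels and helper names are hypothetical, and this is not how dCorsi itself is computed.

```python
from collections import defaultdict
from statistics import fmean


def zs_bin(rel_zs: float) -> str:
    """Assign a player to one of the three relZS% bins proposed above."""
    magnitude = abs(rel_zs)
    if magnitude <= 5:
        return "noise (-5 to +5)"
    if magnitude <= 10:
        return "moderate (5 to 10)"
    return "extreme (beyond 10)"


def r_squared(xs, ys) -> float:
    """Share of the variance in ys explained by a linear fit on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def r_squared_by_bin(players, min_players: int = 3):
    """players: iterable of (rel_zs, rel_cf) pairs. Returns {bin label: R^2}."""
    groups = defaultdict(lambda: ([], []))
    for rel_zs, rel_cf in players:
        xs, ys = groups[zs_bin(rel_zs)]
        xs.append(rel_zs)
        ys.append(rel_cf)
    # Skip bins too small to fit a line meaningfully.
    return {label: r_squared(xs, ys)
            for label, (xs, ys) in groups.items() if len(xs) >= min_players}
```

One caveat on the design: a narrow bin restricts the range of relZS%, which by itself tends to lower R², so differences between bins should be read with that in mind.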

I'm not trying to say that if Dan Girardi had Dan Boyle's zone starts this season, he'd have Dan Boyle's possession numbers. What I'm saying is that their usage had an effect on their metrics, and when comparing these two players, for example, we need to weigh that context heavily, or adjust for it statistically and compare the adjusted or expected values rather than the raw metrics.

I don't know if you can adjust for it, but you can certainly run regressions, get expected outputs, and compare player performance to their expected rates.
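One way to read "run regressions, get expected outputs, and compare": fit relCF% on relZS% across a group of players with ordinary least squares, then look at each player's actual minus expected value. This is a single-variable sketch with hypothetical function names and placeholder players; a real model would need more inputs (teammates, competition, role).

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct relZS% values")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def over_expected(players):
    """players: {name: (rel_zs, rel_cf)}.

    Returns {name: actual relCF% minus the relCF% expected from relZS%}.
    """
    xs = [zs for zs, _ in players.values()]
    ys = [cf for _, cf in players.values()]
    a, b = fit_line(xs, ys)
    return {name: cf - (a + b * zs) for name, (zs, cf) in players.items()}


# Placeholder data, not real players.
sample = {"Player A": (-10.0, -4.0), "Player B": (0.0, 1.0),
          "Player C": (10.0, 4.0), "Player D": (0.0, -1.0)}
print(over_expected(sample))
```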
 

Micklebot

Where are the SuperWOWY charts? Dave Johnson's site doesn't seem to have them?

http://www.puckalytics.com/#/

And http://www.progressivehockey.com/ seems out of date.

Nice, I forgot about the superwowy charts on Puckalytics even though I use them fairly frequently.

They can get you data on some really specific situations, but the sample gets small pretty quickly. You should probably have at least 1,000 minutes for goal-based metrics and 250 at minimum for Corsi.


Silverfish: wrt zone starts, I don't want to hijack the thread, but from what I've seen only the extreme deployments warrant an adjustment, and once you get there, other variables beyond zone starts likely factor in more, imo (i.e., role: the coach wants you to shut down the opposition, so you make the easy plays to clear rather than take chances to create offense). So if you are looking to compare guys in ultra-defensive roles to guys in offensive ones, you don't really need to control for these unknown variables, as you only care about the end result; but if you're looking for the effect of zone starts themselves, it becomes important. IMO, it's these uncontrolled variables that account for the difference you see between the noise range (within +/- 5% OZ/DZ starts) and the extremes (>+10% or <-10% OZ/DZ starts).

So I'm not disagreeing that there is an effect at the extremes of deployment; I'm just not sold that zone starts are the sole or even primary culprit, and because of my suspicion of the actual cause (role, not deployment), I don't think it's fair to compare players at the extremes. If you asked Girardi to change his role, I imagine his CF% would improve, though he might allow more scoring chances against in the process. When you've got 10 minutes left in a game and are up by one goal, you're better off slowing the pace down and limiting chances than upping it and outchancing the opposition, because while in the long run you'll likely do better outchancing the opposition, in a small sample it only takes one shot to lose the game.
 

silverfish

I think we're closer together than our arguments make us believe. I'm not saying Girardi's terrible metrics are due only to ZS, just that zone starts play a part, since he is an extreme case (I'd say the 7th-toughest zone starts in the league warrants 'extreme').

To get the thread back on topic, I launched a tool on my blog today that calculates a goalie's 5v5 GAA on any team in the league. It's a pretty fun little thing, not to be taken too seriously.

Even though we know that the players on the ice don't really affect a goalie's save percentage that much, there's obviously a lot more to it than the numbers on the table. Feel free to click around and mess with it. Here's a screenshot of Talbot on the Oilers:

[Screenshot: goalie performance tool showing Talbot on the Oilers]


http://nerdhockey.com/player-tools/2015/7/18/goalie-performance-tool
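A guess at the kind of arithmetic such a tool could do, not necessarily how the linked tool works: hold the goalie's 5v5 save percentage fixed (on the premise above that skaters don't move it much) and apply it to the new team's 5v5 shots against per 60. The function name and inputs are assumptions.

```python
def projected_gaa(team_shots_against_per60: float, goalie_save_pct: float) -> float:
    """Projected 5v5 goals against per 60 minutes on a new team.

    Assumes the goalie's save percentage carries over unchanged, so only
    the volume of shots he faces changes.
    """
    if not 0.0 <= goalie_save_pct <= 1.0:
        raise ValueError("save percentage must be a fraction between 0 and 1")
    return team_shots_against_per60 * (1.0 - goalie_save_pct)


# Hypothetical inputs: 30 shots against per 60 and a .925 save percentage.
print(round(projected_gaa(30.0, 0.925), 2))  # 2.25
```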
 

Bill_Crosby*

Guest
To the people with their own projects going on are you scraping the information from somewhere or have you found a free, or open-source, API? Send me a message if you don't want to talk about it here.
 

silverfish

Ripped from War-On-Ice, always. Which makes me feel terrible, but I do not currently have the coding capabilities to build my own scraper. Hopefully soon.
 

Micklebot

The problem is I can't find the charts on that site.

They're really just tables that the tool spits out. You type the player's name into the "player search" field, click on the player name, then click "make player 1". Repeat for as many players as you want in the analysis, then select your date range and click the "load wowy" button. It will spit out a table that looks kinda like this:

Player_Name|TOI|GF|GA|GF/60....
Together|100:34|4|4|3.38|2.38...
Erik Karlsson*|1400|25|...|...|....
Marc Methot*|...|...|...|...|....
Kyle Turris*|...|...|...|...|....

On The Ice: ERIK KARLSSON, MARC METHOT, KYLE TURRIS
Not On The Ice: (none)
For Team: Any Team
Against: (none)
Not Against: (none)
Against Team: Any Team

*Stats in all situations for each player's on/off-ice combinations (the with/against team filters still apply, though)
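The /60 columns in a table like this are just counts scaled to 60 minutes of TOI. A small sketch, assuming TOI comes either as a "minutes:seconds" string or as plain minutes; the sample check uses the rough minimums suggested earlier in the thread (1,000 minutes for goal-based metrics, 250 for Corsi). Function names are hypothetical.

```python
def toi_minutes(toi: str) -> float:
    """Convert TOI such as '100:34' (min:sec) or '1400' (min) to minutes."""
    if ":" in toi:
        minutes, seconds = toi.split(":")
        return int(minutes) + int(seconds) / 60
    return float(toi)


def per_60(count: int, toi: str) -> float:
    """Rate per 60 minutes of ice time, e.g. GF/60 from GF and TOI."""
    return count / toi_minutes(toi) * 60


# Rough minimum samples from the thread, in minutes of TOI.
MIN_TOI = {"goals": 1000, "corsi": 250}


def enough_sample(toi: str, metric: str) -> bool:
    """True if the TOI meets the rough minimum for the given metric type."""
    return toi_minutes(toi) >= MIN_TOI[metric]


print(round(per_60(4, "100:34"), 2))   # 2.39 (4 goals in 100:34)
print(enough_sample("100:34", "corsi"))  # False
```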
 

tinyzombies

Registered User
Dec 24, 2002
16,849
2,350
Montreal, QC, Canada
FOUND IT.

It's at http://stats.hockeyanalysis.com/showplayer.php?pid=1743&withagainst=true&season=2014-15&sit=5v5

You go to the player, choose a year, then click the "Visualize this table" link and it gives you WOWY charts.

Can't find it at the new site. I don't think he's added it there yet.
 

Micklebot

Ok, so the WOWY charts I see on the internet were self-generated then.

Are you looking for something like this:

http://stats.hockeyanalysis.com/showplayerwowycharts.php?pid=971&season=2012-13&sit=5v5

If so, go to Stats.Hockeyanalysis.com and open any player's page. Then click on the date range you want, then click the link that says "Visualize this table".

It's not SuperWOWY (which adds the ability to refine the date range and look at a player's performance with a group of players on the ice, not just a single player), but it's kinda pretty.

edit: seems like you found it while I typed this response. The Super Wowy tables give you a lot more control, even if they aren't as nice visually though.
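The chart links above differ only in their query parameters (pid, season, sit). A sketch that rebuilds one with the standard library, assuming the page still accepts those same parameters; pid is the site's own player id and has to be looked up on the site.

```python
from urllib.parse import urlencode

# Base path taken from the link above; the site may have changed since.
BASE = "http://stats.hockeyanalysis.com/showplayerwowycharts.php"


def wowy_chart_url(pid: int, season: str, sit: str = "5v5") -> str:
    """Build a WOWY chart URL with the same query parameters as the link above."""
    return f"{BASE}?{urlencode({'pid': pid, 'season': season, 'sit': sit})}"


print(wowy_chart_url(971, "2012-13"))
```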
 

tinyzombies

Yea, he's probably still working on it at the new page. Funny how he used WOWY at individual player pages on the new site and not SuperWOWY, though there is a separate SuperWOWY page. I guess he only offers the WOWY chart tool at the individual player's pages on the old site so far. Can't find charts at the new page anywhere.
 

charlie1

It's all McDonald's
Dec 7, 2013
3,132
0
David Johnson posted a reply to that string of tweets,

http://hockeyanalysis.com/2015/03/21/zone-starts-and-impact-on-players-statistics/

In short, he suggests that the effect being shown isn't zone starts driving Corsi, but rather that teams that are playing well get more OZ starts and better Corsi. If Ovechkin or Phaneuf get 60-70% OZ starts, there's a good chance it's because the team is playing well. If they get 40%, odds are their team is not doing well. It's playing well, not zone starts, that results in higher Corsi. Corsi and zone starts are both byproducts of good play, not drivers of one another.

It all boils down to correlation does not equal causality.
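A toy simulation of the confounding argument, not Johnson's analysis: a hidden "team form" variable drives both zone starts and Corsi, with no direct link between the two. The raw correlation still comes out clearly positive, and it vanishes once form is removed. All variables and coefficients are made up for illustration.

```python
import random
from statistics import fmean


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n: int = 5000, seed: int = 1):
    """Toy model: form drives zone starts and Corsi; neither drives the other."""
    rng = random.Random(seed)
    form = [rng.gauss(0, 1) for _ in range(n)]
    zone_starts = [f + rng.gauss(0, 1) for f in form]
    corsi = [f + rng.gauss(0, 1) for f in form]
    raw = pearson(zone_starts, corsi)
    # Form enters each variable with coefficient 1 here, so subtracting it
    # removes the shared driver exactly; what remains is independent noise.
    controlled = pearson([z - f for z, f in zip(zone_starts, form)],
                         [c - f for c, f in zip(corsi, form)])
    return raw, controlled


raw, controlled = simulate()
print(f"raw r = {raw:.2f}, controlling for form r = {controlled:.2f}")
```

In this setup the raw correlation should land near 0.5 and the controlled one near zero.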

That was a pretty weak rebuttal. He is agreeing that zone starts matter, but not THAT much. Even after applying the correction of converting to ZSrel the positive correlation is still there (his graphs). That positive correlation is the reason he removes the first 10 seconds of every zone-start to calculate CF%. His additional analysis of using only on-the-fly and neutral zone starts again suggests that there is a positive effect of offensive zone starts on CF%. So, yeah, his entire rebuttal supports the idea that zone starts matter.
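A sketch of the filtering step as described in the post: compute CF% after dropping shot attempts in the first 10 seconds following each offensive- or defensive-zone faceoff. The Attempt record and the list of zone-faceoff times are a hypothetical data format, and real play-by-play data would also need on-ice filtering for the player in question.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    time: float     # game time in seconds (hypothetical field)
    for_team: bool  # True if the player's team took the attempt


def cf_pct(attempts, zone_faceoff_times=(), skip_seconds: float = 10.0) -> float:
    """CF% with attempts in the first skip_seconds after each OZ/DZ draw dropped."""
    def after_zone_draw(t: float) -> bool:
        return any(0 <= t - f < skip_seconds for f in zone_faceoff_times)

    kept = [a for a in attempts if not after_zone_draw(a.time)]
    cf = sum(1 for a in kept if a.for_team)
    ca = len(kept) - cf
    if cf + ca == 0:
        return float("nan")
    return 100.0 * cf / (cf + ca)


events = [Attempt(5.0, True), Attempt(62.0, True),
          Attempt(75.0, False), Attempt(200.0, True)]
print(cf_pct(events))                                   # 75.0
print(round(cf_pct(events, zone_faceoff_times=[60.0]), 1))  # 66.7
```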
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,518
27,013
Guys, I'm happy to pull this tangent off into its own thread if you'd like.

It's valuable and interesting, but perhaps orthogonal to the purpose of this thread.
 
