Corsi, shot quality, and the Toronto Maple Leafs

Delicious Dangles*

Guest
I never made that assertion, so...
On the contrary, I remember you making that assertion near the beginning of the thread, indicating that anything other than 5-on-5 is so randomized and not team-specific that it cannot be utilized in these models.

I do not care to look through 24 pages, but the paper you referred me to says this:

We focus strictly on team offense, and we restrict our attention to 5-on-5 situations in which both goalies are on the ice.

I'm effectively utilizing the same method that Professor Brian Macdonald employed when he performed his own study regarding the predictive validity of various measures of team performance. The only difference is that I examined many different samples of randomly selected sets of games, rather than looking at the correlation between odd-numbered games and even-numbered games.

I'd suggest reading Professor Macdonald's study.

http://www.academia.edu/2483597/An_Expected_Goals_Model_for_Evaluating_NHL_Teams_and_Players

If you read his study and still don't understand the method, then I can't help you, unfortunately.
First off, if you can't explain something yourself, there is a problem. Second, this does not answer most of the questions I asked.

However, from reading through that, I did notice that he used different methods to build his 40-game sets, that he removed outliers, that the cross-validated correlations for goals, shots, Fenwick, and Corsi are near identical, and that this is an offense-based player evaluation model, not an all-around team evaluation model.
 

Delicious Dangles*

Guest
1. It's a theoretical upper limit. It goes without saying that the theoretical upper limit will somewhat exceed the practical upper limit, which actually assists my argument.

In any case, the predictive validity is not terrible. Adjusted Fenwick predicts 75% of the non-luck variance in future results. The underlying numbers model predicts 90% of the non-luck variance. Far from terrible. There's a tonne of utility there.

Slightly better than correlating two random half seasons? Hardly. If you think otherwise, you simply don't understand.

2. I've run the numbers in the past, and the predictive validity was virtually identical to points percentage. And no - I have no issue posting the results.

3. Frankly, I have no idea what you're getting at here. Because predictions were made about this season on the basis of last season, that somehow precludes utilizing a within-season analysis? The chain of reasoning is so bizarre that I'm forced to wonder whether you're simply being wilfully obtuse at this point.

A within-season analysis is obviously preferable as it mitigates the impact of roster turnover. Does that mean that a between-season analysis is useless? No. It's simply not as rigorous.

4. I meant to write 1000. I'll try one more time - for each season from 2007-08 to 2010-11, I randomly selected two independent sets of 40 games. I looked at the correlation between the two data sets for each of the three variables in question, in order to assess the predictive validity of each variable. I repeated the process 1000 times for each season. The figures I posted represent the average values for all four seasons.
1. But how did you create this theoretical upper limit based on a team's talent with exact certainty, when you do not have those values or even know how to properly create them?

Luck is just a word for variables that you do not understand and haven't incorporated into your model. Fact is, regardless of the reasons, that type of correlation has no predictive value.

2. Then let us see it.

3. You think in-season analysis doesn't have roster changes? How is season-to-season amateurish and your method perfect when they deal with the same problem in similar amounts?

4. How did you get 1000 samples of two 40-game sets in one season without overlap or stretching into different seasons?
 

Delicious Dangles*

Guest
If you do that, the same pattern holds as far as predictive validity goes: shot-based metrics outperform goal differential as a predictor of future goal differential.
The point was that you were pointing to regression in non-5-on-5 goal differential as proof of regression, when your initial shot metrics were based on 5-on-5.

5-on-5 goal differential did not regress by any meaningful amount.
Point percentage did not regress by any meaningful amount.

Where is the regression?
 

TKB

Registered User
Jun 12, 2010
1,114
403
Chicago
Heh.

The analytics guys seem to be tying themselves in knots trying to spin their way out of this one. Seem to have a serious problem admitting they were wrong about the Leafs.

I don't follow the Leafs closely, but didn't the "regression" also just happen to occur largely while Bozak and Bolland were both out of the line-up?

I could be wrong - I don't recall the exact time frames - but I'm surprised I have not seen any related discussion.
 

Beef Invictus

Revolutionary Positivity
Dec 21, 2009
128,144
166,159
Armored Train
There are a LOT of Leafs fans who are taking this a lot more personally than they should. Based on everything stats had shown before, there was every reason to believe they would regress. That's not anything to get offended over. The Leafs continue to buck the trend, and it calls the stats into question. So, now it's time to go back to the drawing board and figure out a more accurate approach.

This has nothing to do with bashing the Leafs, but reading half the posts in here one would think there was a crusade to disparage them, instead of a discussion about how what they're doing runs counter to the statistics...the fact that so many fans are so offended by that is a bit mind-boggling. People are doing a lot of heavy-handed and misplaced bashing of the stats being incorrect, instead of wondering how they can be revised to make more accurate predictions. It drags the thread down.
 

Delicious Dangles*

Guest
I don't follow the Leafs closely, but didn't the "regression" also just happen to occur largely while Bozak and Bolland where both out of the line-up?

I could be wrong, I don't recall the exact time frames, but am surprised I have not seen any related discussion.
Yes, it did. Both Bozak and Bolland were out, and Kadri and Phaneuf both missed games. This was also when the dreaded JVR at center experiment happened, which also killed his and his line's production for 5ish games. McClement was also tasked with a lot of the center duties and a lot more minutes, which dragged down our already depleted PK.

It was also one of our toughest stretches in terms of quality of opposing teams. And we had 24/7 cameras invading our players' lives and work environment.

There are a LOT of Leafs fans who are taking this a lot more personally than they should. Based on everything stats had shown before, there was every reason to believe they would regress. That's not anything to get offended over. The Leafs continue to buck the trend, and it calls the stats into question. So, now it's time to go back to the drawing board and figure out a more accurate approach.

This has nothing to do with bashing the Leafs, but reading half the posts in here one would think there was a crusade to disparage them, instead of a discussion about how what they're doing runs counter to the statistics...the fact that so many fans are so offended by that is a bit mind-boggling. People are doing a lot of heavy-handed and misplaced bashing of the stats being incorrect, instead of wondering how they can be revised to make more accurate predictions. It drags the thread down.
This is because advanced statistics only gained popularity, especially in the internet community, when the Leafs started doing well (the Leafs were defying them back when they sucked too) and people incorrectly interpreted these statistics to conclude that they shouldn't be. They became another tool to bring down the Leafs and devalue their accomplishments.

The majority of the people pushing them on here are not doing it for the advancement of statistical analysis in hockey. And half of the ones who are don't seem to really understand the fundamentals of what statistics are and how they should be used.

If your models aren't consistent with reality, then it's probably not reality that needs fixing. Many won't accept how flawed and incomplete these statistics are.
 

Beef Invictus

Revolutionary Positivity
Dec 21, 2009
128,144
166,159
Armored Train
Wait. You REALLY believe advanced stats only got popular as a means to try and bash the Leafs?

You're proving my point...and that sure does explain a lot.
 

Ninja Hertl

formerly sharkohol
Feb 25, 2006
6,398
0
The Yay
Yes, it did. Both Bozak and Bolland were out, and Kadri and Phaneuf both missed games. This was also when the dreaded JVR at center experiment happened, which also killed his and his line's production for 5ish games. McClement was also tasked with a lot of the center duties and a lot more minutes, which dragged down our already depleted PK.

It was also one of our toughest stretches in terms of quality of opposing teams. And we had 24/7 cameras invading our player's lives and work environment.


This is because the advanced statistics only got popularity, especially in the internet community, when the Leafs started doing well (Leafs were defying it back when they sucked too) and people incorrectly interpreted these statistics to conclude that they shouldn't. It became another tool to bring down the Leafs and devalue their accomplishments.

Majority of the people pushing it on here are not doing it for the advancement of statistical analysis in hockey. And half of the ones that are don't seem to really understand the fundamentals of what statistics are, and how they should be used.

If your models aren't consistent with reality, then it's probably not reality that needs fixing. Many won't accept how flawed and incomplete these statistics are.

Advanced analytics in hockey were around far before the Leafs crept out of the basement, my friend.
 

Master_Of_Districts

Registered User
Apr 9, 2007
1,744
4
Black Ruthenia
1. But how did you create this theoretical upper limit based on a team's talent with exact certainty, when you do not have those values or even know how to properly create them?

Luck is just a word for variables that you do not understand and haven't incorporated into your model. Fact is, regardless of the reasons, that type of correlation has no predictive value.

2. Then let us see it.

3. You think in-season analysis doesn't have roster changes? How is season-to-season amateurish and your method perfect when they deal with the same problem in similar amounts?

4. How did you get 1000 samples of two 40-game sets in one season without overlap or stretching into different seasons?

1. The theoretical upper limit can be calculated by taking the following steps:

A. Calculate the talent spread between teams in winning percentage over the sample in question (2007-08 to 2010-11). The spread in talent can be calculated by comparing the actual spread in results to the spread in results attributable to binomial variation, based on the formula: VARIANCE (actual) = VARIANCE (binomial variation) + VARIANCE (talent). I calculate a true talent standard deviation of 0.049 with respect to winning percentage.

B. Create an artificial league using Excel where the talent standard deviation between teams in terms of winning percentage is 0.049. Simulate 1000 seasons where each team plays 40 games per season (ideally 20 home games and 20 road games). At the end of each season, look at the correlation between results (i.e. points) and talent (i.e. true talent winning percentage). The correlation will, of course, be imperfect, given the impact of binomial variation upon the sample. I calculate a correlation of 0.57 (r^2 = 0.32), averaged over the 1000 seasons. The corollary of this, of course, is that the theoretical upper limit for the predictive validity of any given metric over a typical 40-game sample, in today's NHL, is 0.57.
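
For anyone who wants to replicate step B outside of Excel, here's a rough sketch in Python. It's a toy version: it takes the 0.049 talent standard deviation from step A as given, treats every game as a simple win/loss trial at the team's true talent level, and ignores ties, scheduling, and the home/road split, so it will land in the neighbourhood of 0.57 rather than on it exactly.

Code:
import numpy as np

rng = np.random.default_rng(0)

N_TEAMS = 30       # league size
N_GAMES = 40       # games per simulated season
TALENT_SD = 0.049  # from step A: var(talent) = var(actual) - var(binomial)
N_SEASONS = 1000   # number of simulated seasons

correlations = []
for _ in range(N_SEASONS):
    # Each team's true talent winning percentage, centred on .500
    talent = rng.normal(0.5, TALENT_SD, N_TEAMS).clip(0, 1)
    # Each game is a win/loss trial at the team's true talent level
    observed = rng.binomial(N_GAMES, talent) / N_GAMES
    # Correlation between observed results and true talent for this season
    correlations.append(np.corrcoef(observed, talent)[0, 1])

r = np.mean(correlations)
print(f"avg corr(results, talent) over {N_GAMES} games: r = {r:.2f}, r^2 = {r * r:.2f}")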

Luck simply refers to binomial variation, in this context. That's all. See: http://en.wikipedia.org/wiki/Binomial_distribution.

So no - luck is not just a term I'm using for variance that cannot be explained. Binomial variation is simply the random variation that inheres in the sample. It's perfectly susceptible to being explained. I'm explaining it to you, in this paragraph right here. There's really nothing complicated or magical about it, to be honest.

2. K.

1st half points percentage - 2nd half points percentage: r = 0.38; r^2 = 0.15
1st half adjusted Fenwick - 2nd half points percentage: r = 0.49; r^2 = 0.24
1st half underlying numbers - 2nd half points percentage: r = 0.53; r^2 = 0.29
1st half goal differential - 2nd half points percentage: r = 0.39; r^2 = 0.15

3. The effect of roster changes is mitigated with a within-season analysis. This much should be obvious. Is it perfect? No. Is it superior to an across-season analysis? Clearly. Is there any reasonable alternative? No.

4. Each team plays 82 games per season. Sample 1 is comprised of 40 randomly selected games. Sample 2 is comprised of 40 games randomly selected from the 42 remaining games. That leaves two games left over per team. No overlap.
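
In code form, the sampling is nothing more exotic than this (a minimal sketch; metric_by_game and results_by_game are hypothetical per-team dicts of per-game values for whichever stat is being tested, be it Corsi, Fenwick, or goal differential):

Code:
import random

import numpy as np

def split_season(n_games=82, sample_size=40):
    """Split one team's season into two disjoint random 40-game samples."""
    games = list(range(n_games))
    sample1 = random.sample(games, sample_size)
    remaining = [g for g in games if g not in sample1]  # 42 games left
    sample2 = random.sample(remaining, sample_size)     # 2 games go unused
    return sample1, sample2

def one_trial(metric_by_game, results_by_game):
    """Correlate each team's sample-1 metric with its sample-2 results."""
    xs, ys = [], []
    for team in metric_by_game:
        s1, s2 = split_season()
        xs.append(np.mean([metric_by_game[team][g] for g in s1]))
        ys.append(np.mean([results_by_game[team][g] for g in s2]))
    return np.corrcoef(xs, ys)[0, 1]

# Repeat 1000 times per season and average the correlations:
# rs = [one_trial(metric_by_game, results_by_game) for _ in range(1000)]
# avg_r = sum(rs) / len(rs)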
 

Master_Of_Districts

Registered User
Apr 9, 2007
1,744
4
Black Ruthenia
On the contrary, I remember you making that assertion near the beginning of the thread, indicating that anything other than 5-on-5 is so randomized and not team-specific that it cannot be utilized in these models.

I do not care to look through 24 pages, but the paper you referred me to says this:

That doesn't sound like something I would say. I utilize special teams data in my models all the time. My underlying numbers model, described above, incorporates special teams data. If you can obtain a quote, though, I'll gladly concede the point.

On the issue more generally, some advanced stats proponents dismiss special teams results out of hand, but they're wrong to do so, in my opinion. If anything, the spread in talent between teams is larger on special teams than it is at even strength, although that's mitigated by the fact that the sample sizes are much smaller for special teams situations. In any event, more information is generally better, so excluding special teams data unnecessarily eliminates otherwise useful data. But that's just my opinion.

First off, if you can't explain something yourself, there is a problem. Second, this does not answer most of the questions I asked.

Well - perhaps that's ultimately on me. Or maybe it's just the nature of the subject matter. Who knows.
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,603
27,407
This is because advanced statistics only gained popularity, especially in the internet community, when the Leafs started doing well (the Leafs were defying them back when they sucked too) and people incorrectly interpreted these statistics to conclude that they shouldn't be. They became another tool to bring down the Leafs and devalue their accomplishments.

What? What?

More likely, it's that *you* started paying attention to advanced statistics once the Leafs started doing well.

I've been doing advanced hockey analytics since 1994, and I sure as hell ain't the first.

Believe it or not, the world did exist before you started paying attention.
 

PSGJ

Registered User
May 19, 2012
833
0
Sweden
What we're seeing here is hopefully the death of PDO. The claim with PDO is that it will regress to 1, but I see no reason to believe that, since some teams just have better goaltending, and Toronto is definitely one of those teams. Not one, but two really good, relatively young goalies.

As for shooting %, it's harder for a team to be better than average, obviously, but having one of the best snipers in the league will certainly help. Kessel and van Riemsdyk are the leaders in taking shots, and while they are above their career shooting % this season, it's not by a huge amount.

So, what should one take away from this? Toronto is a good team and they have their snipers and goalies to thank for that.
 

Rants Mulliniks

Registered User
Jun 22, 2008
23,071
6,136
I don't follow the Leafs closely, but didn't the "regression" also just happen to occur largely while Bozak and Bolland were both out of the line-up?

I could be wrong - I don't recall the exact time frames - but I'm surprised I have not seen any related discussion.

Yes, it did. That's what makes this all so funny. Hard to imagine, really, that a team whose biggest known question mark is the centre position would struggle when their #1 and #3 go down and, even for a brief spell, their #4 becomes #1 and a winger takes over at C. Some people would use that as evidence of "regression", others as an opportunity to grasp the obvious.

It kind of reminds me of this (I'll leave it to you to decide which side has the nail):

 

Delicious Dangles*

Guest
Advanced analytics in hockey were around far before the Leafs crept out of the basement, my friend.
Wait. You REALLY believe advanced stats only got popular as a means to try and bash the Leafs?

You're proving my point...and that sure does explain a lot.
What? What?

More likely, it's that *you* started paying attention to advanced statistics once the Leafs started doing well.

I've been doing advanced hockey analytics since 1994, and I sure as hell ain't the first.

Believe it or not, the world did exist before you started paying attention.
Well, obviously hockey analytics were around beforehand (though I question 1994). I never said otherwise. And yes, I started paying attention to them later. THAT IS THE POINT.

The explosion in popularity in the general (especially internet) community happened last year, conveniently when the Leafs started doing well and people started looking for new ways to bring them down.

Since then, these statistics have been used in every possible way against the Leafs, by media personalities, analysts, writers, bloggers, hockey forum users, etc. to devalue accomplishments and continuously predict doom and gloom that DOES NOT COME. There was not a single mention of these things in past years, even when the Leafs were defying corsi and sucking.

These articles and blogs are not titled "Advanced statistics". They have titles like "Advanced statistics and why the Leafs are destined to fail". All you have to do is look around you, and not stay in your statistical forum bubble. Heck, look at the title of this thread! Corsi, shot quality, AND THE TORONTO MAPLE LEAFS.

If this thread was REALLY only about statistics, Toronto Maple Leafs wouldn't be in the title.
 

Johnny Engine

Moderator
Jul 29, 2009
4,983
2,365
Wait. You REALLY believe advanced stats only got popular as a means to try and bash the Leafs? You're proving my point...and that sure does explain a lot.

Advanced analytics in hockey were around far before the Leafs crept out of the basement, my friend.

What? What? More likely, it's that *you* started paying attention to advanced statistics once the Leafs started doing well. I've been doing advanced hockey analytics since 1994, and I sure as hell ain't the first. Believe it or not, the world did exist before you started paying attention.

The poster you're referring to has taken some wild reaches in this thread, but he's not wrong. He said that so-called advanced stats gained popularity recently, which is absolutely true. Nobody said they didn't exist, so calling out others for straw-man arguments is disingenuous.

http://www.google.ca/trends/explore#q=hockey%20analytics
http://www.google.ca/trends/explore#q=possession%20stats
http://www.google.ca/trends/explore#q=Corsi%20stats

As an avid user of the history forum, I understand that it's frustrating that so many uneducated voices persist in the discussion. It would serve you well to stress the difference between serious statisticians and anti-Leaf trolls, rather than to insist that the latter don't exist.
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,603
27,407
If this thread was REALLY only about statistics, Toronto Maple Leafs wouldn't be in the title.

There are other threads in this subforum.

The fact that a small number of Leafs fans have set up a bunker in this one has made this thread more popular than other threads, yes.

Don't mistake HFBoards for the hockey community at large.
 

Delicious Dangles*

Guest
There are other threads in this subforum.

The fact that a small number of Leafs fans have set up a bunker in this one has made this thread more popular than other threads, yes.

Don't mistake HFBoards for the hockey community at large.
I am not mistaking HF for the hockey community at large. This is not confined to HF, nor is it confined to this thread or forum. As Leaf fans, we have been bombarded with this by every form of media over the last 2 years, and especially this year.

Yes, there are other threads in this sub-forum. Most of them, except for the one discussing how Toronto sucks because of Corsi, are relatively dead.

Not EVERY person using advanced statistics is trying to bring down the Leafs. I never said that. However, it DID gain popularity and IS used (usually wrongly) by the majority of people for that reason.
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,603
27,407
By the majority of people?

You believe that the majority of people using hockey analytics are doing so to "bring down the Leafs"?

:laugh:
 

hatterson

Registered User
Apr 12, 2010
35,640
13,062
North Tonawanda, NY
So I finally caught up on the thread and wanted to respond to a few points. Sorry if this is a wall of text.

They don't dictate what happens in the shootout, which is entirely random.

Team shootout results may be random in relation to team Corsi (I haven't actually looked at numbers on this, but will defer to others if they have); however, that's a lot different from saying they are, in fact, random.

There's every reason to believe that players like TJ Oshie, Jonathan Toews, James van Riemsdyk are better shootout shooters than players like Colton Orr or John Scott. Further there's significant reason to believe that Oshie, Toews, JVR, etc. are better at shootouts than other elite goal scorers. Specifically Ovechkin and Kessel are elite goal scorers, but that hasn't translated to the shootout. Ovechkin's results have been subpar and Kessel's have been absolutely atrocious. Similarly there's also reason to believe that some goalies are better than others at the shootout.

If you accept those two things, it stands to reason that teams with good shootout performers and a good shootout goalie (Pittsburgh might be a solid example) will have better results than teams without both those qualities, and that those results will be non-random.

What we're seeing here is hopefully the death of PDO. The claim with PDO is that it will regress to 1, but I see no reason to believe that, since some teams just have better goaltending, and Toronto is definitely one of those teams. Not one, but two really good, relatively young goalies.

As for shooting %, it's harder for a team to be better than average, obviously, but having one of the best snipers in the league will certainly help. Kessel and van Riemsdyk are the leaders in taking shots, and while they are above their career shooting % this season, it's not by a huge amount.

So, what should one take away from this? Toronto is a good team and they have their snipers and goalies to thank for that.

I've been vocal in the past about my doubt that PDO strictly regresses to 1000, as some claim it should. However, a simple glance at the numbers on a larger level shows that PDOs in the range of 1020-1030 simply cannot be expected to hold over an 82-game schedule. There's the odd team that finishes a season with a 5-on-5 PDO above 1020, but the vast majority that have a high PDO early in the season fade down to the normal range as things go on. In fact, this is what the Leafs are doing right now. Their PDO is now under 1020 and has been in a slight decline for a good while.

As I estimated earlier, if we take each player's on-ice shooting percentage for the last 5 years, weight it by ice time, and then do the same for goalies, the Leafs' "expected" PDO comes out to 1011, so even getting to 1019 requires believing that the Leafs' players are performing significantly above their averages over the last 5 years, which in the case of Lupul and Kadri would be quite impressive.
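
To make the arithmetic concrete, the weighting is nothing fancier than this (a sketch with invented numbers; the real inputs are each skater's 5-year on-ice sh% and each goalie's 5-year sv%, weighted by ice time):

Code:
def weighted_avg(pairs):
    """Ice-time-weighted average of (percentage, minutes) pairs."""
    total_minutes = sum(minutes for _, minutes in pairs)
    return sum(pct * minutes for pct, minutes in pairs) / total_minutes

# Invented inputs: (5-year on-ice sh%, 5v5 minutes) per skater
skaters = [(0.082, 1200), (0.079, 1100), (0.075, 950)]  # etc.
# Invented inputs: (5-year 5v5 sv%, minutes) per goalie
goalies = [(0.925, 3000), (0.918, 1000)]

expected_sh = weighted_avg(skaters)
expected_sv = weighted_avg(goalies)

# PDO is just (on-ice sh% + on-ice sv%) x 1000
expected_pdo = (expected_sh + expected_sv) * 1000
print(round(expected_pdo))  # low 1000s for these made-up numbers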

Luck is just a word for variables that you do not understand and haven't incorporated into your model.

I strongly disagree with this. Luck most certainly exists in sporting events. We have goals that are scored off of two deflections, a bounce off the boards and then off someone's elbow and into the net. If luck exists in the scoring or preventing of a single goal, then it must exist in the results of games and if it exists in the results of games it must exist in larger stretches. Sure, the effects of luck may be greatly diminished over a season long sample, but it still exists.

If shot metrics are only relevant for the purpose of evaluating effects on goal differential, which you then extrapolate to quality of team, then shouldn't actual goal differential have better predictive value for points than those shot metrics?

Are these shot metrics also only based on 5 on 5? Why are you then not looking at 5 on 5 goal differentials?

The basic line of thinking is that shot metrics correlate long term very well with goal metrics, which correlate long term very well with winning.

It is understood that by using indirect measures you add a level of error to your predictions. Shots don't correlate perfectly with goals, and goals don't correlate perfectly with talent. However, at realistic sample sizes, the error introduced by that indirect measurement has historically been smaller than the error introduced by randomness in a wins-based sample of the same size.

From what I've seen, the point at which the variation from sample size drops below the variation from indirect measurement for goals vs wins is well above the season mark. The point at which that happens for shots vs goals is right around the full season mark.
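
To make that concrete, here's a toy simulation (invented parameters) of why a high-frequency proxy can out-predict the thing you actually care about. It assumes a single latent talent drives both shot share and goal share, with no persistent shot-quality differences, which is of course the very thing being debated in this thread; the point is only that shot events are roughly ten times as frequent as goals, so a shot-based estimate stabilizes much faster.

Code:
import numpy as np

rng = np.random.default_rng(1)

N_TEAMS, N_TRIALS = 30, 1000
SHOTS_PER_HALF, GOALS_PER_HALF = 1200, 120  # ~10x more shot events than goals

r_shots, r_goals = [], []
for _ in range(N_TRIALS):
    # Latent share-of-events talent, centred on 50%
    talent = rng.normal(0.5, 0.03, N_TEAMS).clip(0, 1)
    # Half 1: observed shot share (big sample) and goal share (small sample)
    shot_share1 = rng.binomial(SHOTS_PER_HALF, talent) / SHOTS_PER_HALF
    goal_share1 = rng.binomial(GOALS_PER_HALF, talent) / GOALS_PER_HALF
    # Half 2: the future goal share we are trying to predict
    goal_share2 = rng.binomial(GOALS_PER_HALF, talent) / GOALS_PER_HALF
    r_shots.append(np.corrcoef(shot_share1, goal_share2)[0, 1])
    r_goals.append(np.corrcoef(goal_share1, goal_share2)[0, 1])

# The noisier-but-direct measure loses to the high-volume proxy
print(f"half-1 shot share vs half-2 goal share: r = {np.mean(r_shots):.2f}")
print(f"half-1 goal share vs half-2 goal share: r = {np.mean(r_goals):.2f}")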
 

Delicious Dangles*

Guest
The spread in talent can be calculated by comparing the actual spread in results to the spread in results attributable to binomial variation, based on the formula: VARIANCE (actual) = VARIANCE (binomial variation) + VARIANCE (talent). I calculate a true talent standard deviation of 0.049 with respect to winning percentage.
The formula should be var(talent) = var(observed) - var(luck). I don't know why you are not calculating the so-called luck variance yourself. I have seen these calculations done by others, and they did not get the same standard deviation that you got.

The corollary of this, of course, is that the theoretical upper limit for the predictive validity of any given metric over a typical 40-game sample, in today's NHL, is 0.57.
That tells me that you should not be using 40-game samples.

So no - luck is not just a term I'm using for variance that cannot be explained.
Actually, it is. Look at it like a coin flip. One would usually say that there is a fair amount of luck in a coin flip. But there isn't.

What it lands on is a function of where it started, the force you apply and where on the coin you apply it, plus smaller factors like air resistance.

Theoretically, given enough information and the ability to apply amazingly accurate forces to a small piece of currency, we could predict what the coin was going to land on with 100% perfect accuracy.

Of course, here it is one action by one person that creates the result. Easy. With hockey, it is a constant combination of multiple actions from multiple sources that creates a result. That makes it infinitely more complicated and beyond current human capabilities, but still theoretically measurable and predictable given enough data.

So-called luck is not an excuse for having bad predictive value.

And the big point you keep missing is that even if you had good predictive value for the league as a whole, it does not mean that you have that same predictive value for each individual team in that league.

You are operating under the assumption that every team plays the exact same way when you apply league-wide results to individual teams. Some teams will have Corsi correlate more closely, and some will have less correlation with Corsi. It is entirely reasonable that a team, say the Toronto Maple Leafs, that did not play a style that correlated well with Corsi would get better predictive value from other statistics.

3. The effect of roster changes is mitigated with a within season analysis. This much should be obvious. Is it perfect? No. Is it superior to an across season analysis? Clearly. Is there any reasonable alternative? No.
It isn't, though. There is often just as much roster change in-season, with injuries, trades, call-ups, etc. And with an in-season analysis come inconsistent opponents, circumstances, and pretty much everything else.

Just because something is better than a worse way doesn't make it good, or even useful at all in all cases.

4. Each team plays 82 per season. Sample 1 is comprised of 40 randomly selected games. Sample 2 is comprised of 40 randomly selected games based on the 42 remaining games. That leaves two games left over per team. No overlap.
Oh, so you are just taking 40 random games and comparing them to 40 other random games that season. You gave the impression that you were comparing sequential sets of games, i.e. how the 2nd half of a season correlates with the 1st half.

Either way, there are huge problems with both methods, some as previously stated.
 

Delicious Dangles*

Guest
That doesn't sound like something I would say. I utilize special teams data in my models all the time. My underlying numbers model, described above, incorporates special teams data. If you can obtain a quote, though, I'll gladly concede the point.

On the issue more generally, some advanced stats proponents dismiss special teams results out of hand, but they're wrong to do so, in my opinion. If anything, the spread in talent between teams is larger on special teams than it is at even strength, although that's mitigated by the fact that the sample sizes are much smaller for special teams situations. In any event, more information is generally better, so excluding special teams data unnecessarily eliminates otherwise useful data. But that's just my opinion.
Weird, because in pretty much every post throughout the rest of the thread, you reference even strength data.

And even using it when evaluating goal differential - back when it fit your argument of course:

And for all the talk about how the Leafs are apparently "defying advanced stats," they've only outscored the opposition by one goal during 5-on-5 play thus far.

And scoffed at the thought of the regression pertaining to overall performance in other areas:

Obviously the regression comment pertained to EV shooting percentage specifically and not overall performance.
 

Delicious Dangles*

Guest
By the majority of people?

You believe that the majority of people using hockey analytics are doing so to "bring down the Leafs"?

:laugh:
Considering that it only grew in popularity when it applied negatively to the Leafs, and that the majority of articles and hockey personalities that reference it also reference or apply it to the Leafs in some negative light, yes.

I have been following hockey for many years, and I didn't see a single mention of it anywhere back when we sucked and Corsi said we should be good.

In fact, I think it is quite amusing that the group advocating for the accuracy of these weakly correlated stats is the one arguing that the huge correlation between the explosion in popularity and the Leafs' performance in those metrics is just coincidence, even as ongoing evidence proves otherwise.
 

Master_Of_Districts

Registered User
Apr 9, 2007
1,744
4
Black Ruthenia
The formula should be var(talent) = var(observed) - var(luck). I don't know why you are not calculating the so-called luck variance yourself. I have seen these calculations done by others, and they did not get the same standard deviation that you got.

Yeah.

You see - here's where something called "algebra" comes into play.

Because VAR (observed) refers to the exact same thing as VAR (actual), and VAR (binomial variation) refers to the same thing as VAR (luck), the two formulas are identical!

That tells me that you should not be using 40-game samples.

K.

Actually, it is. Look at it like a coin flip. One would usually say that there is a fair amount of luck in a coin flip. But there isn't.

What it lands on is a function of where it started, the force you apply and where on the coin you apply it, plus smaller factors like air resistance.

Theoretically, given enough information and the ability to apply amazingly accurate forces to a small piece of currency, we could predict what the coin was going to land on with 100% perfect accuracy.

Of course, here it is one action by one person that creates the result. Easy. With hockey, it is a constant combination of multiple actions from multiple sources that creates a result. That makes it infinitely more complicated and beyond current human capabilities, but still theoretically measurable and predictable given enough data.

So-called luck is not an excuse for having bad predictive value.

Well - that sounds like a philosophical issue. Certainly, findings in fields like quantum mechanics would suggest that some things truly are random, and cannot be predicted.

In any event, when you develop your own model that's able to predict binomial variation, let me know. No excuses, right? :laugh:

And the big point you keep missing is that even if you had good predictive value for the league as a whole, it does not mean that you have that same predictive value for each individual team in that league.

You are operating under the assumption that every team plays the exact same way when you apply league-wide results to individual teams. Some teams will have Corsi correlate more closely, and some will have less correlation with Corsi. It is entirely reasonable that a team, say the Toronto Maple Leafs, that did not play a style that correlated well with Corsi would get better predictive value from other statistics.

I made no such assumption.

The original inquiry was simply whether Corsi and Fenwick predict future results better than points percentage or goal differential when the original sample is smaller than 80 games. Which they do, as substantiated by the data I posted.

It isn't, though. There is often just as much roster change in-season, with injuries, trades, call-ups, etc. And with an in-season analysis come inconsistent opponents, circumstances, and pretty much everything else.

With an across-season analysis, all those factors come into play as well, just like they do with a within-season analysis.

Except with an across-season analysis, you have the added effect of off-season roster acquisitions and departures.

So - quite clearly - a within-season analysis has fewer confounding variables.

Oh, so you are just taking 40 random games and comparing them to 40 other random games that season. You gave the impression that you were comparing sequential sets of games, i.e. how the 2nd half of a season correlates with the 1st half.

Either way, there are huge problems with both methods, some as previously stated.

Then devise your own method.
 

Delicious Dangles*

Guest
As I estimated earlier, if we take each player's on-ice shooting percentage for the last 5 years, weight it by ice time, and then do the same for goalies, the Leafs' "expected" PDO comes out to 1011, so even getting to 1019 requires believing that the Leafs' players are performing significantly above their averages over the last 5 years, which in the case of Lupul and Kadri would be quite impressive.
Except this is heavily skewed by what those 5 years include. Were they on the same team? In the same situations? Same stages of development? A player can change a lot over 5 years, and the type of player they were 5 years ago does not indicate what type of player they are today. Not even close actually.

This is especially true for a young team like the Leafs, who haven't had the time to accumulate a 5-year average that is representative of their abilities today. In fact, most Leaf players either were not in the league 5 years ago, or were entirely different players.

And it is not just about the quality of players in a vacuum (and shooting percentages are far from the only value a player can add to a team). You can take two players of the exact same quality and capabilities, and they would produce different results depending on the circumstances and their role on the team.

I strongly disagree with this. Luck most certainly exists in sporting events. We have goals that are scored off of two deflections, a bounce off the boards and then off someone's elbow and into the net. If luck exists in the scoring or preventing of a single goal, then it must exist in the results of games and if it exists in the results of games it must exist in larger stretches. Sure, the effects of luck may be greatly diminished over a season long sample, but it still exists.
Except that is not luck. It is physics and geometry. Theoretically, you could replicate that shot if you had the ability to control that kind of accuracy and force.

The basic line of thinking is that shot metrics correlate long term very well with goal metrics, which correlate long term very well with winning.

It is understood that by using indirect measures you add a level of error to your predictions. Shots don't correlate perfectly with goals, and goals don't correlate perfectly with talent. However, at realistic sample sizes, the error introduced by that indirect measurement has historically been smaller than the error introduced by randomness in a wins-based sample of the same size.

From what I've seen, the point at which the variation from sample size drops below the variation from indirect measurement for goals vs wins is well above the season mark. The point at which that happens for shots vs goals is right around the full season mark.
This is basically saying we should not be using such small samples.

And before somebody says that is all we have: well, sorry, but that doesn't magically make the results better or more accurate.

It also renders untrue the claim that Stanley Cups can't be won a different way, since all you would need is one season.
 

Master_Of_Districts

Registered User
Apr 9, 2007
1,744
4
Black Ruthenia
Weird, because in pretty much every post throughout the rest of the thread, you reference even strength data.

And even using it when evaluating goal differential - back when it fit your argument of course:



And scoffed at the thought of the regression pertaining to overall performance in other areas:

Uh - the second comment is taken completely out of context.

I assume you're aware of that as you must have just gone back and read the post in which it's contained.

The comment was made when I was examining the 2009-10 Capitals, 2006-07 Sabres, 2006-07 Predators, and 2008-09 Penguins - all even-strength shooting percentage outliers - and the regression each team experienced in that respect in the following seasons.

Which is what my comment refers to - that I was examining regression in the context of even strength shooting percentage and not overall performance.

It has nothing to do with the utility of special teams data vis-a-vis even strength data.

As for the other comment, it also has nothing to do with the utility of special teams data vis-a-vis even strength data.

It was merely an observation - that the Leafs were +1 in 5-on-5 goal differential at the time the comment was made.
 
