Jets At The World Hockey Championship (Part II)

Whileee

Registered User
May 29, 2010
46,075
33,132
What? No one is doing what you are saying they are doing though. I am not suggesting what you think I'm suggesting.

This isn't something that requires knowledge of advanced statistical analysis.

Some players will peak prior to the average peak. Some will peak after. Most will peak closer to it than further from it... sort of like a skewed normal curve.

No one knows for certain where either of these players will peak. No one can assume where either of these players will peak. One can point out where it is historically most likely.

This is less a stats thing than a logic thing.

What we know:
Barkov's results have been better at every age we can compare.
Scheifele's results heavily accelerated this last season, which could be growth, luck/variance, or a combination thereof.
Most players peak around 24-26, but not all do.

That's it.


It's not an "advanced" statistical principle, but there are statistical principles involved, because a fitted curve is a statistical representation of a set of data.

The main problem with the logic is the assumption that individuals all belong to the same sample set, and individual variations simply represent random fluctuations within that distribution.

Let me illustrate. Suppose you have two large bags, each with 1000 marbles that are black or white. In the first bag, 70% of the marbles are black. In the second bag, 30% of the marbles are black. If I take a large number of random samples (say 100 marbles per sample) from each of the bags, on average I would end up with 70 black marbles (out of 100) in samples from the first bag and 30 black marbles in the samples from the second bag. The sample mean from the first bag would be 70/100 (70%), and the sample mean from the second bag would be 30/100 (30%).

Now, if I mixed all of the marbles from the two bags together (i.e. 2000 marbles), and took a bunch of samples of 100 I would end up with a mean of 50 black marbles (i.e. 50%).

So, if I just focused on the combined sample and didn't know the proportions in the original bags, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.
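The marble example can be checked with a quick simulation. This is only a sketch: the function name, the seed, and the number of repeated samples (2000) are assumptions for illustration, while the bag sizes and proportions come from the example above.

```python
import random


def mean_black_fraction(bag, sample_size, n_samples, rng):
    """Average fraction of black marbles over repeated samples drawn without replacement."""
    total = 0.0
    for _ in range(n_samples):
        draw = rng.sample(bag, sample_size)  # one sample of marbles, no replacement
        total += sum(draw) / sample_size     # 1 = black, 0 = white
    return total / n_samples


rng = random.Random(0)            # fixed seed so the run is repeatable
bag_a = [1] * 700 + [0] * 300     # first bag: 70% black
bag_b = [1] * 300 + [0] * 700     # second bag: 30% black
mixed = bag_a + bag_b             # both bags poured together (2000 marbles)

mean_a = mean_black_fraction(bag_a, 100, 2000, rng)
mean_b = mean_black_fraction(bag_b, 100, 2000, rng)
mean_mixed = mean_black_fraction(mixed, 100, 2000, rng)

print(f"bag A: {mean_a:.3f}  bag B: {mean_b:.3f}  mixed: {mean_mixed:.3f}")
```

The mixed figure lands near 50% even though neither bag is anywhere close to 50% on its own, which is the point of the illustration.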

How does this relate to hockey players and this topic? Well, if you assume that all players follow the same age / development trajectory, then you could suggest that the mean age for peak performance should be the same for everyone, apart from some random statistical fluctuations. If, however, different groups of players follow different age / development trajectories, then lumping them together to predict when they would peak would ignore the different base trajectories. This is not simply a statistical nicety; it's also a logical concept. What statistical analysts should do is assemble more variables and outcomes to determine whether there are key factors that are associated with age / development trajectories and age at peak performance. This could then be applied to develop a more robust model to predict when groups of players are likely to peak.

Analysts do this all the time to develop models such as the PCS, etc. Obviously, using a single variable (e.g. height) will result in a model that is not very reliable. The more influential variables that are included in the model, the more robust the model.
 

nobody important

the pessimist returns
Jul 12, 2015
6,426
1,719
a quiet suburb
So, if I just focused on the combined sample and didn't know the proportions in the original bags, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.

See, now I would have assumed one bag was all white marbles and one bag was all black marbles.

I hope this isn't one of those sneaky little tests that determined I was a racist. Or a serial killer. :)

Anyway, back to your intelligent discussion.
 

truck

Registered User
Jun 27, 2012
10,992
1,583
www.arcticicehockey.com
Perhaps if I posed a couple of questions, it would help. What proportion of players that have Scheifele's history and trajectory end up following the average curve? What proportion of players with Barkov's history and trajectory end up following the average curve? Your statement above suggests that you think that it is over 50%. Is there data to support that?

If we combined Scheifele's trajectory and Barkov's trajectory into an "average" curve, should we expect each of them to follow that same "average" curve?
How do you determine trajectory, especially if there is no set curve, as you suggest?
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Well, if you assume that all players follow the same age / development trajectory,

I'm stopping you there, because we are not making that assumption.
It's an assumption no one is making, yet it's the one you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that the players peak at or near that average than further from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. They are not all equally probable.

I'm not assuming that the 3rd choice in each is what will happen. I will acknowledge it is the most likely to happen, but probability isn't destiny. Others, though, were making assumptions.
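To make the 9 combinations concrete, here is a small sketch. The per-player probabilities are made up purely for illustration (they are not estimates from any data), and the two players are assumed to be independent of each other.

```python
from itertools import product

# Hypothetical per-player probabilities, invented for illustration only.
outcomes = {
    "peaks later": 0.25,
    "already peaked": 0.25,
    "fits the norm": 0.50,
}

# Joint probability of each (Scheifele, Barkov) pair, assuming independence.
joint = {
    (s, b): outcomes[s] * outcomes[b]
    for s, b in product(outcomes, repeat=2)
}

most_likely = max(joint, key=joint.get)

for (s, b), p in sorted(joint.items(), key=lambda kv: -kv[1]):
    print(f"Scheifele {s:<15} Barkov {b:<15} {p:.4f}")
print("most likely pair:", most_likely)
```

Under these assumed numbers, "both fit the norm" is the single most likely pair, yet it is still far from certain, which matches "probability isn't destiny."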
 

truck

Registered User
Jun 27, 2012
10,992
1,583
www.arcticicehockey.com
It's not an "advanced" statistical principle, but there are statistical principles involved, because a fitted curve is a statistical representation of a set of data.

The main problem with the logic is the assumption that individuals all belong to the same sample set, and individual variations simply represent random fluctuations within that distribution.

Let me illustrate. Suppose you have two large bags, each with 1000 marbles that are black or white. In the first bag, 70% of the marbles are black. In the second bag, 30% of the marbles are black. If I take a large number of random samples (say 100 marbles per sample) from each of the bags, on average I would end up with 70 black marbles (out of 100) in samples from the first bag and 30 black marbles in the samples from the second bag. The sample mean from the first bag would be 70/100 (70%), and the sample mean from the second bag would be 30/100 (30%).

Now, if I mixed all of the marbles from the two bags together (i.e. 2000 marbles), and took a bunch of samples of 100 I would end up with a mean of 50 black marbles (i.e. 50%).

So, if I just focused on the combined sample and didn't know the proportions in the original bags, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.

How does this relate to hockey players and this topic? Well, if you assume that all players follow the same age / development trajectory, then you could suggest that the mean age for peak performance should be the same for everyone, apart from some random statistical fluctuations. If, however, different groups of players follow different age / development trajectories, then lumping them together to predict when they would peak would ignore the different base trajectories. This is not simply a statistical nicety; it's also a logical concept. What statistical analysts should do is assemble more variables and outcomes to determine whether there are key factors that are associated with age / development trajectories and age at peak performance. This could then be applied to develop a more robust model to predict when groups of players are likely to peak.

Analysts do this all the time to develop models such as the PCS, etc. Obviously, using a single variable (e.g. height) will result in a model that is not very reliable. The more influential variables that are included in the model, the more robust the model.
To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?
 

heilongjetsfan

Registered User
Jul 4, 2011
3,591
1,578
To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?
Sounds like an opportunity for the eye test to shine!
 

Whileee

Registered User
May 29, 2010
46,075
33,132
I'm stopping you there, because we are not making that assumption.
It's an assumption no one is making, yet it's the one you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that the players peak at or near that average than further from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. They are not all equally probable.

I'm not assuming that the 3rd choice in each is what will happen. I will acknowledge it is the most likely to happen, but probability isn't destiny. Others, though, were making assumptions.

Just get and use more data. That's the basic message. I'm not a fan of overly simple analyses, especially when they are over-interpreted. Not saying you're doing that, but I'm seeing it more and more.
 

Whileee

Registered User
May 29, 2010
46,075
33,132
To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?

I wouldn't pretend to know with confidence if I knew that proportions differ between bags.

Are we really saying that the overall average trajectory is a reliable way to predict the trajectory of individual players? This is going to make "analytics" a piece of cake.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
I wouldn't pretend to know with confidence if I knew that proportions differ between bags.

Are we really saying that the overall average trajectory is a reliable way to predict the trajectory of individual players? This is going to make "analytics" a piece of cake.

It's still not quite a comparable situation.

I tried to keep thinking of examples that would work better with the marble example but it's difficult.

Point is that peak age has a distribution that fits fairly tightly around 24-26, and most players have a slow decline afterwards, although there are some quick fallers.
There are players outside of those patterns, and I'd wager good money a huge portion of the early fallers are due to illegal substance practices and injuries.
There are real people, though, who peak outside of that 24-26 range.
Without any real evidence, some are assuming that these two players likely sit in the tails of the distribution.
It's possible, but we're completely fine in pointing out that it's more likely they don't, that there is a large distribution of possibilities, and where the highest proportion of that distribution rests.

Obviously the practice and usage of analytics does not end there. No one who is doing research for a team or agency would stop at that point. But for a discussion on an online hockey discussion board, it is fine to point out how the probabilities lie with the information we have. This is not the same as saying the players are destined to peak at point X, and it is disingenuous to say that others are claiming so.

That's all.
 

Whileee

Registered User
May 29, 2010
46,075
33,132
I'm stopping you there, because we are not making that assumption.
It's an assumption no one is making, yet it's the one you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that the players peak at or near that average than further from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. They are not all equally probable.

I'm not assuming that the 3rd choice in each is what will happen. I will acknowledge it is the most likely to happen, but probability isn't destiny. Others, though, were making assumptions.

What I have said repeatedly is that better analysis using more variables is required to compare players at different stages of their development. Agree or disagree?
 

Whileee

Registered User
May 29, 2010
46,075
33,132
It's still not quite a comparable situation.

I tried to keep thinking of examples that would work better with the marble example but it's difficult.

Point is that peak age has a distribution that fits fairly tightly around 24-26, and most players have a slow decline afterwards, although there are some quick fallers.
There are players outside of those patterns, and I'd wager good money a huge portion of the early fallers are due to illegal substance practices and injuries.
There are real people, though, who peak outside of that 24-26 range.
Without any real evidence, some are assuming that these two players likely sit in the tails of the distribution.
It's possible, but we're completely fine in pointing out that it's more likely they don't, that there is a large distribution of possibilities, and where the highest proportion of that distribution rests.

Obviously the practice and usage of analytics does not end there. No one who is doing research for a team or agency would stop at that point. But for a discussion on an online hockey discussion board, it is fine to point out how the probabilities lie with the information we have. This is not the same as saying the players are destined to peak at point X, and it is disingenuous to say that others are claiming so.

That's all.

Making simplifying assumptions and analyses for online discussions is okay, I suppose. Not sure why it's a problem when it's pointed out as a simplistic analysis, though.
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
What I have said repeatedly is that better analysis using more variables is required to compare players at different stages of their development. Agree or disagree?

Making simplifying assumptions and analyses for online discussions is okay, I suppose. Not sure why it's a problem when it's pointed out as a simplistic analysis, though.

Corsi is simple and will eventually be improved upon. I'm still going to use it because it is useful, and I still feel confident using it against those who do not use it. When the other person is relying on plus/minus or just their eyes, and my opinion plus a simplistic but useful model says otherwise, I will still use it. I will still strive towards better forms of analysis, but I'll use the best available ones until superior alternatives arrive.

Corsi is more likely to be right than plus/minus.
The two young centres are more likely to peak at the average point than not.

Simplistic or not, those two statements are still true.
 

Whileee

Registered User
May 29, 2010
46,075
33,132
Corsi is simple and will eventually be improved upon. I'm still going to use it because it is useful, and I still feel confident using it against those who do not use it. When the other person is relying on plus/minus or just their eyes, and my opinion plus a simplistic but useful model says otherwise, I will still use it. I will still strive towards better forms of analysis, but I'll use the best available ones until superior alternatives arrive.

Corsi is more likely to be right than plus/minus.
The two young centres are more likely to peak at the average point than not.

Simplistic or not, those two statements are still true.

Corsi stats and an age / production curve are not the same conceptually, and are not based on the same empirical underpinnings.

Your statement that "the two young centres are more likely to peak at the average point than not" is likely untrue, as it doesn't hold with most frequency distributions, especially when they don't have a very pronounced central tendency.

Consider the following frequency distribution... less than 1/3 of the individuals would "peak" in one of the two central age groupings (mean, median or mode). More than 2/3 would peak older or younger than either of the two central groups. More analysis would probably help to determine which would peak younger, which around the average, and which older than the average.

View attachment 90161
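How much of a bell-shaped distribution falls inside a central window depends heavily on its spread. The sketch below assumes peak age is normally distributed with a mean of 25; the three spreads (1, 2 and 3 years) are made-up values for illustration, not estimates from any data set or from the attached chart.

```python
from statistics import NormalDist


def central_share(mean, sd, low, high):
    """Probability that a normally distributed value falls between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical spreads (in years) around an assumed mean peak age of 25.
for sd in (1.0, 2.0, 3.0):
    share = central_share(25.0, sd, 24.0, 26.0)
    print(f"sd = {sd:.0f} years: share peaking at ages 24 to 26 = {share:.3f}")
```

With a tight spread of 1 year, roughly 68% of the distribution sits in the 24 to 26 window. With a spread of 3 years the share drops to roughly 26%, so most individuals would peak outside the central window even though it remains the single most likely region.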

But we should probably stop derailing this thread... :)
 

garret9

AKA#VitoCorrelationi
Mar 31, 2012
21,738
4,380
Vancouver
www.hockey-graphs.com
Corsi stats and an age / production curve are not the same conceptually, and are not based on the same empirical underpinnings.

Your statement that "the two young centres are more likely to peak at the average point than not" is likely untrue, as it doesn't hold with most frequency distributions, especially when they don't have a very pronounced central tendency.

Consider the following frequency distribution... less than 1/3 of the individuals would "peak" in one of the two central age groupings (mean, median or mode). More than 2/3 would peak older or younger than either of the two central groups. More analysis would probably help to determine which would peak younger, which around the average, and which older than the average.

View attachment 90161

But we should probably stop derailing this thread... :)

Conceptually they are, but the point still holds: simple does not mean bad, and a simple model can still be better than no model (like the assumptions that started this conversation) or a poor model (like plus/minus).

Now you are falling back on semantics to try to defend your side of the debate.

1) Even in your example, the alternative arguments being made by others still have a lower probability of being right than the ones being made by Truck, myself, et al.

2) Last time I checked, the kurtosis is actually quite positive, not negative, especially if you use Marcel-based regressions to reduce the volatility of player performance metrics (for example, Scheifele's p/60 will likely fall next season).


Back to the topic at hand:
Scheifele may not have a normal trajectory.
Barkov may not have a normal trajectory.
A normal trajectory is still the most probable outcome, although it's not the only outcome.
Those who are certain of any outcome are kidding themselves.
 