Jets At The World Hockey Championship (Part II)

Whileee · May 25, 2016

garret9 said:
What? No one is doing what you are saying they are doing though. I am not suggesting what you think I'm suggesting.

This isn't something that requires knowledge of advanced statistical analysis.

Some players will peak prior to the average peak. Some will peak after. Most will peak closer than further.... sorta like a skewed normal curve.

No one knows for certain where either of these players will peak. No one can assume where either of these players peak. One can point out where it is historically most likely.

This is less a stats thing than a logic thing.

What we know:
Barkov's results at the same age have been better for every age we can compare.
Scheifele's results heavily accelerated this last season, which could be growth, luck/variance, or a combination thereof.
Most players peak around 24-26, but not all do.

That's it.

It's not an "advanced" statistical principle, but there are statistical principles involved, because a fitted curve is a statistical representation of a set of data.

The main problem with the logic is the assumption that individuals all belong to the same sample set, and individual variations simply represent random fluctuations within that distribution.

Let me illustrate. Suppose you have two large bags, each with 1000 marbles that are black or white. In the first bag, 70% of the marbles are black. In the second bag, 30% of the marbles are black. If I take a large number of random samples (say 100 marbles per sample) from each of the bags, on average I would end up with 70 black marbles (out of 100) in samples from the first bag and 30 black marbles in the samples from the second bag. The sample mean from the first bag would be 70/100 (70%), and the sample mean from the second bag would be 30/100 (30%).

Now, if I mixed all of the marbles from the two bags together (i.e. 2000 marbles), and took a bunch of samples of 100 I would end up with a mean of 50 black marbles (i.e. 50%).

So, if I just focus on the combined sample and didn't know the proportions in the original sample, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.
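The marble example can be checked with a quick simulation. This is only a sketch under stated assumptions: the function name, the seed, and the number of repeated draws (2000) are illustrative choices, not part of the original example; the bag sizes, proportions, and sample size of 100 are taken from the post.

```python
import random


def mean_black_per_sample(bag, sample_size=100, n_samples=2000, seed=0):
    """Average number of black marbles over repeated draws without replacement.

    The seed and n_samples are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    total_black = 0
    for _ in range(n_samples):
        # 1 represents a black marble, 0 a white one.
        total_black += sum(rng.sample(bag, sample_size))
    return total_black / n_samples


bag_a = [1] * 700 + [0] * 300   # first bag: 70% black
bag_b = [1] * 300 + [0] * 700   # second bag: 30% black
mixed = bag_a + bag_b           # both bags poured together: 2000 marbles

mean_a = mean_black_per_sample(bag_a)
mean_b = mean_black_per_sample(bag_b)
mean_mixed = mean_black_per_sample(mixed)

print(f"bag A:  about {mean_a:.1f} black per 100")
print(f"bag B:  about {mean_b:.1f} black per 100")
print(f"mixed:  about {mean_mixed:.1f} black per 100")
```

The mixed mean lands near 50 even though neither bag is anywhere near 50% black, which is the point of the illustration: the pooled average describes the pool, not either source.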

How does this relate to hockey players and this topic? Well, if you assume that all players follow the same age / development trajectory, then you could suggest that the mean age for peak performance should be the same for everyone, apart from some random statistical fluctuations. If, however, different groups of players follow different age / development trajectories, then lumping them together to predict when they would peak would ignore the different base trajectories. This is not simply a statistical nicety; it's also a logical concept. What statistical analysts should do is assemble more variables and outcomes to determine whether there are key factors that are associated with age / development trajectories, and age at peak performance. This could then be applied to develop a more robust model to predict when groups of players are likely to peak.

Analysts do this all the time to develop models such as the PCS, etc. Obviously, using a single variable (e.g. height) will result in a model that is not very reliable. The more influential variables that are included in the model, the more robust the model.

nobody important · May 25, 2016

Whileee said:
So, if I just focus on the combined sample and didn't know the proportions in the original sample, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.

See, now I would have assumed one bag was all white marbles and one bag was all black marbles.

I hope this isn't one of those sneaky little tests that determined I was a racist. Or a serial killer.

Anyway, back to your intelligent discussion.

truck · May 25, 2016

Whileee said:
Perhaps if I posed a couple of questions, it would help. What proportion of players that have Scheifele's history and trajectory end up following the average curve? What proportion of players with Barkov's history and trajectory end up following the average curve? Your statement above suggests that you think that it is over 50%. Is there data to support that?

If we combined Scheifele's trajectory and Barkov's trajectory into an "average" curve, should we expect each of them to follow that same "average" curve?

How do you determine trajectory, especially if there is no set curve as you suggest?

garret9 · May 25, 2016

Whileee said:
Well, if you assume that all players follow the same age / development trajectory,

I'm stopping you there, because we are not making that assumption.
No one is making this assumption, yet it is what you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that a player peaks at or near that average than far from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. Not all are equally probable.

I'm not assuming that the 3rd choice in each is what will happen, although I will acknowledge it is the most likely to happen but probability isn't destiny. Others though were making assumptions.

truck · May 25, 2016

Whileee said:
It's not an "advanced" statistical principle, but there are statistical principles involved, because a fitted curve is a statistical representation of a set of data.

The main problem with the logic is the assumption that individuals all belong to the same sample set, and individual variations simply represent random fluctuations within that distribution.

Let me illustrate. Suppose you have two large bags, each with 1000 marbles that are black or white. In the first bag, 70% of the marbles are black. In the second bag, 30% of the marbles are black. If I take a large number of random samples (say 100 marbles per sample) from each of the bags, on average I would end up with 70 black marbles (out of 100) in samples from the first bag and 30 black marbles in the samples from the second bag. The sample mean from the first bag would be 70/100 (70%), and the sample mean from the second bag would be 30/100 (30%).

Now, if I mixed all of the marbles from the two bags together (i.e. 2000 marbles), and took a bunch of samples of 100 I would end up with a mean of 50 black marbles (i.e. 50%).

So, if I just focus on the combined sample and didn't know the proportions in the original sample, I could make the simplifying assumption that both bags started with 50% black marbles, though we know that neither bag had that proportion of black marbles.

How does this relate to hockey players and this topic? Well, if you assume that all players follow the same age / development trajectory, then you could suggest that the mean age for peak performance should be the same for everyone, apart from some random statistical fluctuations. If, however, different groups of players follow different age / development trajectories, then lumping them together to predict when they would peak would ignore the different base trajectories. This is not simply a statistical nicety; it's also a logical concept. What statistical analysts should do is assemble more variables and outcomes to determine whether there are key factors that are associated with age / development trajectories, and age at peak performance. This could then be applied to develop a more robust model to predict when groups of players are likely to peak.

Analysts do this all the time to develop models such as the PCS, etc. Obviously, using a single variable (e.g. height) will result in a model that is not very reliable. The more influential variables that are included in the model, the more robust the model.

To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?

heilongjetsfan · May 25, 2016

truck said:
To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?

Sounds like an opportunity for the eye test to shine!

truck · May 25, 2016

heilongjetsfan said:
Sounds like an opportunity for the eye test to shine!

All eyes all the time!

Whileee · May 26, 2016

garret9 said:
I'm stopping you there, because we are not making that assumption.
No one is making this assumption, yet it is what you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that a player peaks at or near that average than far from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. Not all are equally probable.

I'm not assuming that the 3rd choice in each is what will happen, although I will acknowledge it is the most likely to happen but probability isn't destiny. Others though were making assumptions.

Just get and use more data. That's the basic message. I'm not a fan of overly simple analyses, especially when they are over-interpreted. Not saying you're doing that, but I'm seeing it more and more.

garret9 · May 26, 2016

Whileee said:
Just get and use more data. That's the basic message. I'm not a fan of overly simple analyses, especially when they are over-interpreted. Not saying you're doing that, but I'm seeing it more and more.

I'm using more data than the ones who were making assumptions, though, which was my whole point.

Whileee · May 26, 2016

truck said:
To take this back to your example....

Two bags of marbles.
You know the mean is 50.
You don't know the splits for either bag.

More logical to assume 50? Or should one pretend they know the splits of the individual bags and make assumptions accordingly?

I wouldn't pretend to know with confidence if I knew that proportions differ between bags.

Are we really saying that the overall average trajectory is a reliable way to predict the trajectory of individual players? This is going to make "analytics" a piece of cake.

garret9 · May 26, 2016

Whileee said:
I wouldn't pretend to know with confidence if I knew that proportions differ between bags.

Are we really saying that the overall average trajectory is a reliable way to predict the trajectory of individual players? This is going to make "analytics" a piece of cake.

It's still not quite a comparable situation.

I tried to keep thinking of examples that would work better with the marble example but it's difficult.

Point is that peak age has a distribution that fits fairly tightly around 24-26, and most have a slow decline afterwards, although there are some quick fallers after.
There are those outside of those, and I'd wager good money a huge portion of the early fallers is due to illegal substance practices and injuries.
There are real people though that peak outside of that 24-26 range.
Without any real evidence, some are making an assumption that these two players are likely existing in the tails of the distribution.
It's possible, but we're completely fine in pointing out that it's more likely not, that there exists a large distribution of possibilities, and where the highest proportion of the distribution rests.

Obviously the practice and usage of analytics does not end there. No one who is doing research for a team or agency would stop at that point. But, for a discussion on an online hockey discussion board it is fine to point out how the probabilities lie with the information we have. This is not the same as saying the players are destined to be peaking at point X and it is disingenuous to say the others are.

That's all.

Whileee · May 26, 2016

garret9 said:
I'm stopping you there, because we are not making that assumption.
No one is making this assumption, yet it is what you are arguing against.

Not everyone peaks at the same point.
There are some later. There are some earlier.
There are some who fall quickly.
There are some that rise quickly.
There are some that plateau for longer.

Yes. I am well aware of these factors.

The average is only the average of a larger distribution of many people who vary from the norm. But it's also a very normal-like distribution, where the closer you get to that average peak, the more often it happens.

HOWEVER, it is more likely that a player peaks at or near that average than far from it.

Scheifele may peak far later than the avg point; Scheifele may have peaked this year; Scheifele may just fit right in at the norm
Barkov may peak far later than the avg point; Barkov may have peaked this year; Barkov may just fit right in at the norm

Any of the 9 possible combinations could happen. Not all are equally probable.

I'm not assuming that the 3rd choice in each is what will happen, although I will acknowledge it is the most likely to happen but probability isn't destiny. Others though were making assumptions.

What I have said repeatedly is that better analysis using more variables is required to compare players at different stages of their development. Agree or disagree?

Whileee · May 26, 2016

garret9 said:
It's still not quite a comparable situation.

I tried to keep thinking of examples that would work better with the marble example but it's difficult.

Point is that peak age has a distribution that fits fairly tightly around 24-26, and most have a slow decline afterwards, although there are some quick fallers after.
There are those outside of those, and I'd wager good money a huge portion of the early fallers is due to illegal substance practices and injuries.
There are real people though that peak outside of that 24-26 range.
Without any real evidence, some are making an assumption that these two players are likely existing in the tails of the distribution.
It's possible, but we're completely fine in pointing out that it's more likely not, that there exists a large distribution of possibilities, and where the highest proportion of the distribution rests.

Obviously the practice and usage of analytics does not end there. No one who is doing research for a team or agency would stop at that point. But, for a discussion on an online hockey discussion board it is fine to point out how the probabilities lie with the information we have. This is not the same as saying the players are destined to be peaking at point X and it is disingenuous to say the others are.

That's all.

Making simplifying assumptions and analyses for online discussions is okay, I suppose. Not sure why it's a problem when it's pointed out as a simplistic analysis, though.

Howard Chuck · May 26, 2016

I saw more posts in this thread and thought there would be some further discussion of the Jets at the WHC......

garret9 · May 26, 2016

Whileee said:
What I have said repeatedly is that better analysis using more variables is required to compare players at different stages of their development. Agree or disagree?

Whileee said:
Making simplifying assumptions and analyses for online discussions is okay, I suppose. Not sure why it's a problem when it's pointed out as a simplistic analysis, though.

Corsi is simple and will eventually be improved upon. I'm still going to use it as it is useful and I still feel confident using it against those who are not. When the other individual is using plus/minus or just their eyes, and my opinion plus the simplistic but useful model counters, I will still use it. I will still strive towards better forms of analysis, but use the best available ones until superior alternatives are reached.

Corsi is more likely to be right than plus/minus.
The two young centres are more likely to peak at the average point than not.

Simplistic or not, those two statements are still true.

garret9 · May 26, 2016

Howard Chuck said:
I saw more posts in this thread and thought there would be some further discussion of the Jets at the WHC......

Sorry haha

Whileee · May 26, 2016

garret9 said:
Corsi is simple and will eventually be improved upon. I'm still going to use it as it is useful and I still feel confident using it against those who are not. When the other individual is using plus/minus or just their eyes, and my opinion plus the simplistic but useful model counters, I will still use it. I will still strive towards better forms of analysis, but use the best available ones until superior alternatives are reached.

Corsi is more likely to be right than plus/minus.
The two young centres are more likely to peak at the average point than not.

Simplistic or not, those two statements are still true.

Corsi stats and an age / production curve are not the same conceptually, and are not based on the same empirical underpinnings.

Your statement that "the two young centres are more likely to peak at the average point than not" is likely untrue, as it doesn't hold with most frequency distributions, especially when they don't have a very pronounced central tendency.

Consider the following frequency distribution... less than 1/3 of the individuals would "peak" in one of the two central age groupings (mean, median or mode). More than 2/3 would peak older or younger than either of the two central groups. More analysis would probably help to determine which would peak younger, which around the average, and which older than the average.

View attachment 90161
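The attachment is not reproduced here, but the disagreement can be sketched numerically: how much of a bell-shaped distribution falls in the central 24-26 window depends heavily on its spread. The snippet below is a rough illustration only. It assumes a normal shape centred on age 25 and treats age as continuous; the standard deviations tried are hypothetical and are not estimates from any real player data.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_within(low, high, mean, sd):
    """Share of the distribution falling between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)


# Hypothetical spreads (in years) around an assumed mean peak age of 25.
for sd in (1.0, 2.0, 3.0):
    share = share_within(24, 26, mean=25, sd=sd)
    print(f"sd = {sd}: share peaking between 24 and 26 = {share:.3f}")
```

With a tight spread most of the mass sits inside the window, and with a wide spread well under half does, so whether "more likely than not" holds is an empirical question about how concentrated the real distribution is.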

But we should probably stop derailing this thread...

garret9 · May 26, 2016

Whileee said:
Corsi stats and an age / production curve are not the same conceptually, and are not based on the same empirical underpinnings.

Your statement that "the two young centres are more likely to peak at the average point than not" is likely untrue, as it doesn't hold with most frequency distributions, especially when they don't have a very pronounced central tendency.

Consider the following frequency distribution... less than 1/3 of the individuals would "peak" in one of the two central age groupings (mean, median or mode). More than 2/3 would peak older or younger than either of the two central groups. More analysis would probably help to determine which would peak younger, which around the average, and which older than the average.

View attachment 90161

But we should probably stop derailing this thread...

Conceptually they are, but the point still holds that simple does not mean bad, and it still lets me argue that a simple model can be better than no model (like the assumptions that started this conversation) or a poor model (like plus/minus).

Now you are falling back on semantics to try to defend your argument.

1) The alternative argument being made by others in your example still has a lower probability of being right than the ones being made by Truck, myself, et al.

2) Last time I checked, the kurtosis is actually quite positive, not negative, especially if you use Marcel-based regressions to reduce the volatility of player performance metrics (for example, Scheifele's p/60 will likely fall next season).
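For anyone following along without the terminology: excess kurtosis compares a distribution with a normal curve of the same variance, coming out negative for flat, spread-out shapes and positive when values pile up at the centre with a few far-out stragglers. A minimal sketch of the calculation follows. Both age lists are made up purely for illustration and say nothing about real peak-age data.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


# Hypothetical peak ages, NOT real data.
# Flat case: peaks spread evenly across ages 20 to 30.
flat_ages = [age for age in range(20, 31) for _ in range(10)]
# Concentrated case: most peaks at 25, a few close by, a handful far out.
peaked_ages = [25] * 80 + [24] * 8 + [26] * 8 + [20, 21, 29, 30]

print("flat:  ", round(excess_kurtosis(flat_ages), 2))
print("peaked:", round(excess_kurtosis(peaked_ages), 2))
```

The flat list gives a negative value and the concentrated list a positive one, which is the distinction being argued over in point 2.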

Back to the topic at hand:
Scheifele may not have a normal trajectory
Barkov may not have a normal trajectory
A normal trajectory is still the most probable outcome, although it's not the only outcome.
Those who are certain of any outcome are kidding themselves.

Romang67 · May 26, 2016

I'm getting increasingly excited about that Statistics class I'm taking this summer.

buggs · May 26, 2016

Romang67 said:
I'm getting increasingly excited about that Statistics class I'm taking this summer.

That'll pass quickly.

GJF · May 26, 2016

How is your level of excitement growing? Linearly or exponentially?

Howard Chuck · May 26, 2016

garret9 said:
Sorry haha

Just kidding. I always get something out of these discussions.

garret9 · May 26, 2016

GJF said:
How is your level of excitement growing? Linearly or exponentially?

Logarithmically

pucka lucka · May 26, 2016

GJF said:
How is your level of excitement growing? Linearly or exponentially?

logarithmically

Howard Chuck · May 26, 2016

buggs said:
That'll pass quickly.

Like gas.....
