Adjusting for size

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
I’m working on something for a stats class where I’m trying to adjust production for size. Already have it adjusted for NHLe.

I’m using the 2000 NHL Central Scouting lists, as it’s the first one I could find that looks like it uses draft-year heights and weights.

Median height is 185.42 cm, weight 85.73 kg
Average height is 184.3 cm, weight 85.65 kg
Mode height is 182.88 cm, weight 83.91 kg
Standard deviation is 11.46 cm for height, 7.68 kg for weight

Should I just be adjusting for % of median/average/mode of NHL H/W? Should it be as a H/W ratio? Do standard deviations confuse things? (They do for me.) Under/over doesn’t matter, just the delta.

This is as much me talking through the process as anything.

Added: thinking that delta of weight should be as a % of the standard deviation.
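A minimal sketch of the "delta as a fraction of the standard deviation" idea, in Python rather than Excel. The means and standard deviations are the ones quoted above; the example prospect's measurements and the function name `abs_delta_in_sd` are made up for illustration.

```python
# Summary statistics quoted in the post above (2000 Central Scouting list).
MEAN_HEIGHT_CM = 184.3
MEAN_WEIGHT_KG = 85.65
SD_HEIGHT_CM = 11.46
SD_WEIGHT_KG = 7.68


def abs_delta_in_sd(value: float, center: float, sd: float) -> float:
    """Distance from the center in standard-deviation units, ignoring direction."""
    return abs(value - center) / sd


# Hypothetical prospect (not a real player): 175.26 cm, 77.0 kg.
height_delta = abs_delta_in_sd(175.26, MEAN_HEIGHT_CM, SD_HEIGHT_CM)
weight_delta = abs_delta_in_sd(77.0, MEAN_WEIGHT_KG, SD_WEIGHT_KG)
print(round(height_delta, 2), round(weight_delta, 2))
```

Taking the absolute value matches the "under/over doesn’t matter, just the delta" framing: a player 9 cm under and a player 9 cm over the center get the same number.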
 

Doctor No

Registered User
Oct 26, 2005
9,250
3,971
hockeygoalies.org
I think the answer to your question depends on what you're attempting to do with your results. What is your overall thesis that you're trying to test?

If it's related to a general "hockey players are getting larger and therefore a fixed height/weight becomes relatively smaller as time progresses", then I'd probably try to fit a percentile distribution across each era (so if six-foot-zero is 85th percentile in 1960, and six-foot-three is 85th percentile in 2000, you'd compare those directly).

Of course, you might consider doing this exercise on the general population (and not on the hockey player population), since that would avoid the biases that various general managers and coaches have had over the years (where they prefer size some years, quickness other years, et cetera).
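A rough sketch of the percentile comparison in Python. The two height lists are invented for illustration only (they are not real rosters), and `percentile_rank` is a hypothetical helper name.

```python
from bisect import bisect_right


def percentile_rank(value, population):
    """Percent of the population at or below the given value."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Made-up heights in inches for two eras, for illustration only.
heights_1960 = [68, 69, 70, 70, 71, 71, 72, 72, 73, 74]
heights_2000 = [70, 71, 72, 72, 73, 73, 74, 75, 75, 76]

rank_1960 = percentile_rank(72, heights_1960)
rank_2000 = percentile_rank(75, heights_2000)
print(rank_1960, rank_2000)
```

Players would then be compared on their percentile within their own era rather than on raw height.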
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
Thesis is that in order to play hockey at higher levels successfully, the farther away from average size, the higher the scoring rates needed.

The smaller the height, the more the other skills are needed. The greater the height, the higher the scoring rate needed to demonstrate skill exceeding size. Being outside one standard deviation for weight indicates a body type that makes it tough to perform: too heavy and the player isn’t skating with a good power-to-weight ratio; too skinny and they get pushed around.

I’m trying to normalize point production, accounting for size and league. The latter is largely done, the former is largely unexplored because historical draft year physical data is rather patchy.
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
Formula in Excel is:
(P*e)*(((H-h)/stddev_h)-((W-w)/stddev_w))
P = ppg
e = NHLe
H = height
h = mode pop height
W = weight
w = mode pop weight
stddev_h, stddev_w = standard deviations of height and weight

Added: at some point I’ll have to add an ABS to the right side of the equation to adjust properly.

Added: the minus between the height and weight terms will be changed to a +, with ABS nested inside both the height and weight terms. H-W makes for a good rule-of-thumb ratio, but makes for too narrow a range. The biggest problem is data entry; Excel keeps crashing every couple of minutes. Only 210 more entries to go!!
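For anyone following along outside Excel, here is one literal reading of the revised formula (ABS inside each term, terms added) as a Python sketch. The function and argument names are made up, and the NHLe value in the example is a placeholder, not a real translation factor.

```python
def size_adjusted_production(ppg, nhle, height, weight,
                             mode_height, mode_weight, sd_height, sd_weight):
    """(P * e) * (|H - h| / sd_h + |W - w| / sd_w), read literally from the post."""
    height_term = abs(height - mode_height) / sd_height
    weight_term = abs(weight - mode_weight) / sd_weight
    return (ppg * nhle) * (height_term + weight_term)


# Hypothetical player; mode and standard deviation values are from the first post.
example = size_adjusted_production(
    ppg=1.2, nhle=0.3,            # 0.3 is a placeholder NHLe, not a real factor
    height=190.5, weight=95.0,
    mode_height=182.88, mode_weight=83.91,
    sd_height=11.46, sd_weight=7.68,
)
print(example)
```

With ABS in place, a player the same distance above or below the mode gets the same multiplier, which is the behavior described in the edit notes above.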
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
So the formula works, I just can’t get a column of the end product to autofill. Cells that should stay at one value change.

Apparently I’m using the OFFSET function improperly, but I’m too knackered to try to fix it right now.

Data entry sucks when Excel crashes every other time you try to paste numbers.

It also sucks trying to find certain European players, determining which league’s data to use, then adjusting the NHLe manually.

It would help if I were an Excel ninja, but I pretty much stopped at Lotus 1-2-3.
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
So I got everything sorted using a logarithm. Adjusting for size is somewhat problematic with taller players, but otherwise the numbers track rather well when separating forwards and D. The only true outliers are guys whose weight is well above what it should be for their height. I was using CJ Turtoro’s NNHLe, and some of the values are really off (I was using the 2000 draft; the Czech, Russian & QMJHL leagues threw off some crazy projections that didn’t match outcomes).

I took the range of heights, divided by 2 to establish the base (which actually matched up close to the NHL average), then took the height minus the minimum height.

About to use the 2020 draft. Boy, there are some tiny guys ranked.

Added: wrote down the formulas, scraped CSB for this year’s prospects, got the OFFSET function working (turns out it was the sheet that was broken, not the table), and it looks pretty good.
D are always rated lower, but with real wide variance.
European players are still somewhat problematic due to potential flaws in NNHLe (in-season player transfers between leagues are rare, except between juniors & men’s leagues).

Whenever I wake up I’m gonna work up a separate stats line for players who played in multiple leagues, then weight the production in a men’s league.
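One possible shape for that multi-league stats line, sketched in Python. Everything here is an assumption for illustration: the `Stint` record, the `mens_weight` of 1.5, and the translation factors in the example are placeholders, not values from NNHLe or from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    league: str
    games: int
    points: int
    nhle: float        # league translation factor (placeholder values below)
    mens_league: bool


def blended_ppg(stints, mens_weight=1.5):
    """Games-weighted, NHLe-adjusted points per game.

    Games played in a men's league count `mens_weight` times as much
    as junior games. The 1.5 default is an arbitrary assumption.
    """
    weighted_points = 0.0
    weighted_games = 0.0
    for s in stints:
        w = mens_weight if s.mens_league else 1.0
        weighted_points += w * s.points * s.nhle
        weighted_games += w * s.games
    return weighted_points / weighted_games if weighted_games else 0.0


# Hypothetical season split between a junior league and a men's league.
season = [
    Stint("junior", games=20, points=30, nhle=0.1, mens_league=False),
    Stint("mens", games=30, points=9, nhle=0.5, mens_league=True),
]
print(blended_ppg(season))
```

The weight would need to be tuned against actual outcomes; the sketch only shows where such a weight would enter the calculation.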
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
Logarithm?!?
Indeed

=LOG(height - minimum height of range, base=average height-minimum height)

So average height=1

Added:
The difference in modification between the shortest player (5’5”, .079) and the tallest (6’9”) is 2.05.
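The same LOG expression as a small Python sketch, for checking values outside the spreadsheet. The minimum height of 160 cm in the example is a made-up placeholder (the actual minimum of the range is not stated above); the 184.3 cm average is from the first post.

```python
import math


def height_factor(height, min_height, avg_height):
    """Log of (height - min) in base (avg - min); equals 1.0 at average height."""
    offset = height - min_height
    base = avg_height - min_height
    if offset <= 0 or base <= 0 or base == 1:
        raise ValueError("height must exceed min_height; avg - min must be positive and not 1")
    return math.log(offset, base)


# Placeholder minimum of 160 cm; average of 184.3 cm from the first post.
print(height_factor(184.3, 160.0, 184.3))  # average height maps to 1.0
print(height_factor(200.0, 160.0, 184.3))  # above average: greater than 1
print(height_factor(170.0, 160.0, 184.3))  # below average: less than 1
```

Note that the factor is undefined for a player exactly at the minimum height, so the minimum of the range has to sit below the shortest player actually scored.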
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
I for one have often wondered how Crosby's career would have gone had he been 6'2 instead of 5'10...
Easy to joke, but with power laws, trying to discern who’s better than whom as you go out along the curve is hard. I’m trying to figure out if I can build a better mousetrap and work on my spreadsheet & database skills.
As to your question: Crosby would score a 33.32. In comparison, Lafreniere is a 26, Byfield a 20.

Power laws.
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
Here's a Hot 100. Defenders are ranked way lower than they would be otherwise, but the types of forwards who slide are surprising, so I'm going to have to weight Swedish players' junior production for playing in a men's league. Links go to NHL.com.

Player    Score
LAFRENIERE, ALEXIS    26.83899699
BYFIELD, QUINTON    20.66156068
STUETZLE, TIM    19.77761125
JARVIS, SETH    18.06829564
LUNDELL, ANTON    18.028859
FOERSTER, TYSON    17.76187242
ZARY, CONNOR    17.50364398
PERFETTI, COLE    17.41761194
MERCER, DAWSON    17.29638173
HOLTZ, ALEXANDER    16.1797642
AMIROV, RODION    14.62335744
QUINN, JACK    14.39408529
FINLEY, JACK    13.39859623
BOURQUE, MAVRIK    13.01662768
CHROMIAK, MARTIN    12.72762177
ROBINS, TRISTEN    11.32436441
BRISSON, BRENDAN    11.15811978
MYSAK, JAN    11.00997739
KHUSNUTDINOV, MARAT    10.9519089
DRYSDALE, JAMIE    10.56295669
ROSSI, MARCO    10.27117966
FRANCIS, RYAN    10.20283912
SUNI, OLIVER    9.786571697
SOURDIF, JUSTIN    9.771206975
LAPIERRE, HENDRIX    9.76561195
DUFOUR, WILLIAM    9.628943498
RAYMOND, LUCAS    9.621289122
NOVAK, PAVEL    9.090791956
PYTLIK, JAROMIR    8.924688087
TORGERSSON, DANIEL    8.757174606
KERINS, RORY    8.544313563
TULLIO, TYLER    7.945170188
KNAK, SIMON    7.934384707
CARDWELL, ETHAN    7.550763984
BARRON, JUSTIN    7.474078587
CUYLLE, WILLIAM    7.304828899
COLANGELO, SAM    6.886310384
SEBRANGO, DONOVAN    6.882683148
ANSONS, RAIVIS    6.653807373
NEIGHBOURS, JAKE    6.642073383
HIRVONEN, RONI    6.518997252
BORDELEAU, THOMAS    6.355902307
JARVENTIE, ROBY    6.23068405
O'ROURKE, RYAN    6.196050451
WIESBLATT, OZZY    5.572166719
STRANGES, ANTONIO    5.534837393
SCHNEIDER, BRADEN    5.328658108
GUSHCHIN, DANIL    5.188937619
NIEDERBACH, THEODOR    5.064210642
PUUTIO, KASPER    4.867825182
PERREAULT, JACOB    4.85643613
PONOMAREV, VASILIY    4.849978604
EVANGELISTA, LUKE    4.842143323
BIONDI, BLAKE    4.616208984
CORMIER, LUKAS    4.373098993
GUHLE, KAIDEN    4.363091237
KUZNETSOV, YAN    4.270229925
GREIG, RIDLY    4.160506471
REICHEL, LUKAS    4.01189747
BERARD, BRETT    3.970765717
MCCLENNON, CONNOR    3.861102953
THOMPSON, JACK    3.800012518
UENS, ZACHARY    3.798614823
SLAGGERT, LANDON    3.773948472
LAFERRIERE, ALEX    3.703902129
SEDOFF, CHRISTOFFER    3.610651375
FOUDY, JEAN-LUC    3.429365206
SEELEY, RONAN    2.586491463
VILLENEUVE, WILLIAM    2.406443275
SANDERSON, JAKE    2.379998684
TUCH, LUKE    2.262640463
COE, BRANDON    2.227290081
POIRIER, JEREMIE    2.14490459
ROCHETTE, THEO    1.808372997
GONCALVES, GAGE    1.396765166
MILLER, MITCHELL    1.364916101
GUNLER, NOEL    0.918355673
VIERLING, EVAN    0.713837018
YODER, CHASE    0.67276185
AMBROSIO, COLBY    0.52210218
FABER, BROCK    0.378866742
EDWARDS, ETHAN    0.211604946
COTTON, ALEX    -0.214598548
REID, LUKE    -0.516540856
COSTANTINI, MATTEO    -0.631105249
FOWLER, HAYDEN    -1.023782121
SMILANIC, TY    -1.083147148
GRANS, HELGE    -1.11785251
BENNING, MICHAEL    -1.242429403
KAISER, WYATT    -1.392188715
KLEVEN, TYLER    -1.434529452
HANAS, CROSS    -1.438152624
KUNZ, JACKSON    -1.67552867
SOKOLOV, EGOR    -1.848947317
JEFFERIES, ALEX    -1.8925893
PETERSON, DYLAN    -2.039514565
FARRELL, SEAN    -2.528749861
HUNT, DAEMON    -3.10886953
RAFKIN, RUBEN    -3.201396648
POWELL, EAMON    -3.506909577
RATZLAFF, JAKE    -3.981074104
CALISTI, ROBERT    -4.6234642
NICKL, THIMO    -4.986657276
KOSIOR, LANDON    -5.000246204
HOLLOWAY, DYLAN    -5.159344146
DURAN, RILEY    -5.306970865
WALLINDER, WILLIAM    -5.415285268
PETERKA, JOHN-JASON    -5.630851509
JURMO, JONI    -7.170976838
NIEMELA, TOPI    -7.801141238
TRUSCOTT, JACOB    -9.059748022
VIRO, EEMIL    -9.737607553
WISDOM, ZAYDE    -9.792507452
SAVOIE, CARTER    -10.67067888
MOORE, IAN    -10.762166
MUKHAMADULLIN, SHAKIR    -12.22166696
ANDRAE, EMIL    -12.66253317
SHLAINE, ARTEM    -14.47705935
DICKINSON, TANNER    -17.08643283
SCHINGOETHE, WYATT    -19.27571426
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
What do these numbers intend to represent, and why are they so precise?
What the numbers mean is in the formulas above (I think I need to edit the H/W ratio to include the power formula).

((scoring rate*NNHLe)*82 games)*(adjusted height-adjusted body mass)

It’s early days of actually working things out, as I’ve written above. I probably need an IF statement to account somewhat for taller players, and I need to weight players’ scoring better to account for NNHLe issues in men’s leagues (probably modifying it using aging curves, which also makes the NTDP a bit of a black box), but now I need to start migrating over to working on my database skills.
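The score formula above, written out as a Python sketch so the pieces are explicit. The function name and the example inputs are hypothetical; "adjusted height" and "adjusted body mass" are taken as already-computed inputs, since their exact definitions are still being revised.

```python
GAMES_PER_SEASON = 82


def prospect_score(points_per_game, nnhle, adjusted_height, adjusted_body_mass):
    """((scoring rate * NNHLe) * 82 games) * (adjusted height - adjusted body mass)."""
    projected_points = points_per_game * nnhle * GAMES_PER_SEASON
    return projected_points * (adjusted_height - adjusted_body_mass)


# Hypothetical inputs, not taken from the spreadsheet.
score = prospect_score(points_per_game=1.5, nnhle=0.25,
                       adjusted_height=1.1, adjusted_body_mass=0.3)
print(score)
```

Under this reading the score goes negative whenever the adjusted body mass term exceeds the adjusted height term, regardless of how much the player scores.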
 

ForsbergForever

Registered User
May 19, 2004
3,325
2,046
I'd like to understand this but am too busy to devote much brain power to it, so in layman's terms can you explain what it means that Crosby is a 33 and Lafreniere is a 26?
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
To be honest, I’m still working that out; I would need to back test more (and have historical NNHLe numbers) to get a better understanding.

The general idea is that there’s a range of scores and certain types of players will fall into ranges consistently. Replacement level players will have mildly negative numbers.
Gretzky is a 31.11, Lemieux a model-busting 53.59.
Hopefully this picture posts:
[Attached image: 29767A96-ED72-40BC-9FAF-071C2748E96B.jpeg]

ForsbergForever

Registered User
May 19, 2004
3,325
2,046
But what do the numbers indicate about a player's height and weight? I feel like this thread is basically a "who's on first?" routine. Do the numbers mean that a certain player's production is more a product of their size than of talent, or vice versa? Eric Lindros and Mario Lemieux are both 6'4 and 220+ lbs, whereas Theo Fleury is 5'6 but scored a ton regardless. Some players rely on being big to push their way to the net, whereas others like Fleury and, for a more recent example, Johnny Gaudreau can use being small as an advantage, weaving through bigger guys with speed and agility. Gretzky was 6'0 but barely 170 lbs. So does he get a lower or a higher score relative to the mean, given that his size clearly had nothing to do with his production?
 

TheWhiskeyThief

Registered User
Dec 24, 2017
1,625
496
The thesis is explained above in the thread.
 
