Forecasting players' career stats

Hockey Outsider

Registered User
Jan 16, 2005
9,144
14,456
PLAYER FORECASTS

In this series of posts, I present a system that can be used to project how many goals, assists and points a player may score in the future.

I want to emphasize that this system is just for fun and shouldn't be taken too seriously.

The general concept is borrowed from Alan Ryder (see his 2007 article - How Good is Sidney Crosby?). He hasn't posted very much lately but his research has been invaluable. He has a deep understanding of hockey and an equally strong knowledge of math.

Essentially, this system looks at a group of young hockey prodigies (which I'll discuss in more detail in a following post). The player's actual performance, career to date, is compared to the peer group's performance at those same ages. We then calculate how the player compares to the peer group in terms of health (number of games played), goals per game, and assists per game. The model then projects the rest of the player's career, assuming that he'll maintain the same level of out-performance over the rest of his career. This is a big assumption, of course, but the out-performance ratios are more stable than one might think.
 

Hockey Outsider

A DETAILED EXAMPLE

Let's take Connor McDavid as an example. We'll take his actual results from ages 19 to 22 (the 2016 through 2019 seasons). We'll also extrapolate how he's done this year (age 23 - 2020 season). From ages 19 to 23, he played in 369 games out of a possible 410 (90.0%). This put him about 11% ahead of the peer group, which averaged 333 games (normalized to an 82 game schedule). Thus, the rest of his career, we'll assume that he'll continue to play 11% more games each season than his peer group. This, of course, is a significant assumption. A single bad play can end his career. It's also possible that the games he missed in 2016 were a fluke, and he achieves a Howe-like level of health the rest of the way. We simply have no way of knowing, so we'll proceed on the basis that past data is the best foundation for forecasting the future.

We do the same thing with goals. McDavid has averaged about 0.49 adjusted goals per game from ages 19 to 23. (I take the "adjusted" goals from Hockey-Reference.com; more on that later.) His peer group averaged about 0.46 adjusted goals per game during that same age range. So McDavid has scored about 7% more goals per game than the peer group (his year-by-year results fall within a pretty narrow range - from 100% to 115% of the group average).
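For anyone who wants to check the arithmetic, here is a minimal sketch of the two out-performance ratios, using only the figures quoted above. The function and variable names are my own labels for illustration, not anything from the actual spreadsheet.

```python
# Figures quoted in the posts above (McDavid, ages 19-23, 2020 pro-rated).
player_games = 369
peer_games = 333            # peer-group average, normalized to 82-game seasons
player_goal_rate = 0.49     # adjusted goals per game
peer_goal_rate = 0.46


def outperformance(player_value, peer_value):
    """Ratio of the player's figure to the peer-group figure."""
    return player_value / peer_value


health_ratio = outperformance(player_games, peer_games)
goal_ratio = outperformance(player_goal_rate, peer_goal_rate)

print(f"health ratio: {health_ratio:.3f}")
print(f"goal ratio:   {goal_ratio:.3f}")
```

Rounded to two decimals, these are the 11% and 7% figures used in the rest of the example.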

Let's look ahead to 2021 (age 24). The average 24-year-old in the peer group played in about 72 games. As discussed above, we're assuming McDavid will be about 11% healthier than his peer group. So we assume he'll play in about 79 games. The average 24-year-old in this group scored 0.49 goals per game. We're assuming McDavid will score about 7% more goals per game than the peer group, so that works out to 42 goals next year (71.6 games * 1.11 * 0.49 * 1.07).
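The single-season calculation can be sketched as a small function. This is only an illustration of the multiplication described above; `project_season` and its parameter names are hypothetical, and the inputs are the rounded figures from the post.

```python
def project_season(peer_games, peer_rate, health_ratio, rate_ratio):
    """Project one season from peer-group averages and out-performance ratios.

    Returns (projected games, projected total for the stat in question).
    """
    games = peer_games * health_ratio
    total = games * peer_rate * rate_ratio
    return games, total


# Age-24 peer averages and McDavid's ratios, as quoted in the post.
games, goals = project_season(
    peer_games=71.6,    # average games by 24-year-olds in the peer group
    peer_rate=0.49,     # peer-group goals per game at age 24
    health_ratio=1.11,  # McDavid vs. peers, games played
    rate_ratio=1.07,    # McDavid vs. peers, goals per game
)
print(round(games), round(goals))
```

The same function would handle assists by passing the peer group's assists per game and the player's assist ratio instead.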

We continue to do this calculation, each year into the future, for both goals and assists. (We get total points, of course, by adding the two together). I'll talk about how I deal with the question of when a player retires below.

This sounds like a lot of work but, now that the spreadsheet is up and running, it takes me around 10 minutes to do a complete career forecast for one player.
 

Hockey Outsider

DEFINING THE PEER GROUP

Who is this "peer group" that I keep referring to? I've looked at players who were young offensive talents (defined below), from a certain range of years (also defined below), with a few subjective adjustments (and I'm upfront about that, so don't accuse me of tampering with the data).

In terms of the years - I looked at seasons between 1980 and 1999. Why these years? Player longevity has increased over time, so I didn't want to go too far back. Otherwise, we'd get players from previous generations - like Guy Lafleur, Bryan Trottier, Mike Bossy - who tended to have much shorter careers (at least as top offensive talents) than modern players. That didn't feel like an appropriate peer group. I also excluded seasons after 1999, because I wanted to include players who had completed their careers.

In terms of young offensive talents - I've defined this as players who had at least one season with 85 adjusted points (per hockey-reference.com) at age 22 or younger. I'm not really interested in forecasting the career totals of solid third-liners; I want to forecast the league's best young stars. Of course there are players like Martin St. Louis who don't become stars until later in their careers, but the reality is most Hall of Fame forwards demonstrate their abilities early on. Players like St. Louis are, by definition, exceptions, and you really can't forecast exceptions.

The original peer group consists of the following 28 players - Jason Allison, Rob Brown, Pavel Bure, Jimmy Carson, Paul Coffey, Theo Fleury, Peter Forsberg, Michel Goulet, Wayne Gretzky, Dale Hawerchuk, Jaromir Jagr, Paul Kariya, Mario Lemieux, Eric Lindros, Mark Messier, Owen Nolan, Barry Pederson, Mark Recchi, Mikael Renberg, Luc Robitaille, Jeremy Roenick, Joe Sakic, Denis Savard, Teemu Selanne, Mats Sundin, Keith Tkachuk, Pierre Turgeon and Steve Yzerman.

I've subjectively removed four players from the group. Paul Coffey is out - he's the only defenseman, and I'm only interested in forecasting forwards. Rob Brown is gone - he qualifies because of his 1989 season, where he very clearly benefited from playing on Lemieux's line (and this was the only time in his career he even approached 85 adjusted points). I also removed Mikael Renberg and Owen Nolan (who both qualified because of the shortened 1995 season - flukes are more likely to take place in a shorter season - and neither ever really repeated the offensive performance demonstrated that year).
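The selection rule can be sketched as a simple filter. This is a rough illustration only: the `Season` record, the `peer_group` function, and the toy rows are all hypothetical (the player names and numbers in `toy` are made up), and I'm assuming the qualifying season itself has to fall in the 1980-1999 window. The four default exclusions are the ones named above.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    year: int               # season-ending year, e.g. 1989
    age: int
    adjusted_points: float


# The four subjective removals named in the post.
POST_EXCLUSIONS = frozenset(
    {"Paul Coffey", "Rob Brown", "Mikael Renberg", "Owen Nolan"}
)


def peer_group(seasons, exclusions=POST_EXCLUSIONS, min_points=85,
               max_age=22, first_year=1980, last_year=1999):
    """Players with at least one qualifying season, minus manual exclusions."""
    qualifiers = {
        s.player
        for s in seasons
        if s.adjusted_points >= min_points
        and s.age <= max_age
        and first_year <= s.year <= last_year
    }
    return sorted(qualifiers - set(exclusions))


# Made-up rows, purely to exercise each branch of the rule.
toy = [
    Season("Player A", 1985, 21, 96.0),   # qualifies
    Season("Player B", 1990, 23, 110.0),  # too old
    Season("Player C", 1978, 20, 101.0),  # outside the year window
    Season("Player D", 1995, 22, 84.0),   # just under 85 points
    Season("Player E", 1989, 20, 90.0),   # qualifies, then removed by hand
]
print(peer_group(toy, exclusions={"Player E"}))
```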

Of the 24 players left in the sample, 17 are in the Hall of Fame (including Jagr, who will certainly be inducted). Four more are borderline (Fleury, Roenick, Turgeon and, maybe generously, Tkachuk). The remaining three (Allison, Carson, and Pederson) faded early, so the peer group inherently builds in the possibility that a young star doesn't last. These are the players the individuals in the study are compared to.

(In case someone asks why I'm using hockey-reference.com's adjusted stats as the benchmark when I've been critical of their system - their system clearly fails in comparing players prior to expansion. That being said - if we're only looking at the eighties onwards, it's not perfect, but it's a reasonable starting point).
 

Hockey Outsider

RETIREMENTS

An interesting question is - how do we deal with player retirements? Some players in the study had very long careers - Jagr played until he was 45, Messier and Selanne until 43, and Recchi until 42. It doesn't make sense to forecast that 23-year-old Connor McDavid will score 5 goals and 10 assists at age 42, reflecting the small probability that he'll play that long. It makes more sense to have a specific cut-off point, after which we assume a player retires.

Looking back on the data (and, really, just using common sense), players retire when they get old, and their offensive production drops. Through trial and error, I found that a reasonable predictor is that a player will retire in any season when 1) they're at least 30 years old and 2) their production over the past two seasons is 100 (adjusted) points or fewer.

The "at least 30" criterion prevents us from falsely retiring players who have a couple of down years in a row, whether due to injury (such as Pavel Bure after 1996) or because they're young and haven't broken out yet (such as Keith Tkachuk after 1993).

The "100 points or fewer" criterion is intended to reflect a deterioration in play. Some players hang on longer because they became good defensive forwards (such as Steve Yzerman), but that's an exception rather than the rule.

These criteria successfully identify the final season for 19 of the 24 players in the study. (One of the exceptions is Jimmy Carson - an outlier, as he's the only player in this group to retire in his twenties). Not bad for a simple, two-criteria approach.
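Here is a minimal sketch of the retirement rule. Two assumptions on my part: "over the past two seasons" is read as the combined total of the current and previous season, and the projected points from the McDavid forecast later in this thread are treated as if they were adjusted points. The function names are hypothetical.

```python
def is_final_season(age, prev_points, current_points,
                    min_age=30, max_two_year_points=100):
    """True when both retirement criteria are met in the current season."""
    two_year_total = prev_points + current_points
    return age >= min_age and two_year_total <= max_two_year_points


def final_season(rows):
    """rows: (season, age, points) in order; returns the first season
    that meets the rule, or None if no season does."""
    for (_, _, prev_pts), (season, age, pts) in zip(rows, rows[1:]):
        if is_final_season(age, prev_pts, pts):
            return season
    return None


# (season, age, projected points) from the McDavid forecast, ages 29-36.
mcdavid = [
    (2026, 29, 100), (2027, 30, 101), (2028, 31, 90), (2029, 32, 69),
    (2030, 33, 69), (2031, 34, 59), (2032, 35, 51), (2033, 36, 42),
]
print(final_season(mcdavid))
```

Under this reading, 2032 does not trigger the rule (59 + 51 = 110), while 2033 does (51 + 42 = 93), which lines up with where that forecast ends.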

There are a number of "false positives" though - players who were predicted to retire before they actually did. In many cases I'm not too worried (for example, Forsberg's 2008 season is technically a false positive, because he came out of retirement to play two games in 2011). A lot of times a player hangs around one year longer than expected (Hawerchuk, Kariya) or bounces back from a mid-career slump (Selanne, Robitaille). The fact that there are quite a few false positives probably means I'm conservative in making my estimates - that is, I probably assume that players end their careers a bit earlier than they actually will. But it's easy enough to make wild extrapolations, so I don't think it's wrong to have some conservatism baked into the method.
 

Hockey Outsider

CASE 1: CONNOR MCDAVID

Season  Age  Goals  Assists  Points
2016    19   16     32       48
2017    20   30     70       100
2018    21   41     67       108
2019    22   41     75       116
2020    23   47     80       127
2021    24   42     80       122
2022    25   41     85       125
2023    26   43     87       130
2024    27   35     71       106
2025    28   34     68       102
2026    29   31     69       100
2027    30   33     68       101
2028    31   28     62       90
2029    32   20     49       69
2030    33   21     48       69
2031    34   17     42       59
2032    35   17     34       51
2033    36   13     29       42
TOTAL        550    1,116    1,667

Let's get to the good stuff. (A reminder to any readers in the future: this study was posted on February 2nd, 2020. For all players, I pro-rated their 2020 results. Their actual results for that season might be different).

McDavid is forecasted to score 550 goals in his career - very impressive, since he's not primarily a goal-scorer. As some of us know, goal-scoring tends to peak relatively early in a player's career. He's forecasted to hover around 40 goals for the next three seasons (ages 24 to 26), before gradually declining.

McDavid is projected to record over 1,100 assists. That would place him seventh all-time, behind only Gretzky, Francis, Messier, Bourque, Jagr, and Coffey. That's extremely impressive, especially since all of those players (except Jagr) spent a significant portion of their careers in the high-scoring eighties.

Ultimately, McDavid is projected to finish his career with 1,667 points, retiring in 2033. At that point, he'd be 9th all-time (behind Lemieux, but ahead of Sakic). Incredibly, McDavid is expected to score at least 100 points for the next seven years. That would give him 11 such seasons in total (assuming he maintains the pace in 2020), all consecutive. That would give him more 100 point seasons than any player in NHL history except Gretzky. If McDavid actually does this, it would be one of the greatest accomplishments in NHL history.
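As a quick sanity check on the "11 consecutive seasons" claim, here is a small sketch that counts the longest run of 100-point seasons in the table above. The `longest_streak` helper is just an illustrative name; the points are the actual, pro-rated, and projected totals from the table.

```python
def longest_streak(points, threshold=100):
    """Length of the longest run of consecutive seasons at or above threshold."""
    best = current = 0
    for season_points in points:
        current = current + 1 if season_points >= threshold else 0
        best = max(best, current)
    return best


# McDavid's points by season, 2016 through 2033, from the table above.
mcdavid_points = [48, 100, 108, 116, 127, 122, 125, 130, 106,
                  102, 100, 101, 90, 69, 69, 59, 51, 42]
print(longest_streak(mcdavid_points))
```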

Of course, I wouldn't bet on McDavid scoring 100 points for 11 years in a row. A single injury could ruin his chances. But it's still amazing that it's within the realm of possibility.
 

Hockey Outsider

CASE 2: AUSTON MATTHEWS

Season  Age  Goals  Assists  Points
2017    19   40     29       69
2018    20   34     29       63
2019    21   37     36       73
2020    22   58     40       98
2021    23   58     44       102
2022    24   54     43       97
2023    25   52     45       98
2024    26   56     47       102
2025    27   45     38       83
2026    28   43     37       80
2027    29   40     37       77
2028    30   42     37       79
2029    31   36     33       69
2030    32   26     26       52
2031    33   27     26       53
2032    34   22     23       44
TOTAL        670    570      1,240

Let's look at Auston Matthews - one of the most polarizing players on HFBoards. He seems to be overrated by Leafs fans, and underrated by everyone else. But, as of today, he's on pace to set career highs in goals and points at age 22, and that helps his projection.

Matthews is forecasted to end his career with 670 goals. That seems like a high number to me (he's less than a quarter of the way there), but the system sees his best days as ahead of him. He's easily on track to reach 50 goals this year. The system projects him to reach fifty in each of the next four years as well, before his goal-scoring ability drops off sharply in his early thirties.

Matthews has been a poor playmaker thus far, but players generally accumulate more assists as they get older. He'll never be mistaken for Adam Oates, but his assist totals should (somewhat) stabilize his loss in goal-scoring ability as he gets older.

Ultimately Matthews is forecasted to reach 670 goals - that would rank him 13th all-time (a bit behind the Messier/Yzerman/Lemieux/Selanne logjam that Ovechkin just leapfrogged). His 1,240 points put him around where Crosby and Ovechkin are today. Who knows how long he'll stay in Toronto - but this would make him easily the franchise leader in goals and points.
 

Hockey Outsider

CASE 3: ILYA KOVALCHUK (who didn't leave the NHL)

I was always sure that Kovalchuk would earn a spot in the Hall of Fame. His leaving for the KHL almost certainly ruined that. Let's explore what might have happened, had he stayed in the NHL after 2013.

Season  Age  Goals  Assists  Points
2002    18   29     22       51
2003    19   38     29       67
2004    20   41     46       87
2005    21   0      0        0
2006    22   52     46       98
2007    23   42     34       76
2008    24   52     35       87
2009    25   43     48       91
2010    26   41     44       85
2011    27   31     29       60
2012    28   37     46       83
2013    29   11     20       31
2014    30   41     41       82
2015    31   35     37       72
2016    32   25     29       55
2017    33   26     29       55
2018    34   21     25       46
2019    35   21     21       42
TOTAL        587    580      1,168

Kovalchuk is primarily a goal-scorer and, as I've discussed, goal-scorers tend to peak relatively young. By the time he left the NHL, his best days were probably behind him. But staying would have allowed him to build up his career totals to amounts that the Hall wouldn't have been able to ignore.

Overall the model projects that Kovalchuk, had he not gone to the KHL, would have scored 587 goals in the NHL (more than Mike Bossy). I think Kovalchuk had the talent (and consistency) to score 600 goals - he very likely would have done that, had he not missed the 2005 season due to the lockout.

His 1,168 points are impressive, but not overwhelming. That puts him on the same level as Brind'Amour, Alfredsson and Hossa - but with much poorer defensive play, and a weaker playoff resume. Still, I think Kovalchuk likely would have eventually earned a spot in the Hall, especially if he beat expectations and reached the arbitrary (but psychologically important) 600 goal barrier.
 

Hockey Outsider

CASE 4: NATHAN MACKINNON

Season  Age  Goals  Assists  Points
2014    18   24     39       63
2015    19   14     24       38
2016    20   21     31       52
2017    21   16     37       53
2018    22   39     58       97
2019    23   41     58       99
2020    24   50     70       120
2021    25   36     63       99
2022    26   39     64       103
2023    27   31     53       84
2024    28   30     51       81
2025    29   28     51       79
2026    30   30     51       80
2027    31   25     46       71
2028    32   18     36       55
2029    33   19     35       54
2030    34   15     31       46
TOTAL        476    798      1,275

There was recently a poll asking if MacKinnon had more talent than Joe Sakic. It's a stretch comparing a 24-year-old to one of the top forty players in hockey history - but it's not entirely a laughable comparison. It's doubtful MacKinnon will ever match Sakic's defensive play, or his ability to elevate his game in the playoffs, but strictly in terms of natural athletic talent - it's not a ridiculous question.

One thing that hurts MacKinnon in this analysis is that, after winning the Calder (63 points at age 18), he treaded water for the next three seasons. But he finished just shy of 100 points at ages 22 and 23, and looks to shatter that mark at age 24.

Even with that slow start, he's forecasted to finish his career with 1,275 points (including nearly 800 assists). That puts him well behind Sakic, but it should at least get him into the Hall of Fame conversation.
 

Hockey Outsider

CASE 5: SIDNEY CROSBY

Season  Age  Goals  Assists  Points
2006    18   39     63       102
2007    19   36     84       120
2008    20   24     48       72
2009    21   33     70       103
2010    22   51     58       109
2011    23   32     34       66
2012    24   8      29       37
2013    25   15     41       56
2014    26   36     68       104
2015    27   28     56       84
2016    28   36     49       85
2017    29   44     45       89
2018    30   29     60       89
2019    31   35     65       100
2020    32   15     31       46
2021    33   22     42       64
2022    34   17     37       54
2023    35   18     30       48
2024    36   13     26       39
TOTAL        531    936      1,467

All right, I expect to get flamed after posting this one. The model projects that Crosby will only play four more seasons after 2020, with a huge drop in his scoring.

To clarify - the model doesn't expect Crosby's offense to vanish. Instead, it forecasts that Crosby will have trouble staying healthy. (He's actually forecasted to score more than a point per game every season through age 36, which is truly remarkable). However, he's missed at least a quarter of the season five times in fifteen years so far. The model doesn't like that - especially since players have a harder time staying in the lineup as they get older.

It's also worth noting that Crosby has gradually become a solid two-way forward, which could extend his career if he loses his scoring touch (as it did for Steve Yzerman and Bryan Trottier, among others).

Overall the model projects Crosby to finish his career with 1,467 points (the same as Stan Mikita, who sits 15th all-time). As great as that is, it still feels underwhelming for a player of Crosby's talent. He surely has the talent, and the drive, to reach 1,600 points - it will largely be a matter of him staying healthy.
 

Hockey Outsider

CASE 6: ALEXANDER OVECHKIN

Season  Age  Goals  Assists  Points
2006    20   52     54       106
2007    21   46     46       92
2008    22   65     47       112
2009    23   56     54       110
2010    24   50     59       109
2011    25   32     53       85
2012    26   38     27       65
2013    27   32     24       56
2014    28   51     28       79
2015    29   53     28       81
2016    30   50     21       71
2017    31   33     36       69
2018    32   49     38       87
2019    33   51     38       89
2020    34   59     27       86
2021    35   27     21       48
2022    36   21     18       38
TOTAL        764    618      1,383

All right, this post might get me flamed too. The system predicts that Ovechkin will be out of the NHL within two years.

As I mentioned in another recent post, Ovechkin is in uncharted waters. Nobody aside from Gordie Howe has been this good of a goal-scorer (and this consistent), so late into a career. Any statistical model that relies on historical data (as mine does) will suggest that Ovechkin will stop scoring goals - and quickly too.

One strike against Ovechkin is his assist totals have been relatively poor for a player of his calibre for nearly a decade. It's been nine years since he reached the relatively low total of 40 assists (accomplished by 138 different players through 2019). As players tend to rely more on playmaking as they become older, Ovechkin's point totals are damaged as a result.

Overall, the system forecasts that Ovechkin will end his career with 764 goals (a couple of markers behind Jagr for third place all-time). Barring a sudden, unexpected drop in his abilities, I'm sure that Ovechkin would play at least another couple of years to pass Mr. Hockey's total of 801 goals.
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
617
296
Columbus
morehockeystats.com
Before starting to put these forecasts in, you should've said: I tried this model on (for example)
* Sedins (seed - seasons 2000-2003)
* David Legwand (seed 1999-2003)
* Matt Cullen (seed 1998-2002)
* Zdeno Chara (seed 1997-2001)
and here are the results vs. the actual career
 

Hockey Outsider

CASE 7: EVGENI MALKIN

Season  Age  Goals  Assists  Points
2007    20   33     52       85
2008    21   47     59       106
2009    22   35     78       113
2010    23   28     49       77
2011    24   15     22       37
2012    25   50     59       109
2013    26   9      24       33
2014    27   23     49       72
2015    28   28     42       70
2016    29   27     31       58
2017    30   33     39       72
2018    31   42     56       98
2019    32   21     51       72
2020    33   26     60       86
2021    34   16     31       47
2022    35   16     25       41
TOTAL        449    727      1,176

I'll admit it. I've never been as impressed by Malkin as most people seem to be. Yes, the talent is undeniable. Yes, when he's at his best, he's a game-breaking force every bit as good as Crosby. But I can't overlook the recurring injuries nearly as much as most other people seem to.

The forecast isn't kind to Malkin. It sees him as scoring very well per-game through age 35, but missing a lot of time. After his first three seasons (which were very healthy), Malkin has played in 70 games just twice in 11 years. (He's already missed enough games to miss that threshold in 2020, and wasn't anywhere close to being on pace for that in 2013 either).

Malkin is projected to finish with 449 goals and 1,176 points. Impressive totals, of course, but he had the potential for much more.
 

Hockey Outsider

CASE 8: STEVEN STAMKOS

Season  Age  Goals  Assists  Points
2009    18   23     23       46
2010    19   51     44       95
2011    20   45     46       91
2012    21   60     37       97
2013    22   29     28       57
2014    23   25     15       40
2015    24   43     29       72
2016    25   36     28       64
2017    26   9      11       20
2018    27   27     59       86
2019    28   45     53       98
2020    29   37     53       90
2021    30   39     40       79
2022    31   33     36       69
2023    32   24     28       53
2024    33   25     28       53
2025    34   20     24       44
TOTAL        572    582      1,154

Stamkos is the second-best goal-scorer of this generation. His forecast of 572 goals is certainly impressive, and it will be enough to get him into the Hall of Fame, even if he doesn't strengthen an underwhelming playoff resume. But it still feels somewhat underwhelming - he easily had the talent to reach 600. Seasons of 29, 25, and 9 goals scattered throughout his twenties set him back.

Trivia - Stamkos has four 90+ point seasons, without ever reaching 100 points. That ties him for the "record" with Ray Bourque, Brian Propp, and Vincent Damphousse.
 

Hockey Outsider

CASE 9: PATRICK KANE

Season  Age  Goals  Assists  Points
2008    19   21     51       72
2009    20   25     45       70
2010    21   30     58       88
2011    22   27     46       73
2012    23   23     43       66
2013    24   23     32       55
2014    25   29     40       69
2015    26   27     37       64
2016    27   46     60       106
2017    28   34     55       89
2018    29   27     49       76
2019    30   44     66       110
2020    31   40     64       104
2021    32   20     38       57
2022    33   20     37       57
2023    34   16     32       48
2024    35   16     26       43
TOTAL        468    779      1,247

Kane's progression has been relatively unusual. He was always a very good player, but only had scoring finishes of 5th and 9th through age 26. At age 27, he decisively won the Art Ross (and Hart), added two more top-three finishes, and is currently in 7th this year.

Kane's relatively late peak hurts his career totals according to the model. But one thing that helps is he's never had a really bad season. His career-low is 64 points (excluding 2013, when he was easily on pace to exceed that).
 

Hockey Outsider

CASE 10: DAVID PASTRNAK

Season  Age  Goals  Assists  Points
2015    18   10     17       27
2016    19   15     11       26
2017    20   34     36       70
2018    21   35     45       80
2019    22   38     43       81
2020    23   60     58       118
2021    24   45     50       95
2022    25   43     53       96
2023    26   46     54       101
2024    27   37     45       82
2025    28   36     43       79
2026    29   33     43       76
2027    30   35     43       78
2028    31   30     39       69
2029    32   22     31       52
2030    33   22     30       52
2031    34   18     26       44
TOTAL        560    665      1,226

After a small blip in 2016, Pastrnak has been progressing steadily each year. As of today, he's third in the NHL in scoring and is leading the race for the Rocket Richard trophy. The system sees him as an elite offensive threat for the next three years, before settling in as a perennial 30-goal scorer. He's forecasted to end his career with 560 goals (on par with Mats Sundin and Mike Modano) and 1,226 points (in hallowed company, finishing slightly ahead of Jean Beliveau).

I picked Pastrnak as my final player for a few reasons. For one, I wanted to pick a young player with a lot of upside (Jack Eichel was another candidate). But there are some quantitative points worth discussing.

The forecast assumes that the player's numbers are a true representation of his ability. I'm not saying that isn't the case with the Czech, but it can't be denied that he's part of arguably the best line in hockey. He won't play his whole career with Bergeron and Marchand - and when that stops, how much will his productivity drop?

The model also assumes that the "out-performance ratio" remains constant over a player's career. The ratio is usually fairly steady. In Pastrnak's case, it varies quite a bit. For example, he was far behind his peer group in per-game productivity at age 19 (more than 30% behind) and this year he's on pace to finish 17% ahead. Unlike, say, McDavid or Matthews, Pastrnak has been all over the place. More uncertainty means more risk, and that means his forecast of 560 goals and 1,200+ points likely needs to be deflated.

The point of this isn't to badmouth Pastrnak - let's hope he has the career the model suggests he's capable of. But keep in mind - there's a lot of uncertainty in forecasts (hockey or otherwise). Maybe in another fifteen years (I just celebrated my 15th anniversary on this site) we'll see how accurate these really are.
 

Hockey Outsider

Before starting to put these forecasts in, you should've said: I tried this model on (for example)
* Sedins (seed - seasons 2000-2003)
* David Legwand (seed 1999-2003)
* Matt Cullen (seed 1998-2002)
* Zdeno Chara (seed 1997-2001)
and here are the results vs. the actual career

I can post this later on. I looked at some players who began before the lockout (so we have completed careers, but nobody who was in the peer group that I based the analysis on).

I believe I had Thornton, Iginla, Heatley, Lecavalier and Hossa using their actual stats through age 22. The results looked relatively good.
 

Zuluss

Registered User
May 19, 2011
2,449
2,088
I will repeat what I said in another thread: a good forecast should come with a confidence interval and should carefully select the group of peers using the information we already have. I will add a third thing (which I also have kind of said already): one should never, ever statistically predict upcoming peak play that is far from already observed performance. Let hockey scouts do that.

So let's go over these points. A 90% confidence interval means that the forecast falls with 90% probability between this and that number. For example, suppose we are looking at teenager Crosby. The (awful) projection the OP links to has a group of 18 peers. So toss the worst one and the best one (Gretzky and Sylvain Turgeon, roughly 10% of the peers) and your 90% confidence interval is that teenager Crosby will end up somewhere between Messier and Jimmy Carson (1804 and 561 career points) - one can manipulate those points somehow and make the interval 2000 to 600 or whatever, but you get the drift - early-career forecasts of career totals are so imprecise that they are useless.
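The "toss the worst one and the best one" idea can be sketched as a trimmed range over the peers' career totals. This is only an illustration of that rough approach, not a proper statistical interval; `trimmed_range` is a hypothetical name and the career totals in `toy_totals` are made-up round numbers, not real players.

```python
def trimmed_range(career_totals, drop_each_side=1):
    """Drop the lowest and highest peers, then report the remaining range.

    Returns (low, high, share of peers kept).
    """
    ordered = sorted(career_totals)
    if len(ordered) <= 2 * drop_each_side:
        raise ValueError("not enough peers to trim")
    kept = ordered[drop_each_side:len(ordered) - drop_each_side]
    return kept[0], kept[-1], len(kept) / len(ordered)


# 18 made-up career point totals (400, 500, ..., 2100), for illustration only.
toy_totals = [400 + 100 * i for i in range(18)]
low, high, coverage = trimmed_range(toy_totals)
print(low, high, f"{coverage:.0%}")
```

With 18 peers, dropping one from each end keeps 16 of 18, which is where the "roughly 10%" in the post comes from.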

Now we are going to the second point: suppose we are now forecasting the career of 22-year-old Crosby. Now we know more about him, we saw 4 full seasons. We can exclude Carson and Sylvain Turgeon from his peers and add someone better. We can also exclude Gretzky and someone ridiculously healthy like Gartner.

If we are forecasting the rest of the career for 34-year-old Ovechkin, we will have to redo the whole peer sample: we know by now that he is a unique goal-scorer, we know he is superdurable and refuses to age. So his peers are folks like that, not Lindros and Lafontaine, not Mario, not Francis. If we keep the same sample for everyone, no matter how young or old, we are just refusing to use the information we learned about Ovechkin (or whoever) over his career so far.

And the third point suggests that we need to know why the forecast the OP linked to is so awful. I get it that the confidence interval is super wide, but why did it get the average so wrong? We need to know what went wrong with the prediction of teenage Crosby's career: it predicted 143, 129, 118 points at ages 27-29, and he actually scored 84, 85, 89. This is not injuries or lower league-wide scoring. This is caused by some major flaws in forecast design that we need to identify and not repeat.

As for late career forecasts that include the possibility of a sudden drop-off in play, I would report individual season forecasts as medians (so that they do not look low because 2 out of 8 peers missed almost the whole season or retired) and career numbers as means.
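A tiny sketch of why the median holds up better for a single season: with eight peers, two of whom barely played, the mean gets dragged down while the median doesn't. The eight point totals are hypothetical numbers chosen only to mirror the "2 out of 8" scenario described above.

```python
from statistics import mean, median

# Made-up season point totals for 8 peers at a given age;
# the first two missed almost the whole season or had retired.
peer_points = [0, 5, 58, 61, 64, 66, 70, 72]

season_mean = mean(peer_points)
season_median = median(peer_points)
print(f"mean: {season_mean}, median: {season_median}")
```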
 

Hockey Outsider

I've also back-tested the system, looking at some young star players prior to the 2005 lockout, based on their actual results through age 23. On the one hand, scoring was lower in the late nineties and early 2000's (compared to today), so that probably under-stated their forecasted career totals. On the other hand, the system (obviously) couldn't have predicted that these players would lose a full year to the lockout. My gut instinct is these two factors largely offset.

Here are the results:
  • Rick Nash - the system forecasts his career with remarkable accuracy. It predicts he'd score 442 goals (actual - 437) and 823 points (actual - 805) in 1,076 games (actual - 1,060). It's quite remarkable - the system correctly predicts Nash's actual career high in both goals (42) and points (79).
  • Chris Gratton - you don't hear too much about him anymore - the hulking centre was 3rd in the 1993 entry draft, picked ahead of Kariya when the NHL size fetish was at its peak. The system was quite accurate in predicting that he'd play in 1,061 games (actual 1,092) and would score 231 goals (actual - 214). It somewhat over-predicted the number of points he'd score - the system had him at 654, rather than the 568 he'd actually score. He was forecasted to be a perennial 50- to 60- point player; but his offense really dropped off after his age 22 season.
  • Simon Gagne - another fairly accurate forecast. He was predicted to play in 852 games (actual - 822), and score 275 goals (actual - 291) and 614 points (actual - 601). The system didn't expect his stellar 2006 campaign (47 goals), but that can largely be explained by scoring temporarily skyrocketing after the lockout, and playing with Peter Forsberg.
  • Marian Hossa - okay, this is a forecast failure. Hossa was only expected to score 260 goals (actual - 525) and 571 points (actual - 1,134). Why was the system so wrong? For one thing, it's basically impossible to predict Hossa's longevity. The system had him retiring at 30. There was no way of knowing, when he was 23, that he'd eventually become an excellent defensive forward (which helped extend his career). But the system failed to predict his peak as well (four straight 80+ point seasons). His numbers through age 23 weren't a good indication of how good a player he actually was.
  • Scott Gomez - the system is remarkably accurate in forecasting his career through age 31 (when he was expected to retire). He was projected to accumulate 864 games (actual - 869), 151 goals (actual - 167) and 672 points (actual - 675). Remarkable! In reality, Gomez played five years longer than expected, recording another 81 points in 215 games.
  • Alex Tanguay - once again, we have a really accurate forecast through age 31 (the expected retirement age). Tanguay was expected to play 853 games (actual - 818), score 237 goals (actual - 225), and record 695 points (actual - 686). However Tanguay played another five years after, scoring 177 points in 270 games.
  • Patrik Elias - the system is completely unsuccessful in forecasting Elias's career. It has him at just 175 goals and 412 points. In reality, he scored nearly as many goals (408) as I've forecasted his point totals. Even if I exclude his short stints in 1996 and 1997, it doesn't help his totals very much (I can get him up to around 200 goals and 500 points - still much less than he actually accomplished). The system saw him as way behind his peer group at every step of his early career. The missing context is Elias got relatively limited ice time on a deep, winning team (and was also an excellent defensive forward) - both of which surely suppressed his scoring totals.
  • Sergei Samsonov - let's look at the 1998 Calder trophy winner. The system is actually too generous in forecasting his career. He was expected to score 316 goals (actual - 235) and 761 points (actual - 571). Why did he do so much worse than expected? In this case, the answer is pretty obvious. He had a wrist injury that made him miss most of the 2003 season (age 24). After that, he was never the same player. Although he kept his speed, he lost his incredible wrist shot (on par with Joe Sakic and Markus Naslund's), and his deking ability. The system predicted he'd score between 55 and 75 points seven years in a row (from ages 24 to 30). In reality, he'd eclipse the 40 points barrier just once the rest of his career (retiring at age 32).
  • Dany Heatley - the 2002 Calder trophy winner. I'm going to break my own rule and look at Heatley only through age 22. Why exclude the 2003 season? That was the year of the tragic car accident, in which he sustained some physical injuries, and it surely caused him to be distracted on the ice. So, forecasting out from 2002, the system expected Heatley to play 833 games (actual - 869), score 333 goals (actual - 372) and record 816 points (actual - 791). This is quite accurate. It never expected him to have consecutive 50-goal, 100-point seasons (at least partially explained by playing on one of the most dominant lines in the NHL, right after the lockout ended), but it also didn't expect him to slow down quite so fast.
  • Vincent Lecavalier - the forecast is quite accurate here. Lecavalier was expected to play 1,249 games (actual - 1,212), score 389 goals (actual - 421), and record 911 points (actual - 949). For those wondering why the forecast wasn't even higher (given how much hype surrounded him) - his 2002 season (at age 21) really hurts. If he really was the "Michael Jordan of Hockey", he never would have scored 37 points in a full season at the prime age of 21.
  • Jarome Iginla - the system is way off here. Iginla was only expected to score 311 goals (actual - 625) and 703 points (actual - 1,300). Part of the reason is due to his longevity (1,554 games - only eight forwards have played more). Even still, the forecast really underrates him through age 31 (when he was expected to retire). The main reason is he was significantly behind his peer group for the first five years of his career. There was no way to forecast that he'd win two goal-scoring titles (and add two more third-place finishes) with a career high of only 31 goals through age 23.
  • Joe Thornton - if I compare Thornton's actual totals to his projected totals, we're way off. Again, that's due to freakish longevity. As of today, he's 10th all-time in games played. Barring injury he'll be 8th by the end of the year, and if he can squeeze out one more healthy season, he'll be 5th. At the risk of repeating myself, nobody can project this type of longevity from a 23-year-old. That being said - the forecast is reasonably accurate in predicting his totals through age 33 (when he was forecasted to retire). The system has him at 1,018 points (actual - 1,118 points). The system didn't expect his incredible back-to-back 90 assist seasons (though that's partly boosted by the higher scoring after the lockout). If I manipulate the forecast and project forward solely on the basis of his 2003 season (arguing that he was rushed into the NHL too young, and battling injuries before that) - I get 1,295 points through his forecasted retirement at age 35 - very close to the 1,259 he actually achieved. Thornton is one of those players who took several years to really put it all together.
Overall the forecasts look reasonably accurate (as far as forecasts go - remember, the future is tough to predict, and if I could do it with even greater accuracy, I'd probably shift my efforts towards fields that would be more financially rewarding). Sometimes the forecasts are off due to a catastrophic injury (e.g. Samsonov) - I've tried to be very clear that a single major injury can derail a player's career, and there's no way to predict that.

The biggest systematic issue is that the system struggles to predict players with tons of longevity (like Thornton and Iginla). Simply put, there's no way of knowing if a 23-year-old player will play 1,400+ games. I think I'm a bit too conservative in predicting retirements, but the system is still reasonably accurate overall.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,144
14,456
I will repeat what I said in another thread: a good forecast should come with a confidence interval and should carefully select the group of peers using the information we already have. I will add a third thing (which I also have kind of said already): one should never, ever statistically predict upcoming peak play that is far from already observed performance. Let hockey scouts do that.

So let's go over these points. A 90% confidence interval means that the forecast falls with 90% probability between this and that number. For example, suppose we are looking at teenager Crosby. The (awful) projection the OP links to has a group of 18 peers. So toss the worst one and the best one (Gretzky and Sylvain Turgeon, roughly 10% of the peers) and your 90% confidence interval is that teenager Crosby will end up somewhere between Messier and Jimmy Carson (1804 and 561 career points) - one can manipulate those points somehow and make the interval 2000 to 600 or whatever, but you get the drift - early-career forecasts of career totals are so imprecise that they are useless.
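As a rough illustration of the peer-trimming idea in the paragraph above, here is a minimal Python sketch. The function name `trimmed_interval` is made up for this example, and every career total in the list is a hypothetical placeholder except the 561 (Carson) and 1,804 (Messier) quoted above. It is not the data from Ryder's article.

```python
def trimmed_interval(peer_totals, drop_each_side=1):
    """Return (low, high) after discarding the most extreme peers on each side."""
    ordered = sorted(peer_totals)
    if len(ordered) <= 2 * drop_each_side:
        raise ValueError("not enough peers left after trimming")
    kept = ordered[drop_each_side:len(ordered) - drop_each_side]
    return kept[0], kept[-1]


# 18 peers. Only 561 and 1804 come from the post; the rest are placeholders.
peers = [450, 561, 640, 700, 780, 850, 900, 980, 1050,
         1120, 1200, 1290, 1350, 1420, 1500, 1600, 1804, 2800]

low, high = trimmed_interval(peers)
print(f"Roughly 90% of peers fall between {low} and {high} career points")
```

Dropping one peer from each end of 18 keeps 16 of them (about 89%), which is where the "roughly 90%" comes from. With a larger peer group you would drop more from each side to keep the same coverage.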

Now we are going to the second point: suppose we are now forecasting the career of 22-year-old Crosby. Now we know more about him, we saw 4 full seasons. We can exclude Carson and Sylvain Turgeon from his peers and add someone better. We can also exclude Gretzky and someone ridiculously healthy like Gartner.

If we are forecasting the rest of the career for 34-year-old Ovechkin, we will have to redo the whole peer sample: we know by now that he is a unique goal-scorer, we know he is superdurable and refuses to age. So his peers are folks like that, not Lindros and Lafontaine, not Mario, not Francis. If we keep the same sample for everyone, no matter how young or old, we are just refusing to use the information we learned about Ovechkin (or whoever) over his career so far.

And the third point suggests that we need to know why the forecast the OP linked to is so awful. I get that the confidence interval is super wide, but why did it get the average so wrong? We need to know what went wrong with the prediction of teenage Crosby's career: it projected 143, 129, 118 points at ages 27-29, and he actually scored 84, 85, 89. This is not injuries or lower league-wide scoring. This is caused by some major flaws in the forecast design that we need to identify and not repeat.

As for late career forecasts that include the possibility of a sudden drop-off in play, I would report individual season forecasts as medians (so that they do not look low because 2 out of 8 peers missed almost the whole season or retired) and career numbers as means.
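To make the median-versus-mean suggestion concrete, here is a small Python sketch. The helper names and all of the point totals are hypothetical, chosen only so that 2 of 8 peers have a lost season, as in the scenario described above.

```python
from statistics import mean, median


def season_forecast(peer_points):
    """Single-season forecast: the median is barely moved by a few lost seasons."""
    return median(peer_points)


def career_forecast(peer_points_by_season):
    """Career forecast: per-season means add up to the mean career total."""
    return sum(mean(season) for season in peer_points_by_season)


# Hypothetical points for 8 peers at two consecutive ages.
# Two peers missed almost the whole season (or had retired) in each year.
age_33 = [82, 75, 90, 68, 4, 0, 71, 88]
age_34 = [80, 70, 85, 60, 0, 0, 65, 80]

print(season_forecast(age_33), mean(age_33))
print(career_forecast([age_33, age_34]))
```

The design choice is that the median describes what a typical healthy season looks like, while the mean keeps the lost seasons in the career total, so the two numbers answer different questions.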

Fair comments. I assume the forecast model that you're referring to with these comments is Alan Ryder's, which I linked to in the first post, right?

Assuming that's the case - his article was published in February 2007. It was forecasting Crosby out based on less than two seasons' worth of data. It's hard to make a good forecast based on so little information.

It's also important to note that Crosby was 18 and 19 those two years. History shows that players really improve from their teens, to their offensive peaks (generally ages 21 to 27). Both goal-scoring and playmaking rates increase by around 50%. I don't think Ryder's initial forecast is completely ridiculous (even when it forecasted Crosby to score 2,400 points). History says - if a player averages 110 points as a teenager, he should score even more than that in his twenties. Yes, scoring dropped league-wide, and he struggled with injuries - but how many people, after 2007, would have thought that we had already witnessed Crosby's 1st and 5th most productive seasons?

The better question might be - why did Crosby fail to live up to the potential he showed at ages 18 and 19? (Some of it is clearly injuries - Crosby probably would have ended up around 1,600 points without all his lost time - but a player who was already scoring 120 points as a teenager should have scored way more in his early to mid twenties).

As to your comments about late career forecasts - I agree entirely. It probably isn't appropriate for me to forecast Ovechkin (or Crosby, or maybe even Kane and Stamkos) based on the original peer group. We need to consider the possibility that a young player like McDavid or Matthews might flame out of the NHL early, or struggle with injuries (as did Carson and Allison, or Bure and Lindros). But that's clearly not the case for Ovechkin. Maybe a more appropriate peer group for him would be players who are still going strong in their early to mid thirties (Howe, Jagr, Selanne, Messier, etc). It's not difficult to imagine Ovechkin being forecasted to reach, say, 850 goals under these assumptions. But that'll be a project for another time.
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
617
296
Columbus
morehockeystats.com
So out of 13 players, you have Hossa, Samsonov, Elias, Iginla and Thornton completely off, and Gratton, Gomez, Tanguay, Heatley and Samsonov somewhat off.

How about you find something else...
 

vancityluongo

curse of the strombino
Sponsor
Jul 8, 2006
18,630
6,290
Edmonton
Fascinating stuff HO. Really appreciate the work you're putting in - just wanted to say thank you and looking forward to seeing how you tweak this model.

So out of 13 players, you have Hossa, Samsonov, Elias, Iginla and Thornton completely off, and Gratton, Gomez, Tanguay, Heatley and Samsonov somewhat off.

How about you find something else...

You could always just not read it lol. He's pretty clear that this is exploratory.
 

Doctor No

Registered User
Oct 26, 2005
9,250
3,971
hockeygoalies.org
This is great stuff. I have developed something similar internally for goaltenders, and a lot of the things I currently work on (like strength of schedule, in-season variation, goal support, score effects) are designed to make these forecasts a bit better.

Variability is important and something that should be embraced - of course, you want models to get better, but it's important to be fully aware of the spectrum of outcomes that a 21-year-old budding superstar may follow.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,144
14,456
So out of 13 players, you have Hossa, Samsonov, Elias, Iginla and Thornton completely off, and Gratton, Gomez, Tanguay, Heatley and Samsonov somewhat off.

How about you find something else...

What level of precision are you expecting here?

Let's review the results. I predicted Gagne to within 2% of his career total. Same with Nash. Lecavalier was within 4%.

I predicted Heatley to within 3% of his actual career total. You consider that "somewhat off". In most businesses, one would be thrilled with a model that looks ten years into the future and gets the total sales (or costs) accurate to within a few percentage points.

I predicted Gomez and Tanguay to within 1% of their actual total through age 31. They hung around for some time after that but I don't think that's a forecasting failure. The model accurately predicted their entire primes. Neither added very much meaningful to their legacy (especially Gomez) after that point.

Samsonov was off - but that's because of a major injury sustained after he turned 23. No forecasting system could possibly identify that in advance.

So that's six out of twelve that I'd consider very accurate, and one more whose career was ruined by a serious, unpredictable injury.

Gratton was within 15% of his career total. Thornton within 12% through age 33. These are the ones I'd consider "somewhat off". (One was too high, the other too low). Eight out of eleven (excluding Samsonov) are very or somewhat accurate.

That leaves us with Iginla, Hossa, and Elias. What do they have in common? None of them hit 75 points through age 23. This tells me that the model struggles to identify late bloomers. I don't find that overly surprising. It was designed to look at really young players (McDavid, MacKinnon, Matthews, etc). I think the model is reasonably successful in doing so. But it can't tell you who might break out in the future, among those who haven't broken out yet - you might need to look at things like ice time, Corsi, zone starts, etc. to get a better sense (or just watch the players).

(For the record, I'm not "defensive" about this method. If someone has any suggestions to improve it, I'm all for it - it should be relatively easy to incorporate that into the model).
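For anyone who wants to check the percentages in this post, here is a minimal Python sketch using the forecast and actual point totals quoted in the back-test (Gomez, Tanguay and Thornton are the through-expected-retirement figures). The name `pct_error` and the choice to divide by the actual total are assumptions made for this example. Elias is left out because his actual point total isn't quoted.

```python
# (forecast points, actual points) as quoted in the back-test post
results = {
    "Nash": (823, 805),
    "Gratton": (654, 568),
    "Gagne": (614, 601),
    "Hossa": (571, 1134),
    "Gomez": (672, 675),
    "Tanguay": (695, 686),
    "Samsonov": (761, 571),
    "Heatley": (816, 791),
    "Lecavalier": (911, 949),
    "Iginla": (703, 1300),
    "Thornton": (1018, 1118),
}


def pct_error(forecast, actual):
    """Signed forecast error as a percentage of the actual total."""
    return 100.0 * (forecast - actual) / actual


errors = {name: round(pct_error(f, a), 1) for name, (f, a) in results.items()}

# Print from most accurate to least accurate
for name, err in sorted(errors.items(), key=lambda item: abs(item[1])):
    print(f"{name:<11}{err:+6.1f}%")
```

A positive number means the forecast was too high. Dividing by the forecast instead of the actual total gives slightly different figures.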
 

Zuluss

Registered User
May 19, 2011
2,449
2,088
Here are some additional thoughts. I think it is hard to forecast the shape of the career arc from the first 2-4 seasons, and career arcs are very different. I agree with you that one cannot forecast late bloomers using points from the first 2-4 seasons (one can try to look for trends there, but that's the only thing that comes to mind). I do not see it as a drawback of the system - if we have no information, we have no information.

The information we do have though is the information on early bloomers - especially if a teenager/rookie challenges for Art Ross/Rocket. In such cases, it is useful to remember that career arcs are more often hill-shaped than tent-shaped and there are diminishing returns in hockey like in everything else. A 50-point teenager can progress to a 100-point player, but a 100-point teenager will not progress to a 200-point player, even Gretzky did not. This was the problem with Ryder's forecast of teenager Crosby, and I still see a bit of that in your forecasts - McDavid is not going to peak at 27 and Matthews is not going to yield 5 100-point seasons. I know that point-wise players tend to peak around 27, but I think the early career of McDavid and Matthews has convinced us they are not going to follow the career arc of Sakic, Kane, and Marchand. They hit the ground running and realized their potential early.

Another thought in a similar vein is that it is probably useful to discount a bit one fluke season like Selanne's 76 goal season or Laine's 44 goal season. I think Pastrnak falls into the same bin to an extent, I do not see him as perennial 100-point player.

The test sample shows that your model usually has players retire too early. To an extent, it may be unavoidable selection bias in the sample - everyone remembers more players of great longevity. It also may be an issue of trying to fit a Bure and a Jagr by the same model or just the consequence of your "retirement rule" being too strict.

In the case of established players, the solution seems obvious - Kanes and Malkins of the world do not retire at 35. They either age very well or change their game or the franchise feels too indebted to them / believes in them too much and gives them long contracts, second chances, third chances, etc. Besides, they still sell tickets and jerseys.

What to do with younger players? Apparently, there is a certain threshold they have to break to fall into the franchise players group. If they did or are about to do so, it would be wise to award them extra longevity points. Another thing to consider is to model the probability of early retirement separately - and then set it to 0 if a player shows a lot of durability early on (e.g., Hossa, Iginla).

Overall, I think you can be proud of how your model fared in tests on complete careers - like I said, the confidence interval of a career forecast with just a few seasons in the books is likely to be super-wide, and even being off by 20% can be plausibly written off as bad luck. The only thing that can be held against you is the "sign test" and the fact that you undershoot much more often than overshoot, but again, it may well be the selection bias in the test sample.
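For reference, the "sign test" mentioned above can be sketched in a few lines of Python. The function name is made up, and the counts in the example (9 undershoots, 3 overshoots) are purely illustrative and not a tally of the actual back-test. The test asks how likely a split at least this lopsided would be if overshooting and undershooting were equally likely.

```python
from math import comb


def sign_test_p_value(n_under, n_over):
    """Two-sided sign test: chance of a split this lopsided under a fair coin."""
    n = n_under + n_over
    k = min(n_under, n_over)
    # Probability of k or fewer "successes" out of n at p = 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative counts only
p = sign_test_p_value(n_under=9, n_over=3)
print(f"p-value = {p:.3f}")
```

With only a dozen players, even a fairly lopsided split does not give a small p-value, which fits the point that the pattern could still be bad luck or selection bias.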
 