The Data-Based Drafting Thread (what players would a Potato pick?)

Melvin

21/12/05
Sep 29, 2017
15,198
28,055
Montreal, QC
One side benefit of this project is that in compiling this data I now have a database of draft information I have never had before, and can do all sorts of stuff with it very easily.

For example, here is the strength of each draft, rated as the average of the potato scores of the drafted players (a rough sketch of this calculation follows the list):

1. 2016 draft
2. 2015 draft (the 2018 draft is looking comparable.)
3. 2010 draft
4. 2009 draft
5. 2011 draft
6. 2013 draft
7. 2017 draft (surprised this is so low)
8. 2014 draft (interestingly enough, based on first round only this draft is #1. But it fell off after that.)
9. 2012 draft

This lines up with common perception that the 2012 draft was a poor one.
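For illustration, here is a minimal sketch of how a draft-strength number like this could be computed, assuming a flat list of (player, draft_year, potato_score) records for drafted players; the names and scores are made up:

```python
from collections import defaultdict

# Made-up records for drafted players: (player, draft_year, potato_score).
drafted = [
    ("Player A", 2016, 72.0),
    ("Player B", 2016, 64.5),
    ("Player C", 2012, 41.0),
    ("Player D", 2012, 38.5),
]

def draft_strength(records):
    """Average potato score of drafted players, grouped by draft year."""
    by_year = defaultdict(list)
    for _, year, score in records:
        by_year[year].append(score)
    return {year: sum(scores) / len(scores) for year, scores in by_year.items()}

# Rank draft years from strongest to weakest.
for year, avg in sorted(draft_strength(drafted).items(), key=lambda kv: -kv[1]):
    print(f"{year}: {avg:.1f}")
```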

How annoying is it that the 2016 draft is the one where we didn't have any picks. Argh.
 

ChefBoiRD

Registered User
Feb 26, 2018
593
249
The Patata (potato) reigns supreme!!

Lllllove it! Looooooooove iiiiiiiiit!!

But Gillis is like our own Bill James (Sabermetrics)

Patatametrics!!

Nice work But Gillis! Very interesting stuff.

Thanks for doing the 2018 class as well. It will be my go-to cheat sheet come draft day.

Ciao!!
 

Mafic

Registered User
Oct 2, 2007
238
25
BC
Thanks for putting the time into this, it's a very interesting read! Reminds me a bit of that Sham-the-intern draft Canucks Army put out a while back (We think the Vancouver Canucks may have a scouting problem(!!!!)), but obviously with a lot more effort.

I'm curious how you ended up training the model, but didn't see much mentioned about it in the methods. Are you basically taking some NHL metric (I'm guessing TOI), and adjusting your position/league/age/height modifiers until you get the best correlation with PPG?

It's probably way too much work (or maybe it already accounts for this), but I'd be interested to see how different the model would be if it only used data that existed prior to each of the draft years in question. So for instance, Pastrnak and Nylander's success in the NHL likely improves the equivalency numbers for Allsvenskan players. But if you exclude any data after 2014 in calculating the equivalency numbers, would they be ranked as highly in the 2014 draft as they were, or is their success influencing their high ranking when looking back, creating a bit of a feedback loop?

I agree. Obviously the Sham example is flawed as he knows which prospects are drafted in each round. Since you mentioned you're using the same version of the model for every draft, I think its value as a scouting-evaluation tool would diminish the farther back in time you go, simply because you have access to a greater amount of data that was not known at that time.
It would be interesting to see your potato system with some kind of moving-window to train the model. But I'm not that great of a stats or programming guy, so have no idea how much effort that requires...

Anyways, these 'potato' drafts are always interesting because I think the data would show they generally outperform our drafting. But it would certainly be more convincing if it relied solely on data prior to the draft. The 2018 predictions will be interesting to follow just for that reason.

Cheers
 

Melvin

21/12/05
Sep 29, 2017
15,198
28,055
Montreal, QC
Thanks for putting the time into this, it's a very interesting read! Reminds me a bit of that Sham-the-intern draft Canucks Army put out a while back (We think the Vancouver Canucks may have a scouting problem(!!!!)), but obviously with a lot more effort.

I'm curious how you ended up training the model, but didn't see much mentioned about it in the methods. Are you basically taking some NHL metric (I'm guessing TOI), and adjusting your position/league/age/height modifiers until you get the best correlation with PPG?



I agree. Obviously the Sham example is flawed as he knows which prospects are drafted in each round. Since you mentioned you're using the same version of the model for every draft, I think its value as a scouting-evaluation tool would diminish the farther back in time you go, simply because you have access to a greater amount of data that was not known at that time.
It would be interesting to see your potato system with some kind of moving-window to train the model. But I'm not that great of a stats or programming guy, so have no idea how much effort that requires...

Anyways, these 'potato' drafts are always interesting because I think the data would show they generally outperform our drafting. But it would certainly be more convincing if it relied solely on data prior to the draft. The 2018 predictions will be interesting to follow just for that reason.

Cheers

I set it up deliberately so that it is very easy for me to re-model based on different years if need be.

I don't want to get too detailed into the methodology until I know where I am going with this but I have done hundreds of different runs using different draft years and different approaches.

When I did the team evaluations on the blog, my method of avoiding this was to use older data to "train" and then do the team evaluations based on the newer drafts only. Thus I was posting evaluations of teams from 2014-2017, but only after "training" on the drafts prior. Having said that, I have made changes to things since then, so I can't guarantee that I would get exactly the same results if I repeated the exercise.

It is pretty marginal, though, in terms of how much difference it makes in the big picture, and that makes me happy. For example, just last weekend I added the 2009 draft and re-ran the calculations using 2009-2013 as my "window," and I did not get any different results. That is a good sign, and it is why I am comfortable posting this thread now: I believe I am not going to be adjusting anything significantly unless I come across some sort of breakthrough, and I am mostly into diminishing returns.
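For anyone curious what the "train on older drafts, evaluate the newer ones" idea looks like mechanically, here is a minimal sketch. It is not Melvin's actual code; the record layout and the stand-in league-multiplier model are invented purely to show the temporal split:

```python
# Minimal sketch of a temporal split: calibrate on older drafts, then score newer
# drafts out-of-sample. The "model" here is a trivial stand-in: one multiplier per
# league, estimated as the average ratio of NHL output to draft-year scoring.

def fit_league_multipliers(train):
    sums = {}
    for r in train:
        total, count = sums.get(r["league"], (0.0, 0))
        ratio = r["nhl_output"] / max(r["draft_year_ppg"], 0.01)
        sums[r["league"]] = (total + ratio, count + 1)
    return {league: total / count for league, (total, count) in sums.items()}

def potato_score(multipliers, player):
    # Leagues never seen in training fall back to a neutral multiplier of 1.0.
    return player["draft_year_ppg"] * multipliers.get(player["league"], 1.0)

records = [
    {"league": "OHL", "draft_year": 2010, "draft_year_ppg": 1.2, "nhl_output": 0.7},
    {"league": "SHL", "draft_year": 2011, "draft_year_ppg": 0.5, "nhl_output": 0.6},
    {"league": "OHL", "draft_year": 2015, "draft_year_ppg": 1.4, "nhl_output": 0.0},
]

train = [r for r in records if r["draft_year"] < 2014]   # calibrate on old drafts only
newer = [r for r in records if r["draft_year"] >= 2014]  # score newer drafts out-of-sample
multipliers = fit_league_multipliers(train)
for p in newer:
    print(p["league"], round(potato_score(multipliers, p), 2))
```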

When I do the league calculations, for example, I always give myself a "range" of values. I don't have it output an exact coefficient because I believe that introduces a false sense of precision; you are lying to people when you say "oh, this league is 0.834 of the OHL." Nobody has enough data to pinpoint things to a third decimal place; it's absurd. So I always end up with an output like "between 0.5 and 0.7." The more data I have, the tighter I can make the window. For something like the Czech league, where the data is limited, I get something like "between 2.5 and 6.0," which is crazy. At that point I am picking a value as pretty much a guess based on my own instincts, so it is a bit subjective. That's why I said that Martin Kaut could realistically be anywhere from about 3rd on my ranking to 16th. Nobody really knows much about that league, unfortunately.

I believe that my approach of outputting a window rather than a hyper-precise number means that I am not as prone to small-sample fluctuations or biases introduced by using specific model years. Someone calculating exact values might get 0.834 when running these years but 0.893 when using some other data set; I get "between 0.8 and 0.9" both times and pick my coefficient based on that.
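One way to produce a "between X and Y" window instead of a single coefficient is to resample the players and report a percentile interval; the window naturally widens when the sample is small, which matches what is described above. This is only a sketch of the idea, with invented numbers and an invented (ratio-of-averages) equivalency formula:

```python
import random

# Invented pairs: (draft-year PPG in the feeder league, NHL PPG later on).
pairs = [(1.1, 0.55), (0.9, 0.40), (1.3, 0.80), (0.7, 0.30),
         (1.0, 0.52), (0.8, 0.35), (1.2, 0.66), (0.95, 0.50)]

def equivalency(sample):
    """Ratio of average NHL production to average feeder-league production."""
    league_avg = sum(p for p, _ in sample) / len(sample)
    nhl_avg = sum(n for _, n in sample) / len(sample)
    return nhl_avg / league_avg

def coefficient_window(sample, runs=2000, lo_pct=5, hi_pct=95, seed=1):
    """Bootstrap the sample and report a (low, high) window rather than a point estimate."""
    rng = random.Random(seed)
    coeffs = sorted(equivalency([rng.choice(sample) for _ in sample]) for _ in range(runs))
    return coeffs[runs * lo_pct // 100], coeffs[runs * hi_pct // 100]

low, high = coefficient_window(pairs)
print(f"coefficient window: {low:.2f} to {high:.2f}")  # tightens as more players are added
```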

Anyway, I am working on 2008 now; I think I may go back as far as I can, because even if I don't use this old data for this exercise, it may still have a purpose for a future project. Just having a database with all of this draft data is interesting, even for answering questions like "when was the last time a player like X was drafted?"
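As a toy example of the kind of "player like X" lookup such a database makes easy (the rows and field layout are invented for illustration):

```python
# Toy rows: (name, draft_year, position, height_in, potato_score).
players = [
    ("Player A", 2010, "D", 70, 55), ("Player B", 2013, "D", 71, 58),
    ("Player C", 2016, "F", 73, 62), ("Player D", 2017, "D", 70, 60),
]

def similar(players, position, height_in, tolerance=1):
    """Most recently drafted players matching a position and a height within a tolerance (inches)."""
    matches = [p for p in players if p[2] == position and abs(p[3] - height_in) <= tolerance]
    return sorted(matches, key=lambda p: -p[1])

print(similar(players, position="D", height_in=70))  # newest matching defencemen first
```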

BTW, if anyone out there is willing to help with some data entry, please PM me. That would really expedite the process in terms of getting these older drafts loaded.
 

Melvin

21/12/05
Sep 29, 2017
15,198
28,055
Montreal, QC
Here are my team rankings from 2009-2016:

ott 77
car 76
ana 74
lak 69
was 64
min 63
col 62
nas 62
tam 61
fla 60
cbs 59
buf 57
edm 54
chi 53
bos 52
nyi 50
det 49
win 48
cal 47
ari 45
dal 45
pit 42
njd 40
sjs 40
mon 37
phi 33
tor 33
stl 32
nyr 30
van 29
 

Horse McHindu

They call me Horse.....
Jun 21, 2014
9,668
2,650
Beijing
There seems to be enough interest in this topic to merit continuation; however, I was starting to clutter up the management thread with stuff that is not directly related to current management, and I have nowhere else to put this, so I am starting a new thread.

I have spent the last several months working on a system for drafting players based on nothing more than publicly available information, namely games played and points in the player's draft year, along with the league played in and biographical data (birthday and height/weight as listed on NHL.com at the time of the draft).

The purpose of this exercise was to establish a "baseline" that one could use to properly evaluate a team's draft performance. How well are they doing at drafting? Well, compare their picks to the picks that this simple system would have made, and you have your answer. If a team cannot consistently do better than "the potato," then some difficult questions should be asked of the scouting staff.

I have put a lot more information on the methodology and the intention on my blog, so I will just link to the introduction there rather than repeating everything here. I have made several follow-up posts there which talk about a few other topics.

Honestly, the system performs better than I would have expected, and it is something into which I have now invested substantial time. I have made several tweaks and added considerably more data to it since my original post, including draft data going back to 2009.

Because this is the Canucks forum, here are the picks the Canucks would have made since 2009, taking the best player available at each of the picks the Canucks actually had and applying the latest version of my formula. Note that the system does not know where a player was actually drafted, or in some cases whether he was drafted at all, so it makes some substantial reaches. Note also that these might differ in some way from what is on the blog, but not substantially so. Feel free to ask me if you have any questions about any particular player. (A small sketch of the best-available selection logic follows the year-by-year lists.)

2009: Brandon Pirri, Anton Rodin, Mike Hoffman, Benjamin Cassavant, Curtis McKenzie, Brandon Kozun, Michael Cichy

2010: Jesper Fast, Brendan Gallagher, Artemi Panarin, Alexei Marchenko, Brendan Ranford

2011: Shane Prince, Jean-Gabriel Pageau, Andrew Fritsch, Ondrej Palat, Joel Lowry, Josh Manson, Ryan Dzingel, Henrik Tommernes

2012: Esa Lindell, Jujhar Khaira, Alexander Kerfoot, Matej Beran, Emil Lundberg

2013: Alexander Wennberg, Artturi Lehkonen, Sven Andrighetto, Eric Locke, Andreas Johnsson, Juuso Ikonen, Brendan Harms

2014: William Nylander, David Pastrnak, Brayden Point, Viktor Arvidsson, Spencer Watson, Axel Holmstrom, August Gunnarsson

2015: Anthony Beauvillier, Anthony Richard, Andrew Mangiapane, Nikita Korostelev, Jonathan Davidsson, Tim McGauley, Kay Schweri

2016: Matthew Tkachuk, Vitaly Abramov, David Bernhardt, Maxime Fortier, Brayden Burke, Tim Wahlgren

2017: Elias Pettersson, Jason Robertson, Jonah Gadjovich, Igor Shvyrov, Matthew Strome, Artem Minulin, Austen Keating, Ivan Chekhovich
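As noted above, here is a rough sketch of the best-available selection logic: walk the draft in pick order, let every other team make its real-life pick, and at each Canucks slot take the highest-scoring player still on the board. The pool, scores, and slot numbers are invented; only the mechanism is illustrated:

```python
# Rough sketch: at every pick owned by the simulated team, take the highest-scoring
# player still available; at everyone else's picks, remove whoever was actually taken.

def simulate_team(pool, actual_picks_by_slot, team_slots):
    """pool: {player: potato_score}; actual_picks_by_slot: {pick number: player actually taken};
    team_slots: set of pick numbers the simulated team owned."""
    board = dict(pool)
    selections = []
    for slot in sorted(set(actual_picks_by_slot) | team_slots):
        if slot in team_slots:
            pick = max(board, key=board.get)       # best available by potato score
            selections.append((slot, pick))
        else:
            pick = actual_picks_by_slot.get(slot)  # another team's real-life pick
        board.pop(pick, None)
    return selections

pool = {"A": 80, "B": 75, "C": 70, "D": 60}
actual = {1: "A", 3: "C"}
print(simulate_team(pool, actual, team_slots={2, 4}))  # -> [(2, 'B'), (4, 'D')]
```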

The same formula is used for each and every draft and is not altered or tweaked in any way for any particular draft year. The 2017 draft has also not been used for any assessment of the formula. I have tried, in each year, to include as many undrafted players as I could find so as to truly represent the pool of players available, but this information is difficult to find and some players may be missing.

Remember, scouts are paid for their predictions, and just as you would want to know how a money manager is doing by comparing his predictions to a standard model for picking stocks, so too should you compare your scouting to a standard model for picking players, and probably look to invest your money elsewhere if it compares poorly.

How the Canucks have performed compared to this simple baseline is 100% up to your evaluation of these players. I have also spent a lot of time on this topic but won't get into the actual team evaluations here because this post is already too long. I have team evaluations and rankings on my blog if you want to read them there.

As with any system, some drafts are looking better than others for it. The 2009 and 2015 drafts do not look good in general when applying this model across all 30 teams, while the 2010, 2011 and 2014 drafts are looking quite strong. The 2016 and 2017 drafts are too early to call one way or the other.

Finally, I will close with the top 20 for both 2017 and 2018. I was originally going to do every draft since 2009, but this post is already so long that nobody is going to read it all. The 2018 draft is also posted on my blog along with further commentary; however, I will answer any questions here and take any requests for further information.

First, 2017:

1. Elias Pettersson
2. Lias Andersson
3. Nico Hischier
4. Nick Suzuki
5. Nolan Patrick
6. Cody Glass
7. Jason Robertson
8. Conor Timmins
9. Gabriel Vilardi
10. Kole Lind
11. Jonah Gadjovich
12. Martin Necas
13. Nicolas Hague
14. Igor Shvyryov
15. Owen Tippett
16. Kailer Yamamoto
17. Cal Foote
18. Aleksi Heponiemi
19. Matthew Strome
20. Antoine Morand
21. Michael Rasmussen

The Canucks actually managed to get 3 of the top ten from this system, although it preferred Jason Robertson to Kole Lind. I am not a big fan of lists, and surely parts of this are always going to be laughable, but the important thing to me is the overall performance of applying this methodology to every pick and comparing it to the overall performance of teams. There are going to be some massive misses in both cases, so the long-run evaluation is more important than the rankings. On the one hand, the model had Sven Baertschi ahead of Nikita Kucherov in 2011, but on the other hand, so did the scouts, and at least the model had Kucherov in the top 20 (Baertschi was ranked 14th, drafted 13th; Kucherov was ranked 15th, drafted 58th). I think that keeping perspective on this is important.

I think the clearest place where this methodology will differ from scouts is with players who have perceived skating issues. Guys like Matthew Strome and Jonah Gadjovich were available to be selected much later because of their skating. This is not something that I can account for in the data (yet!). This seems like a very clear space where scouting can add value and should be able to outperform this baseline. If it were "me" making the picks, I would definitely want to consider skating ability and factor it into the rankings.

OK, finally we get to 2018. Brace yourself!

1. Rasmus Dahlin
2. Evan Bouchard
3. Andrei Svechnikov
4. Noah Dobson
5. Filip Zadina
6. Jesperi Kotkaniemi
7. Isac Lundeström
8. Martin Kaut
9. Jacob Olofsson
10. Filip Hållander
11. Joe Veleno
12. Alexander Alexeyev
13. Akil Thomas
14. Oliver Wahlstrom
15. Ryan McLeod
16. David Gustafsson
17. Jonatan Berggren
18. Linus Karlsson
19. Carl Wassenius
20. Nathan Dunkley

Without repeating what I wrote on the blog, it is a big draft for European players, as guys like Lundestrom, Kaut, Olofsson and Hallander are highly ranked but could be grabbed with later picks. Most of these guys are separated by the slimmest of margins and could move around quite a bit. The biggest gap is after Dahlin, which matches how the scouts seem to see it as well.

I will be honest, I have worked very hard on this and probably put 200-300 hours into it at this point, far exceeding my original intentions of making the laziest system possible. I am excited to see how it does for 2018 but also aware that any one draft can boom or bomb, and the 2018 draft having a lot of defenders makes it even more difficult, so it wouldn't surprise me if it performs more like the 2015 draft than the 2014 one.

In any case, this should set for us a relevant baseline against which we can compare the Canuck picks. When the draft occurs, I will post our picks in here in "real time" as I will pick for the Canucks the best player available based on the system. I will also post "my" pick by taking into account some expectations for where a player will go which the system does not account for.

If you have any questions, comments or suggestions I am happy to answer them, although I will note just one more time that more information is on the blog so if you want some more details I would encourage you to at least read the introduction post there.

Thank you for your time.

EDIT: I had made a transcription error when copying into this post and missed Conor Timmins in the 2017 rankings; he should have been between Robertson and Vilardi. I have updated the post but now show the top 21 so that I am not removing anything.

One question for you: Why does Hughes fall completely off the map under your analysis? Your opinion here seems to go against the grain of a lot of other esteemed hockey pundits.

One thing I find interesting however is that ALL of you guys (me included) are high on Kotkaniemi.
 

Horse McHindu

They call me Horse.....
Jun 21, 2014
9,668
2,650
Beijing
Here are my team rankings from 2009-2016:

ott 77
car 76
ana 74
lak 69
was 64
min 63
col 62
nas 62
tam 61
fla 60
cbs 59
buf 57
edm 54
chi 53
bos 52
nyi 50
det 49
win 48
cal 47
ari 45
dal 45
pit 42
njd 40
sjs 40
mon 37
phi 33
tor 33
stl 32
nyr 30
van 29

Thank you Mike Gillis!
 

Peter10

Registered User
Dec 7, 2003
4,193
5,042
Germany
One question for you: Why does Hughes fall completely off the map under your analysis? Your opinion here seems to go against the grain of a lot of other esteemed hockey pundits.

One thing I find interesting however is that ALL of you guys (me included) are high on Kotkaniemi.

I would say you are a bit late to the Kotkaniemi train, jumping on only last month. I think it was Knight53 who has been banging the drum for him for about a year, and I guess many folks had a look then and liked what they saw.
 

mossey3535

Registered User
Feb 7, 2011
13,475
10,038
As for d-men, IMO there's no point in changing your model.

If a team at that pick wants a d-man, just filter the potato list for d-men and take the top-rated one. Done.
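In code terms, that is just a filter over the existing ranking; no model change is needed. A trivial sketch with made-up entries:

```python
# Each entry: (player, position, potato_score). All values are made up.
ranking = [("F One", "F", 81), ("D One", "D", 77), ("F Two", "F", 74), ("D Two", "D", 70)]

def best_available(ranking, taken, position=None):
    """Top-scoring player not yet taken, optionally restricted to one position."""
    candidates = [p for p in ranking
                  if p[0] not in taken and (position is None or p[1] == position)]
    return max(candidates, key=lambda p: p[2]) if candidates else None

print(best_available(ranking, taken=set(), position="D"))  # -> ('D One', 'D', 77)
```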
 

Doyle Hargraves

Registered User
May 11, 2018
400
199
Only 5/8 of your thanks belong to Mike Gillis; the other 3/8 are to Jim Benning's credit.
I think you have that backwards. Gillis had six drafts in Vancouver. He drafted from 08-13. The drafting here was a garbage fire from 06-12.
 

drax0s

Registered User
Mar 18, 2014
3,751
2,915
Vancouver, BC.
One side benefit of this project is that in compiling this data I now have a database of draft information I have never had before, and can do all sorts of stuff with it very easily.

For example, here is the strength of each draft, rated as the average of the potato scores of the drafted players:
So effectively, when people claim a draft is weak or strong, you can quantify it. I could see draft ratings like "top 5 picks, 5% above average; top 20 picks, 20% below average," etc. being useful for quantifying the "actual value" of a draft pick. 2016 draft picks, for example, are more valuable than 2012 picks.
 

Melvin

21/12/05
Sep 29, 2017
15,198
28,055
Montreal, QC
So effectively, when people claim a draft is weak or strong, you can quantify it. I could see draft ratings like "top 5 picks, 5% above average; top 20 picks, 20% below average," etc. being useful for quantifying the "actual value" of a draft pick. 2016 draft picks, for example, are more valuable than 2012 picks.

True, and an interesting thought, but I think the difference is usually going to be quite marginal. 2012 was baaad, though.
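Purely as an illustration of that idea (not something the model currently outputs), each draft year's average potato score could be expressed as a percentage above or below the all-years average:

```python
# Hypothetical per-year averages of potato scores for drafted players.
year_avg = {2012: 48.0, 2014: 55.0, 2016: 62.0}

overall = sum(year_avg.values()) / len(year_avg)
for year, avg in sorted(year_avg.items()):
    pct = 100.0 * (avg - overall) / overall
    print(f"{year}: {pct:+.1f}% vs. the average draft")
```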
 

Melvin

21/12/05
Sep 29, 2017
15,198
28,055
Montreal, QC
Lol, in this year's draft the model doesn't even have Hughes on the list

5'10" defender with meh production in the weakest college conference. He rates similarly to Julius Honka and Adam Fox.

This is, again, where your scouts need to be able to add value. If you're going to make him the highest-drafted 5'10" defender in history, you'd better be sure.
 
