GVM
I have minimal domain knowledge of advanced stats. I should perhaps read up.
With this in mind, I wondered whether I could predict which players scored 20+ goals at even strength in 2017-2018 based solely on advanced stats culled from www.hockey-reference.com.
Search:
For single season, in 2017-18, in Even Strength situations, playing skater, sorted by descending Games Played.
Link:
Player Advanced Stats Finder | Hockey-Reference.com
Simple binary classification model:
Based on data from hockey-reference.com, has a player scored 20+ goals?
I removed all of the derived value columns: CF%, CF% rel, C/60, Crel/60, FF%, FF% rel, PDO.
I also removed all of the face-off stats: FOW, FOL, FO%.
Data looks like this:
GP | CF | CA | FF | FA | oiSH% | oiSV% | oZS% | dZS% | TOI/Gm | HIT | BLK | TK | GV | G |
82 | 1572 | 1636 | 1242 | 1285 | 9.4 | 91.4 | 42.2 | 57.8 | 18.9 | 61 | 100 | 33 | 81 | 0 |
61 | 458 | 582 | 326 | 444 | 11.8 | 90.7 | 45.5 | 54.5 | 9.5 | 55 | 21 | 27 | 14 | 0 |
82 | 1135 | 957 | 784 | 716 | 10.1 | 91 | 58.9 | 41.1 | 12.6 | 37 | 24 | 30 | 21 | 1 |
82 | 1105 | 1137 | 773 | 840 | 5.5 | 90.7 | 55.9 | 44.1 | 13.2 | 32 | 35 | 28 | 49 | 0 |
78 | 1406 | 1641 | 1024 | 1237 | 7 | 90.5 | 44.9 | 55.1 | 19.3 | 101 | 125 | 15 | 57 | 0 |
82 | 1377 | 1302 | 1032 | 1004 | 10.8 | 90.5 | 58.1 | 41.9 | 15.9 | 132 | 21 | 21 | 64 | 1 |
The G column stands for goals: 0 means fewer than 20, 1 means 20 or more goals scored.
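A minimal sketch of the preprocessing described above: drop the derived and face-off columns, then encode G as 0/1. The function name `prepare_row` and the sample row are hypothetical; the PDO, FOW, and raw goal values in the sample are invented for illustration and are not from the data set.

```python
# Columns the post says were removed before modelling.
DERIVED = ["CF%", "CF% rel", "C/60", "Crel/60", "FF%", "FF% rel", "PDO"]
FACEOFF = ["FOW", "FOL", "FO%"]


def prepare_row(row, goal_threshold=20):
    """Drop the removed columns and turn the raw goal count into a 0/1 label."""
    dropped = set(DERIVED) | set(FACEOFF)
    cleaned = {key: value for key, value in row.items() if key not in dropped}
    # 1 means 20 or more goals, 0 means fewer than 20.
    cleaned["G"] = 1 if int(row["G"]) >= goal_threshold else 0
    return cleaned


# Hypothetical raw row as it might come out of a CSV export (values invented).
raw = {"GP": "82", "CF": "1135", "CA": "957", "PDO": "101.1", "FOW": "3", "G": "24"}
print(prepare_row(raw))
```

Keeping the threshold as a parameter makes it easy to rerun the same experiment for, say, 15+ or 25+ goals.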
Class distribution:
Number of True cases: 53 (5.96%)
Number of False cases: 837 (94.04%)
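With classes this imbalanced, it helps to know what a model that always answers "fewer than 20" would score. A small sketch that recomputes the percentages from the counts above and adds that majority-class baseline; the function name `class_distribution` is just an illustration.

```python
def class_distribution(n_true, n_false):
    """Return class percentages and the accuracy of always predicting the majority class."""
    total = n_true + n_false
    return {
        "true_pct": round(100 * n_true / total, 2),
        "false_pct": round(100 * n_false / total, 2),
        "majority_baseline_accuracy": round(max(n_true, n_false) / total, 4),
    }


# Counts reported in the post: 53 True, 837 False.
print(class_distribution(53, 837))
```

Any accuracy figure for this data set is best read against that baseline rather than against 50%.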
Data split:
74.94% in training set
25.06% in test set
Verifying predicted value split:
set | True | False |
Original | 53 (5.96%) | 837 (94.04%) |
Training | 37 (5.55%) | 630 (94.45%) |
Test | 16 (7.17%) | 207 (92.83%) |
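The True share differs between the training set (5.55%) and the test set (7.17%), which suggests the split was not stratified; that is an inference, the post does not say how the split was made. A minimal standard-library sketch of a stratified split, which is a swapped-in technique and not the author's method. The function name, the seed, and the rounding rule are assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Labels built from the class counts in the post: 53 True, 837 False.
labels = [1] * 53 + [0] * 837
train_idx, test_idx = stratified_split(labels)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"train True share: {train_share:.2%}, test True share: {test_share:.2%}")
```

With so few positive cases, a stratified split keeps the test-set results from depending on how many 20+ scorers happened to land in it.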
I tried a bunch of classification algorithms, and they all produced pretty similar results.
Output from logistic regression fit on the training set, with predictions made on the test set:
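For anyone who has not seen logistic regression written out, here is a bare-bones sketch using batch gradient descent. The post does not say which tool was used, and in practice a library would do this; the function names, learning rate, epoch count, and the toy one-feature data are all assumptions made for illustration, not the hockey data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Error = predicted probability minus the true 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a row 1 when its predicted probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]


# Toy, invented data: one centred feature, label 1 when the feature is positive.
X = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict(X, w, b))
```

The `threshold` argument matters on imbalanced data: lowering it below 0.5 trades some false positives for catching more of the rare class.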
Accuracy: 0.9596
Confusion matrix:
n = 223 | predicted < 20 | predicted >= 20 |
actual < 20 | 207 | 0 |
actual >= 20 | 9 | 7 |
Meaning that out of 223 rows, we had a total of 9 bad predictions.
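Accuracy alone hides how the rare class is doing, so here is a short sketch that derives precision, recall, and F1 for the 20+ class from the four cells of the confusion matrix above. The function name `binary_metrics` is hypothetical; the counts are the ones in the matrix.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Cells from the confusion matrix: 207 true negatives, 0 false positives,
# 9 false negatives, 7 true positives.
m = binary_metrics(tn=207, fp=0, fn=9, tp=7)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Recall for the 20+ class comes out at 7 / 16, which is the number to watch when tweaking the model.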
Without any tweaking, the model is really good at predicting fewer than 20 (all 207 correct), but needs work on predicting 20 or more (it caught only 7 of the 16).
If I remove games played (GP) from the features, I believe my >= 20 predictions would improve.
Some players have 20 or more goals with fewer than 70 games played.
I suspect I would've achieved similar results had I used only the derived value columns.
Anyway, just a possible scenario of how an NHL team or a fantasy league pooler could make use of advanced stats.
Cheers