Here is a more detailed explanation of how the algorithm currently works and a discussion of the problems (this ended up being quite a heavy post... probably should've started with this.)
___________________
As I said before, I have disregarded all individual statistics and look only at the on-ice stats. After all, ice hockey is a team sport. If a team scores a goal it's not always clear who really made it happen. And if they concede a goal, it's pretty much impossible to say whose fault it was just by looking at the statistics.
Doing it like this is very tempting because building a somewhat balanced model is quite straightforward. You can treat defencemen and forwards the same and don't have to worry so much about bias between different playing styles/positions.
However, there seem to be some problems that can't be solved without looking at the individual statistics.
___________________
So the algorithm works like this:
For these events
- goal
- shot (including missed and blocked)
- penalty
- zone_ending
- shorthanded time / wasted power plays
do the following:
1) Calculate mean(rating) for both teams using the skaters on ice (in the case of a goal, include the goalie rating)
2) Using those average ratings, calculate the rating change according to the Elo formula.
3) Multiply the rating change by the event weight, score weight and coordinate weight.
4) In the case of a "not balanced" event (events after a zone start, power play, empty net),
multiply the rating changes by a correction coefficient to ensure that the average rating change is 0 in those situations.
5) Finally, apply this multiplied/corrected rating change to all skaters on ice.
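The five steps above can be sketched roughly like this. All names here (`apply_event`, `expected_score`, the K-factor of 10, storing skaters as dicts) are my own illustration, not taken from the actual implementation:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def apply_event(team_a, team_b, outcome_a, event_weight, score_weight,
                coord_weight, correction=1.0, K=10.0):
    """team_a / team_b: lists of {'rating': float} for the skaters on ice.
    outcome_a: 1.0 if the event went team A's way, else 0.0."""
    mean_a = sum(p['rating'] for p in team_a) / len(team_a)   # step 1
    mean_b = sum(p['rating'] for p in team_b) / len(team_b)
    delta = K * (outcome_a - expected_score(mean_a, mean_b))  # step 2
    delta *= event_weight * score_weight * coord_weight       # step 3
    delta *= correction                                       # step 4
    for p in team_a:                                          # step 5
        p['rating'] += delta
    for p in team_b:
        p['rating'] -= delta
```

For example, with everyone at 1500 a goal (weight 1.9, other weights 1.0) moves every skater on the scoring team up by 10 * 0.5 * 1.9 = 9.5 points under these assumed constants.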
Performance for each player is then calculated similarly, but the average rating is used instead of the player's own rating.
The motivation behind this is that it should be equally easy for high- and low-rated players to achieve a good performance.
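A minimal sketch of that performance idea, assuming it means plugging a fixed average rating into the Elo expectation (function name and K-factor are mine):

```python
def performance_delta(avg_rating, opponent_mean, outcome, weight, K=10.0):
    """Compute a performance change using a league-average rating in the
    Elo expectation instead of the player's own rating, so a high- and a
    low-rated player earn the same performance for the same events."""
    expected = 1.0 / (1.0 + 10 ** ((opponent_mean - avg_rating) / 400.0))
    return K * weight * (outcome - expected)
```

Because `avg_rating` is the same constant for everyone, the player's own rating never enters the formula, which is exactly what makes the performance scale rating-neutral.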
________________________
The event weights are:
event_weights = {
    'shot': 0.25,
    'goal': 1.9,
    'time': 0.25,  # one minute of power play time
    'blocked_shot': 0.15,
    'missed_shot': 0.15,
    'penalty': 0.45,
    'zone_ending': 0.15
}
The score weights by score difference are:

score_diff  weight
0           1.0
1           1.0
2           0.8
3           0.6
4           0.4
5           0.2
The coordinate weights range from 0.5 to 1.5 depending on the shot distance and angle.
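The post only gives the [0.5, 1.5] range, not the actual curve, so here is one hypothetical monotone mapping just to make the idea concrete (the function, the 60 ft / 90° caps and the linear blend are all my own assumptions):

```python
def coordinate_weight(distance_ft, angle_deg):
    """Hypothetical coordinate weight in [0.5, 1.5]: closer and more
    central shots weigh more. The real mapping is not specified here."""
    d = min(max(distance_ft, 0.0), 60.0) / 60.0   # 0 = at the net, 1 = far
    a = min(abs(angle_deg), 90.0) / 90.0          # 0 = straight on
    danger = 1.0 - 0.5 * (d + a)                  # in [0, 1]
    return 0.5 + danger                           # in [0.5, 1.5]
```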
____________________
Problems:
1) "The Sedin problem 1"
If two players always play together, then the difference in their ratings stays constant no matter what happens.
This leads to a problem when one of the players is underrated (or overrated).
For example, assume that Daniel Sedin has way too low a rating (as he probably does).
The algorithm then tries to correct this, but if Daniel plays with Henrik all the time, Henrik's rating gets pushed upwards as well.
Both ratings keep rising until Henrik is as overrated as Daniel is underrated, and then the ratings stay there.
This problem wouldn't exist if the teammates/lines mixed enough, but sadly that's not always the case.
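The constant-gap claim follows directly from step 5 above: every skater on ice gets the same delta, so two players who share every shift share every delta. A tiny illustration (the ratings and deltas are made up):

```python
# If two players are on the ice for exactly the same events, they
# receive identical rating changes, so their rating gap never moves.
daniel, henrik = 1400.0, 1550.0       # assume Daniel is underrated
gap = henrik - daniel                 # 150.0
for delta in [9.5, -2.0, 4.0, 7.25]:  # arbitrary shared event deltas
    daniel += delta
    henrik += delta
assert abs((henrik - daniel) - gap) < 1e-9  # gap unchanged
```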
2) "The Sedin problem 2"
The highest-rated player always gets the most performance points from each event. This happens because, among the players on one team, the best player has the worst teammates.
If a very good and a very bad player achieve the same result together, then by this logic the very good player has played better, because he did it with worse teammates.
3) Lack of individual statistics will cause inaccuracies
For example, if one player takes a stupid penalty or scores a great goal, all his teammates are punished or rewarded equally.
I think this is only a minor problem in the long run (IF the players are sufficiently mixed).
I think these problems are impossible to fix without looking at the individual stats, so that's probably what I'll try next.
However, mixing in the individual stats is somewhat hard, because there is always a danger that the model will reward some playing styles more than others even when they aren't actually better for the team.
Also, many important stats (like passes and their coordinates) are seemingly not tracked by the NHL at all.
But anyways... stay tuned.