Identifying D-partners with game by game plus-minus

overpass

Here's a statistical method for identifying which defencemen played together at even strength, using historical game log plus-minus data which goes back to 1959-60. It's most useful for historical seasons for which we don't have any ice-time data on which players played together.

It's relatively easy to identify which forwards played together by looking at the scoring logs and seeing which forwards combined for goals. But it's relatively rare for two defencemen to combine on a goal, especially at even strength, so we have to look for another solution for defencemen. It's possible to look at the seasonal plus-minus totals and take a guess, but these guesses aren't always accurate for various reasons. Maybe the defencemen didn't play together all season, or maybe there are several defencemen all in the same range for seasonal plus-minus.

It's possible to identify which defencemen played together the most by lining up their game by game plus-minus for the season and finding the correlation between each of their game by game plus-minus results, using the correlation function in Excel. Correlations range from -1 to 1. For this method, a correlation approaching 1, e.g. 0.8 or 0.9, would mean that the players usually played together.
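For anyone working outside Excel, here is a minimal Python sketch of the same idea. The player names and game logs are made up for illustration, and skipping games that either player missed (marked `None`) is an assumption about how missed games should be handled, not something the method above specifies.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two game-by-game plus-minus logs.

    None marks a game the player missed; such games are skipped pairwise
    (an assumption for this sketch). Returns None when undefined.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined if either log never varies
    return sxy / sqrt(sxx * syy)

def correlation_matrix(logs):
    """Correlation of every player's log with every other player's log."""
    names = list(logs)
    return {a: {b: pearson(logs[a], logs[b]) for b in names} for a in names}

# Hypothetical 8-game logs, one plus-minus value per game.
game_logs = {
    "Player A": [1, 0, -1, 2, 0, -1, 1, 0],
    "Player B": [1, 0, -1, 2, 0, -2, 1, None],  # missed the last game
    "Player C": [0, 1, 0, -1, 1, 0, -1, 1],
}

matrix = correlation_matrix(game_logs)
```

In this toy data, Player A and Player B track each other closely and come out with a high correlation, while Player A and Player C do not.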

Here's an example, using the 1962-63 Leafs.

| Player | Position | GP | +/- | Horton | Stanley | Brewer | Baun | Douglas |
|---|---|---|---|---|---|---|---|---|
| Tim Horton | D | 70 | 1 | 1.00 | 0.86 | 0.05 | 0.09 | 0.20 |
| Allan Stanley | D | 61 | 2 | 0.86 | 1.00 | 0.01 | -0.07 | -0.02 |
| Carl Brewer | D | 70 | 32 | 0.05 | 0.01 | 1.00 | 0.84 | 0.40 |
| Bob Baun | D | 48 | 14 | 0.09 | -0.07 | 0.84 | 1.00 | 0.15 |
| Kent Douglas | D | 70 | 18 | 0.20 | -0.02 | 0.40 | 0.15 | 1.00 |

Tim Horton and Allan Stanley's game by game plus-minus had a very high correlation of 0.86, indicating that they were regular partners. Same with Carl Brewer and Bob Baun (0.84). Kent Douglas doesn't appear to have had a regular partner, and his highest correlation is 0.40 with Carl Brewer, possibly because Douglas played with Brewer during the 22 games that Baun missed.

Stanley-Horton
Brewer-Baun
Douglas
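The step from the correlation table to the pairings can be sketched in code. The correlations below are the 1962-63 values from the table; the rule itself (two players are a regular pair if each is the other's highest correlation and that correlation is at least 0.6) and the 0.6 cutoff are assumptions made for this sketch, not a rule stated above.

```python
# +/- correlations between Leafs defencemen, 1962-63, from the table above.
CORR = {
    ("Horton", "Stanley"): 0.86, ("Horton", "Brewer"): 0.05,
    ("Horton", "Baun"): 0.09, ("Horton", "Douglas"): 0.20,
    ("Stanley", "Brewer"): 0.01, ("Stanley", "Baun"): -0.07,
    ("Stanley", "Douglas"): -0.02, ("Brewer", "Baun"): 0.84,
    ("Brewer", "Douglas"): 0.40, ("Baun", "Douglas"): 0.15,
}

def regular_pairs(corr, threshold=0.6):
    """Return (pairs, unpaired): mutual-best partners at or above threshold."""
    players = sorted({p for pair in corr for p in pair})

    def r(a, b):
        # The table is symmetric, so look the pair up in either order.
        return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

    # Each player's single most correlated teammate.
    best = {p: max((q for q in players if q != p), key=lambda q: r(p, q))
            for p in players}
    pairs = sorted({tuple(sorted((p, q))) for p, q in best.items()
                    if best[q] == p and r(p, q) >= threshold})
    paired = {p for pair in pairs for p in pair}
    unpaired = [p for p in players if p not in paired]
    return pairs, unpaired

pairs, unpaired = regular_pairs(CORR)
```

With these numbers the sketch returns Horton-Stanley and Brewer-Baun as pairs and leaves Douglas unpaired, matching the reading of the table.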

The same method should work for the forwards as well, although it adds less information because we can already look to see which forwards combined on points together.

Frank Mahovlich: Red Kelly 0.89, Bob Nevin 0.53, Eddie Shack 0.32, Dave Keon 0.28
Red Kelly: Frank Mahovlich 0.89, Bob Nevin 0.53, Eddie Shack 0.32, Ron Stewart 0.30
Bob Nevin: Frank Mahovlich 0.53, Red Kelly 0.53, Bob Pulford 0.36, Billy Harris 0.28

Red Kelly and Frank Mahovlich appear to have been linemates all season. Bob Nevin was their most frequent RW, with Eddie Shack and others also playing some RW, and Nevin also played some time with Pulford and Harris.

Dave Keon: George Armstrong 0.85, Dick Duff 0.72, Frank Mahovlich 0.28
George Armstrong: Dave Keon 0.85, Dick Duff 0.79, Billy Harris 0.26, Red Kelly 0.24
Dick Duff: George Armstrong 0.79, Dave Keon 0.72, Red Kelly 0.21

Duff-Keon-Armstrong appears to have been a regular line.

Bob Pulford: Ron Stewart 0.66, Billy Harris 0.55, Eddie Shack 0.43, Bob Nevin 0.36
Billy Harris: Bob Pulford 0.55, Ron Stewart 0.50, Bob Nevin 0.28, George Armstrong 0.26
Ron Stewart: Bob Pulford 0.66, Billy Harris 0.50, Red Kelly 0.30, Eddie Shack 0.25

Pulford-Harris-Stewart was the most common third line, but wasn't as regular as the Keon line or the Mahovlich-Kelly duo. Eddie Shack, Bob Nevin, and others also got some time here.

Eddie Shack: Bob Pulford 0.43, Frank Mahovlich 0.32, Red Kelly 0.32, Ron Stewart 0.25, Billy Harris 0.21
Ed Litzenberger: Bob Pulford 0.26, Ron Stewart 0.20, Bob Nevin 0.16

Neither of the two spare forwards had regular linemates, as you would expect. It looks like Shack spent some time with Mahovlich-Kelly and some time on the third line, but very little with Keon's line. Litzenberger has very little correlation with anyone and was probably more of a true spare forward.

Mahovlich-Kelly-Nevin
Duff-Keon-Armstrong
Pulford-Harris-Stewart
Shack, Litzenberger

We can also use the plus-minus correlations to see if any forwards and defence spent more time together, or if it was pretty evenly distributed. Starting with the first line.

| Player | Position | Horton | Stanley | Brewer | Baun | Douglas |
|---|---|---|---|---|---|---|
| Frank Mahovlich | LW | 0.50 | 0.37 | 0.32 | 0.54 | 0.28 |
| Red Kelly | C | 0.52 | 0.43 | 0.31 | 0.48 | 0.26 |
| Bob Nevin | RW | 0.35 | 0.43 | 0.50 | 0.47 | 0.23 |

Douglas has lower correlations most likely because he was the 5th defenceman at EV for most of the season and played less ice time. The Mahovlich-Kelly-Nevin line appears to have played more or less evenly with the Horton-Stanley pairing and the Brewer-Baun pairing.

| Player | Position | Horton | Stanley | Brewer | Baun | Douglas |
|---|---|---|---|---|---|---|
| Dick Duff | LW | 0.35 | 0.25 | 0.36 | 0.40 | 0.19 |
| Dave Keon | C | 0.39 | 0.25 | 0.36 | 0.37 | 0.33 |
| George Armstrong | RW | 0.31 | 0.12 | 0.49 | 0.52 | 0.32 |

| Player | Position | Horton | Stanley | Brewer | Baun | Douglas |
|---|---|---|---|---|---|---|
| Bob Pulford | LW | 0.40 | 0.33 | 0.54 | 0.35 | 0.45 |
| Billy Harris | C | 0.34 | 0.31 | 0.40 | 0.48 | 0.25 |
| Ron Stewart | RW | 0.51 | 0.52 | 0.22 | 0.29 | 0.10 |

There may have been something going on at the RW position, where Armstrong was more likely to play with Brewer-Baun or Brewer-Douglas and Stewart more likely to play with Stanley-Horton. There's less of a pattern, if any, with the LWs and Cs on these lines.
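One way to put a number on this is to average each forward's correlation with the two members of a pairing and compare the pairings. The correlations below come from the tables above; averaging the two defencemen's values and reading the gap between pairings as a preference is an assumption of this sketch.

```python
# Forward-to-defenceman +/- correlations from the tables above.
FWD_D = {
    "Keon": {"Horton": 0.39, "Stanley": 0.25, "Brewer": 0.36, "Baun": 0.37},
    "Armstrong": {"Horton": 0.31, "Stanley": 0.12, "Brewer": 0.49, "Baun": 0.52},
    "Stewart": {"Horton": 0.51, "Stanley": 0.52, "Brewer": 0.22, "Baun": 0.29},
}

def mean_corr(forward, pair):
    """Average of a forward's correlations with both members of a pairing."""
    return sum(FWD_D[forward][d] for d in pair) / len(pair)

def pair_gap(forward):
    """Brewer-Baun average minus Horton-Stanley average.

    Positive suggests more time with Brewer-Baun, negative suggests more
    time with Horton-Stanley, near zero suggests no clear pattern.
    """
    return (mean_corr(forward, ("Brewer", "Baun"))
            - mean_corr(forward, ("Horton", "Stanley")))

for name in FWD_D:
    print(f"{name}: {pair_gap(name):+.2f}")
```

Armstrong's gap is clearly positive and Stewart's clearly negative, while Keon's is close to zero, which is consistent with a pattern at RW but not at centre.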
 

overpass

This is awesome. Did you do any tests of this method in the "shift chart" era...?

Edit: Also, did you remove SH pluses (and ENG pluses and minuses, if obvious)...?

No, I didn't remove SH pluses. I don't think there's any way to consistently identify those in the plus-minus data, at least not without adding a lot more time to the process. That is an issue with the method. For players that played together on special teams, especially those on teams that scored and allowed a lot of SH goals, the plus-minus correlations will reflect both their EV goals together and the SH goals together.

I didn't do a lot of testing in the shift chart era, but I tried running it on the 2016-17 Ottawa Senators. Here are some results.

Erik Karlsson 2016-17
| Partner | EV Ice Time Together | EV% (Player) | EV% (Partner) | +/- Correlation |
|---|---|---|---|---|
| Marc Methot | 968 | 60% | 81% | 0.78 |
| Dion Phaneuf | 261 | 16% | 18% | 0.16 |
| Cody Ceci | 39 | 2% | 3% | -0.03 |
| Chris Wideman | 60 | 4% | 6% | 0.23 |
| Mark Borowiecki | 114 | 7% | 12% | 0.10 |
| Fredrik Claesson | 133 | 8% | 32% | 0.34 |
| Kyle Turris | 508 | 31% | 41% | 0.48 |
| Derick Brassard | 486 | 30% | 41% | 0.42 |
| Jean-Gabriel Pageau | 459 | 28% | 41% | 0.36 |
| Zack Smith | 411 | 25% | 43% | 0.30 |

Karlsson had a +/- correlation of 0.78 with Marc Methot, so this method correctly identifies them as frequent partners. Looking at the actual ice time, Karlsson played 60% of his EV ice time with Methot. Methot played fewer minutes, so 81% of his EV ice time was with Karlsson.

Fredrik Claesson had the second highest +/- correlation with 0.34, and in fact Claesson did play 32% of his EV minutes with Karlsson. (Only 8% of Karlsson's EV minutes were with Claesson, because Claesson only played 33 games and fewer minutes per game.)
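The asymmetry between the two EV% columns is just the same shared ice time divided by two different totals. A small sketch, using the shared ice time and percentages from the table; the totals are not in the table, so they are backed out from the rounded percentages and are only approximate.

```python
def implied_total(shared, share):
    """Total EV ice time implied by shared time and a (rounded) share of it."""
    return shared / share

# Karlsson-Methot: 968 together, 60% of Karlsson's time, 81% of Methot's.
karlsson_total = implied_total(968, 0.60)
methot_total = implied_total(968, 0.81)

# Karlsson-Claesson: 133 together, 8% of Karlsson's time, 32% of Claesson's.
karlsson_total_alt = implied_total(133, 0.08)
claesson_total = implied_total(133, 0.32)

print(round(karlsson_total), round(methot_total))
print(round(karlsson_total_alt), round(claesson_total))
```

The two estimates of Karlsson's total differ somewhat because the table's percentages are rounded, but both are well above the implied totals for Methot and Claesson, which is why the same shared time is a much larger share for the partner.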

Wideman and Karlsson had a +/- correlation of 0.23, but in fact very rarely played together at EV. This correlation is low enough that I wouldn't draw any conclusions from it.

It looks like Karlsson played a pretty even % of minutes with each of the 4 centres, Turris, Brassard, Pageau, and Smith. Their +/- correlation ranged from 0.30 to 0.48, so that could just be random difference, or maybe points to Karlsson combining better with the more offensive forwards.

Dion Phaneuf 2016-17
| Partner | EV Ice Time Together | EV% (Player) | EV% (Partner) | +/- Correlation |
|---|---|---|---|---|
| Erik Karlsson | 261 | 18% | 16% | 0.16 |
| Marc Methot | 12 | 1% | 1% | 0.06 |
| Cody Ceci | 925 | 63% | 61% | 0.63 |
| Chris Wideman | 184 | 13% | 20% | 0.20 |
| Mark Borowiecki | 25 | 2% | 3% | 0.33 |
| Fredrik Claesson | 19 | 1% | 5% | 0.00 |
| Kyle Turris | 421 | 29% | 34% | 0.27 |
| Derick Brassard | 418 | 29% | 35% | 0.33 |
| Jean-Gabriel Pageau | 428 | 29% | 38% | 0.48 |
| Zack Smith | 355 | 24% | 37% | 0.33 |

Phaneuf's plus-minus correlation with Cody Ceci was 0.63. In fact, Phaneuf and Ceci were each other's most frequent partner, and Phaneuf played 63% of his EV minutes with Ceci.

Phaneuf's correlation of 0.33 with Mark Borowiecki, however, did not reflect them playing much time together. The two left defencemen were very infrequent partners (2% of Phaneuf's EV time). So this is a bit of a false positive and shows that correlations in the 0.3 range may not be based on playing together. Something to keep in mind for seasons where we don't have actual ice time data to check.

Phaneuf's plus-minus correlation with his second and third most frequent partners, Karlsson and Wideman, matches up pretty closely to the % of EV time they played together.

Phaneuf got a little more ice time (relative to the forward) with the depth centres Pageau and Smith than he did with 1st and 2nd liners Turris and Brassard, and his +/- correlation with them is a little higher. I don't know that I would draw too many conclusions from forward-defence +/- correlations in this range.

Chris Wideman 2016-17
| Partner | EV Ice Time Together | EV% (Player) | EV% (Partner) | +/- Correlation |
|---|---|---|---|---|
| Erik Karlsson | 60 | 6% | 4% | 0.23 |
| Dion Phaneuf | 184 | 20% | 13% | 0.20 |
| Marc Methot | 52 | 6% | 4% | 0.25 |
| Cody Ceci | 56 | 6% | 4% | 0.10 |
| Mark Borowiecki | 437 | 47% | 47% | 0.26 |
| Fredrik Claesson | 131 | 14% | 32% | 0.28 |

Wideman didn't have a +/- correlation above 0.3 with any other defenders on the team, suggesting he didn't have a regular partner. In fact he did play 47% of his minutes with Mark Borowiecki, so we would have liked to see a higher correlation there. So we can't expect too much accuracy when looking at cases where the player didn't play the majority of his minutes with a single partner, especially not for a single season.
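The hits and misses in the three Ottawa tables can be listed mechanically. The data below is the defence-defence rows from those tables (each pair entered once); the cutoffs of 0.3 for correlation, under 10% for "rarely together" and 40% or more for "frequent partner" are assumptions picked for this sketch.

```python
# (player, partner, EV% of player, EV% of partner, +/- correlation)
D_PAIRS = [
    ("Karlsson", "Methot", 60, 81, 0.78),
    ("Karlsson", "Phaneuf", 16, 18, 0.16),
    ("Karlsson", "Ceci", 2, 3, -0.03),
    ("Karlsson", "Wideman", 4, 6, 0.23),
    ("Karlsson", "Borowiecki", 7, 12, 0.10),
    ("Karlsson", "Claesson", 8, 32, 0.34),
    ("Phaneuf", "Methot", 1, 1, 0.06),
    ("Phaneuf", "Ceci", 63, 61, 0.63),
    ("Phaneuf", "Wideman", 13, 20, 0.20),
    ("Phaneuf", "Borowiecki", 2, 3, 0.33),
    ("Phaneuf", "Claesson", 1, 5, 0.00),
    ("Wideman", "Methot", 6, 4, 0.25),
    ("Wideman", "Ceci", 6, 4, 0.10),
    ("Wideman", "Borowiecki", 47, 47, 0.26),
    ("Wideman", "Claesson", 14, 32, 0.28),
]

def check_against_ice_time(pairs, corr_cut=0.3, low_share=10, high_share=40):
    """Compare the correlation signal with actual shared ice time.

    A pair's share is the larger of the two EV% values, so a pair only
    counts as 'rarely together' if that is true from both sides.
    """
    false_positives = [(a, b) for a, b, pa, pb, r in pairs
                       if r >= corr_cut and max(pa, pb) < low_share]
    missed = [(a, b) for a, b, pa, pb, r in pairs
              if r < corr_cut and max(pa, pb) >= high_share]
    return false_positives, missed

false_positives, missed = check_against_ice_time(D_PAIRS)
```

With these cutoffs the only false positive is Phaneuf-Borowiecki and the only missed partnership is Wideman-Borowiecki, the same two cases called out above.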
 

plusandminus

Here's a statistical method for identifying which defencemen played together at even strength, using historical game log plus-minus data which goes back to 1959-60. It's most useful for historical seasons for which we don't have any ice-time data on which players played together.
...

It sounds as if you have an Excel sheet which shows the players' +/- for each game..?

Using the sheet to spot changes during the season

For large quantities of data, your method obviously saves time in spotting the correlations.
But that's not all. For more depth one can just manually look at the sheet. We will for example see that the pairs' +/- closely follow each other, and sometimes differ slightly. But we will also see if, for example, a player suddenly gets a different partner during the season. Often players are injured, and in those cases you can get strong clues as to which changes in pairings were made. I myself find looking at the sheet manually is often more rewarding than just looking at the correlation numbers, and I can imagine you may think so too.

For example, if a player changed partners in mid-season, he will not have a correlation near 1 with any partner. He may have .46 with player B and .30 with player D. But when looking at the sheet, we may see that "Aha, players A and B seem to have played together during games 1-50, but then players A and D played together for games 51-80." And we may find out that "Aha, players A and B actually had a .90 correlation during games 1-50!" And then we will probably be curious to check things like "I wonder how much that change affected each player, or the team's overall performance" and so on...
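A rough sketch of that idea: compute the correlation over a range of games instead of the whole season. The 80-game logs are made up so that players A and B match exactly in games 1-50 and players A and D match in games 51-80; real logs would of course be noisier.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length logs (None if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def segment_corr(a, b, first_game, last_game):
    """Correlation over games first_game..last_game (1-indexed, inclusive)."""
    return pearson(a[first_game - 1:last_game], b[first_game - 1:last_game])

# Made-up plus-minus patterns, repeated to fill an 80-game season.
cycle_a = [1, 0, -1, 0, 2, -1]
cycle_other = [0, 1, 0, -1, -1, 1, 0]
player_a = [cycle_a[i % 6] for i in range(80)]
unrelated = [cycle_other[i % 7] for i in range(80)]

player_b = player_a[:50] + unrelated[50:]   # with A for games 1-50 only
player_d = unrelated[:50] + player_a[50:]   # with A for games 51-80 only

print("A-B full season:", pearson(player_a, player_b))
print("A-B games 1-50: ", segment_corr(player_a, player_b, 1, 50))
print("A-B games 51-80:", segment_corr(player_a, player_b, 51, 80))
print("A-D games 51-80:", segment_corr(player_a, player_d, 51, 80))
```

The full-season A-B correlation comes out well below 1 even though the two were (by construction) together for every one of the first 50 games, which is the situation described above.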


The difficulty in constructing lineups based on data

In reality, we'll see that basically every regular defenceman shares ice time (or +/-) with basically every other regular defenceman on the team. The pair may not change at exactly the same time, and so on. We also have the cases where a coach wants a star defenceman on the ice a lot, making him sometimes play with different partners. Like the Erik Karlsson and Marc Methot case, where Methot spent a larger % of his ice time with Karlsson than vice versa.
Same with forwards. Looking at Gretzky in Edmonton, he seemed to double shift so much that it was almost like he was centering two lines. He basically played with every winger on the team, although of course more with some (like Kurri).

These things may make it quite frustrating to construct lineups based on our data. It is sometimes basically impossible to write down three pairings and four lines, because in reality it varied throughout the season. And what do we do with cases where players (like Gretzky) were double shifting, frequently appearing on say two out of six lines, rather than one out of three or four? Or even three out of seven lines?


What would we want the lineups to look like?

On TV, the lineups look so nice:
G1
D1-D2, D3-D4, D5-D6
L1-C1-R1, L2-C2-R2, L3-C3-R3, L4-C4-R4
(or, in the past, using fewer players, like you, overpass, illustrated)

But how would we want seasonal lineups to be shown?

Personally, I may actually find a table/sheet to be the best(?) way. We may use colors or things to make it even easier to spot who likely played with who when.

Or do we think a "TV like" lineup would do?
The problem is that some players may actually have spent only 40% or less of their ice time with their "linemate".
What about trades and injuries during the season?
What about 3rd and 4th liners, where we may have had say six different left wingers playing at least 6 games?

Or maybe we want it as text?
Player A: game 1-50 partner B, game 51-80 partner D.
and so on...
But in reality, it may not be that easy.
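If we did have a best guess at a player's partner for each game, the text format could be produced mechanically. A minimal sketch, assuming a hypothetical list with one partner label per game; how that list is obtained is the hard part and is not addressed here.

```python
from itertools import groupby

def partner_stints(partners):
    """Collapse a per-game partner list into (first_game, last_game, partner).

    partners[i] is the assumed partner in game i + 1.
    """
    stints = []
    game = 1
    for partner, run in groupby(partners):
        length = len(list(run))
        stints.append((game, game + length - 1, partner))
        game += length
    return stints

def describe(player, partners):
    """Format the stints as one line of text."""
    parts = [f"games {first}-{last} partner {partner}"
             for first, last, partner in partner_stints(partners)]
    return f"Player {player}: " + ", ".join(parts) + "."

# Hypothetical season: partner B for 50 games, then partner D for 30.
summary = describe("A", ["B"] * 50 + ["D"] * 30)
print(summary)
```

A season with frequent changes would produce a long list of short stints, which is one way to see why the text format may not be that easy in practice.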


More on determining who played with who

For older seasons, we didn't have much data to work with. We basically had the scoring logs, and +/- on a per game basis from the 1980s(?) onwards. In more recent years, per game +/- has become available from 1959-60 onwards.

At first, we used point shares to try to get a picture of who was playing with whom. But point shares give us a quite small amount of data (points) to work with, especially for defencemen. We get results that may be very far from reflecting actual ice time together.
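One reading of the point-share idea is to count how often two players got a point on the same goal. A small sketch under that assumption; the scoring log below is made up, and each goal is just the set of players credited with the goal or an assist.

```python
from collections import Counter
from itertools import combinations

def shared_point_counts(goals):
    """Count, for every pair of players, the goals both had a point on."""
    counts = Counter()
    for point_getters in goals:
        for a, b in combinations(sorted(set(point_getters)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical scoring log: scorer and assisters for each goal.
scoring_log = [
    ("F1", "F2", "D1"),
    ("F1", "F2"),
    ("F3",),
    ("F1", "D1"),
    ("D1", "D2", "F3"),
]

counts = shared_point_counts(scoring_log)
print(counts.most_common(3))
```

Even in this toy log the forward pair shows up more than once while the defence pair shares a single goal, which illustrates how little data there is for defencemen.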

The approach of looking at +/- game by game (topic of this thread) is a step forward.
One can also combine the two methods (and hope that the results point in the same direction, which may not always be the case).
We'll sometimes find that a player who had an assist on a goal wasn't necessarily on the ice when it was scored, although those cases are fortunately relatively rare.

We may also use factual ESGF and ESGA from the games. For example, it is often possible to deduce, based on the scoring stats and each player's +/- from the game, exactly which goals he was on the ice for. And we may use methods to "scientifically" further deduce the likelihood of him being on the ice for the other goals.
We can test our results by applying our algorithms to recent seasons.
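The deduction step can be sketched as a small search: list every set of goals that is consistent with the player's points and his +/- for the game. Everything here is hypothetical (the `Goal` record, the labels, the players), and the sketch assumes a player was on the ice for any goal he had a point on, which as noted is not always true.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Goal:
    label: str
    for_team: bool              # True if the player's own team scored
    point_getters: frozenset    # players credited with the goal or an assist

def consistent_on_ice_sets(goals, player, plus_minus):
    """All sets of goals the player could have been on the ice for."""
    must = [g for g in goals if g.for_team and player in g.point_getters]
    optional = [g for g in goals if g not in must]
    solutions = []
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            on_ice = must + list(extra)
            net = sum(1 if g.for_team else -1 for g in on_ice)
            if net == plus_minus:
                solutions.append(sorted(g.label for g in on_ice))
    return solutions

# Hypothetical game: three goals for (player A assisted on the first), one against.
game_goals = [
    Goal("GF1", True, frozenset({"A", "X"})),
    Goal("GF2", True, frozenset({"Y"})),
    Goal("GF3", True, frozenset({"Z"})),
    Goal("GA1", False, frozenset({"Q"})),
]

print(consistent_on_ice_sets(game_goals, "A", 3))  # only one possibility
print(consistent_on_ice_sets(game_goals, "A", 1))  # several possibilities
```

When only one set is consistent, the on-ice goals are known exactly; when several are, the remaining ambiguity is where likelihood estimates would have to come in.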

Doing all the above, we should always be aware that the raw data we use is somewhat incorrect. We know from reviewing video footage that the +/- and even the scoring stats from the 1972 Summit Series were wrong. That was a televised tournament with millions of viewers. We may well expect at least as many errors in regular season NHL games. We'll never know if Bobby Orr was +124, or +119, or maybe even +126. We'll never know if Maurice Richard actually scored 50 goals in 50 games or if there was a case where a teammate of his actually touched the puck after he touched it (or vice versa). Especially regarding +/-, we can likely expect many errors, just as in the Summit Series.

--

The difficult task of using +/- to determine how good a player or pair was

I started processing NHL stats in the mid 1990s when the Internet made it possible to access NHL stats online. During the 2002-03 season I wrote a program that downloaded ("scraped"?) shift by shift data from nhl.com, spent many hours cleaning it (there could be formatting errors, misspellings and all kinds of discrepancies), and stored it in an SQL Server database. I could thus see how often different players started shifts together, as well as who was on the ice during scoring.

I soon learned a lot, for example...
Every regular player basically shares at least some icetime with basically every regular teammate.
Empty net goals and short handed goals affect +/- a lot!
When breaking things down, randomness seemed to affect players' statistics hugely.

If one wants to draw conclusions about "how well did he/they play" from +/-, it's quite hard to do so, since there are so many other factors "biasing" the numbers.

The first thing to understand is how much +/- is affected by empty net goals and short handed goals. (I partly exemplified it some year(s) ago in the Housley vs Lowe thread.) Only count goals scored when both teams have a goalie and an equal number of players on the ice. Otherwise we will just take an inherently biased stat and bias it even further.
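As a sketch of that filter, assuming we have a record of each goal that counted toward a player's +/- along with the manpower situation. The `GoalEvent` fields and the sample goals are hypothetical; historical game logs do not carry this detail.

```python
from dataclasses import dataclass

@dataclass
class GoalEvent:
    for_team: bool          # True if the player's team scored
    team_skaters: int       # skaters on the ice for the player's team
    opp_skaters: int        # skaters on the ice for the opponent
    team_goalie_in: bool
    opp_goalie_in: bool

def is_clean(goal):
    """Both goalies in net and an equal number of skaters on each side."""
    return (goal.team_goalie_in and goal.opp_goalie_in
            and goal.team_skaters == goal.opp_skaters)

def plus_minus(goals, clean_only=False):
    """Plus-minus over the given goals, optionally keeping only clean ones."""
    kept = [g for g in goals if is_clean(g)] if clean_only else goals
    return sum(1 if g.for_team else -1 for g in kept)

# Hypothetical on-ice goals for one player.
on_ice_goals = [
    GoalEvent(True, 5, 5, True, True),    # ordinary even-strength goal for
    GoalEvent(True, 4, 5, True, True),    # short handed goal for
    GoalEvent(True, 5, 6, True, False),   # empty net goal for
    GoalEvent(False, 5, 5, True, True),   # ordinary even-strength goal against
    GoalEvent(False, 6, 5, False, True),  # empty net goal against
]

print(plus_minus(on_ice_goals), plus_minus(on_ice_goals, clean_only=True))
```

In this made-up example the raw figure is +1 and the filtered figure is 0, so the short handed and empty net goals account for the entire difference.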

Second is to understand randomness. Randomness plays a big role in hockey. Break down the above mentioned +/- by pairs of players, and you'll see a lot of numbers that seem strange and irregular, and that may change dramatically from season to season. Okay, if a pair is 18-2 in goal difference, while the rest of the team is average, it tells us that they've been doing great. But 1-5 or 6-3 or even 15-19 basically won't tell us anything due to the influence of randomness. The next season, the pairings just mentioned may get completely different results together.
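To get a feel for the size of the effect, here is a rough binomial sketch. It assumes a pair that is exactly average, so that every goal scored while they are on the ice is an independent coin flip between for and against. That is a simplification for illustration, not a model of real hockey.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k goals for out of n total goals."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(goals_for, total_goals, p=0.5):
    """Chance of at least this many goals for, by luck alone."""
    return sum(binom_pmf(k, total_goals, p)
               for k in range(goals_for, total_goals + 1))

def prob_at_most(goals_for, total_goals, p=0.5):
    """Chance of at most this many goals for, by luck alone."""
    return sum(binom_pmf(k, total_goals, p) for k in range(goals_for + 1))

for gf, ga in [(18, 2), (6, 3), (1, 5), (15, 19)]:
    total = gf + ga
    tail = prob_at_least(gf, total) if gf >= ga else prob_at_most(gf, total)
    print(f"{gf}-{ga}: chance of a record at least this lopsided = {tail:.4f}")
```

Under this assumption an average pair would go 6-3 or better about a quarter of the time and 1-5 or worse about 11% of the time, while 18-2 or better would happen only about 0.02% of the time.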

Third, we have all the other things biasing +/-, like teammates (especially the goalie), opponents, coaching decisions, and so on. Most here have hopefully seen cases where a prime-aged player's +/- changed from say -30 one season to +25 the next, for example when changing teams, but there are also lots of cases where he stayed on the same team and maybe even with the same partner/linemates.
 