# Secondary assists - new study

1. ### Hockey Outsider (Registered User)

Joined:
Jan 16, 2005
Messages:
6,424
4,495
Trophy Points:
216
SB Cash:
\$ 50,000
INTRODUCTION

As promised, I've done another analysis of secondary assists. All of the previous studies done in this area, including mine, have looked at the data using the same conceptual approach - examining year-over-year correlations. In other words - does knowing how many secondary assists a player records in YEAR X allow us to better estimate how many secondary assists, total assists, or points a player will record in YEAR X+1? (The answer is pretty clear - yes, it helps us predict points in the future, but not as well as goals or primary assists do.)

For this study, I've analyzed the data in a completely different way. To the best of my knowledge, nobody has done this type of analysis before. The question that I was trying to answer is - does knowing how many secondary assists a player records help us predict how well his team scores when he's on the ice?

An example might help illustrate this concept. Let's say two players have the following stat lines:

Player 1: 30 goals, 25 primary assists, 25 secondary assists
Player 2: 30 goals, 25 primary assists, 10 secondary assists

What we really care about is - does Player 1's team do better when he's on the ice, compared to Player 2? If, as some have argued, secondary assists are just statistical noise, then we'd expect that knowing how many secondary assists a player records wouldn't improve our predictive ability. If secondary assists are more than just statistical noise, then knowing the number of secondary assists a player earns would improve the accuracy of our predictive models. (Obviously, looking at only two players is meaningless - there's way too much context not taken into account - but I'm going to look at thousands of data points).

My approach is as follows. First, I'll define the sample and validate the data. Second, I'll estimate how many 5-on-5 goals a player should be on the ice for, once we know his 5-on-5 goals and primary assists. Third, I'll do the same analysis, except this time I'll also use secondary assists, and see if we get a more useful prediction model.
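For anyone who wants to reproduce the idea, here is a minimal sketch of the two-model comparison in Python. It runs on synthetic stand-in data, not the actual sample, and the helper name `fit_ols` and every number in the data generator are assumptions made for illustration only. The synthetic data is built so that secondary assists matter, so the output shows the mechanics of the comparison rather than any evidence.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic per-60 rates (hypothetical values, NOT the real data set)
rng = np.random.default_rng(0)
n = 500
g60 = rng.gamma(4.0, 0.15, n)
a1_60 = rng.gamma(4.0, 0.15, n)
a2_60 = rng.gamma(4.0, 0.12, n)
on_ice_gf60 = 0.6 + g60 + a1_60 + a2_60 + rng.normal(0.0, 0.2, n)

# Model 1: goals and primary assists only. Model 2: adds secondary assists.
_, r2_model1 = fit_ols(np.column_stack([g60, a1_60]), on_ice_gf60)
_, r2_model2 = fit_ols(np.column_stack([g60, a1_60, a2_60]), on_ice_gf60)
print(f"Model 1 R^2: {r2_model1:.3f}")
print(f"Model 2 R^2: {r2_model2:.3f}")
```

One thing to keep in mind: in-sample R^2 can never go down when a predictor is added to an OLS model, so what matters is the size of the gain, not the fact that there is one.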

The conclusion - which will soon be obvious from the data - is that secondary assists have informational value.

2. ### Hockey Outsider (Registered User)

DATA VALIDATION

Defining the sample

For this project, I'm looking at twelve seasons' worth of data - 2007-08 through 2018-19. I'm looking at forwards only, who have played at least 300 minutes of 5-on-5 ice time.

Source of data

All data has been taken from Natural Stat Trick

Number of players in sample each year

| Season (ending year) | Count |
|---|---|
| 2008 | 396 |
| 2009 | 398 |
| 2010 | 397 |
| 2011 | 393 |
| 2012 | 396 |
| 2013 | 339 |
| 2014 | 393 |
| 2015 | 410 |
| 2016 | 405 |
| 2017 | 407 |
| 2018 | 415 |
| 2019 | 423 |
| Total | 4,772 |

In total, I have 4,772 data points. The numbers fall within a narrow range from 2008 to 2017 (except for 2013) - between 393 and 410 players met my criteria each year. There were significantly fewer players from 2013 because that was the lockout-shortened season. There was a slight uptick in 2018 and 2019, because the Vegas Golden Knights joined the league in 2017-18, so several more roster spots became available.

Validating the data

It was important to validate the data. First, I'm not positive the data from Natural Stat Trick is completely accurate. Second, I wanted to make sure I didn't make any errors in compiling and organizing the data (I needed to combine data from multiple databases from that site).

As one example - let's look at John Tavares. NHL.com shows him as having 187 goals, 146 primary assists, and 68 secondary assists at 5-on-5. My data has him at 189, 146, and 68, respectively. So both assist figures agreed, goals are off by 2. Obviously I can't spot-check 4,000+ lines of data, but I checked some players. Most of the data is identical, but I found some small discrepancies here and there (like I did for Tavares). None were significant so I'm confident we have a good starting point for the analysis.

NHL.com, as far as I can tell, doesn't have 5-on-5 goals-for data. So I cross-referenced this with hockey-reference.com. They have Tavares at 557 5-on-5 GF through the 2019 season. My data has him at 556. Another immaterial difference, so again I think we have a good starting point for the analysis.
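The spot check described above is easy to automate. A minimal sketch, using only the Tavares figures quoted in this post; the 2% tolerance and the function name `relative_differences` are hypothetical choices, not part of the original check.

```python
# 5-on-5 career figures for John Tavares as quoted in the post.
# Reference: NHL.com for goals/assists, hockey-reference.com for on-ice GF.
reference = {"goals": 187, "primary_assists": 146, "secondary_assists": 68, "on_ice_gf": 557}
# Figures compiled from Natural Stat Trick.
compiled = {"goals": 189, "primary_assists": 146, "secondary_assists": 68, "on_ice_gf": 556}

def relative_differences(reference, compiled):
    """Absolute difference between the two sources as a fraction of the reference value."""
    return {stat: abs(compiled[stat] - reference[stat]) / reference[stat] for stat in reference}

TOLERANCE = 0.02  # hypothetical threshold for calling a discrepancy material

diffs = relative_differences(reference, compiled)
flagged = [stat for stat, diff in diffs.items() if diff > TOLERANCE]
for stat, diff in diffs.items():
    print(f"{stat}: {diff:.2%} off")
print("Flagged:", flagged)
```

Looping the same comparison over a random sample of players would give a rough error rate for the whole data set instead of relying on a handful of manual checks.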

Last edited: Jan 11, 2020
Black Gold Extractor likes this.
3. ### Hockey Outsider (Registered User)

ANALYSIS 1 - GOALS AND PRIMARY ASSISTS ONLY

What does the model mean?

The model above predicts that a player's team will score at a baseline rate of 0.81 goals per 60 minutes at 5-on-5 while he's on the ice, plus an extra 1.20 G/60 for every goal per 60 the player scores, plus an extra 1.25 G/60 for every primary assist per 60 the player records.
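Written out as an equation (a restatement of the coefficients quoted above, assuming the predictors are the player's own 5-on-5 goals and primary assists per 60 minutes):

$$\widehat{\text{oiGF/60}} = 0.81 + 1.20 \cdot \text{G/60} + 1.25 \cdot \text{A1/60}$$

For a hypothetical forward at 0.80 G/60 and 0.70 A1/60, that works out to 0.81 + 0.96 + 0.875 = 2.645 predicted on-ice goals for per 60.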

Is the model accurate?

The R^2 is 0.72, meaning the model explains about 72% of the variation in on-ice goals for per 60. That makes it a fairly accurate model.

Does the model make sense?

There's nothing about it that strikes me as being obviously wrong - open to comments.

Last edited: Jan 11, 2020
4. ### Hockey Outsider (Registered User)

ANALYSIS 2 - GOALS, PRIMARY ASSISTS, AND SECONDARY ASSISTS

What does the model mean?

The model above predicts that a player's team will score at a baseline rate of 0.58 goals per 60 minutes at 5-on-5 while he's on the ice, plus an extra 1.10 G/60 for every goal per 60 the player scores, plus an extra 1.06 G/60 for every primary assist per 60 the player records, plus an extra 1.08 G/60 for every secondary assist per 60 the player records.
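To make the difference between the two models concrete, here is a small sketch that plugs the coefficients quoted in this thread into two functions. The function names and the two example players are hypothetical; the per-60 inputs are assumed to be 5-on-5 rates.

```python
def predict_model1(g60, a1_60):
    """Model 1 from the thread: goals and primary assists only."""
    return 0.81 + 1.20 * g60 + 1.25 * a1_60

def predict_model2(g60, a1_60, a2_60):
    """Model 2 from the thread: adds secondary assists."""
    return 0.58 + 1.10 * g60 + 1.06 * a1_60 + 1.08 * a2_60

# Two hypothetical forwards who differ only in secondary assists per 60.
player_a = {"g60": 0.9, "a1_60": 0.8, "a2_60": 0.8}
player_b = {"g60": 0.9, "a1_60": 0.8, "a2_60": 0.3}

m1_a = predict_model1(player_a["g60"], player_a["a1_60"])
m1_b = predict_model1(player_b["g60"], player_b["a1_60"])
gap = predict_model2(**player_a) - predict_model2(**player_b)

print(f"Model 1 sees no difference: {m1_a:.2f} vs {m1_b:.2f}")
print(f"Model 2 gap in predicted on-ice GF/60: {gap:.2f}")
```

Model 1 cannot tell these two players apart by construction, while Model 2 separates them by 1.08 times the difference in their secondary assist rates.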

Is the model accurate?

The R^2 is 0.82, which means this is an accurate model. Note that this is a much higher result ("coefficient of determination", if you want to get technical) compared to the previous model. You can also see this pretty clearly by looking at the graph - the data is much closer to the trendline in this one.

Does the model make sense?

Compared to the previous model, this model is significantly lowering the baseline level of offense (that a team would score with effectively zero contributions from a player). It slightly reduces the value of goals and primary assists, while recognizing a new variable - the secondary assist.

From a technical standpoint - the model possibly suffers from a defect called "multicollinearity". When there are two predictor variables that are highly correlated (as primary and secondary assists are), the model might get confused about the relative predictive value of each variable. So it's possible that the coefficient for one of the variables is too high, and the other is too low. Before someone tells me that this invalidates the entire model - it doesn't. The overall accuracy of a model isn't affected (just the relative value of the coefficients within the model may be off).
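One common way to gauge how much multicollinearity is present is the variance inflation factor (VIF). The sketch below computes it with plain NumPy on synthetic data where primary and secondary assists share a hypothetical "playmaking" driver; none of the numbers come from the actual data set, and the function name is my own.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column of X: 1 / (1 - R^2), where R^2 comes from regressing
    that column on all the other columns (plus an intercept)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Synthetic, standardized predictors (hypothetical): both assist rates are
# driven by a shared latent playmaking term, goals are independent.
rng = np.random.default_rng(1)
n = 1000
playmaking = rng.normal(0.0, 1.0, n)
g60 = rng.normal(0.0, 1.0, n)
a1_60 = playmaking + rng.normal(0.0, 0.5, n)
a2_60 = playmaking + rng.normal(0.0, 0.5, n)

vif_g, vif_a1, vif_a2 = variance_inflation_factors(np.column_stack([g60, a1_60, a2_60]))
print(f"VIF goals: {vif_g:.2f}, A1: {vif_a1:.2f}, A2: {vif_a2:.2f}")
```

A VIF near 1 means a predictor is nearly independent of the others. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign that individual coefficients are unreliable, though that threshold is a convention rather than a hard rule.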

Last edited: Jan 11, 2020
Black Gold Extractor likes this.
5. ### Hockey Outsider (Registered User)

WHAT DOES THIS ALL MEAN?

The data is quite clear. Although we can put together a reasonably accurate model using only goals and primary assist data, the accuracy is clearly enhanced when we include secondary assists. This means that there's informational value in secondary assists. If there wasn't, there wouldn't be any meaningful difference between Model 1 and Model 2.

The only objection to this analysis that I can think of is that someone might argue this is a case of the tail wagging the dog. The argument might be - players who are good at recording secondary assists don't cause their team to score; they record lots of secondary assists because their team scores a lot when they're on the ice.

My response to that is - first of all, we're looking at a gigantic set of data, covering 12 years' worth of seasons and more than 4,700 player-seasons. It's almost certain that, in certain situations, players have racked up a lot of secondary assists by virtue of being on good teams. But there's no evidence whatsoever that this is the case for 4,700+ data points.

Second, look at which players recorded the most 5-on-5 secondary assists per 60 minutes. If we use 6,000 minutes (over the entire 12-year period) as a cutoff, the top twelve consists of H. Sedin, Benn, Getzlaf, Backstrom, St. Louis, Williams, Thornton, Kucherov, Scheifele, Ribeiro, Datsyuk and Crosby. With the exception of Justin Williams, these are all top offensive players (McDavid would have been 2nd on the list, had he met the minutes cutoff). It seems obvious that the players recording the most secondary assists per 60 aren't depth players leeching off the offensive talents of their teammates - they're among the best offensive players in the league, and the driving forces behind their teams.

My overall conclusion is that secondary assists aren't statistical noise. If they were - Model 2 wouldn't have been much more effective than Model 1 at predicting how many goals (per 60 minutes, at 5-on-5) a team scores when a player is on the ice.

Last edited: Jan 11, 2020
6. ### supsens (Registered User)

Joined:
Oct 6, 2013
Messages:
4,243
996
Trophy Points:
109
SB Cash:
\$ 50,000

The 1.08 G/60 per secondary assist does seem to suggest it’s a little bit of “free points” seeing how it takes more goals to contribute to your assist. It’s close enough that it’s not many freebies at all tho
Looks fun

7. ### Zuluss (Registered User)

Joined:
May 19, 2011
Messages:
1,917
922
Trophy Points:
109
SB Cash:
\$ 50,000
You have panel data here (both time-series and cross-sectional dimensions), which means the standard OLS is not exactly applicable.
I do not work much with panel data, so I cannot give detailed advice off the bat (though I can read up a bit and refresh my memory), but at the very least you should have time fixed effects and team fixed effects (the latter would take care of the reverse causality you are talking about). You might want to cluster standard errors by those dimensions too.
If you are using Excel for the analysis, time fixed effects mean that you create a dummy for each year (dummy2005 = 1 if 2005, 0 otherwise, etc.) and use those variables on the RHS in addition to goals and assists. Ditto for team fixed effects.
Statistical packages will do all that for you, of course. For example, in SAS you just read up on proc surveyreg.
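For those without SAS, the dummy-variable approach can be sketched in a few lines of Python. This is a rough illustration on made-up data, not the thread's dataset; `add_fixed_effects` is a hypothetical helper, and the sketch covers only the season dummies (team dummies work the same way with team labels). Clustered standard errors are not shown here and would need a dedicated statistics package.

```python
import numpy as np

def add_fixed_effects(X, labels):
    """Append one 0/1 dummy column per category in `labels`, skipping the
    first (sorted) category so that it acts as the baseline."""
    categories = sorted(set(labels))[1:]
    dummies = np.array([[1.0 if lab == cat else 0.0 for cat in categories]
                        for lab in labels])
    return np.column_stack([X, dummies]), categories

# Hypothetical player-seasons: 40 rows spread over 4 seasons.
rng = np.random.default_rng(7)
seasons = [2008, 2009, 2010, 2011] * 10
rates = rng.gamma(4.0, 0.15, size=(40, 3))      # columns: G/60, A1/60, A2/60
gf60 = 0.6 + rates.sum(axis=1) + rng.normal(0.0, 0.2, 40)

X_fe, dummy_seasons = add_fixed_effects(rates, seasons)
design = np.column_stack([np.ones(len(gf60)), X_fe])
coef, *_ = np.linalg.lstsq(design, gf60, rcond=None)

# coef order: intercept, G/60, A1/60, A2/60, then one shift per non-baseline season
print("Seasons with a dummy:", dummy_seasons)
print("Coefficients:", np.round(coef, 3))
```

Dropping one category as the baseline avoids perfect collinearity with the intercept; each remaining season coefficient is then read as a shift relative to that baseline year.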

It would be nice to compare the R-squares between the two models. Strictly speaking, you can take the difference and do the F-test (you have to scale it by something). Cutting corners a bit, you just have to see if the extra variable (2nd assists) is significant (has t-stat>2).
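The F-test for nested models can be computed directly from the two R^2 values reported in this thread (0.72 and 0.82) and the sample size (4,772). A quick sketch follows; the function name is hypothetical, and the inputs are the rounded figures from the posts, so the result is approximate.

```python
def nested_f_statistic(r2_reduced, r2_full, n_obs, n_params_full, n_added):
    """F-statistic for adding `n_added` predictors to a nested OLS model.
    `n_params_full` counts every coefficient in the full model, intercept included."""
    numerator = (r2_full - r2_reduced) / n_added
    denominator = (1.0 - r2_full) / (n_obs - n_params_full)
    return numerator / denominator

# Full model: intercept + goals + primary assists + secondary assists = 4 parameters.
f_stat = nested_f_statistic(0.72, 0.82, n_obs=4772, n_params_full=4, n_added=1)
print(f"F = {f_stat:.0f}")
```

When only one variable is added, the F-statistic equals the square of that variable's t-statistic. The t-stat of 50.1 reported later in the thread squares to about 2,510, which is in the same range as the value computed here; the gap is what you'd expect from plugging in R^2 values rounded to two decimals.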

Intuitively, the results that A1 matters more than a goal (in the first regression) and that A2 matters more than A1 (in the second one) are weird, even though the difference is probably insignificant.

toxic poster likes this.
8. ### Zaide (Devil Eyes Come)

Joined:
Aug 11, 2009
Messages:
97,369
5,604
Trophy Points:
186
SB Cash:
\$ 50,000
Location:
あなたの頭
Thanks for sharing this. Obviously some good work behind this and you even validated the data, which is cool.

Allow me to challenge the results, however, as is done in every published scientific work. What I'm going to say isn't necessarily correct, and I didn't try to experiment with the data myself, so this is just a hypothesis in order to open a discussion on the meaning of the results.

I feel like the results aren't at all surprising, because the predictors (G_60, A1_60, A2_60) are inherently going to correlate with the response (on-ice GF_60, or oiGF_60 for short): every goal or assist a player records is already counted in his on-ice goals for, so the predictors are existing components of the response. For that reason, I would question the validity of these predictors.

In order to elaborate more on this challenge, let me define the predicted oiGF/60 as PoiGF_60 and the actual oiGF/60 as AoiGF_60. Let's also define Ai_60_i as the ith assist, and theorize about the existence of 3rd and maybe 4th assists (and I'll stop at i = 4, because there are generally 5 skaters on the ice, and I don't want goalies to be involved). Basically, the linear model you'd try to build, with no interaction between the variables, is:

$$\mathrm{PoiGF\_60} = k_0 + k_G \cdot \mathrm{G\_60} + \sum_{i=1}^{4} k_i \cdot \mathrm{Ai\_60}_i$$

where the k are the coefficients.
It is my hypothesis that the higher i is, the higher the correlation between PoiGF_60 and AoiGF_60 would be, because the higher i is, the higher are the odds of PoiGF_60 actually approaching AoiGF_60. In other words, if up to 4 assists could be awarded on a goal, the odds that a player would get a point on said goal is higher. Basically, you end up in a situation where the more assists are awarded, the closer a player's point total will be to the number of goals scored by his team, because the player is more likely to get a point on any goal.
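This hypothesis can be illustrated with a toy simulation. In the sketch below every player is identical: on each on-ice goal he is credited as scorer, 1st, 2nd, 3rd or 4th assister (or gets nothing) purely at random. The role probabilities and the range of on-ice goals are made-up assumptions, and the sketch uses a simple point total rather than a regression with separate coefficients, so it only shows the mechanical effect described above, not anything about the real data.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(42)
# Hypothetical chance, per on-ice goal, of being the scorer, A1, A2, A3, A4.
# Identical for every player, so there is no skill in this world at all.
ROLE_PROBS = [0.20, 0.20, 0.15, 0.10, 0.10]

on_ice_goals = []
points_by_depth = [[] for _ in ROLE_PROBS]  # index d: goals plus the first d assist types
for _ in range(2000):
    goals_for = random.randint(20, 80)
    counts = [0] * len(ROLE_PROBS)
    for _ in range(goals_for):
        draw, cumulative = random.random(), 0.0
        for role, prob in enumerate(ROLE_PROBS):
            cumulative += prob
            if draw < cumulative:
                counts[role] += 1
                break
    on_ice_goals.append(goals_for)
    running = 0
    for depth, count in enumerate(counts):
        running += count
        points_by_depth[depth].append(running)

r_squared = [pearson_r(points, on_ice_goals) ** 2 for points in points_by_depth]
for depth, value in enumerate(r_squared):
    print(f"goals + first {depth} assist type(s): R^2 = {value:.3f}")
```

In this toy world the fit improves every time another assist type is counted, even though no player has any ability. That is the circularity concern in a nutshell: a higher R^2 after adding secondary assists is what you'd expect mechanically, so on its own it can't separate skill from simply being on the ice.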

I think this would explain why the "predictive" value of the model has increased when adding secondary assists.

One thing to keep in mind when people refer to secondary assists being statistical noise is that it's when evaluating a player's offensive ability as a whole compared to other players. While it's true that good playmakers will indeed get more secondary assists, the noise occurs when lesser players get "free" secondary assists due to their teammates being good. In general, the impact of the 2nd assist will be lesser than the impact of the 1st assist, which itself is lesser than the impact of the goal (without going into the frivolous "without that 2nd assist that goal doesn't happen!" thingy, which is just a deterministic argument, which in my opinion and that of many analysts, is erroneous when comparing players). Basically, the noise is simply that a lesser playmaker is more likely to have a number of 2nd assists that isn't "in line" with his number of 1st assists.

Now that my challenging point has been established, I think it would still be interesting to further analyze the data with more advanced statistical techniques. Would you mind sharing your dataset (spreadsheet format or .csv or .txt) ? I have access to Minitab at work, and if I have time during one of my breaks, I could try running some analysis on it. I'm still learning Minitab so that would be a good way to experiment with it at the same time.

DominicBoltsFan likes this.
9. ### daver (Registered User)

Joined:
Apr 4, 2003
Messages:
21,000
2,618
Trophy Points:
231
SB Cash:
\$ 50,000
There is certainly a correlation between the forwards with the most 1st assists and the most 2nd assists. I think you can point to a couple of high-end playmakers as being the beneficiaries of simply being on the ice, most notably Backstrom, for a very obvious reason.

Other than that, assist numbers should speak for themselves.

Dominance likes this.
10. ### Hockey Outsider (Registered User)

Fair point - yes, the data combines different years and that might skew the results slightly. Scoring has been fairly stable over the entire period (with a slight uptick in 2018 and 2019). I no longer have access to SAS or anything similar unfortunately - I'll see what the best option is for dealing with this point.

I haven't done an F-test but the extra variable is significant (t-stat is 50.1).

For the 2nd model - I agree that A2's counting more than A1's doesn't make a lot of sense, but as I mentioned, that's probably a result of there being some multicollinearity in the model.

11. ### Hockey Outsider (Registered User)

These types of comments are welcome - no hard feelings, we're all trying to get to the right answer here.

First - I can't figure out how to attach the entire Excel file. I've attached the raw data as a text file - let me know if that works for you?

I understand your point though. It seems to be a question of causation. Are players getting secondary assists because they happen to be on the ice when their teammates score a goal (in which case this is almost a circular exercise)? Or are their teams scoring more goals because players are good at generating secondary assists (which, perhaps, could be a proxy for puck possession)? Hopefully the file I attached is useful - interested to see what you can come up with using Minitab.

As to the point about "lesser" players getting a disproportionate number of secondary assists - it's been well established that defensemen (as a whole) get many more secondary assists than forwards. I don't know if it's ever been rigorously examined for forwards (i.e. do bottom-six forwards get a disproportionate number of secondary assists?).

12. ### toxic poster (Registered User)

Joined:
Dec 24, 2017
Messages:
592
346
Trophy Points:
55
SB Cash:
\$ 50,000
Location:
Queen's
I'll just talk about data validation as other people have voiced the same concerns about model complexity & multicollinearity above.

SAS has built-in features, along with SQL, for validating data. There are also packages in R and Python that let you validate entire data sets.

Big fan of the work you've done here tho. I don't think the incomplete validation makes a big difference (in fact it probably makes little to none), just thought it would be worth letting you know about the resources available. Great way of looking at the data in general and the relationship between scoring & assist reproducibility.

Hockey Outsider likes this.
13. ### Filthy Dangles (Registered User)

Joined:
Oct 23, 2014
Messages:
20,248
22,862
Trophy Points:
166
SB Cash:
\$ 50,000
Appreciate the work per usual HO.

Wish I could understand the finer details and back and forth going on in here a little more though. Most of it is Chinese. I need a crash course in statistics

Hockey Outsider likes this.
14. ### Zaide (Devil Eyes Come)

Thanks for the data. My challenging hypothesis, which I do think makes sense somewhat, does somewhat seem like a question of causation. Good players will be on the ice for more goals for, so they're more inclined to gather more points (whether it be goals, primary assists or secondary assists). And the more points are available on a goal, the more the predicted GF/60 would be in line with the actual GF/60.

I've attached the first analysis I did with Minitab (PDF file). Nothing complicated, basically just the same thing you did with GF/60 vs G/60, A1/60 and A2/60. Didn't go much in depth since I still have to learn how to do things properly on this new piece of software we recently acquired, so what I did required like just 3 clicks. Probably could get more graphs and stuff if I dug up a little. I think the main take from this is that all three predictors have a significant effect (as expected), and the Pareto chart is probably the most interesting graph shown, showing the relative importance of each predictor on the model response. These seem to be in line with some previous studies done on the value of secondary assists, though that could just be me. I'll let you take a look. If I find any more time to play with that I'll let you know.

15. ### Sinistril (Registered User)

Joined:
Oct 26, 2008
Messages:
792
88
Trophy Points:
81
SB Cash:
\$ 50,000

R is free and easy enough to learn. If you do anything with statistics in real-life, then it's an asset for pretty much any data science job. There also should be applicable statistical modules in Python or other languages like Matlab or you can code your own packages from scratch. (R is just the best I can think of for dealing with big data, I regularly use it for data much, much bigger than this, one of the easiest languages to learn, is directly made for data science and thus has the most packages and a good amount of support).

I'm assuming you used some language like Python to scrape the data?

Last edited: Jan 15, 2020
Hockey Outsider likes this.
16. ### Hockey Outsider (Registered User)

Thanks for this. I wasn't aware that the software was available for free now. (Back in the day, it was available to university students; otherwise, it was prohibitively expensive.)

No, it was old-fashioned copying and pasting to gather all the data (which is why it was important for me to do some validation checks). Natural Stat Trick is pretty user-friendly, so the entire process of gathering the data took under an hour.

17. ### Hockey Outsider (Registered User)

Thanks for sharing this. That Pareto chart is showing that all three variables (goals, primary assists, and secondary assists) are statistically significant - but goals is the biggest driver in the model, and secondary assists the least. Consistent with some of the other research that I've seen (including my own).

I have a couple of other ideas to address the "chicken or the egg" question (ie are players getting a lot of secondary assists because they're on the ice with high-scoring teammates, or are they causing their teammates to score more due to their playmaking and/or puck possession ability, which secondary assists might be a proxy for)?

Idea 1 - look at "participation rates", that is - the percentage of the team's on-ice goals for that a player earns a point on. If players who rely more on secondary assists have a lower "participation rate" (on average), that might suggest that they're less directly involved in the effort. Maybe divide the data into five or ten groups (quintiles or deciles, from high to low secondary assist percentages), and try to identify any patterns from there.
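A rough sketch of how Idea 1 could be set up, on a handful of invented player-seasons. The function names, the definition of "secondary assist percentage" as A2 divided by total points, and all of the numbers are assumptions for illustration.

```python
def participation_rate(goals, a1, a2, on_ice_gf):
    """Share of on-ice goals for on which the player earned a point."""
    return (goals + a1 + a2) / on_ice_gf

def secondary_share(goals, a1, a2):
    """Share of the player's points that are secondary assists."""
    return a2 / (goals + a1 + a2)

def split_into_groups(rows, key, n_groups):
    """Sort rows by `key` (high to low) and cut them into n_groups nearly equal buckets."""
    ordered = sorted(rows, key=key, reverse=True)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups

# Hypothetical 5-on-5 player-seasons: (goals, A1, A2, on-ice GF)
player_seasons = [
    (20, 18, 6, 60), (15, 12, 12, 58), (10, 9, 4, 40), (8, 10, 10, 45),
    (12, 6, 3, 35), (5, 7, 9, 38), (18, 15, 10, 62), (9, 4, 2, 30),
    (6, 8, 8, 36), (14, 11, 5, 48),
]

groups = split_into_groups(
    player_seasons, key=lambda row: secondary_share(row[0], row[1], row[2]), n_groups=5
)
for rank, group in enumerate(groups, start=1):
    mean_rate = sum(participation_rate(*row) for row in group) / len(group)
    print(f"Group {rank} (highest A2 share first): mean participation {mean_rate:.2f}")
```

With the real data, the thing to look for would be whether mean participation falls steadily from the low-A2-share groups to the high-A2-share groups.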

Idea 2 - look at this year's point production vs next year's goals for results. So if we look at players based on this year's production vs next year's goals for data - does the pattern still hold? I think this addresses the "circularity" question pretty well.
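The data preparation for Idea 2 is mostly a matter of pairing each player-season with the same player's following season. A minimal sketch, assuming each row carries a player identifier and a season year; the field names and the sample rows are hypothetical.

```python
def lagged_pairs(rows):
    """Pair each player's production in season X with his on-ice GF/60 in season X+1.
    Player-seasons with no following season in the data are skipped."""
    by_key = {(row["player"], row["season"]): row for row in rows}
    pairs = []
    for (player, season), row in by_key.items():
        following = by_key.get((player, season + 1))
        if following is not None:
            pairs.append((row["points60"], following["gf60"]))
    return pairs

# Hypothetical rows, not taken from the real data set.
rows = [
    {"player": "A", "season": 2018, "points60": 2.1, "gf60": 3.0},
    {"player": "A", "season": 2019, "points60": 2.3, "gf60": 3.2},
    {"player": "B", "season": 2018, "points60": 1.2, "gf60": 2.1},
    {"player": "B", "season": 2019, "points60": 1.0, "gf60": 2.0},
    {"player": "C", "season": 2019, "points60": 1.8, "gf60": 2.6},
]

pairs = lagged_pairs(rows)
print(pairs)
```

The same regressions could then be rerun on these pairs, with goals, primary assists and secondary assists from season X as separate predictors of on-ice GF/60 in season X+1. Because next year's goals haven't happened yet when this year's assists are recorded, the points can no longer be components of the response.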

18. ### DudeWhereIsMakar (Bergevin sent me an offer sheet)

Joined:
Apr 25, 2014
Messages:
11,826
3,245
Trophy Points:
162
SB Cash:
\$ 50,000
Gender:
Male
Location:
Winnipeg
I tend to find most secondary assists are usually made by the player who starts the play and first assists are usually coming from a good setup guy/secondary scorer.

19. ### supsens (Registered User)

Random question: do you have any idea how many goals have zero assists and how many have just 1 assist? And the total points awarded for the goals scored?

20. ### Yzing (Registered User)

Joined:
Jan 7, 2020
Messages:
4
13
Trophy Points:
3
SB Cash:
\$ 50,000
Not precisely what you asked for but the 2018-19 regular season had 1.6796 assists per goal. They were unevenly distributed depending upon play:
ES: 1.6495
PP: 1.8997
SH: 1.0515

Stats from NHL.com.

Hockey Outsider likes this.
21. ### Hockey Outsider (Registered User)

In the study that I did (2008-2019 seasons), looking at both F and D, and ES only, there were:
• 58,447 goals
• 54,692 primary assists
• 43,095 secondary assists
So that means around 74% of goals had two assists, 20% had only one assist, and 6% were unassisted - again at ES only.
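The arithmetic behind those percentages, for anyone who wants to check it. The only assumption is that a secondary assist is never awarded without a primary assist, so the counts nest inside each other.

```python
goals, primary, secondary = 58_447, 54_692, 43_095

two_assists = secondary              # every secondary assist implies a primary one
one_assist = primary - secondary     # a primary assist with no secondary
unassisted = goals - primary         # no assist at all

for label, count in [("two assists", two_assists),
                     ("one assist", one_assist),
                     ("unassisted", unassisted)]:
    print(f"{label}: {count:,} goals ({count / goals:.1%})")

assists_per_goal = (primary + secondary) / goals
print(f"assists per goal: {assists_per_goal:.4f}")
```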

EDIT - those numbers are for data from a previous study. Still a pretty big sample size.

Last edited: Jan 26, 2020
22. ### supsens (Registered User)

Lol dang, you got some numbers. Anything I think about requires a full time job and is way out of my league.
When you compiled those numbers, were you working on a point share %? Kind of like this current project, but the player's contribution towards total points instead of total goals for the line?
Or I guess what I am asking is: is there a way to stop the number of goals from driving up the ‘value’ of secondary assists?

Last edited: Jan 26, 2020