Database of all plays

notdave

Registered User
Apr 26, 2013
38
18
Ottawa, ON
Hi all,

I've been working on compiling a database of all plays for all games, using the NHL.com html Play By Play reports.

My plan is to get this all into an online database that is very user friendly and easily filterable, so anyone can extract, analyze and calculate whatever they want.

I exported one game's worth of the work-in-progress database to Excel and have attached it to this post. Could you guys take a look at this and suggest additional fields that should be added or changes that should be made?

I'm already working on cleaning up the player (PL1 to PL12) fields, so don't worry about those for now.

Thanks,
Dave
 

Attachments

  • Game 989.xlsx
    68 KB · Views: 55

SaskRinkRat

Registered User
Apr 1, 2010
502
0
Looks really good. Incidentally, I've been working on basically the exact same project over the past few weeks.

One thing I noticed - and perhaps you have a more sophisticated reformatting process than mine which would make this point irrelevant - is that sometimes the "player" columns get shifted, so that the first 6 don't always belong to one team and the next 6 the other team. I was using a fairly rudimentary scraping technique with excel, so if your's is more sophisticated, this might not be an issue.

The only other column I'd like to see you add, off the top of my head, is the shot distance column.

Also, I know you said you're still working on the player columns, so you might have this covered, but it might be wise to keep the formatting of the players in those columns the same as the players in the previous columns. So, for example, if you use [Team] [Number] [Last Name] format it the hit column, you might also want to use that same format in the PL1 column.

Great work though.
 

ponder

Registered User
Jul 11, 2007
16,966
6,296
Vancouver
In the fields describing the shooter, scorer, etc., it might be nice to transform the data before uploading it to a database so that every player has a unique id. Using data in the format in this spreadsheet will mean that one player will have different ids at different times if he changes teams, changes his number, etc. If you give each player a unique id in these fields, it will be much easier to look up any of these stats (penalties drawn, hits, etc.) for ever player over any amount of time.
 

HolyShot*

Guest
One thing that you could do is include hyperlinks to the big stuff like goals, assists, hits, saves, blocks, etc. that link to videos. That would be bad ass.
 

notdave

Registered User
Apr 26, 2013
38
18
Ottawa, ON
Thanks for the input, everyone.

Sask: Added Shot Distance per your suggestion, along with shot type.

Ponder: I'm working on a master list of players, and will be sure to add a unique ID. Wonder if I could just iterate through the NHL.com player pages (ex http://www.nhl.com/ice/player.htm?id=8470121) easily and use their identifiers.

Hitman: That's a good idea. I'll look into it for sure. The NHL.com boxscores link to the goals using javascript, which isn't my area of expertise, but I'll put it on the list.
 

66871

Registered User
May 17, 2009
2,514
716
Maine
Looks pretty good. However, a few comments.

I'm unclear about the shot data that is coming out. It looks like there are shots taken and then a subset of that is shots on goal (so you have miss types including out of play or wide right or, I think, blocked)

Secondly, maybe think about two forms of output. One would be a straight table and the other would be a normalized database with player IDs, team IDs, game IDs and a player_game relationship table which also indicated what team the player was on for that particular game, his position and what number he wore (trades). Such a set up would lend itself better to querying than repeating the team and sweater number info in every field

Since you have a category for strength, why not pack some more info into the data you are storing. Rather than EV or blank, maybe characterize the play with a number 3.3 for three on three, 4.3 if the home team has four men on the ice and the away team has three. etc etc. So your values would be 3.3, 4.4, 5.5, 3.5, 4.5, 5.3, 5.4. So if I was interested in all the five on three face-off data I would only need to filter for 3.5 and 5.3. Also, in gives a better indication of 4 on 4 and 3 on 3 situations.

I notice that the DESCR field would generally serve as Stoppage Type field except that sometimes there is mention of a TV time-out or a team calling timeout.

If it was me, I would take those timeouts and simply add them in as the next row. In reality they are a separate event. It's just that they are lumped in because the clock time is the same.

Are you going to try to parse out if a team has its goalie pulled?

Finally, perhaps a field which lists the penalty that is called (holding, boarding etc.).
 

number72

Registered User
Oct 9, 2011
6,150
3
Hi all,

I've been working on compiling a database of all plays for all games, using the NHL.com html Play By Play reports.

My plan is to get this all into an online database that is very user friendly and easily filterable, so anyone can extract, analyze and calculate whatever they want.

I exported one game's worth of the work-in-progress database to Excel and have attached it to this post. Could you guys take a look at this and suggest additional fields that should be added or changes that should be made?

I'm already working on cleaning up the player (PL1 to PL12) fields, so don't worry about those for now.

Thanks,
Dave

Is it possible to break out shots into multiple fields.
Shot distance
shot type

That is, if I want to look at shot distances I have no easy way to get that information from your table without editing it. As a stand alone column it would be more useful.
 

pucka lucka

Registered User
Apr 7, 2010
5,913
2,581
Ottawa
Not to derail but, I can create any format/view you want based on the way I import the data. I store all the games as streams of events. Mine also combines the json & html reports(includes event location data), links the nhl.com player id. I have a DB of 17000 players, as well as every draft. I am almost done. I am just cleaning up the handling of shootouts.

If anyone thinks this is interesting I will keep working on it. I am also doing all the shift reports.
 
Last edited:

DL44

Status quo
Sep 26, 2006
17,904
3,827
Location: Location:
300 (at least) 'plays' a game x 1230 gms/yr... that would be a pretty insane amount of data and info!

Its how the information can be used once it's all in there thats intriguing!

i.e.
-Player X was involved in Y plays the most.... in the 2nd period..
-Player W's average shot distance was Z..
-or W led the league in backhand goals...
-or most goals from within 5 feet...
-or player G led the league in hits in the 1st period..
- players are injured the most at the X point of the game.
-most offsides occur at X point of the game. hell.. just Offsides/gm alone..
- most givaways in the Ozone... or most giveaways in the Dzone?
-what teams put the puck out of the rink the most..

endless!
 
Last edited:

Ad

Upcoming events

Ad

Ad