NHL.com + Sochi 2014 Data URLs in JSON and JSONP

Kulstad

Registered User
Nov 14, 2015
Looks like the NHL.com site redesign knocked out some of the links, or at least the News feed.

http://www.nhl.com/ice/feeds.htm has data up to Jan 31st, then nothing. Same with the NHL's XML feeds; they're all dead, like this one: http://cdn.nhl.com/rss/news.xml

Anyone find an alternate?


Videocenter is out, too

This is the video list for the Toronto-Boston game on Feb. 2:
http://live.nhle.com/GameData/20152016/2015020744/gc/gcgm.jsonp

...however, when I use the previously posted URL format to get a specific video's description, there's nothing:
http://video.nhl.com/videocenter/servlets/playlist?ids=2015020744-94-h&format=json
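The .jsonp URLs in this thread return JSONP, i.e. JSON wrapped in a JavaScript callback call, so the wrapper has to be stripped before a JSON parser will accept the body. A minimal Python sketch, assuming the body has the shape callbackName(...); (callback names vary by file, and these endpoints may no longer respond at all):

```python
import json
import re
from urllib.request import urlopen

# JSONP bodies look like  someCallback({...});  so capture what is inside the parens.
_JSONP_RE = re.compile(r"^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text: str):
    """Parse a JSONP string; plain JSON is passed through unchanged."""
    match = _JSONP_RE.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


def fetch_jsonp(url: str):
    """Download a .jsonp URL and return the decoded payload."""
    with urlopen(url) as response:
        return unwrap_jsonp(response.read().decode("utf-8"))
```

For example, fetch_jsonp("http://live.nhle.com/GameData/20152016/2015020744/gc/gcgm.jsonp") would return the game's video list as Python lists and dicts, provided the endpoint still answers.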
 

Kane One

Moderator
Feb 6, 2010
Brooklyn, New NY
I just found a new JSON for the standings, even though the last one still works.

https://statsapi.web.nhl.com/api/v1...e.next,team.schedule.previous&season=20152016

This does have a copyright field:

"copyright" : "Copyright 2015 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt"

The text in that link is:

The accounts, descriptions, data and presentation in the referring page (the "Materials") are proprietary content of MLB Advanced Media, L.P ("MLBAM").
Only individual, non-commercial, non-bulk use of the Materials is permitted and any other use of the Materials is prohibited without prior written authorization from MLBAM.
Authorized users of the Materials are prohibited from using the Materials in any commercial manner other than as expressly authorized by MLBAM.

Not sure whether MLBAM will enforce this more strictly than the NHL has.
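For anyone pulling that standings JSON, a minimal sketch of turning it into a sorted table. It assumes teams are nested as records, then teamRecords, with team.name and points on each entry; that layout is an assumption to check against a real response, since the URL above is truncated:

```python
def standings_table(data: dict):
    """Return (team name, points) pairs sorted from most to fewest points.

    Assumed layout: {"records": [{"teamRecords": [{"team": {"name": ...},
    "points": ...}, ...]}, ...]}; verify it against a real response.
    """
    rows = []
    for group in data.get("records", []):
        for record in group.get("teamRecords", []):
            rows.append((record["team"]["name"], record["points"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def copyright_notice(data: dict) -> str:
    """The response also carries the MLBAM "copyright" string quoted above."""
    return data.get("copyright", "")
```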
 

Vesalius

Registered User
Oct 21, 2015
Best Source for Raw Data

I've been doing some research in machine learning for a while now, and I want to apply some of the methods I've learned to sports data. What's the best source of raw team data for things like shot attempts, possession, etc. on an individual game basis? I'm new to hockey analytics, so I don't know what the best sites are yet.

Thanks!
 

Retire91

Stevey Y you our Guy
May 31, 2010
It might be best to continue the discussion in the other thread, so this one could be closed :)

I have not found a one-stop shop for NHL or world hockey data. There are websites like HockeyDB and EliteProspects that have enormous player databases, but neither offers a download of the full data set. I think there are paid services if you want to get a serious website going. I have always written automated web-scraping utilities to gather the data. Most sites display the data in pages of 20-100 rows, so you can use simple Excel web queries or develop something more advanced with VB.NET or PHP scripts.
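A minimal sketch of the scraping side in Python rather than VB.NET or PHP, using only the standard-library html.parser. It pulls the cell text of every table row on one page; looping over a site's 20-100-row pages is left out because each site's paging parameters differ:

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; into '\xa0'
        self.rows = []
        self._row = None
        self._cell = None

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            # str.split() also splits on '\xa0', so &nbsp; collapses to one space
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html: str) -> list:
    """Return a list of rows, each a list of cleaned cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush a final row left open by sloppy markup
    return parser.rows
```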

The process is not for everyone, as it can be extremely tedious. You have to script like mad to clean the data into a meaningful format, and any change to the site can force a redesign of your scraper. Merging data sets can also be a pain: if you get data from multiple sources, player names are not always the same. Things like Steve vs. Steven can disconnect a player's data between sources.
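One way to soften the Steve vs. Steven problem is to join on a normalized key instead of the raw name. A minimal sketch; the nickname table is a hypothetical starting point, and where a source provides a canonical player ID, that is a safer join key than any name:

```python
import unicodedata

# Hypothetical nickname table; extend it with the variants your sources produce.
NICKNAMES = {"steve": "steven", "mike": "michael", "alex": "alexander", "matt": "matthew"}


def name_key(full_name: str) -> str:
    """Loose matching key: accents stripped, lowercased, first-name nicknames mapped."""
    ascii_name = (
        unicodedata.normalize("NFKD", full_name).encode("ascii", "ignore").decode("ascii")
    )
    parts = [p.strip(".").lower() for p in ascii_name.replace("-", " ").split()]
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def merge_by_name(source_a: dict, source_b: dict) -> dict:
    """Inner-join two {player name: stats dict} mappings on name_key()."""
    b_by_key = {name_key(name): stats for name, stats in source_b.items()}
    merged = {}
    for name, stats in source_a.items():
        other = b_by_key.get(name_key(name))
        if other is not None:
            merged[name] = {**stats, **other}
    return merged
```

Keying on source A's spelling keeps its names in the output; unmatched players are dropped, so it is worth logging them to grow the nickname table.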
 

lumiKone

Registered User
Aug 6, 2016
Helsinki
I would like to start developing my algorithms before the season starts, but finding game-by-game stats is surprisingly difficult. I wouldn't want to start learning how to parse JSON, but it looks like I might have to. import.io is a possibility, but I have not even found a site yet to do the scraping from :(

So instead of R + ML = 42? It looks like I will have to add JSON, PHP and MySQL to my list of things to learn. That will certainly not hurt in the long term, but it does take time.

Is there a good info package somewhere that would let me get the JSON thingys into R with minimal effort? I know some Python and a little SQL, if those help.
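Since Python is already on the list, one low-effort route is to flatten the JSON into a CSV that R's read.csv() loads directly (R's jsonlite package, with fromJSON(), is another option). A minimal sketch; nested objects become dotted column names and lists are written as-is, which is a simplification:

```python
import csv
import io
import json


def load_records(path: str) -> list:
    """Read a JSON file whose top level is a list of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names, e.g. team.name."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value  # lists and scalars are written as-is
    return flat


def records_to_csv(records: list) -> str:
    """Return CSV text with one row per record and the union of all columns."""
    rows = [flatten(r) for r in records]
    columns = sorted({col for row in rows for col in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are left blank
    return buffer.getvalue()
```

Write the returned text to a file such as games.csv, then read.csv("games.csv") in R gives a data frame.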
 

paulblack

Registered User
Aug 8, 2016
I just stumbled upon an incredibly handy tool you guys would love.

http://www.import.io

As an example, enter: http://www.nhl.com/ice/standings.htm?type=lea#&navid=nav-stn-league

It returns a perfect table of the standings.

If you download the desktop version, you're given an API key, so you can use their Python script to get a .json file as output. There are other things you could use the program for, but the Python API is the only part I need so far.

Edit: It is free, by the way. Don't let the Pricing link at the top scare you; it's mainly for people/companies with bigger needs.



I am also using import.io, but it can't scrape data behind a login, so I switched to Octoparse, a free web scraper. Have you managed to scrape data behind a login with either import.io or Octoparse?
 

HeresLookingAtEuclid

Registered User
May 6, 2017
I'm wondering if anyone knows of a JSON equivalent to the NHL's TH and TV files (TOI Home and Visitor) with line items for each player's individual shifts?

One place I've been looking is the statsapi.web.nhl.com server.

I've found the editorial content feed and the schedule-by-date-range feed. Since the editorial content is game-by-game, and the main play-by-play is also there, I could imagine more game-by-game data being there.

statsapi.web.nhl.com/api/v1/game/2016020123/content
statsapi.web.nhl.com/api/v1/game/2016020123/feed/live
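A minimal sketch of walking the feed/live file for per-event clock times and ice coordinates. The liveData, plays, allPlays nesting and the about/result/coordinates keys are assumptions to verify against a downloaded file, and the statsapi host may no longer be online:

```python
import json
from urllib.request import urlopen

# Host and path as posted in this thread; the service may have been retired since.
FEED_URL = "https://statsapi.web.nhl.com/api/v1/game/{game_id}/feed/live"


def fetch_feed(game_id: str) -> dict:
    """Download one game's feed/live JSON."""
    with urlopen(FEED_URL.format(game_id=game_id)) as response:
        return json.load(response)


def iter_events(feed: dict):
    """Yield (period, periodTime, event, x, y) for every play in the feed.

    Assumed layout: liveData -> plays -> allPlays, each play carrying
    'about', 'result' and (optionally) 'coordinates' objects.
    """
    plays = feed.get("liveData", {}).get("plays", {}).get("allPlays", [])
    for play in plays:
        about = play.get("about", {})
        coords = play.get("coordinates", {})
        yield (
            about.get("period"),
            about.get("periodTime"),
            play.get("result", {}).get("event"),
            coords.get("x"),
            coords.get("y"),
        )
```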

Then there's live.nhle.com/GameData. It has a variety of things; it looks more like live data than archival.

So a related question, I guess. I know of:

/SeasonSchedule-20162017.json
/GCScoreboard/2016-10-30.jsonp
/20162017/2016020123/gc/gcsb.jsonp
/20162017/2016020123/gc/gcbx.jsonp
/20162017/2016020123/PlayByPlay.json
/20162017/2016020123/Roster.jsonp
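The game IDs in these paths follow a fixed pattern: the season's starting year, a two-digit game type (01 preseason, 02 regular season, 03 playoffs), then a four-digit game number. A small sketch that builds the URLs listed here from an ID or date; the paths come from this thread, and whether a given file exists for a given game is not guaranteed:

```python
BASE = "http://live.nhle.com/GameData"

# Per-game files mentioned in this thread, relative to /<season>/<game id>/.
GAME_FILES = [
    "gc/gcgm.jsonp",
    "gc/gcsb.jsonp",
    "gc/gcbx.jsonp",
    "PlayByPlay.json",
    "Roster.jsonp",
]

GAME_TYPES = {"01": "preseason", "02": "regular season", "03": "playoffs"}


def parse_game_id(game_id: str) -> dict:
    """Split a 10-digit ID such as 2016020123 into season, type and number."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"unexpected game id: {game_id!r}")
    start_year = int(game_id[:4])
    return {
        "season": f"{start_year}{start_year + 1}",
        "type": GAME_TYPES.get(game_id[4:6], "unknown"),
        "number": game_id[6:],
    }


def game_urls(game_id: str) -> list:
    season = parse_game_id(game_id)["season"]
    return [f"{BASE}/{season}/{game_id}/{path}" for path in GAME_FILES]


def schedule_url(season: str) -> str:
    """season is the 8-digit form, e.g. 20162017."""
    return f"{BASE}/SeasonSchedule-{season}.json"


def scoreboard_url(date: str) -> str:
    """date is YYYY-MM-DD, e.g. 2016-10-30."""
    return f"{BASE}/GCScoreboard/{date}.jsonp"
```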

The PlayByPlay strangely only has "most" of the game events; not all of them.
The Roster seems to only be for older games; newer games don't seem to have it.

Anyone know if there's a Shift feed? Or indeed of any other feeds other than the ones listed?
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
Columbus
morehockeystats.com
What's wrong with the available HTML files?
 

HeresLookingAtEuclid

Registered User
May 6, 2017
There's quite a bit more information in the JSON feeds.

Ice coordinates are the most obvious example, but there are also both game-clock and UTC times for each event.

The raw feed also uses canonical player identification numbers, so tracking a player over time is easy even if he switches teams or jersey numbers.

Also, the HTML that's produced has to be extensively cleaned before it can pass W3C validation, which is important if you're going to parse the HTML using DOM-style tooling.

I am parsing out everything from the HTML data but would strongly prefer having all that information from the "raw" feed.

The bottom line is that it's important to have both; I have cases of HTML info filling in blanks on the JSON side and vice versa. (I actually have three sources, since I also parse the ESPN feed.)

The shift-by-shift information, though, I only have on the HTML side, and I have a number of games where it isn't available.

I'm hoping to find a JSON feed to fill in those blanks. (And the play-by-play is, unfortunately, insufficient to re-create the shift information in anything except an approximate way).
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
Columbus
morehockeystats.com
I wonder: since the NHL does stat corrections, and they are reflected in the HTMLs (I mostly work with HTMLs), do the corrections make it into the JSONs as well?

As for HTML parsing, I haven't had any problems using Perl's HTML::TreeBuilder, except with some really old files that were truly broken beyond repair. No cleanup needed other than washing the &nbsp; entities out.
 

HeresLookingAtEuclid

Registered User
May 6, 2017
I would bet that both the JSON and the HTML are built from the "raw" underlying data. Further, I think special partners have access to that raw underlying data.

Evidence for this: in games such as 2008-2009 regular season 259 (two away players wearing #5) and 1077 (two away players wearing #23), the data problems were apparently enough to keep the HTML from being built. The data DOES exist, though, since the ESPN Gamecast feed contains the full play-by-play. (2008-2009 was before the play-by-play was included in the NHL's JSON live/feed file; that only started in 2010-2011.)
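Duplicate numbers like these can be flagged before parsing, so number-keyed lookups in the HTML reports don't silently merge two players. A minimal sketch; the (team, number, player) tuple shape is a hypothetical stand-in for whatever roster source you parse:

```python
from collections import defaultdict


def duplicate_jerseys(roster):
    """Map (team, number) -> players for every number worn by 2+ players.

    roster: iterable of (team, jersey_number, player_name) tuples, a
    hypothetical shape for whatever roster source you parse.
    """
    by_number = defaultdict(list)
    for team, number, player in roster:
        by_number[(team, number)].append(player)
    return {key: names for key, names in by_number.items() if len(names) > 1}
```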

Clock discontinuities also appear in both the HTML and the JSON. Example: in 2016-2017 Playoffs Game 0136, Event #188 is a stoppage with 13:55 left in the 2nd period and Event #189 is a faceoff with 13:51 left in the 2nd (Events 191 and 192 in the JSON).
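Discontinuities of this kind can be found mechanically by checking whether the clock moved between a stoppage and the faceoff that follows it. A minimal sketch; the event dict keys are hypothetical, and the 'STOP'/'FAC' labels mirror the HTML report's event codes but should be treated as assumptions:

```python
def to_seconds(clock: str) -> int:
    """Convert an 'MM:SS' clock reading to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def clock_gaps(events):
    """Yield (stoppage no, faceoff no, seconds) when the clock moved between
    a stoppage and the faceoff right after it in the same period.

    events: list of dicts with hypothetical keys 'no', 'period', 'type'
    ('STOP' / 'FAC', assumed codes) and 'remaining' ('MM:SS' left in the period).
    """
    for prev, cur in zip(events, events[1:]):
        if prev["type"] == "STOP" and cur["type"] == "FAC" and prev["period"] == cur["period"]:
            gap = to_seconds(prev["remaining"]) - to_seconds(cur["remaining"])
            if gap != 0:
                yield prev["no"], cur["no"], gap
```

Fed events #188 (STOP, 13:55 left) and #189 (FAC, 13:51 left) from the example, it yields (188, 189, 4): four seconds ran off during a stoppage.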

Also, while I noted that there was stuff in the JSON that wasn't in the HTML, that goes the other way, too. The HTML has the players on the ice for each team for each event. The live/feed JSON doesn't. (The live.nhle.com/.../PlayByPlay.json DOES have that information but does NOT have all the events; only some of them).

Bottom line is that if you want to get as complete a picture as possible, you need to be parsing and consolidating information from many sources.
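Consolidating can be sketched as filling blanks in one source's events from another's, for example copying on-ice players from PlayByPlay.json into feed/live events. The matching key and field names are hypothetical, and matching on period, clock time and event type will collide when two same-type events share a second:

```python
def consolidate(primary, secondary, fields):
    """Fill blank `fields` in primary events from matching secondary events.

    Events are dicts matched on a hypothetical (period, time, type) key;
    values already present in primary always win.
    """
    def key(event):
        return (event.get("period"), event.get("time"), event.get("type"))

    lookup = {}
    for event in secondary:
        lookup.setdefault(key(event), event)  # first event wins on key collisions

    merged = []
    for event in primary:
        combined = dict(event)  # copy so the input list is not modified
        match = lookup.get(key(event))
        if match is not None:
            for field in fields:
                if combined.get(field) is None and match.get(field) is not None:
                    combined[field] = match[field]
        merged.append(combined)
    return merged
```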

Ultimately, the more sources the better; hence my question about whether anyone has feed URLs beyond the ones in my post...
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
Columbus
morehockeystats.com
The question is whether the JSONs are rebuilt with stat corrections. I know the HTMLs are, and at some point in the past the JSONs weren't. I don't care about shot locations, but I care about deep history, so I have scraped the HTMLs. Now I see that live-feed JSONs have finally appeared for games back to 1987 (like the original HTML box scores), so I will learn what I can harvest there. I wasn't aware of any JSONs beyond feed/live; I'll study those too.

As for errors, I have a dedicated page for them, although not updated for this season:
http://morehockeystats.com/data/brokenfiles
 

Kane One

Moderator
Feb 6, 2010
Brooklyn, New NY
Well, the data flow for modern sites is generally: JavaScript makes an API call to the back end to fetch the JSON, then populates the view with it. So it would make sense that the JSON is updated, unless the page uses another API for corrections.
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
Columbus
morehockeystats.com
Or there's an auxiliary DB that serves the JSON during live action; it gets dumped into the main DB, which receives the corrections and republishes the HTMLs, while the JSONs stay with the out-of-sync auxiliary DB.
 

HeresLookingAtEuclid

Registered User
May 6, 2017
If you had an example of an updated data point, I'd be happy to look it up in the JSON and report back whether it contains the old or the new data...
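For that kind of check, a tiny helper that lists the fields where the HTML-derived and JSON-derived values disagree is enough. A sketch; it assumes both sides have already been parsed into flat dicts with matching field names, which is the hard part in practice:

```python
def diff_stats(html_stats: dict, json_stats: dict) -> dict:
    """Return {field: (html_value, json_value)} for every field that disagrees.

    A field missing on one side shows up with None for that side.
    """
    fields = sorted(html_stats.keys() | json_stats.keys())
    return {
        field: (html_stats.get(field), json_stats.get(field))
        for field in fields
        if html_stats.get(field) != json_stats.get(field)
    }
```

If the JSON still carries a pre-correction value, that field appears in the result with both values side by side.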
 

Kane One

Moderator
Feb 6, 2010
Brooklyn, New NY
That would then require the viewers to refresh the page. There must be at least one JSON file that's polled every few seconds.
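For reference, a client-side poll of such a file can be sketched as: fetch it every few seconds and react only when the body changes. The fetch callable and the 5-second default are placeholders, not the NHL site's actual behavior:

```python
import hashlib
import time


def poll_for_changes(fetch, interval=5.0, max_polls=None, sleep=time.sleep):
    """Yield the body returned by fetch() each time it differs from the last one.

    fetch: zero-argument callable returning bytes (for example a wrapper
    around urllib.request.urlopen). interval: seconds between polls, a
    placeholder value. max_polls: optional bound, handy for testing.
    """
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        digest = hashlib.sha256(body).hexdigest()
        if digest != last_digest:  # only report real changes
            last_digest = digest
            yield body
        polls += 1
        sleep(interval)
```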
 

morehockeystats

Unusual hockey stats
Dec 13, 2016
Columbus
morehockeystats.com
Why? The HTMLs are static and live in a special directory... The mechanism for publishing them has been there since 1999 or something.
I agree that the scenario is unlikely, but with the way the current website runs you never know.
 

Kane One

Moderator
Feb 6, 2010
Brooklyn, New NY
If I'm on the page and NHL decides to republish the page (which is very archaic), I will not see those changes until I refresh.
 
