NHL.com + Sochi 2014 Data URLs in JSON and JSONP

Discussion in 'By The Numbers' started by Step7750, Feb 1, 2014.

  1. tonydowny

    tonydowny Registered User

    Joined:
    Feb 2, 2016
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
  2. Kulstad

    Kulstad Registered User

    Joined:
    Nov 14, 2015
    Messages:
    5
    Likes Received:
    0
    Trophy Points:
    0

    Videocenter is out, too

    This is the video list for the Toronto - Boston game on Feb 2
    http://live.nhle.com/GameData/20152016/2015020744/gc/gcgm.jsonp

    ...however, when using the previously stated URL format for a specific video description, nothing comes back:
    http://video.nhl.com/videocenter/servlets/playlist?ids=2015020744-94-h&format=json
     
    Last edited: Feb 4, 2016
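    For anyone reading the .jsonp URLs in this thread from a script rather than a browser, here is a minimal sketch of stripping the JSONP callback wrapper so the payload can be parsed as ordinary JSON. The callback name and the fields in the sample string are made up for illustration; the real feed's wrapper and schema may differ.

```python
import json
import re

# Matches `callbackName( ... )` with an optional trailing semicolon.
_JSONP_RE = re.compile(r"^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text):
    """Return the JSON payload of a JSONP response as Python objects."""
    match = _JSONP_RE.match(text)
    payload = match.group(1) if match else text  # fall back to plain JSON
    return json.loads(payload)


# Hypothetical response body; the real callback name and fields may differ.
sample = 'loadGameData({"gid": 2015020744, "video": []})'
data = parse_jsonp(sample)
print(data["gid"])
```

    The fallback branch means the same helper can be pointed at the plain .json URLs as well.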
  3. Kane One

    Kane One Registered User

    Joined:
    Feb 6, 2010
    Messages:
    35,458
    Likes Received:
    408
    Trophy Points:
    125
    Location:
    Brooklyn, New NY
    I just found a new JSON for the standings, even though the last one still works.

    https://statsapi.web.nhl.com/api/v1...e.next,team.schedule.previous&season=20152016

    This does have a copyright field:

    "copyright" : "Copyright 2015 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt"

    The text in that link is:

    Not sure if MLBAM will start enforcing this much more strictly than the NHL has.
     
  4. Vesalius

    Vesalius Registered User

    Joined:
    Oct 21, 2015
    Messages:
    31
    Likes Received:
    0
    Trophy Points:
    0
    Best Source for Raw Data

    I've been doing some research in machine learning for a while now, and I want to apply some of the methods I've learned to sports data. What's the best source of raw team data for things like shot attempts, possession, etc. on an individual game basis? I'm new to hockey analytics, so I don't know what the best sites are yet.

    Thanks!
     
  5. Bear of Bad News

    Bear of Bad News HFBoards Escape Goat

    Joined:
    Sep 27, 2005
    Messages:
    6,266
    Likes Received:
    1,739
    Trophy Points:
    156
    Location:
    Windsor
  6. Pizzarena91

    Pizzarena91 hot and REaDy Wings

    Joined:
    May 31, 2010
    Messages:
    3,827
    Likes Received:
    8
    Trophy Points:
    56
    Location:
    MI
    It might be best to continue the discussion in the other thread; if so, this one could be closed :)

    I have not found a one-stop shop for NHL or world hockey data. There are websites like HockeyDB and EliteProspects that have enormous player databases, but none offers a download of the data set in its entirety. I think there are pay services if you wanted to get a serious website going. I have always created automated web scraping utilities to gather the data. Most sites display the data 20-100 rows at a time, and you can do simple Excel web queries or develop something more advanced with VB.NET or PHP scripts.

    The process is not for everyone, as it can be extremely tedious. You have to script like mad to clean the data up into a meaningful format, and then any change to the site can force a redesign. Merging data sets can also be a pain: if you get data from multiple sources, player names are not always the same. Things like Steve vs. Steven can disconnect a player's data between sources.
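    As a rough illustration of the name-matching problem, here is a minimal sketch of building a comparison key for player names before merging sources. The nickname table is hypothetical and far too short for real use, and two different players can still collide on the same key, so treat it as a starting point only.

```python
import unicodedata

# Hypothetical nickname table; a real one would be much longer.
NICKNAMES = {"steve": "steven", "mike": "michael", "alex": "alexander"}


def name_key(full_name):
    """Build a comparison key: strip accents, lowercase, expand nicknames."""
    decomposed = unicodedata.normalize("NFKD", full_name)
    ascii_name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    parts = ascii_name.lower().replace(".", "").split()
    if not parts:
        return ""
    # Only the first token is treated as a given name that may be a nickname.
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


print(name_key("Steve Stamkos") == name_key("Steven Stamkos"))
```

    Joining on a key like this instead of the raw name catches the Steve/Steven case, though any ambiguous matches still need a manual look.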
     
  7. lumiKone

    lumiKone Registered User

    Joined:
    Aug 6, 2016
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    0
    Location:
    Helsinki
    I would like to start developing my algorithms before the season starts, but finding game-by-game stats is surprisingly difficult. I wouldn't want to start learning how to decrypt JSON, but it looks like I might have to. import.io is a possibility, but I have not even found a site yet that I could scrape from :(

    So instead of R+ML=42, it looks like I will have to add the acronyms JSON, PHP and MySQL to my list of things to learn. That will certainly not hurt (in the long term), but it does require time.

    Is there a good info package somewhere that I could use to get the JSON thingys into R with the minimum possible effort? I know some Python and a little SQL, if those help.
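    One low-effort route, sketched here with only the Python standard library, is to flatten the nested JSON into CSV text that R can then load as a normal table. The sample records and their field names are invented for illustration and are not the real feed's schema.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Write a list of nested dicts as CSV text with a unified header."""
    rows = [flatten(r) for r in records]
    columns = sorted({col for row in rows for col in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample; the field names are illustrative, not the real feed's schema.
raw = '[{"team": {"name": "BOS"}, "shots": 31}, {"team": {"name": "TOR"}, "shots": 27}]'
print(records_to_csv(json.loads(raw)))
```

    Saving that text to a file gives something read.csv in R can open directly. Lists inside the records are left as-is by this sketch and would need their own handling.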
     
  8. paulblack

    paulblack Registered User

    Joined:
    Aug 8, 2016
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0


    I am also using import.io, but since it can't scrape data behind a login, I switched to Octoparse, a free web scraper. Have you scraped data behind a login using import.io or Octoparse?
     
  9. HeresLookingAtEuclid

    HeresLookingAtEuclid Registered User

    Joined:
    May 6, 2017
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    0
    I'm wondering if anyone knows of a JSON equivalent to the NHL's TH and TV files (TOI Home and Visitor) with line items for each player's individual shifts?

    One place I've been looking is the statsapi.web.nhl.com server.

    I've found the editorial content feed and the schedule by date range feed. Since the editorial content is game-by-game, and the main play by play is also there, I could imagine more game-by-game stuff being there.

    statsapi.web.nhl.com/api/v1/game/2016020123/content
    statsapi.web.nhl.com/api/v1/game/2016020123/feed/live

    Then there's live.nhle.com/GameData. It has a variety of things; it looks like more live stuff than archival.

    So a related question, I guess. I know of:

    /SeasonSchedule-20162017.json
    /GCScoreboard/2016-10-30.jsonp
    /20162017/2016020123/gc/gcsb.jsonp
    /20162017/2016020123/gc/gcbx.jsonp
    /20162017/2016020123/PlayByPlay.json
    /20162017/2016020123/Roster.jsonp

    The PlayByPlay strangely only has "most" of the game events; not all of them.
    The Roster seems to only be for older games; newer games don't seem to have it.

    Anyone know if there's a Shift feed? Or indeed of any other feeds other than the ones listed?
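    For reference, the per-game paths listed above can be generated from a game ID along these lines. This is a sketch under one assumption: that the first four digits of the game ID are the season's start year, which matches the examples in this thread (2016020123 under 20162017, 2015020744 under 20152016) but is not documented here.

```python
BASE = "http://live.nhle.com/GameData"  # taken from the URLs quoted in this thread


def season_for(game_id):
    """Derive the season directory, e.g. 2016020123 -> '20162017'.

    Assumes the first four digits of the game ID are the season's start year.
    """
    start = int(str(game_id)[:4])
    return f"{start}{start + 1}"


def game_urls(game_id):
    """Return the per-game feed URLs listed in the post, keyed by file name."""
    root = f"{BASE}/{season_for(game_id)}/{game_id}"
    return {
        "gcsb": f"{root}/gc/gcsb.jsonp",
        "gcbx": f"{root}/gc/gcbx.jsonp",
        "PlayByPlay": f"{root}/PlayByPlay.json",
        "Roster": f"{root}/Roster.jsonp",
    }


for name, url in game_urls(2016020123).items():
    print(name, url)
```

    Whether a given file actually exists for a given game still has to be checked per request, as noted above for Roster.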
     
    Last edited: May 6, 2017
  10. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    What's wrong with the available HTML files?
     
  11. HeresLookingAtEuclid

    HeresLookingAtEuclid Registered User

    Joined:
    May 6, 2017
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    0
    There's quite a bit more information in the JSON feeds.

    Ice coordinates being the most obvious example, but also both game clock and UTC times for each event.

    The raw feed also uses canonical player identification numbers, so tracking information over time is easy even if players switch teams or jersey numbers.

    Also, the HTML that's produced has to be extensively cleaned before it can pass W3C validation, which is important if you're going to parse the HTML using DOM-style tooling.

    I am parsing out everything from the HTML data but would strongly prefer having all that information from the "raw" feed.

    Bottom line is actually that it's important to have both; I have cases of HTML info filling in blanks on the JSON side and vice-versa. (I actually have three since I also parse out the ESPN feed)
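    To make the fill-in-the-blanks idea concrete, here is a minimal sketch of merging one event's records from several sources, taking each field from the first source that actually has a value. The records and field names are invented for illustration.

```python
def consolidate(*sources):
    """Merge event records, taking each field from the first source that has it."""
    merged = {}
    for source in sources:
        for field, value in source.items():
            # Keep an earlier source's value; only fill fields still missing.
            if merged.get(field) is None and value is not None:
                merged[field] = value
    return merged


# Made-up records for one event; field names are illustrative only.
from_json = {"event": "SHOT", "x": 54, "y": -12, "on_ice": None}
from_html = {"event": "SHOT", "on_ice": [8, 19, 43], "x": None}
print(consolidate(from_json, from_html))
```

    The order of the arguments sets the priority, so the source trusted most for a given kind of data goes first.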

    The Shift-by-Shift information, though, I only have on the HTML side and I have a number of games where it isn't available.

    I'm hoping to find a JSON feed to fill in those blanks. (And the play-by-play is, unfortunately, insufficient to re-create the shift information in anything except an approximate way).
     
  12. Doctor No

    Doctor No Registered User

    Joined:
    Oct 26, 2005
    Messages:
    7,887
    Likes Received:
    1,284
    Trophy Points:
    139
    Home Page:
    I love the user name, for what it's worth. :D
     
  13. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    I wonder, since the NHL does stat corrections, and they are reflected in the HTMLs (I mostly work with HTMLs), do the corrections make it to the JSONs as well?

    As for HTML parsing, I haven't had any problems using Perl's HTML::TreeBuilder, with the exception of some really old files that were truly broken beyond repair. No cleanup except for washing the &nbsp; entities out.
     
  14. HeresLookingAtEuclid

    HeresLookingAtEuclid Registered User

    Joined:
    May 6, 2017
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    0
  15. HeresLookingAtEuclid

    HeresLookingAtEuclid Registered User

    Joined:
    May 6, 2017
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    0
    I would bet that both the JSON and the HTML are built from the "raw" underlying data. Further, I think there's access to that raw underlying data for special partners.

    Evidence for this comes from games such as 2008-2009 Reg Season 259 (two away players with Jersey #5) and 1077 (two away players with Jersey #23), where the data problems were apparently enough to keep the HTML from getting built. The data DOES exist, though, since the ESPN Gamecast feed contains the full play-by-play. (2008-2009 was before the play-by-play was included in the NHL's JSON feed/live file -- that only started in 2010-2011)

    And clock discontinuities appear in both HTML and JSON (example: 2016-2017 Playoffs Game 0136, where Event #188 is a stoppage with 13:55 left in the 2nd period and Event #189 is a faceoff with 13:51 left in the 2nd) [Events 191 and 192 in JSON]

    Also, while I noted that there was stuff in the JSON that wasn't in the HTML, that goes the other way, too. The HTML has the players on the ice for each team for each event. The feed/live JSON doesn't. (The live.nhle.com/.../PlayByPlay.json DOES have that information but does NOT have all the events; only some of them).
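    A check for this kind of discontinuity can be sketched as follows: since the clock should not run between a stoppage and the next event, any difference in time remaining gets flagged. The event dictionaries and the "STOP"/"FAC" type labels are illustrative and not the actual field names of either source.

```python
def to_seconds(clock):
    """Convert an 'MM:SS' time-remaining string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def clock_jumps(events):
    """Yield (prev_id, next_id, gap) where the clock moved right after a stoppage."""
    for prev, curr in zip(events, events[1:]):
        if prev["type"] != "STOP" or prev["period"] != curr["period"]:
            continue
        gap = to_seconds(prev["remaining"]) - to_seconds(curr["remaining"])
        if gap != 0:
            yield prev["id"], curr["id"], gap


# The two events from the example above, in an illustrative layout.
events = [
    {"id": 188, "type": "STOP", "period": 2, "remaining": "13:55"},
    {"id": 189, "type": "FAC", "period": 2, "remaining": "13:51"},
]
print(list(clock_jumps(events)))
```

    For the pair above this reports a 4-second gap between events 188 and 189.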

    Bottom line is that if you want to get as complete a picture as possible, you need to be parsing and consolidating information from many sources.

    Ultimately, the more sources the better; hence the question about wondering if anyone had additional feed URLs beyond the ones in my post...
     
  16. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    The question is whether the JSONs are rebuilt with stat corrections. I know the HTMLs are, and at some earlier stage the JSONs weren't. I don't care about shot locations, but I care about deep history, so I have scraped the HTMLs. Now I see that the JSONs for the live feed finally go back to 1987 (like the original HTML boxscores), so I will learn what I can harvest there. I wasn't aware of JSONs beyond the feed/live; I'll study them too.

    As for errors, I have a dedicated page for them, although not updated for this season:
    http://morehockeystats.com/data/brokenfiles
     
  17. Kane One

    Kane One Registered User

    Joined:
    Feb 6, 2010
    Messages:
    35,458
    Likes Received:
    408
    Trophy Points:
    125
    Location:
    Brooklyn, New NY
    Well, the data flow for modern sites is generally JavaScript making an API call to the back end to fetch the JSON and then populating the view with it, so it would make sense that the JSON is updated, unless there is another API the page uses for corrections.
     
  18. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    Or there's an auxiliary db used for the JSON (during live action), which gets dumped into the main db; the main db gets the corrections and the HTMLs are republished from it. The JSONs stay with the out-of-sync auxiliary db.
     
  19. HeresLookingAtEuclid

    HeresLookingAtEuclid Registered User

    Joined:
    May 6, 2017
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    0
    If you had an example of an updated data point, I'd be happy to look it up in the JSON and report back whether it contains the old or the new data...
     
  20. Kane One

    Kane One Registered User

    Joined:
    Feb 6, 2010
    Messages:
    35,458
    Likes Received:
    408
    Trophy Points:
    125
    Location:
    Brooklyn, New NY
    That would then require the viewers to refresh the page. There must be at least one JSON file that's polled every few seconds.
     
  21. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    No, I don't. I'll improve my logging of that stuff for the next season.
     
  22. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    Why? The HTMLs are static and have a special directory where they reside... The mechanism for publishing them has been there since 1999 or something.
    I agree that the scenario is unlikely, but with the way the current website runs you never know.
     
  23. Kane One

    Kane One Registered User

    Joined:
    Feb 6, 2010
    Messages:
    35,458
    Likes Received:
    408
    Trophy Points:
    125
    Location:
    Brooklyn, New NY
    If I'm on the page and NHL decides to republish the page (which is very archaic), I will not see those changes until I refresh.
     
  24. morehockeystats

    morehockeystats Unusual hockey stats

    Joined:
    Dec 13, 2016
    Messages:
    341
    Likes Received:
    40
    Trophy Points:
    36
    Occupation:
    sysadmin
    Location:
    San Jose
    Home Page:
    The pages post-2006 have a small piece of self-refreshing JavaScript.
     
  25. Kane One

    Kane One Registered User

    Joined:
    Feb 6, 2010
    Messages:
    35,458
    Likes Received:
    408
    Trophy Points:
    125
    Location:
    Brooklyn, New NY
    To refresh the whole page? No, they don't. That's a terrible idea.

    You should make an API call and refresh only the data on the page rather than the whole page.
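    The poll-and-update pattern being described can be sketched like this (in Python rather than browser JavaScript, to keep it runnable on its own). The fetch function is a stand-in that returns canned snapshots instead of calling any real NHL endpoint, and the interval and field names are arbitrary.

```python
import time


def changed_fields(old, new):
    """Return only the keys whose values differ between two snapshots."""
    return {k: v for k, v in new.items() if old.get(k) != v}


def poll(fetch, on_change, interval=5.0, rounds=3, sleep=time.sleep):
    """Call fetch() repeatedly and pass just the changed fields to on_change()."""
    state = {}
    for i in range(rounds):
        snapshot = fetch()
        diff = changed_fields(state, snapshot)
        if diff:
            on_change(diff)
        state = snapshot
        if i < rounds - 1:
            sleep(interval)


# Stand-in for an HTTP call: three canned snapshots instead of a real feed.
snapshots = iter([{"home": 0, "away": 0}, {"home": 1, "away": 0}, {"home": 1, "away": 0}])
updates = []
poll(lambda: next(snapshots), updates.append, sleep=lambda s: None)
print(updates)
```

    Only the fields that changed reach the update callback, which is the part that replaces a full page refresh.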
     
