Web Scraping 101

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,553
27,137
I have to admit that I'm exceptionally naive in this regard, by the way - 100% of the data on my website has been entered in by hand. Part of that is because of trust issues (the issue I noted about the NHL's special teams goaltending statistics? For minor leagues, multiply the magnitude by one hundred). Most of that is because I just don't know how to do it.

If there's someone out there with web scraping background and time to spare, this is what I'd like to do (as an intro):

For each 2013-14 regular season boxscore on NHL.com - typically in this format:
http://www.nhl.com/scores/htmlreports/20132014/GS020001.HTM

I'd like to scrape off the date, each team's two dressed goaltenders, and their sweater numbers.
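A minimal Python sketch of that first pass, using only the standard library. It makes several assumptions, none confirmed by NHL.com:

- The 2013-14 regular-season reports run from GS020001 to GS021230 (30 teams x 82 games / 2 = 1,230 games).
- The pages are still served at that address.
- The game date appears as text like "Tuesday, October 1, 2013".
- Each goaltender sits in a table row of the form sweater number, "G", name.

That row layout is a guess, not the documented format of the report. Check it against a real page and adjust `goalie_rows` to match; like every function name here, it is hypothetical. Assigning each goalie to the home or visiting team depends on where the rows sit on the page, so that step is left out.

```python
import re
import time
import urllib.request
from datetime import datetime
from html.parser import HTMLParser

# Assumption: games are numbered 0001-1230 and "02" marks the regular season,
# matching the example URL .../20132014/GS020001.HTM.
BASE_URL = "http://www.nhl.com/scores/htmlreports/{season}/GS02{game:04d}.HTM"

# Assumption: the report shows the date as e.g. "Tuesday, October 1, 2013".
DATE_RE = re.compile(
    r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s+"
    r"([A-Z][a-z]+)\s+(\d{1,2}),\s+(\d{4})"
)


def game_summary_urls(season="20132014", n_games=1230):
    """Build one Game Summary URL per regular-season game."""
    return [BASE_URL.format(season=season, game=g) for g in range(1, n_games + 1)]


def fetch(url, delay=1.0):
    """Download one report after a short pause so we don't hammer the server."""
    time.sleep(delay)
    with urllib.request.urlopen(url) as resp:
        # Encoding is a guess; undecodable bytes become U+FFFD instead of crashing.
        return resp.read().decode("utf-8", errors="replace")


class RowCollector(HTMLParser):
    """Collect whitespace-normalized cell text, one list per <tr>.

    With nested tables only the innermost rows survive; a row that
    contains a nested table is dropped.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def goalie_rows(rows):
    """Hypothetical layout: [sweater number, "G", name, ...] marks a goaltender."""
    return [
        (int(row[0]), row[2])
        for row in rows
        if len(row) >= 3 and row[0].isdecimal() and row[1] == "G"
    ]


def game_date(text):
    """Return the first "Weekday, Month D, YYYY" date in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    return datetime.strptime(" ".join(match.groups()), "%B %d %Y").date()


def parse_report(html):
    """Pull the game date and (sweater number, name) goalie pairs from one report."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(row) for row in parser.rows)
    return {"date": game_date(text), "goalies": goalie_rows(parser.rows)}
```

To try it, run a handful of games first, e.g. `for url in game_summary_urls()[:3]: print(url, parse_report(fetch(url)))`, and compare the output with the page in a browser. That is the quickest way to find out whether the assumed row layout holds. The one-second `delay` in `fetch` is an arbitrary courtesy pause, not a published NHL.com limit.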

My intermediate goal is twofold - to flesh out my team logs to include any goaltenders dressed during the season, and also to catch any sweater numbers that I may have missed. For instance, 2013-14 for Buffalo:

http://www.hockeygoalies.org/bio/nhl/buffalo.html
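Once the box scores are scraped, both of those goals reduce to set differences against the hand-entered log. This is a minimal sketch. It assumes both sources have been boiled down to hypothetical (team, goaltender, sweater number) tuples, and that names are spelled identically in each. Normalizing names, such as accents, initials, or "Mc" versus "Mac", is not handled.

```python
def audit_team_log(scraped, logged):
    """Compare box-score goaltenders with a hand-entered team log.

    Both arguments are iterables of (team, goaltender, sweater_number) tuples;
    scraped may repeat the same tuple once per game. Returns:
      new_goalies: (team, goaltender) pairs dressed but missing from the log
      new_numbers: extra sweater numbers for goaltenders already in the log
    """
    seen = set(scraped)
    known = set(logged)
    logged_goalies = {(team, name) for team, name, _ in known}
    new_goalies = sorted({(team, name) for team, name, _ in seen} - logged_goalies)
    new_numbers = sorted(
        (team, name, number)
        for team, name, number in seen - known
        if (team, name) in logged_goalies
    )
    return new_goalies, new_numbers


if __name__ == "__main__":
    # Hypothetical data, not the real 2013-14 Buffalo goaltenders.
    scraped = [
        ("BUF", "A. GOALIE", 30),
        ("BUF", "A. GOALIE", 30),
        ("BUF", "B. BACKUP", 1),
        ("BUF", "B. BACKUP", 31),
        ("BUF", "C. CALLUP", 35),
    ]
    logged = [("BUF", "A. GOALIE", 30), ("BUF", "B. BACKUP", 1)]
    print(audit_team_log(scraped, logged))
    # -> ([('BUF', 'C. CALLUP')], [('BUF', 'B. BACKUP', 31)])
```

A goaltender who is missing from the log entirely shows up only in `new_goalies`. His numbers can then be read straight from the scraped tuples.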

My larger goal is for someone to publish working code here so I can see how this is done in practice (I'm surprisingly good at modifying something that already exists).

Anyone interested in assisting?
 

Bear of Bad News

Oh my God, pnep - I love you! :handclap:

This is awesome - thank you so much; I wasn't familiar with TextPipe until now, but I'm going to get facile with it if it kills me.
 

Bear of Bad News

Ooh nice - thanks! I know Python decently well; I'll have to play around with it a bit. :handclap: