Data Scrape Needed - CHLStats

sfan

Registered User
Jun 26, 2013
573
0
Ottawa
thing is nobody has those stats for the CHL. What people have very likely done is extrapolate the shot stats to make various "advanced" stats.

Exactly, it's about extrapolation and building sample size. For example, if I understand correctly, estimated player TOI is derived from player-specific event data on the game sheets: shots, points, +/-, special teams utilization. I have never seen the algorithms themselves, and some people may apply coefficients because event data is biased.
 

sfan

Registered User
Jun 26, 2013
573
0
Ottawa
yeah Larry Hoover sent me a PM and asked me if I could build one. I'm working on it at the moment. Don't have much experience in HTML/CSS/JS and so on but I started learning it last weekend as my exams are now over (planned to learn it anyways).

I may share a dropbox folder or something of the sort before the website is actually done so people could benefit from the data.

I thought about making a more open website dedicated to scouting as you pointed out rather than just sharing stats. I also plan on sharing stats for the USHL, NCAA, European leagues and so on.

Let me know if you have any ideas you'd want me to implement.

Just wondering how this is progressing. Do you have a project folder on Github? Would you like help with something like a project mediawiki?
 

Mathletic

Registered User
Feb 28, 2002
15,777
407
Ste-Foy
Just wondering how this is progressing. Do you have a project folder on Github? Would you like help with something like a project mediawiki?

Coming along. I'm also working on 2 other projects at the moment, and more important stuff came up in one of them, so I'm not yet fully dedicated to the website. I should be able to post something before long, though.
 

phillipsj89

Registered User
Jan 9, 2012
1,123
54
Canada
Well done! What are your plans and priorities for new features and/or the rest of the CHL?

As of now, the stats guy, @rdfigs19, is working on home and away records for the team sections.

Next to come for player stats is even strength. As of now, there are no "big" plans to integrate the other leagues. With that said, it is a logical step.
 

sfan

Registered User
Jun 26, 2013
573
0
Ottawa
As of now, the stats guy, @rdfigs19, is working on home and away records for the team sections.

Next to come for player stats is even strength. As of now, there are no "big" plans to integrate the other leagues. With that said, it is a logical step.

I'm a big fan of even strength analysis, so looking forward to that.

I like the way your table is set up to allow scrolling while freezing the player name columns. That said, the table doesn't seem to support copy and paste into Excel. You may be able to find a simple export-to-CSV widget that could solve that.
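For anyone pulling the numbers into their own scripts, CSV export is only a few lines. Here is a minimal Python sketch of the idea. The column names and sample rows are placeholders made up for illustration, not the site's actual data model or real stats.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip keys that are not exported
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows; names and numbers are placeholders.
sample_rows = [
    {"player": "Player A", "team": "OTT", "gp": 60, "g": 25, "a": 40},
    {"player": "Player B", "team": "KGN", "gp": 58, "g": 10, "a": 22},
]
csv_text = rows_to_csv(sample_rows, ["player", "team", "gp", "g", "a"])
```

The resulting text can be saved with a .csv extension and opened directly in Excel.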

It would also be helpful to filter for player age.
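An age filter mostly comes down to computing each player's age on a reference date. A rough Python sketch, assuming each player record carries a `birth_date` field (a hypothetical field name) and that the cutoff date is whatever the site decides to use:

```python
from datetime import date


def age_on(birth_date, as_of):
    """Return the age in whole years on the given date."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not happened yet in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def filter_by_age(players, max_age, as_of):
    """Keep the players who are max_age or younger on the as_of date."""
    return [p for p in players if age_on(p["birth_date"], as_of) <= max_age]


# Hypothetical players and cutoff date, for illustration only.
players = [
    {"player": "Player A", "birth_date": date(1997, 3, 15)},
    {"player": "Player B", "birth_date": date(1995, 12, 31)},
]
young = filter_by_age(players, max_age=17, as_of=date(2014, 9, 15))
```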

Do you plan to do an eTOI?

Finally, I follow the OHL so you'd have a grateful fan if you'd add that league next ;-)
 

phillipsj89

Registered User
Jan 9, 2012
1,123
54
Canada
I'm a big fan of even strength analysis, so looking forward to that.

I like the way your table is set up to allow scrolling while freezing the player name columns. That said, the table doesn't seem to support copy and paste into Excel. You may be able to find a simple export-to-CSV widget that could solve that.

It would also be helpful to filter for player age.


Do you plan to do an eTOI?

Finally, I follow the OHL so you'd have a grateful fan if you'd add that league next ;-)

- While exporting as CSV is not formally in the plans, I think it's something I'm gonna add, sooner rather than later.

- Player Age filter is in the works.

- eTOI, yes, but it's halfway useless IMO (all the formulas I've seen just average a player's % of team points and compare that number to the player's GP and the team's GP).

- Other leagues have been discussed, maybe for next year.
 

sfan

Registered User
Jun 26, 2013
573
0
Ottawa
- While exporting as CSV is not formally in the plans, I think it's something I'm gonna add, sooner rather than later.

- Player Age filter is in the works.

Great

- eTOI, yes, but it's halfway useless IMO (all the formulas I've seen just average a player's % of team points and compare that number to the player's GP and the team's GP).

I agree it is inexact, but it is important, especially for assessing talented young players on a team loaded with capable older players. Different coaches have very different player utilization patterns.

In addition to what you mention above, you could also capture and incorporate:
- who is on ice for all goals for and against even strength
- who is on ice for all goals for and against for power play and penalty kill
- from the special teams utilization, the top six and bottom six can be inferred and a TOI coefficient for this incorporated
- the percentage of the game played even strength based on the start and finish of all penalties. Again this can be factored in

There is a great article here http://blog.extraskater.com/2014/06/introducing-chl-statistics/ by Extra-Skater including the results of testing eTOI with actual TOI (using the same algorithms against NHL data). It is quite impressive.
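
To make the on-ice idea concrete, here is a rough Python sketch. It is not Extra Skater's algorithm or anyone's published formula, just the simplest version of the share-based approach: assume a player's share of the team's even-strength goal events (for and against) in the games he played approximates his share of the team's even-strength ice time. The function names and the numbers in the example are hypothetical.

```python
def even_strength_minutes(game_minutes, special_teams_minutes):
    """Minutes played at even strength, given the minutes spent on PP/PK."""
    return game_minutes - special_teams_minutes


def estimated_es_toi(player_on_ice_events, team_events, es_minutes_per_game):
    """Estimate even-strength minutes per game from on-ice goal events.

    player_on_ice_events: ES goals for + against with the player on the ice
    team_events: all ES goals for + against in the games the player dressed
    """
    if team_events <= 0:
        raise ValueError("team_events must be positive")
    share = player_on_ice_events / team_events
    return share * es_minutes_per_game


# Hypothetical example: 12 of 60 minutes on special teams per game, and the
# player was on the ice for 30 of the team's 120 ES goal events.
es_minutes = even_strength_minutes(60.0, 12.0)
etoi = estimated_es_toi(30, 120, es_minutes)
```

Goal events are rare, so an estimate like this is noisy over small samples. That is where the extra inputs listed above (special teams usage, line deployment) would come in as adjustments.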
 

sfan

Registered User
Jun 26, 2013
573
0
Ottawa
Stay tuned. My Twitter handle is @jessephil; you'll get updates faster there.

Even strength numbers have been added.

Great, Jesse! Those of us who are OHL stat-starved can't wait for you to go live.
 

Lovonov

Registered User
Jun 7, 2018
80
13
Is anyone interested in helping me with a rather large project based around fantasy hockey insights? I would need help with web scraping. I'm not a programmer; I've been reading for weeks about how to do it and I just can't wrap my head around it. The algorithms are in their first versions, but I'd love to explain them in full to anyone interested in working together.
 

Kane One

Moderator
Feb 6, 2010
43,292
10,913
Brooklyn, New NY
Is anyone interested in helping me with a rather large project based around fantasy hockey insights? I would need help with web scraping. I'm not a programmer; I've been reading for weeks about how to do it and I just can't wrap my head around it. The algorithms are in their first versions, but I'd love to explain them in full to anyone interested in working together.
What exactly do you want scraped?
 

Lovonov

Registered User
Jun 7, 2018
80
13
What exactly do you want scraped?

Already have NHL API


AHL
QMJHL
WHL
OHL
NCAA
USHL

KHL
LIIGA
SHL
ALLSVENSKAN
CZECH / EXTRALIGA
MHL (RUSSIAN JUNIOR LEAGUE)
SUPERELIT U20


It's a lot, but doing this will allow me to implement automatic updates and apply my algorithms to all leagues and prospects, giving my fantasy insight service the edge over all other competitors.

Sportradar.com has an API service for $12,000/year which is too expensive.

My budget is at maximum $500 for now.
 

Kane One

Moderator
Feb 6, 2010
43,292
10,913
Brooklyn, New NY
Already have NHL API


AHL
QMJHL
WHL
OHL
NCAA
USHL

KHL
LIIGA
SHL
ALLSVENSKAN
CZECH / EXTRALIGA
MHL (RUSSIAN JUNIOR LEAGUE)
SUPERELIT U20


It's a lot, but doing this will allow me to implement automatic updates and apply my algorithms to all leagues and prospects, giving my fantasy insight service the edge over all other competitors.

Sportradar.com has an API service for $12,000/year which is too expensive.

My budget is at maximum $500 for now.
What do you plan on scraping from their sites? Do you have a few of the links? Where do you plan on storing the data? Do you have knowledge of SQL or do you plan on storing all this in spreadsheets?
 

Lovonov

Registered User
Jun 7, 2018
80
13
What do you plan on scraping from their sites? Do you have a few of the links? Where do you plan on storing the data? Do you have knowledge of SQL or do you plan on storing all this in spreadsheets?
In a perfect world I store this data in a database.

I plan on scraping:
GP
G
A
SHOTS
+/-
TOI
HITS
BLOCKS

WINS
SHOTS AGAINST
GOALS AGAINST
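
For the storage question, one possible starting point is a small SQLite database. This is a minimal Python sketch, not a finished design: the table and column names are made up, the split into a skater table and a goalie table is an assumption based on the two groups of stats above, and using the player name as part of the key is a simplification (a real version would want a stable player ID).

```python
import sqlite3

# Hypothetical schema covering the stats listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skater_stats (
    league TEXT NOT NULL,
    season TEXT NOT NULL,
    player TEXT NOT NULL,
    gp INTEGER, g INTEGER, a INTEGER, shots INTEGER,
    plus_minus INTEGER, toi_seconds INTEGER, hits INTEGER, blocks INTEGER,
    PRIMARY KEY (league, season, player)
);
CREATE TABLE IF NOT EXISTS goalie_stats (
    league TEXT NOT NULL,
    season TEXT NOT NULL,
    player TEXT NOT NULL,
    gp INTEGER, wins INTEGER, shots_against INTEGER, goals_against INTEGER,
    PRIMARY KEY (league, season, player)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_skater(conn, row):
    """Insert a skater row, replacing any earlier row for the same key."""
    conn.execute(
        """
        INSERT OR REPLACE INTO skater_stats
            (league, season, player, gp, g, a, shots,
             plus_minus, toi_seconds, hits, blocks)
        VALUES
            (:league, :season, :player, :gp, :g, :a, :shots,
             :plus_minus, :toi_seconds, :hits, :blocks)
        """,
        row,
    )
    conn.commit()
```

Because each daily update replaces the previous row for the same league, season and player, re-running the import does not create duplicates. Stats a league does not publish (TOI, hits, blocks) can simply be stored as NULL.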

I also need to find a way to pull the ages of all the players in these leagues.

I have links to every statistics website, but the problem is that many of these sites (the CHL, for example) require entering a form in a certain way to retrieve data and then paging through many pages of results, something I am unsure the scraper can do.
Example sites:
Junior Hockey League (MHL) - Statistics
WHL Network
Player statistics | Basic statistics | Regular season 2017-2018 | Statistics | Liiga

If you're interested in helping me out and/or collaborating, I'd be interested.
 

Kane One

Moderator
Feb 6, 2010
43,292
10,913
Brooklyn, New NY
In a perfect world I store this data in a database.

I plan on scraping:
GP
G
A
SHOTS
+/-
TOI
HITS
BLOCKS

WINS
SHOTS AGAINST
GOALS AGAINST

I also need to find a way to pull the ages of all the players in these leagues.

I have links to every statistics website, but the problem is that many of these sites (the CHL, for example) require entering a form in a certain way to retrieve data and then paging through many pages of results, something I am unsure the scraper can do.
Example sites:
Junior Hockey League (MHL) - Statistics
WHL Network
Player statistics | Basic statistics | Regular season 2017-2018 | Statistics | Liiga

If you're interested in helping me out and/or collaborating, I'd be interested.
I'm just trying to look for the data in a way where you don't need to scrape the page. Instead, just get the JSON or XML data.

Here are the WHL stats, for example:
http://lscluster.hockeytech.com/fee...lang=en&league_code=&season_id=262&limit=1000

You can change the &fmt in the URL from xml to json if that's what you prefer. Also set the &limit to whatever number you want. The &key in the URL may expire. If it does, go back to the WHL page you are trying to get the data for, right-click the page and click Inspect. Then go to the Network tab and refresh the page. One of the URLs in the list will be for a request called "feed"; copy that URL.
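
If you end up doing this from a script, the parameter changes can be made in code instead of by hand. A small Python sketch using only the standard library; the example URL below is a placeholder (made-up host, path and key), so substitute the real feed URL copied from the Network tab.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_query_params(url, **overrides):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as league_code=
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({name: str(value) for name, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))


def fetch_json(url, timeout=30):
    """Download a feed URL and decode the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder URL: the host, path and key are made up for illustration.
example_url = (
    "http://stats.example.com/feed?fmt=xml&key=PLACEHOLDER"
    "&lang=en&league_code=&season_id=262&limit=1000"
)
json_url = with_query_params(example_url, fmt="json", limit=50)
```

Calling `fetch_json(json_url)` on a real feed URL would then return the decoded data, assuming the feed actually responds with JSON when fmt is set to json.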
 

Lovonov

Registered User
Jun 7, 2018
80
13
I'm just trying to look for the data in a way where you don't need to scrape the page. Instead, just get the JSON or XML data.

Here are the WHL stats, for example:
http://lscluster.hockeytech.com/fee...lang=en&league_code=&season_id=262&limit=1000

You can change the &fmt in the URL from xml to json if that's what you prefer. Also set the &limit to whatever number you want. The &key in the URL may expire. If it does, go back to the WHL page you are trying to get the data for, right-click the page and click Inspect. Then go to the Network tab and refresh the page. One of the URLs in the list will be for a request called "feed"; copy that URL.

Kane, this is awesome. If that key ID expires, is there a way to create an automated action/macro/program to fetch that key and put it into those urls? Again, automation is key here because the stats need to update every day in order to reflect accurate data.
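
One possible way to automate the key lookup, sketched in Python. This rests on an untested assumption: that a feed URL containing the key shows up somewhere in the text of the league page or one of its scripts. If the site only builds that URL at runtime, this will not find it and a headless browser would be needed instead. The function name and the sample text are hypothetical.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit

# Matches http(s) URLs up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def find_feed_key(page_text, host_fragment="hockeytech.com", param="key"):
    """Return the first value of `param` found in a URL on the given host."""
    # Unescape first so that &amp; in HTML attributes becomes a plain &.
    for raw_url in URL_PATTERN.findall(html.unescape(page_text)):
        parts = urlsplit(raw_url)
        if host_fragment not in parts.netloc:
            continue
        values = parse_qs(parts.query).get(param)
        if values:
            return values[0]
    return None


# Made-up page fragment for illustration; not a real page or a real key.
sample_page = (
    '<script src="http://stats.example.com/feed'
    '?fmt=json&amp;key=PLACEHOLDER123&amp;limit=10"></script>'
)
key = find_feed_key(sample_page, host_fragment="example.com")
```

A daily job could download the league page, run something like this over the text, and substitute the result into the stored feed URLs before fetching.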
 

jc17

Registered User
Jun 14, 2013
11,031
7,760
I'm assuming you guys know about it, but prospect-stats.com is pretty solid if you haven't been there
 
