Series Discussion: My Completely Statistical NHL Ranking Input please

Janks · Nov 6, 2018

SKRusty said:
It is not about understanding mathematics, which, judging from your condescending manner, you did not pick up from my initial statement. It is that it is all written in Python (a scripting language). It is not that the math is too difficult; it is that 99.99% of people on this planet don't know how to script in Python.

The bell curve analysis portion that I have written is a package with more than 200,000 lines of code, with a university grad student guiding and testing. It isn't that I am "GREAT" because I am a great mathematician; I am great at what I do because there are few people who can do what I do with code.

Despite the rude manner in which I have been treated here, I had been willing to let that go until now.

This was supposed to be a fun project to play with in my spare time. I was taking my passion for hockey and coding and bringing it here for you guys to use in fantasy leagues: a tool that could make your fantasy leagues more fun, and maybe allow you to make modeled decisions about who to drop, pick up, or trade for.

I didn't take it to the main board because this wasn't about flexing my ego about my knowledge level. It was to get a couple of people to root around in the advanced stats and say, "Yeah, I do think that is close to how it would pan out," and then to give you guys access to the league manager I was in the process of building. But screw that.

Complete waste of my time thinking that this board would be at all respectful of what I was trying to do.

Thanks for explaining what Python is, I would have never guessed it wasn’t a snake.

The biggest problem I had was that you came in here vaguely talking about something, and then asking for critiques while supplying no details. You then alluded to everyone being less intelligent and that you were the only subject matter expert here, and have now had a mini tantrum and have taken your ball and gone home. I’m not here to shoot down your idea, just pointing out why you’re getting comments like the above from Volica, FF, tfong etc.

You’ll have more success in arguments and trying to talk to people if you don’t interject that you’re so great at what you do.

SKRusty · Nov 6, 2018

Janks said:
Thanks for explaining what Python is, I would have never guessed it wasn’t a snake.

The biggest problem I had was that you came in here vaguely talking about something, and then asking for critiques while supplying no details. You then alluded to everyone being less intelligent and that you were the only subject matter expert here, and have now had a mini tantrum and have taken your ball and gone home. I’m not here to shoot down your idea, just pointing out why you’re getting comments like the above from Volica, FF, tfong etc.

You’ll have more success in arguments and trying to talk to people if you don’t interject that you’re so great at what you do.

No, the issue is that people didn't take me at my word when I said the complexity was beyond what could be put on the boards.

Most of the equations are conditional, tied into other equations that have their own conditionals. So for each individual stat there are 12-24 breakpoints (different versions), all with differing conditionals. To understand what I have done, you would need a workflow chart and then a classroom with four walls of whiteboard to do one segment. Much of that whole process is performed by the bell chart analysis code I have already written, but that is not what is important. (The bell chart analysis code took over a year to write, but it is very powerful and lets me adapt it for other projects relatively easily, which is why I did this.)
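None of these conditionals are shown in the thread, but a breakpoint table is one common way to encode several "versions" of a formula for a single stat. A minimal sketch, assuming hypothetical breakpoints measured in standard deviations from the league mean and made-up adjustment values (this is not SKRusty's code):

```python
from bisect import bisect_right

# Hypothetical breakpoints, in standard deviations from the league mean.
BREAKPOINTS = [-2.0, -1.0, 1.0, 2.0]
# One adjustment per band (len(BREAKPOINTS) + 1 bands); values are made up.
ADJUSTMENTS = [0.04, 0.02, 0.0, -0.02, -0.04]


def band_adjustment(z_score: float) -> float:
    """Return the adjustment for the band that contains z_score."""
    # bisect_right counts the breakpoints <= z_score, which is exactly
    # the index of the band the value falls into.
    return ADJUSTMENTS[bisect_right(BREAKPOINTS, z_score)]


print(band_adjustment(0.3))   # 0.0  (middle band, no change)
print(band_adjustment(-2.5))  # 0.04 (far below the mean, nudged up)
```

Adding a version to a stat then means adding one entry to each list rather than another nested conditional.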

When I said people would not understand how it was currently written, I was speaking about how it was written in the Python scripting language: even with well-named variables, it wouldn't make sense to someone who writes Python unless they understood the architecture of the code.

I get that, as any stats geek, you like to look at the formula, but the conditionals make that almost impossible. To clarify, this is not because others won't understand it; it is more a matter of the sheer volume that would need to be stepped through before it made sense.

wasunder · Nov 7, 2018

I code in Python. Feel free to post and I'll take a look. I'm currently in Beijing, though, so I'll have to find the time.

MNNumbers · Nov 7, 2018

SKRusty said:
It is not about understanding mathematics, which, judging from your condescending manner, you did not pick up from my initial statement. It is that it is all written in Python (a scripting language). It is not that the math is too difficult; it is that 99.99% of people on this planet don't know how to script in Python.

The bell curve analysis portion that I have written is a package with more than 200,000 lines of code, with a university grad student guiding and testing. It isn't that I am "GREAT" because I am a great mathematician; I am great at what I do because there are few people who can do what I do with code.

Despite the rude manner in which I have been treated here, I had been willing to let that go until now.

This was supposed to be a fun project to play with in my spare time. I was taking my passion for hockey and coding and bringing it here for you guys to use in fantasy leagues: a tool that could make your fantasy leagues more fun, and maybe allow you to make modeled decisions about who to drop, pick up, or trade for.

I didn't take it to the main board because this wasn't about flexing my ego about my knowledge level. It was to get a couple of people to root around in the advanced stats and say, "Yeah, I do think that is close to how it would pan out," and then to give you guys access to the league manager I was in the process of building. But screw that.

Complete waste of my time thinking that this board would be at all respectful of what I was trying to do.

With respect to the above discussion, in which you break down a measure of what you are doing....

This is meant as constructive criticism....
You begin with points and points-percentage. FULL STOP HERE FOR JUST A MINUTE. ALLOW ME TO DISCUSS WITH POST #11 IN MIND..... Thank you.

Now, with ABSOLUTELY ZERO MATH INVOLVED, in post #11, you display your predictions or analysis, side by side with year end from last year. What happened? You start with EXACTLY THE RIGHT NUMBER OF POINTS AND PTS %AGE. Exactly the right ones. Then, you run a bunch of math, and the result is numbers that ALMOST match what you started with. ALMOST. What that tells me is that something isn't going well here, because I could write about 10 lines of code and get a better result......Flow chart:
What are the current standings and pts %age? >>>>> Print exactly that.
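For comparison, the "about 10 lines of code" baseline might look like this: a pace projection that carries the current points percentage forward over an 82-game season. The function names and the sample team are hypothetical.

```python
GAMES_PER_SEASON = 82


def points_pct(points: int, games_played: int) -> float:
    """Share of available points earned (2 available per game)."""
    return points / (2 * games_played)


def naive_projection(points: int, games_played: int) -> float:
    """Assume the current points percentage holds for the whole season."""
    return points_pct(points, games_played) * 2 * GAMES_PER_SEASON


# Hypothetical team: 15 points through 12 games.
print(naive_projection(15, 12))  # 102.5
```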

Not meant as criticism. Really. Not. But, somehow I'm not seeing the advantage in this case.

Now, if you started from, say (GF/GA)^2, and then adjusted by everything you've got going, and so on, and got the results you did, I would be very impressed. You would be starting with some completely different data, and then getting close results. I would say, "Here is a very good statistical proxy." But, when you start from the actual number which you are trying to predict, the result is far less impressive.
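The usual way to turn (GF/GA)^2 into a points estimate is the Pythagorean expectation borrowed from baseball analytics. A minimal sketch, assuming an exponent of 2 to match the post (analysts often fit a slightly different exponent for hockey) and treating the result as a points percentage, which is an approximation:

```python
def pythagorean_pts_pct(goals_for: int, goals_against: int,
                        exponent: float = 2.0) -> float:
    """Expected share of points from goal totals alone.

    GF^k / (GF^k + GA^k) equals r^k / (1 + r^k) with r = GF/GA, so this
    starts from (GF/GA)^2 rather than from the actual standings.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical team: 40 goals for, 30 against.
print(pythagorean_pts_pct(40, 30))  # 0.64
```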

I hope that makes sense.

The equivalent is this:

How long is each cheetah in a group of cheetahs?

Well, first let's measure them all.
Ok, now we take the average thigh bone length, and the average jaw set, and the average angle at which the right front leg comes out of its 'hip' or 'shoulder'.
Multiply by some number that approximates the length of a cheetah fur....
Add it all up and....Voila!!!

Whereas, if I did this:
Average length of a cheetah?
Hmmmm,
Well, I need some bone density, some tail proxy, something about their noses, and probably the way their legs fit together is important, too....
And, I put that all together statistically, WITHOUT ONCE MEASURING AN ACTUAL CHEETAH, and I get a close answer.....
Now, THAT would be impressive.

2nd EDIT:
This is why I suggested upthread that a neat thing to do would be to try your analysis on last year's results after about Dec 1, and see what the end of the year comes out as. If that's close, then you know you are getting something.

It would be the equivalent, in my cheetahs, of:
How long is an adult cheetah?
Measure a few kittens (cubs?).
Then a bunch of statistical data on the cubs.
Then add.
Then get a result.

That's a very valid analysis.
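A minimal sketch of that backtest, assuming you have each team's points and games played at the cutoff (e.g. Dec 1) plus its year-end total. Any projector taking (points, games_played) can be plugged in; the team names and numbers are placeholders, not real data.

```python
from typing import Callable, Dict, Tuple

GAMES_PER_SEASON = 82
# team -> (points, games_played) at the cutoff date
Snapshot = Dict[str, Tuple[int, int]]


def backtest_mae(snapshot: Snapshot, final_points: Dict[str, int],
                 project: Callable[[int, int], float]) -> float:
    """Mean absolute error, in points, of a projector run on a past season."""
    errors = [abs(project(pts, gp) - final_points[team])
              for team, (pts, gp) in snapshot.items()]
    return sum(errors) / len(errors)


def pace(points: int, games_played: int) -> float:
    """Baseline projector: carry the current points pace over the full season."""
    return points / games_played * GAMES_PER_SEASON


# Placeholder cutoff snapshot and year-end totals for two teams.
dec_1 = {"Team A": (30, 25), "Team B": (24, 24)}
year_end = {"Team A": 100, "Team B": 90}
print(round(backtest_mae(dec_1, year_end, pace), 2))  # 4.8
```

Running the same backtest with the pace baseline and with the full model shows whether the extra stats actually reduce the error.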

Flames Fanatic · Nov 7, 2018

SKRusty said:
This was supposed to be a fun project to play with in my spare time. I was taking my passion for hockey and coding and bringing it here for you guys to use in fantasy leagues: a tool that could make your fantasy leagues more fun, and maybe allow you to make modeled decisions about who to drop, pick up, or trade for.

Dude, you literally posted this in the OP

"Please give constructive input don't give me subjective I don't think they belong there because I don't think they are that good."

And then gave us nothing to not be subjective about. What did you expect to happen?

MNNumbers · Nov 7, 2018

And, if I may, reading again what you wrote....

Essentially:
You are doing an adjustment to pts %age based on CORSI first, with some suggestion of a season long regression to mean.
Then, an adjustment to the pts %age based on a few other statistical factors with respect to HDS both for and against.
And, then you are applying that to the remainder of the season.

Correct?
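MNNumbers' three-step summary can be written out as a short pipeline. This is a sketch of that summary, not SKRusty's actual code: the linear adjustments, the single regression fraction, the way HDS for and against are folded into one share, and all weights are hypothetical placeholders.

```python
from dataclasses import dataclass

GAMES_PER_SEASON = 82
LEAGUE_MEAN_SHARE = 50.0  # Corsi-for % and HD-shots-for % average 50 league-wide


@dataclass
class TeamStats:
    points: int
    games_played: int
    corsi_for_pct: float      # share of all shot attempts, e.g. 53.0
    hd_shots_for_pct: float   # share of high-danger shots, for vs. against


def adjusted_pts_pct(t: TeamStats, corsi_weight: float, hds_weight: float,
                     regression: float) -> float:
    """Steps 1 and 2: nudge the current pts % using Corsi, then HDS."""
    pct = t.points / (2 * t.games_played)
    # Step 1: pull Corsi part of the way back to 50% before using it.
    corsi = LEAGUE_MEAN_SHARE + (t.corsi_for_pct - LEAGUE_MEAN_SHARE) * (1 - regression)
    pct += corsi_weight * (corsi - LEAGUE_MEAN_SHARE) / 100
    # Step 2: high-danger shots for and against, folded into one share.
    pct += hds_weight * (t.hd_shots_for_pct - LEAGUE_MEAN_SHARE) / 100
    return min(max(pct, 0.0), 1.0)


def project_final_points(t: TeamStats, corsi_weight: float, hds_weight: float,
                         regression: float) -> float:
    """Step 3: keep banked points and apply the adjusted pace to games left."""
    remaining = GAMES_PER_SEASON - t.games_played
    pct = adjusted_pts_pct(t, corsi_weight, hds_weight, regression)
    return t.points + pct * 2 * remaining


team = TeamStats(points=12, games_played=10, corsi_for_pct=54.0, hd_shots_for_pct=52.0)
print(round(project_final_points(team, corsi_weight=0.5, hds_weight=0.5, regression=0.3), 1))
```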

I think I saw a piece of your earlier post describing all of this, which said something about adding 10 points because of 10 games played, or something. If that 10pts/10gms is intended to add an average, it should be:
Add 1.12 pts/game. That's about league average for the last several years.
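The arithmetic behind that suggestion, as a sketch: keep the points already banked and credit every remaining game with the league average. The average sits above 1.0 because overtime and shootout losses still earn a point. The team numbers are hypothetical.

```python
GAMES_PER_SEASON = 82
LEAGUE_AVG_PTS_PER_GAME = 1.12  # approximate recent league average, per the post


def project_with_league_average(points: int, games_played: int) -> float:
    """Keep banked points and credit each remaining game with the league average."""
    remaining = GAMES_PER_SEASON - games_played
    return points + remaining * LEAGUE_AVG_PTS_PER_GAME


# Hypothetical: 14 points through 12 games -> 14 + 70 * 1.12
print(round(project_with_league_average(14, 12), 1))  # 92.4
```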

Cheers....

InfinityIggy · Nov 7, 2018

Any good programmer can break what they are doing down into some pseudocode, which you could post here.

Source: am programmer.

Janks · Nov 7, 2018

InfinityIggy said:
Any good programmer can break what they are doing down into some pseudocode, which you could post here.

Source: am programmer.

But didn’t you see above, it’s something only OP can understand and knows.

Anglesmith · Nov 7, 2018

InfinityIggy said:
Any good programmer can break what they are doing down into some pseudocode, which you could post here.

Source: am programmer.

If I'm reading it correctly, I don't think SKRusty is a programmer (partly because of what he's saying, but also because of how he speaks of scripting as some kind of mystical art), but is using Python with the assistance of someone who is. Or maybe I misunderstood.

InfinityIggy · Nov 7, 2018

Anglesmith said:
If I'm reading it correctly, I don't think SKRusty is a programmer (partly because of what he's saying, but also because of how he speaks of scripting as some kind of mystical art), but is using Python with the assistance of someone who is. Or maybe I misunderstood.

I considered the same however:

He specifically mentions "Much of that whole process is performed by the bell chart analysis code I have already written".

Janks · Nov 7, 2018

InfinityIggy said:
I considered the same however:

He specifically mentions "Much of that whole process is performed by the bell chart analysis code I have already written".

He’s written it, you just wouldn’t understand bell curves.

As someone with Finance and Accounting as my field of expertise, I’ve taken a few stats and economics classes, so I at least have a basic knowledge (i.e. definitely not an expert, but I have some experience). I’d be interested in seeing what kind of analysis is here and whether it’s actually complex, or SKRusty is just yanking everyone’s chains and playing to rouse the board (which is my initial impression).

SKRusty · Nov 7, 2018

As far as I am concerned, this thread can be closed. I obviously made a misjudgment in asking what I thought were a couple of people knowledgeable in stats to pick out a couple of teams and see if their hypothetical calculations were close to mine. At the very least, I was misunderstood.

Initially I was excited to share some of my work with what I thought were fellow enthusiasts but I quickly learned otherwise. After getting extremely frustrated I have since decided not to bother those here with my project any longer. Instead I decided to build an automated testing tool. I was up all night putting together a data scraper so I can run simulations of seasons gone by. I no longer need input as I am able to use historical stats over the last decade and compare the predicted vs the actual.

Janks said:
But didn’t you see above, it’s something only OP can understand and knows.

The mathematical premise and logic behind it is something many people on these boards at least claim to be knowledgeable in. Where the issue came in is that few people code in Python, and unless you can read code and are familiar with the mathematical libraries, it is a completely pointless endeavour.

Anglesmith said:
If I'm reading it correctly, I don't think SKRusty is a programmer (partly because of what he's saying, but also because of how he speaks of scripting as some kind of mystical art), but is using Python with the assistance of someone who is. Or maybe I misunderstood.

I practice functional coding with an emphasis on DRY, so I can recycle any function over and over again and do not have to keep writing the same code. TBH, unless you are a fairly high-end Python developer, you will likely not understand what you are seeing. I have been coding for 30 years and recently went back to school to update some of my accreditations. The issue with coding in general, and why people think it is voodoo, is that few aspiring developers keep expanding their knowledge within a programming language. They know enough to get what they need done, with no eye to the future of where their product could be.

InfinityIggy said:
Any good programmer can break what they are doing down into some pseudocode, which you could post here.

Source: am programmer.

Yes, you are right, but with the permutations worked in, it makes things very complex, as my explanation above clearly stated. Working with standard deviations and trend forecasting is way more complex than you are letting everyone believe. There are teams of hundreds working on actuarial programming and modeling for entire economies. Insurance companies spend hundreds of millions of dollars designing forecasting, trend, and risk management software, much of it based on the same concepts as what I have developed.

MNNumbers said:
And, if I may, reading again what you wrote....

Essentially:
You are doing an adjustment to pts %age based on CORSI first, with some suggestion of a season long regression to mean.
Then, an adjustment to the pts %age based on a few other statistical factors with respect to HDS both for and against.
And, then you are applying that to the remainder of the season.

Correct?

I think I saw a piece of your earlier post describing all of this, which said something about adding 10 points because of 10 games played, or something. If that 10pts/10gms is intended to add an average, it should be:
Add 1.12 pts/game. That's about league average for the last several years.

Cheers....

MNN, you understand what I am doing perfectly. Rather than doing what most ranking systems do, using only current win percentage to extrapolate final point totals, I am trying to bring in CORSI, PDO, scoring chances, and time in zone to see if I can forecast the final results more accurately. The next part of the project is to bring in fantasy league data and have it forecast based on moving players in and out of the lineup, but without a fairly accurate ranking system (using actual results) there is no way to be accurate for the rest of the project involving fantasy leagues.

wasunder said:
I code in Python. Feel free to post and I'll take a look. I'm currently in Beijing, though, so I'll have to find the time.

Thank you for the offer. I may message you later in the project. I am currently tweaking weights, as I am really close with the ranking system, but I just have to keep simulating at this point.
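Weight tweaking like this can be automated, as MNNumbers suggests further down the thread. A minimal sketch for a single weight, assuming a hypothetical linear rule (rest-of-season pts % = current pts % + weight * stat deviation) and made-up historical rows. A grid search is used because it is easy to read and to extend to a couple of weights; with many weights, a real optimizer would be the better choice.

```python
from typing import List, Tuple

# (pts % at the quarter mark, stat deviation from the league mean,
#  actual pts % over the rest of that season); values are hypothetical.
Row = Tuple[float, float, float]


def squared_error(weight: float, rows: List[Row]) -> float:
    """Total squared error of the rule: predicted = pts_pct + weight * deviation."""
    return sum((pct + weight * dev - actual) ** 2 for pct, dev, actual in rows)


def best_weight(rows: List[Row], lo: float = -2.0, hi: float = 2.0,
                steps: int = 401) -> float:
    """Grid-search the weight in [lo, hi] with the smallest squared error."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda w: squared_error(w, rows))


history = [(0.50, 0.10, 0.55), (0.60, -0.20, 0.50), (0.45, 0.00, 0.45)]
print(best_weight(history))  # 0.5
```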

Flames Fanatic · Nov 7, 2018

Victim card? Really?

SKRusty · Nov 7, 2018

Flames Fanatic said:
Victim card? Really?

Nope, not at all. I just realized it was a waste of my time, and the effort is better spent doing what I enjoy without grief.

MNNumbers · Nov 7, 2018

SKRusty said:
....snip....

MNN, you understand what I am doing perfectly. Rather than doing what most ranking systems do, using only current win percentage to extrapolate final point totals, I am trying to bring in CORSI, PDO, scoring chances, and time in zone to see if I can forecast the final results more accurately. The next part of the project is to bring in fantasy league data and have it forecast based on moving players in and out of the lineup, but without a fairly accurate ranking system (using actual results) there is no way to be accurate for the rest of the project involving fantasy leagues.

...snip...

Just to get back to this Rusty. I'm not at all familiar with the statistical methods you are employing, but in a qualitative sense it seems intriguing....

The idea that, early in the season, teams are probably playing either better or worse than their actual ability, and that there will be some regression to the mean over the remainder of the year, is quite valid. I'm all in for that.

The idea that, along with that, some teams' results are poorer or better than they should be because of 'puck luck', which will also regress to the mean over the course of the year, also seems quite valid.

So, again, in a qualitative sense, this is what you're doing, with each stat:
1- Compare the current PDO, or CORSI or whatever stat is in question to the pts %age, and adjust the expected pts %age for the rest of the year accordingly, but FIRST....
1a - adjust the PDO or CORSI slightly toward the mean if its SD is a long way 'off the charts' because it's quite clear that no team will have a 60% CORSI all year, for example.

Then, continue this process for all the stats in which you are interested.

Then, predict....
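Step 1a can be sketched as a soft cap: values within a chosen number of standard deviations are left alone, and only the excess beyond it is shrunk. The 2-SD threshold and the 50% shrink are assumptions for illustration, not values from the thread.

```python
from statistics import mean, stdev
from typing import List


def pull_in_extreme(value: float, league_values: List[float],
                    threshold: float = 2.0, keep: float = 0.5) -> float:
    """Shrink only the part of a value that lies beyond `threshold` SDs.

    Inside the threshold the value is returned unchanged; outside it, only
    `keep` of the excess (in SD units) survives.
    """
    mu, sd = mean(league_values), stdev(league_values)
    z = (value - mu) / sd
    if abs(z) <= threshold:
        return value
    sign = 1 if z > 0 else -1
    new_z = sign * (threshold + (abs(z) - threshold) * keep)
    return mu + new_z * sd


# Hypothetical league Corsi-for % values with mean 50 and sample SD 2.
league = [48.0, 50.0, 52.0]
print(pull_in_extreme(51.0, league))  # 51.0 (inside 2 SD, untouched)
print(pull_in_extreme(58.0, league))  # 56.0 (4 SD out, pulled to 3 SD)
```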

In general, I would suggest the following as things that can be adjusted.....
1- By how much do you adjust the expected Pts %age for each stat? This is the big one, of course..... I think there is good room for doing this, especially in the sense that, if you have last year's data, and a few years before that, you can look at the team records at 1/4 of the season, run your analysis, then tweak your weighting, and see if you can fine-tune to a good result. Being a stat man and a programmer, I'm sure you can write code to minimize the error so you don't have to do it manually.

2- Does your "rest of season simulation" include strength of schedule? In other words, do you go through and play each remaining game and give each team the appropriate number of fractions of points given a comparison of their pts %age? That might be a fine tune adjustment as well.
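For the game-by-game version, one standard option is Bill James' log5 formula, which estimates head-to-head odds from two records. A minimal sketch, assuming each team's pts % stands in for its win rate and ignoring loser points; the schedule format is hypothetical:

```python
from typing import Dict, List, Tuple


def win_prob(pct_a: float, pct_b: float) -> float:
    """log5 estimate that A beats B, given each side's pts %."""
    num = pct_a * (1 - pct_b)
    return num / (num + pct_b * (1 - pct_a))


def expected_points(team: str, pct: Dict[str, float],
                    schedule: List[Tuple[str, str]]) -> float:
    """Sum a team's expected points over its remaining (home, away) games.

    Simplification: a win is 2 points and a loss 0, so overtime and
    shootout loser points are ignored.
    """
    total = 0.0
    for home, away in schedule:
        if team == home:
            total += 2 * win_prob(pct[home], pct[away])
        elif team == away:
            total += 2 * win_prob(pct[away], pct[home])
    return total


pct = {"A": 0.6, "B": 0.4}
schedule = [("A", "B"), ("B", "A")]
print(round(expected_points("A", pct, schedule), 2))
```

A team facing a run of strong opponents then earns fewer expected points than its own pts % alone would suggest.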

And, while you are at it, I have another question for fellows like you. Two members here do an Elo ranking of the NHL teams, in which they begin the year with all the teams having identical ratings. Elo has 2 adjustments available if it's used as predictive:
1- What's the K factor?
2- What's the right advantage to use for playing at home?
Have you ever thought of messing with that?
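For reference, the two knobs in question sit here in a standard Elo update. A minimal sketch: the 1500 starting rating is a common convention, and the default K and home-advantage values are placeholders, not recommendations. Both could be tuned with the same kind of backtest used for the pts % model.

```python
from typing import Tuple

START_RATING = 1500.0  # every team starts the season equal


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win probability) for A against B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(home: float, away: float, home_won: bool,
               k: float = 20.0, home_adv: float = 50.0) -> Tuple[float, float]:
    """Return new (home, away) ratings after one game.

    k sets how far one result moves the ratings; home_adv is added to the
    home side's rating only when computing the expected score.
    """
    expected_home = elo_expected(home + home_adv, away)
    delta = k * ((1.0 if home_won else 0.0) - expected_home)
    return home + delta, away - delta


print(elo_update(START_RATING, START_RATING, home_won=True, k=20.0, home_adv=0.0))  # (1510.0, 1490.0)
```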

Again, thanks for posting. Things like this intrigue me. I work in the caring fields, but I would have enjoyed being an actuary....

Janks · Nov 7, 2018

SKRusty said:
Nope, not at all. I just realized it was a waste of my time, and the effort is better spent doing what I enjoy without grief.

But you came for constructive criticism, and then told everyone the basis for actual criticism was too complex to post? What are people supposed to critique then? Your use of chart formatting on HF?

Bounces R Way · Nov 7, 2018

I mean, it's an interesting study, but surely you can see why your original post was met with a bit of sarcasm and contempt, right? Without knowing at least in some way how you came to those projected points, there's basically no worthwhile feedback posters could offer you.

I see you've given a cursory explanation of the bell curve standard deviation model used for some of the stats, but are they all weighted the same? Is it all based off the original points %, or were they calculated in sequence? I'd be interested to see what it looks like halfway through the season with more data available. I generally don't have much use for power rankings and predictive models in hockey, as it's a sport with very, very thin margins in which regression to the mean isn't always guaranteed, and there are a ton of factors that can influence a game, a road trip, or a season that aren't necessarily quantifiable. Either way, good luck with your model.

SKRusty · Nov 7, 2018

MNNumbers said:
Just to get back to this Rusty. I'm not at all familiar with the statistical methods you are employing, but in a qualitative sense it seems intriguing....

So, again, in a qualitative sense, this is what you're doing, with each stat:
1- Compare the current PDO, or CORSI or whatever stat is in question to the pts %age, and adjust the expected pts %age for the rest of the year accordingly, but FIRST....
1a - adjust the PDO or CORSI slightly toward the mean if its SD is a long way 'off the charts' because it's quite clear that no team will have a 60% CORSI all year, for example.

Yes. But the number of games played (in essence, time) factors into how much it will be corrected. Let's say a team stat is 2 SD from the mean: the calculation corrects less at 45 games than it does at 10, because the likelihood that the stat will return to the normal range decreases with games played. This is where different permutations for both the number of games played and the standard deviation come in. If you have a standard deviation of 3, the equation is going to correct more aggressively than at 1.5.
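The behaviour described (less correction as games accumulate, a bigger pull for a bigger deviation) is what a standard shrinkage estimator does. This is not the breakpoint scheme described in the thread, just a compact stand-in: the observed value is blended with the league mean, and the hypothetical constant k sets how many games it takes for the observed value to dominate.

```python
def regress_to_mean(observed: float, league_mean: float,
                    games_played: int, k: float = 20.0) -> float:
    """Shrink a team stat toward the league mean, less as games accumulate.

    The observed value gets weight gp / (gp + k), i.e. it is blended with k
    "phantom" games of league-average play. k is a hypothetical constant that
    would have to be tuned per stat.
    """
    weight = games_played / (games_played + k)
    return league_mean + (observed - league_mean) * weight


# A Corsi-for % of 54 against a league mean of 50 (hypothetical numbers):
for gp in (10, 45):
    print(gp, round(regress_to_mean(54.0, 50.0, gp), 2))
```

With k = 20, the 54 is pulled back to about 51.3 at 10 games but only to about 52.8 at 45 games, and a stat twice as far from the mean is corrected by twice as many points.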

Then, continue this process for all the stats in which you are interested.

Then, predict....

In general, I would suggest the following as things that can be adjusted.....
1- By how much do you adjust the expected Pts %age for each stat? This is the big one, of course..... I think there is good room for doing this, especially in the sense that, if you have last year's data, and a few years before that, you can look at the team records at 1/4 of the season, run your analysis, then tweak your weighting, and see if you can fine-tune to a good result. Being a stat man and a programmer, I'm sure you can write code to minimize the error so you don't have to do it manually.

I am playing with the stats as we speak. The most difficult part of the process is figuring out weights for the stats, which is why I had to bring in PDO, save percentage, and shooting percentage. High or low save percentages within the 1-1.5 SD range don't tend to correct as aggressively, while shooting percentage does, so I had to adjust PDO according to where the individual percentages were. FYI, PDO x 2 after it has been adjusted works great as a multiplier to win percentage.
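A sketch of adjusting PDO through its parts, assuming PDO = shooting % + save % (about 1.000 league-wide) and regressing shooting % harder than save %, as described. The league averages and keep-fractions are hypothetical placeholders.

```python
def adjusted_pdo(sh_pct: float, sv_pct: float,
                 lg_sh: float = 0.09, lg_sv: float = 0.91,
                 sh_keep: float = 0.3, sv_keep: float = 0.7) -> float:
    """Rebuild PDO after regressing its two parts by different amounts.

    Each part keeps only a fraction of its deviation from the league average;
    shooting % keeps less (regresses harder) than save %. All defaults are
    hypothetical placeholders.
    """
    sh = lg_sh + (sh_pct - lg_sh) * sh_keep
    sv = lg_sv + (sv_pct - lg_sv) * sv_keep
    return sh + sv


# Same +3-point hot streak, once in shooting % and once in save %:
print(round(adjusted_pdo(0.12, 0.91), 3))  # 1.009
print(round(adjusted_pdo(0.09, 0.94), 3))  # 1.021
```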

2- Does your "rest of season simulation" include strength of schedule? In other words, do you go through and play each remaining game and give each team the appropriate number of fractions of points given a comparison of their pts %age? That might be a fine tune adjustment as well.

Not at this point. I am trying to get the expected results and the real results within 3% of each other. Working in more complexity at this point would only add to the error.

And, while you are at it, I have another question for fellows like you. Two members here do an Elo ranking of the NHL teams, in which they begin the year with all the teams having identical ratings. Elo has 2 adjustments available if it's used as predictive:
1- What's the K factor?
2- What's the right advantage to use for playing at home?
Have you ever thought of messing with that?

Again, thanks for posting. Things like this intrigue me. I work in the caring fields, but I would have enjoyed being an actuary....

This is where a person like yourself comes in handy, in that I have not really thought about applying things in that manner. For me, it is just trying to predict things as accurately as I can with my skills. Stats are a side hobby for me, and making tools that interpret them is fun. As I said earlier, though, when I created the standard deviation modeling software for an economics package, I had to hire a grad student to look after the high-end math portion for me. I know my limitations, and though I am sure in 15 years I could have figured it out, it was much more advantageous to bring in somebody with that talent.

SKRusty · Nov 7, 2018

Janks said:
But you came for constructive criticism, and then told everyone the basis for actual criticism was too complex to post? What are people supposed to critique then? Your use of chart formatting on HF?

Janks, like I said, I was not clear enough about what I was looking for, and I said it was my fault... twice now.

It is all looked after.

Fig · Nov 7, 2018

SKRusty said:
MNN, you understand what I am doing perfectly. Rather than doing what most ranking systems do, using only current win percentage to extrapolate final point totals, I am trying to bring in CORSI, PDO, scoring chances, and time in zone to see if I can forecast the final results more accurately. The next part of the project is to bring in fantasy league data and have it forecast based on moving players in and out of the lineup, but without a fairly accurate ranking system (using actual results) there is no way to be accurate for the rest of the project involving fantasy leagues.

I had a similar question and comment to MNN, but I was a bit busy, so I couldn't post until now. I get that you're no longer interested in posting about your project here, but I wanted to throw my comment into the ring just to get it off my chest, because I think your project is pretty cool. I agree with MNN that this seems more like a forecaster of sorts (i.e., an improved power ranking) than a true analytical tool, but I do think it sounds like a fun nerdy dinner topic over drinks and, if refined, would have its uses.

As MNN noted, if you run data through your script and get similar results, then I'm also concerned, like he is, that there might be a fundamental "issue" even before inserting data. For instance, in post 11, if expected vs actual are nearly identical, then I am curious whether you're just rinsing the actual data through the script as opposed to building a strong script capable of forecasting. For instance, Las Vegas being near the top is something that many would not have predicted last season. Furthermore, many are probably going to use the word "unsustainable" to describe them this season. I'd also imagine a significant change in some of the underlying data for teams with significant shifts such as Calgary and Carolina (coaching), which means historical data isn't accurate. I think you touched on this when you said some of the power rankings were lazy projections of points from the previous season.

So... if I were even remotely capable of working on a cool project like yours... Fundamentally, I believe individual stats have far less place in a points projection ranking than it sounds like you're giving them. IMO, you should be projecting a coach's performance, then tying the coach's performance in points to the team the coach will be coaching. This... if you're doing an improved variation of a power ranking.

1. "Some stats should be based on coaches, not the team playing for the coach"

Certain bits of data are not functions of the team that created the data; they're functions of a coach. For instance, project the 2018/2019 Flames using a bunch of Carolina numbers from last season because of Bill Peters... because who the hell expects the Flames to have nearly identical stats to last season when they played differently under Gully?

2. "Coaches also have career highs and lows stats. AKA shelf life"

Certain coaches have predictable highs and lows in their stats. By looking at historical data, you can project a coach's expected highs and lows. For instance, I believe Gallant has seasons where the team performs great, then crashes into the ground. A guy like Q will have historically high stats, but if he's at the end of his shelf life, he should be performing near career lows.

2B. There are only so many points to win. If one team outperforms, some other team (or teams) must underperform to allow it.

3. "The focus on certain individual stats should actually be plugged into the highs/lows of a coach."

Typically, certain stats are considered more significant in the evaluation of a team. I'm thinking it's the opposite. Individual stats are more a predictor of whether a team will tune out a coach, or mesh well with a coach (i.e., Gully). For instance, coaches often discuss how "bad habits" trickle into games, and we occasionally discuss teams that tune out a coach and find success. If this is true, then certain teams are not actually playing in a way dictated by the coach, and the stats of the org (good or bad) are not functions of the coach at all.

So, IMO, in an improved power ranking, I'd actually project the points I think a coach should be able to get in a season. I'd then use certain stats to determine whether the coach will underperform, meet expectations, or outperform. IMO, you'd calculate the end points as points acquired by the coach of each team, then label the teams based on where the coaches will be coaching.

But that's just my opinion.
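Point 2B amounts to a constraint on the projections: the league can only hand out so many points. A minimal sketch of enforcing it by proportional scaling, which is an assumption (other schemes are possible); the totals are placeholders.

```python
from typing import Dict


def normalize_to_league_total(projections: Dict[str, float],
                              league_total: float) -> Dict[str, float]:
    """Scale projected points so they sum to the points the league hands out.

    Each game awards 2 points, or 3 if it goes to overtime or a shootout, so
    the league total is fixed by the schedule plus the number of extra-time
    games; projections can't all rise at once.
    """
    scale = league_total / sum(projections.values())
    return {team: pts * scale for team, pts in projections.items()}


# Hypothetical: three teams projected for 300 points when only 270 exist.
print(normalize_to_league_total({"A": 100, "B": 90, "C": 110}, 270))
```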

SKRusty · Nov 7, 2018

Fig said:
As MNN noted, if you run data through your script and get similar results, then I'm also concerned, like he is, that there might be a fundamental "issue" even before inserting data. For instance, in post 11, if expected vs actual are nearly identical, then I am curious whether you're just rinsing the actual data through the script as opposed to building a strong script capable of forecasting. For instance, Las Vegas being near the top is something that many would not have predicted last season. Furthermore, many are probably going to use the word "unsustainable" to describe them this season. I'd also imagine a significant change in some of the underlying data for teams with significant shifts such as Calgary and Carolina (coaching), which means historical data isn't accurate. I think you touched on this when you said some of the power rankings were lazy projections of points from the previous season.

I hadn't even thought about the coaching aspect. Brilliant point because I think certain coaches can influence results more than any of the stats especially early in the season and it may allow for better predictive capabilities year over year.

There are anomalies in stats and the 2014-15 Flames are a perfect example of stats gone wrong. Outliers are a part of statistical analysis and in this case you are going to have 2-3 outliers every year but if the predictive capabilities have the other 28 teams within 3% at the midpoint of the season you should have a workable model.
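The "within 3%, allowing 2-3 outliers" test can be written down directly. A minimal sketch, assuming the tolerance is measured relative to each team's actual point total; the team data is hypothetical.

```python
from typing import Dict, List


def teams_outside_tolerance(predicted: Dict[str, float], actual: Dict[str, float],
                            tolerance: float = 0.03) -> List[str]:
    """Teams whose prediction misses the actual total by more than `tolerance`."""
    return [
        team for team, pred in predicted.items()
        if abs(pred - actual[team]) / actual[team] > tolerance
    ]


def is_workable(predicted: Dict[str, float], actual: Dict[str, float],
                max_outliers: int = 3) -> bool:
    """Workable if no more than `max_outliers` teams fall outside the tolerance."""
    return len(teams_outside_tolerance(predicted, actual)) <= max_outliers


pred = {"A": 100.0, "B": 80.0, "C": 95.0}
act = {"A": 102, "B": 90, "C": 94}
print(teams_outside_tolerance(pred, act))  # ['B']
print(is_workable(pred, act))              # True
```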

The 15-16 Penguins are one of the largest outliers and it didn't click until you mentioned this. That's part of the problem with projects like this in that you get tunnel-vision where you miss important pieces.

As a side note coaches are going to be difficult to measure.

From a logic perspective (and I don't know if the numbers will show this), one would think a coach will have his largest impact in the first 2-3 years and then slowly tail off, for a couple of reasons: players tuning out the coach, the coach not adapting, and adaptations made by the opposition.

Claude Julien changed his coaching plan entirely this year and you don't have to look further than the standings to see the results with an arguably less talented team.

Fig · Nov 7, 2018

SKRusty said:
...snip...

As a side note coaches are going to be difficult to measure.

Oh, I was even going to mention that I'd have a literal random number generator to account for some of the randomness (illness, injuries, luck, etc.) that happens every season. However, I'd mainly use the random number generator to determine the total number of "loser points", then again to allocate those loser points to teams, with some rule that a team receives a minimum of 3 points up to a maximum of around 12 (which is a significant swing in the standings). The reason for doing it by team is that it would lazily deal with coaches being fired partway through the season, and with games a team has no business winning, or games where it outplays the opponent and simply runs into a hot goalie.
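The two-step RNG idea (draw a league-wide total of loser points, then allocate it to teams within a 3-12 band) can be sketched as below. Assumptions: the total is drawn uniformly from the range the per-team bounds can hold, and leftover points are handed out one at a time to random teams still under the cap; the name `allocate_loser_points` is hypothetical.

```python
import random


def allocate_loser_points(teams, min_pts=3, max_pts=12, seed=None):
    """Randomly allocate 'loser points' so each team gets min_pts..max_pts."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    n = len(teams)

    # Step 1: draw a league-wide total that the per-team bounds can accommodate.
    total = rng.randint(min_pts * n, max_pts * n)

    # Step 2: every team starts at the floor...
    alloc = {team: min_pts for team in teams}
    remaining = total - min_pts * n

    # ...then the rest goes out one point at a time to teams still under the cap.
    # Because total <= max_pts * n, an eligible team always exists here.
    while remaining > 0:
        eligible = [t for t in teams if alloc[t] < max_pts]
        alloc[rng.choice(eligible)] += 1
        remaining -= 1
    return alloc
```

Drawing the total first and allocating second mirrors the order described in the post; a simpler alternative would be an independent `randint(3, 12)` per team, which gives up control over the league-wide total.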

I don't know if predicting coaches is going to be as hard as you think. Hockey is unpredictable, but most of the margins are razor thin, and fundamentally I think there's a balancing act to follow (i.e., Corsi). For instance, the gap between a good and a bad coach is probably as thin as goalie sv% margins. I'd arbitrarily guess the maximum difference between coaches is similar to goalies: perhaps around 10-15%, which is like the difference between a .940 goalie and a .790 goalie. Furthermore, a coach at .500 or below generally doesn't stick around too long, which should mean most coaches land within +/- .100 of .500, with loser points potentially pushing that percentage above or below that range.

Something basic might sound like...

Expected games played by coach
x
Expected win % by coach (based on historical win% patterns, adjusted for shelf life/fit, whatever)
x
2 (points per win)
=
Base points

Calgary point projection =
Base points of Calgary's coach(es)
+
Random number generator loser points
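Written as a single formula, the projection might look like the following. The factor of 2 (standings points per win) is an assumption: it is what turns 82 games and a .5390 win rate into roughly 88 points in the Bill Peters example. Summing over coaches is also an assumption, meant to cover a mid-season coaching change; the 3-12 bounds on the loser points L come from the allocation rule described earlier.

```latex
P_{\text{team}} = \sum_{c \,\in\, \text{coaches}} 2 \cdot \mathrm{eGP}_c \cdot \mathrm{eWin}_c \;+\; L_{\text{team}},
\qquad 3 \le L_{\text{team}} \le 12
```

Here eGP_c is the expected number of games coached by c and eWin_c is that coach's expected win %.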

So maybe something like this for Bill Peters?

Win % (wins / GP), last 4 seasons: .3659, .4268, .4390, .4390

eGP = 82
eWin% = .4390 + .100 = .5390
Base points = 82 x 2 x .5390 = 88.4, or about 88 points

Calgary Flames =
Bill Peters base points (88)
+
RNG loser points (range 3-12) = 5
= 93 points?
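The Bill Peters arithmetic can be reproduced in a few lines, which also makes each assumption explicit. This sketch assumes two standings points per win, takes the most recent season's win % as the starting point, and applies the +.100 bump exactly as used in the post (the post does not say what the bump represents). The single RNG draw of 5 is hard-coded rather than generated.

```python
def base_points(expected_gp, expected_win_pct):
    # Assumption: 2 standings points per win; loser points are added separately.
    return 2 * expected_gp * expected_win_pct


# Wins / GP for Bill Peters over his last 4 seasons, as listed in the post.
peters_win_pct = [0.3659, 0.4268, 0.4390, 0.4390]

fit_bump = 0.100                      # the +.100 adjustment used in the post
e_win = peters_win_pct[-1] + fit_bump # about .5390
base = base_points(82, e_win)         # about 88.4
rng_loser_points = 5                  # the single RNG draw (range 3-12) from the post

projection = round(base) + rng_loser_points
print(projection)  # 93
```

Writing it this way shows where the result is most sensitive: the +.100 bump alone is worth about 16 points over 82 games, far more than the 3-12 loser-point swing.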

I mean... this calc seems wonky already, so the idea probably rests on badly flawed assumptions. You're probably right that predicting coaches will be hard.

MNNumbers · Nov 7, 2018

SKRusty said:
...snip.....

I am playing with the stats as we speak. The most difficult part of the process is figuring out the weights for the stats, which is why I had to bring in PDO, save percentage and shooting percentage. High or low save percentages within the 1-1.5 SD range don't tend to correct as aggressively as shooting percentage does, so I had to adjust PDO according to where the individual percentages were. FYI, PDO x 2, after it has been adjusted, works great as a multiplier to win percentage.
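PDO is a team's shooting percentage plus its save percentage, and the idea of adjusting it by component (shooting % regressing harder than save %) can be sketched as below. Everything numeric here is a placeholder assumption: the league means and the `sh_keep` / `sv_keep` retention factors are hypothetical, a constant retention factor is a simplification of the "1-1.5 SD" rule in the post, and the "x 2 multiplier" step is left out because the post doesn't spell out how it is applied.

```python
def adjusted_pdo(sh_pct, sv_pct, league_sh=0.09, league_sv=0.91,
                 sh_keep=0.3, sv_keep=0.7):
    """Regress each PDO component toward its league mean, then sum them.

    sh_keep / sv_keep are hypothetical shares of a team's deviation from the
    league mean assumed to persist. A lower sh_keep means shooting % is
    pulled back toward average more aggressively than save %.
    """
    adj_sh = league_sh + sh_keep * (sh_pct - league_sh)
    adj_sv = league_sv + sv_keep * (sv_pct - league_sv)
    return adj_sh + adj_sv


raw_pdo = 0.12 + 0.92               # a hot-shooting team: raw PDO of 1.040
print(adjusted_pdo(0.12, 0.92))     # about 1.016 after regression
```

Splitting the regression by component, rather than regressing PDO as a whole, is what lets a team riding an unsustainable shooting percentage be pulled back harder than one with strong goaltending.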

...snip.....

The interesting part of "getting the results right" is, to me:
How do you predict the rest of the year? Simply take the resulting pts/game and multiply by the number of games remaining?
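The naive pace projection in that question can be sketched as follows. Note that pts/game times games remaining gives only the points still to come, so the current total is added back to get a final figure (equivalent to pts/game x 82). The function name and the 82-game default are assumptions for illustration.

```python
def project_final_points(points, games_played, season_games=82):
    """Naive pace projection: current points plus pts/game times games left."""
    if games_played <= 0:
        raise ValueError("need at least one game played to compute a pace")
    games_left = season_games - games_played
    pts_per_game = points / games_played
    return points + pts_per_game * games_left
```

A pure pace projection assumes the first part of the season is fully representative; blending it with a model-based rate (for example, one built on regressed PDO) is one way to temper early-season outliers.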
