There have been hundreds of papers written on baseball, and on hockey. I'm surprised you haven't seen them.
As an example, this paper on pulling the goalie has influenced NHL teams to pull their goalies earlier in recent years.
The thing about baseball is that it is built on
decades of work during which people have argued, debated, tried different formulas, tested different things, failed, gone back to the drawing board, and performed testing and analysis under critical peer review to get us to the current state. Modern baseball WAR metrics can be traced directly back to research on
linear weights metrics published in the 1950s. In hockey, it's like we want to "catch up" by doing 70 years' worth of work in 2 years, and the way we do it is by fast-forwarding through all of the pesky "research, testing and analysis" stuff.
The way modern hockey analytics works is that someone publishes some numbers, and if they have enough Twitter followers, people will blindly accept them as valuable. Nobody bothers to test the numbers to see if they are useful. Nobody can answer what the year-to-year correlations of these numbers are, and nobody subjects them to any rigorous testing or analysis. They just get re-tweeted and re-posted all over social media at face value because they look cool and have a lot of red and blue. Well, that's not how science works. That is not the scientific method in action.
That isn't to say that I don't believe in these metrics, to some extent. I am open-minded to them, but I believe firmly in subjecting them to rigorous testing and analysis before I put much stock in them. What are the year-to-year correlations of these numbers? What is their value? Do they do a better job of projecting, year to year, who will be good and who won't, especially when taking into account context changes like, for example, TEAM changes?
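To make "year-to-year correlation" concrete, here is a minimal sketch of how that check might be run. The `pearson_r` helper and every number below are invented purely for illustration; a real test would use actual team-season values for whatever metric is being evaluated:

```python
# Sketch: measuring year-to-year repeatability of a metric.
# All data here is made up -- a real analysis would pull actual team-season values.

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical metric values for the same five teams in consecutive seasons.
season_1 = [52.1, 48.3, 55.0, 47.2, 50.6]
season_2 = [51.4, 49.0, 53.8, 48.5, 49.9]

r = pearson_r(season_1, season_2)
print(f"year-to-year r = {r:.3f}")
```

A metric with a high year-to-year r is measuring something repeatable about teams; one with r near zero is mostly noise, no matter how pretty the chart.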
I feel like as we've moved away from long-form discussion on blogs and message boards and towards short-form hot takes on Twitter and Instagram, it has become harder and harder to publish any sort of serious analysis that anyone will read, because the incentive is to just post a pretty chart on Twitter that will get re-tweeted tens of thousands of times
without anyone conducting any sort of critical review or analysis. Imagine if real science worked this way. F*** publishing boring papers that take 50 people with PhDs reviewing meticulously for errors and months of back and forth with the original authors before finally concluding something that is just step 1, ahead of other scientists needing to perform replication studies to see if they get the same results in different samples. F***, that could take years! Instead, scientists should just draw some equation on a napkin, stare at it, say "that should work," translate it into a pretty chart with a lot of red and blue, and then post it on Twitter. That is how science should be done. Right? Right?
MS is 100% right to be skeptical of all these metrics, because as far as I am aware none of them have been shown to be more reliable than even the dreaded eye test. No matter how much something "makes sense," it is worthless if it hasn't been tested. That is what the scientific method is all about! As one example, I have posted before about how mid-season CORSI does not project the 2nd half of the season better than traditional first-half rankings do. This is not being down on analytics, it's just me
subjecting assumptions to a basic test. A LOT of people will post in January, say, about how such-and-such team is in 20th place but has good "underlying numbers" (i.e. CORSI), and thus will be better in the second half. This is a very testable hypothesis: do teams whose CORSI exceeds their ranking in the first half perform better in the second half? According to my analysis, the answer is no. But that doesn't stop a thousand people from assuming that it does, every year. That is not being data-driven, that is not being smart or scientific; that is just assuming that your hypothesis is true without testing it, which is no better than any traditional evaluation method, including the eye test.
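For anyone who wants to run that kind of split-half check themselves, here is a hedged sketch of the shape of the test. The team rows are made-up numbers, not real NHL data, and `pearson_r` is a hand-rolled helper, not anyone's published model; the point is only the structure of the comparison:

```python
# Sketch of the split-half test: does first-half CORSI predict second-half
# points any better than first-half points do? All numbers below are invented.

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (first_half_points, first_half_corsi_pct, second_half_points) per team -- fictional.
teams = [
    (55, 52.0, 50),
    (48, 49.5, 47),
    (40, 53.5, 44),
    (60, 47.0, 52),
    (45, 50.5, 46),
    (52, 48.0, 49),
]

fh_points = [t[0] for t in teams]
fh_corsi = [t[1] for t in teams]
sh_points = [t[2] for t in teams]

r_points = pearson_r(fh_points, sh_points)
r_corsi = pearson_r(fh_corsi, sh_points)
print(f"first-half points vs second-half points: r = {r_points:.3f}")
print(f"first-half CORSI  vs second-half points: r = {r_corsi:.3f}")
# Whichever r is larger is the better linear predictor -- in this toy sample only.
```

Run that over real team-half-seasons across many years and you have an actual answer, instead of an assumption that "underlying numbers" must win.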