abo9
- Jun 25, 2017
That's not much of an answer; I guess Google didn't help a lot. R² does not mean a lot, and 33% in the real world is not predictive.
This example is made up of random numbers, but bear with me.
Say you wanted to predict the weight of a banana. There are a couple of factors you could use: its length, its width, its color (yellow or green, for example), days of growth, etc. Let's say for simplicity that we use length only; it's gonna be a pretty good predictor, right? Well, you could use a very simple model like:
Weight = 3 * length + e
Where "e" is an error term, because you omitted other parameters like "days of growth", "color", or "width". And since you don't know ALL the parameters you're missing, we usually refer to this "error" term as "luck" or "randomness" in the model.
In our banana example, if the R² was 90%, you would expect the "length" factor to explain 90% of the variance in weight between the bananas you look at, with the remaining 10% of the difference in weight explained by other factors. It's one of numerous "scoring" methods to assess the quality of a model, and it should not be trusted blindly nor used as a be-all end-all. It's a good indicator, though.
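To make that concrete, here's a small sketch of the banana example with made-up numbers: we generate lengths, build weights as 3 * length plus a noise term "e", fit a line, and compute R² as the share of variance the length explains. The specific numbers (length range, noise size) are invented for illustration.

```python
import numpy as np

# Hypothetical banana data: weight = 3 * length + e, all numbers made up
rng = np.random.default_rng(0)
length = rng.uniform(15, 25, size=100)          # lengths in cm
weight = 3 * length + rng.normal(0, 3, 100)     # "e": the omitted factors

# Least-squares fit of weight on length
slope, intercept = np.polyfit(length, weight, 1)
predicted = slope * length + intercept

# R^2 = 1 - SS_res / SS_tot: fraction of weight variance explained by length
ss_res = np.sum((weight - predicted) ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))
```

With noise that small relative to the signal, R² comes out high; crank up the noise standard deviation and you'll watch R² fall, which is exactly the "luck/randomness" share growing.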
I guess 33% would still be fairly low in "absolute" terms, but if that's the benchmark to beat, we just have to work to find the next best metric/model. If you want to use the previous year's standings to predict this year's standings, that's a fair model to use, usually called a "naive prediction model". I believe it would actually be the benchmark used to build more complex models that try to predict standings. That said, you could find that your "naive" model gives fairly strong results even compared to models incorporating goals scored and goals allowed (I have not looked into that, just an example), and decide that it's not worth the effort to build more complex stuff.
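The naive benchmark is simple enough to sketch in a few lines: the "model" is just last season's points, scored with R² against this season's points. The standings below are invented for illustration, not real league data.

```python
import numpy as np

# Invented points totals for eight teams, two consecutive seasons
last_season = np.array([110, 98, 95, 90, 84, 76, 70, 62])
this_season = np.array([105, 101, 90, 88, 80, 79, 66, 65])

# Naive prediction model: assume nothing changes year over year
prediction = last_season

# Score the naive model with R^2
ss_res = np.sum((this_season - prediction) ** 2)
ss_tot = np.sum((this_season - this_season.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))
```

Any fancier model (goals scored, goals allowed, etc.) would have to beat this number to justify its added complexity.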