In my last article I looked at prediction limits for machine learning in sports. More specifically, I answered the question of how much of the standings is due to luck (a.k.a. random chance, a stochastic process, etc.). Using classical test theory, I compared the variance of the observed win percentage over the seven seasons from 2005-06 to 2011-12 against a theoretical league in which team talent is normally distributed. From this we were able to conclude that luck explains ~38% of the variance in the standings.
This is interesting, and much higher than one might initially think, but it makes sense, and I will discuss it further on. My area of research is machine learning, and in particular using machine learning to make predictions in hockey. So the question I am curious about today is: is there a theoretical limit to the predictions we can make in hockey?
This subject is not brand new; it was looked at before by an author who wanted to know the theoretical limit for predictions in the NFL. He found that “The actual observed distribution of win-loss records in the NFL is indistinguishable from a league in which 52.5% of the games are decided at random and not by the comparative strength of each opponent.” I used a similar method to calculate the prediction limit in the NHL.
Data & Methodology
Using the work from the first part of this, I have an observed standard deviation (SD) of win % of 0.09. This comes from the win-loss records of all teams between the 2005-06 and 2011-12 seasons, for a total of seven seasons. I also looked at the SD of an “all skill” league, where the better team always wins, and an “all luck” league, where each game is a 50/50 chance of winning. To determine these I used a 10,000-iteration Monte Carlo method. You can see the code for the Monte Carlo here. On each iteration, each team was given a random strength and a full schedule was run. After all the iterations, the SD of win% was 0.3 for the “all skill” league and 0.053 for the “all luck” league. I used an F-test to compare each to the observed league and got p=0.02 for the “all luck” league and p=4.8×10^-16 for the “all skill” league. I graphed them below to give a visualization.
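The simulation above can be sketched in a few lines. This is a minimal, self-contained version (not the author's linked code), assuming an NHL-sized league of 30 teams each playing an 82-game schedule, with random pairings each round:

```python
import random
import statistics

random.seed(0)

N_TEAMS, N_GAMES = 30, 82  # assumption: NHL-sized league, 82-game schedule

def mean_win_pct_sd(luck, n_iter=100):
    """Average SD of team win% across `n_iter` simulated seasons.

    luck=1.0 -> every game is a coin flip (the "all luck" league)
    luck=0.0 -> the stronger team always wins (the "all skill" league)
    """
    sds = []
    for _ in range(n_iter):
        strength = [random.random() for _ in range(N_TEAMS)]  # random talent
        wins = [0] * N_TEAMS
        for _ in range(N_GAMES):              # 82 rounds, each team plays once
            order = random.sample(range(N_TEAMS), N_TEAMS)
            for j in range(0, N_TEAMS, 2):
                a, b = order[j], order[j + 1]
                if random.random() < luck:    # luck decides: coin flip
                    winner = a if random.random() < 0.5 else b
                else:                         # skill decides: better team wins
                    winner = a if strength[a] > strength[b] else b
                wins[winner] += 1
        sds.append(statistics.pstdev(w / N_GAMES for w in wins))
    return statistics.mean(sds)

print("all-luck SD :", round(mean_win_pct_sd(1.0), 3))   # should land near 0.05
print("all-skill SD:", round(mean_win_pct_sd(0.0), 3))   # should land near 0.3
```

With uniform random strengths, the all-skill SD comes out near 0.3 and the all-luck SD near the binomial value of sqrt(0.25/82) ≈ 0.055, consistent with the figures above.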
Neither is close, though the “all luck” league appears to be more similar than the “all skill” league. So to narrow it down I modified the Monte Carlo to try varying degrees of luck and skill (e.g. 10% luck, 90% skill). The rule used was: if rand() < luck, the game is a coin flip (50% chance of winning for either team); otherwise the better team wins. I kept trying various percentages until I found the one closest to the observed NHL. I tried 50% skill, 50% luck; 25% skill, 75% luck; 23% skill, 77% luck; and 24% skill, 76% luck. Their SDs and F-test p-values, respectively, are 0.1584 (p=0.002), 0.0923 (p=0.908), 0.0874 (p=0.894), and 0.0898 (p=0.992). I’ve graphed the distributions below:
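The sweep over luck/skill mixes can be sketched as follows. Again a self-contained sketch rather than the author's code, assuming the same 30-team, 82-game league and averaging each mix over a number of simulated seasons:

```python
import random
import statistics

random.seed(1)

N_TEAMS, N_GAMES = 30, 82   # assumption: NHL-sized league

def season_sd(luck):
    """SD of win% for one simulated season under a given luck fraction."""
    strength = [random.random() for _ in range(N_TEAMS)]
    wins = [0] * N_TEAMS
    for _ in range(N_GAMES):                  # one game per team per round
        order = random.sample(range(N_TEAMS), N_TEAMS)
        for j in range(0, N_TEAMS, 2):
            a, b = order[j], order[j + 1]
            if random.random() < luck:        # luck: coin flip
                winner = a if random.random() < 0.5 else b
            else:                             # skill: better team wins
                winner = a if strength[a] > strength[b] else b
            wins[winner] += 1
    return statistics.pstdev(w / N_GAMES for w in wins)

OBSERVED_SD = 0.09  # from the seven observed NHL seasons

# sweep luck fractions and average the SD over simulated seasons
for luck in (0.50, 0.75, 0.76, 0.77):
    sd = statistics.mean(season_sd(luck) for _ in range(100))
    print(f"luck={luck:.2f}: mean SD of win% = {sd:.4f} (observed {OBSERVED_SD})")
```

The mean SD should come out around 0.16 for the 50% luck mix and around 0.09 for the 76% luck mix, which is why the 76% luck league lands closest to the observed NHL.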
After running the Monte Carlo simulations for these leagues and comparing them to the observed league, the 24% skill league appears to be the closest to the observed NHL. To put this in the same words as the original author, to avoid confusion and misinterpretation:
The actual observed distribution of win-loss records in the NHL is indistinguishable from a league in which 76% of the games are decided at random and not by the comparative strength of each opponent.
To relate this to theoretical limits in prediction: we know that 24% of outcomes are determined by the better team and 76% by luck, and with luck you win half the time. This suggests the theoretical limit on prediction accuracy for hockey is 24% + (76% / 2) = 62%.
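Spelled out as a tiny function, the arithmetic behind that ceiling looks like this:

```python
def prediction_limit(skill_fraction):
    """Theoretical ceiling on prediction accuracy.

    Skill-decided games can in principle always be called correctly;
    luck-decided games are coin flips, so only half can be called right.
    """
    luck_fraction = 1.0 - skill_fraction
    return skill_fraction + luck_fraction / 2

print(round(prediction_limit(0.24), 2))  # 0.62
```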
Discussion & Conclusion
62% seems low, but it makes sense. A hockey game has very few scoring events (goals), unless you’re watching a Timbits league, so a single goal can make all the difference between winning a game and losing it. In my own machine learning experiments the best accuracy I have ever achieved is 59.3%, which is close to this theoretical limit. I haven’t seen other machine learning work on hockey, so I have nothing to compare it to.
The original author gets a limit of approximately 76%, which falls in line with published machine learning predictions for football. I would hypothesize that basketball has an even higher limit, given the large number of scoring events in a single game (200-250 combined points), and that machine learning could predict basketball outcomes in the 80% range. Tennis also has a large number of events in a match, so I would assume a similar limit, and it would be interesting to look at soccer’s limit too.
Thanks to the people who helped me with this: Michael Guerault, Adam Kubaryk and Patrick D.