Luck is part of our everyday
life. When you go to the store and there is no beer left of your favorite
brand, you feel unlucky. When you go to the mall and out of the blue meet Maria Sharapova, you feel lucky (I can confirm that one).
As we all know, luck is also omnipresent in
hockey. Injuries are probably one of the biggest luck factors involved in the
success or failure of NHL teams, and man-games lost has become a popular statistic to explain the W-L-OTL columns at the end of the season.
In recent years, shooting
and save percentages have also been shown to be partly driven by luck,
given that a large deviation from the league average is generally not sustainable season-over-season.
A new advanced statistic, PDO (the sum of shooting and save percentages), has been developed from these considerations
and is now a key statistic to consider when evaluating a player. Goalposts, the
difficulty of the schedule and questionable decisions by the referees are
other factors that are lumped into this luck factor.
So, we know that luck is
part of hockey. However, what we don’t know is, how big a part is it? For
instance, can we quantify how many points obtained during a season can be
attributed to luck? It turns out that we can get a pretty good idea using what
we call time-series models.
Let’s regress a little
A time-series model in this
case allows us to split luck from merit. If you want to get all technical, the
time-series model that we will use is an autoregressive(1) model. I am going to
jump right in and write the four mathematical equations of this model, after
which I will go over them one by one so that they become a little bit more
familiar. So, here we go:
Pts_{i,t} = Pts*_{i,t} + ε_{i,t} (Equation 1)
ε_{i,t} ~ N(0, σ) (Equation 2)
Pts*_{i,t} = Pts*_{i,t-1} + β_{i,t} (Equation 3)
β_{i,t} ~ N(0, ρ) (Equation 4)
The first term, Pts_{i,t}, is the number of points that a team i
obtained at the end of season t.
So, i could be any team, for instance the Oilers, and t could be any
season, for instance 2015-2016. Here we assume that the number of
points that a team obtained can be attributed to two factors: the real number of
points that it deserves in a luck-free world, Pts*_{i,t}, and the
number of points won or lost because of luck, ε_{i,t}. In other words, Pts*_{i,t} describes the
real underlying quality of the team, and ε_{i,t}
describes how the hockey gods made life easier or harder for that specific
team. A positive value of ε_{i,t} indicates that the team was luckier than the others during that
season, and vice versa for a negative value.
The second equation describes
how these luck factors ε_{i,t} are
distributed. Here we assume that they follow a normal
distribution, or a Gaussian if you prefer. A normal distribution is described
by two parameters: a mean and a standard deviation, which we will call σ. The mean is zero, because hockey is a
zero-sum game, meaning that the unluckiness of one team is the luckiness of
another. For instance, let’s say that team A should have won a game against
team B, but lost it because of bad luck. In this case, the overall impact of
luck is -2 points for team A and +2 points for team B, the sum of both being back
to zero. Hockey is not exactly a zero-sum game because of the Bettman point
(the extra point awarded to a team that loses in overtime or a shootout),
but we’ll forget about that, which should not be a problem given that the league-wide
number of 3-point games is roughly consistent season-over-season.
The third equation describes
the season-over-season quality of a team. Here we assume that the quality of a
team during a season t, as described
by Pts*_{i,t}, is equal to the quality of that same team the season
before, Pts*_{i,t-1}, plus a term describing how the quality of
the team has evolved, β_{i,t}.
This last term, β_{i,t},
describes how trades, players getting older, a coaching change, etc., have made
the team better or worse.
Finally, the last equation
describes how these β_{i,t}
are related to one another. We again describe this parameter using that
neat little normal distribution, with a mean of zero and a standard deviation
that we will call ρ. The mean of this
normal distribution is also zero because, in total, approximately the same
number of points is awarded each season.
At the end of the day, what interests
us are these two standard deviations, σ and
ρ. A high standard deviation σ indicates that luck is a big part of hockey,
while a high standard deviation ρ indicates
that changes in the quality of the team are what matter. The ratio between these two pretty much indicates how much control
a team has over its own destiny.
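To get a feel for what Equations 1 to 4 produce, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `simulate_league`, the starting quality of 91 points for every team, and the default values of σ and ρ are placeholders, not part of the model above. The sketch also ignores the fact that real point totals are bounded and that the league-wide total is roughly fixed.

```python
import numpy as np

def simulate_league(n_teams=30, n_seasons=11, sigma=5.0, rho=8.0,
                    start=91.0, seed=0):
    """Draw one synthetic league history from Equations 1 to 4."""
    rng = np.random.default_rng(seed)
    # Equation 4: season-over-season changes in quality, beta ~ N(0, rho).
    beta = rng.normal(0.0, rho, size=(n_teams, n_seasons))
    beta[:, 0] = 0.0  # no change before the first simulated season
    # Equation 3: deserved points are last season's deserved points plus beta.
    deserved = start + np.cumsum(beta, axis=1)
    # Equation 2: luck, eps ~ N(0, sigma).
    luck = rng.normal(0.0, sigma, size=(n_teams, n_seasons))
    # Equation 1: observed points are deserved points plus luck.
    observed = deserved + luck
    return deserved, luck, observed

deserved, luck, observed = simulate_league()
print(observed.shape)         # (30, 11)
print(np.round(observed[0]))  # point totals of one simulated team
```

Running it with a larger `sigma` or a larger `rho` shows the two behaviours described above: noisy seasons around a stable level in the first case, and teams that drift quickly up or down the standings in the second.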
To estimate σ and ρ, we are going to do what we do in any modeling study, that is, to
estimate what we do not know from what we do!
Meet the Kalman
We know the number of points
that each of the 30 teams has obtained over the last several seasons. I have
extracted the number of points that each team has obtained since 2005-2006, to
obtain a 30 teams x 11 seasons table. For the 2012-2013 lockout season, I prorated
the number of points over 82 games.
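The prorating step is simple arithmetic. The sketch below assumes a straight linear scaling from the 48 games played in the shortened season to 82 games; the function name `prorate_points` and the 60-point example are hypothetical.

```python
LOCKOUT_GAMES = 48  # games per team in the shortened 2012-2013 season
FULL_GAMES = 82     # games per team in a regular season

def prorate_points(points, games_played=LOCKOUT_GAMES, full_games=FULL_GAMES):
    """Scale a point total from a short season to a full-length one."""
    return points * full_games / games_played

# A hypothetical team with 60 points in 48 games:
print(prorate_points(60))  # 102.5
```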
Our objective is to
calculate the value of σ and ρ with which our model best reproduces the
30 teams x 11 seasons table. To do so, I
used a method called Kalman filtering, which is a grown-up expression for a
slightly more complicated trial-and-error process.
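For the curious, here is a rough sketch of how such an estimation could look. It is not the code used for this article. It assumes a standard Kalman filter for Equations 1 to 4, started from each team's first season, and a plain grid search as the "trial-and-error" part. The names `log_likelihood` and `fit_by_grid` are made up, and the table is synthetic stand-in data, not the real NHL standings.

```python
import numpy as np

def log_likelihood(points, sigma, rho):
    """Log-likelihood of a teams x seasons table under Equations 1 to 4."""
    points = np.asarray(points, dtype=float)
    # Start from the first season: the best guess of Pts* is the observed
    # total, and the uncertainty of that guess is the luck variance.
    level = points[:, 0].copy()
    var = np.full(points.shape[0], sigma ** 2)
    total = 0.0
    for t in range(1, points.shape[1]):
        # Predict: quality drifts by beta (Equation 3), so uncertainty grows.
        var_pred = var + rho ** 2
        # Variance of the forecast error once luck is added (Equation 1).
        f = var_pred + sigma ** 2
        err = points[:, t] - level
        total += np.sum(-0.5 * (np.log(2.0 * np.pi * f) + err ** 2 / f))
        # Update: move the quality estimate toward the new observation.
        gain = var_pred / f
        level = level + gain * err
        var = (1.0 - gain) * var_pred
    return total

def fit_by_grid(points, grid):
    """Try every (sigma, rho) pair on the grid and keep the most likely one."""
    best = max((log_likelihood(points, s, r), s, r) for s in grid for r in grid)
    return best[1], best[2]

# Synthetic stand-in for the 30 teams x 11 seasons table (assumed values).
rng = np.random.default_rng(1)
quality = 91.0 + np.cumsum(rng.normal(0.0, 8.0, size=(30, 11)), axis=1)
table = quality + rng.normal(0.0, 5.0, size=(30, 11))

grid = np.arange(1.0, 15.5, 0.5)
sigma_hat, rho_hat = fit_by_grid(table, grid)
print(sigma_hat, rho_hat)
```

With only 330 data points the estimates move around quite a bit from one synthetic table to the next, which is worth keeping in mind when reading the numbers below. A real analysis would likely replace the grid with a numerical optimizer.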
After performing these
calculations, we obtain the following results: history tells us that the
standard deviation for luck, σ,
is approximately 5 points, and the standard deviation for the quality of the
team, ρ, is approximately 8 points.
These results can be roughly
interpreted as follows: when a team has one of these lucky seasons, it can
expect, on average, to obtain approximately 5 more points than it deserves. In
contrast, because of the zero-sum consideration that I have mentioned, during a
season where a team is less lucky than the others, it can expect, on average,
to finish 5 points below what it deserves.
Similarly, history tells us
that, during seasons and off-seasons where the management of a team makes good
decisions, it can expect, on average, to increase its number of points by 8. If,
on the contrary, the management makes crappy decisions, a team can expect to
finish a season 8 points below the previous one.
The ratio between the standard
deviation for luck and the sum of these two standard deviations, σ/(σ+ρ),
is approximately 0.4 (5/13 ≈ 0.38), meaning that approximately 40% of the difference between the number
of points obtained by a team for two consecutive seasons can be attributed to
luck. In other words, a team controls in the neighborhood of 60% of its
destiny.
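The arithmetic behind that 40% can be checked in a few lines. The first ratio is the one used in this article. The second is an alternative that I am adding as an assumption: it compares variances instead of standard deviations, using the fact that under Equations 1 and 3 the change in points between two seasons is β_{i,t} + ε_{i,t} - ε_{i,t-1}. Both land in the same neighborhood.

```python
sigma = 5.0  # standard deviation of luck, in points
rho = 8.0    # standard deviation of the change in team quality, in points

# Ratio used in the article: standard deviations.
ratio_sd = sigma / (sigma + rho)

# Alternative (an assumption, not from the article): share of the variance
# of the season-over-season change that comes from the two luck terms.
# Var(beta_t + eps_t - eps_{t-1}) = rho^2 + 2 * sigma^2
share_var = 2 * sigma ** 2 / (rho ** 2 + 2 * sigma ** 2)

print(round(ratio_sd, 3))   # 0.385
print(round(share_var, 3))  # 0.439
```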
The model is highly statistically significant, by the way, so we can be confident in its results. It is also interesting to note that these results fit well with the findings of Josh, who showed that,
mainly because of luck, it is very hard to have a success rate above 60% when
predicting the outcome of NHL games.
The take-home message
Luck is a big part of
hockey. Therefore, if you are a GM and want to make sure, like the Red Wings,
to make the playoffs every season, you had better build a team that deserves at
least 98-100 points in a luck-free world, otherwise the hockey gods may play
tricks on you. If, like the Oilers, you want to make sure to get a head start
on your spring golf season, build a team that deserves fewer than 84-86 points
and you should be safe. In between, you’ll have to wait and see if the dice
roll in your favor.
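To put rough numbers on that advice, the sketch below uses Equations 1 and 2 with σ = 5 to compute the chance that a team's actual point total reaches a playoff cutoff. The cutoff of 92 points is a hypothetical value chosen for illustration (the real cutoff changes every season and differs by conference), and `prob_at_least` is a made-up helper name.

```python
from math import erf, sqrt

def prob_at_least(cutoff, deserved, sigma=5.0):
    """P(observed points >= cutoff) when observed ~ N(deserved, sigma)."""
    z = (cutoff - deserved) / sigma
    # Upper tail of the standard normal distribution, written with erf.
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

CUTOFF = 92  # hypothetical playoff cutoff, in points

for deserved in (85, 92, 99):
    print(deserved, round(prob_at_least(CUTOFF, deserved), 2))
```

Under these assumptions, a team that deserves 99 points clears the cutoff about 9 times out of 10, a team that deserves 85 points does so less than 1 time out of 10, and a team sitting right on the cutoff is a coin flip.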
Glossary
Term | Definition |
Time-series model | Model used to describe how a variable (in this case the number of points obtained by a team) at a certain time step (in this case at the end of a season) evolves from the value of that same variable at the previous time step. |
Kalman filtering | A mathematical algorithm used to estimate the value of the parameters in a model from measurements (in this case the number of points obtained by a team) containing random noise (in this case caused by luck). |
Normal distribution | A mathematical equation describing the probability that a variable (in this case ε or β) takes a certain value. |