“Who is the best forward in the NHL?” used to be a pretty simple question. Because points were the main publicly available indicator of offensive production, the reigning Art Ross Trophy winner was typically considered the best forward in the game.
The number of different statistics available to evaluate the performance of
forwards has exploded over the last 5 to 10 years.
While two-way forwards have always had some value, evidence of just how effective certain players can be in areas other than point production has significantly changed the way we view NHL players.
On corsica.hockey alone, we
can extract more than 130 (!) different statistics to evaluate a forward at 5v5.
Yet, it is simply impossible for a human being to properly consider 130
statistics at once.
Even if you have a Ph.D. in mathematics, when you try to
reconcile at once the information provided by more than 10 statistics, your
head pretty much begins to spin.
This is why, when evaluating the performance
of a forward, a typical approach is to select a few, maybe 3 or 4, key statistics
considered to be the most important and disregard most others unless something
out of the ordinary stands out. Such an approach is taken, for instance, when we
use a Vollman player usage chart to compare different forwards.
Some effort has
also been made to derive an ultimate “catch-all” statistic to discriminate the
performance of forwards from a single number. We recently attacked this question in the NHLNumbers “Stat of the Union” Roundtable, asking nine of hockey’s brightest minds about how they’d evaluate a player.
The search for an accurate
strategy to discriminate forward performance based on a few statistics is
understandable given the considerations mentioned above, but raises some very
important questions: can we accurately evaluate the performance of forwards
from 3-4 statistics?
Is it possible to develop a single statistic
discriminating the performance of multiple forwards? Or, more specifically:
what is the minimum number of statistics required to properly reflect the
performance of forwards? Luckily for us, we can get a pretty good answer using a
technique called principal component analysis.
fabulous world of dimensionality reduction
Let’s say for a moment that
we evaluate the performance of a set of forwards using two statistics, called
S1 and S2. The forwards are distributed on a S2 vs S1 plot as follows:
Based on these two
statistics, the performance of the forward represented by the red dot was
similar to the forward with the blue dot, but very different from the forward
with the green dot. Let’s now make a simple linear regression between S1 and S2,
described by the following black line:
And then, let’s project each
forward on that line. Projecting means that, for each forward, we find the closest
point on the line, as represented by the orange arrows:
Given that each forward has
its own projection, we have essentially created a new statistic S3, which is an
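implicit combination of S1 and S2. Because the line follows the main trend in
the data, our new statistic S3 is essentially the best statistic that we can
construct (linearly) from S1 and S2 to discriminate the performance of the
forwards from a single number. (Strictly speaking, the optimal line is the one
that minimizes the perpendicular distances to the points, which is what
principal component analysis computes; for strongly correlated statistics it
nearly coincides with the regression line.) Here is a plot of each forward on
our new S3 axis: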
As you can see, S3 correctly
indicates again that the performance of the forward represented by the red dot
was similar to the forward with the blue dot, but very different from the forward
with the green dot. We have thus successfully created a new statistic S3, which
incorporates at once all the relevant information in S1 and S2 required to discriminate
the performance of the forwards from a single number. By considering S3 instead
of S1 and S2, our evaluation of forward performance has become a simpler one.
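For readers who like to see the mechanics, here is a minimal sketch of the projection step in Python with NumPy. The five pairs of values are made up for illustration, and the function name `combine_two_statistics` is hypothetical. Note one assumption: the sketch uses the line that minimizes the perpendicular distances to the points (the first principal axis) rather than an ordinary regression line, since that is the line for which "closest point" projections are optimal.

```python
import numpy as np

def combine_two_statistics(s1, s2):
    """Return S3: the position of each forward along the line of closest fit."""
    points = np.column_stack([s1, s2]).astype(float)
    centered = points - points.mean(axis=0)
    # The first right-singular vector of the centered points is the direction
    # of the line that minimizes the perpendicular distances to the points.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Projecting a point on the line = dot product with the line's direction.
    return centered @ direction

# Made-up values for five forwards whose S1 and S2 are strongly correlated.
s1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
s2 = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
s3 = combine_two_statistics(s1, s2)
print(np.round(s3, 2))
```

The sign of S3 is arbitrary (the line has no preferred direction), so only the ordering of the forwards and the distances between them carry meaning.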
Let’s now consider two other
statistics, that we will again call S1 and S2. This time, the forwards are
distributed on a S2 vs S1 plot as follows:
In this scenario, the forwards
represented by the red, blue and green dots all performed very differently.
Let’s once again draw a line obtained by linear regression and project each forward
on that line to create a new statistic S3:
Finally, here is how this
time the forwards are distributed on the new S3 axis:
Our new statistic S3
correctly identifies the very different performance of the forward with the
blue dot with respect to those with the red and green dots, but incorrectly suggests that the performance of the forwards with the green and red dots was nearly identical. The poor accuracy of our new statistic (even though it is the best
one possible!) results from the lack of correlation between S1 and S2, meaning that
these two statistics provide different types of information. For instance, S1
could be a measure of offensive performance and S2 a measure of physicality. In
this case, our new statistic S3 may not successfully discriminate a strong offensive
but not very physical forward from a weak offensive but very physical one.
So, what is the moral of the
story? Sometimes, we can combine statistics to reduce the dimensionality of our
forward evaluation problem and still get an accurate picture of their
performance. Some other times, we simply can’t.
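The contrast between the two scenarios can be put into numbers. The sketch below, which uses made-up data and a hypothetical helper named `share_kept_by_best_line`, measures how much of the total spread of the points survives when they are projected on the best possible line. It assumes the line of closest fit is the first principal axis of the centered points.

```python
import numpy as np

def share_kept_by_best_line(points):
    """Fraction of the total variance that survives projection on the best line."""
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variance along each axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance[0] / variance.sum()

# Made-up example 1: S1 and S2 move together.
correlated = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]])
# Made-up example 2: S1 and S2 are unrelated and equally spread out.
uncorrelated = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])

print(share_kept_by_best_line(correlated))
print(share_kept_by_best_line(uncorrelated))
```

In the correlated case almost nothing is lost, while in the uncorrelated case the best line keeps only half of the spread: no single number can stand in for both statistics.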
Now, back to our 137 (to be
exact) different statistics available on corsica.hockey to evaluate the
performance of forwards at 5v5.
I have extracted all the data for forwards at
5v5 over the last three seasons. I have removed forwards with fewer than 250
minutes of 5v5 time on ice (roughly 20 games played) as well as the statistics
on FO% and hits against, which were not available for all forwards, to finally
obtain a table of 1227 forwards x 135 statistics (a forward can appear up to
three times in the table if he has played more than 250 minutes in each of the
previous three seasons).
We first want to see if these 135 statistics can be combined into a single
new shiny statistic which still accurately reflects the performance of the forwards.
We can do that by projecting all forwards on a line, as in the examples above.
To quantify how well this single statistic discriminates the performance of the
forwards, we calculate the proportion of variance that it explains, which
corresponds to the ratio between the variance of the forwards’ projections on
the line and the total variance in the 1227 x 135 table. This proportion of
variance is pretty similar to the R² coefficient that we calculate to quantify
the quality of a model’s fit.
If a single statistic is not sufficient, we then check whether a combination of
two statistics gives better results. To do so, we project every forward on a 2-D plane
(instead of a line), which is where that principal component analysis technique
that I have mentioned comes into play. Without getting too much into the math
of it, the best possible combination of two statistics to discriminate all
forwards corresponds to the first two right-singular vectors of the (column-centered) 1227 x 135 table, which we can calculate by singular value decomposition
(obviously). We can again calculate
the proportion of variance explained by our two new statistics and if they
still do not contain enough information to accurately reflect the performance
of the forwards, we can project our forwards on a 3-D space and so on.
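For the curious, the procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the exact code behind the results in this article: the function name `pca_by_svd` is hypothetical, the table is random placeholder data rather than the corsica.hockey extract, and each statistic is standardized (centered and divided by its standard deviation) so that statistics measured on large scales do not dominate, which is a common choice but an assumption here.

```python
import numpy as np

def pca_by_svd(table, n_components):
    """Project the rows of `table` on its first principal components.

    Returns the new combined statistics and the proportion of variance
    that they explain.
    """
    X = np.asarray(table, dtype=float)
    centered = X - X.mean(axis=0)
    # Assumption: put every statistic on the same scale before combining.
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # leave constant columns untouched
    Z = centered / scale
    # Rows of vt are the right-singular vectors (the principal directions).
    _, singular_values, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:n_components]
    new_stats = Z @ components.T
    # Squared singular values are proportional to the variance explained.
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return new_stats, explained

# Placeholder data: 50 "forwards" x 6 "statistics" of pure random noise.
rng = np.random.default_rng(0)
toy_table = rng.normal(size=(50, 6))
new_stats, explained = pca_by_svd(toy_table, n_components=2)
print(new_stats.shape, round(explained, 2))
```

Calling the function with `n_components` equal to 1, 2, 3 and so on gives the proportion of variance explained for a line, a plane, a 3-D space and beyond.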
So, after performing all
these calculations, here is the proportion of variance explained according to
the number of statistics on which we project our forwards:
As you can see, if we
combine the 135 statistics into a single new shiny statistic, it can contain
at most 37% of the information in the 1227 x 135 table.
To keep at least half
of the information provided by the 135 statistics, the best we can do is to
combine them into 3 statistics. At the minimum, we need 10 statistics to
discriminate the performance of the forwards without losing more than 20% of
the information; it is pretty good, and certainly better than having to evaluate
135 statistics at once, but maybe not as good as one would have hoped.
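Reading thresholds such as "at least half" or "no more than 20% lost" off the curve amounts to finding the first point where the cumulative proportion of variance reaches a target. Here is a small sketch of that lookup; the helper name `min_statistics_needed` is hypothetical and the curve values are invented for the example, not the ones from this analysis.

```python
import numpy as np

def min_statistics_needed(cumulative_explained, target):
    """Smallest number of combined statistics whose cumulative share >= target."""
    curve = np.asarray(cumulative_explained, dtype=float)
    if target > curve[-1]:
        raise ValueError("the target proportion is never reached")
    # The curve is non-decreasing, so a binary search finds the first index
    # where the target is met; add 1 to turn an index into a count.
    return int(np.searchsorted(curve, target, side="left")) + 1

# Invented curve: share of variance kept with 1, 2, 3, 4 and 5 statistics.
toy_curve = [0.40, 0.55, 0.70, 0.85, 1.00]
print(min_statistics_needed(toy_curve, 0.50))  # 2
print(min_statistics_needed(toy_curve, 0.80))  # 4
```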
If you evaluate the
performance of a forward using 2-3 statistics, it is all well and good, but
keep in mind that you are missing at least half of the story. Also, the
development of a single “catch-all” statistic may sadly be a problem without a
solution.
|Term|Definition|
|---|---|
|Principal component analysis|A mathematical technique used to combine variables (in this case hockey statistics) in a way that retains the maximum possible amount of information.|
|Dimensionality reduction|The process of reducing the number of variables (in this case hockey statistics) required to investigate a problem (in this case to evaluate the performance of forwards at 5v5).|
|Proportion of variance explained|The cost of dimensionality reduction is that some information may be lost. The proportion of variance explained indicates the percentage of information that is retained by the variables (in this case hockey statistics) that we have obtained by dimensionality reduction.|