Tyler Dellow recently observed that hockey analysis is encountering a problem that has been a significant issue for analysts in other sports.
In short, the community gets less out of people with expert technical knowledge of statistical analysis than it could, because those experts are often unaware of the excellent work already published on blogs and therefore direct their high-powered tools at outdated problems.
Yesterday on Twitter, an academic paper (“Estimating player contribution in hockey with regularized logistic regression”) made the rounds and received some criticism. I have emailed the authors to make suggestions in the hopes of helping get them up to speed on what is already known so that their future analyses can do more to advance the community.
The text of that email is below.
Hello. Your article on hockey analysis (“Estimating player contribution in hockey with regularized logistic regression”) came to my attention. I wanted to provide some feedback as someone who is less well-versed in the technical tools of data mining but perhaps more up to speed on the state of the art in hockey analysis. I think the baseball community consistently gets less than it could out of analytical experts like you because those experts are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.
I think my biggest concern is that by focusing exclusively on goals, you allow for shooting percentage variance to have a significant impact on a player’s calculated value. Even with four years of data, variance plays a large role in the shooting and save percentages with a given player on the ice.
I suspect this is a big part of why you rate Roloson so highly, for example. His teams scored 2.02 even-strength goals per game in the games he played and just 1.82 in the games he didn’t, largely because they shot 0.7% better at 5v5 in the games he played than in the games he didn’t. Over some 4200 shots, a swing of 0.7% is less than twice the standard error, so I find it much easier to believe that Roloson’s teammates just happened to run hot when he was on the ice than that he possesses a unique skill for producing high-percentage shots from across the rink which makes him a “quantifiable star”.
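To make the standard-error arithmetic concrete, here is a quick sketch. The 4200-shot count and the roughly 8% 5v5 baseline are the figures quoted above; treating shots as independent coin flips (a binomial model) is my simplification.

```python
import math

# With ~4200 shots at a true shooting percentage around 8%, how big is the
# standard error on the observed shooting percentage?
n_shots = 4200
p = 0.08  # approximate league 5v5 shooting percentage

se = math.sqrt(p * (1 - p) / n_shots)
print(f"standard error:      {se:.4f}")      # about 0.42%
print(f"two standard errors: {2 * se:.4f}")  # about 0.84%

# An observed swing of 0.7% sits inside two standard errors, which is
# consistent with ordinary on-ice luck rather than a goalie skill.
```

In other words, a 0.7% gap over a four-year sample is the kind of difference chance alone produces routinely.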
The problem doesn’t just plague goalies. Your model has Kent Huskins as the second-best defenseman over this four-year period, a result that is entirely driven by shooting percentages. Huskins’ teams were roughly even in shot differential when he was on the ice, but the shooting percentages tilted dramatically in his favor. In a league where the average shooting percentage is about 8.1% at 5v5, when he was on the ice, his teams shot 9.0% and the opponents shot 5.9%.
Again, what’s the more plausible explanation: that a guy who can barely get ice time has a unique, unrecognized talent for dramatically suppressing opponents’ shooting percentages, or that the goalies coincidentally ran hot over the 1500 shots that came with him on the ice? Bear in mind how predictive similar streaks have been: Sean O’Donnell’s opponents shot 6.1% over 1500+ shots from ’07-10 and have shot 7.8% since; Mark Stuart’s opponents shot 6.3% over 1500+ shots from ’07-10 and have shot 8.1% since; David Krejci’s opponents shot 5.9% over 1500+ shots from ’08-11 and have shot 10.1% since…and, of course, Huskins himself has been at 7.5% since the period covered in your article.
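The selection effect behind runs like these can be shown with a small simulation. Under the simplifying assumption that every defenseman’s opponents shoot at the same true league rate, the players with the lowest observed percentages over 1500 shots are just the lucky tail, and fresh samples land back near average. The player count and all figures here are illustrative, not real data.

```python
import random

# Simulate many "defensemen" whose opponents all shoot at the same true
# league rate (8.1%) over 1500 shots, pick the ones who happened to post
# the lowest observed percentages, and draw fresh 1500-shot samples.
random.seed(1)
p_true = 0.081
n_shots = 1500
n_players = 200

def observed_pct():
    return sum(random.random() < p_true for _ in range(n_shots)) / n_shots

first = sorted(observed_pct() for _ in range(n_players))
lucky_low = first[:5]                            # five apparent "shot suppressors"
follow_up = [observed_pct() for _ in range(5)]   # their next 1500 shots

print("lowest first-sample percentages:", [f"{p:.3f}" for p in lucky_low])
print("independent follow-up samples:  ", [f"{p:.3f}" for p in follow_up])
# The follow-up numbers cluster back around 8.1%: the extreme first-sample
# figures were selection plus noise.
```

This is the same pattern as the real O’Donnell, Stuart, and Krejci regressions: pick players on an extreme stretch and the next stretch looks ordinary.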
This is why much of modern hockey analysis starts with shot-based metrics; the shooting percentages introduce a lot of variance which must be accounted for to get a reasonable assessment of talent. If you used shots for your model, I suspect you’d easily identify more than a mere 60 players who have significantly non-zero talent levels — and the model could be further refined from there (e.g. give each shot a weight based on the shooter’s career shooting percentage).
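As a sketch of the weighting refinement suggested above: credit each shot with the shooter’s career finishing rate rather than counting only goals. The shooter names and rates below are invented for illustration; 8.1% is the league-average figure cited earlier.

```python
# Credit each shot with the shooter's career shooting percentage, so a hot
# or cold stretch of actual goals doesn't dominate the evaluation.
# Names and rates are hypothetical.
career_sh_pct = {"Shooter A": 0.12, "Shooter B": 0.07, "Shooter C": 0.05}
league_avg = 0.081  # fallback for shooters without an established rate

def expected_goals(shots):
    """shots: list of shooter names, one entry per shot taken on ice."""
    return sum(career_sh_pct.get(s, league_avg) for s in shots)

on_ice_shots = ["Shooter A", "Shooter B", "Shooter B", "Shooter C"]
print(f"expected goals from these shots: {expected_goals(on_ice_shots):.3f}")
# 0.12 + 0.07 + 0.07 + 0.05 = 0.310
```

The same model structure then runs on expected goals instead of actual goals, which strips out most of the shooting-percentage noise.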
Which brings me to a criticism that is admittedly less substantial, but affects how the article is received nonetheless — and therefore how much the community will take from it, how much impact it will have. In the introduction, you dismiss approaches that make use of events other than goals in part because there is no consensus best approach there, and then throughout the article you compare your metric to plus/minus. I agree that there is no consensus best performance metric, but plus/minus might be the consensus worst proposed choice. Comparing against plus/minus because it still retains popularity among some mainstream fans is like arguing that a new baseball metric is useful because it tells us more than runs batted in, or that a new memory technology is useful because it is more efficient than stone tablets.
Continuing with the picking of nits that affect how the article might be perceived, I feel that some simple errors of hockey knowledge in this article undermine your positioning as an expert and make it easy for people to write you off. You say that the canonical definition of plus/minus does not include short-handed or overtime goals, when in fact it includes both. You say that Roloson played with four teams (TB, NYI, EDM, MIN); if you meant in his career, then you overlooked CGY and BUF, whereas if you mean just the period studied here, then MIN should be dropped from the list. You say that Boucher cost just $92.5K, when the NHL minimum salary was at least $475K over this period. And when claiming that a successful team could be constructed of low-paid players, I’m left wondering whether you considered contract status: there is no doubt that an outstanding and very cheap team could be constructed from the best players who have not yet reached eligibility for unrestricted free agency, but this is not a particularly useful output of a model.
But most of all, the biggest stylistic issue I have is that I feel that claims this surprising need either stronger evidence to support them or more discussion of the uncertainty surrounding them. Certainly, conventional wisdom is wrong in places and some of the value of quantitative analysis is helping to identify those places. But if my model had Roloson as a top-five goalie, I would ask how that came about before I proclaimed him a star. If my model suggested Colton Orr was being dragged down by his teammates, I would question the model before I questioned the coaching staff’s usage patterns. Does it pass the sniff test that Manny Malhotra and Colby Armstrong and Kent Huskins are ranked ahead of Evgeni Malkin and Henrik Sedin and Chris Pronger? If not, then where is the discussion about the weaknesses of the model?
Finally, I am always nervous about analysis that is exclusively backwards-looking. I would have liked to see some analysis of whether this model was predictive to any appreciable extent. How likely was a player who was highly rated over the 2007-2011 period to perform well in the following seasons? How did teams’ performance in 2011-12 correlate with their players’ cumulative estimated value over the four previous years? Answering these kinds of questions is critical to demonstrating that the model is useful, in my opinion — and is especially crucial when the model is giving shocking results.
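The out-of-sample check described above could be as simple as correlating one period’s ratings with the next period’s results. The numbers below are invented; only the procedure is the point.

```python
from math import sqrt

# Miniature version of a predictive check: correlate a model's player
# ratings from one period with observed results from the following period.
# All values are hypothetical placeholders.
ratings_2007_11 = [1.8, 0.9, 0.4, -0.2, -1.1]   # model output, period 1
results_2011_12 = [0.6, 0.8, 0.1, -0.3, -0.9]   # observed value, period 2

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(ratings_2007_11, results_2011_12)
print(f"out-of-sample correlation: {r:.2f}")
# A rating with no predictive power would give r near zero; a useful one
# should retain a meaningfully positive correlation on held-out seasons.
```

A model that produces shocking retrospective rankings but near-zero out-of-sample correlation is describing noise, not talent.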
I think if you start putting the model to that kind of test of predictive capabilities, you will see how important it is to reduce shooting percentage variance by incorporating shots that are saved or miss the net as well as those that go in.
Thanks for hearing me out.