Last week, Columbus Blue Jackets head coach John Tortorella sent hockey Twitter into another round of arguing about which statistics are and aren’t meaningful when he denounced both corsi and fenwick.
Hearing an NHL coach say that shots and unblocked shots are not meaningful in measuring team performance is patently absurd, but Tortorella long ago passed into self-parody so any discussion on that topic is a waste of time for all involved.
What is more interesting is CBJ beat writer Aaron Portzline’s tweet that said while Tortorella doesn’t care about shots and unblocked shots, he does care about scoring chances. And in fact, he cares about them so much that he displays the charts for the players to view. Tortorella definitely isn’t the only coach to value scoring chances. Tampa Bay Lightning head coach Jon Cooper has also discussed publicly that the organization tracks scoring chances and views them as important. And earlier this week, Penguins coach Mike Sullivan discussed how his team uses scoring chances.
What would make a traditional coach like Tortorella so willing to embrace a statistic like scoring chances, which is newer, less proven, and less valid than shots, when he is clearly adamant about rejecting shots and unblocked shots, which are both better established and more valid measures of team performance?
It obviously isn’t math. No publicly available statistic predicts future goals better than shots. I would argue that the appeal of scoring chances as a statistic is largely due to its intuitive significance and its name. Obviously, creating chances to score is important to scoring goals and therefore winning games. And while corsi and fenwick have obscure names that do not suggest their meaning, scoring chances is named exactly what it purports to measure. So it makes sense that people would gravitate towards the term scoring chances.
The Problem with Scoring Chances
The unfortunate part of the widespread acceptance of scoring chances is that the statistic has several flaws that are rarely discussed. The first, and most significant, is that the term does not have an agreed definition. The Blue Jackets’ scoring chances are different from the Lightning’s, which are different from the Penguins’, which are different from War-On-Ice’s, which are different from Corsica’s. All of them are based on counting shots that meet certain characteristics, such as location and shot type, to determine how dangerous a given shot is. But choosing those characteristics is left to each organization.
The second problem is one that Garret Hohl has been vocal in criticizing. Scoring chances represent “binning” of data. Binning is almost always a bad idea because it draws arbitrary boundaries around an otherwise continuous set of data. To use our scoring chance example, all shots have a given danger level. The most logical way to measure that is in expected shooting percentage.
To rephrase, if a player has a clean look from the slot, what is the chance that shot gets past the goalie? Does that shot go in 5% of the time? 10% of the time? 15% of the time? What if the pass preceding the shot came from behind the net, meaning that the goalie will have a harder time finding the puck? What if the shooter one-times the shot? What if they catch it and wrist it? All of these things contribute to the likelihood that the puck will end up in the net. Using Ryan Stimson’s passing project data, I established that shooting percentage is impacted by the passing sequence that precedes the shot.
And thus, shots of various types could be expected to go in the net anywhere from nearly 0% of the time (a shot from the opposite end of the ice) to nearly 100% of the time (an empty net shot where the shooter is standing in the blue paint). So what makes something a scoring chance? A shot with an 8% chance of being a goal? A shot with a 10% chance? 12%? I’m guessing you get the point.
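To make the binning problem concrete, here is a minimal Python sketch. The shot probabilities are invented for illustration, and `count_scoring_chances` is a hypothetical helper, not any team’s or site’s actual definition. It counts the same nine shots as “scoring chances” under the three cutoffs mentioned above.

```python
# Hypothetical goal probabilities for one team's shots (illustrative, not real data).
shot_probs = [0.02, 0.03, 0.07, 0.09, 0.09, 0.11, 0.11, 0.13, 0.20]


def count_scoring_chances(probs, threshold):
    """Count shots whose goal probability meets or exceeds the cutoff."""
    return sum(1 for p in probs if p >= threshold)


# Same shots, three different (equally arbitrary) definitions of a "chance".
chances_by_cutoff = {
    cutoff: count_scoring_chances(shot_probs, cutoff)
    for cutoff in (0.08, 0.10, 0.12)
}

for cutoff, count in chances_by_cutoff.items():
    print(f"cutoff {cutoff:.0%}: {count} scoring chances")
```

With this made-up sample, the count drops from 6 to 4 to 2 as the cutoff moves from 8% to 10% to 12%, even though nothing about the shots themselves has changed.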
Fortunately, we have a solution to this problem. Unfortunately, the solution is terrible in terms of getting more traditionally minded hockey people to embrace it. The solution is to start thinking in terms of expected goals. Emmanuel Perry has already introduced an expected goals calculation and made it available on demand at corsica.hockey. DTMAboutHeart has his own expected goals model and he shares his methodology and results regularly but hasn’t yet created or partnered with a site that would allow us to access it on demand. Both models follow the same overall concept. They calculate the likelihood that each shot taken will lead to a goal and use that to calculate the expected goals for a team. So, if a team generates 20 shots that each have a 5% chance of being a goal, they would accumulate one expected goal.
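As a minimal sketch of that arithmetic, the Python below sums per-shot goal probabilities into a team total. The function name `expected_goals` and the flat 5% probabilities are illustrative assumptions. Neither model is reproduced here, and a real model would estimate a different probability for each shot.

```python
import math


def expected_goals(shot_probabilities):
    """Sum per-shot goal probabilities into an expected goals total."""
    return math.fsum(shot_probabilities)


# Assumed example from the text: 20 shots, each with a 5% chance of scoring.
twenty_equal_shots = [0.05] * 20
team_xg = expected_goals(twenty_equal_shots)
print(f"{team_xg:.2f} expected goals")
```

The point of the sum is that no shot has to be labeled a chance or a non-chance. Every shot contributes exactly as much as its probability says it should.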
This line of thinking removes the need for a term like scoring chance because we can define the danger level of shots based on their expected shooting percentage instead of a collection of characteristics. That shot from the slot we discussed earlier might be a 12% shot and a shot from the point might be a 2% shot. Talking in those terms allows for assessing the danger of each chance without resorting to binning. We could even look at a given game where one team had 30 shots accumulating 3 expected goals and the other had 24 shots accumulating 2 expected goals and know that not only did the first team generate more shots, they also generated more dangerous shots with an expected shooting percentage of 10% (3/30) compared to 8% (2/24).
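The comparison in that example is a single division per team. A small Python sketch, using a hypothetical helper named `expected_shooting_pct` and the figures from the paragraph above:

```python
def expected_shooting_pct(expected_goals, shots):
    """Average goal probability per shot: expected goals divided by shot count."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return expected_goals / shots


# Figures from the example in the text.
first_team = expected_shooting_pct(3.0, 30)
second_team = expected_shooting_pct(2.0, 24)

print(f"first team:  {first_team:.1%}")
print(f"second team: {second_team:.1%}")
```

Before rounding, the second team’s figure is 2/24, or about 8.3%, so the gap between the two teams is slightly smaller than the rounded 10% and 8% suggest.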
The Problem with the Solution
So that’s all great and makes perfect sense and seems like a great solution to using questionably binned data like scoring chances. Unfortunately, as is often the case, the NHL has botched our ability to introduce this concept smoothly. The source of all of the data that you see on sites like Corsica is various types of files that the NHL publishes providing information about what happened during the game. The NHL tracks every shot attempt and its location as well as the shot type, which is what both Perry and DTMAboutHeart use to calculate expected goals. They have significant differences in their methods but that’s a topic for another time so for now, the only point is to say they share the same basic sources of data.
The problem is that the location that the NHL provides for blocked shots is the location of the block and not the location of the shot. Therefore, blocked shots can’t be included in the expected goals calculations. And thus, we can’t simply call the expected shooting percentage that we discussed above “expected shooting percentage” because that would be inaccurate. We have to call it either “expected unblocked shot shooting percentage” or “expected fenwick shooting percentage,” which becomes xFSh%. And that is a genuinely terrible name for a stat if the goal is to get the public and traditional hockey people to be willing to consider a new idea.
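In code, the restriction amounts to dropping blocked attempts before averaging. The sketch below is an assumption-heavy illustration: the `kind` labels, the `goal_prob` field, and the probabilities are all hypothetical and do not reflect the NHL’s actual file format or either model’s output.

```python
# Hypothetical shot attempts. Field names and values are invented for illustration.
attempts = [
    {"kind": "save", "goal_prob": 0.12},
    {"kind": "miss", "goal_prob": 0.02},
    {"kind": "block", "goal_prob": None},  # no usable shot location, so no estimate
    {"kind": "goal", "goal_prob": 0.10},
]


def xfsh_pct(shot_attempts):
    """Expected fenwick shooting percentage: mean goal probability of unblocked attempts."""
    unblocked = [a for a in shot_attempts if a["kind"] != "block"]
    if not unblocked:
        return 0.0
    return sum(a["goal_prob"] for a in unblocked) / len(unblocked)


team_xfsh = xfsh_pct(attempts)
print(f"xFSh%: {team_xfsh:.1%}")
```

The blocked attempt is excluded from both the numerator and the denominator, which is why the result describes unblocked shots only and cannot honestly be called a plain expected shooting percentage.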
I have now written 1200 words calling attention to a problem in nomenclature for which I have no solution. Thinking of each shot in terms of the likelihood that it becomes a goal is the correct way to assess the danger of the shot. And due to the limitations of the data we currently have, getting the public and teams to think in those terms will be a challenge. But that doesn’t mean we shouldn’t try. And so going forward, I’m going to try to shift to discussing shots in terms of xFSh% instead of relying on scoring chances whenever possible.
As a final note, please don’t walk away from this post with the message that using scoring chances is bad. It isn’t. My only point is to say that if the goal is to understand which team is creating more dangerous chances, xFSh% is a better approach. But it also presents real barriers in naming and definition that make it difficult to use in casual discussion of the game. So until we get better data that includes shot locations for all shots, I would expect to see scoring chances remain as a statistic. But hopefully, we can start working towards using calculations of expected shooting percentage as the most sound approach to measuring how dangerous a given shot is.
In case you’re wondering which teams generate and allow the most dangerous shots based on xFSh%, here are some charts. The first shows teams’ xFSh% for and against. But just looking at xFSh% ignores the number of shots a team generates or allows. The second chart shows xFSh% for and fenwick (unblocked shots) for. The final chart shows xFSh% against and fenwick against.