The Myth of the Hot Goalie: Consistent Goaltenders vs. Inconsistent Goaltenders

Updated: January 10, 2018 at 7:14 pm by Eric T.


Consistent, or random?
Photo by Michael Miller via Wikimedia Commons, commons license




Some goaltenders are thought of as being particularly streaky or particularly consistent, but are those labels fair? In this article, we compare Marc-Andre Fleury, Ilya Bryzgalov, Henrik Lundqvist, Pekka Rinne, Jaroslav Halak, and Carey Price and find two things: they all exhibit about the same amount of variability as each other, and they all exhibit about the same amount of variability as would be expected from simple random chance.


Heading into the Flyers-Penguins series, one common defense of Marc-Andre Fleury from those who argued he was better than his stats was that he was an extremely consistent goalie, whereas Ilya Bryzgalov was unreliable and streaky. (I always feel compelled to prove that I’m not fighting a straw man, so here are a few examples from Darren Dreger, the Sporting News, Pro Hockey Talk, and the Penguins-focused blog HighHeelsAndHockey.)

Fleury’s play in the series may have done more to address that claim than I ever could, but at the time it started me down the road of trying to answer two questions:

  • Can I find evidence that goalies are more streaky or more consistent than would be expected from random variance?
  • Can I find evidence that some goalies are more streaky or more consistent than others?

To answer these questions, I simulated a random career for each goalie and compared it to his actual career to see which was streakier. I’ll save the details of the simulation methodology for the appendix, but the output was a 10,000-game simulated career which assumed that the goalie’s odds of stopping any given shot were exactly his career average save percentage, with no streakiness at all except for random chance.

Once we have that, we can make a plot of how often a goalie posts a certain save percentage over a given number of games, and compare the results to his perfectly steady simulated counterpart.

If the goalie is truly streaky, we would expect him to run hot or cold more often than the coinflip model does, which would mean the distribution of save percentages in his actual career would be broader than in the simulation.

The Results

Here’s what we see for Fleury’s results over any given 3-game stretch compared to the simulated Fleury’s 3-game stretches:

Distribution of Fleury results over 3-game stretches

Fleury looks awfully similar to his randomly simulated counterpart. Nearly identical, even.

However, the decision to look at 3-game stretches was arbitrary; maybe streaks last longer than that, so let’s look at some other cutoffs to be sure we aren’t missing something.

Distribution of Fleury results over 5-game stretches
Distribution of Fleury results over 10-game stretches

It could be argued that Fleury’s distribution over 5-game stretches is just the slightest bit broader than the simulated Fleury, but the difference is not large. The standard deviation – a measure of the spread of the results – is 0.026 for Fleury’s actual results and 0.025 for simulated Fleury, a difference that is virtually imperceptible in reality and as likely as not due to imperfections in the model (see appendix).

To a first approximation, it’s fair to say that Fleury’s consistency is what you’d see from a robot goalie that had no effects of injury, confidence, focus, or whatever else might cause a goaltender to appear more dialed in at some times than at others.

Perhaps that’s evidence that Fleury is indeed unusually consistent. Do other goalies fluctuate more than we’d expect by random chance, with streaks of running hot and cold beyond what the coinflip model achieves? Let’s take a look at Bryzgalov.

Distribution of Bryzgalov results over 3-game stretches
Distribution of Bryzgalov results over 5-game stretches
Distribution of Bryzgalov results over 10-game stretches

Again, that’s pretty darn similar. Bryzgalov had three bad starts in a row a little more often than the model did, but other than that his career is virtually identical to that of a .915-save-percentage puck-stopping robot. (Question: could we build such a device for less than $51 million?)

I looked at six goalies using this method, evaluating players who were suggested to me on Twitter as being either particularly consistent or particularly streaky. I’ll spare you the rest of the plots and summarize with a table showing the standard deviation of the distribution of results for each goalie and his simulated counterpart:


                            3-game stretches   5-game stretches   10-game stretches
Fleury / SimFleury          .033 / .033        .026 / .025        .018 / .018
Bryzgalov / SimBryzgalov    .034 / .032        .026 / .024        .017 / .017
Lundqvist / SimLundqvist    .032 / .031        .024 / .024        .018 / .017
Rinne / SimRinne            .033 / .032        .025 / .024        .018 / .018
Halak / SimHalak            .033 / .031        .026 / .024        .018 / .017
Price / SimPrice            .031 / .030        .025 / .023        .018 / .017

Each goalie is at most a tiny bit less consistent than the random variance model, with differences comparable to those plotted above. All of the factors that might make a real goalie less consistent (imperfections in the model, injury to the goalie, psychological factors, change in talent level over the years) together increase the standard deviation by about .001 on average, or 0.1 percentage points of save percentage.
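As a rough sanity check on the simulated columns, the spread you would expect from pure chance can be approximated with the binomial formula for the standard deviation of a proportion. This is only a back-of-the-envelope sketch, not the simulation used for the table: it assumes a fixed 28 shots per game (a round figure chosen for illustration, not taken from the data) and uses the .909 save percentage quoted for Fleury in the appendix.

```python
import math

def binomial_sd(save_pct, shots):
    """SD of the observed save percentage over `shots` independent shots,
    each stopped with probability `save_pct`."""
    return math.sqrt(save_pct * (1 - save_pct) / shots)

# Assumed workload: a round number for illustration, not from the article.
SHOTS_PER_GAME = 28

for games in (3, 5, 10):
    sd = binomial_sd(0.909, games * SHOTS_PER_GAME)
    print(f"{games}-game stretch: {sd:.3f}")
```

Under those assumptions the values come out near .031, .024, and .017, in the same neighborhood as the simulated figures in the table, which suggests that spreads of this size are about what coin-flip variance alone produces.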


I have written previously about people’s tendency to underestimate how streaky random chance is, and I think that is what has happened here. There is very little difference between goalies and perfectly consistent robots, and certainly nowhere near enough difference between the goalies to label one of them streaky and another consistent.

Goalies should be evaluated based on how much skill they have demonstrated, not how often we remember them going on a hot streak.

Appendix: How the model works

For each goalie, I went through the following process:

  1. For each of his starts since the lockout, note how many shots he faced
  2. Produce a histogram, a distribution of how often he faced a given number of shots (e.g. Fleury faced 23 shots in a game 11 times, 24 shots in a game 21 times, etc)
  3. Simulate 10,000 games by the following method:
    1. Select the number of shots faced randomly, using the histogram from step 2
    2. Simulate each shot, assuming that the likelihood of stopping any given shot is exactly the goalie’s career save percentage since the lockout
    3. Record the number of shots faced and saves made in each simulated game
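
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code actually used for the article: the function name, the seed, and the short list of shot totals are all hypothetical, and only the standard library is assumed.

```python
import random
from collections import Counter

def simulate_career(shots_per_start, save_pct, n_games=10_000, seed=0):
    """Return a list of (shots, saves) tuples, one per simulated game."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Step 2: histogram of how often each shot total occurred.
    histogram = Counter(shots_per_start)
    shot_totals = list(histogram.keys())
    frequencies = list(histogram.values())
    games = []
    for _ in range(n_games):
        # Step 3.1: draw a shot total in proportion to its real frequency.
        shots = rng.choices(shot_totals, weights=frequencies, k=1)[0]
        # Step 3.2: every shot is stopped with the same probability.
        saves = sum(rng.random() < save_pct for _ in range(shots))
        # Step 3.3: record the game.
        games.append((shots, saves))
    return games

# Hypothetical shot totals standing in for a goalie's real game log.
example_starts = [19, 23, 24, 24, 27, 28, 30, 31, 31, 35]
career = simulate_career(example_starts, save_pct=0.909)
total_shots = sum(shots for shots, _ in career)
total_saves = sum(saves for _, saves in career)
print(len(career), round(total_saves / total_shots, 3))
```

Sampling from the histogram rather than from a fitted distribution keeps the simulated workload tied to what the goalie actually faced, which is the point of step 2.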

That gave me a simulated 10,000-game career in which the distribution of shots faced mirrors his real-life distribution of shots faced and he has the exact same chance of stopping each shot. From there, the distribution of results in a 3-game (or 5-game, or 10-game) moving average could be compared to the distribution of results from his actual career.
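
The moving-window comparison might look something like the sketch below. One assumption to flag: it pools saves and shots across the window rather than averaging the per-game save percentages, and the article does not say which was used. The function names and the toy game log (a constant 28 shots per game) are hypothetical.

```python
import random
import statistics

def rolling_save_pcts(games, window):
    """Pooled save percentage for every run of `window` consecutive games.
    `games` is a list of (shots, saves) tuples in chronological order."""
    results = []
    for start in range(len(games) - window + 1):
        chunk = games[start:start + window]
        shots = sum(s for s, _ in chunk)
        saves = sum(v for _, v in chunk)
        if shots:  # guard against a window with no shots at all
            results.append(saves / shots)
    return results

def spread(games, window):
    """Standard deviation of the rolling save percentages."""
    return statistics.pstdev(rolling_save_pcts(games, window))

# Toy game log: 2,000 games of 28 shots at a flat .909 (illustrative only).
rng = random.Random(1)
toy_games = [(28, sum(rng.random() < 0.909 for _ in range(28)))
             for _ in range(2000)]
spreads = {w: spread(toy_games, w) for w in (3, 5, 10)}
for w, sd in spreads.items():
    print(f"{w}-game stretches: SD = {sd:.3f}")
```

Running `spread` once on the real game log and once on the simulated one, for the same window, gives the pair of numbers that would sit side by side in each cell of the table.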

I mentioned that the model is not perfect and might be expected to give a slightly tighter distribution than reality. Here are some examples of why:

  • I did not separate out even strength shots faced and power play shots faced. That adds a random factor that might cause a greater spread than would be predicted from this simpler model. In real life, sometimes Fleury went three games without seeing many power play shots, and sometimes he was under siege for three games, but in the model every shot came with the same .909 save percentage.
  • In real life, most of the time that a goalie faces only a few shots in a game, it is because he let in multiple goals and got pulled, and those short games can have a big impact on the goalie’s save percentage over the three-game stretch. In the simulation, the number of shots faced and goals scored are determined independently, so the goalie who lets in three goals on the first five shots will usually have another 20-30 shots to regress to the mean.
  • This study makes no effort to account for change in skill over time. Over the seven years in question, Fleury has gone from a 21-year-old rookie to a 27-year-old in his prime, so we might expect that he had more bad stretches in 2005-06 and more good stretches in 2011-12. This would look exactly the same in the plots above as a goalie who had hot and cold stretches throughout his career, but would not normally be considered streakiness.
  • Similarly, over those seven years the goalies have had a variety of coaches, teammates, and in some cases have switched teams altogether. If any of those things impact save percentage, they would have the same effect as aging, making the goalie appear more variable than he really is.
  • Goalies sometimes play through an injury that hampers their performance for a stretch of time. Simulated goalies never have to do that.
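
To get a feel for how much a factor like skill change could matter, here is a small worked example using the law of total variance: the overall variance is the within-phase variance plus the variance of the phase averages. The numbers are hypothetical. It assumes a career split evenly between a .900 phase and a .918 phase, with the same chance-only spread of .024 over 5-game stretches in both.

```python
import math
import statistics

def sd_with_skill_drift(within_sd, true_save_pcts):
    """Overall SD when equal-length career phases have different true
    save percentages but the same chance-only spread `within_sd`."""
    between_var = statistics.pvariance(true_save_pcts)
    return math.sqrt(within_sd ** 2 + between_var)

# Hypothetical phases: .900 early in the career, .918 later on.
drifted = sd_with_skill_drift(0.024, [0.900, 0.918])
print(f"chance only: 0.0240, with drift: {drifted:.4f}")
```

In this made-up case the spread rises from .024 to roughly .0256, a shift of the same order as the gaps between the real and simulated goalies, without any hot or cold streaks involved.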

My hunch is that all of those factors put together easily account for the small differences between the simulated and actual distributions, and that the psychological factors commonly cited to explain variability (confidence, focus, etc) collectively add up to virtually zero effect. I haven’t proven that, however; all I can say with confidence right now is that the model imperfections and psychological factors put together add up to something very small, and that goalie streakiness is mostly just random chance.
