Abstract
Some goaltenders are thought of as being particularly streaky or particularly consistent, but are those labels fair? In this article, we compare Marc-Andre Fleury, Ilya Bryzgalov, Henrik Lundqvist, Pekka Rinne, Jaroslav Halak, and Carey Price and find two things: they all exhibit about the same amount of variability as each other, and they all exhibit about the same amount of variability as would be expected from simple random chance.
Introduction
Heading into the Flyers-Penguins series, one common defense of Marc-Andre Fleury from those who argued he was better than his stats was that he was an extremely consistent goalie, whereas Ilya Bryzgalov was unreliable and streaky. (I always feel compelled to prove that I’m not fighting a straw man, so here are a few examples from Darren Dreger, the Sporting News, Pro Hockey Talk, and the Penguins-focused blog HighHeelsAndHockey.)
Fleury’s play in the series may have done more to address that claim than I ever could, but at the time it started me down the road of trying to answer two questions:
- Can I find evidence that goalies are more streaky or more consistent than would be expected from random variance?
- Can I find evidence that some goalies are more streaky or more consistent than others?
To answer these questions, I simulated a random career for each goalie and compared it to his actual career to see which was streakier. I’ll save the details of the simulation methodology for the appendix, but the output was a 10,000-game simulated career which assumed that the goalie’s odds of stopping any given shot were exactly his career average save percentage, with no streakiness at all except for random chance.
Once we have that, we can plot how often a goalie posts a certain save percentage over a given number of games and compare the results to his perfectly steady simulated counterpart.
If the goalie is truly streaky, we would expect him to run hot or cold more often than the coin-flip model does, which would mean the distribution of save percentages in his actual career would be broader than in the simulation.
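The comparison described above boils down to computing the save percentage over every consecutive n-game stretch of a career. A minimal sketch, using a made-up six-game log rather than any goalie's actual data:

```python
def stretch_save_percentages(shots, saves, n=3):
    """Shot-weighted save percentage over every consecutive n-game stretch."""
    return [
        sum(saves[i:i + n]) / sum(shots[i:i + n])
        for i in range(len(shots) - n + 1)
    ]

# Made-up game log: shots faced and saves made in six starts
shots = [30, 25, 28, 33, 22, 35]
saves = [28, 22, 26, 30, 20, 32]

print([round(p, 3) for p in stretch_save_percentages(shots, saves)])
# → [0.916, 0.907, 0.916, 0.911]
```

Running this over both the actual career and the simulated one gives the two distributions whose widths we are comparing.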
The Results
Here’s what we see for Fleury’s results over any given 3-game stretch compared to the simulated Fleury’s 3-game stretches:
Fleury looks awfully similar to his randomly simulated counterpart. Nearly identical, even.
However, the decision to look at 3-game stretches was arbitrary; maybe streaks last longer than that, so let’s look at some other cutoffs to be sure we aren’t missing something.
It could be argued that Fleury’s distribution over 5-game stretches is just the slightest bit broader than the simulated Fleury’s, but the difference is not large. The standard deviation – a measure of the spread of the results – is 0.026 for Fleury’s actual results and 0.025 for simulated Fleury, a difference that is virtually imperceptible in reality and as likely as not due to imperfections in the model (see appendix).
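The spread measure used here is just the standard deviation of the stretch distribution, which the standard library computes directly. A quick sketch with made-up stretch save percentages for an "actual" and a "simulated" goalie:

```python
import statistics

# Made-up 5-game-stretch save percentages, for illustration only
actual_stretches = [0.912, 0.931, 0.887, 0.924, 0.905, 0.896]
simulated_stretches = [0.915, 0.926, 0.898, 0.918, 0.909, 0.904]

# Population standard deviation of each distribution
print(round(statistics.pstdev(actual_stretches), 3))
print(round(statistics.pstdev(simulated_stretches), 3))
```

A streaky goalie would show a meaningfully larger number on the first line than the second; Fleury's 0.026 vs. 0.025 is about as close to a tie as the data allows.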
To a first approximation, it’s fair to say that Fleury’s consistency is what you’d see from a robot goalie that had no effects of injury, confidence, focus, or whatever else might cause a goaltender to appear more dialed in at some times than at others.
Perhaps that’s evidence that Fleury is indeed unusually consistent. Do other goalies fluctuate more than we’d expect by random chance, with streaks of running hot and cold beyond what the coin-flip model achieves? Let’s take a look at Bryzgalov.
Again, that’s pretty darn similar. Bryzgalov had three bad starts in a row a little more often than the model did, but other than that his career is virtually identical to that of a .915-save-percentage puck-stopping robot. (Question: could we build such a device for less than $51 million?)
I looked at six goalies using this method, evaluating players who were suggested to me on Twitter as being either particularly consistent or particularly streaky. I’ll spare you the rest of the plots and summarize with a table showing the standard deviation of the distribution of results for each goalie and his simulated counterpart:
                            3-game stretches   5-game stretches   10-game stretches
Fleury / Sim-Fleury         .033 / .033        .026 / .025        .018 / .018
Bryzgalov / Sim-Bryzgalov   .034 / .032        .026 / .024        .017 / .017
Lundqvist / Sim-Lundqvist   .032 / .031        .024 / .024        .018 / .017
Rinne / Sim-Rinne           .033 / .032        .025 / .024        .018 / .018
Halak / Sim-Halak           .033 / .031        .026 / .024        .018 / .017
Price / Sim-Price           .031 / .030        .025 / .023        .018 / .017
Each goalie is just a tiny bit less consistent than the random variance model, with differences pretty comparable to those plotted above. All of the factors that might make a real goalie less consistent (imperfections in the model, injury to the goalie, psychological factors, change in talent level over the years) add up to increasing the standard deviation by only about 0.001 in save percentage.
Conclusion
I have written previously about people’s tendency to underestimate how streaky random chance is, and I think that is what has happened here. There is very little difference between goalies and perfectly consistent robots, and certainly nowhere near enough difference between the goalies to label one of them streaky and another consistent.
Goalies should be evaluated based on how much skill they have demonstrated, not how often we remember them going on a hot streak.
Appendix: How the model works
For each goalie, I went through the following process:
1. For each of his starts since the lockout, note how many shots he faced
2. Produce a histogram, a distribution of how often he faced a given number of shots (e.g. Fleury faced 23 shots in a game 11 times, 24 shots in a game 21 times, etc.)
3. Simulate 10,000 games by the following method:
   - Select the number of shots faced randomly, using the histogram from step 2
   - Simulate each shot, assuming that the likelihood of stopping any given shot is exactly the goalie’s career save percentage since the lockout
   - Record the number of shots faced and saves made in each simulated game
That gave me a simulated 10,000-game career in which the distribution of shots faced mirrors his real-life distribution of shots faced and he had the exact same chance of stopping each shot. From there, the distribution of results in a 3-game (or 5-game, or 10-game) moving average could be compared to the distribution of results from his actual career.
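The steps above can be sketched in a few lines of Python. The shot counts and save percentage below are placeholders, not any goalie's real numbers:

```python
import random
from collections import Counter

def simulate_career(real_shot_counts, career_save_pct, n_games=10_000, seed=0):
    """Simulate a career per the appendix method: shots faced are drawn from
    the real-life histogram, and each shot is stopped independently with
    probability equal to the career save percentage."""
    rng = random.Random(seed)
    # Step 2: histogram of shots faced per game
    histogram = Counter(real_shot_counts)
    shot_values = list(histogram)
    weights = list(histogram.values())
    games = []
    for _ in range(n_games):
        # Step 3a: sample a shot count with the real-life frequencies
        shots = rng.choices(shot_values, weights=weights)[0]
        # Step 3b: each shot is an independent coin flip at the career rate
        saves = sum(rng.random() < career_save_pct for _ in range(shots))
        # Step 3c: record the simulated game
        games.append((shots, saves))
    return games

# Illustrative inputs: a handful of per-game shot counts and a .909 career rate
career = simulate_career([23, 24, 30, 28, 23, 35], 0.909, n_games=1000)
print(len(career), career[0])
```

Feeding the resulting game log through the same moving-average calculation as the real career gives the two distributions compared in the plots.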
I mentioned that the model is not perfect and might be expected to give a slightly tighter distribution than reality. Here are some examples of why:
- I did not separate out even-strength shots faced and power-play shots faced. That adds a random factor that might cause a greater spread than would be predicted from this simpler model. In real life, sometimes Fleury went three games without seeing many power-play shots, and sometimes he was under siege for three games, but in the model every shot came with the same .909 save percentage.
- In real life, most of the time that a goalie faces only a few shots in a game, it is because he let in multiple goals and got pulled, and those short games can have a big impact on the goalie’s save percentage over the three-game stretch. In the simulation, the number of shots faced and goals scored are determined independently, so the goalie who lets in three goals on the first five shots will usually have another 20-30 shots to regress to the mean.
- This study makes no effort to account for change in skill over time. Over the seven years in question, Fleury has gone from a 21-year-old rookie to a 27-year-old in his prime, so we might expect that he had more bad stretches in 2005-06 and more good stretches in 2011-12. This would look exactly the same in the plots above as a goalie who had hot and cold stretches throughout his career, but would not normally be considered streakiness.
- Similarly, over those seven years the goalies have had a variety of coaches, teammates, and in some cases have switched teams altogether. If any of those things impact save percentage, they would have the same effect as aging, making the goalie appear more variable than he really is.
- Goalies sometimes play through an injury that hampers their performance for a stretch of time. Simulated goalies never have to do that.
My hunch is that all of those factors put together easily account for the small differences between the simulated and actual distributions, and that all of the psychological factors commonly cited to explain variability (confidence, focus, etc) collectively add up to virtually zero effect. I haven’t proven that, however; all I can say with confidence right now is that all of the model imperfections and psychological factors put together collectively add up to something very small, and that goalie streakiness is mostly just random chance.