The Myth of the Hot Goalie: Consistent Goaltenders vs. Inconsistent Goaltenders

Eric T.
May 11 2012 04:13PM

 

Consistent, or random?
Photo by Michael Miller via Wikimedia Commons, commons license


Abstract

Some goaltenders are thought of as being particularly streaky or particularly consistent, but are those labels fair? In this article, we compare Marc-Andre Fleury, Ilya Bryzgalov, Henrik Lundqvist, Pekka Rinne, Jaroslav Halak, and Carey Price and find two things: they all exhibit about the same amount of variability as each other, and they all exhibit about the same amount of variability as would be expected from simple random chance.

Introduction

Heading into the Flyers-Penguins series, one common defense of Marc-Andre Fleury from those who argued he was better than his stats was that he was an extremely consistent goalie, whereas Ilya Bryzgalov was unreliable and streaky. (I always feel compelled to prove that I’m not fighting a straw man, so here are a few examples from Darren Dreger, the Sporting News, Pro Hockey Talk, and the Penguins-focused blog HighHeelsAndHockey.)

Fleury’s play in the series may have done more to address that claim than I ever could, but at the time it started me down the road of trying to answer two questions:

  • Can I find evidence that goalies are more streaky or more consistent than would be expected from random variance?
  • Can I find evidence that some goalies are more streaky or more consistent than others?

To answer these questions, I simulated a random career for each goalie and compared it to his actual career to see which was streakier. I’ll save the details of the simulation methodology for the appendix, but the output was a 10,000-game simulated career which assumed that the goalie’s chance of stopping any given shot was exactly his career average save percentage, with no streakiness at all except for random chance.

Once we have that, we can plot how often a goalie posts a certain save percentage over a given number of games, and compare the results to his perfectly steady simulated counterpart.

If the goalie is truly streaky, we would expect him to run hot or cold more often than the coinflip model does, which would mean the distribution of save percentages in his actual career would be broader than in the simulation.

The Results

Here’s what we see for Fleury’s results over any given 3-game stretch compared to the simulated Fleury’s 3-game stretches:

Distribution of Fleury results over 3-game stretches

Fleury looks awfully similar to his randomly simulated counterpart. Nearly identical, even.

However, the decision to look at 3-game stretches was arbitrary; maybe streaks last longer than that, so let’s look at some other cutoffs to be sure we aren’t missing something.

Distribution of Fleury results over 5-game stretches
Distribution of Fleury results over 10-game stretches

It could be argued that Fleury’s distribution over 5-game stretches is just the slightest bit broader than the simulated Fleury, but the difference is not large. The standard deviation – a measure of the spread of the results – is 0.026 for Fleury’s actual results and 0.025 for simulated Fleury, a difference that is virtually imperceptible in reality and as likely as not due to imperfections in the model (see appendix).

To a first approximation, it’s fair to say that Fleury’s consistency is what you’d see from a robot goalie that had no effects of injury, confidence, focus, or whatever else might cause a goaltender to appear more dialed in at some times than at others.

Perhaps that’s evidence that Fleury is indeed unusually consistent. Do other goalies fluctuate more than we’d expect by random chance, with streaks of running hot and cold beyond what the coinflip model achieves? Let’s take a look at Bryzgalov.

Distribution of Bryzgalov results over 3-game stretches
Distribution of Bryzgalov results over 5-game stretches
Distribution of Bryzgalov results over 10-game stretches

Again, that’s pretty darn similar. Bryzgalov had three bad starts in a row a little more often than the model did, but other than that his career is virtually identical to that of a .915-save-percentage puck-stopping robot. (Question: could we build such a device for less than $51 million?)

I looked at six goalies using this method, evaluating players who were suggested to me on Twitter as being either particularly consistent or particularly streaky. I’ll spare you the rest of the plots and summarize with a table showing the standard deviation of the distribution of results for each goalie and his simulated counterpart:

 

Goalie / simulated counterpart    3-game stretches    5-game stretches    10-game stretches
Fleury / SimFleury                .033 / .033         .026 / .025         .018 / .018
Bryzgalov / SimBryzgalov          .034 / .032         .026 / .024         .017 / .017
Lundqvist / SimLundqvist          .032 / .031         .024 / .024         .018 / .017
Rinne / SimRinne                  .033 / .032         .025 / .024         .018 / .018
Halak / SimHalak                  .033 / .031         .026 / .024         .018 / .017
Price / SimPrice                  .031 / .030         .025 / .023         .018 / .017

Each goalie is just a tiny bit less consistent than the random variance model, with differences pretty comparable to those plotted above. All of the factors that might make a real goalie less consistent (imperfections in the model, injury to the goalie, psychological factors, change in talent level over the years) add up to increasing the standard deviation by about .001, or 0.1 percentage points of save percentage.
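As a rough cross-check on the simulated figures, the spread of a purely random goalie can also be estimated straight from the binomial distribution, without running a simulation. The sketch below is only an approximation: it uses Fleury's .909 save percentage from the appendix, but the 28 shots per game is an assumed round number rather than a figure from the article, and the helper name binomial_sd is made up for illustration. It also ignores game-to-game variation in shots faced.

```python
import math


def binomial_sd(save_pct, shots):
    """Standard deviation of observed save percentage over `shots` shots,
    if every shot is an independent trial with the same save probability."""
    return math.sqrt(save_pct * (1 - save_pct) / shots)


SHOTS_PER_GAME = 28  # assumption for illustration; not a figure from the article

for games in (3, 5, 10):
    sd = binomial_sd(0.909, SHOTS_PER_GAME * games)
    print(f"{games}-game stretch: expected standard deviation of about {sd:.3f}")
```

Under those assumptions the estimates come out near .031, .024, and .017, in the same neighborhood as the simulated values in the table, which suggests that most of the spread is simply what a fixed save probability produces over a few dozen shots.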

Conclusion

I have written previously about people’s tendency to underestimate how streaky random chance is, and I think that is what has happened here. There is very little difference between goalies and perfectly consistent robots, and certainly nowhere near enough difference between the goalies to label one of them streaky and another consistent.

Goalies should be evaluated based on how much skill they have demonstrated, not how often we remember them going on a hot streak.

Appendix: How the model works

For each goalie, I went through the following process:

  1. For each of his starts since the lockout, note how many shots he faced
  2. Produce a histogram, a distribution of how often he faced a given number of shots (e.g. Fleury faced 23 shots in a game 11 times, 24 shots in a game 21 times, etc)
  3. Simulate 10,000 games by the following method:
    1. Select the number of shots faced randomly, using the histogram from step 2
    2. Simulate each shot, assuming that the likelihood of stopping any given shot is exactly the goalie's career save percentage since the lockout
    3. Record the number of shots faced and saves made in each simulated game

That gave me a simulated 10,000-game career in which the distribution of shots faced mirrors his real-life distribution of shots faced and he has exactly the same chance of stopping each shot. From there, the distribution of results in a 3-game (or 5-game, or 10-game) moving average could be compared to the distribution of results from his actual career.
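The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the article. The function names are invented for the example. The shot counts are made up, apart from the two Fleury frequencies quoted in step 2. The moving average is assumed to be total saves divided by total shots over each stretch, which the article does not specify.

```python
import random
import statistics
from collections import Counter


def simulate_career(shot_counts, save_pct, n_games=10_000, seed=0):
    """Return a list of (shots, saves) pairs, one per simulated game."""
    rng = random.Random(seed)
    # Step 2: histogram of how often each shot total occurred.
    histogram = Counter(shot_counts)
    values = list(histogram.keys())
    weights = list(histogram.values())
    games = []
    for _ in range(n_games):
        # Step 3.1: draw shots faced in proportion to the real histogram.
        shots = rng.choices(values, weights=weights, k=1)[0]
        # Step 3.2: every shot is stopped with the same fixed probability.
        saves = sum(rng.random() < save_pct for _ in range(shots))
        # Step 3.3: record shots faced and saves made.
        games.append((shots, saves))
    return games


def rolling_save_pcts(games, window):
    """Pooled save percentage over every consecutive `window`-game stretch."""
    results = []
    for start in range(len(games) - window + 1):
        stretch = games[start:start + window]
        shots = sum(s for s, _ in stretch)
        saves = sum(v for _, v in stretch)
        if shots:
            results.append(saves / shots)
    return results


# Hypothetical shot counts for illustration only; not the real game log.
example_shot_counts = [23] * 11 + [24] * 21 + [28] * 30 + [31] * 25 + [35] * 10

sim = simulate_career(example_shot_counts, save_pct=0.909)
for window in (3, 5, 10):
    spread = statistics.pstdev(rolling_save_pcts(sim, window))
    print(f"{window}-game stretches: standard deviation {spread:.3f}")
```

Running rolling_save_pcts on a goalie's actual game log, in the same (shots, saves) format, would give the real-career distribution to compare against.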

I mentioned that the model is not perfect and might be expected to give a slightly tighter distribution than reality. Here are some examples of why:

  • I did not separate out even strength shots faced and power play shots faced. That adds a random factor that might cause a greater spread than would be predicted from this simpler model. In real life, sometimes Fleury went three games without seeing many power play shots, and sometimes he was under siege for three games, but in the model every shot came with the same .909 save percentage.
  • In real life, most of the time that a goalie faces only a few shots in a game, it is because he let in multiple goals and got pulled, and those short games can have a big impact on the goalie’s save percentage over the three-game stretch. In the simulation, the number of shots faced and goals scored are determined independently, so the goalie who lets in three goals on the first five shots will usually have another 20-30 shots to regress to the mean.
  • This study makes no effort to account for change in skill over time. Over the seven years in question, Fleury has gone from a 21-year-old rookie to a 27-year-old in his prime, so we might expect that he had more bad stretches in 2005-06 and more good stretches in 2011-12. This would look exactly the same in the plots above as a goalie who had hot and cold stretches throughout his career, but would not normally be considered streakiness.
  • Similarly, over those seven years the goalies have had a variety of coaches, teammates, and in some cases have switched teams altogether. If any of those things impact save percentage, they would have the same effect as aging, making the goalie appear more variable than he really is.
  • Goalies sometimes play through an injury that hampers their performance for a stretch of time. Simulated goalies never have to do that.

My hunch is that all of those factors put together easily account for the small differences between the simulated and actual distributions, and that the psychological factors commonly cited to explain variability (confidence, focus, etc.) add up to virtually zero effect. I haven’t proven that, however; all I can say with confidence right now is that the model imperfections and psychological factors combined add up to something very small, and that goalie streakiness is mostly just random chance.

Eric T. writes for NHL Numbers and Broad Street Hockey. His work generally focuses on analytical investigations and covers all phases of the game. You can find him on Twitter as @BSH_EricT.
#1 Kent Wilson
May 11 2012, 04:16PM

Outstanding work. Thanks Eric.

#2 RJ
May 11 2012, 04:34PM

Excellent work here, Eric. I love to see someone acknowledging random chance as a reason as to why a particular outcome happened. It's a pitifully underused, albeit unsatisfying, explanation for events.

#3 dan
May 11 2012, 04:35PM

Awesome work as always, EricT. To take things further, how close do you feel we are to stating that, in effect, NHL results are 'all luck'? In other words, Bryz did not just have a bad playoffs; it was just a blip we expect by random chance. And is it time to become more radical in rule changes to reward skill, i.e. make the nets larger, reduce the number of blocked shots, etc., so that skill takes more of a role? Personally, I would like to get hockey to a 50/50 split. I believe it's around 60/40 luck right now.

#4 Jared Lunsford
May 11 2012, 04:51PM

Abstract and Appendix? Someone went the extra nerdy mile!

Great work.

#5 Geoff Detweiler
May 11 2012, 04:58PM

@dan

I don't think we are stating NHL results are "all luck".

Bryzgalov did have a bad playoffs. There are any number of reasons for that, some of which is skill, some of which is mental, some physical, some luck.

Just because goalies don't show a tendency to be any more consistent than expected doesn't mean everything is luck. It means that every goalie has variance, "hot streaks" and "cold streaks", but that the frequency is what should be expected based on skill.

#6 dan
May 11 2012, 06:00PM

@Geoff Detweiler

Thanks Geoff for your response. I know it's not 'all luck', but was just wondering if any one else feels 'luck' now plays too big a role.

#7 dan
May 11 2012, 06:16PM

@Geoff Detweiler

Geoff; Just found an article you wrote... on "Bryz and regression": Very well done & really helped me get it..thanks

http://philly.sbnation.com/philadelphia-flyers/2012/3/7/2850120/ilya-bryzgalov-regression-shows-people-should-think-more-often

#8 LJ21
May 11 2012, 07:05PM

Apparently Pittsburgh had an imposter in goal against Philadelphia in the 2012 playoffs. In games 2, 3, and 4 of that series, nhl.com reports that Fleury saved 67 of 83 shots, for a save percentage of 80.7%. This three-game total is completely outside the distribution that you present for Fleury; ie your graph implies that 0% of his three-game stretches had save percentages that low. What gives? The five-game stretch does not fit with your graphs much better. In the last five games of that series Fleury saved 109 / 131 shots, or 83.2%. Right at the edge of your 5-game distribution.

#10 LJ21
May 11 2012, 10:22PM

Nevertheless that bad streak does speak against your thesis. A robot goalie would not have gone stone cold as he did.

Your phrase "surprising since he's such a clutch big-game goalie" suggests that you don't believe yourself that these human athletes are repeating robots that only express pure statistical variability.

#11 Kent Wilson
May 11 2012, 10:44PM

@LJ21

Time to check your sarcasm meter.

#13 draglikepull
May 12 2012, 09:49AM

I wonder if we should consider this evidence that shot quality is not a major factor in goaltender performance. If actual goaltender performance lines up very closely with what would be expected if all shots had the same probability of going in, then that seems like evidence that shot quality is not significantly affecting the real life save percentage. Does that sound accurate?

#14 lj21
May 12 2012, 09:54AM

I'm not sure where you got 1 in 200 from. The blue curve - actual Fleury three-game stretches - hits zero before that, indicating that it never happened in real life. If you have omitted blue values below that level, that omission would hide a fat tail that may exist. Whether or not the actual tail is much fatter than a robot's is central to your thesis. My earlier comment wondered how often a robot goalie would be as cold as Fleury was in the recent playoffs. Looking at your red simFleury three-game curve, and imagining its extension to lower values, it seems to me that the area under the curve of that extension is smaller than 1 / 200 as big as the total area under the curve. While I don't have access to your data, a simple estimate indicates that the robot's probability of performing that poorly is lower. (If a robot faces 83 shots, with p=0.909 binomial distribution, the probability of saving 67 or fewer is p=0.00308, or 1 in 325).

I'm also not sure about your claim of 1 in 63 playoff stretches. It's not as if I did an exhaustive search. I looked at exactly one instance - the most recent playoff series - guided by my anecdotal prejudice (which you claim is totally unreliable). The one case I looked at was outside your distribution. Imagining the vast array of potential checks like I did, the probability of finding such an outlier so easily should be very low.

Sorry about missing your sarcasm re clutch goalie. However a choker is also not a robot. One could presumably test whether the career playoff record has a distribution that is distinct from the regular season distribution that you have looked at. That would be non-robotic.

Another issue is that when coaches pull a cold goalie, they are trying to minimize the tail fatness - ie avoid the horror show performances. So that strategy is intended to keep the distribution narrower.

#17 lj21
May 12 2012, 03:48PM

The importance of tail shape is one area where we disagree. A three-game "result of .807 every once in a while" is very imprecise. Is it 1 / 325 or 1 / 63 or 1 / 20? These are all once in a while, but the differences among them can carry large impacts on an NHL team's success. I think Pittsburgh's coach and GM did indeed expect never to have 8 goals against in two playoff games in a row. I would have liked to see the full version of your 3-game graph that included all of the points for the real goalie. Anything popping up way out there is interesting.

Apart from the issue of a tail shape that does not change in time is the issue of shifting distribution come playoff time. Your statement "he's only achieved his career .909 save percentage in one of his six playoff appearances" suggests that there may be a distinct distribution, even before the most recent debacle. That shifting distribution would be another individual trait that coaches and GMs are interested in. People have nerves, and some individuals handle them better than others; robots do not. And some athletes overcome earlier difficulties later in their career (such as Tom Watson in golf). These individual human struggles are part and parcel of spectator sports, and are the main reason why so many people want to watch, as compared to near-zero spectators for robot hockey.

#18 Derek
May 12 2012, 06:56PM

Awesome work. So I guess this means there isn't a lot of value to tracking things like quality starts?

Off-topic but you guys should get Vic Ferrari to write for you. This corner of the internet was a better place with him active.

#20 grumble_grumble
May 13 2012, 11:05AM

I first want to say this is an interesting topic and would like to see the results of all the goaltenders in the league.

Secondly, I just want to let you know what I am concluding from this so that you may let me know of something I could be missing to help me further understand.

From what has been shown, it looks like these goalies have produced consistent results (namely consistent SV%) over their careers compared to what a "robotic" goalie of their skill would produce. Also, one can conclude that this does not show whether these goalies' actual play (movement, focus, rhythm, decision making, etc.) is consistent or not, just that their measurable results are. This tells me that if one of these goaltenders does show inconsistencies in his play, then it does not affect the consistency of his SV% over a three, five, or ten game period.

Again, I think it would be interesting to see the results of all the goaltenders (mostly because I am curious what Steve Mason's looks like) and also maybe the effects of a two-game window as well (not sure what that would do or if you would consider that long enough to be relevant to consistency).

Have to get going. Very interesting though, thanks for the read!

#21 DLS
May 15 2012, 01:24PM

I tend to agree with lj21 that Fleury's playoff performance was way off his statistical norm. The bell curve is supposed to represent over 99 percent of his save percentages over three-game sets. If Fleury's historical save percentage is .909, then his playoff save percentage of .807 should be about as likely as three shutouts in a row, or a 100% save percentage. The statement that it happens once in a while is, I think, a bit of a stretch.
