Almost famous: Lessons from a near miss

Eric T.
August 07 2012 06:19AM

[Image: Hubble. The universe - it's not so static. Photo by NASA, Public Domain.]

It was one of those moments that you live for in research. A shocking result popped out of the analysis. "Holy crap, this is going to change the game," I thought.

But that's not the story here. The real story is what happened next.

Let's back up. It all started for me when I read this quote from Vic Ferrari:

Nah, the obvious weakness of this model (and Gabe's for that matter) is that defenders and forwards are given equal weight. It becomes obvious very quickly that forwards are driving the outchancing-at-evens bus ... defensemen are just riding it.

This has stayed with me ever since, because it's a pretty important claim. If scoring chance totals are really driven by forwards, then we probably overrate the defensemen who play with good forwards -- and overrate defensemen in general, for that matter.

This year's work on zone entries brought it to the forefront of my mind again. Recall that we have preliminary data suggesting that shot differential (and therefore scoring chances) may be driven largely by performance in the neutral zone. I think the forwards do the majority of the work in the neutral zone, so if the neutral zone is really that important, it stands to reason that the forwards might indeed be responsible for the majority of the outcomes.

Yet if you look at the top defensemen, it is clear that everyone on the team does much better with them on the ice, which doesn't seem consistent with them being pure passengers on the outchancing bus. How much of a hand do they have on the wheel?

It occurred to me that we could get an empirical assessment by looking at defensemen who changed teams. If the forwards are really doing the bulk of the work, then when a defenseman moves to a whole new group of forwards, that should lead to a big change in his Corsi rating (the team's shot differential with him on the ice). So I pulled some Corsi data from behindthenet.ca, and here's what I found for players who played at least 500 even strength minutes both seasons:

The correlation between '10-11 relative Corsi and '11-12 relative Corsi for defensemen who stayed on the same team was 0.54; for the 40 defensemen who changed teams it was just 0.16.

In contrast, for forwards it was 0.62 if they stayed with the same team and 0.58 if they changed teams.

Whoa.
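
(For anyone who wants to reproduce this, here's roughly how the comparison can be computed. It's a minimal sketch: the column names are hypothetical placeholders rather than behindthenet.ca's actual export format, but the logic is the same -- filter to 500+ even-strength minutes in both seasons, split players by whether their team changed, and correlate the two seasons' relative Corsi.)

    # Sketch of the year-over-year comparison; column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("defensemen_corsi.csv")  # one row per defenseman, both seasons

    # Keep players with at least 500 even-strength minutes in each season
    df = df[(df["ev_toi_1011"] >= 500) & (df["ev_toi_1112"] >= 500)]

    changed = df["team_1011"] != df["team_1112"]

    r_stayed = df.loc[~changed, "rel_corsi_1011"].corr(df.loc[~changed, "rel_corsi_1112"])
    r_moved = df.loc[changed, "rel_corsi_1011"].corr(df.loc[changed, "rel_corsi_1112"])

    print(f"stayed (n={(~changed).sum()}): r = {r_stayed:.2f}")
    print(f"moved  (n={changed.sum()}): r = {r_moved:.2f}")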


This is where things get exciting. It's also where things get dangerous.

At this point, the case seems pretty compelling. I have a hypothesis from an analytical authority I deeply respect. I have circumstantial data making Ferrari's claim seem plausible. And now I have direct evidence showing a huge effect. Everything so far lines up behind that conclusion. It's exhilarating; this data could change the way we view the game.

But that emotion is part of what makes this a dangerous moment -- once a person forms a conclusion, it becomes difficult to view data on the subject objectively, especially if the issue is emotionally charged. That's why a Wild fan who dug in early in the year against the predictive power of statistics was arguing by season's end that not one person had used stats to predict that the Wild's save percentage would fall, even though he had personally written about several stat-based articles saying their save percentage was unsustainable. The subconscious desire not to be wrong can be so powerful that, when reminded of those articles, he concluded they must have been changed after the fact.

I don't want to fall into that trap. I need to be sure about this before I go any further.


So what do I do? I start looking for flaws in my approach and for other ways to view the situation. I describe the data to Derek Zona, who immediately starts in with a rapid-fire list of potential issues for me to look into:

  • Are the formulas wrong?
  • What impact does quality of competition have? Are defensemen who change teams usually changing roles?
  • What impact do zone starts have?
  • Is there a change in ice time?
  • What if we use straight Corsi instead of relative? Or a zone-start-adjusted version?
  • Can we look at the list of players and see if anything obvious jumps out about who went up and who went down?

I work through all of these possibilities, and at the end the effect looks as strong as ever. With all of the correction factors we could think of, the year-over-year correlation for defensemen who changed teams this year never climbed above 0.2.
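
(Mechanically, most of these checks boil down to re-running the same mover correlation with a different measure or a different cut of the data. Here's a sketch of what that loop might look like, again with hypothetical column names for the raw, relative, and zone-start-adjusted versions:)

    # Re-run the mover correlation under several Corsi measures.
    # Column names for the raw / relative / zone-start-adjusted versions are hypothetical.
    import pandas as pd

    df = pd.read_csv("defensemen_corsi.csv")
    df = df[(df["ev_toi_1011"] >= 500) & (df["ev_toi_1112"] >= 500)]
    movers = df[df["team_1011"] != df["team_1112"]]

    for measure in ["rel_corsi", "raw_corsi", "zs_adj_corsi"]:
        r = movers[f"{measure}_1011"].corr(movers[f"{measure}_1112"])
        print(f"{measure}: year-over-year r = {r:.2f} (n = {len(movers)})")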

There's one last hole to check. There were 40 defensemen in my study -- a decent sample, but not a huge one. Just by random variation, there's roughly a 1-2% chance I could have gotten a correlation as low as 0.16 for those 40 players even if changing teams really matters as little for defensemen as it does for forwards -- that is, even if the true year-over-year correlation for defensemen who change teams were up around what we saw for the forwards. It's an outside chance, but to make an extraordinary claim, I want extraordinary proof. So let's see what we find...
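
(For a rough sense of where that 1-2% figure comes from, here's one way to ballpark it -- not necessarily the exact calculation I ran. Under the Fisher z-transformation, arctanh of a sample correlation from n players is roughly normal with standard error 1/sqrt(n-3), so we can ask how often 40 players would produce r <= 0.16 under various assumed true correlations; the exact answer depends on which true value you plug in.)

    # Rough Fisher-z estimate: how often would 40 players give r <= 0.16
    # if the true year-over-year correlation were actually much higher?
    import numpy as np
    from scipy.stats import norm

    n, r_obs = 40, 0.16
    se = 1.0 / np.sqrt(n - 3)          # standard error of arctanh(r)

    for rho in (0.45, 0.50, 0.55):     # assumed "true" correlations
        z = (np.arctanh(r_obs) - np.arctanh(rho)) / se
        print(f"true r = {rho:.2f}: P(sample r <= {r_obs}) ~ {norm.cdf(z):.1%}")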

For the 44 defensemen who changed teams between '09-10 and '10-11, the year-over-year correlation is 0.41. For the 45 guys who changed teams between '08-09 and '09-10, the correlation is 0.45.

Poof, just like that, the amazing result is gone. It turns out to be just an improbable statistical quirk. I sink into my chair, deflated.

And then it hits me, a flood of relief as I realize how close I came to an embarrassingly public error. Only the desire to pressure-test my findings from every possible angle saved me here. Having a critic trying to poke holes helped. And this is what I take away from that emotional ride; this is what I hope to share with you today.


Hockey is a complicated game, with a lot of things to see, track, and think about -- and a lot of ways to be wrong. If I am trying to generalize and develop my understanding of the game, I need to investigate my theories as rigorously as possible and constantly remind myself which beliefs are not yet well-tested.

This is not a simple stats-versus-eyes issue; it is a general approach to rational thought. We need to constantly challenge our beliefs, examine the evidence against them (both visual and statistical), seek out the critics, and adjust our conclusions as needed. As Hawerchuk's tagline for Behind The Net used to read, "The facts have changed, so my position has changed. What do you do, sir?"

Every approach has weaknesses; everybody makes mistakes. No matter how you come to conclusions, you should be constantly updating your information and reassessing your beliefs. The more angles you use to come at a problem, the better your solution will be.

I may not have changed the game today, but the reminder to seek and embrace criticism will serve me well tomorrow.


Eric T. writes for NHL Numbers and Broad Street Hockey. His work generally focuses on analytical investigations and covers all phases of the game. You can find him on Twitter as @BSH_EricT.