June 06 2012 08:44AM
Taking the NHL’s official real time statistics at face value, we come across some astonishing things. For instance, the Chicago Blackhawks are more than 50% more physical at home than on the road. Perhaps they’re showing off for the home fans, but a more likely explanation is that the home scorer is more generous in counting hits than road scorers are (this seems especially plausible given that home scorers dramatically overcount both giveaways and takeaways for the home side as well).
How do we make the data usable?
The perfect solution would be more stringent recording of the data, perhaps having a pair of independent markers out of a central office for each game, rather than the current system that sees each arena employ its own recorder.
Another solution is to write the data off completely – pretend the NHL doesn’t even keep it. This is also problematic, as people are interested in knowing which players are hitting and blocking shots and the like.
The third option is to compensate for home scorers who count far more or far fewer events than the league norm. That’s what I will try to do here.
HHITS/GP represents the number of hits the team recorded in an average home game; RHITS/GP is the same except that it’s for the average road game. MARKUP compares the two, expressing the hits recorded in the average home game as a percentage relative to the average road game.
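As a rough sketch of how a markup figure like this can be computed, the snippet below uses the Chicago totals quoted in the Seabrook example further down (910 hits in 41 home games, 533 in 37 road games). The function name `home_markup` is made up for illustration, and treating MARKUP as the percentage excess of the home rate over the road rate is an assumption about how the column is defined.

```python
def home_markup(home_hits, home_gp, road_hits, road_gp):
    """Percentage by which hits per home game exceed hits per road game.

    Assumption: MARKUP is read as (HHITS/GP divided by RHITS/GP) minus 1,
    expressed as a percentage. A negative value means the home scorer
    credits fewer hits than road scorers do.
    """
    home_rate = home_hits / home_gp   # HHITS/GP
    road_rate = road_hits / road_gp   # RHITS/GP
    return (home_rate / road_rate - 1) * 100


# Chicago totals as quoted in the Seabrook example in this article.
chicago = home_markup(910, 41, 533, 37)
print(f"Chicago home markup: {chicago:.1f}%")
```

A positive result points to a generous home scorer (or stingy road scorers), and a negative one to the reverse.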
For the most part, home scorers dramatically overcount the number of hits their team lands relative to road scorers. Nineteen teams are more frequently credited with a hit at home than on the road; the reverse is true for just 10 clubs. Only five teams undercount hits by 10% or more; 18 teams overcount by more than 10%.
My belief is that this has more to do with road undercounting than with favoritism by the home team. The home scorer sees their team more frequently; naturally, they’re more familiar with the players involved and they’re less likely to miss a hit. With an unfamiliar road team, it’s easier to miss one.
Nevertheless, we’re interested less in a strictly accurate count than in a consistent one, one that allows us to compare players across teams.
How much difference can scorer bias make? Let’s consider the leading hitters on our two outlying teams.
Brent Seabrook led the Blackhawks in hits in 2011-12, and was credited with 198 of them. He played 41 home games and 37 road games. Using the averages above, in those home games Chicago was credited with 910 hits; in the road games, just 533. In all likelihood, then, 37% of Seabrook’s hits were counted on the road, and 63% at home. In Seabrook’s case, that means 73 road hits and 125 home hits. Leaving the road hits alone and adjusting the home hits down by the same percentage that the Blackhawks home scorer overcounts them (about 54%, which takes the 125 home hits down to roughly 81), we come to a total of 154 hits, 44 fewer than the NHL officially credits him with.
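The Seabrook arithmetic can be written out as a short sketch. The function name `adjust_hits` and its parameters are hypothetical. Like the calculation above, it assumes a player's hits split between home and road in the same proportion as the team's hits in the games he played, since the true split would have to come from the play-by-play sheets.

```python
def adjust_hits(player_hits, team_home_hits, team_road_hits, home_gp, road_gp):
    """Estimate a player's hit total with home hits rescaled to the road rate.

    Assumption: the player's home/road split mirrors the team's split in
    the home and road games he played.
    """
    # Share of the player's hits assumed to have been recorded at home.
    home_share = team_home_hits / (team_home_hits + team_road_hits)
    est_home = player_hits * home_share
    est_road = player_hits - est_home

    # Ratio of the team's per-game hit rate at home to its rate on the road.
    ratio = (team_home_hits / home_gp) / (team_road_hits / road_gp)

    # Leave road hits alone; scale home hits so they match the road rate.
    return est_road + est_home / ratio


# Seabrook: 198 hits, 41 home games (910 team hits), 37 road games (533 team hits).
seabrook = adjust_hits(198, 910, 533, 41, 37)
print(round(seabrook))
```

Because the home hits are divided by the ratio, the same function handles a team whose home scorer undercounts: a ratio below 1 pushes the home hits up instead of down.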
Tim Jackman led the Calgary Flames in hits in 2011-12, credited with 160 of them. He played 37 home games and 38 road games. Using the averages above, in those home games Calgary was credited with 607 hits; in the road games, 863. It’s likely, then, that ~59% of Jackman’s hits were counted on the road, and just 41% at home. That works out to 94 hits on the road, and 66 at home. If we leave the road hits alone and increase the number of home hits by the same ratio that the Calgary scorer undercounts them (the road rate is about 1.38 times the home rate, which lifts the 66 home hits to roughly 91), we end up with a total of 185 hits, an extra 25 hits that he wasn’t credited with.
Somebody a little better at data mining could do a better job of this than I can – actually taking the number of hits recorded at home and on the road from the play-by-play sheets and then adjusting them as I just have. For individual players on different teams though, this is one way to compensate for scorer bias.