The Theory and Nature of Current Advanced Hockey Analysis

Kent Wilson
May 11 2012 09:16AM



(NHLNumbers will occasionally publish some of our authors' archival material. This article was originally posted on July 23rd, 2011)

Before we move forward, it makes sense to discuss and explicate the basis of the analyses we tend to engage in here: the theory, the application and the practicality of the stats we employ, and the resultant interpretations.

Robert Cleave's article on Rene Bourque is an illustration of one of the basic tenets of analysis that is frequently overlooked: the game of hockey isn't one of totals but of differentials, of ratios. In 2009-10, Bourque scored 27 goals and 58 points. Last season, he managed a similar stats line: 27 goals and 50 points. Judging by the boxscores, Bourque was marginally worse than his career peak the prior year. In truth, he took a drastic step back.

His stark devolution as a player was somewhat obvious to regular observers of the team last season, but it is fully revealed when we peel back the thin but opaque layer of counting stats and delve into the numbers underneath: Bourque spent more time in the defensive zone relative to the prior year and more time getting outchanced and outscored. His presence caused a dip in scoring chance ratios with other players across the board. He was, by and large, a detriment. The eight-point dip in his stats line was only the merest hint of his decline.

Hockey is about getting more, not lots. The distinction is an important one, because the latter does not necessarily guarantee the former. The statement about totals and differentials above is an axiom I've come to accept since I began writing about the game. A player who drives differentials helps teams win, and oftentimes the basic counting numbers have only a passing relationship to his actual value.

On Books and Covers

"Counting numbers" is the name given to the familiar, conventional stats everyone recognizes and references in their analysis (goals, assists, points, etc.). A better moniker would probably be "surface stats". They are the seemingly calm sea above a roiling soup of antecedents that are largely hidden from view. For some guys, counting stats are but a pale reflection of their abilities. For others, the boxscores act as a facade, worn to mask significant warts. For the former, a stats line can be a thin veil obscuring his value or even a scarlet letter, a stigma unfairly imposed by the results that impugns his true quality. For the latter, the boxcars are a vanity and inexorably fleeting. The quest every summer to dump bad contracts previously signed by overeager managers could, I think, be dubbed the bonfire of the vanities.

The penchant for even NHL general managers to be dazzled by the superficial illustrates the seductiveness of surface stats to even the highest-level decision-makers in the game. Results, after all, are what everyone is ultimately after, so it remains forever tempting to chase results in the pursuit of success. Doing so means inverting the causal chain, however: true analysis is understanding the variables and agents that give rise to outcomes. The means to the end rather than just the end itself. The coal before the diamond, if you will.

This is perhaps the crossroads at which conventional thought and so-called "advanced stats" most frequently clash. Every hockey fan's (coach's, GM's, etc.) perception of a player is inevitably anchored by heuristics: "rules of thumb" which evolve from information that is perceptually impactful or easily available. Human cognition generally works by conjuring habitual psychological markers that act as lighthouses in the maelstrom of data that is life. The problem is, a heuristic is not a principle: it is a quick-start guide at best, an inherent bias at worst. It isn't the template. It is the stereotype.

The conflict occurs when analysis of the underlying numbers disagrees with the connotations we attach to the surface results. Think about the common mental signposts that are almost universally employed. A 20-goal scorer is a pretty good player, right? What comes to mind when one thinks of a 50-point player, for instance? How about a 10-goal forward versus a 10-goal defender? The automatic mental ranking that is next to unconscious for the experienced hockey fan is the activity of heuristics - rules of thumb - that are essentially functional assumptions in aggregate, but not necessarily accurate in the specific.

If we were to draw a Venn diagram of the common perceptions of players based on results/surface stats and their true value or skill level, there would likely be some overlap. All things being equal, a 20-goal scorer probably is a pretty good hockey player.

All things are rarely equal, however, which is where the two tracks often part ways. Some 20-goal scorers play against lesser opponents, spend a lot of time on the powerplay, or boast a career-high SH%. Others play against superstars and start out more often in the defensive zone. The surface stats say the two guys are roughly equal. The heuristics we've developed decide that there probably isn't much to separate them. But, in truth, one is far more valuable to his team than the other.

As humans, we tend to cling to already held beliefs even in the face of competing evidence. In some ways, confirmation bias ensures our cognitive landscapes aren't consistently upended by new information. In others, it means we reject what might be accurate or true because it doesn't accord with what we believe or prefer. In the first, it means we aren't overly gullible or endlessly indecisive. In the second, it means we're stubborn or willfully ignorant. That is the inevitable tug-of-war we all wage and the framework through which data and analysis are filtered.

Possession-Based Analysis

The theoretical value and legitimacy of the corsi statistic and possession-based analysis is a hotly debated topic in some quarters as a result. Not only is it new and therefore relatively unknown and untested in the eyes of most, but it sometimes flies in the face of long-held beliefs or seemingly common-sense conclusions. Here I will present the general theory surrounding the corsi school of thought, the mounting evidence for its efficacy, as well as the practical applications.

As mentioned, surface stats are the effect of an interplay of causes. Goals, assists and goal differential are determined by two primary factors: volume of shots for and against and the frequency of goals scored for and against. We'll call the former possession and the latter percentages.
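
To make that split concrete, here is a minimal sketch of the arithmetic: goals for are shots for times shooting percentage, and goals against are shots against times one minus save percentage. The function name and both sample teams are hypothetical, chosen only to illustrate the two levers, and are not real team data.

```python
def goal_differential(shots_for, sh_pct, shots_against, sv_pct):
    """Goal differential built from shot volume and the percentages."""
    goals_for = shots_for * sh_pct
    goals_against = shots_against * (1.0 - sv_pct)
    return goals_for - goals_against

# Hypothetical team that outshoots opponents with ordinary percentages.
possession_team = goal_differential(2000, 0.08, 1800, 0.92)

# Hypothetical team that gets outshot but runs hot on both percentages.
percentage_team = goal_differential(1800, 0.09, 2000, 0.93)

print(round(possession_team, 1), round(percentage_team, 1))
```

Both made-up teams finish with a positive goal differential (roughly +16 and +22), but they get there by different routes: the first through shot volume, the second through the percentages.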

Possession in this context is short-form for "possession of the puck in the offensive zone". Teams that control possession at even strength tend to have higher shot counts overall as well as better corsi differentials or ratios. The corsi stat is best conceptualized as a proxy for zone time: a high corsi differential or ratio means a player or team spends more time in the offensive zone, and vice versa.

The importance of possession has been demonstrated over and over by various investigations: the correlation between corsi ratios and scoring chance ratios is persistently high (ranging from 0.7 to 0.9), for instance, and JLikens of the Objective NHL has shown that the correlation between corsi and outscoring at even strength is on the order of 0.5-0.6 over a sufficiently large sample. Possession stats tend to persist as well (all things being equal), meaning corsi is reliably measuring a skill rather than merely chance or some other variable.

Percentages, on the other hand, tend to regress to the mean over time. This suggests that natural variance - or "luck" - has a stronger influence on them than ability.

To understand this, most people need to strip "luck" of the connotations it carries regarding issues of fairness and justice. A lot of discussions get sidetracked when variance is mentioned, because people bristle at the suggestion that a given athlete or team on a hot streak is undeserving of their success. An analogy might help:

Consider each shot on net to be a lottery ticket, with the prize being a goal. Some tickets have a higher chance of winning than others: the chances range from about 3% (unscreened point shots, shots from very sharp angles) to about 25% (crease shots, break-aways, etc.). Mid-range scoring chances tend to fall in the middle - a lottery ticket with about a 15% chance of winning. In just about every game, there are a lot more lower-quality chances than higher-quality ones, which is why even mediocre goalies have a SV% at or about .900 in the league. Percentages or chances of scoring in the NHL seem rather low because of the quality of the competition between players, the quality of netminding in the modern league, the size and strength of the guys in general, the advancement of equipment, the proliferation of advanced scouting, etc.

- cartoon via wondermark
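
As a quick sanity check on the lottery-ticket figures, the sketch below weights the three rough scoring chances from the analogy by a share of shots of each type. The shot mix is purely an assumption picked for illustration (mostly low-quality shots), not measured data, and the names are hypothetical.

```python
# Chance of scoring per ticket type, using the rough figures from the analogy.
GOAL_PROB = {"low": 0.03, "mid": 0.15, "high": 0.25}

# ASSUMPTION: an illustrative share of shots by type, not measured data.
SHOT_MIX = {"low": 0.60, "mid": 0.30, "high": 0.10}

def expected_save_pct(goal_prob, shot_mix):
    """Return 1 minus the shot-weighted average chance of a goal."""
    goals_per_shot = sum(goal_prob[kind] * shot_mix[kind] for kind in shot_mix)
    return 1.0 - goals_per_shot

print(f"{expected_save_pct(GOAL_PROB, SHOT_MIX):.3f}")  # prints 0.912
```

With that assumed mix, the blended save percentage lands a little above .900, which is in line with the idea that the flood of low-percentage tickets is what keeps save percentages where they are.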

To score, teams try to up their number of relatively high-percentage shots or lottery tickets every game by driving the puck into the middle of the ice and closer to the net. They also do everything possible to restrict the opposition from doing the same. This is what we're trying to capture with possession stats: finding players and teams that spend more time in the offensive zone, driving scoring chances for and/or lessening scoring chances against. Remember, the correlation between corsi and scoring chance differentials is persistently high.

Of course, goals are relatively random events and randomness tends to be rather untidy in small samples. Some games, the teams who outchance the bad guys don't win: they'll scratch more high-probability (15%-25%) tickets, but won't end up with as many winners, purely as a matter of variance. After all, 15% is 15 out of 100, or just 1.5 out of ten. If you were to gather 100 lottery tickets, each with a 15% chance of winning, it doesn't mean the spread of winners would be uniform over each, say, 10-ticket sample. In some, you might get 4 or 5 winners. In others, none. This is why we generally regard the percentages as fickle.
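
For anyone who wants to see how untidy a 10-ticket sample can be, here is a minimal sketch using the binomial distribution. It assumes every ticket wins independently with the same 15% chance, which is a simplification of real shots, and the helper name is made up for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k winners among n independent tickets."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.15  # ten tickets, each with a 15% chance of winning

p_none = binom_pmf(0, n, p)
p_four_plus = 1.0 - sum(binom_pmf(k, n, p) for k in range(4))

print(f"no winners: {p_none:.3f}, four or more: {p_four_plus:.3f}")
```

Under those assumptions, a 10-ticket batch comes up completely empty about 20% of the time and hits four or more winners about 5% of the time, even though the average is only 1.5.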

In time, however, the chap who collects the most high-probability lottery tickets is probably going to win the most. And the more tickets he collects relative to the other guy, the greater the chance he has of winning. Of course, short of holding all the tickets (versus none), there's always the slim chance he'll lose.
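
The "in time" part can be sketched the same way. The example below compares one game with an 82-game stretch for a hypothetical team that takes 30 shots a night against 25, with every shot treated as an independent ticket at a flat 9%. All of those numbers, and both function names, are assumptions for illustration only.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial probability, computed in log space so large n stays stable."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def prob_more_winners(n_a, n_b, p):
    """P(A has strictly more winners than B), holding n_a and n_b tickets."""
    total = 0.0
    cdf_b = 0.0  # running value of P(B <= a - 1)
    for a in range(n_a + 1):
        total += binom_pmf(a, n_a, p) * cdf_b
        if a <= n_b:
            cdf_b += binom_pmf(a, n_b, p)
    return total

# ASSUMPTION: 30 vs. 25 shots per game, flat 9% chance per shot, 82 games.
one_game = prob_more_winners(30, 25, 0.09)
full_season = prob_more_winners(30 * 82, 25 * 82, 0.09)

print(f"one game: {one_game:.2f}, full season: {full_season:.2f}")
```

Ties count against the ticket collector here, so the single-game figure is conservative. Even so, the same shot edge that is close to a coin flip on any one night becomes a heavy favourite over a full season, without ever reaching certainty.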

The possession versus percentages issue is basically one of skill versus variance or luck. A few commenters and analysts hold out a belief that the percentages can be driven in the absence of, or even in contrast to, possession, but most analyses agree that NHL players and teams have far more ability to drive possession, whereas the percentages are more or less at the mercy of the hockey gods.

Most advanced stats analysis is currently focused around this general theory. The bulk of ongoing inquiries is aimed at teasing apart the variables that moderate possession at both the individual and team level. Some of these factors include quality of linemates, quality of opposition and starting position. The overarching goal is to isolate individual contributions to possession, be it from the players themselves, or coaching systems, face-off zones, playing-to-score effects, the nature of different positions, etc.

These are the multitude of variables that go into determining a player's or team's surface stats. Viewed through the prism of possession analysis, we are just now starting to appreciate the influence all of these varying factors have on corsi (and therefore on scoring and goal differential), how variance muddies the waters, and how all of it shapes the resultant perceptions of a player's abilities.

In truth, we have only made a few cautious steps forward in terms of truly extricating a player's true skill level quantitatively from all the competing noise. But I'd argue we're on the right track.

Practical Applications

Obviously, prediction of the future is the ultimate goal of such analysis. The practical application of possession theory has allowed me and others to predict a number of outcomes over the last few years, including:

- The fact that Olli Jokinen wasn't the #1 center the Flames were hoping for

- The Colorado Avalanche's Cinderella season being an aberration

- The Panthers' decision not to trade Jay Bouwmeester at the deadline being a poor one

These are only a few examples. I invite others to share more in the comments.

Of course, there have been errors too. As mentioned, hockey quantitative analysis is only now getting beyond its infancy. Further complicating matters is the fact that there are always variables and events that can't be foreseen: injuries, the effect of unknown or unpredicted additions/subtractions to the roster, locker room discord, etc. The future can never be fully known. That's what makes the whole thing worth watching, after all.

The number of correct "hits" has improved, however, and the body of evidence surrounding this school of thought grows larger all the time. To dismiss it out of hand would be folly, I think. This discussion of the topic wasn't exhaustive, but I hope it shed some light on the broader issues at hand for those who were confused or unsure.

Former Nations Overlord. Current FN contributor and curmudgeon. For questions, complaints, criticisms, etc., contact Kent @ kent.wilson@gmail. Follow him on Twitter here.