December 03 2012 12:57PM
Dealing with sum data should inevitably make a fancy stats person a bit uneasy; sums perpetually have a wealth of additional factors a person needs to know before they try to conclude anything. For instance, let's say your team's prospect is Lukas Sutter, and you want to know what the hell is going on with his point totals (aka, sum data). Well, Sutter's ice time is suffering right now, and the penalties he's taking aren't helping him get out of the doghouse. Hence, low points. Let's also say I'm predicting that, had the season happened, Evander Kane was going to score 50 goals...would he be taking enough shots? Would he play enough games? Would he receive enough ice time? Et cetera, et cetera...
Well, one of my previous posts pointed out two things we know about team 5v5 time on-ice: a) it can be volatile and independent of team talent, and b) it has gradually increased over the years. This throws a little bit of a wrench into using raw 5v5 TOI/60 to look at player quality, although that wrench can pretty easily be removed.
My thinking is that you could control for those two elements of volatility by taking a player's 5v5 TOI and dividing it by the team's total 5v5 TOI in the games the player played. Whenever you do something like that, you want to make sure that you're actually improving predictability and either easing access to information or learning something new; otherwise there's no point in creating the new metric. So, how did creating 5v5% work for me?
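For anyone who wants to follow along, here is a minimal sketch of that calculation. The function name and the three-game sample are made up for illustration; the only thing taken from the definition above is the ratio itself, player 5v5 minutes over team 5v5 minutes in the games the player dressed for.

```python
def five_on_five_share(games):
    """Return a player's share of team 5v5 ice time, as a percentage.

    `games` is an iterable of (player_toi, team_toi) pairs in minutes,
    one pair per game the player appeared in. Games the player missed
    are left out, so the denominator only covers games played.
    """
    player_total = 0.0
    team_total = 0.0
    for player_toi, team_toi in games:
        player_total += player_toi
        team_total += team_toi
    if team_total == 0:
        raise ValueError("no team 5v5 time recorded")
    return 100.0 * player_total / team_total


# Hypothetical three-game sample: (player 5v5 minutes, team 5v5 minutes).
sample = [(15.0, 48.0), (13.5, 45.0), (16.5, 57.0)]
print(f"{five_on_five_share(sample):.1f}%")
```

Summing minutes before dividing (rather than averaging per-game percentages) keeps a game with unusually little 5v5 time from counting as much as a full one.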
The first thing I wanted to look at was straight-up correlation, year-to-year. If it's not performing any better than TOI/60, then there's not really a point in adjusting it.
Taking all NHLers who played at least 200 5v5 minutes two years in a row within the years 2007-08 through 2011-12, I found a correlation of 0.86 between the two years' 5v5% (n = 2,231), a slight improvement over TOI/60 (0.85). So that's nice, but not good enough.
I subsequently ran a number of further tests on different data, including correlation between every other year, correlation between three years' TOI and future TOI, and correlation between career-to-date TOI and future TOI. Each time the correlation was a bit higher for 5v5%, anywhere from 0.01 to 0.05. Each test confirmed my suspicion that this might be a worthwhile adjustment.
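The basic year-to-year test can be sketched roughly as follows. This assumes a plain Pearson correlation and a dictionary keyed by (player, season start year); both the data layout and the sample numbers are hypothetical, and only the 200-minute cutoff comes from the text above.

```python
from math import sqrt

MIN_TOI = 200  # 5v5 minutes required in each of the two seasons


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def consecutive_pairs(seasons, min_toi=MIN_TOI):
    """Pair each qualifying season with the same player's next season.

    `seasons` maps (player, start_year) -> (toi_minutes, share_pct).
    Both seasons in a pair must meet the minimum 5v5 TOI.
    """
    pairs = []
    for (player, year), (toi, share) in seasons.items():
        following = seasons.get((player, year + 1))
        if following is None:
            continue
        next_toi, next_share = following
        if toi >= min_toi and next_toi >= min_toi:
            pairs.append((share, next_share))
    return pairs


# Hypothetical data, for illustration only.
seasons = {
    ("A", 2009): (1100, 33.0),
    ("A", 2010): (1150, 34.0),
    ("A", 2011): (1000, 32.0),
    ("B", 2009): (700, 22.0),
    ("B", 2010): (150, 18.0),  # under the cutoff, so B contributes no pair
    ("C", 2010): (900, 27.0),
    ("C", 2011): (950, 28.5),
}
pairs = consecutive_pairs(seasons)
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(len(pairs), round(r, 2))
```

Running the same two functions on TOI/60 instead of 5v5% would give the side-by-side comparison; the every-other-year variant only needs `year + 2` in place of `year + 1`.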
But there's more than just testing that causes me to like 5v5%, because when you control a measure for things that are random, like the fluctuations of team 5v5 TOI and era effects, you can be safer in your cross-team and cross-era comparisons. You can also run a number of more interesting studies, such as "hmm, how did this trade affect x player's playing time?" or "hmm, would x player be getting a larger slice of the pie on a different team?" Rather than needing to check if a change in 5v5 TOI/60 is beyond the normal team fluctuations in the measure, you've already controlled for that.
I also wanted to look at predictability. How much time does a player need to log before I become increasingly positive it can be predictive? So, I plotted a rolling average (n = 50) of correlations, previous 5v5% to following 5v5%, and sorted by to-date 5v5 TOI.
Good deal, it's behaving like we should expect. About two to three years' data is sufficient to predict 5v5 percentage. Notice the drop-off after 4,000 even-strength minutes, where a host of things including shifting player personnel, age effects, and player development can eventually alter your predictions. This is always important to remember - you want the most recent years, typically, for predictions, not career data.
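One plausible way to reproduce that rolling view, sketched under assumptions: sort the observations by to-date 5v5 TOI, slide a fixed-size window across them, and compute the previous-to-following correlation inside each window. The record layout, the function names, and the tiny sample are hypothetical, and a window of 3 is used here only so the example fits on the page (the chart above used 50).

```python
from math import sqrt


def _pearson(xs, ys):
    """Pearson correlation; returns None when either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def rolling_correlation(records, window=50):
    """Correlation of previous vs. following 5v5% in sliding windows.

    `records` is a list of (to_date_toi, previous_share, following_share).
    Records are sorted by to-date TOI; each output point is
    (mean to-date TOI in the window, correlation within the window).
    """
    if window < 2:
        raise ValueError("window must hold at least two records")
    ordered = sorted(records, key=lambda rec: rec[0])
    points = []
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        mean_toi = sum(rec[0] for rec in chunk) / window
        corr = _pearson([rec[1] for rec in chunk], [rec[2] for rec in chunk])
        points.append((mean_toi, corr))
    return points


# Hypothetical records, deliberately out of TOI order.
records = [
    (500, 20.0, 21.0),
    (1500, 30.0, 29.0),
    (1000, 25.0, 26.0),
    (2500, 35.0, 34.0),
    (2000, 28.0, 30.0),
]
for mean_toi, corr in rolling_correlation(records, window=3):
    print(round(mean_toi), corr)
```

With real data, plotting the second element of each point against the first should give a curve of the kind described above, though the exact shape will depend on the window size.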
I can already hear the critiques concerning those player personnel shifts; of course, if you were using 5v5 TOI/60 in the first place, you'd still have that issue...in fact, 5v5% makes it a bit easier to measure those effects. Additionally, if you are detecting stability in a player's 5v5% despite many shifts in player personnel, you can wager that person is being pretty consistently assessed by it (and with a 0.86 correlation, that should often be the case).
The whole reason I came across this was because I was looking for something that might improve how I was predicting player TOI for the fantasy predictions series. Yet 5v5%, I think, has applicability well beyond that and provides the opportunity for a lot of interesting studies.
What I also found interesting is that, at least for 5v5%, it sits around the league average pretty heavily (about 27.5% for the entire player population; roughly 33% for defensemen and roughly 25% for forwards). Here is a rolling average chart of 5v5% versus age, using player performances of 200 5v5 TOI and up (n = 3,335):
The standard deviation trendline is almost identical to the above in form and even value, which really gives you a sense, along with the strong correlation, that this is a pretty rigid deployment indicator. Now, if you take a look at the percentage change of 5v5% in individual cases as you move from one age to the next, you see a nice development curve:
The solid line is the change from the previous season to the current season, and the dotted is the change from the current to the next season. The x axis is the age during the previous season.
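A development curve like this one could be built along the following lines. This is a sketch under assumptions: it averages each player's percentage change in 5v5% from one age to the next, keyed by the age in the earlier season, and the tuple layout and sample values are invented for illustration.

```python
from collections import defaultdict


def mean_pct_change_by_age(player_seasons):
    """Average percentage change in 5v5% from each age to the next.

    `player_seasons` is a list of (player, age, share_pct) tuples.
    The result maps the age in the earlier season to the mean percent
    change among players who have both that age and the next on record.
    """
    share_at = {(player, age): share for player, age, share in player_seasons}
    changes = defaultdict(list)
    for (player, age), share in share_at.items():
        following = share_at.get((player, age + 1))
        if following is not None and share > 0:
            changes[age].append(100.0 * (following - share) / share)
    return {age: sum(vals) / len(vals) for age, vals in sorted(changes.items())}


# Hypothetical player-seasons, for illustration only.
sample_seasons = [
    ("A", 20, 20.0),
    ("A", 21, 25.0),
    ("B", 20, 25.0),
    ("B", 21, 27.5),
    ("B", 22, 27.5),
]
curve = mean_pct_change_by_age(sample_seasons)
print(curve)
```

Here player A gains 25% and player B gains 10% between ages 20 and 21, so the curve's value at age 20 is their average; the dotted "current to next" line would be the same calculation shifted by one season.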
So, enough with the scholarly frou frou, let's put some faces to these numbers. One thing 5v5% doesn't resolve is the disparity between defensemen and forwards, but this could be a critical distinction that we don't recognize often enough: forwards score more points, but defensemen are individually more involved in team possession. Anyway...your top 10 5v5% active defensemen and forwards (defensemen first, then forwards) logging 100+ GP over the last five years:
- Duncan Keith - CHI - 41.3%
- Jay Bouwmeester - CGY - 39.9%
- Joni Pitkanen - CAR - 39.5%
- Ryan McDonagh - NYR - 39.1%
- Francois Beauchemin - ANA/TOR - 39%
- Dan Boyle - SJS - 38.9%
- Brian Campbell - FLA/CHI - 38.804%
- Zdeno Chara - BOS - 38.802%
- Dion Phaneuf - TOR - 38.7%
- Willie Mitchell - LAK/VAN - 38.1%
- Jarome Iginla - CGY - 34.7%
- Martin St. Louis - TBL - 34.33%
- Alex Ovechkin - WSH - 34.29%
- Ryan Getzlaf - ANA - 34.2%
- Ilya Kovalchuk - NJD - 34.1%
- Corey Perry - ANA - 33.6%
- Dany Heatley - MIN - 33.4%
- Sidney Crosby - PIT - 32.733%
- Evgeni Malkin - PIT - 32.730%
- Eric Staal - CAR - 32.67%
And, because they can be interesting to look at, too, our bottom 10 defensemen and forwards (okay, some of these guys are semi-"active"...in which case I'll put most recent team):
- Alec Martinez - LAK - 26.8%
- Adam McQuaid - BOS - 26.7%
- Cody Franson - TOR - 26.6%
- Keaton Ellerby - FLA - 26.1%
- Jordan Hendry - ANA - 25.2%
- Matt Smaby - ANA - 25.1%
- Derek Joslin - VAN - 23.7%
- Peter Harrold - NJD - 23.5%
- Derek Meech - WPG - 21.9%
- Doug Janik - DET - 21.3%
- Krys Barch - NJD - 14%
- Raitis Ivanans - CGY - 13.9%
- Brad Staubitz - ANA - 13.6%
- George Parros - FLA - 13.4%
- Paul Bissonnette - PHX - 11.4%
- Darcy Hordichuk - EDM - 11.39%
- Cam Janssen - NJD - 10.6%
- Eric Godard - DAL - 9.8%
- David Koci - COL - 9.1%
- Brian McGrattan - NSH - 8.3%
Probably not a lot of surprising names there, especially among the forwards. In the future, I'm going to look at this same kind of metric with regard to 5v4 and 4v5 times, as these will tell us different but nevertheless valuable things as well.
As usual, my research is indebted to Gabe Desjardins and his invaluable website, behindthenet.ca.
PREVIOUSLY BY BENJAMIN WENDORF