Patrick D. (SnarkSD)
January 10 2013 04:15PM
This statistical study of the percentages (aka: PDO) came to us from Sharks fan and writer SnarkSD. It is an in-depth, thorough investigation, so be sure to settle in for a long, but interesting, read. We will try to answer any questions in the comments for anyone not well versed in statistical theory.
Objective: Define the standard deviation of all-strength (non-shootout) PDO and of even-strength PDO excluding empty-net situations (PDO being the sum of shooting percentage and save percentage). Separate the variance into the part accounted for by chance and the part accounted for by talent. Investigate variables that may influence PDO. Calculate points-per-game (aka Expected Points, or EP) for a league average team, given a PDO other than 1.
Methods: Data from the 2005-2006 through 2011-2012 seasons were extracted from NHL.com game logs and imported into Excel and STATA to generate linear and logistic models for all-strength data. Even-strength, excluding-empty-net data were imported from timeonice.com. A program in Excel was created to randomly select games, generating thousands of iterations for each regression model; the averages of these models are shown below. To generate chance data, the normal approximation interval for binomial proportion confidence intervals was used.
Results: The standard deviations, correlations, and standard errors of the model over different samples of games were generated and are shown in Table 2. E.g., 1 SD at 30 games for all teams (N=210) is 0.0174, with 1 SD for chance being 0.0140. Figure 2 illustrates Table 2 over a set of 82 games. The red line marks ±3 SD from chance; teams that fall outside it are highly likely to owe their low or high PDO to poor or exceptional performance, respectively. League average points-per-game (EP) for a single game is shown in Figure 3. Figure 4 shows EP for multiple game samples.
Conclusion: PDO has boundaries of normal variance, both in observed variance, and variance strictly by chance. With the data below one can measure the expected performance in points-per-game for a given PDO. A deviation from the expected results (ie. points-per-game) indicates a deviation in performance outside of PDO, likely the result of shootout and OT performance if not corrected for, followed by factors that influence PDO intra-game, shot%, home/road, and competition.
Since its inception, PDO has gained considerable attention for its descriptive ability and poor repeatability. Simply, PDO is a player's or team's shooting percentage plus save percentage (goals for / shots for + [1 - goals against / shots against]). Over the course of a typical NHL hockey game, the team with the higher all-strength PDO is nearly guaranteed the victory.
So you might assume that general managers would be insane not to maximize PDO, but as you will see, PDO is very flighty. Much like BABIP to baseball sabermetricians, a large deviation from PDO’s mean of 1, for either player or team, is a red flag. It is understood that this large deviation is unsustainable, and will regress to the mean. When it does, the team can be expected to win far fewer (or far more) games than it had been winning. However, at least to my knowledge, no one has clearly demonstrated how large a deviation from the mean is to be expected over specific samples of games.
I recently concluded a similar scenario had played out with the Sharks in their recent 2011-2012 season, where a mid-season collapse almost left them out of the playoff picture. Despite attributing much of their failure to poor PDO, and therefore luck, I was left with many questions about PDO, specifically:
1. How do we know if a team’s PDO is lucky/unlucky vs. something they’re actually doing wrong? Is it right to assume that all deviations from 1 are unlucky?
2. Could other factors influence PDO, like home vs. road, power-play opportunities, empty net effects, or level of competition?
And lastly, somewhat a reverse of the above;
3. How would an average team perform; given a PDO that isn’t 1?
At the end of the article, I provide some examples of how to use this information to interpret PDO over a given set of games.
Data from the 2005-2006 through 2011-2012 seasons were extracted from NHL.com game logs into Excel, including all 17,220 inter-lockout games, to ensure the best results possible. I also extracted even-strength, excluding-empty-net data from TOI for the past 5 years to look at that data set briefly, although this was not a complete dataset (a few games are missing per season, usually as a result of incomplete NHL RTSS feeds).
I first generated several models of PDO auto-regression for discrete samples of random games. I randomized the sequence of games for each team. For each team (N=30 teams x 7 years = 210) in the data set, 2 values were generated.
The first value is the PDO of the sample of games of interest (ie. 1, 5, 10, 20 and so on), and the second is the PDO for the remaining games in that randomized season. My choice to analyze the data with the second value as “games remaining” instead of another random set of equal number of games revolves around the idea of what we are trying to predict. Often when a team starts the season with a high (or low) PDO, we want to predict their true PDO (that is, their PDO at the end of the season) and not necessarily what we think their PDO over the next 10 games will be.
I used these 2 arrays to generate a correlation, standard deviation, and standard errors for our auto-regression models. I then ran these models 500 to 1000 times (to account for the randomization) and took the median for each value. These values give us information about how real teams perform in terms of PDO over specific samples of games.
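The split described above can be sketched in Python. This is an illustration under stated assumptions, not the original Excel program: the per-game tuple layout and every function name are hypothetical, and the demo data are synthetic games generated purely by chance (30 shots a side at 9.5% shooting), not NHL results.

```python
import random
import statistics

def pdo(games):
    """PDO over games given as (goals_for, shots_for, goals_against, shots_against)."""
    gf, sf, ga, sa = (sum(col) for col in zip(*games))
    return gf / sf + (1 - ga / sa)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_correlation(team_seasons, n_sample, iterations=500, seed=0):
    """Median correlation between PDO in the first n_sample games of a
    shuffled season and PDO in the remaining games, across team-seasons."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(iterations):
        first, rest = [], []
        for games in team_seasons:
            order = games[:]          # copy so the caller's data is untouched
            rng.shuffle(order)        # randomize the sequence of games
            first.append(pdo(order[:n_sample]))
            rest.append(pdo(order[n_sample:]))
        correlations.append(pearson(first, rest))
    return statistics.median(correlations)

def make_chance_season(rng, n_games=82, shots=30, p=0.095):
    """Synthetic season in which every shot scores with probability p (no talent)."""
    def goals():
        return sum(rng.random() < p for _ in range(shots))
    return [(goals(), shots, goals(), shots) for _ in range(n_games)]

if __name__ == "__main__":
    rng = random.Random(42)
    seasons = [make_chance_season(rng) for _ in range(210)]
    r = split_correlation(seasons, n_sample=20, iterations=20)
    print(f"20 -> 62 split, chance-only data: median r = {r:.3f}")
```

With chance-only data the correlation should hover near zero; run against real game logs, the same routine would yield the correlations reported for each sample size.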
Next, I created what I call the "random models". Using the normal approximation interval for calculating binomial proportion confidence intervals, we can calculate the expected standard deviation for any proportion, (ie. a discrete, independent event that results in either a success or failure) over a sample of shots, occurring strictly by chance, or the distribution we would get if we were rolling dice.
The standard deviation is SD = sqrt(p(1-p)/n), where "p" is the Sh% and "n" is the number of observations. In our case I used the average number of shots per game multiplied by the sample of games, e.g. 29.9 x 5 = 149.5 events for a 5-game sample.
These values represent the absolute random variation one might expect for PDO. A comparison of the “real” models and “random” models allows us to calculate the “talent” portion of PDO over a given sample of games. This essentially answers our first question "how do we know if a team's PDO is lucky/unlucky?"
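The "random" model can be sketched in a few lines of Python. This is a minimal sketch rather than the author's workbook: it assumes both teams take the league-average 29.9 shots per game and that shooting and save percentage vary independently around 0.095 and 0.905, so their variances add (the same assumption the Excel formula later in the article makes). The function names are hypothetical.

```python
import math

LEAGUE_SH_PCT = 0.095    # league-average all-strength shooting percentage (from the article)
SHOTS_PER_GAME = 29.9    # average shots per team per game (from the article)

def proportion_sd(p, n):
    """Normal-approximation SD of a binomial proportion p observed over n events."""
    return math.sqrt(p * (1 - p) / n)

def pdo_chance_sd(games, p=LEAGUE_SH_PCT, shots_per_game=SHOTS_PER_GAME):
    """SD of PDO expected from chance alone over a sample of games.

    Assumption: shots for and shots against both equal shots_per_game * games,
    and Sh% and Sv% are independent, so the two binomial variances are summed.
    """
    n = shots_per_game * games
    return math.sqrt(2 * p * (1 - p) / n)

def chance_band(games, k=2):
    """Lower and upper PDO bounds at k chance standard deviations around 1."""
    sd = pdo_chance_sd(games)
    return 1 - k * sd, 1 + k * sd

if __name__ == "__main__":
    for g in (5, 10, 20, 30, 41, 82):
        lo, hi = chance_band(g, k=2)
        print(f"{g:>2} games: chance SD = {pdo_chance_sd(g):.4f}, "
              f"2 SD band = {lo:.3f} to {hi:.3f}")
```

Under these assumptions the 30-game chance SD comes out at roughly 0.014, and the 10-game 2 SD band at roughly 0.952 to 1.048, in line with the figures quoted elsewhere in the article.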
Next I ran a regression of the factors that can reasonably influence a single game’s PDO. The variables initially selected for review (with their correlation to PDO) can be found in table 1. I then choose 4 variables, home/road, power-play differential (PP opportunities – PK opportunities, although PP time – PK time would have been better but not available), shot% (all shots for – all shots against), and PtDiff (year-end standings points – year end standings points of the opponent, excluding the points obtained in that particular game). This gives us the factors that influence single game PDO, answering our second question, "could other factors influence PDO?"
I then analyzed single game outcomes. Our dependent variables are discrete 0, 1, or 2 contingent on the game outcome (standings points), whether we were looking at games ending in regulation, OT, or shootout. I used the advanced statistical software package STATA to generate ordered logistic and logistic regression models for various game states and outcomes to arrive at predictive models. We looked at 2 variables that have by far the biggest impact on single game outcomes, PDO, and Shot%.
Next, I calculated the average points obtained, and the PDO for a given random sample of games (for each team), and ran a simple linear regression model for 9 different samples of games with points as the dependent variable, and PDO as the single independent variable. This provides an answer for our last question, "how would an average team perform with a non-1 PDO?"
1.) PDO Variance Due to Random Fluctuation vs. Fluctuation Due to Difference in Talent
|All-Strength||1 -> 81||5 -> 77||10 -> 72||20 -> 62||30 ->52||41 -> 41||50->32||60->22||70->12||75->6||81->1|
|EV excluding EN||1 -> 81||5 -> 77||10 -> 72||20 -> 62||30 ->52||41 -> 41||50->32||60->22||70->12||75->6||81->1|
The first graph is a good way to visualize table 2. You can see that as the number of games increases, the collective standard error (SE) of the mean decreases, that is, we could argue that a team’s PDO tightens closer to their true mean (1 for a league average team) as teams play more games. This has often been expressed as regression to the mean; but what’s being shown is how much regression over a given sample of games. As you can see in the table, the standard error parallels the actual standard deviation almost precisely, and the correlations remain very low suggesting heavy regression to the mean.
If we use Tom Tango’s method for calculating regression, we can approximate the proportion of variance in PDO attributed to talent (by subtracting our randomly generated variance from our observed variance).
For those curious, Tango’s constant comes out to be around 70 games for all-strength data, which is the number of games a team would need to play for the talent portion of PDO to equal the random portion. Over an 82-game season, the “talent” and “random” parts of PDO each account for about ± 6 points. It turns out that even-strength, excluding-empty-net data is influenced much more by randomness. There, Tango’s constant comes out to 293, the number of games required before the SD due to talent is equal to the SD due to randomness, i.e. r = 0.5. At the end of the season, the SD from the “random” portion is ± 5 points, and the “talent” ± 4 points.
The graph above may be the most important and most useful graph of everything presented in this study. The graph represents the random variation expected over a sample of games for PDO (see Methods above for model details), indicating 3 standard deviations in red. The gray area represents greater than 2 SD. This is the “danger” area in which a PDO would be unlikely to be strictly due to chance, although still possible. Essentially the graph represents the amount of “give” we might suspect for a team’s PDO (y-axis) over a sample of games (x-axis).
To give an example, at 20 games we expect a PDO of 1 (for a league average team). If we asked this team to play 20 games, 100 times (for a total of 2000 games), we expect their PDO to fall roughly between 0.98 and 1.02 or ± 0.02 if it were due strictly to random fluctuation as predicted by our “random” model. This is 2 standard deviations (roughly the 2.5th and 97.5th percentiles) from the mean. A deviation from this (<0.98 or >1.02) would lead us to believe that their PDO is likely not the result of random deviation. League average teams do go through stretches of rough games (home/road trips, competition, injuries, etc.), which accounts for the wider SD in our observed data. The goal of this study is not to address all of the controllable factors, just to draw a line in the sand.
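As a rough illustration of how such a constant is used, the sketch below assumes the usual form of the regression estimate, r = n / (n + k), where k is the constant (the number of games at which r = 0.5). The constants 70 and 293 are the ones quoted above; the function names and the example PDO are hypothetical.

```python
ALL_STRENGTH_K = 70     # games at which talent and random portions are equal (from the article)
EV_EXCL_EN_K = 293      # same, for even-strength excluding empty-net data (from the article)

def reliability(games, k):
    """Share of an observed PDO deviation kept after regression: r = n / (n + k)."""
    return games / (games + k)

def regressed_pdo(observed_pdo, games, k, mean=1.0):
    """Estimate of true PDO after regressing the observed value toward the mean."""
    r = reliability(games, k)
    return mean + r * (observed_pdo - mean)

if __name__ == "__main__":
    # Hypothetical team sitting at a 1.030 all-strength PDO after 20 games.
    r = reliability(20, ALL_STRENGTH_K)
    print(f"r = {r:.2f}, regression to the mean = {1 - r:.0%}")
    print(f"estimated true PDO = {regressed_pdo(1.030, 20, ALL_STRENGTH_K):.4f}")
```

The design choice here is simply that regressing by 1-r is equivalent to padding the sample with k games of league-average PDO.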
So, as you see, this graph is very valuable for analyzing a team’s PDO. We can use the lines as a test of statistical significance. Any team with a PDO that differs from 1.0 and enters into the gray zone, (eg. a team with a PDO greater than 1.048 or less than 0.952 at 10 games and so on) would lead us to believe that their PDO is a result of something that team is doing, and probably not normal variation. However, we would expect 1 in 20 teams to be in the gray area over any sample of games; or about 1.5 teams in the NHL. A real world analogy would be that if you were to go out and buy a lottery ticket, your chance of winning is random, and very unlikely. However someone always wins the lottery but it’s not because they have some talent for picking the right lottery numbers; it happened by chance despite incredibly long odds against.
The more stringent red line represents 3 SD, which leads us to believe that a given team’s PDO is very likely to be due to poor or great play; as only 1 in 500 teams are outside of these values (>1.07 or <0.92 for 10 games) by random fluctuation alone.
2.) Factors That Influence PDO
|Adjusted R Square||0.116433766|
Next we analyzed the factors that may contribute to PDO. In total, they do amount to a statistically significant difference in PDO (R^2 = 0.12), but are greatly outweighed by the inherent single game variability of PDO.
The largest drivers are not unexpected: Shot% accounts for the bulk of the movement, followed by power-play differential and home/road. Shot% has a negative slope, which brings score effects most to mind. Likewise, a power-play significantly influences PDO. Even while holding home/road and Shot% constant, PDO increases roughly 0.0025 per additional power-play. A team with a significant advantage, e.g. (2 SDs =) 5 additional power plays, boosts its PDO by 0.0125, or effectively 0.2 standings-points (EP) per game; put another way, it increases the team’s chance of winning 2 points by slightly over 10%. (For comparison, the standings-points value [also termed Expected Points] of an early first-period goal is about 0.3.)
The PDO value of being home is a bit interesting. If we assume teams at home generate more shots (average of 52% in this dataset), the PDO value of being at home, independent of Shot% is 0.0015. The unadjusted home/road model (not shown) puts the home team at about a 0.004 PDO advantage.
When filtering out empty net goals (not shown above), the effect of PtDiff falls to 0.0001, so that teams that are 40 points apart (2 SD away from each other) would represent an increase of only 0.004. These aren’t huge swings in PDO when one takes into account the single-game variability of PDO (2 SE) being ± 0.18. Not analyzed here is shot distance, which was not available in the dataset I was using. It’s possible that it contributes to PDO, and it needs further study.
3.) A League Average Team’s Expected Points by PDO Performance
|Ordered logistic regression Number of obs = 17220|
|LR chi2(4) = 23507.38|
|Prob > chi2 = 0.0000|
|Log likelihood = -4865.1364 Pseudo R2 = 0.7073|
|TeamPts | Coef. Std. Err. z P>|z| [95% Conf. Interval]|
|TeamPDO | 94.75349 1.451662 65.27 0.000 91.90828 97.59869|
|TeamShot | 32.06602 .6162182 52.04 0.000 30.85826 33.27379|
|PtDIff | .0067885 .0014487 4.69 0.000 .003949 .0096279|
|PPdiff | .0382537 .0127292 3.01 0.003 .0133048 .0632026|
|/cut1 | 108.1876 1.670732 104.913 111.4622|
|/cut2 | 110.8128 1.704624 107.4718 114.1538|
|Note: 1022 observations completely determined. Standard errors questionable.|
Lastly, we look at the impact PDO has on game outcomes. Our logistic regression gives us an idea of the effect PDO has on the outcome of single games. Table 4 is the output generated from STATA that gives us the proportions we are interested in. The graphs below are a much easier visual interpretation of that data. As is obvious, PDO has a dramatic effect on the likelihood of a team winning. As a corollary if we know that the correlation between the PDO of 2 games, (even if it involves the exact same teams at the same location) is essentially 0, then it’s safe to assume that on any given night, any team’s PDO is likely to be league average, with a normal distribution given a SD of 0.092. As a consequence the largely unpredictable (technically regresses 96% to the mean) PDO will completely determine the outcome of the game. While Shot% can have an impact on the game, it’s really just a modifier, shifting the PDO curve to the left and right. I also ran a logistic regression of games ending in regulation, which gives us a much tighter pattern as a result of filtering out the randomness of shootouts and overtime wins.
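To make the STATA output more concrete, here is a minimal sketch of how outcome probabilities follow from an ordered logistic model, using the coefficients and cut points printed in the table. It assumes the standard ordered-logit form, P(points <= j) = 1 / (1 + exp(-(cut_j - xb))), and it assumes TeamShot is entered as a share of shots between 0 and 1 (0.5 meaning even shots); the latter is an inference from the size of the coefficients, not something the article states. Function names are hypothetical.

```python
import math

# Coefficients and cut points copied from the ordered logistic output above.
COEF = {"pdo": 94.75349, "shot": 32.06602, "pt_diff": 0.0067885, "pp_diff": 0.0382537}
CUT1, CUT2 = 108.1876, 110.8128

def _logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def outcome_probabilities(pdo, shot_share=0.5, pt_diff=0.0, pp_diff=0.0):
    """Return (P(0 pts), P(1 pt), P(2 pts)) for a single game.

    Assumption: shot_share is the team's fraction of all shots (0.5 = even).
    """
    xb = (COEF["pdo"] * pdo + COEF["shot"] * shot_share
          + COEF["pt_diff"] * pt_diff + COEF["pp_diff"] * pp_diff)
    p_le_0 = _logistic(CUT1 - xb)   # cumulative probability of 0 points
    p_le_1 = _logistic(CUT2 - xb)   # cumulative probability of 0 or 1 points
    return p_le_0, p_le_1 - p_le_0, 1.0 - p_le_1

def expected_points(pdo, **kwargs):
    """Expected standings points for one game: 1*P(1 pt) + 2*P(2 pts)."""
    _, p1, p2 = outcome_probabilities(pdo, **kwargs)
    return p1 + 2 * p2

if __name__ == "__main__":
    for value in (0.90, 0.95, 1.00, 1.05, 1.10):
        print(f"PDO {value:.2f}: EP = {expected_points(value):.3f}")
```

Changing shot_share moves the whole curve left or right without changing its shape, which is the sense in which Shot% acts as a modifier.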
|All Strength SE||0.27||0.21||0.16||0.14||0.13||0.13||0.12||0.12||0.12|
|EV excluding EN Coeff||6.10||6.19||6.31||6.35||6.36||6.48||6.52||6.43||6.93|
|EV excluding EN Const||-4.99||-5.08||-5.20||-5.23||-5.25||-5.37||-5.40||-5.31||-5.83|
|EV excluding EN SE||0.33||0.24||0.18||0.16||0.15||0.14||0.13||0.13||0.12|
Finally, we take a look at our simple linear regression of Pts vs. PDO for multiple game samples. Table 5 gives the equation over different subsets of randomly ordered games. Clearly the data is uniform, and is virtually the same whether looking at a 5-game sample, or 82-game sample. The only change is in the standard error; which is the error estimate for the model. It’s no surprise that as we add games, our “average” error (average amount of points we’re off between the prediction from the model and the actual data) decreases.
I hope that this exhaustive look into PDO provides some more evidence for its substantial impact on the outcome of not just individual games, but also larger sets of games. I’d be hesitant to refer to PDO as entirely random, as I believe there are currently documented and undocumented factors that likely contribute to PDO, despite its heavy regression. In all, one would be wise to treat PDO not as a simple black and white stat, or just call it “luck.” What’s evident from the data above is that PDO does have boundaries of normal variance, which should guide anyone interested in analyzing a team’s PDO. With the data above and given a PDO, one can measure the expected performance; a deviation from it could signal true differences in talent.
• PDO regresses heavily to the mean with a sharp decline in the standard deviation as the number of games increases. Table 2 and Figure 2 provide the boundaries of variance due to luck over a given amount of games. The percent regression to the mean over the remaining games in the season, for a given set of games, can be calculated by taking 1-r in table 2.
• The factors examined for their influence on PDO were home/road, PPdiff (power-plays for minus power-plays against), PtDiff (year-end points minus year-end opponent points, excluding points accrued from that match), and Shot% (shots for minus shots against). Their effects were mostly minor but statistically measurable, in total explaining 12% of the variation in PDO.
o Shot% had the biggest impact; on average, an extra shot is associated with a 0.0038 decrease in PDO (or a 0.005 decrease in EV PDO, if using EV, non-empty net shots).
o The value of being home, independent of the gain in shot% (roughly 2%, or 2 extra shots), is 0.0014.
o An increase in the power-play differential by 1 increased PDO by 0.0025.
o An increase in the difference in year-end standings points by 1 increased PDO by 0.0006.
• Lastly, we can calculate how an average team would perform, given a specific PDO using table 5 (see below). This gives us an understanding of how a team is performing independent of their PDO.
• The talent and random portions of 1 SD of PDO both account for approximately ± 6 points at the end of the season.
• At about 70 games, the talent portion of PDO equals the random portion.
We can extrapolate from the above data to generate some interesting figures presented in the table below. We will use four notable examples from years past: three in which early-season rises in PDO were unsustainable (the 2009-2010 Colorado Avalanche, 2010-2011 Dallas Stars, and 2011-2012 Minnesota Wild), plus Bruce Boudreau’s 2011-2012 Washington Capitals.
The fastest way to the data is through timeonice for even-strength data, and [team].nhl.com/club/gamelog.htm, where [team] is the name of the team of interest. After calculating Sh%, Sv% and PDO, we can generate specific numbers of interest, or simply compare results to the graphs above. Here we will calculate the numbers...
Let’s first define some equations that will be useful:
Binomial equation: Var(chance) = p(1-p)/n, where Var(chance) is the variance expected from a totally random model (e.g. flipping coins), p is the average proportion, and n is the number of observations. If we take the square root of the variance, we get the standard deviation. This is an incredibly useful tool for generating the standard deviation due to chance of any event that results in a success or failure, e.g. goals per shot.
I think it also can be somewhat misleading. Although we use this model to estimate the proportion of variance due to luck, it’s important to remember that in actuality, it’s an estimate of variables that are too infrequent or non-quantifiable to be repeatable, and not simply “luck.”
Var(talent) = Var(observed) – Var(chance). Intuitively, if we subtract the variance due to random chance from an observed variance, we are left with some variance that is not strictly from chance. We assume this number to be the variance accounted for by talent. If we take the square root of Var(talent), we get the standard deviation for talent. Again, “talent” is a catch-all term for variables that are repeatable.
% regression to the mean (1-r) = Var(chance)/Var(observed). To calculate the % regression to the mean for any variable, we divide the variance due to chance by the observed variance. r is simply Var(talent)/Var(observed).
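These definitions can be checked with a short Python sketch. It plugs in the 30-game all-strength figures quoted in the Results summary (observed SD 0.0174, chance SD 0.0140); the function name is hypothetical.

```python
import math

def decompose_variance(sd_observed, sd_chance):
    """Split an observed SD into talent SD, r, and % regression to the mean.

    Var(talent) = Var(observed) - Var(chance)
    r           = Var(talent) / Var(observed)
    1 - r       = Var(chance) / Var(observed)
    """
    var_observed = sd_observed ** 2
    var_chance = sd_chance ** 2
    var_talent = var_observed - var_chance
    if var_talent < 0:
        raise ValueError("chance variance exceeds observed variance")
    r = var_talent / var_observed
    return math.sqrt(var_talent), r, 1 - r

if __name__ == "__main__":
    # 30-game sample, all teams (N=210), values from the article.
    sd_talent, r, regression = decompose_variance(0.0174, 0.0140)
    print(f"SD(talent) = {sd_talent:.4f}, r = {r:.2f}, regression = {regression:.0%}")
```

For the 30-game sample this gives a talent SD of about 0.010 and roughly 65% regression to the mean.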
Now, for each team we will calculate the standard deviation of PDO due to chance, SD(PDO_Chance). At this level we can calculate the SD(PDO_Chance) for the exact total number of shots a team was on the ice for, giving us a better estimate.
In Excel: =SQRT((.095*(1-.095)/[total shots for])+(.095*(1-.095)/[total shots against]))
Eg. for COL = SQRT((.095*(1-.095)/729)+(.095*(1-.095)/917)) = 0.015
0.095 can be substituted for any sh%; here we are using league average all-strength shooting percentage.
Next, we can generate a z-score for each team. The z-score is a comparison of standard deviations, often used to standardize variables with unrelated units. Here we use it to compare a team’s PDO distance from 1, with that expected by chance. The higher the number the less likely a team’s PDO is due to chance.
In Excel, =ABS(PDO-1)/SD(PDO_Chance)
Eg. for DAL =abs(1-1.017)/0.016 = 1.049. This tells us that Dallas' PDO was 1.049 standard deviations from the mean, if the distribution of PDO is entirely due to chance. 1.05 is essentially the 85th percentile. Often in research, we don’t accept values as statistically significant until they fall beyond the 95th percentile from chance (ie. 2 standard deviations), and in some areas, even further. Here we can see that DAL is only 1 SD out, well within the realm of an elevated PDO due to chance. Figure 2 above uses 3 SD (99th percentile), and thus we would need a z-score greater than 3 to be above the red line, representing a statistically significant deviation greater than expected from chance alone.
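The two spreadsheet formulas translate directly to Python. The sketch below uses the Colorado shot totals (729 for, 917 against) and the Dallas PDO and chance SD (1.017 and 0.016) given above; the function names are hypothetical, and the percentile uses the normal approximation.

```python
import math

def team_chance_sd(shots_for, shots_against, sh_pct=0.095):
    """SD of PDO due to chance, given the exact shots a team took and faced."""
    variance = sh_pct * (1 - sh_pct) / shots_for + sh_pct * (1 - sh_pct) / shots_against
    return math.sqrt(variance)

def pdo_z_score(pdo, chance_sd):
    """Distance of a team's PDO from 1, in chance standard deviations."""
    return abs(pdo - 1) / chance_sd

def normal_percentile(z):
    """Cumulative probability of a standard normal at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

if __name__ == "__main__":
    col_sd = team_chance_sd(729, 917)
    print(f"COL chance SD = {col_sd:.3f}")
    dal_z = pdo_z_score(1.017, 0.016)   # rounded inputs from the article
    print(f"DAL z = {dal_z:.2f}, percentile = {normal_percentile(dal_z):.1%}")
```

Note that the rounded inputs 1.017 and 0.016 give a z-score slightly above the 1.049 quoted in the text, which was presumably computed from unrounded values.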
Lastly we calculate expected points for a given PDO. A deviation from this represents the points accrued independent of PDO performance. Unfortunately the NHL awards bonus points for the coin flip that is the shootout. This drastically skews the points accrued by teams over small samples of games. One way to deal with this is to throw out games not ending in regulation. Alternatively, if you want a larger sample size, then converting SO or OT games to 1.5 points, instead of 2 or 1 could even out some of the luck associated with the extra periods.
In the column “Corrected Points” I changed any points gained from OT or SO to 1.5, exactly the expected average. Over short samples, it’s likely that teams will benefit/suffer from lucky/unlucky SO and OT streaks, altering their point totals quite a bit.
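A minimal sketch of that correction in Python, assuming each game is recorded as a (won, decided_in) pair where decided_in is "REG", "OT", or "SO". That record layout and the function names are hypothetical, chosen only for illustration.

```python
def raw_points(won, decided_in):
    """Standings points as awarded: 2 for a win, 1 for an OT/SO loss, 0 otherwise."""
    if won:
        return 2.0
    return 1.0 if decided_in in ("OT", "SO") else 0.0

def corrected_game_points(won, decided_in):
    """Corrected points: every game decided in OT or a shootout counts as 1.5."""
    if decided_in in ("OT", "SO"):
        return 1.5
    return 2.0 if won else 0.0

def corrected_points(games):
    """Total corrected points over a list of (won, decided_in) games."""
    return sum(corrected_game_points(won, decided_in) for won, decided_in in games)

if __name__ == "__main__":
    # Hypothetical four-game stretch: two extra-time wins inflate the raw total.
    stretch = [(True, "REG"), (True, "SO"), (True, "OT"), (False, "REG")]
    print("raw:", sum(raw_points(w, d) for w, d in stretch))
    print("corrected:", corrected_points(stretch))
```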
With corrected points in hand, we can see whether a team out-performed or underperformed a given PDO. The model is not absolutely perfect, but pretty good over shorter samples of games. Eg. for 20 games, the standard error is ± 3 points.
Let’s take the 2011-2012 Capitals prior to Boudreau’s firing. To calculate expected points given their PDO, we travel over to the 20 column and plug in the coefficient, PDO, and constant into our linear equation.
In Excel, =(Coeff*PDO+Const)*number of games played. For the Capitals: (8.92*0.992-7.80)*22 = 1.05*22 = 23 points
The conclusion we can draw from the above is that the Washington Capitals actually outperformed their PDO of 0.992, collecting 25 points over that span. Correcting for SO/OT, their 23 points is completely league average. And just for fun, we see their PDO z-score of 0.48 is well below 2, telling us that their PDO was likely just variation unrelated to skill. Given their performance in the context of their PDO, and their Fenwick% at that time of the year (0.525), the Capitals, despite results that looked ugly, were still a formidable team.
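The same calculation as a small Python sketch, using the 20-game coefficient (8.92) and constant (-7.80) from the worked example; other sample sizes would need their own column of Table 5. The function names are hypothetical.

```python
COEFF_20 = 8.92     # 20-game coefficient used in the worked example
CONST_20 = -7.80    # 20-game constant used in the worked example

def expected_points(pdo, games, coeff=COEFF_20, const=CONST_20):
    """Expected standings points for a league average team at a given PDO."""
    return (coeff * pdo + const) * games

def points_vs_expectation(actual_points, pdo, games):
    """Points earned above (positive) or below (negative) the PDO-based expectation."""
    return actual_points - expected_points(pdo, games)

if __name__ == "__main__":
    # 2011-2012 Capitals under Boudreau: PDO 0.992 over 22 games.
    print(f"expected: {expected_points(0.992, 22):.1f} points")
    print(f"raw 25 points vs expectation: {points_vs_expectation(25, 0.992, 22):+.1f}")
    print(f"corrected 23 points vs expectation: {points_vs_expectation(23, 0.992, 22):+.1f}")
```

Differences smaller than the model's standard error (about ± 3 points at 20 games) should not be read as meaningful.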
As many of you are likely aware, the expected number of teams that will be above and below these cutoffs in a single season varies with the number of games in the sample. For example, in a given season there will be (82 games in a season / 5 games = 16.4 “5-game samples” x 30 teams) = 492 “5-game samples” total, which means on average there will be (492/20=) 25 teams that over a 5-game sample will fall below the yellow line by random fluctuation (variance) alone, and 1 team that falls below the red line due to random fluctuation alone.
2] I ran a binomial equation to generate the “random” SD, and everything seemed reasonable. However, what I found was that the binomial equation did not fit the “simulated random” model (an alternative to the binomial equation), and furthermore, judged against the correlations, the SD of the “simulated random” model fit the data much better. My theory is that because Sh% (0.095) and Sv% (0.905) are close to the limits (0 and 1) of a binomial proportion, the binomial equation won’t work as well as it would if they had been closer to 0.5. Regardless, because we are trying to define true random variation, we use the binomial equation for our “random” model.