# Machine Learning and Hockey Predictions: Part II

**Josh W**

May 28 2013 12:57PM

## Background

Last time I tried to use Machine Learning to build a simple classifier that predicts which of two teams is more likely to win a hockey game. Machine Learning is a branch of artificial intelligence in which algorithms take a large amount of data, learn from it, and then make decisions about future data. A simple way to describe it is that the algorithms (such as Neural Networks, Decision Trees, and Support Vector Machines) act like a black box: you feed in data, the box learns from it, and it can then make decisions on new and future data.

Last time the project used 386 NHL games from the first ten weeks of this season and included both traditional and advanced statistics such as Goals For and Against, Power Play %, Penalty Kill %, conference standing, win streak, Fenwick Close %, PDO, and 5-on-5 Goals For/Against ratio. Feeding them into a number of algorithms, the best result came out to ~60% accuracy. That is not bad (with betting we could probably make a profit), but we want to improve on it.

Since last time I've updated the data to include all of the games I collected in the regular season, for a total of 517 games, about three quarters of the shortened season. The goal is still to increase our classifier's prediction rate. While many of these advanced statistics have been shown to predict the long-term success of a team well, we are looking at single-game success, which is much more difficult: roughly 38% of results in the standings are due to luck.

## Experiment Description

PDO is a very interesting NHL performance metric. It is the simple sum of Shooting % and Save %. In the long run it regresses toward 100% for all teams, but in the short term it shows us who is playing better or worse than normal variance would suggest. There is a great post on PDO here on NHLNumbers that goes much more in depth, and I would recommend it to everyone interested.
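As a concrete illustration, PDO is just two percentages added together. A minimal sketch in Python (the function name and the sample numbers are mine, not from the post):

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO = Shooting % + Save %, usually quoted around 100."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1 - goals_against / shots_against)
    return shooting_pct + save_pct

# A team scoring on 9% of its shots while stopping 92% of shots against:
print(pdo(goals_for=9, shots_for=100, goals_against=8, shots_against=100))  # 101.0
```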

PDO is often used with Fenwick to predict which teams will fall into or out of playoff contention mid-season. In the 2011-2012 season it was easy to see that Minnesota, who had a great start to the year, had a high PDO but low possession stats, and the internet analysts were predicting their downfall despite what the mainstream media analysts were saying. That prediction came true, and they didn't make the playoffs. This year, in the 2012-2013 season, if there had been a full 82 games, teams such as Anaheim (48% Fenwick Close) and Toronto (44% Fenwick Close), both with a PDO of 102%, would ultimately not have made it. Likewise, over an 82-game season, New Jersey, with a 55% Fenwick and a 97% PDO, would have made it (similar to last year's LA Kings).

So PDO can be used to make long-run predictions, but using it in our initial classifier we only got an accuracy of 60%. I had used the entire season's PDO, and as games went along most teams would have regressed to similar values. It is possible that if we use a shorter window, such as the PDO over the last 1, 3, 5, 10, or 25 games, we might see a change in accuracy.
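A rolling window like the ones described above could be computed along these lines. This is a sketch with made-up per-game totals, not the post's actual pipeline:

```python
from collections import deque

def rolling_pdo(games, window):
    """PDO over the last `window` games.

    `games` is a list of (goals_for, shots_for, goals_against, shots_against)
    tuples in chronological order; returns the rolling PDO after each game.
    """
    recent = deque(maxlen=window)  # automatically drops games older than the window
    out = []
    for game in games:
        recent.append(game)
        gf = sum(g[0] for g in recent)
        sf = sum(g[1] for g in recent)
        ga = sum(g[2] for g in recent)
        sa = sum(g[3] for g in recent)
        out.append(100.0 * gf / sf + 100.0 * (1 - ga / sa))
    return out

# PDO over a team's three most recent games, after each game played:
print(rolling_pdo([(2, 30, 3, 30), (4, 30, 1, 30), (3, 25, 2, 28)], window=3))
```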

Using the same algorithms as last time (Neural Networks (NN), Decision Trees (J48), Support Vector Machines (SVM), and Naive Bayes (NB)), the full 517-game data set, and the same mixed features of traditional and advanced statistics, with only the PDO feature modified, we will plug in the values and see whether the classifier accuracy changes.
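The original experiments were run in Weka. As a rough sketch of the same setup, here is the equivalent in scikit-learn as a stand-in; the features and labels below are random placeholders, and sklearn's `DecisionTreeClassifier` is CART rather than Weka's J48/C4.5:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(517, 10))    # 517 games, 10 features (placeholder data)
y = rng.integers(0, 2, size=517)  # home-win labels (placeholder data)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "NB": GaussianNB(),
    "J48": DecisionTreeClassifier(),  # closest sklearn analogue of Weka's J48
    "NN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)),
}
for name, model in models.items():
    # Ten-fold cross-validation, as used in the post
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.2%}")
```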

## Results

| | PDO1 | PDO3 | PDO5 | PDO10 | PDO25 | PDOAll |
|---|---|---|---|---|---|---|
| Baseline | 49.71% | 49.71% | 49.71% | 49.71% | 49.71% | 49.71% |
| SVM | 58.61% | 58.61% | 58.61% | 58.61% | 58.61% | 58.61% |
| NB | 56.38% | 56.96% | 56.38% | 56.58% | 56.58% | 55.51% |
| J48 | 54.93% | 55.71% | 55.42% | 55.90% | 55.61% | 55.51% |
| NN | 57.64% | 56.67% | 58.03% | 58.03% | 57.74% | 58.41% |

## Conclusion

Well, that is interesting: the results have not really changed by shortening the PDO window. Without a Friedman test I can't say for sure whether they are statistically different. The best result still comes from the Neural Network on the "mixed" (or PDOAll) data set, and with some tuning I get 59.38% accuracy using ten-fold cross-validation. If I instead train on 66% of the data and test on the remaining 34%, I get 55% accuracy on the held-out data. Based on these results, I do not see shortening the PDO window adding any value to the prediction accuracy.
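For reference, the two evaluation protocols used above (ten-fold cross-validation and a 66/34 percentage split) look roughly like this in scikit-learn; the data here is a random placeholder, so the printed accuracies are meaningless:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(517, 10))    # placeholder features
y = rng.integers(0, 2, size=517)  # placeholder labels

nn = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=1))

# Ten-fold cross-validation: every game is used for testing exactly once.
cv_acc = cross_val_score(nn, X, y, cv=10).mean()

# Percentage split: hold out roughly a third of the games as unseen data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.34, random_state=1)
split_acc = nn.fit(X_tr, y_tr).score(X_te, y_te)

print(f"10-fold CV: {cv_acc:.2%}, 66/34 split: {split_acc:.2%}")
```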

I would think that betting on these predictions over a long period would turn a profit (but that is a whole other project), and there must be ways to increase this accuracy (which I will try to do in future posts). Similar work in other sports (football, soccer, and basketball) achieves accuracies in the 70s. But the low-event, continuous nature of hockey, along with the large role of luck in the outcome of a game, makes it much more difficult to predict the winner of a single hockey game.

What is interesting is that I can use Weka's CfsSubsetEval, which tells me which features contribute the most to the accuracy of the classifier. I am surprised to see that they are: home/away location, Goals Against, and goal differential. These are not advanced statistics; the traditional statistics are making the biggest difference in predicting the winner of a single game. It should be reiterated that this is NOT me disproving the use of advanced statistics such as Fenwick Close, but rather saying that for short-term, single-game prediction there is still value in these traditional statistics.
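scikit-learn has no direct equivalent of Weka's CfsSubsetEval (a correlation-based feature subset selector), but a univariate mutual-information ranking gives a similar kind of answer. A sketch with placeholder data, where a stand-in "goal differential" feature is made informative by construction:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(2)
feature_names = ["home/away", "goals_against", "goal_diff", "fenwick_close", "pdo"]
X = rng.normal(size=(517, len(feature_names)))
# Make the "goal_diff" column genuinely predictive of the label, plus noise:
y = (X[:, 2] + 0.5 * rng.normal(size=517) > 0).astype(int)

# Rank features by mutual information with the outcome
scores = mutual_info_classif(X, y, random_state=2)
for name, score in sorted(zip(feature_names, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```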

This will need to be run again on next year's data to ensure the results are consistent. If you want access to my data set for your own projects, let me know.

Numbers from Behindthenet

## Comments

**Jared Lunsford**, May 29 2013, 10:27AM

"In the long run [PDO] will regress to 100% for all teams"

This isn't true, as Snark's very good article makes clear.

**RexLibris**, May 29 2013, 08:41PM

I'm working on something right now in evaluating advanced stats and scanning for potential inherent biases. I think I will have to cite this series.

Excellent work here and I'm intrigued by the difficulty the machines are having in anticipating a correct outcome. It certainly speaks to the more chaotic nature of the game, although I do fear that somehow this will find its way into the statistical luddite camp as a way of "proving" that advanced analytics are the product of witchcraft and sleight of hand.

**Mark Tinordi**, May 30 2013, 09:15PM

For what it's worth, the theoretical upper limit for correctly predicting the winner of individual games in the NHL is about 58%. This follows from the fact that in a typical NHL game, the favorite will have a win probability of 0.58. We know this because we can ascertain the spread in talent between teams generally with respect to their theoretical winning percentage.

Needless to say, this only applies to the post-lockout era. The figure will be larger for past eras, for which the spread in talent between teams was greater.

**Josh W**, May 31 2013, 08:19AM

@RexLibris - I agree, that's why I try and stress that I am only looking at a single game, and that's difficult. That doesn't mean you can't predict long term success. Let me know what you're doing, I'd be interested to see your project.

@Mark Tinordi - I agree that the theoretical prediction limit is not high, especially compared to other sports. I am not sure how you get the 58%, though. I've calculated the s.d. of wins over the last seven seasons as 0.09346346 and the s.d. of actual wins over the last seven seasons as 0.0208696.

**Mark Tinordi**, May 31 2013, 10:23AM

1. Since the lockout, the average STDEV in winning percentage, among teams, is 0.088.

2. I can't remember what the breakdown is in terms of the skill/luck split among teams with respect to winning percentage over 82 games. But I remember that Phil Birnbaum calculated that you can ascertain a team's true talent winning percentage by determining what their winning percentage would be by "adding" 37 games of 0.500 hockey. So if a team went 82-0, and had an actual winning percentage of 1, their true talent winning percentage would be (100.5/119), which equals 0.844. That means that the regression between actual winning percentage and true winning percentage, after an 82 game sample, is about 0.68. Which in turn gives us our skill value with respect to the skill/luck split after 82 games.

3. If you multiply 0.088 by 0.68, you get 0.06. Thus, since the lockout, the standard deviation in true talent winning percentage, among teams, is 0.06.

4. In a league where the standard deviation in true talent winning percentage is 0.06, and home advantage is worth roughly 0.05, in an average single-game matchup, the favorite will have an expected winning percentage of 0.58.

5. If you could identify the favorite with certainty every single time, you would bet on the favorite. Because the favorite has an expected winning percentage of 0.58, the favorite will win 58% of the games, over time. So the theoretical upper limit for correctly predicting the results of single games is 58%.
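Mark's argument above can be checked with a quick Monte Carlo simulation under a deliberately simple model (my own simplification, not necessarily his exact calculation): each team's true talent winning percentage is drawn from N(0.500, 0.06), the home team gets a flat +0.05, and a perfect predictor always picks the favorite:

```python
import random

random.seed(0)
N = 1_000_000
total = 0.0
for _ in range(N):
    talent_home = random.gauss(0.5, 0.06)
    talent_away = random.gauss(0.5, 0.06)
    # Home win probability: even matchup baseline, plus home edge, plus talent gap,
    # clamped to a valid probability.
    p_home = min(max(0.5 + 0.05 + (talent_home - talent_away), 0.0), 1.0)
    # A perfect favorite-picker is right max(p, 1-p) of the time in this matchup.
    total += max(p_home, 1.0 - p_home)

print(f"expected accuracy of a perfect favorite-picker: {total / N:.3f}")
```

Under these assumptions the simulation lands at roughly 0.58, consistent with the ceiling described in the comment.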

**RexLibris**, June 01 2013, 12:41PM

@Josh W - I'm in the middle of an interesting book related to this topic: Big Data by Viktor Mayer-Schonberger and Kenneth Cukier. It has some pop psych angles, but is overall a good read and has some perspectives that may be pertinent to your approach.

Thanks, I hope to have something finalized by the end of July (I've got a few long-term articles on the go right now).

**Josh W**, June 04 2013, 07:41AM

@Mark Tinordi Do you have an email so we can discuss this further?