Machine Learning and Hockey: Is there a theoretical limit on predictions?

Josh W
August 01 2013 09:39AM

Graph from Arctic Ice Hockey 

Intro

In my previous work I made predictions in hockey at the micro level of a single game: who will win and who will lose. I used Machine Learning techniques (or algorithmic modeling), and one thing I've noticed, both in my own work and in reading other people's papers, is that every sport seems to have a prediction limit, and that the limit differs from sport to sport. This is what I want to explore: how much of the results that we see are based on luck, and is there a theoretical prediction limit in the NHL?

In this post I will focus on the first question: what part of the standings is made up by luck?

Background

This subject is not brand new. It has been touched on a few times on the internet and in the sports analytics community, but I have not yet seen it in academic sources. I need to do the math myself in order to bring it to the academic community and show that it is possible to calculate, rather than just assume the numbers on websites are correct.

The oldest source I could find goes back to 2006, in the comments on this sports blog. On the similar topic of luck in the standings, there is this post looking at basketball. Arctic Ice Hockey had a post a few years back, but it simply stated that luck is 38% of the results, and there was not enough detail for me to recreate the calculation. I also found this post, which looks at calculating the theoretical limit in the NFL.

What is luck

We talk about luck a lot in sports analytics; we even have the stat PDO to show how lucky a person or a team has been. But a lot of people tend to dismiss it, saying there's no such thing as luck. To start out, we need to define exactly what luck is. In the statistics community luck is treated as a stochastic process: regardless of what the initial conditions are, there are many different ways the process can evolve.

Bringing this to hockey, over a season the luck of a team regresses to the norm over 82 games. But in a single game luck plays an important part. Given how few events (goals) there are in a game, a single game can be won by a puck deflecting off of a player. If that player had been standing in a slightly different location or facing a different direction, the puck may not have gone in. It's easy to see how luck can be important in a single game. Patrick D expanded upon this to show how luck plays even more of a role in the shortened season we just had.
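To make the season-length point concrete, here is a minimal sketch of how much spread in win% pure chance produces over a full season versus a shortened one. It assumes every game is a coin flip (p = 0.5) and uses 48 games for the lockout-shortened 2013 schedule; the function name luck_sd is just an illustrative label, not something from Patrick D's work.

```python
import math

def luck_sd(games, p=0.5):
    """Standard deviation of win% for a team whose true chance of
    winning each game is p (binomial model, assumed for illustration)."""
    return math.sqrt(p * (1 - p) / games)

sd_82 = luck_sd(82)  # full NHL season
sd_48 = luck_sd(48)  # lockout-shortened 2013 season

print(f"St.Dev of win% from chance alone, 82 games: {sd_82:.4f}")
print(f"St.Dev of win% from chance alone, 48 games: {sd_48:.4f}")
```

Under these assumptions the spread from chance alone is noticeably wider over 48 games than over 82, which is the sense in which luck matters more in a short season.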

Methods

To calculate luck in the standings we can compare the standard deviations (St.Dev) of a binomial distribution and the observed distribution; this is what I believe Arctic Ice Hockey did. We can also use Classical Test Theory, which gives us the formula:

variance(observed) = variance(talent) + variance(error)

I collected the win percentages (win%) of all teams between 2004-2005 and 2011-2012. I excluded the 2013 season because of the lockout, and I treated all OT losses as losses and all OT wins as wins, so there are only two outcomes: win or loss. The St.Dev of all win% over this time period is 0.09. The variance due to error (luck) is the binomial variance of win% for a team with a 50% chance of winning each of its 82 games: (.5*.5)/82.
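As a sanity check on the (.5*.5)/82 error term, here is a small simulation sketch. It assumes a hypothetical league in which every team is exactly average, so every game is a coin flip; the function name, the number of simulated teams, and the seed are all illustrative choices rather than part of the original analysis.

```python
import random
import statistics

def simulate_win_pcts(n_teams, n_games, p=0.5, seed=0):
    """Simulate win% for n_teams teams that each win every game with
    probability p (i.e. no talent differences, only chance)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_games)) / n_games
        for _ in range(n_teams)
    ]

# Many simulated team-seasons, so the sample variance is stable.
win_pcts = simulate_win_pcts(n_teams=20000, n_games=82)

simulated_var = statistics.pvariance(win_pcts)
theoretical_var = 0.5 * 0.5 / 82  # binomial variance of win%

print(f"simulated variance:   {simulated_var:.6f}")
print(f"theoretical variance: {theoretical_var:.6f}")
```

If the two numbers agree closely, that supports using the binomial formula as the variance that luck alone would put into the standings.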

Plugging these numbers into the formula we get:

var(obs) = var(talent) + var(error)
var(talent) = var(obs) - var(error)
var(talent) = 0.09^2 - (.25/82)
var(talent) = 0.0081 - 0.003049
var(talent) = 0.005051

This is our estimated variance due to talent. Dividing it by the observed variance gives the portion of the observed results that is made up by talent, and the remainder is luck. So:

luck = 1 - (0.005051 / (0.09^2)) = 0.376419753

We can say that luck explains 37.64% of the variance in the results in the standings.
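The arithmetic above can be reproduced with a short sketch. The function name luck_share and its parameters are illustrative; the inputs (an observed St.Dev of 0.09, 82 games, and a 50% chance per game) are the ones used in this post.

```python
def luck_share(observed_sd, n_games, p=0.5):
    """Split observed win% variance into talent and luck using
    var(observed) = var(talent) + var(error)."""
    var_obs = observed_sd ** 2
    var_error = p * (1 - p) / n_games   # binomial variance of win%
    var_talent = var_obs - var_error
    luck = 1 - var_talent / var_obs     # share of variance due to luck
    return var_talent, luck

var_talent, luck = luck_share(observed_sd=0.09, n_games=82)

print(f"var(talent) = {var_talent:.6f}")
print(f"luck share  = {luck:.2%}")
```

Because the code does not round var(talent) before dividing, the later decimal places differ slightly from the hand calculation, but the result still rounds to the same 37.64%.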

Future Work and Conclusion

This number is in line with the 38% presented by Arctic Ice Hockey. Next, I will be expanding on this to see if there is a theoretical limit to predicting in hockey. Some would argue that this limit is just 100% - 37.64%, or ~62%. I will need to do the math myself before I can agree with that statement.

Thanks and credit

I've been in discussion with many people while trying to figure this out, so many thanks to them. This includes Michael Guerault, Phil Birnbaum, Patrick D, Gabriel Desjardins and many more.

I am a Van Fan in Bytown. Living in Ottawa for work, I research Sports Analytics and Machine Learning at the University of Ottawa. I play hockey as well as a timbit but I compete in rowing with hopes of 2016 Olympic Gold. Follow me on twitter at @joshweissbock and feel free to give me a shout.
#1 leafnerd
August 01 2013, 11:47AM

I dislike the word luck, as it sometimes carries the negative implication that a team, for example, did not earn its success, and this causes folks to get defensive about their team in discussions.

Referring to luck as random chance instead removes any "fault" or negative connotation from the word.

#2 leafnerd
August 01 2013, 11:53AM

For some context on this, Fenwick Close R-squared is about 35% descriptive of team winning (and 65% due to random chance). This work shows that in theory the best we can get to is 38% random (and possibly, TBD, 62% descriptive).

And so there is possibly another 32% of skill components that we have not identified yet (like PK, PP, shot quality (the unicorn of advanced stats), faceoff skill, facepunching (I jest), etc.) that are important to team winning.

#3 leafnerd
August 01 2013, 11:56AM

@leafnerd

Also, I was just thinking the other day that the math behind this resembles the Nyquist limit in information theory: the limit on how much information can be received when transmitted from point A to point B.

And it turns out to be the same function that limits information from escaping a black hole in theoretical physics. This is not so much applicable to hockey statistics as it is a way of saying that there is always some mystery in life (and hockey) that eludes quantification and measurement.

#4 garret9
August 01 2013, 11:19PM

Heh, I just finished writing an article on luck too!

We both used bell curves!

Yay!

http://www.arcticicehockey.com/2013/7/26/4513170/winnipeg-jets-nhl-stats-probabilities-luck

#5 Pierce Cunneen
August 02 2013, 01:58PM

Nice work Josh.

#6 John Lofranco
August 07 2013, 06:38AM

"Bringing this to hockey, over a season the luck of a team regresses to the norm over 82 games."

This seems convenient. Why not 100 games or 75? Or is it because "the norm" is defined as "what happens over 82 games"?
