# Machine Learning and Hockey: Is there a theoretical limit on predictions?

**Josh W**

August 01 2013 09:39AM

## Intro

In my previous work I looked at predictions in hockey at the micro level of a single game: who will win and who will lose. I used Machine Learning techniques (or algorithmic modeling), and one thing I've noticed from my own work, and from reading other people's papers, is that every sport seems to have a prediction limit, and that the limit differs from sport to sport. This is what I want to explore: how much of the results we see is based on luck, and is there a theoretical prediction limit in the NHL?

In this post I will be focusing on the first question: what part of the standings is made up by luck?

## Background

This subject is not brand new; it has been touched on a few times on the internet and in the sports analytics community, but I have not yet seen it in academic sources. I need to do the math myself in order to bring it to the academic community and show that it is possible to calculate, rather than just assume the numbers on websites are correct.

The oldest source of information I could find goes back to 2006, in the comments on this sports blog. On the similar topic of luck in the standings, there is this post looking at basketball. Arctic Ice Hockey had a post a few years back, but it just stated that luck is 38% of the results, and it did not give enough detail for me to recreate the number. I also found this post, which looks at calculating the theoretical limit in the NFL.

## What is luck

We talk about luck a lot in sports analytics; we even have the stat PDO to show how lucky a player or a team has been. But a lot of people tend to dismiss it, saying there's no such thing as luck. To start out, we need to define exactly what luck is. In the statistics community luck is treated as a stochastic process: regardless of the initial conditions, there are many different ways the process can evolve.

Bringing this to hockey, over a season the luck of a team regresses to the norm over 82 games. But in a single game luck plays an important part. Given how few events (goals) there are in a game, a single game can be won by a puck deflecting off of a player. If that player had been standing in a slightly different location or facing a different direction, the puck may not have gone in. It's easy to see how luck can be important in a single game. Patrick D expanded upon this to show how luck plays more of a role in a short season, such as the shortened one we just had.

## Methods

To calculate luck in the standings we can compare the standard deviation (St.Dev) of a binomial distribution with that of the observed distribution; this is what I believe Arctic Ice Hockey did. We can also use Classical Test Theory, which gives us the formula:

variance(observed) = variance(talent) + variance(error)

I collected the win percentages (win%) of all teams between 2004-2005 and 2011-2012. I excluded the 2013 season because of the lockout, and I treated all OT losses as losses and all OT wins as wins, so there are only two outcomes: win or loss. This gives a St.Dev of all win% over this time period of 0.09. The variance due to error (luck) is the binomial variance of win% for a team with a 50% chance of winning each of its 82 games: (.5*.5)/82.
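As a rough check on the (.5*.5)/82 figure, the sketch below simulates seasons for teams that win every game on a coin flip and compares the spread of their win% with the binomial value. The 30-team league size, the number of simulated seasons, and the treatment of each team's games as independent coin flips are assumptions made for illustration, not part of the original analysis.

```python
import math
import random
import statistics

GAMES = 82       # regular-season games per team
TEAMS = 30       # assumed league size, for illustration only
SEASONS = 200    # hypothetical number of simulated seasons


def simulate_luck_only_sd(seed=0):
    """St.Dev of win% when every game is a 50/50 coin flip (no talent gap)."""
    rng = random.Random(seed)
    win_pcts = []
    for _ in range(SEASONS * TEAMS):
        # Simplification: each team's games are independent coin flips,
        # ignoring that one team's win is another team's loss.
        wins = sum(rng.random() < 0.5 for _ in range(GAMES))
        win_pcts.append(wins / GAMES)
    return statistics.pstdev(win_pcts)


# Binomial St.Dev of win% for p = 0.5 over 82 games: sqrt(.5 * .5 / 82)
theoretical_sd = math.sqrt(0.5 * 0.5 / GAMES)
simulated_sd = simulate_luck_only_sd()

print(f"theoretical St.Dev: {theoretical_sd:.4f}")
print(f"simulated St.Dev:   {simulated_sd:.4f}")
```

If the two values land close together, the spread of a pure-luck league is well described by the binomial variance, and anything the observed St.Dev of 0.09 adds on top of that can be attributed to talent.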

Plugging these numbers into the formula we get:

var(obs) = var(talent) + var(error)

var(talent) = var(obs) - var(error)

var(talent) = 0.09^2 - (.25/82)

var(talent) = 0.0081 - 0.003049

var(talent) = 0.005051

Now this is our estimated variance due to talent. We can then find out what portion of the observed variance is made up by talent; luck is whatever remains. So:

luck = 1 - (0.005051 / (0.09^2)) = 0.376419753
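The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `luck_share` and its parameters are hypothetical, and it assumes the same inputs as the post (an observed St.Dev of 0.09, 82 games, and a 50% win probability for the error term).

```python
GAMES = 82           # games per team in a full season
OBSERVED_SD = 0.09   # St.Dev of observed win%, as reported in the post


def luck_share(observed_sd, games, p=0.5):
    """Split observed win% variance into talent and error (luck) parts.

    Uses var(observed) = var(talent) + var(error), with the error term
    taken as the binomial variance p * (1 - p) / games.
    Returns (share of variance due to luck, variance due to talent).
    """
    var_obs = observed_sd ** 2
    var_error = p * (1 - p) / games
    var_talent = var_obs - var_error
    return 1 - var_talent / var_obs, var_talent


luck, var_talent = luck_share(OBSERVED_SD, GAMES)
print(f"var(talent) = {var_talent:.6f}")
print(f"luck share  = {luck:.4f}")
```

Because the function carries full precision rather than the rounded 0.005051, the last few digits may differ slightly from the hand calculation, but the luck share still comes out at roughly 0.376.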

## Future Work and Conclusion

This number seems to be in line with the number presented by Arctic Ice Hockey. Next, I will be expanding on this to see if there is a theoretical limit to predicting in hockey. Some would argue that this limit is just 100% - 37.64%, or ~62%. I will need to do the math myself before I can agree with that statement.

### Thanks and credit

There are many people I've been in discussion with while trying to figure this out, so many thanks to them. This includes Michael Guerault, Phil Birnbaum, Patrick D, Gabriel Desjardins and many more.


**leafnerd** August 01 2013, 11:47AM

I dislike the word luck, as it sometimes carries the negative implication that a team, for example, did not earn their success, and this causes folks to get defensive about their team in discussions.

Referring to luck as random chance instead removes any "fault" or the negative connotation of the word.

**Garret Hohl** August 01 2013, 11:19PM

Heh, I just finished writing an article on luck too!

We both used bell curves!

Yay!

http://www.arcticicehockey.com/2013/7/26/4513170/winnipeg-jets-nhl-stats-probabilities-luck

**John Lofranco** August 07 2013, 06:38AM

"Bringing this to hockey, over a season the luck of a team regresses to the norm over 82 games."

This seems convenient. Why not 100 games or 75? Or is it because "the norm" is defined as "what happens over 82 games"?

**leafnerd** August 01 2013, 11:53AM

For some context on this, Fenwick Close R-squared is about 35% descriptive of team winning (and 65% due to random chance). This work shows that, in theory, the best we can get to is 38% random (and possibly, TBD, 62% descriptive).

And so there is possibly another 32% of skill components that we have not identified yet (like PK, PP, shot quality the unicorn of advanced stats, faceoff skill, facepunching (I jest) etc.) that are important to team winning.

**leafnerd** August 01 2013, 11:56AM

@leafnerd Also, I was just thinking the other day that the math behind this resembles the Nyquist limit in information theory: the limit on how much information can be received when transmitted from point A to point B.

And it turns out to be the same function that limits information from escaping a black hole in theoretical physics. This is not really applicable to hockey statistics, other than to say there is always some mystery in life (and hockey) that eludes quantification and measurement.

**Pierce Cunneen** August 02 2013, 01:58PM

Nice work Josh.

**Josh W** August 07 2013, 06:02PM

I wasn't trying to imply that at exactly 82 games all teams' luck will have regressed to the norm. Patrick D did a good study on PDO here at NHLNumbers (http://nhlnumbers.com/2013/1/10/studying-luck-other-factors-in-pdo) and he shows that the PDO of all teams will be in the 100 +/- 2% range after 82 games.

What I am saying is basically that, for all teams, the luck will balance out over 82 games. There was math done by Tom Tango (and others) using the same method, and they show that in hockey skill becomes more important than luck at the ~36 game mark.

See: http://blog.philbirnbaum.com/2006/08/on-correlation-r-and-r-squared.html and http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/