NHLNumbers in Academia

Josh W
August 08 2013 02:20PM



If you've been following my few posts here, you will know that I am studying Machine Learning for my master's in Computer Science and applying those techniques to hockey analytics. I have found some pretty interesting stuff. My first few posts here (namely this one and this one) were based on my academic research. I have since turned them into an academic paper that was accepted to a sports analytics workshop.

You can read it: here

It will be presented at the European Conference for Machine Learning, specifically at the Sports Analytics and Machine Learning workshop on Friday, 27 September in Prague, Czech Republic.

If you are near Prague at the end of September and want to talk shop, drink a Pilsner, and check out a local KHL/Czech hockey game, send me a message via Twitter. I will be there for the week.


Western Conference Rolling Team Corsi

Travis Yost
August 08 2013 09:28AM




Whether it's on a per-game basis or season-to-season, we spend an awful lot of time looking at shot differentials to get a better feel for the strength of a hockey team. Unfortunately, the hockey blogosphere doesn't actively track rolling numbers at the team level, which could give us a better handle on which teams are truly gaining or fading over the course of a season.

After the jump, a compilation of rolling team Corsi for the Western Conference during the 2012-2013 season.
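For readers who want to try this themselves, here is a minimal sketch of a rolling Corsi calculation. It assumes you have per-game shot attempts for and against for one team; the function name, the 10-game window, and the sample numbers are all made up for illustration and are not taken from the article.

```python
def rolling_corsi_pct(attempts_for, attempts_against, window=10):
    """Corsi-for percentage over each trailing `window`-game stretch.

    attempts_for / attempts_against: per-game shot attempt counts, in game order.
    Returns one value per complete window (the first `window - 1` games produce none).
    """
    if len(attempts_for) != len(attempts_against):
        raise ValueError("need one for/against pair per game")
    if window < 1:
        raise ValueError("window must be at least 1")

    rolling = []
    for end in range(window, len(attempts_for) + 1):
        cf = sum(attempts_for[end - window:end])
        ca = sum(attempts_against[end - window:end])
        total = cf + ca
        # A window with no attempts at all has no defined percentage.
        rolling.append(100.0 * cf / total if total else float("nan"))
    return rolling


# Hypothetical five-game sample, three-game window.
sample_for = [52, 47, 60, 41, 55]
sample_against = [48, 50, 45, 58, 49]
print(rolling_corsi_pct(sample_for, sample_against, window=3))
```

A shorter window reacts faster to hot and cold streaks but is noisier; a longer one is smoother but slower to show a team fading.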


Theoretical Predictions in Machine Learning for the NHL: Part II

Josh W
August 06 2013 10:02AM




In my last article I looked at prediction limits for machine learning and sports. More specifically, I answered the question of how much of the standings is due to luck (a.k.a. random chance, a stochastic process, etc.). I did this using classical test theory: I looked at the variance of the observed win percentage over the 7 seasons from 2005-2006 to 2011-2012 and compared it to a theoretical league where teams' talent levels are normally distributed. From this we were able to conclude that luck explains ~38% of the variance in the standings.

This is interesting and much higher than one might initially think, but it makes sense, and I will discuss it further on. My area of research is Machine Learning, specifically using it to make predictions in hockey. The question I want to answer today is: is there a theoretical limit to the predictions we can make in hockey?
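To make the luck estimate above a bit more concrete, here is a rough sketch of the textbook classical test theory decomposition, where observed variance is talent variance plus luck variance. It is not necessarily the exact calculation from the earlier article. It assumes every game is a fair coin flip under pure luck (so luck variance is 0.25 divided by games played) and ignores overtime and shootout points; the function name and the win percentages are hypothetical.

```python
from statistics import pvariance


def luck_share(win_pcts, games_per_season=82):
    """Fraction of the variance in win percentage attributable to luck.

    Assumption: under pure luck each game is a 50/50 coin flip, so the
    variance of a team's win percentage is 0.5 * 0.5 / games_per_season.
    """
    observed_var = pvariance(win_pcts)
    if observed_var == 0:
        raise ValueError("observed win percentages have no variance")
    luck_var = 0.25 / games_per_season
    return luck_var / observed_var


# Hypothetical win percentages for a handful of teams, not real standings.
example_pcts = [0.62, 0.55, 0.51, 0.48, 0.44, 0.40]
print(f"luck share: {luck_share(example_pcts):.2f}")
```

Whatever is left after subtracting the luck share is the part of the spread that talent could, in principle, explain.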


Eastern Conference Defensemen: Pairing/Tier vs. Team Production

Dimitri Filipovic
August 06 2013 09:45AM

Cheer up, Alfie; who needs Erik Karlsson when you have Patrick Wiercioch and Andre Benoit, right?
Image via Jean Levac/Ottawa Citizen.


If you've been following along with us here at NHL Numbers, you'll know that this is the fourth and final installment of a series in which we've broken down forward lines and defensive pairings into "tiers", and compared their production not only to that of their own team, but also to the tiers of the other clubs in their conference.

We split the conferences into two separate posts since there were zero cross-conference games during this past lockout-shortened season, so it doesn't make sense to compare the two. We took a look at the Western Conference defensemen on Monday. Today, it's time to evaluate the Eastern Conference blueliners.


Western Conference Defensemen: Pairing/Tier vs. Team Production

Dimitri Filipovic
August 05 2013 09:50AM

Johnny Oduya and Niklas Hjalmarsson probably aren't the first Blackhawks defensemen that come to mind, but their stellar play warrants recognition.
Image via Getty Images/Harry How.


On Thursday and Friday of last week, Travis Yost made his debut on this platform with a fascinating little research project. His goal was to take a look at how individual tiers of forwards for each team performed relative to their team's overall production, using even-strength zone-adjusted Corsi. Now, I've been tasked with doing the same, but for defensemen.

Why are we choosing to use "tiers" instead of terms such as "lines" and "pairings", which you are likely more familiar with? By doing so, we're able to account for things such as injuries, trades, and call-ups, all of which naturally take place over the course of a season. There are so many little variations, and so much tinkering with specific usage, that it would really be a hassle to evaluate things using those more common terms. These tiers usually correlate very strongly with the most regularly used combinations anyway (i.e. the team's top two defensemen are more often than not in tier 1, and so on).

Just remember: context is crucial, especially when looking at data such as this. A substantial injury to a crucial player thrusts an unlikely candidate into a role (more specifically a higher tier, based on more frequent usage) that you may not have expected. But that's sort of the point of this. It tells us how well teams were able to perform with the pieces at their disposal over an entire season.
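As a rough illustration of the tier idea, here is a small sketch that ranks a team's defensemen by usage and groups them two at a time. The post doesn't spell out the exact sorting rule, so ranking by even-strength ice time is an assumption, and the function name, player labels, and minutes are invented.

```python
def assign_tiers(players, tier_size=2):
    """Group players into usage tiers (tier 1 = most used).

    players: list of (name, ice_time_minutes) pairs.
    Assumption: usage is measured by total ice time; the real series may
    have used a different measure.
    """
    if tier_size < 1:
        raise ValueError("tier_size must be at least 1")
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    return {name: index // tier_size + 1 for index, (name, _) in enumerate(ranked)}


# Hypothetical blue line: labels and minutes are placeholders.
blue_line = [
    ("D1", 980.0), ("D2", 450.5), ("D3", 1010.2),
    ("D4", 700.0), ("D5", 690.3), ("D6", 300.8),
]
print(assign_tiers(blue_line))
```

Because the grouping follows actual minutes rather than the opening-night depth chart, an injury replacement who ends up playing heavy minutes lands in a higher tier automatically.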
