Machine Learning and Hockey Predictions: Part II

Updated: May 28, 2013 at 12:57 pm by Josh W




Last time I tried to use Machine Learning to create a simple classifier that can predict which of two teams is more likely to win a hockey game.  Machine Learning is a branch of artificial intelligence that can take a large amount of data, learn from it and then make decisions about new data.  A simple description is that the algorithms it uses (such as Neural Networks, Decision Trees, and Support Vector Machines) act like a black box: you feed in data, it learns from it, and then it can make decisions on new and future data.

Last time the project used 386 NHL games over ten weeks of this season and included both traditional and advanced statistics such as Goals For and Against, Power Play %, Penalty Kill %, Conference standing, win streak, Fenwick Close %, PDO and 5-on-5 Goals For/Against Ratio.  After feeding them into a number of algorithms, the best accuracy came out to ~60%.  This is not bad, and with betting we could probably make a profit, but we want to improve on it.

Since then I’ve updated the data to include all of the games I collected in the regular season; there are now a total of 517 games, which is about 3/4 of the shortened season.  The goal is still to increase our classifier’s prediction rate.  While many of these advanced statistics have been shown to work well at predicting the long-term success of a team, we are looking at single-game success, which is much more difficult, as 38% of the standings results are due to luck.

Experiment Description

PDO is a very interesting performance metric in the NHL.  It is simply the sum of a team’s Shooting % and Save %.  In the long run it will regress to 100% for all teams, but in the short term we can see who is playing better or worse than normal variance would suggest.  There’s a great post on PDO here on NHLNumbers which goes much more in depth, and I would recommend it to everyone interested.
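To make the definition concrete, here is a minimal sketch of the calculation. The function name and the totals are made up for illustration; PDO is usually computed from 5-on-5 shots and goals, but the arithmetic is the same for any set of totals.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO in percent: shooting % plus save %."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical totals for illustration only (not real team data).
hot_team = pdo(goals_for=10, shots_for=100, goals_against=8, shots_against=100)
print(f"PDO: {hot_team:.1f}%")  # prints PDO: 102.0%
```

A team scoring on 10% of its shots while its goalies stop 92% sits at 102%, which is the kind of value we would expect to drift back toward 100% over time.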

PDO is often used with Fenwick to predict which teams will fall into or out of playoff contention mid-season.  In the 2011-2012 season it was easy to see that Minnesota, who had a great start to the year, had a high PDO but low possession stats, and the internet analysts were predicting their downfall despite what the mainstream media analysts were saying.  Ultimately that prediction came true and they didn’t make the playoffs.  This year, in the 2012-2013 season, if there had been a full 82 games, teams such as Anaheim (with 48% Fenwick Close) and Toronto (44% Fenwick Close), both with a PDO of 102%, would ultimately not have made it.  Likewise, in an 82-game season, New Jersey, with a 55% Fenwick and a 97% PDO, would have made it (similar to last year’s LA Kings).

So PDO can be used in the long run to make predictions, but using it in our initial classifier we only got an accuracy of 60%.  I had used the entire season’s PDO, and as the games went along most teams would have regressed to similar values.  It is possible that if we use a shorter window, such as the PDO over the last 1, 3, 5, 10, or 25 games, we might see a change in the accuracy.
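The windowed version of the feature can be sketched as follows. This is an illustration under assumptions, not the exact code used for the experiment: the per-game tuple layout and the name `rolling_pdo` are hypothetical, and the games in the example are made up.

```python
def rolling_pdo(games, window=None):
    """PDO (in percent) over the most recent `window` games.

    `games` is a chronological list of per-game tuples
    (goals_for, shots_for, goals_against, shots_against) and should
    contain only games played BEFORE the one being predicted.
    `window=None` uses every game, i.e. the full-season PDO.
    Returns None when there are no shots to divide by.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    recent = games if window is None else games[-window:]
    goals_for = sum(g[0] for g in recent)
    shots_for = sum(g[1] for g in recent)
    goals_against = sum(g[2] for g in recent)
    shots_against = sum(g[3] for g in recent)
    if shots_for == 0 or shots_against == 0:
        return None
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical game log for one team (not real data).
games = [(3, 30, 2, 25), (1, 28, 4, 32), (2, 20, 2, 30), (5, 35, 1, 27)]
for window in (1, 3, None):
    label = "All" if window is None else str(window)
    print(f"PDO{label}: {rolling_pdo(games, window):.2f}%")
```

Summing goals and shots over the window before dividing, rather than averaging per-game percentages, keeps a single low-shot game from dominating the value.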

We will use the same algorithms as last time: Neural Networks (NN), Decision Trees (J48), Support Vector Machines (SVM) and Naive Bayes (NB).  Using the full 517-game data set and the same mixed features of traditional and advanced statistics, with only the PDO feature modified, we will plug in the values and see if we get a different classifier accuracy.


Algorithm  PDO1    PDO3    PDO5    PDO10   PDO25   PDOAll
Baseline   49.71%  49.71%  49.71%  49.71%  49.71%  49.71%
SVM        58.61%  58.61%  58.61%  58.61%  58.61%  58.61%
NB         56.38%  56.96%  56.38%  56.58%  56.58%  55.51%
J48        54.93%  55.71%  55.42%  55.90%  55.61%  55.51%
NN         57.64%  56.67%  58.03%  58.03%  57.74%  58.41%


Well, that is interesting: the results have not really changed by shortening the PDO length.  Without running a Friedman test I can’t say for sure whether they are statistically different.  The best result still comes from the Neural Network on the “mixed” (or PDOAll) data set, and with some tuning I get 59.38% accuracy using ten-fold cross-validation.  If I train on only 66% of the data and test on the remaining third, I get an accuracy of 55% on the brand new data.  Based on this, I do not see shortening the PDO length adding any value to the prediction accuracy.
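For readers unfamiliar with the two evaluation setups mentioned above, here is a rough sketch of how the 517 games get divided in each case. It is not Weka's implementation (Weka stratifies its folds by class, which this sketch does not), and the function names and seed are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def holdout_indices(n_samples, train_fraction=0.66, seed=0):
    """Return one (train, test) split with `train_fraction` used for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


n_games = 517
fold_sizes = [len(test) for _, test in k_fold_indices(n_games, k=10)]
train, test = holdout_indices(n_games)
print(fold_sizes)
print(len(train), len(test))
```

In cross-validation every game is used for testing exactly once, so the 59.38% figure averages over all 517 games, while the holdout figure rests on a single test set of about 176 games and is noisier as a result.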

I would think that betting on that over a long period would make a profit (but that is a whole other project), but there must be ways to increase this accuracy (and I will try to do that in future posts).  Looking at similar work in other sports (Football, Soccer, and Basketball), they are able to achieve accuracies in the 70s.  But the low-event, continuous nature of hockey, as well as the large role of luck in the outcome of a game, makes it much more difficult to predict the winner of a hockey game.
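On the betting remark, whether a given accuracy is profitable depends on the odds being offered. The sketch below shows the break-even arithmetic for a flat bet; the odds used are hypothetical, and it assumes every bet is placed at the same decimal odds, which is a simplification since real moneylines differ for favourites and underdogs.

```python
def break_even_rate(decimal_odds):
    """Win rate needed to break even when every bet pays `decimal_odds`."""
    return 1.0 / decimal_odds


def expected_profit(win_rate, decimal_odds, stake=1.0):
    """Expected profit per bet: winnings when right minus the stake when wrong."""
    return stake * (win_rate * (decimal_odds - 1.0) - (1.0 - win_rate))


# Hypothetical odds for illustration; 0.55 is the holdout accuracy above.
for odds in (2.0, 1.8):
    print(odds, break_even_rate(odds), expected_profit(0.55, odds))
```

At even-money odds of 2.0 anything above 50% is profitable, but at 1.8 the break-even point is already above 55%, so the margin for a classifier in the 55-60% range is thin.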

What is interesting is that I can use Weka’s CfsSubsetEval attribute evaluator, which tells me which features are contributing the most to the accuracy of the classifier.  I am surprised to see that they are: Home/Away location, Goals Against and Goal Differential.  These are not advanced statistics; it is the traditional statistics that are making the biggest difference in predicting the winner of a single game.  It should be reiterated that this is NOT me disproving the use of advanced statistics such as Fenwick Close, but rather saying that when predicting a single game in the short term there is still value in these traditional statistics.
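As background on how CfsSubsetEval arrives at a subset: correlation-based feature selection scores a set of k features by a merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the average feature-to-class correlation and r_ff the average correlation between the features themselves. The sketch below only evaluates that formula; the correlation values are made up, and Weka additionally runs a search over candidate subsets that is not shown here.

```python
import math


def cfs_merit(feature_class_corrs, mean_feature_feature_corr):
    """CFS merit of a feature subset.

    feature_class_corrs: correlation of each feature with the class.
    mean_feature_feature_corr: average pairwise correlation among features.
    """
    k = len(feature_class_corrs)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    mean_rcf = sum(abs(r) for r in feature_class_corrs) / k
    return k * mean_rcf / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


# Hypothetical correlations for illustration only.
independent = cfs_merit([0.2, 0.2, 0.2], mean_feature_feature_corr=0.0)
redundant = cfs_merit([0.2, 0.2, 0.2], mean_feature_feature_corr=1.0)
single = cfs_merit([0.2], mean_feature_feature_corr=0.0)
print(independent, redundant, single)
```

Three features that duplicate each other score no better than one of them alone, which may be part of why a few traditional statistics can crowd out advanced ones that carry overlapping information.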

This will need to be run again on next year’s data to ensure the results are consistent.  If you want access to my data set for your projects, let me know.

Numbers from Behindthenet
