Machine Learning & Pre-Game Text Reports to Forecast Success in Hockey

Updated: February 8, 2014 at 10:06 am by Josh W

In the final part of my Machine Learning (ML) / Prediction series I’ve explored the ability to forecast games using Natural Language Processing (NLP), specifically the text from the pre-game reports. While this research is much more in-depth on the ML/NLP side of things, I’ve tried to minimize that and focus this post on the hockey #fancystats. If you are interested in learning more about the ML and NLP techniques, send me a message and I can share that academic research.

To learn how I used text to predict games, click past the jump.



First, to recap some of the relevant background for this project: Machine Learning is a subset of artificial intelligence in which algorithms learn from previous data and predict outcomes on new data. ML can predict from a set of classes (supervised learning), predict a numerical outcome (regression), or figure out which data points are similar to each other (unsupervised learning). Regression has been used a little in hockey analytics, but I have seen little to no other work using the other methods of machine learning in hockey.

In some earlier work I used machine learning to analyze advanced and traditional statistics. I found that when predicting a single game, the best features were location (home/away), goal differential, and goals against. Others have also found that predicting a single game with shot differential returns decent results as well.

Despite trying different features, the best result I was able to achieve was with a voting classifier combining a Support Vector Machine, Naive Bayes, and a Neural Network (Multilayer Perceptron), with an accuracy of 59.8%. Further work suggests there is an upper bound on hockey predictions of ~62%. With the home team winning 56% of games, this leaves only a small gap for improvement.
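For readers who want to experiment with this themselves, a minimal sketch of such a voting ensemble in scikit-learn might look like the following. The data here is synthetic and the column meanings are stand-ins for the real features (location, goal differential, goals against), not my original dataset.

```python
# Sketch of a voting ensemble combining SVM, Naive Bayes, and a
# Multilayer Perceptron, trained on synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(0)
# Columns stand in for: home (scaled), goal differential, goals against.
X = rng.normal(size=(200, 3))
# Synthetic labels driven mostly by the "goal differential" column.
y = (X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC()),
        ("nb", GaussianNB()),
        ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    voting="hard",  # majority vote of the three models' hard decisions
)
ensemble.fit(X, y)
preds = ensemble.predict(X)
```

With `voting="hard"` each model casts one vote per game and the majority wins; `voting="soft"` would instead average predicted probabilities.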


For this experiment I used all 720 games of the 2012-13 shortened NHL season. Previous work has suggested there is concept drift when training on multiple seasons of data, so I only analyzed the most recent one.

For each of the 720 games I calculated each team's location (Home/Away), Goals Against and Goal Differential. Then for each game I found the pre-game report (example here). I divided the text into the parts relevant to each team. A few of the pre-game reports did not have proper markup, so it was not possible to split the text apart; as a result, I was only able to train on 708 of the 720 games.

Additionally I used a method called “Sentiment Analysis” which looks at text to determine if it is “positive” or “negative”.  I counted the number of “positive” and “negative” words in each text and gave it a sentiment score based on the ratio of positive to negative words. I used the AFINN sentiment lexicon to determine which words are positive or negative.
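A minimal sketch of this scoring step, using a tiny stand-in lexicon rather than the full AFINN list (which maps roughly 2,400 words to scores from -5 to +5):

```python
# Count positive and negative words against a sentiment lexicon and
# score the text as the ratio of positive to negative words.
# AFINN_SAMPLE is a made-up stand-in for the real AFINN lexicon.
AFINN_SAMPLE = {"win": 4, "great": 3, "hot": 2,
                "loss": -3, "injury": -2, "terrible": -3}

def sentiment_score(text, lexicon):
    words = text.lower().split()
    pos = sum(1 for w in words if lexicon.get(w, 0) > 0)
    neg = sum(1 for w in words if lexicon.get(w, 0) < 0)
    # Ratio of positive to negative words; +1 avoids division by zero.
    return pos / (neg + 1)

score = sentiment_score("great win despite an injury", AFINN_SAMPLE)  # -> 1.0
```

A report with no negative words gets a score equal to its positive count, and an all-negative report scores 0, so higher means more upbeat coverage.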

All games were split into two vectors: one for the winning team and one for the losing team, each with the appropriate data for the model. All data vectors were given a final output label (Win or Loss).
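As a sketch, the per-game split into two labeled vectors might look like this; the dictionary keys and example values are hypothetical stand-ins for the numeric, text, and sentiment features described above.

```python
# Turn one game into two labeled training vectors: one for the winning
# team and one for the losing team. Field names are illustrative only.
def game_to_vectors(game):
    rows = []
    for side, label in (("winner", "Win"), ("loser", "Loss")):
        rows.append({
            "home": game[side]["home"],            # 1 if this team was at home
            "goals_against": game[side]["ga"],     # goals against
            "goal_diff": game[side]["gd"],         # goal differential
            "sentiment": game[side]["sentiment"],  # pre-game report score
            "label": label,                        # final output label
        })
    return rows

example = {
    "winner": {"home": 1, "ga": 2.4, "gd": 0.8, "sentiment": 1.5},
    "loser":  {"home": 0, "ga": 3.1, "gd": -0.4, "sentiment": 0.7},
}
vectors = game_to_vectors(example)  # two rows, labeled Win and Loss
```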


To learn on this data I built four models: three for the different types of data, and a fourth meta-classifier which learns from the outputs of the first-layer models. All models were trained on the same 930 instances (~66%) and tested on the remaining 478 instances (~33%), to ensure that I could feed the first layer's predicted labels into the meta-classifier.

The first model used only the numeric data; the second used only the text reports, with NLP features such as 1-, 2-, and 3-grams; the third used only the sentiment analysis features. A fourth model (the meta-classifier) took the outputs from the three previous models, learned from their decisions, and used a number of different ways to come up with a final decision: Cascade Classifier, a second algorithmic model trained on the first layer's outputs; Highest Confidence, selecting the decision with the highest probability as determined by the first layer; and Majority Voting, selecting the output label by majority vote of the three first-layer models.
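The two rule-based meta-strategies can be sketched in a few lines; the probability values below are made-up stand-ins for the first layer's P(Win) outputs, and the cascade variant (which trains a second model, e.g. an SVM, on the stacked outputs) is noted in a comment.

```python
# Two of the meta-classifier strategies over first-layer P(Win) outputs.
# (The Cascade Classifier instead fits a second model, such as an SVM,
# on these three probabilities as its input features.)

def majority_vote(probs):
    """Majority vote of the three models' hard Win/Loss decisions."""
    votes = [p > 0.5 for p in probs]
    return "Win" if sum(votes) >= 2 else "Loss"

def highest_confidence(probs):
    """Take the single decision the first layer is most sure about."""
    best = max(probs, key=lambda p: abs(p - 0.5))
    return "Win" if best > 0.5 else "Loss"

# e.g. numeric model says 0.62, text model 0.45, sentiment model 0.55:
probs = [0.62, 0.45, 0.55]  # both strategies pick "Win" here
```

Note the two strategies can disagree: with outputs like [0.55, 0.30, 0.52], two models lean Win but the most confident model says Loss.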


For comparison I also tried all features together in one data set; the best result came from a variant of Naïve Bayes called NBSimple, with an accuracy of 58.27%. The results for the first three classifiers are:

Model      Algorithm        Accuracy
Numeric    Neural Networks  58.58%
Text       JRip             57.39%
Sentiment  Naive Bayes      54.39%

The results for the meta-classifiers are:

Meta-Classifier            Accuracy
Cascade Classifier (SVM)   58.78%
Highest Confidence         57.53%
Majority Voting            60.25%

Out of interest I ran InfoGain on the text to see which words and pairings of words contribute the most to predictions. They are:

InfoGain  Instances  N-gram
0.01256   1010       whos hot
0.01150   1009       whos
0.01124   424        hot
0.00840   918        three
0.00703   160        chicago
0.00624   470        kind
0.00610   98         assists
0.00588   673        percentage
0.00551   1180       trotz
0.00540   340        games
0.00505   1152       richards said
0.00499   1052       barry trotz
0.00499   1051       barry
0.00499   1066       coach barry
0.00499   1067       coach barry trotz
0.00497   354        given
0.00491   317        four
0.00481   686        pittsburgh penguins
0.00465   130        body
0.00463   778        save percentage
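For those curious how InfoGain ranks features, it is just the entropy of the Win/Loss labels minus the entropy remaining after splitting on whether an n-gram is present. A toy sketch with made-up data (not the real reports):

```python
# Information gain of a binary "n-gram present" feature against
# Win/Loss labels, as Weka's InfoGain attribute ranking computes it.
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count("Win"), labels.count("Loss")) if c)

def info_gain(present, labels):
    """Label entropy minus weighted entropy after splitting on the feature."""
    n = len(labels)
    gain = entropy(labels)
    for flag in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == flag]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain

# Toy case: did "whos hot" appear in the report, and did that team win?
present = [True, True, True, False, False, False]
labels = ["Win", "Win", "Win", "Loss", "Loss", "Loss"]
gain = info_gain(present, labels)  # perfectly predictive feature -> 1.0
```

A feature that perfectly separates wins from losses scores 1.0; the real n-grams above score around 0.005-0.013, i.e. individually weak but collectively useful.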

Discussion, Future Work, Conclusion

There are two things that can be taken away from these methods.

The first is the increase in overall prediction accuracy, slowly approaching the upper bound.

The second, and more important, is that text on its own seems to be about as good at predicting single games as the traditional and advanced statistics.

This needs to be explored further, as it could be a result of the upper bound being so low. Trying a similar experiment with another league would be ideal, but I could not find a large amount of pre-game text for other leagues. The SHL, with an upper bound closer to 70%, has some textual pre-game reports, but only 100 or so games (and in Swedish), which is not a large enough sample size. It makes me wonder what else text could predict in hockey.

One last point of interest is that the most important words were often team, player, or coach names. This makes sense given the ebb and flow of the league, where some years a team is good and in others terrible, and these names help pinpoint them.

(Side Note: in the decision tree I generated for this data, the root word was "Philadelphia"; if present, the tree would automatically select ‘Loss’.)

This would support the idea that we cannot train on one year and test on another, due to concept drift. It is interesting to see the Nashville Predators / Barry Trotz show up so frequently as key words: in 2012-13 they weren't a great team, but they weren't the worst either, and they are not a team the NHL media overly loves to cover. Chicago and Pittsburgh are understandable, as is Richards. But I can't quite place the importance of Smashville to predictions.