Building an NHL Fantasy Predictor: What’s in a Line?

Updated: November 2, 2012 at 8:04 am by Ben Wendorf

I’ve been pretty determined to figure out how to build my own fantasy hockey predictor for a while, and it’s led me to bounce a few ideas off fellow NHLN writers and friends. What seemed to hold up was my idea of starting with “line” thinking: forwards sorted into 1st/2nd/3rd liners and defensemen into 1st/2nd pair defenders. You’re not rostering players beyond that unless you are anticipating their involvement in those lines/pairings down the road.

But before I would even move into thinking about lines that a player might play in, I have to think about the statistical composition of the lines themselves, as a player’s production has a talent component but, more importantly, a situational component. It’s hard for the talent to be productive without the time to make it happen. So what’s in a line?

Who gets the top minutes?

Before sorting player data into lines, it would help me to know that coaches are, as Tom Awad asserted in his Good Player series, actually pretty good at giving more time to better overall players. Just to add a little to that, I ran a test of the relationship between 5v5 TOI/60 and each of four on-ice measures: Corsi Relative (the shot differential when a player is on the ice, controlled for team effects), shots-for/60 minutes, shooting percentage, and shots-against/60 minutes. Because we’re doing lines and not pairings, I’m focusing on all the forwards playing 20+ games over the last five years, a data-set of 2,182.

  • TOI/60 to Corsi Rel: r = 0.516
  • TOI/60 to SF/60: r = 0.514
  • TOI/60 to On-Ice Sh%: r = 0.501
  • TOI/60 to SA/60: r = 0.062
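For readers who want to run the same kind of test on their own numbers, here is a minimal sketch of the Pearson correlation calculation in plain Python. The function name `pearson_r` and the six sample rows are made up for illustration; they are not the 2,182-forward data-set, so the result will not match the figures above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, then the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical forwards: 5v5 ice time and on-ice shots-for per 60 minutes.
toi = [9.5, 11.0, 12.5, 13.0, 14.5, 15.5]
sf60 = [25.1, 27.0, 26.2, 28.4, 27.5, 29.3]

r = pearson_r(toi, sf60)
print(f"TOI/60 to SF/60: r = {r:.3f}")
```

Swapping `sf60` for a column of Corsi Rel, On-Ice Sh%, or SA/60 values would give the other three comparisons.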

So, our r (or correlation coefficient) measures, based on the sample, how strongly and in which direction x (TOI/60) and y (Corsi Rel, SF/60, On-Ice Sh%, SA/60) move together along a straight line. r runs from -1 to +1. Numbers above zero suggest that when TOI/60 goes up, Corsi Rel also tends to be up, and when TOI/60 is down, Corsi Rel is usually down as well. Numbers below zero suggest that when TOI/60 goes up, Corsi Rel tends to go down. As a rule of thumb, strong positive or negative correlations are usually between 0.75 and 1 in either direction. Good is around 0.50 – 0.75, and anything beneath that leaves some question about the relationship, though if you have a really big data-set (say, 10,000+) a 0.4 in either direction isn’t a bad result. When charted, our TOI/60 to SF/60 (0.514), therefore, looks like this:

TOI/60 to SF/60 Graph

A result near zero suggests almost no relationship between the two variables. What makes the above interesting is that the near-zero relationship for SA/60 seems to suggest that TOI/60 is a better predictor of good offensive players than of good defensive players. That’s okay for fantasy, but it’s something to keep in mind when using 5v5 TOI to assess overall player value. And let’s forget about trying to predict +/-.

Basically, our TOI/60 is suggesting that coaches give more time to better offensive players, and the fact that offensive players aren’t always great defensive players brings us to the low correlation for SA/60. Once again, that’s okay for our purposes, because in fantasy we are typically interested mostly in offensive value anyway.

Where the talent lies

Looking further into the 5v5 data, initially I went team-by-team and selected the top 3 forwards in TOI/60 as the “first line,” the next 3 forwards as the “second line,” and the next 3 forwards as the “third line,” and looked at the data. I soon realized this wasn’t really bringing me to true first-liners; the data suggested there wasn’t much difference in shooting talent among the lines. I concluded that not all teams have 3 true first-liners on their top line. Scrapping that, I turned to taking the entire forward population, sorting it by TOI/60, and quartering it (creating a virtual 1st line = top quarter, 2nd line = second quarter, 3rd line = third quarter, 4th line = fourth quarter). Further theorizing that not all top liners stay on the top line, and not all fourth liners even stay in the league the entire season, I changed the splits among the four to 1st line = top 20% TOI/60, 2nd line = next 25%, 3rd line = next 25%, and 4th line = final 30%. Thus, for every two teams’ top lines (or 6 forwards), at least one and possibly two forwards might move down off the top line.
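The 20% / 25% / 25% / 30% split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_lines`, the `shares` parameter, and the 20 sample forwards are all hypothetical, and how to handle ties or bucket boundaries that don't land on a whole number of players is a choice the article does not specify (here the boundary is simply rounded).

```python
def assign_lines(forwards, shares=(20, 25, 25, 30)):
    """Bucket forwards into virtual lines by TOI/60 rank.

    forwards: list of (name, toi) tuples.
    shares:   percentage of the population per line, top line first.
    Returns a dict mapping name -> line number (1 = top line).
    """
    ranked = sorted(forwards, key=lambda f: f[1], reverse=True)
    n = len(ranked)
    lines = {}
    start = 0
    cumulative = 0
    for line_no, share in enumerate(shares, start=1):
        cumulative += share
        # The last bucket takes everyone left over; earlier boundaries are rounded.
        end = n if line_no == len(shares) else round(cumulative * n / 100)
        for name, _ in ranked[start:end]:
            lines[name] = line_no
        start = end
    return lines

# Made-up population: 20 forwards with TOI/60 from 20.0 down to 10.5.
sample = [(f"F{i:02d}", 20.0 - 0.5 * i) for i in range(20)]
virtual_lines = assign_lines(sample)
print(virtual_lines["F00"], virtual_lines["F19"])
```

With 20 forwards the buckets come out to 4, 5, 5, and 6 players. Passing `shares=(25, 25, 25, 25)` would reproduce the earlier quartering approach.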

Now, I was starting to see distinctions among the lines, and important ones at that. For instance, it helped me establish a simple triptych of TOI/60 among the three lines: 14.5 / 13 / 11.5. Across this sample, those were your average even-strength TOI/60s. Some other helpful distinctions:

  • Offensive zone starts, or the percentage of time a forward starts his shift in the offensive zone…1st Line: 52.56%, 2nd Line: 50%, 3rd Line: 48.46%
  • SF/60…1st Line: 28.386, 2nd Line: 27.639, 3rd Line: 26.457
  • On-Ice Sh%…1st Line: 9.4%, 2nd Line: 8.5%, 3rd Line: 7.7%

I’m no engineer, but for me these numbers have all the beauty of the celebrated relationship of 3, 4, 5. Offensive zone starts, as I and many others have noted in the past, have a strong relationship to shots-for, so you’d be safe in attributing some of the difference in SF/60 to deployment. The big difference, then, is the shooting talent on the ice, which is clearly different between lines. Individual shooting talent is pretty easily borne out in individual shooting percentage data, but this on-ice shooting percentage speaks to the likelihood that the player can also pick up assists and a larger point total.
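To see how these per-line figures might combine, here is a rough back-of-the-envelope sketch. It rests on two assumptions that are not stated in the article: that the 14.5 / 13 / 11.5 figures are even-strength minutes per game, and that multiplying SF/60 by on-ice Sh% gives on-ice goals-for per 60 minutes. The function name `on_ice_goals_per_game` is hypothetical, and the result is goals scored while a line is on the ice, not any one player's points.

```python
# Per-line averages quoted in the article.
LINE_PROFILES = {
    "1st line": {"toi": 14.5, "sf60": 28.386, "on_ice_sh": 0.094},
    "2nd line": {"toi": 13.0, "sf60": 27.639, "on_ice_sh": 0.085},
    "3rd line": {"toi": 11.5, "sf60": 26.457, "on_ice_sh": 0.077},
}

def on_ice_goals_per_game(toi, sf60, on_ice_sh):
    """Estimated even-strength goals-for per game while the line is on the ice.

    toi:       even-strength minutes per game (assumed meaning of TOI/60 here)
    sf60:      on-ice shots-for per 60 minutes
    on_ice_sh: on-ice shooting percentage as a fraction (9.4% -> 0.094)
    """
    goals_per_60 = sf60 * on_ice_sh
    return goals_per_60 * toi / 60.0

for line, profile in LINE_PROFILES.items():
    print(f"{line}: {on_ice_goals_per_game(**profile):.2f} on-ice goals per game")
```

Under those assumptions the three lines work out to roughly 0.64, 0.51, and 0.39 on-ice goals per game, which shows how ice time and on-ice shooting percentage compound rather than simply add.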

By looking at the situational component of a forward’s offensive potential (something our own Josh L. has likewise done with regard to powerplay time), we can see that the line on which a forward is placed changes his opportunities to score points for his (and your fantasy) team. Trio relationships like 14.5 / 13 / 11.5 and 9.4% / 8.5% / 7.7% for the lines can also help provide some of the building blocks for a fantasy predictor. Next time, I’ll take a look at the potential impact of coaches on these situational components, to see if coaches are another factor you need to consider.