The ability to predict the success of hockey prospects at young ages has long been a goal in the business of hockey. Now more than ever, the success of young players is directly related to the success of an NHL team. For the most part, rosters are built through the draft, rather than trades and free agency. Knowing who is likely to be successful and who is likely to fail can be the difference between winning and losing in the future – and that can subsequently be the difference between employment and unemployment for the person who is choosing the players.
It wasn’t that long ago (although it seems like ages) that Canucks Army had access to such a tool. PCS, the Prospect Cohort Success project developed by Money Puck and Josh Weissbock, used historical data to project players in the here and now. Unfortunately, we lost access to the system when those two were hired by the Florida Panthers.
As you may have noticed, we’ve been using comparable percentages to assess prospects again over the past couple of months, beginning with this article here. It’s been a bit of a mystery until now, but it’s time to pull back the curtain. Draft and prospect analytics are returning to Canucks Army and the Nation Network. This is not a rebirth of PCS, but instead an alternative, using similar underlying principles.
This is pGPS: the prospect Graduation Probabilities System.
First and foremost, I have to make note of the fact that pGPS and PCS are entirely separate entities. I had absolutely no input in Money Puck and Josh’s system, and likewise they had no input in mine. Of course I was inspired by their work – it was an objectively brilliant creation. At the time that it went offline, I was relatively new at Canucks Army. My knowledge of their system came only from reading their literature rather than direct contact. I found the idea incredibly fascinating, and I was gutted by losing it. And so I set out to create my own system, not with the intention of achieving fame or monetizing the system, but simply to satisfy my own curiosity.
With the invaluable help of Canucks Army programmer Dylan Kirkby, I was able to gather statistical data for a variety of leagues dating back decades and began to compile a massive database. Like PCS, the goal was to compare present players with past players based on a few key factors that are known to correlate with NHL success: age, stature and production.
This is where PCS and pGPS likely diverge – while the concepts are similar, it’s likely that the underlying math and formulas are entirely different. There are several steps beyond the basic premise that lead to the final numbers, and any difference in adjustment, scaling, weighting, and so on will yield different results. Without having seen PCS numbers for some time, I cannot comment on the degree to which the final numbers are similar. From here on out, I can only account for what I’ve done personally.
PCS is a tough act to follow, and while there are other projection systems out there, I wholeheartedly believe that PCS was the gold standard. There are models like the Projection Project that use NHLe to compare players. Draft by Numbers uses a Poisson Generalized Additive Model, and measures results in interesting ways like Time on Ice. Another model called DEV uses Euclidean mathematics to measure similarity, like PCS, though with a considerably smaller database. In the interest of giving pGPS its own flavour, I have experimented with different ways of inputting data, measuring outputs, and making adjustments here and there.
The Advantage of pGPS
Though we’re only unveiling it by name today, we’ve been kicking pGPS around behind the scenes for quite some time. In that time, I’ve built the database to a massive size that includes over 150,000 player-seasons from over 25 leagues, dating back over 30 years (in some cases). Of course, goal scoring rates have changed considerably since that time, so every single player-season is rigorously era-adjusted.
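To give a sense of what era adjustment means in practice, here is a minimal Python sketch. It is not the actual pGPS adjustment, which isn't spelled out here. It simply assumes that a player's points per game can be scaled by the ratio of a baseline scoring environment to the league's scoring level in that season. The function name `era_adjust`, its parameters, and the baseline of 6.0 goals per game are all hypothetical.

```python
def era_adjust(points, games, league_goals_per_game, baseline_goals_per_game=6.0):
    """Scale raw points per game to a common scoring environment.

    ASSUMPTION: a simple ratio adjustment; the real pGPS method may differ.
    The 6.0 baseline is an illustrative value, not a pGPS constant.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    if league_goals_per_game <= 0:
        raise ValueError("league_goals_per_game must be positive")
    raw_ppg = points / games
    # A high-scoring season (ratio < 1) deflates production;
    # a low-scoring season (ratio > 1) inflates it.
    return raw_ppg * (baseline_goals_per_game / league_goals_per_game)


# 80 points in 60 games, in a season averaging 8.0 goals per game.
adjusted = era_adjust(points=80, games=60, league_goals_per_game=8.0)
print(round(adjusted, 3))
```

Under these assumptions, a player who piled up points in a high-scoring era is pulled back toward what the same production would look like in the baseline environment.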
As with any projection model, pGPS is simply a tool to be used in addition to traditional scouting applications, rather than in place of them. It makes it far simpler to compare the likelihood of success of players between leagues, and quickly identifies players who are piling up points simply because of their age and size relative to their peers.
Of course, it’s not without its biases. pGPS is subject to biases that are already present within the system. Historically speaking, the likelihood of success for players diminishes greatly the further they get under six feet. But how much of this is a result of their lack of abilities and how much is due to biases from coaches, managers, and scouts that prevent them from even getting a fair chance? Clearly there is a lot of both involved, but we may never know the degree to which each affects the end result. That is why context is critically important.
When a percentage is given by pGPS, it is up to the scout or analyst to determine which side the player will fall on. If a player’s comparable group achieves NHL success 33 per cent of the time, it’s their job to determine whether the player will be among the 33 per cent that makes it, or the 67 per cent that fails, as well as whether there are roadblocks that can be overcome with solid prospect development practices. pGPS percentages can also tell you which players are better bets before you even go about determining development strategies at an individual level.
Components of pGPS
In pGPS, similarity between players is measured by distance in Euclidean space, with age, stature and production serving as the three coordinates of each player-season. The closer two players are in that space, the more similar they are. Players with a high degree of similarity are deemed to be compatible matches. Each match is cross-referenced with the NHL’s all-time data, and a series of results is formed. Of all the players deemed a match, how many made it to the NHL? How many forged careers as NHL regulars? How many goals did they score, how many points? Each of these questions boils down to a single statistic designed to project young players in a myriad of ways.
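The matching idea can be sketched in a few lines of Python. This is only an illustration of Euclidean-distance matching on age, stature and production. The `PlayerSeason` fields, the per-feature scales in `SCALES`, and the `max_distance` cutoff are all assumptions made for the example, not the scaling or weighting pGPS actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerSeason:
    name: str
    age: float        # age during the season, in years
    height_in: float  # stature, in inches
    ppg: float        # (era-adjusted) points per game


# ASSUMPTION: each feature is divided by a scale so that no single
# unit (years, inches, points) dominates the distance. Values are made up.
SCALES = {"age": 1.0, "height_in": 2.0, "ppg": 0.25}


def distance(a: PlayerSeason, b: PlayerSeason) -> float:
    """Euclidean distance between two player-seasons in scaled feature space."""
    return math.sqrt(
        ((a.age - b.age) / SCALES["age"]) ** 2
        + ((a.height_in - b.height_in) / SCALES["height_in"]) ** 2
        + ((a.ppg - b.ppg) / SCALES["ppg"]) ** 2
    )


def find_matches(subject, sample, max_distance=1.0):
    """Return the historical player-seasons within max_distance of the subject."""
    return [p for p in sample if distance(subject, p) <= max_distance]


subject = PlayerSeason("Prospect", 17.0, 72.0, 1.00)
history = [
    PlayerSeason("Comp A", 17.5, 73.0, 1.10),  # close on all three factors
    PlayerSeason("Comp B", 19.0, 68.0, 0.50),  # older, smaller, less productive
]
print([p.name for p in find_matches(subject, history)])
```

The choice of scales matters a great deal: change them and a different set of historical players qualifies as a match, which is one reason two systems built on the same premise can produce different numbers.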
pGPS is measured by a series of different numbers, each of which indicates something different about the relationship between the subject and the historical sample.
- pGPS n: The number of matches between the subject and the player-seasons in the historical sample. A player-season is one season by a single player, e.g., John Tavares 2008 OHL.
- pGPS s: The number of statistical matches that became NHL regulars. A match counts as an NHL regular once he has played 200 NHL games.
- pGPS %: The bread and butter. Simply s divided by n, this is the percentage of statistical matches that successfully became NHL regulars.
- pGPS P/GP: The NHL points per game of successful matches.
- pGPS R: A bit of a hybrid number, this pGPS Rating combines the percentage and points per game to produce a number that includes both likelihood of success and potential upside.
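As a rough illustration of how these numbers relate to one another, here is a small Python sketch that computes them from a list of matches. The 200-game threshold and "s divided by n" come from the definitions above. Everything else is an assumption for the example: the `Match` type, pooling points and games across successful matches to get P/GP, and especially the rating, which is shown as percentage times points per game only as a placeholder because the actual pGPS R formula is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    nhl_games: int   # career NHL games played by the matched player
    nhl_points: int  # career NHL points by the matched player


NHL_REGULAR_GAMES = 200  # threshold for an "NHL regular", per the definition above


def pgps_summary(matches):
    """Compute n, s, %, P/GP and a placeholder rating for a group of matches."""
    n = len(matches)
    successes = [m for m in matches if m.nhl_games >= NHL_REGULAR_GAMES]
    s = len(successes)
    pct = 100.0 * s / n if n else 0.0

    # ASSUMPTION: P/GP is pooled across all successful matches.
    games = sum(m.nhl_games for m in successes)
    points = sum(m.nhl_points for m in successes)
    ppg = points / games if games else 0.0

    # ASSUMPTION: placeholder only; the real pGPS R formula is not published here.
    rating = pct * ppg
    return {"n": n, "s": s, "pct": pct, "ppg": ppg, "rating": rating}


matches = [Match(400, 200), Match(250, 50), Match(10, 2), Match(0, 0)]
summary = pgps_summary(matches)
print(summary["n"], summary["s"], summary["pct"])
```

With these made-up matches, two of the four cleared 200 games, so the percentage is 50. A subject with no matches at all gets a percentage of zero rather than an error.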
To assess the capabilities of pGPS, I ran the entire 2007-08 OHL season, using data from seasons played previously as the comparison sample. I found a high correlation (R² = 0.35) between the players’ pGPS % and their eventual NHL games played.
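For readers who want to run this kind of check themselves, here is a minimal Python sketch of the R² calculation, taken as the squared Pearson correlation between two lists of numbers. The function name `r_squared` and the sample values are made up for illustration; they are not the 2007-08 OHL data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values: pGPS % for four players and their eventual NHL games.
pgps_pct = [0.0, 20.0, 50.0, 90.0]
nhl_games = [0, 150, 90, 600]
print(round(r_squared(pgps_pct, nhl_games), 2))
```

An R² of 1.0 would mean the percentages line up perfectly with games played, while a value near zero would mean they tell you almost nothing.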
Within the 2007-08 season, pGPS was very impressed with John Tavares and Steven Stamkos, giving them percentages of 100 and 87.5 per cent, respectively. It wasn’t fooled by 19-year-old Justin Azevedo, who led the OHL in scoring that year. Azevedo was assigned a pGPS % of 0.0 per cent, and subsequently went on to play zero NHL games.
When the OHL database for that season is arranged by pGPS R, many of the most successful players rise to the top, with 15-going-on-16-year-old Taylor Hall at the top of the list, still two years away from being drafted first overall in 2010. Of the top 38 players by pGPS R, only three failed to play a game in the NHL, while 21 players in that group played at least 200 games. Here’s a list of the top 11 in pGPS R:
Obviously there are a couple of whiffs here and there, but by and large pGPS was able to identify eventual NHLers with impressive consistency.
pGPS R was also a very strong indicator of eventual NHL production.
This is just a small taste of what pGPS has to offer. Over the next several weeks, you’ll see these percentages making appearances in the Nation Network’s Draft Profiles series, which runs from now until the week of the draft. We’ve already been using this metric in assessing potential free agents, both of the CHL and NCAA variety, and we will continue to do so moving forward.
As the draft gets nearer, we can use it for one of its most valuable benefits – picking out potential late-round steals, especially in lesser-known leagues. While most North American leagues are standard, and European elite leagues are must-haves, the European junior and second-tier leagues are new additions to the pGPS family.
pGPS needs more rigorous statistical testing to determine significance and validity. That is something I will get into as we head into the summer, and I will update the masses as we go along.
In the future, we’ll experiment with different ways of displaying pGPS data, analyze drafting by the Canucks and by teams around the National Hockey League, and explain anomalies and the relationships between the world’s various hockey leagues. Buckle up, stats fans, this should be quite an adventure.