**Delta Corsi and Assessment of Individual Player Impacts on Possession Accounting for Usage**

Statistical analysis of skaters in the NHL (and other hockey leagues) is a difficult and multifaceted process. At this point in the hockey analytics (or #fancystats) community, one of the biggest problems with assessing hockey play from a possession perspective, and particularly a defensive one, is identifying which players actually make a significant impact within a given system. Generally this has been tackled by assessing a player’s results within the context of their usage, then comparing a variety of statistical metrics to those of their “usage” peers. The best tools available at this point are player usage charts (Vollman), which are now available in a few locations. Metrics have also been devised that consider expected goals (Parkatti), expected shooting percentage (Pfeffer), and win contributions, i.e. Total Hockey Rating, aka THoR (Schuckers and Curro).

The problem one typically finds when trying to analyze players in “context” is that eyeballing context isn’t easy. We all know, generally speaking, what tough minutes are: facing top opposition, playing big minutes, starting in your own zone frequently, playing with weaker line-mates, etc. Unfortunately, there are implicit assumptions and unfounded impressions floating around that very few people seem to have spent much energy assessing. Work must be done to determine which combination of factors has a meaningful impact, thus providing us with a context for interpreting each player’s results.

Following a number of rough forays into modelling impacts based on context, I decided that a multivariate linear regression would be an effective means of predicting what Corsi results a given player should expect based on their usage. To assess usage, a series of variables were correlated with a player’s 5v5 Corsi. “Usage” in this instance is determined by weighting a number of factors that are implicitly beyond an individual skater’s control during the course of play and that may affect their Corsi results.
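As a rough illustration of the general technique (not the actual model, variables, or data behind this post), the sketch below fits an ordinary least squares regression of an observed Corsi rate on a few usage variables and then takes the residual. The column meanings, the synthetic numbers, and the function names `fit_ols` and `predict` are all assumptions made for the example, as is reading "Delta Corsi" as observed minus expected.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, row):
    """Expected value for one skater given fitted coefficients."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

# Synthetic stand-in data. The three columns are hypothetical usage
# variables (say, zone-start share, teammate CF/20, opponent CF/20),
# already standardised; none of these numbers come from the real model.
rng = random.Random(0)
true_coef = [15.0, 1.2, 2.5, -0.8]
usage = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
observed = [predict(true_coef, u) + rng.gauss(0, 1) for u in usage]

coef = fit_ols(usage, observed)
expected = [predict(coef, u) for u in usage]
# Assumed reading of "delta": what a skater did beyond what usage predicts.
delta = [o - e for o, e in zip(observed, expected)]
```

Under this reading, a positive `delta` means a skater out-performed what their usage alone would predict. A statistics package would normally replace the hand-rolled solver; it is written out here only to keep the example dependency-free.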

^{2}) of 0.02. Thus the explanatory value of Corsi For for Corsi Against, or vice versa, is very weak (approximately 2%). The linkage between the two has been overstated in many corners in the past; at the individual skater level, it appears to be a flawed assumption.

Thus a player’s Expected Corsi For and Expected Corsi Against are determined using the following factors:

**Expected Corsi For – Regression Variables**^{†‡}

__Expected Corsi Against – Regression Variables__^{†‡}


† *A note on the Team variables, for the sake of explanation: no players who switched teams mid-season were included in the regression, as manipulating the data to separate out their TOI with distinct clubs was deemed too laborious.*

‡ *Secondarily – this same process has been conducted with Yearly Team Effects accounted for as dummy variables, which then showed high collinearity with Team Mate CF/20 and CA/20. These considerations are still being examined – and look to be an improvement on the current model – but have not been completed as of the date of this posting.*
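One simple way to see the kind of collinearity described in the note above is to ask how much of a teammate metric's variance is already explained by team membership. Regressing a variable on a full set of team dummies gives each team's mean as the fitted value, so the r^{2} of that regression is one minus the within-team share of variance. The sketch below computes this and the corresponding variance inflation factor; the team labels and values are made up, and this is not necessarily the diagnostic used for the model.

```python
from collections import defaultdict

def share_explained_by_team(teams, values):
    """r^2 from regressing `values` on team dummy variables.

    With one dummy per team, the fitted value for each skater is their
    team's mean, so r^2 = 1 - SS_within / SS_total.
    """
    groups = defaultdict(list)
    for team, value in zip(teams, values):
        groups[team].append(value)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return 1 - ss_within / ss_total

def variance_inflation_factor(r2):
    """VIF = 1 / (1 - r^2); large values signal collinearity."""
    return 1 / (1 - r2)

# Hypothetical teammate CF/20 values for four skaters on two teams.
teams = ["A", "A", "B", "B"]
teammate_cf20 = [10.0, 12.0, 20.0, 22.0]

r2 = share_explained_by_team(teams, teammate_cf20)
vif = variance_inflation_factor(r2)
```

In this toy data almost all of the spread in the teammate metric sits between teams rather than within them, which is the situation in which team dummies and a teammate variable compete to explain the same thing.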

The regression was run using data from stats.hockeyanalysis.com, behindthenet.ca, and hockey-reference.com.

__Analysis of Results__

The correlation between Expected Corsi For and the skater’s observed Corsi For was found to have an adjusted coefficient of determination r^{2} = 0.6117 (61.17%). The correlation between Expected Corsi Against and the skater’s observed Corsi Against was found to have an adjusted coefficient of determination r^{2} = 0.5542 (55.42%). The overall correlation between Expected Corsi and the skater’s observed Corsi was thus r = 0.7334, with coefficient of determination r^{2} = 0.5379 (53.79%). This translates into the view that contextual factors outside of the individual skater’s control – i.e. Usage – explain at least 53% of what is being observed on the ice in terms of shot differentials (and likely more as the model is improved).

r^{2} = 0.6168 (61.68%). This would indicate that the model is quite effective in predicting outcomes for the population in question.
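For readers who want to check figures like these themselves, here is a minimal sketch of how a coefficient of determination and its adjusted form are commonly computed, along with the arithmetic linking the overall r = 0.7334 to its r^{2}. The function names are illustrative, and the adjusted formula shown is the standard textbook one, which may or may not match the exact adjustment used for the numbers above.

```python
def r_squared(observed, predicted):
    """Share of variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjustment penalising r^2 for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Squaring the reported overall correlation recovers the reported r^2:
# 0.7334 ** 2 = 0.53787556, which rounds to 0.5379.
overall_r = 0.7334
overall_r2 = round(overall_r ** 2, 4)
```

The adjustment matters most when the number of predictors is large relative to the number of skaters; with many skaters and a handful of usage variables, the adjusted and unadjusted values will be close.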

[Linked Tableau visualization]^{§}

§ *NOTE – The linked Tableau viz uses the most recent regression model, which differs fairly significantly from the one discussed in this posting. The update includes seasonal Team Effects Factors (TEF) and removes TMCF20 and TMCA20, which were found to be collinear with the TEF.*

Readers interested in the full dCorsi paper including tables of player results can find the document in PDF form here.