Selection Bias, Methodologies and Outcomes

Updated: July 23, 2013 at 10:30 am by Kent Wilson



By: Patrick D. (SnarkSD) of Fear the Fin


When it comes to statistical analysis, the population of interest is everything. Outcomes only apply to the population in the study, and any manipulation of that population adds bias. Sometimes this is obvious; other times it hides behind a curtain, even when we apply tests of statistical significance. That is why methodology is important. In an effort not to bore you, let's first discuss the more pertinent issue, selection bias, and return to methodology at the end of the article.

On Selecting Data to Use

Analyzing NHL data often involves selecting the largest population one can collect. In hockeymetrics, analysts are often forced to do this in order to increase the power of the study (power is a statistical term for the likelihood of a confirmatory result, i.e., of correctly rejecting the null hypothesis). But this doesn't come without perils. I tend to focus on team measures for a variety of reasons, one of which is that I don't have to deal with as many selection bias issues.

Selection bias refers to error in the outcomes caused by the method in which the data was collected or sampled. When analyzing teams, this is easily avoided by collecting as many games as possible (if not all of them) in each season, then randomizing the team-games for every year. For analysis of skaters or goalies this is tremendously more difficult. We aren't working with a natural population in which new individuals enter and leave at random. Individuals are selected (remember Darwin: survival of the fittest!) on past, not current or intrinsic, characteristics. This is made all the more difficult because these selection pressures often act on the same variables we are interested in studying.

One such example is shooting percentage.[1] As Eric T. showed recently, GMs heavily select based on shooting percentage. Any analysis we perform on shooting percentage will feature significant selection bias, which an analyst must take into account.

I ran a simple simulator to show the effects of selection bias on shooting percentage, and then compared it to actual NHL data to check the fit as a secondary outcome. Our primary outcome of interest is how shooting percentage changes when we introduce selection pressure.

Briefly, I selected every forward to appear on Behind the Net (BTN; 5v5 strength, 2007-2012, N=941) and compiled their games played (GP) and cumulative ice time (TOI) over that period. I then ran a histogram to see how long (in minutes) each of these players played in this sample. (Note: this isn't a study of survivorship, because some players are still active at the end of the selected time period.)


I selected bins of TOI based on this histogram, with each bin representing approximately 10% of the population. I then generated average TOI/G, average cumulative TOI, GP, and shooting percentage for these bins of forwards.
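A decile binning like this can be sketched in a few lines. The data below is randomly generated stand-in data (hypothetical column names and distributions), not the actual Behind the Net sample:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for the Behind the Net forward table (not real data)
df = pd.DataFrame({
    "player": [f"F{i}" for i in range(941)],
    "total_toi": rng.lognormal(mean=6.0, sigma=1.5, size=941),  # cumulative 5v5 minutes
    "gp": rng.integers(1, 400, size=941),
    "on_ice_sh": rng.normal(0.08, 0.02, size=941),
})

# Ten bins, each holding roughly 10% of the forwards, split on cumulative TOI
df["toi_bin"] = pd.qcut(df["total_toi"], q=10, labels=False)

# Per-bin averages, analogous to the table below
summary = df.groupby("toi_bin").agg(
    avg_total_toi=("total_toi", "mean"),
    avg_gp=("gp", "mean"),
    avg_sh=("on_ice_sh", "mean"),
)
```

`pd.qcut` splits on quantiles of cumulative TOI, so each bin holds about a tenth of the 941 forwards, mirroring the binning used for the table that follows.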


| TOI cut-off | 45 | 155 | 345 | 685 | 1070 | 1955 | 2755 | 3725 | 4720 | 6605 |
|---|---|---|---|---|---|---|---|---|---|---|
| On-ice Sh% | 3.92% | 6.38% | 7.20% | 6.46% | 7.30% | 7.47% | 7.59% | 7.94% | 8.50% | 9.19% |
| Avg GP | 2.97 | 11.74 | 30.29 | 57.45 | 86.09 | 149.93 | 216.07 | 280.78 | 336.77 | 379.25 |
| Avg total TOI | 21.52 | 91.16 | 238.59 | 503.45 | 859.37 | 1520.32 | 2331.42 | 3230.12 | 4251.63 | 5219.34 |
| Avg TOI/G | 7.25 | 7.77 | 7.88 | 8.76 | 9.98 | 10.14 | 10.79 | 11.50 | 12.62 | 13.76 |


On-ice Sh%: on-ice shooting percentage (the shooting percentage of a forward's team while he is on the ice). GP: games played. TOI: time on ice, 5v5. TOI/G: time on ice per game.

As you can see, there is a sharp reduction in the total population early in the career of many forwards. On average, our first 10% barely make it to 3 games, playing an average of 7 min. This steadily increases as we move up to our “elite” players, the top 10% that averaged 379.25 games, playing 13.76 minutes 5v5 per game.

I’m sure your eyes went to On-ice Sh% immediately. There is clearly a trend in shooting percentage here. As our population decreases (remember, we lose 10% every time we move to a new column), our shooting percentage increases, until our final population shoots at a career rate of 9.19%, well above the total population average of 8.28%. That qualifier is significant, however: our population isn’t decreasing randomly. Players are being selected by GMs, and we know that GMs prefer players with higher career shooting percentages, despite evidence that shooting percentage regresses heavily to the mean.

But maybe GMs and scouts are deft at identifying high-shooting-percentage players even when they aren’t shooting well. Perhaps they can see through the initial variance to pick long-term winners, hence the increased shooting percentage of our final 10-20%. While I can’t unequivocally prove that wrong, I think GMs and scouts are much more likely to act on the results in front of them, dropping players with low shooting percentages from the NHL. (Which, by the way, still wouldn’t explain regression to the mean.)

To show how selection bias results in a surviving population with a career average above the mean, I created a simulator in Excel. It’s very basic. I created a sample of “sim” forwards that all have an on-ice shooting percentage of exactly the NHL league average, 8.28%. I then let them “play” for the average ice time of each bin in our population (e.g., 21 minutes for bin 1, then 70 more for a total of 91 minutes by bin 2).

I let their on-ice shooting percentages vary as if each shot had a completely random chance (based on the mean and the TOI of that bin) of going in. That is to say, the variance in the population is the variance you would expect if on-ice shooting percentage were all luck and no skill. For each bin I calculated the weighted “career” shooting percentage.

Now the fun part.

I dropped the worst 10% of “sim” forwards by on-ice shooting percentage after each bin of ice time ended, and calculated the on-ice shooting percentage of that dropped group. Thus our population decreases at the same rate we saw in the NHL sample, and we calculate on-ice shooting percentage in a similar way. We can then compare the on-ice shooting percentage of our dropped “sim” forwards to the sample derived from NHL data to see what the trends look like.
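The same procedure can be sketched in Python rather than Excel. The on-ice shot rate here (30 shots-for per 60 minutes at 5v5) is an assumed round number, not a figure from the article, and the bin ice times come from the table above:

```python
import numpy as np

def run_selection_sim(n=10_000, true_sh=0.0828, shot_rate=0.5, seed=0):
    """Give every forward the same true on-ice Sh%, then drop the worst 10%
    by observed career Sh% after each ice-time bin, as the article describes.
    shot_rate is an assumed 0.5 on-ice shots-for per minute (30 per 60)."""
    rng = np.random.default_rng(seed)
    # Average cumulative TOI at the end of each bin, from the article's table
    cum_toi = [21.52, 91.16, 238.59, 503.45, 859.37,
               1520.32, 2331.42, 3230.12, 4251.63, 5219.34]
    shots = np.zeros(n)
    goals = np.zeros(n)
    active = np.ones(n, dtype=bool)
    dropped_sh = []   # observed Sh% of each group cut from the league
    prev = 0.0
    for i, toi in enumerate(cum_toi):
        minutes = toi - prev
        prev = toi
        # Every active forward shoots at the same true talent level;
        # all variance is pure shot-by-shot luck
        s = rng.poisson(shot_rate * minutes, size=n)
        g = rng.binomial(s, true_sh)
        shots[active] += s[active]
        goals[active] += g[active]
        if i < len(cum_toi) - 1:          # the final 10% survive as "elite"
            career = goals / np.maximum(shots, 1)
            order = np.argsort(np.where(active, career, np.inf))
            cut = order[: n // 10]        # worst 10% of still-active forwards
            dropped_sh.append(goals[cut].sum() / max(shots[cut].sum(), 1))
            active[cut] = False
    survivor_sh = goals[active].sum() / max(shots[active].sum(), 1)
    return dropped_sh, survivor_sh
```

Running it shows the pattern in the table: early cuts post career Sh% far below 8.28%, later cuts creep upward, and the surviving 10% sit above the true mean, even though every simulated forward has identical talent.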

The most important part of this study is not necessarily that the “sim” data fits our NHL data nicely; it’s that as we move across our time bins, shooting percentage increases, despite the fact that our “sim” forwards have a true shooting percentage of league average (8.28%) and our sample variance was all due to luck (i.e., the variance expected for a completely random distribution).

As you can see, the selection pressure we induced artificially by cutting off the bottom 10% at each ice-time bin produced an effect in the population that was still “active.” We selected for higher shooting percentages, even though no such skill existed. This, as it turns out, is similar to what NHL front offices do. A few notes: I gave all “sim” players exactly the same total TOI in each bin, whereas the forwards in our NHL sample merely average that bin’s total TOI. Also, the “sim” forwards generate exactly league-average shot and goal rates, which probably differs from the NHL.

In conclusion, you can see how selecting only a subset of the total population of NHL skaters can create biases. This effect is not limited to on-ice shooting percentage; it applies to every stat NHL front offices use to select players, including Corsi and zone starts. The NHL is not a random system: the population is selected for, and those selection pressures will alter our data. If we are going to analyze individual skaters, the effect must be either mitigated or accounted for.

We’ll switch gears here and discuss methodology in general.


When I refer to “method,” I use it as a basket term for everything one does in a study to come to a conclusion. This includes a) the initial inquiry, b) the study design, c) the method used to collect data, d) analysis of that data, and e) interpretation of the results. Each of these steps carries potential biases. I’ll summarize each stage briefly.

A) The initial inquiry is the question being asked and the reason for collecting data. For example, “What is the average length of attacking-zone time?” Solid research must have a foundational question it is attempting to answer. A common error is to collect a ton of data and then “look” for patterns. Due to random chance, trends will emerge. Furthermore, no outcome (what your final conclusion will be) is specified, rendering the study a fishing expedition heavily influenced by chance. Conclusions are then based on interpretation of those chance trends, or of new findings not initially considered, instead of on the initial question.

B) I could talk about study design for pages, but let’s keep it simple. A study that asks a question and then collects data in real time, moving forward, is known as prospective. Drawing conclusions at the end of such a study period provides the best evidence for the initial inquiry. For example, a question for this season might be, “Are injuries reduced by the new divisional schedule?” By asking the question now, we don’t introduce biases from what we may learn about the data in the future. Due to time constraints, data is often examined retrospectively (looking back at existing data), which works well but is certainly weaker evidence than a prospective study.

C) Data collection is critical, and this is where a lot of bias is introduced. As discussed above, filtering data can lead to selection bias, which can really change the results of a study. It’s best to include as much of the population of interest as possible, and to search diligently for confounders.

D) Data analysis is perhaps the most critical step. The method of analysis must be tailored to the data. If we have a normal distribution, we can often apply standard tools such as the Pearson correlation. If the relationship is linear, we can use linear regression. The character of the data (distribution, mean, skewness, pattern) must be confirmed before using these tools.
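That “confirm first, correlate second” habit can be sketched with a simple guard. The skewness threshold here is an illustrative choice, not a standard, and the rank-based fallback stands in for a Spearman-style correlation:

```python
import numpy as np

def check_then_correlate(x, y, skew_tol=1.0):
    """Inspect the shape of x before trusting a Pearson correlation.
    skew_tol is an arbitrary illustrative cutoff for 'too skewed'."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = (x - x.mean()) / x.std()
    skew = (z ** 3).mean()                  # rough symmetry check
    if abs(skew) > skew_tol:
        # Heavily skewed: fall back to correlating the ranks instead
        xr = np.argsort(np.argsort(x))
        yr = np.argsort(np.argsort(y))
        return np.corrcoef(xr, yr)[0, 1], "rank"
    return np.corrcoef(x, y)[0, 1], "pearson"
```

The point is not the specific cutoff but the order of operations: look at the distribution before picking the tool, rather than running Pearson on everything by default.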

These powerful statistical tools are (almost) a must if we want to offer any evidence toward our conclusions. A list of the top 20 or bottom 20 by a stat is in no way statistical proof, nor even suggestive of it. Binning data arbitrarily confers a generous ability to manipulate data toward a conclusion pre-formed by the analyst. If you’re interested in analyzing NHL data, learn about these powerful tools (they’re all readily available in Excel as add-ins).

E) Lastly, we interpret our results, applying our findings to real-world concepts. Again we must be careful not to overstate or understate our results. We must consider the population we selected and the strength of the evidence we collected. Tests of statistical significance (p-value, t-statistic, 95% CI, hazard ratio) are all available to help gauge the strength of our findings. But it is on the researcher to determine the applicability, generalizability, and validity of the study.


1. I used to think there would be a time when the majority of analysis would move beyond this, but it is such a pervasive truism, held by a huge population of fans and analysts alike, that it will never go away. It’s likely there will always be those who oppose the existence of shot quality, and those who believe in it.

A substantial amount of evidence is available for the former, and I still haven’t seen a study firmly confirming shot quality as a repeatable skill. The one area where I think one might show shot quality at the NHL level is face-punchers. These players have been selected for an entirely different skill set than the vast majority of the NHL, and if there were ever a population that showed a (lack of) shot quality, it would be these skaters.

Other Thoughts

1. Be open-minded. It’s easy to see through an author’s biases after a few articles. These biases will ultimately reduce the quality of your work, because the community knows how they will unconsciously and consciously change your results.

2. If you want a study to carry weight, you must show the percentage of regression to the mean, and the correlation with winning. That way we know how the stat compares to other important stats.
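One common way to estimate how much a stat regresses (a general technique, not necessarily the one the author has in mind) is a split-half correlation: randomly divide each player's games into two halves and correlate the half-averages across players.

```python
import numpy as np

def split_half_repeatability(per_game, seed=0):
    """Estimate a stat's repeatability: for each player, split his game-level
    values into two random halves and correlate half-averages across players."""
    rng = np.random.default_rng(seed)
    halves_a, halves_b = [], []
    for games in per_game:                 # one array of game values per player
        games = np.asarray(games, float)
        idx = rng.permutation(len(games))
        half = len(games) // 2
        halves_a.append(games[idx[:half]].mean())
        halves_b.append(games[idx[half:]].mean())
    return np.corrcoef(halves_a, halves_b)[0, 1]
```

A correlation near 1 suggests the stat is mostly skill; near 0 suggests mostly luck, and roughly (1 − r) of an observed deviation should be expected to regress back toward the mean.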

3. Although a formal review process is not available, sending work to others familiar with the inquiry question can be very helpful. They often uncover errors, and additional implications of the work, that may have been missed. The liberal transparency of the statistical community is its strongest attribute.

4. Diligently look for errors before posting, and always include a paragraph about what may have caused a false confirmatory finding (if applicable).

(Thanks to Patrick for submitting this article. Follow him on Twitter @FTFs_SnarkSD)