The Limits of Observation

Updated: January 11, 2018 at 3:07 am by Kent Wilson

(This article was originally published at The Score almost 5 years ago and has since been dustbinned. It is re-published here for posterity) 

An enduring debate in hockey
analysis circles currently centers around “observation” versus “stats”. Like
most arguments that enter the public domain, the debate has become polarized to
such a degree that the dichotomy presented is an utterly false one. 

The truth
is, rather than “observation versus stats”, the actual debate is over
traditional analysis (a mix of observation, counting numbers and conventional
perception about what wins hockey games) and so-called “advanced” analysis
(which foregrounds testing observation and perceptions with statistical
methods). 

Advocates of the former more or less subscribe to an “I know what I
like and I like what I see” method of player and team evaluation, while the
latter couch performance in a framework of norms, means, percentages and rates
in order to strip, as best as possible, subjectivity from the analysis. For the
purposes of this piece, I’ll focus on explaining this latter viewpoint: why
observation alone is inadequate for evaluation.

Observation is the most
data-rich method of gathering information. It is also the one most fraught with
error. Unfortunately, the human mind isn’t the most accurate or reliable
instrument when it comes to collecting and interpreting data: attention,
perception and recall can all be skewed, biased or influenced in a multitude of
ways, potentially distorting the signal and sullying the final analysis. 

As
such, observation is a necessary but not sufficient element of hockey
analysis. Alone and untested against measurements of performance, even the most
ardent or experienced hockey fan or pundit can be led astray by his or her
observations.

Cognition is Conservative

In general, humans tend to
fight for that which they already know. That is, instead of re-ordering our
views of the world or even minor aspects of it whenever we’re
presented with a new piece of info, people tend to either shoe-horn the new
data into already established frameworks or act to dismiss the information as either
irrelevant or incorrect. While this tendency sounds maladaptive, it actually
serves a valuable purpose: imagine, for example, having to continually
re-assess or re-evaluate your entire lexicon of knowledge whenever a potentially
conflicting bit of info is presented. The world would dissolve into
incoherence.

Of course, conservatism comes
with its obvious costs as well. By establishing and protecting assumptions and
cognitive rules-of-thumb (hereafter referred to as heuristics),
the human mind finds ways to distort or ignore new information, thereby skewing
perspectives and offering an incomplete or incorrect view of reality.

Perhaps the best known
heuristic amongst sports fans is
confirmation bias, defined as the tendency to seek to confirm original beliefs or
theories. Even when information is incomplete or sullied by confounding
factors, people tend to generate hypotheses about its meaning and then try to
confirm these original guesses thereafter. For example, an Oilers fan sees a 30
second clip of an Edmonton prospect on YouTube. He has no other real
information on the player initially, but the clip shows the prospect flying
through the opposition and scoring a highlight reel goal. The perception in the
Oilers fan’s mind is now established: Prospect X is good.

Afterwards, the fan
will be motivated to seek out and foreground data that confirms this belief
(other highlights, performances at training camp, in drills, etc.) and
disregard data that conflicts with this belief (poor counting numbers in a
prior season, a bad shift during a game, etc.). Of course, the more info the
fan finds to confirm his perception, the more ingrained it will become as a
fact in his or her mind and, therefore, the more resistant the fan will become to
conflicting input. By actively finding information consistent with the
perception (while rejecting data inconsistent with the belief), the
confirmation bias acts as a sort of psychological positive feedback loop over
time.

This is helpful when the
hypothesis or perception is an accurate one, but hinders decision making when
it’s not. Confirmation bias is also often employed to defuse cognitive
dissonance, a feeling of discomfort when two
conflicting thoughts are held simultaneously, particularly when people are
emotionally oriented towards a given perception or outcome. Visit any NHL
team-focused messageboard, for example, and casually suggest that one of the
club’s favored players isn’t as good as he’s generally perceived to be by the
fan base. Then sit back and watch the stirred hornet’s nest.

Even if one were to include a
host of defensible facts or stats, there would be little chance of swaying
public opinion or assuaging the resultant tidal wave of abuse. Not only are
fans motivated to confirm their beliefs (like everyone else), they’re motivated
to construct their beliefs in a fashion that accords with their experiences and
established self-identity as a fan of Team X. Positive associations based on
emotional experiences can powerfully influence the construction of a perception
and, therefore, the maintenance of a bias. And sports fans aren’t the only
folks subject to these perceptual machinations: consider how often NHL GMs
re-sign players to an obviously inflated salary in the wake of an emotional or
improbable playoff run.

There are other flavors of
mental shortcuts beyond the confirmation bias. The availability heuristic, for
example, refers to the tendency
to base a judgement on how easy it is to bring specific examples of something
to mind. The pitfall of the availability heuristic is that what is
easily recalled is not necessarily accurate or representative of the object or
person in question.

This can be observed in hockey analysis frequently. For example, a
player who appears on a lot of post-game highlight reels is often considered to
be good. If a guy makes high-impact, highly memorable plays during a game, his
performance will be easily recalled (this can work in both good and bad
directions). This can result in accurate performance evaluation if the player’s
game is more or less congruent with the exciting or obvious plays. If it isn’t,
however, the availability heuristic will work to skew the observer’s perceptions
in the direction of the “highlights” and away from the mean level of his
performance in general.

The halo effect can be a result of the availability heuristic. In hockey, this
generally applies to players whose performance markers are a little more
subtle. The halo effect is a bias in which a general impression of a person affects
inferences about future expectations. This tendency is especially obvious when
it comes to players who play or act in a pleasant or likable manner, either on
the ice (always works hard! Sticks up for teammates!) or with the media/fans
(funny, easily approachable, etc.). 

In the NHL, these are typically the players
who consistently fail to produce worthwhile results but tend to stick around
year after year due to some perceived quality that both fans and hockey
decision makers tend to value. These acquisitions are usually rationalized in
terms of “intangibles” – character, leadership, work ethic – and the guys are
valued even when their tangible contributions don’t tend to actually help the
team win.

Similarly, evaluations
frequently suffer from
illusory correlation, which is the tendency to perceive a relationship where none
exists. The human brain is excellent at identifying (and sometimes fabricating)
patterns, but lousy at identifying true relationships within those patterns. In
hockey analysis, for instance, it’s simple to fall prey to the post hoc ergo
propter hoc fallacy, which is
the tendency to assume that since one event follows another, the latter was
necessarily caused by the former.

Of course, the sequence of events can be
entirely coincidental rather than causal. In what can be referred to as
“building a narrative”, fans will observe a sequence of events and codify the
entirety into a typical story structure featuring heroes, villains, rising
action, a climax, etc. The narrative lends structure and meaning to observed
events, but often substitutes archetypes and assumptions for analysis and causal understanding.

An illustration

The New York Rangers and
Philadelphia Flyers are tied in the third period of a hockey game. The two
teams’ fourth lines take the ice and the enforcers decide it’s time to
dance. Derek Boogaard wrestles with Jody Shelley for a minute and a half before
landing a devastating left hook and earning a decisive victory. The fans cheer, the
players bang their sticks on the boards and the goons go to the penalty box. 

Three shifts later, a puck skips over Pronger’s stick, Gaborik skates in alone
and scores the game winning goal. After the game, pundits and teammates alike
talk about the Boogaard fight as the decisive momentum swing and a causal agent
in the Gaborik goal (and, ultimately, the Ranger victory). The relationship
between the Boogaard fight and Gaborik goal is perceived to be causal due to
the sequencing of events and resultant story that can be told afterwards.

And thus, fans watching the
game might be tempted to come to that conclusion. However, the question
remains: was Boogaard truly a causal agent in the Gaborik tally? Or did the
fight merely precede the goal? One sequence of events similar to the ones
described doesn’t prove anything either way, but it does work to plant the idea
in the minds of players, coaches, GMs and fans alike who observed it. In the
future, when a Boogaard fight doesn’t result in any positive events, all the
heuristics described previously will likely come into play for the observers in
question – the confirmation bias will cause them to either ignore or downplay
the competing evidence that perhaps fights don’t lead to game-changing
momentum swings.

At the same time, the availability heuristic will cause them
to recall previous instances where an obvious Boogaard victory preceded a
favorable result while the halo effect may cause fans and management alike to
view him in a favorable light despite his other various faults as a hockey
player. I’m not sure this completely explains why Glen Sather would choose to
pay Boogaard $6.5 million over four years, but it probably comes close.

Attention and Memory

Much of the above could be applied
to both observation and statistical analysis: after all, fans frequently cherry
pick counting numbers to justify certain evaluations of players or teams.
Confirmation bias doesn’t only apply to watching and mentally evaluating
the sport, for instance. Heuristics can be marshaled to skew a perspective in
both areas of analysis.

However, observation tends to
be more susceptible to such cognitive short-cuts because the act of encoding
information during something as fast paced and emotionally involved as a hockey
game is rife with other limitations. Consider that attention is in fact
volitional and is both directed and framed by preconceptions and expectations.
Meaning, an observer doesn’t passively absorb everything that’s happening
during a hockey game: his or her attention is directed like a spotlight at
certain aspects or events. What the observer focuses on is dependent on
idiosyncratic factors: who is she cheering for? What players does she like? Not
like? What’s happening in the game? Where are her eyes focused on the ice? And
so forth. 

Of course, the ability to concentrate our attention on certain
stimuli is actually an adaptive response (having to attend to all input, all
the time would make for a highly chaotic, disruptive world), but it also means
that some data is necessarily lost when it’s being encoded. What data (and how
much) is, again, dependent on the individual variables described above as well
as filters such as the heuristics listed in the previous section. This is what
gives rise to subjectivity in observations: the difference in
expectations, perceptions, attentional faculties and biases in each observer
often results in competing interpretations of the same event(s).

Recall of events is similarly
problematic.
Human memory is re-constructive in nature. Rather than being a simple recording
of events, a memory is built from data that is not only
filtered during the encoding phase (observation) but is also subject to
revision during recall. In fact, it’s been shown that memories of an event can
be altered after the fact by something as simple as the wording used to
describe it. For example:

1. “In the third period of the game last night, Robyn Regehr crushed
Ryan Kesler at the Flames blueline.”

2. “In the third period of the game last night, Robyn Regehr checked
Ryan Kesler at the Flames blueline.”

If you were to read the first
description of the play in question, what are the chances you’d be more likely
to remember the Regehr bodycheck as a game changing ‘BIG HIT’ later? According
to studies done in this area of human memory, the chances are very good. 

In 1974,
Elizabeth Loftus and John Palmer found that after viewing a car crash sequence, future recall
by participants could be influenced by the wording of the crash description.
For example, the inclusion of the verb “smashed” in the description instead of
“bumped” or “hit” meant that participants were more likely to answer “yes” to
the question “did you see any broken glass?” one week later (even though there
was no broken glass in the film). Participants who read the “smashed”
description were also more likely to estimate higher speeds for the cars in the
video. So not only did a simple verb change increase the perceived severity of
the crash, when combined with a leading question (“did you see any broken
glass?”) it was able to fabricate a wholly new aspect of the memory itself.

The Constraints of Time and the
Perception of Means

The issues mentioned are an
indicative but not exhaustive survey of the human mind’s inherent ability to
skew the intake, recall and interpretation of data. There are ways to combat
observational biases, obviously: highly experienced viewers, assuming some
level of competence, can work to separate personal preferences and emotional
attachments in favor of what is relevant. A seasoned scout versus a casual fan,
for instance. 

Strict definitions for observable behavior can also cut back on
subjectivity. The amateur scoring chance counters during the past season agreed
on a specific definition of “scoring chances” in order to preserve a reasonable
level of agreement across counters. In addition, merely being cognizant of the
above issues can help an observer to actively resist their effects. That said,
no one can be perfectly rational all the time, which is why the scientific
method of double blind trials (and the replication of results) is considered
the best process available for discovering the truth or falsehood of a
hypothesis. Perception and observation alone are considered inadequate.

However, even if one were able
to enforce an inhuman level of discipline in an observer, the limits of
observation extend beyond perceptual filters, biases and memory
re-construction. For example, humans are inherently innumerate. 

Patterns and
generalities are easy for the brain. What’s difficult is effectively perceiving
things like rates, means, medians and variance in behaviors or a performance
measure within a given population. For example, imagine a perfectly rational
hockey scout: a man who had basically conquered all the bias issues
discussed earlier. He’s able to detach personal investment, resist confirmation
bias, defuse other mental heuristics and observe hockey with an informed but
impartial eye. 

Now, such a scout would seem theoretically invaluable and his
employer would likely send him to observe as many junior games as possible.
Let’s say 200 games per season. For the purposes of this thought experiment,
let’s also assume that the scout is completely qualitative in his performance
evaluations, meaning he doesn’t bother to record or look up stuff like point
totals, plus/minus, shot rates etc.

His value at the end of the
season would actually be fairly limited. His sense of the prospects and their
performances would be sketched in general terms like: some, many, few, often,
usually, a lot and rarely. It would be impossible to determine the validity and
degree of these impressions. He would have almost no way to contextualize each
prospect’s performance over the season. 

The distribution and variance of the
junior population for each performance measure would be a total mystery to our
hypothetical scout. He might have some general awareness of approximate totals, but
he’d never be able to judge rates and norms without gathering the requisite
information and conducting calculations. The human brain just doesn’t do this
naturally, especially over a large number of events and/or a long period of
time. So even if he could observe a prospect’s performance through a completely
rational framework, he’d be unable to properly couch the prospect’s level of
performance in the context of his peers without the use of quantitative
methods.
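
To make the point concrete, here is a minimal sketch of the kind of calculation the scout would need in order to place one prospect among his peers. The points-per-game figures and the `contextualize` helper are hypothetical, invented purely for illustration; they are not real junior data.

```python
from statistics import mean, pstdev

# Hypothetical points-per-game figures for a peer group (illustrative only).
peer_ppg = [0.35, 0.50, 0.62, 0.70, 0.75, 0.80, 0.95, 1.10, 1.25, 1.48]


def contextualize(value, population):
    """Return (z_score, percentile) of `value` within `population`."""
    mu = mean(population)
    sigma = pstdev(population)  # population standard deviation
    z_score = (value - mu) / sigma
    # Share of the population at or below the value, as a percentage.
    at_or_below = sum(1 for x in population if x <= value)
    percentile = 100 * at_or_below / len(population)
    return z_score, percentile


z, pct = contextualize(1.10, peer_ppg)
print(f"z-score: {z:.2f}, percentile: {pct:.0f}")
```

An impression like “he scores a lot” becomes a position within a distribution, which is exactly the piece that unaided observation cannot supply.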

A related, non-trivial problem
with observation is its high cost in time and resources. Watching a
single hockey game is a two-to-four hour investment, minimum. Watching 200
games, like the perfectly rational scout envisioned above, would represent an
entire six-month season of attending multiple rinks in different cities for 3
hours at a time. Even then, assuming his interest is in five different
prospects, he’d only observe about 40 games for each. 

That’s just over half a
season of work for a junior in the CHL. In statistical terms, that’s a
relatively small sample to work with: a ten game scoring streak or drought
could completely skew a player’s results (and the resultant perception of his
abilities) for instance. If one considers 30 teams and hundreds of players at
the NHL level, the issue grows considerably: at roughly three hours per game, it
would take one observer 3,690 hours – or about 154 straight days – to watch
every one of the 1,230 NHL games in a typical season. Not including playoffs.
That’s obviously just a straightforward viewing of each contest. The inclusion of replays or the
close inspection of certain sequences would vastly increase the time and effort
commitment.
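
The viewing-time arithmetic can be checked with a quick sketch. It assumes 30 teams, an 82-game schedule and about 3 hours per game (the figure used for the scout above), plus a 68-game junior season, which is an assumption not stated in the text. Because two teams share every game, the NHL schedule holds 30 × 82 / 2 = 1,230 distinct games; counting team-games instead (30 × 82 = 2,460) would double every total.

```python
# Scout example: 200 games split evenly across 5 prospects.
SCOUT_GAMES = 200
PROSPECTS = 5
CHL_SEASON_GAMES = 68  # assumed junior schedule length (not from the text)

games_per_prospect = SCOUT_GAMES // PROSPECTS    # 40
share = games_per_prospect / CHL_SEASON_GAMES    # roughly 0.59

# NHL example: every game involves two teams, so halve the team-games.
TEAMS = 30
GAMES_PER_TEAM = 82
HOURS_PER_GAME = 3  # assumed average viewing time per game

team_games = TEAMS * GAMES_PER_TEAM       # 2,460 team-games
distinct_games = team_games // 2          # 1,230 distinct games
hours = distinct_games * HOURS_PER_GAME   # 3,690 hours
days = hours / 24                         # 153.75 days of nonstop viewing

print(f"Scout: {games_per_prospect} games per prospect "
      f"({share:.0%} of a {CHL_SEASON_GAMES}-game season)")
print(f"NHL: {distinct_games} games, {hours} hours, {days:.2f} days")
# Scout: 40 games per prospect (59% of a 68-game season)
# NHL: 1230 games, 3690 hours, 153.75 days
```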

The result for fans and pundits
committed to the “saw him good” school of analysis is that their impression of
a player or team is mostly made up of a relatively limited number of viewings,
particularly of players/teams they don’t habitually follow. That means they’ll
fall prey to the issue of
small sample size and the variance/luck confounders that accompany it. 
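
As a rough illustration of why a handful of viewings is so noisy, the sketch below computes the standard error of an observed per-game scoring rate. It assumes goals per game follow a Poisson distribution (a simplifying modelling assumption, not a claim from this article) and uses a hypothetical true rate of 0.5 goals per game; `rate_standard_error` is an illustrative helper name.

```python
import math


def rate_standard_error(true_rate, games):
    """Standard error of an observed per-game rate under a Poisson model.

    The variance of the mean of `games` Poisson(true_rate) draws is
    true_rate / games, so the standard error is its square root.
    """
    return math.sqrt(true_rate / games)


TRUE_RATE = 0.5  # hypothetical true goals per game

for games in (10, 40, 160):
    se = rate_standard_error(TRUE_RATE, games)
    print(f"{games:>3} games: standard error of observed rate = {se:.3f}")
```

Under these assumptions the uncertainty shrinks only with the square root of the number of games: quadrupling the viewings merely halves the noise.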

In
addition, their analysis will be cluttered with biases and skewed information: perceptually
obvious plays and “highlight reel” events, high impact plays at critical
moments (i.e., “clutch”), as well as easily recalled (but potentially altered)
memories that may not even be truly indicative of the overall impact or
performance of the entity in question. They will also lack important
referential data, such as rates and means, which place an individual team’s or
player’s results in the proper context from which accurate conclusions can be
drawn.

As mentioned at the outset,
observation is the primary, most data-rich source of information. It is
also beset by psychological pitfalls and other limitations. While it is widely
considered folly in conventional hockey analysis circles to only consider stats
absent observation, the opposite is equally true: observation absent stats can
leave one hopelessly lost amongst one’s own preconceptions and assumptions.

**This post would not have been
possible without the excellent Elliot Aronson textbook
“The Social Animal”.