Score for Score

From the Academy: Judging Bias at the 1984 Olympics

by Brina May 28, 2020

Today’s article is a piece of history as much as a piece of academic research. In 1988, Ansorge and Sheer explored the extent to which gymnastics judges are nationally biased in a paper entitled "International Bias Detected in Judging Gymnastic Competition at the 1984 Olympic Games."

The 1980s were the height of corruption in gymnastics judging. The FIG turned a blind eye to numerous allegations of score fixing and shady backroom deals. But the authors of this study are gymnerds after my own heart: they weren’t convinced by the anecdotal evidence and they wanted to see the numbers. In their words, "In spite of the rhetoric, however, the existence of international bias in gymnastics remains unsubstantiated by scientific inquiry."

The Data

This study uses scores from compulsories and optionals during both the men’s and women’s team competitions in Los Angeles. Four judges for each event were selected randomly from a pool. For men, there was a pool of 26 judges including two from each country that had qualified a team to the Olympics. For women, there was a pool of 16 with one from the country of each qualifying team.

The authors take advantage of this judging arrangement. Specifically, they can look for cases where a judge is evaluating a gymnast from her own country, and compare that judge’s scores to the scores of the other three judges on that event.

The Questions

The authors are interested in two separate questions:

  1. Are judges biased towards gymnasts from their own country? Specifically, when a judge evaluates gymnasts from her own country, is her score higher than the average score given by the other three judges?

  2. Are judges biased against gymnasts from countries that are in direct competition with their own country? For each team competing, the teams ranked directly above and below it can be seen as its direct competitors. When a judge evaluates gymnasts directly competing with her own country, is her score lower than the average score given by the other three judges?

The Results

To answer these questions, the authors count the number of cases where a judge’s score is above, below, or equal to the average of the other three judges’ scores in a relevant scenario.
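To make the counting step concrete, here is a minimal sketch in Python. The function name `compare_to_panel` and the four routines are hypothetical illustrations, not data or code from the paper; the only thing taken from the study is the idea of comparing one judge’s score to the mean of the other three.

```python
from collections import Counter

def compare_to_panel(scores, judge_index):
    """Classify one judge's score as 'above', 'below', or 'equal'
    relative to the mean of the other judges' scores."""
    own = scores[judge_index]
    others = [s for i, s in enumerate(scores) if i != judge_index]
    others_mean = sum(others) / len(others)
    # Round the difference so floating-point noise is not mistaken for a real gap.
    diff = round(own - others_mean, 6)
    if diff > 0:
        return "above"
    if diff < 0:
        return "below"
    return "equal"

# Hypothetical routines: (four judges' scores, index of the home-country judge).
routines = [
    ([9.8, 9.7, 9.7, 9.6], 0),
    ([9.5, 9.6, 9.6, 9.6], 0),
    ([9.7, 9.7, 9.7, 9.7], 0),
    ([9.9, 9.8, 9.9, 9.7], 0),
]

tally = Counter(compare_to_panel(scores, judge) for scores, judge in routines)
print(tally["above"], tally["below"], tally["equal"])
```

The three counts in `tally` are the kind of numbers the authors then feed into their statistical test.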

They then apply a sign test to these numbers. Intuitively, for the first question, this test estimates how likely we’d be to see the results observed in Los Angeles if judges are equally inclined to score a gymnast from their home countries higher or lower than the rest of the judges’ scores. Similarly, for the second question, it estimates how likely the real results are if judges really aren’t biased against their country’s direct competition. (It’s worth noting that the data doesn’t really satisfy the assumptions that this test requires — the scores for different routines aren’t independent — but that’s never stopped social scientists before.)
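For readers who want to see the mechanics, a sign test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of the authors’ analysis: I am assuming a one-sided test with ties dropped (the post does not say which variant the paper used), the function name `sign_test_one_sided` is my own, and the counts of 8 and 2 are made up.

```python
from math import comb

def sign_test_one_sided(n_favored, n_other):
    """One-sided exact sign test, with ties already dropped.

    Returns P(X >= n_favored) for X ~ Binomial(n, 0.5), i.e. the chance of
    seeing at least this many results in the favored direction if both
    directions were equally likely.
    """
    n = n_favored + n_other
    tail = sum(comb(n, k) for k in range(n_favored, n + 1))
    return tail / 2 ** n

# Hypothetical counts: a home-country judge scored above the panel 8 times
# and below it 2 times (assumed numbers, not from the paper).
p_value = sign_test_one_sided(8, 2)
print(p_value)
```

For the second question the favored direction flips: pass the number of "below" cases as the first argument, since the hypothesis there is that judges underscore their country’s direct competitors.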

In both cases, there was strong evidence for bias. These results are in the table below.

Judges were statistically significantly more likely to overscore gymnasts from their own country, and significantly more likely to underscore gymnasts from their country’s competitors. The evidence is right there.

The Implications

In my fantasies, this sort of research immediately drew the FIG’s attention and spurred the long and ongoing fight for fairer scoring forward. In reality, I have no idea if anyone took note. One of the authors contributed a long article on scoring in men’s gymnastics to a 1991 issue of Technique, the USGF’s official magazine, highlighting national bias among other issues — so at least someone in the gymnastics world must have cared.

This study is so compelling for its simplicity. The authors asked simple questions that can be directly answered by the data. They had access to numbers from a high-stakes international competition, lending extra weight to their findings. And they didn’t make outsized claims about the implications of their results. Of course, there are numerous sources of bias that we can’t examine using this simple method of comparing one judge to the others — including the sort of handshake deals in which gymnasts’ scores were determined before a single skill was even performed. But because national bias is an important issue in and of itself, none of that matters for the purposes of this study.

While this paper is an interesting piece of history, I have to wonder whether the same levels of national bias exist today. This has actually been studied using slightly different methods (here), and the short answer is that bias still seems to exist. But if you want to hear my full thoughts on the study, you’ll have to wait for another edition of From the Academy!
