Score for Score

Posts tagged "Robot Judges"

“I think [the sport of gymnastics] would have more credibility if it had some objective component that the audience can understand. If done right, it would give the audience, media and sponsors a new level of confidence in the accuracy of results.” –Mike Jacky, a former official of the FIG.

The sport of gymnastics is one of the most watched and attended sports at the Olympic Games, but in between those Olympic years it doesn’t get that much attention at all. Can Fujitsu’s robot judges transform four-year fans into all-year-long fans?

The FIG and Fujitsu are collaborating to develop a new technology that will combine human judgment with AI and, they say, revolutionize the sport. I talked to several (retired) international Belgian elites, their coaches, international judges, fans of gymnastics, and non-fans as well, and asked for their opinions on the technology. Through this research, I explored the plausibility of three potential ways that automated judging technology could bring fans to the sport:

  1. It could make judging less prone to human bias, giving credibility to a subjectively judged sport.

  2. It could assist athletes in training, allowing them to accomplish greater feats that excite fans.

  3. It could make the sport easier to understand and thus easier to enjoy.

Robot judges could alleviate human bias.

Belgian gymnasts get more points from Dutch judges, and vice versa. Yes, this is what one of the gymnasts I interviewed confessed. The sport of gymnastics has had to deal with several judging scandals in its history, scandals which are often the consequence of human bias.

Humans are subjective creatures and the evaluation of gymnasts can be influenced by factors including nationality, reputation, order of performance, fatigue, and so on. The viewing position of the judges can also play a role. International WAG judge Marleen van Dooren speaks from experience: "At Worlds in Doha I was a D1 judge on floor. When a gymnast performs on the other side of you, she is 15 meters away from us. Then, you don't have sight of her feet.”

But the robots are coming to the rescue. The technology would be used to make the judging process more objective and accurate, because that is what judging currently lacks. We don’t know many details yet about the exact implementation, as the technology is still being developed and Fujitsu hasn’t communicated much about its new product. For now, the system will only be used to support the judges on the difficulty score, but exactly how is not known.

The aim of this technology is to make the judging fairer for the gymnasts. During my interviews, gymnasts, coaches, and fans said that they would have more trust in a score given by a technology than by a human judging panel. Former Belgian gymnast Laura Waem says: "I really think that it will enhance the sport by making it more objective and that there will be less discussion like, ‘Is the angle good or not? Is the turn fully completed or not?’ These are things that you can perfectly see with the technology, which I think is great."

They are convinced that it will be more accurate and objective because we are taught that technologies can’t think like humans and thus cannot cheat or be biased. But that is a misconception. Multiple studies have shown that AI can be biased, for the simple reason that AI is trained on human decisions which can involve bias; bias is transmitted from humans to the technology. So we will have to wait and see how exactly Fujitsu will develop and use the technology to determine if it will really help to eliminate bias in judging.

Robot judges could assist athletes with training.

The impact of robot judges on the sport may go beyond competition: the technology could also be installed in gyms. In the future, gymnasts could train with the exact same technology that will evaluate them at World Championships or the Olympic Games. This would let gymnasts know exactly how their routines will score internationally, because their score won’t depend on the composition of the judging panel, or whether they compete in an early or a late subdivision.

With the system, gymnasts and coaches could pinpoint technical as well as execution errors. This means that they could train more efficiently and have a competitive advantage over gymnasts who cannot train with the technology.

This could impact the popularity of the sport. It is known that high-level achievements attract the attention of the media. The use of the technology in the gym could lead to more international achievements, which could result in increased popularity of the sport in the country of the athlete. This is what happened with Nina Derwael in Belgium. Since she won multiple European and world medals, the sport has gotten more attention in the Belgian media and the size of the Flemish gymnastics federation has increased. So Fujitsu’s technology could indirectly have a positive effect on the popularity of the sport.

But let’s be honest: this technology won’t be cheap. This could give a huge advantage to gymnasts who will be able to train with it. It seems likely that big programs that can afford the system will get even better and distance themselves from the smaller programs that can’t, which will thus be disadvantaged. Gymnasts who train with the system could potentially achieve better results internationally, worsening existing disparities between gymnasts with resources and those without.

Robot judges could make gymnastics more understandable and enjoyable.

The difficulty of understanding the sport and its scoring system is another reason why the sport isn’t that popular. To help people to better understand the scores in gymnastics, Fujitsu’s technology will be offering TV and online broadcasting content, while also providing display boards and smartphone systems for visitors at competitions.

These additions will offer some visual explanations of the score. The content could show how many deductions were taken for certain skills, but also how high a gymnast went, by how many degrees she under- or over-rotated a twisting skill, and so on. These features could make the sport more understandable and more interesting to watch. Belgian head coach Marjorie Heuls explains: "I think it's really a great thing for the audience, it can really give back some interest to our sport because it's always difficult to understand for people. And with this it will open new perspectives. The people may get more interested in the fact that they better understand the discipline and the judging. And also for the TV channels, it will be a sport that is more measurable so I think it will create more enthusiasm to broadcast it on television with this system."

This possible outcome of robot judges is a good one. The media rarely uses effective graphics that help explain the sport, and any new effort would help.

Robots alone aren’t the ultimate solution.

But as a lot of gymnasts, coaches, judges, and fans told me, if we really want to increase the popularity of the sport, it will take more than this new technology. This is especially true for smaller gymnastics countries such as Belgium, which struggle to attract spectators to competitions.

First of all, more advertising and communication about gymnastics events would help. Competitions should also be made more attractive. In Belgium, for example, spectators have the perception that they have to be quiet, which makes competitions very quiet and quite boring to attend. A good speaker who interacts with the audience, some entertaining music, and exciting lights could make a competition more enjoyable to watch. Ideas like this would do a lot to make the sport more attractive by enhancing the viewing experience and drawing attention to gymnastics events, with no robots necessary.

Fujitsu’s technology is certainly a step in the right direction, but robot judges alone will not make gymnastics a popular sport worldwide.

Editor's Note: Got an idea for a guest post for the blog? E-mail me!

Tags: Robot Judges

The FIG has stated repeatedly that robot judges will be used in some capacity on six events at the 2020 Tokyo Olympics — and were already in use at Worlds last year. While this modest rollout falls short of some earlier rhetoric surrounding the robot judging technology, it is still wrong to subject gymnasts to judgements by any technology that has not been vetted by anyone outside Fujitsu or the FIG.

I've written previously about why robots are not necessarily less biased than humans. However, there is still an aura of objectivity that surrounds talk of the robot judges — they use lasers and science, so how is it even possible for them to be wrong? With robot judges so close to reality, I thought it was time to have a more concrete discussion about exactly how a system like Fujitsu’s could observe a skill and make an incorrect judgement.

In short, any system that uses “artificial intelligence” makes predictions about reality based on limited inputs. The quality of these predictions depends on 1) the quality of the inputs, and 2) the process that the computer uses to analyze those inputs. And because no set of inputs can perfectly record reality, there is some uncertainty in all judgements made by artificial intelligence.

While the robot judges will almost certainly use inputs that are more reliable than the human eye, we cannot be so sure about the process they will use to analyze those inputs. Information on the technology behind the robot judges is hard to come by, though I’ve done my best to find what’s out there. A machine learning model is used for the most important step of the process: taking the input visual information and deciding which points in space represent each part of the gymnast’s body. At the end of the day, the machine is making an informed guess about the location of the gymnast’s hips, knees, ankles, etc. Those location points mean the difference between gorgeous extension and sloppy bent knees. The system’s judgements come from uncertain predictions and not an objective measurement of reality.

On the bright side, it is almost always possible to quantify the uncertainty inherent to machine learning. Most types of models, including neural networks, can be built to predict the probability of each possible outcome rather than simply predicting the most likely outcome. Roughly speaking, Fujitsu could structure its neural network to say, “I’m 80% sure that the gymnast’s knee is in this particular point in space, but there’s also a 10% chance it could be a little further right and a 10% chance it’s a bit further down.” Down the line, such numbers could be used to calculate how sure the robot is that a gymnast’s knees were bent on a particular skill. This information would be invaluable in a setting where robot judges and human judges are working together in concert. We want to be able to make informed decisions about when to value a human’s perspective over a machine’s, and vice versa.
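To make that idea concrete, here is a minimal sketch in Python of how uncertainty about joint locations could be turned into a probability that a knee was bent. Everything in it is my own assumption rather than anything Fujitsu has described: I pretend each joint's position error is a Gaussian with a 1 cm spread (`sigma_m`), I treat a knee angle under 170 degrees as "bent" (`bent_below_deg`), and I estimate the probability by drawing random samples.

```python
import math
import random


def angle_deg(a, b, c):
    """Angle at point b, in degrees, between the segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos))


def prob_knee_bent(hip, knee, ankle, sigma_m=0.01, bent_below_deg=170.0,
                   n_samples=5000, seed=0):
    """Share of sampled skeletons whose knee angle falls below the threshold.

    sigma_m and bent_below_deg are illustrative values, not real settings.
    """
    rng = random.Random(seed)

    def jitter(point):
        # Assumed error model: independent Gaussian noise on each coordinate.
        return [x + rng.gauss(0.0, sigma_m) for x in point]

    bent = 0
    for _ in range(n_samples):
        if angle_deg(jitter(hip), jitter(knee), jitter(ankle)) < bent_below_deg:
            bent += 1
    return bent / n_samples
```

A call like `prob_knee_bent((0, 0, 1.0), (0.2, 0, 0.5), (0, 0, 0.0))` returns a number between 0 and 1. A human judge could weigh that number against what they saw, which is the kind of informed hand-off between human and machine that I'm arguing for.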

Now, it’s worth noting that Fujitsu is using a particular type of machine learning, a deep neural network, that is very good at processing visual information. Neural networks are a class of machine learning models that mimic the structure of the human brain: each neuron takes in a signal, processes it, and passes a signal to other neurons. Similarly, each layer of a neural network takes in some input data, processes it, and then passes it on to another layer. When a neural network has many such layers connected in complex ways, we call it a "deep learning" model. Such models are currently the state of the art for most image processing tasks; for example, they help driverless cars identify the salient parts of their surroundings. But despite the hype, deep neural networks are still ultimately making uncertain predictions, like any other machine learning model.
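For readers who like to see the mechanics, here is a toy sketch in Python of the "layer passes a signal to the next layer" idea. It is a generic textbook-style network with made-up weights, not Fujitsu's model: each hidden layer computes weighted sums and zeroes out the negatives, and the last layer turns its scores into probabilities that sum to one.

```python
import math


def linear(inputs, weights, biases):
    """One weighted sum per neuron: each row of weights belongs to one neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, hidden_layers, output_layer):
    """Pass the signal through each hidden layer, then output probabilities."""
    signal = inputs
    for weights, biases in hidden_layers:
        # ReLU activation: negative signals are set to zero.
        signal = [max(0.0, s) for s in linear(signal, weights, biases)]
    out_weights, out_biases = output_layer
    return softmax(linear(signal, out_weights, out_biases))


# Made-up weights, purely for illustration.
hidden = [([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])]
output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
probabilities = forward([1.0, 2.0], hidden, output)
```

The output is a list of probabilities rather than a single verdict. A real deep network has the same shape of computation, just with vastly more layers and weights learned from data.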

Furthermore, deep neural networks are notoriously opaque. Unlike a human, a neural network cannot explain why it arrived at any given conclusion. While there are now new systems that attempt to explain neural networks, applying these systems usually requires access to the model itself. At the very least, such systems require the ability to provide many inputs to the model and receive many outputs, even if the model itself remains off limits. Unfortunately, none of the AI components of the Fujitsu system can be vetted by the public. This is disappointing but not terribly surprising: Fujitsu’s neural network is valuable intellectual property, and anything they release will make it easier for other companies to develop competing technology. This creates a common problem in the field of artificial intelligence: the non-experts commission experts to develop complicated machine learning models, but, because they are non-experts, do not have the means to evaluate how well the experts have done. They are taken in by hot buzzwords and slick demonstrations.

I am not writing off robot judges as an impossible task, or a bad idea that should be abandoned. I would like nothing more than for this technology to work as well as promised the first time around. There is no reason to believe it won’t.

But there’s also no reason to believe it will.

It is the FIG’s responsibility to make the testing process public now, while the only things at stake are the FIG’s reputation and Fujitsu’s bottom line, and not yet Olympic medals and gymnasts’ dreams.

Tags: Robot Judges

After a long and winding string of Google searches, I’ve finally tracked down some actual information about Fujitsu’s robot technology. In particular, I found this paper and this patent to be quite informative.

I hope you’ll join me for a few minutes to try to understand how this technology works. My expertise lies in machine learning, not in sensing technology, but I’m drawing on the advice of helpful experts (especially @babybluecollar) to explain what’s going on.

Broadly speaking, we can divide the system into two parts: the part that gathers data on what each gymnast is doing, and the part that analyzes that data.

Gathering Data

Fujitsu uses a customized Light Detection and Ranging (LiDAR) system to gather information about the position of a gymnast’s body in three-dimensional space. LiDAR shoots out a light wave — that is, a laser — and measures how long it takes for that light wave to come back. The longer it takes, the further away the object. Each laser returns three coordinates (x, y, and z) that together tell us the position of the object that the laser bounced off of.
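The arithmetic behind that is short enough to sketch in Python. The distance is the speed of light times the round-trip time, divided by two because the light travels out and back. Converting to x, y, and z additionally needs the direction the beam was fired in; the angle convention below (azimuth and elevation in degrees, z pointing up) is my own assumption for illustration, not something taken from Fujitsu's documents.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s):
    """Distance to the object in meters: light covers the path twice."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def to_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert a range and beam direction to x, y, z (assumed convention: z up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return (x, y, z)
```

For a gymnast fifteen meters away, the round trip takes roughly a tenth of a microsecond, which gives a sense of how precise the timing hardware has to be.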

The main reason Fujitsu uses LiDAR as opposed to, say, a depth camera is that LiDAR can measure objects up to fifteen meters away, meaning the equipment can sit out of the way of the gymnast. However, basic LiDAR systems aren’t really adequate for the purpose of measuring fast-moving athletes at a high resolution, so Fujitsu had to make some modifications. The system is diagrammed in the image below.

Source: Sakai, Masui, Tezuka 2018

First, to collect a lot of information in a short period of time, they bounce the lasers off of a micro-electro-mechanical systems (MEMS) scanning mirror that keeps changing position. This means that the laser beams are sent out at a variety of angles, returning with data on a wider area. This is a pretty common practice; for example, most autonomous vehicles use it.

Next, Fujitsu needs to ensure that the sensing system works both for nearby athletes and far away athletes. To do this, they can optionally use a wide-angle lens to receive the returning light beams and focus them in on the receptor when the athlete is far away. It seems like Fujitsu staff need to manually decide whether to use a wide-angle lens ahead of time, which might explain why we’re only seeing this system used on events like rings, vault, and pommel horse where all the gymnastics happens in one spot.

However, the wide-angle lens makes the system more susceptible to ambient light in the arena - because the lens collects more light, it gets more noise along with the signal. To solve this issue, they discount all light that gets to the receptor except for light on the single cell that has the highest concentration of light from the laser. While I don’t know for sure how Fujitsu distinguishes the laser’s light from other ambient light, there are distinctive ways to modulate light waves that could be used.
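As a rough illustration of the "keep only the brightest cell" idea, here is a small Python sketch. I'm assuming the receptor can be treated as a grid of intensity readings, which is my simplification; the function names are hypothetical, and the real system presumably does this in hardware.

```python
def brightest_cell(grid):
    """Return ((row, col), intensity) for the cell with the highest reading."""
    best_pos, best_val = None, float("-inf")
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > best_val:
                best_pos, best_val = (r, c), value
    return best_pos, best_val


def keep_only_brightest(grid):
    """Zero out every reading except the one on the brightest cell."""
    keep, _ = brightest_cell(grid)
    return [[value if (r, c) == keep else 0.0 for c, value in enumerate(row)]
            for r, row in enumerate(grid)]
```

Note that this sketch simply trusts the largest reading. It does nothing to tell the laser's light apart from a bright arena spotlight, which is exactly the part I couldn't find details on.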

Finally, Fujitsu uses multiple LiDAR systems that all provide different viewpoints of the gymnast. I get the sense that carefully positioning these instruments in a busy competition arena has thus far proved to be a challenge that requires multiple Fujitsu experts on hand; we’re a long way from the day when this technology can be easily disseminated to clubs and governing bodies.

Analyzing the Data

From the LiDAR system, the computer receives a set of 3D coordinates from multiple viewpoints. The next step is essentially a series of automated steps to clean up the data.

First, each viewpoint image is subject to thresholding. This in essence divides up the 3D space into pixels and determines whether each pixel of the image is or isn’t part of the gymnast. Next, the image is thinned, meaning that the outer pixels are removed to get a minimal outline of the shape. There are numerous ways to do this, but you can read about one such algorithm here if you’re interested.

[Figures: hypothetical image after thresholding; hypothetical image after thinning]
Source: US Patent Application Publication No. US 2019 / 0066327
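Here is a minimal Python sketch of what thresholding could look like for depth data. The specifics are my assumptions, not the patent's: I treat one viewpoint as a grid of distances in meters, use `None` where no laser return came back, and call a pixel "gymnast" if its distance falls inside a band between `near_m` and `far_m`. Thinning is a more involved algorithm and is not shown.

```python
def threshold_depth(depth_image, near_m, far_m):
    """Mark each pixel 1 if its distance lies in [near_m, far_m], else 0.

    depth_image is a list of rows; each entry is a distance in meters,
    or None where the sensor got no return.
    """
    return [[1 if d is not None and near_m <= d <= far_m else 0 for d in row]
            for row in depth_image]


# A tiny made-up viewpoint: the back wall is 20 m away, the gymnast about 10 m.
depth = [
    [20.0, 9.8, 20.0],
    [None, 10.1, 9.9],
]
mask = threshold_depth(depth, near_m=9.0, far_m=11.0)
```

The result is a grid of ones and zeros, which is the kind of black-and-white silhouette that a thinning step would then whittle down.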

This thinned image is used to identify the location of specific points on the gymnast’s skeleton. Pretty much everything Fujitsu is concerned with right now (skill identification and basic technique) can be determined from a basic skeleton rather than a fully fleshed-out image of the gymnast’s body. In particular, they focus on identifying 18 points on the skeleton: head, neck, spine, pelvis, shoulders, elbows, wrists, ankles, knees, and toes. (They refer to these points as joints, so I’ll use that term going forward, but the head is not a joint and I stand by that.)

To identify the joints from the thinned image, a machine learning model is first used to classify each pixel as belonging to a specific body part. (The patent mentions using a random forest at this step, but other more recent documents say deep learning was used.) Then, for the body parts of interest (knee, shoulder, etc.), the joint is pinpointed at the center of gravity of all the points belonging to that body part. This is diagrammed below.

Source: US Patent Application Publication No. US 2019 / 0066327

(Does anyone else find that diagram haunting? Just me? Okay.)
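The center-of-gravity step is simple enough to sketch in Python. I'm assuming the classifier has already attached a body-part label to every pixel and that every pixel counts equally; the label names are hypothetical.

```python
from collections import defaultdict


def joint_centroids(labelled_pixels, parts_of_interest):
    """Average the coordinates of all pixels carrying each label of interest.

    labelled_pixels is an iterable of ((x, y, z), part_label) pairs.
    Returns a dict mapping part_label to its (x, y, z) center of gravity.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for (x, y, z), part in labelled_pixels:
        if part in parts_of_interest:
            acc = sums[part]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1  # number of pixels seen for this part
    return {part: (sx / n, sy / n, sz / n)
            for part, (sx, sy, sz, n) in sums.items()}
```

One consequence worth noticing: a handful of mislabelled pixels will drag the average, so the quality of the joint depends directly on the quality of the pixel classifier before it.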

All of the preceding steps are performed separately on each viewpoint — that is, each individual LiDAR system’s output. Here, for the first time, all the viewpoints are brought together and fed into a supervised machine learning model that determines the “posture” of the gymnast. The gymnast’s posture falls into one of a very few broad categories; the examples Fujitsu gives are forward-facing, rear-facing, and handstand. These categories seem neither comprehensive nor mutually exclusive to me, but it’s possible that the real system uses a different taxonomy of postures.

Finally, the system predicts the final shape of the skeleton using a skeleton recognition model that has been trained specifically for images of gymnasts in that posture; there is a separate trained model for forward-facing gymnasts, gymnasts in handstands, etc. This use of a posture model before the final skeleton is output is one of Fujitsu’s innovations to improve the accuracy of their machine learning model — they claim that it limits the range of motions that each individual skeleton recognition model is required to recognize and thus improves accuracy.

Each skeleton recognition model takes as inputs the joints identified in each viewpoint from each individual LiDAR. To determine where the left knee is, the model treats the left knee joints from all the viewpoints as a “point cloud” around the true left knee. I couldn’t find many details about this model, but I believe it uses some form of maximum likelihood estimation with hard-coded constraints on what a skeleton looks like. In other words, the algorithm is trying to find the most likely joints given a certain point cloud of possible joints and some basic information about skeletons.
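To show what "most likely joint given a point cloud" can mean in the simplest case, here is a Python sketch. It is a textbook result, not Fujitsu's model: if each viewpoint's estimate is off by independent Gaussian noise with a known spread, the maximum likelihood location is the average of the estimates, weighted by one over each spread squared. The per-viewpoint `sigma` values are hypothetical, and the skeleton constraints are left out entirely.

```python
def fuse_joint_estimates(estimates):
    """Inverse-variance weighted mean of several estimates of one joint.

    estimates is a list of ((x, y, z), sigma) pairs, one per viewpoint,
    where sigma is that viewpoint's assumed error spread in meters.
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    total = sum(weights)
    return tuple(
        sum(w * point[i] for w, (point, _) in zip(weights, estimates)) / total
        for i in range(3)
    )
```

In this toy version, a viewpoint with half the error spread gets four times the say. A real system would also need to handle viewpoints where the joint is hidden behind the gymnast's own body.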

Finally, cylinders are fitted between the final joints to fill in the rest of the skeleton. You can get a better sense of this whole process from the diagram below.

Source: Sakai, Masui, Tezuka 2018

There is very little information on the training data used for the machine learning models involved in this process. The one sentence I could find implies that Fujitsu created training data from scratch and labeled the joints by hand. Information on this training data will be crucial to understanding any biases baked into the models.

It will also be crucial to understand what metrics were used to evaluate the performance of each model. One of the engineers working on the project said that “the new system’s target error is less than ±1 centimeter,” which I presume refers to the distance between the predicted joint location and the actual joint location. However, we don’t know how close they are to meeting that target, or how the system’s performance varies across different types of skills and different types of gymnasts.
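Under my reading of that target, the check itself would be easy to run if hand-labelled "true" joint positions were available. Here is a sketch in Python; the joint names and the summary statistics are my own choices, and I'm assuming coordinates in meters.

```python
import math


def joint_errors_cm(predicted, actual):
    """Distance in centimeters between predicted and actual joint positions.

    Both arguments map joint names to (x, y, z) coordinates in meters.
    """
    return {name: 100.0 * math.dist(predicted[name], actual[name])
            for name in actual}


def summarize(errors_cm, target_cm=1.0):
    """Mean error, worst error, and share of joints within the target."""
    values = list(errors_cm.values())
    return {
        "mean_cm": sum(values) / len(values),
        "max_cm": max(values),
        "share_within_target": sum(v <= target_cm for v in values) / len(values),
    }
```

Running `summarize` separately for different skills, apparatus, and groups of gymnasts is what would reveal whether the system works evenly across the board or only on the cases it was built around.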

Application to Gymnastics

Now, all that Fujitsu has done so far is create a crude model of the gymnast’s skeleton. To actually make use of this information in a gymnastics setting, they compare the movement of the predicted skeleton to a set of rules about gymnastics skills that have been coded up by hand.

This explains how the system will be able to recognize new skills and rare skills. The system doesn’t need to learn what an iron cross is by looking at hundreds of examples of iron crosses; instead, someone just typed in the joint positions that define an iron cross.
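Here is what a hand-written rule might look like, sketched in Python. The rule is my own invention for illustration and is not taken from the Code of Points or from Fujitsu: I call a pose an iron cross if both arms are within ten degrees of horizontal and the torso is within ten degrees of upright. The joint names, the `tolerance_deg` value, and the convention that z points up are all assumptions.

```python
import math


def elevation_deg(start, end):
    """Angle of the segment start->end above (+) or below (-) horizontal."""
    dx, dy, dz = (end[i] - start[i] for i in range(3))
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))


def looks_like_iron_cross(skeleton, tolerance_deg=10.0):
    """Check a single pose against a hypothetical iron cross rule.

    skeleton maps joint names to (x, y, z) positions with z pointing up.
    """
    arms_level = all(
        abs(elevation_deg(skeleton[side + "_shoulder"],
                          skeleton[side + "_wrist"])) <= tolerance_deg
        for side in ("left", "right")
    )
    # An upright torso points straight up, i.e. 90 degrees above horizontal.
    torso_tilt = 90.0 - elevation_deg(skeleton["pelvis"], skeleton["neck"])
    return arms_level and abs(torso_tilt) <= tolerance_deg
```

This only checks one frozen pose. A usable rule would also need to confirm that the position was held long enough, and the choice of tolerance is exactly the kind of decision that determines whether a gymnast gets credit.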

More broadly, this system has clearly been built with an eye for growth opportunities. Fujitsu is right: the foundation for all future technology in this field will be the ability to quickly and accurately identify the basic skeleton of an athlete in motion. This is true for gymnastics and beyond.

Some Reflections

This is my understanding of the Fujitsu system based on the limited information available online. There’s every chance that I’m totally wrong, or that I’m leaving out something important. At the very least, I’ve almost certainly over-simplified the process.

But it’s better than nothing.

After combing through this process, I’ve gained a renewed appreciation for just how hard this task is. This information has done nothing to allay my concerns about the speed with which the FIG is rolling out Fujitsu’s work. I understand why they are only using the technology in a limited capacity on a small subset of events: it’s nowhere near ready to take on anything more.

There’s one thing that Fujitsu and the FIG could do to make me less concerned: release the results from the tests that are currently happening and will continue to happen at Worlds. It’s their responsibility to prove that the technology is ready. The information available so far isn’t enough.

Tags: Robot Judges

A few recent articles seem to suggest that Fujitsu's "robot judges" will be used in competition much sooner than we expected. If trials at an upcoming (unspecified) World Cup are successful, we might even see the technology at 2019 Worlds.

The best argument for the new judging technology is that robots will be less biased than human judges. Let me say this as clearly as I can: it is a mistake to assume that computers are less biased than humans. I'm a data scientist as well as a gymnerd, and I hope I can bring some lessons from one world to the other.

First, machines are only as unbiased as the data you train them on. Let's say that the engineers at Fujitsu have set themselves the goal of getting the robot judge to take deductions whenever a human judge would. This seems like an obvious goal. However, this strategy would lead to robot judges that are exactly as biased as human judges. This issue has come up time and time again in other contexts. For example, Amazon tried to train an algorithm to look at resumes the same way its recruiters had in the past. Because its recruiters were biased, the algorithm learned to throw out women's resumes.

Second, bias in a machine is often harder to spot than bias in a human. We assume that humans are biased until they've proven themselves objective. At the same time, we assume that machines are objective until proven biased. I like to think that the gymternet writ large is approaching the introduction of robot judges with a healthy dose of skepticism. However, I would be surprised if that skepticism extends to the upper ranks of the FIG and to those working directly with Fujitsu. As far as I can tell, the current priority seems to be to push out the technology in time for Tokyo, perhaps as part of Japan's larger effort to show off its robot technology when it hosts the Olympics. This is exactly the sort of situation in which I would expect to find individuals who are overly willing to believe that a new technology has all the answers.

Third, any bias in a machine scales more easily than bias in a human judge. There are definitely a handful of human judges out there who could be described as bad eggs: they under- or over-score a certain type of "look" whenever they see it. But each one of those people is only judging a single event at a few meets each year. If Fujitsu's technology is rolled out at every FIG competition, gymnasts who don't score as well under that technology will be at a disadvantage every single time they hit the floor. An entire system of slightly biased robot judge technology is way, way scarier than the most biased human judge out there.

So, here are three questions I'll be asking as this technology rolls out:

  1. What level of performance have these machines achieved? What metrics were used to measure that level of performance? In order to test out the robot judge technology, Fujitsu must have a way of determining whether the robot's judgement was right or wrong. Did they compare the machine's deductions to those of a human judge? If so, we run the risk of having robot judges that simply replicate human biases as described above. Or did they use some other method? If so, what was it?

    Whatever metric the company is using, it's probably safe to assume that the machine isn't 100% accurate - it would be mind-blowing if it were. The engineers have therefore likely had to decide what type of error the machine is optimized for. More specifically, such machines can usually be optimized to avoid either false positives - cases where the robot takes a deduction when the skill was actually fine - or false negatives - cases where the gymnast should have gotten a deduction but the robot missed it. The choice between these options should inform the relationship between human judges and robot judges. For example, it would be wrong to mandate that the final score must include all deductions caught by the robot judge if the machine is optimized to avoid false negatives.

  2. What gymnasts were these machines tested and calibrated on? And, in particular, how diverse were they? Machines tested at a handful of Japanese club gyms might not work as well on the full array of skin colors, body types and skill levels that we see on the world stage.

    Examples abound of cases where algorithms trained on one population fail when applied to a more diverse population. For example, facial recognition software that was trained largely on pictures of white faces performs poorly when trying to identify even the gender of black faces. It's easy to imagine that a 180-degree split might look different on someone with bulkier leg muscles, or a flexed foot might be easier to see on someone with dark skin.

    But there might also be issues beyond physical differences. I want to know how the technology generalizes from skills it has seen before to skills it hasn't seen before. Let's think for example about a machine measuring the height a gymnast gets on her bars release. How many Nabievas do you think Fujitsu could test its technology on? Do you think the technology will be ready when it has to record Sunisa Lee's amplitude on that skill for the first time in Tokyo?

  3. Has anyone outside of Fujitsu evaluated the underlying technology? It's extremely common for private companies to hide their algorithms from the public behind a shield of intellectual property law. At times, they may not even share the algorithms with customers like the FIG who are paying to license the technology. There are good reasons for this - no one would ever invest in product development if they couldn't profit from the results - but it would be downright irresponsible of the FIG to unleash this technology without having it vetted by someone who will not directly profit from its adoption.

    And even if the FIG does have access, we'll know that the algorithm hasn't been properly vetted if we can't identify anyone at the FIG with the technical skills to do the vetting. If the people who have okay'd the technology are all members of the technical committee and there's not a single engineer, I won't be satisfied.

    To be clear, looking at a trial and confirming that the results cohere with what a human judge might find is not the same as picking through the underlying hardware and software. We need to check the process that the robot judge goes through, not simply the results of that process. Gymnasts deserve better than to be judged by a black box.
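To make the first two questions more concrete, here is a small Python sketch of the kind of analysis I'd want to see. It assumes the robot outputs a confidence between 0 and 1 that a fault occurred and takes the deduction whenever that confidence reaches a threshold. The data, the group labels, and the thresholds are all made up; the point is only to show how moving the threshold trades false positives for false negatives, and how the same errors can land unevenly across groups of gymnasts.

```python
def confusion_counts(cases, threshold):
    """Count wrong deductions and missed deductions at a given threshold.

    cases is a list of (group, robot_confidence, fault_really_present).
    """
    false_positives = sum(1 for _, conf, truth in cases
                          if conf >= threshold and not truth)
    false_negatives = sum(1 for _, conf, truth in cases
                          if conf < threshold and truth)
    return {"false_positives": false_positives,
            "false_negatives": false_negatives}


def counts_by_group(cases, threshold):
    """Same counts, broken out by whatever group label each case carries."""
    groups = {}
    for case in cases:
        groups.setdefault(case[0], []).append(case)
    return {group: confusion_counts(members, threshold)
            for group, members in groups.items()}


# Entirely made-up trial results for two hypothetical groups of gymnasts.
trial = [
    ("A", 0.95, True), ("A", 0.65, False), ("A", 0.10, False),
    ("B", 0.80, True), ("B", 0.55, True), ("B", 0.30, False),
]
lenient = confusion_counts(trial, threshold=0.9)
strict = confusion_counts(trial, threshold=0.5)
per_group = counts_by_group(trial, threshold=0.9)
```

In this toy data, the higher threshold removes the wrong deduction but misses two real faults, and both misses fall on group B. Numbers like these, computed on real trials, are what I'd want Fujitsu and the FIG to publish.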

Of course, these questions just begin to scratch the surface. I'll write another post sometime about the tension in gymnastics scoring between the things that can be objectively judged and the things that cannot -- I think robot judges really help us get to the heart of the matter there. But I do want to make sure these bigger picture issues don't get lost in the rush to make robot judges a reality by Stuttgart 2019.

I'm thrilled by the potential of robot judges, but I haven't seen enough information to convince me that this is being done right. I've had a hard time finding information on the Fujitsu technology or how the FIG plans to use it. So if you've seen any documentation, please, please comment below or send me an e-mail. Thanks guys!

Tags: Olympic Talk, Robot Judges