—@DougBlank1
I once used an evolutionary algorithm to evolve a unicycle control law. Fitness function was “duration the seat keeps a positive z-coordinate.” The EA worked out that if it banged the wheel into the floor *just so*, the collision system would send it into the heavens!
—@NickStenning2
In movies like The Matrix, superintelligent AIs build incredibly rich, detailed simulations where humans live their lives, never knowing that their world isn’t real. In real life, though (at least as far as we know), it’s humans who build simulations for AIs. Remember from chapter 2 that AIs are very slow learners, needing years’ or even centuries’ worth of practice at playing chess or riding bicycles or playing computer games. We don’t have time to let them learn by playing against real people (or enough bicycles to let an inept AI rider bust them all up), so we build simulations for the AIs to practice in instead. In a simulation, we can speed up time or train lots of AIs in parallel on the same problem. This is also why researchers like to train AIs to play computer games: there’s no need to build the complex physics of a simulation if you can use the premade simulation of Super Mario Bros.
But the problem with simulations is that they have to take shortcuts. Computers just can’t simulate a room down to every atom, a beam of light down to every photon, or years of time down to the shortest picosecond. So walls are perfectly smooth, time is coarsely granular, and certain laws of physics are replaced with nearly equivalent hacks. The AIs learn in a Matrix that we created for them—and the Matrix is flawed.
Most of the time, the flaws in the Matrix don’t matter. So what if the bicycle is learning to drive on pavement that stretches infinitely in all directions? The curvature of the planet and the economics of infinite asphalt aren’t things that matter for the task at hand. But sometimes AIs end up discovering unexpected ways to exploit the flaws in the Matrix—for free energy, superpowers, or glitchy shortcuts that only exist in their simulated world.
Remember the silly walks from chapter 5: AIs given the task of moving their humanoid robot bodies across the landscape ended up with weird tilted postures or even extreme somersaulting gaits. These silly walks worked because inside the simulation the AIs never got tired, never had to avoid running into walls, and never got cricks in their backs from running while bent nearly double. Weird friction in some simulations means that AIs will sometimes end up dragging one knee in the dirt as they use the other leg to scoot forward, finding it easier than balancing on two legs.
But algorithms whose world is a simulation end up doing way more than just walking funny—they end up hacking the very fabric of their universe just because it seems to work.
WELL, YOU DIDN’T SAY I COULDN’T
One useful application for AIs is design. In a lot of engineering problems there are so many variables, and so many possible outcomes, that it’s useful to get an algorithm to search for useful solutions. But if you forget to thoroughly define your parameters, the program will likely do something really weird that you didn’t technically forbid.
For example, optical engineers use AI to help design lenses for things like microscopes and cameras—to crunch the numbers to figure out where the lenses should be, what they should be made of, and how they should be shaped. In one case, an AI’s design worked very well—except it contained a lens that was twenty meters thick!3
Another AI went further, breaking some fundamental laws of physics. AIs are increasingly being used to design and discover molecules with useful configurations—to figure out how proteins will fold, for example, or to look for molecules that might interlock with a protein, activating or deactivating it. However, AIs don’t have any obligation to obey laws of physics that you didn’t tell them about. An AI tasked with finding the lowest-energy (most stable) configuration for a group of carbon atoms found a way to arrange them in which the energy was astoundingly low. But upon closer inspection, scientists realized that the AI had planned for all the atoms to occupy the exact same point in space—not knowing that this was physically impossible.4
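This failure mode is easy to reproduce in miniature. Below is a hypothetical Python sketch (not the actual chemistry software, and a made-up “energy” function): hand an off-the-shelf optimizer a score that rewards atoms for being close together but never encodes the rule that two atoms can’t share a location, and it dutifully piles them all onto one point.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy "energy" with attraction but no repulsion: the sum of the
# distances between every pair of atoms. Nothing forbids two atoms from
# occupying the same spot, so the best score is to stack them all on one point.
def energy(flat_positions):
    p = flat_positions.reshape(-1, 3)                     # four "atoms" in 3-D
    pairwise = np.linalg.norm(p[:, None] - p[None, :], axis=-1)
    return pairwise[np.triu_indices(len(p), k=1)].sum()   # sum over unique pairs

start = np.random.default_rng(0).normal(size=12)          # random starting layout
result = minimize(energy, start)
print(result.x.reshape(-1, 3))    # all four atoms end up at (nearly) one point
```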
EATING MATH ERRORS FOR DINNER
In 1994, Karl Sims was doing experiments on simulated organisms, allowing them to evolve their own body designs and swimming strategies to see if they would converge on some of the same underwater locomotion strategies that real-life organisms use.5, 6, 7 His physics simulator—the world these simulated swimmers inhabited—used Euler integration, a common way to approximate the physics of motion. The problem with this method is that if motion happens too quickly, integration errors will start to accumulate. Some of the evolved creatures learned to exploit these errors to obtain free energy, quickly twitching small body parts and letting the math errors send them zooming through the water.
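Here’s a rough Python sketch of how that kind of free energy appears. It isn’t Sims’s actual simulator, just a toy mass on a spring with made-up numbers, but the mechanism is the same: step the physics forward with Euler integration using a time step that’s too coarse for the motion, and the simulated energy quietly grows.

```python
# Toy example (not Sims's simulator): a mass on a spring, advanced with
# explicit Euler integration. The time step is too coarse for how quickly
# the mass moves, so each step overshoots a little and the total energy
# creeps upward: the same kind of "free energy" the evolved swimmers
# learned to harvest by twitching quickly.

def total_energy_after(steps, dt=0.05, k=50.0, m=1.0, x=1.0, v=0.0):
    for _ in range(steps):
        a = -(k / m) * x        # spring acceleration at the current position
        x += v * dt             # Euler update for position
        v += a * dt             # Euler update for velocity
    return 0.5 * m * v ** 2 + 0.5 * k * x ** 2   # kinetic + potential energy

print(total_energy_after(0))     # energy at the start: 25.0
print(total_energy_after(200))   # vastly larger: the math errors created energy
```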
Another group of Sims’s simulated organisms learned to exploit collision math for free energy. In video games (and other simulations), collision math is what is supposed to prevent creatures from walking through walls or sinking through the floor, pushing the creature back if it tries. The creatures discovered that there was an error in the math that they could use to propel themselves high into the air if they banged two limbs together just so.
Yet another set of simulated organisms reportedly learned to use their children to generate free food. Astrophysicist David L. Clements reported seeing the following phenomenon in simulated evolution: if the AI organisms started with a small amount of food, then had lots of children, the simulation would distribute the food among the children. If the amount of food per child was less than a whole number, the simulation would round up to the nearest integer. So tiny fractions of one food item could become lots of food when distributed to lots of children.8
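If the report is accurate, the bug would have looked something like this (a hypothetical reconstruction in Python, not the simulation’s real code):

```python
import math

# Hypothetical reconstruction of the reported bug: one food item divided
# among many children, with each child's share rounded *up* to a whole item.
food = 1
children = 100
share_per_child = food / children            # 0.01 of a food item each
rounded_share = math.ceil(share_per_child)   # buggy round-up: 1 whole item each
print(rounded_share * children)              # 100 items of food, conjured from 1
```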
Sometimes simulated organisms can get very sneaky about finding free energy to exploit.9 In another team’s simulation, organisms discovered that if they were fast enough, they could manage to glitch themselves into the floor before the collision math “noticed” and popped them back out into the air, giving them an energy boost. By default, creatures in the simulation weren’t supposed to be fast enough to outrace the collision math like this, but they found that if they were very, very tiny, the simulation would also allow them to be fast. Using the simulation math for an energy boost, the creatures traveled around by glitching repeatedly into the floor.
In fact, simulated organisms are very, very good at evolving to find and exploit energy sources in their world. In that way, they’re a lot like biological organisms, which have evolved to extract energy from sunlight, oil, caffeine, mosquito gonads,10 and even farts (technically a result of the chemical breakdown of hydrogen sulfide, which gives farts their characteristic rotten-egg smell).
Sometimes I think the surest sign that we’re not living in a simulation is that if we were, some organism would have learned to exploit its glitches.
MORE POWERFUL THAN YOU CAN POSSIBLY IMAGINE
Some of the Matrix hacks that AIs find are so dramatic that they bear no resemblance to actual physics. This is not a matter of harvesting a little bit of energy from math errors but something more akin to godlike superpowers.
Not bound by limits on how quickly human fingers can push buttons, AIs can break their simulations in ways that humans never anticipated. Twitter user @forgek reported being frustrated when an AI somehow discovered a button-mashing trick that it could use to crash the game whenever it was about to lose.11
The Atari video game Q*bert came out in 1982, and over the years its fans thought they had learned all its little tricks and quirks. Then in 2018, an AI playing the game started doing something very strange: it found that leaping rapidly from platform to platform caused the platforms to blink rapidly and let the AI suddenly accumulate ridiculous numbers of points. Human players had never discovered this trick—and we still can’t figure out how it works.
In a rather more sinister hack, an AI that was supposed to land a plane on an aircraft carrier found that if it applied a large enough force to the landing, it would overflow its simulation’s memory, and, like an odometer rolling over from 99999 to 00000, the simulation would register zero force instead. Of course, after such a maneuver, the airplane pilot would be dead, but hey—perfect score.12
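The rollover works like any fixed-width counter: there is a largest value it can hold, and anything past that wraps back around toward zero. Here’s a hypothetical Python illustration; the 16-bit counter size is an assumption chosen just for the example.

```python
# Hypothetical illustration: suppose the landing force is stored in a 16-bit
# unsigned counter. Values are kept modulo 2**16, so a force just past the
# maximum wraps around like an odometer and is recorded as roughly zero.
MAX_VALUES = 2 ** 16                # a 16-bit counter holds 0 through 65535

def recorded_force(true_force):
    return true_force % MAX_VALUES  # what the flawed simulation writes down

print(recorded_force(60000))        # 60000: recorded faithfully
print(recorded_force(65536))        # 0: a catastrophic landing, scored as perfect
```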
Another program went even further, reaching into the very fabric of the Matrix. Tasked with solving a math problem, it instead found where all the solutions were kept, picked the best ones, and edited itself into the authorship slots, claiming credit for them.13 Another AI’s hack was even simpler and more devastating: it found where the correct answers were stored and deleted them. Thus it got a perfect score.14
Recall, too, the tic-tac-toe algorithm from chapter 1, which learned to remotely crash its opponents’ computers, causing them to forfeit the game.
So beware of AIs that did all their learning in something other than the real world. After all, if the only things you knew about driving were what you had learned from playing a video game, you might be a technically skillful but still highly unsafe driver.
Even if an AI is given real data, or a simulation that’s accurate where it counts, it can still sometimes solve its problem in a technically correct but nonuseful way.
CHAPTER 7
Unfortunate Shortcuts
We’ve seen plenty of examples in which AIs have done inconvenient things because their data had confusing extra stuff in it. Or examples in which the problem was too broad for the AI to understand or in which the AI was missing crucial data. We’ve also seen how AIs will hack their simulations to solve problems, bending the very laws of physics. In this chapter we’ll look at other ways AIs tend to take shortcuts to “solve” the problems we give them—and why these shortcuts can have disastrous consequences.
CLASS IMBALANCE
You may remember class imbalance as the problem that led the sandwich-sorting neural network in chapter 3 to decide that a batch of mostly bad sandwiches means humans never enjoy sandwiches.
Many of the most tempting problems to solve with AI are also problems prone to issues of class imbalance. It’s handy to use AI for fraud detection, for example, a situation where it can weigh the subtleties of millions of online transactions and look for signs of suspicious activity. But suspicious activity is so rare compared to normal activity that people have to be very careful that their AIs don’t conclude that fraud never happens. There are similar problems in medicine with detecting disease (diseased cells are much rarer than healthy ones) and in business with detecting customer churn (in any given time period, most customers don’t leave).
It’s still possible to train a useful AI even if the data has class imbalance. One strategy is to reward the AI more for finding the rare thing than for finding the common thing.
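In practice, rewarding the rare thing more is often just a weighting option on the training algorithm. Here’s a minimal sketch with made-up fraud data, using scikit-learn’s class_weight parameter (the data and model choice are illustrative, not from any system described in this chapter):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal sketch with made-up data: 1,000 transactions, only 10 of them fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))   # three made-up features per transaction
y = np.zeros(1000, dtype=int)    # 0 = normal transaction
y[:10] = 1                       # 1 = fraud, deliberately rare

# class_weight="balanced" makes a mistake on the rare class count more heavily,
# in proportion to how rare it is (the "reward the rare thing more" strategy).
model = LogisticRegression(class_weight="balanced").fit(X, y)
```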
Another strategy for fixing class imbalance is to somehow change the data so that there are roughly equal numbers of training examples in each category. If there aren’t enough examples of the rarer category, then the programmer may have to get more somehow, maybe by turning a few examples into many using data augmentation techniques (see chapter 4). However, if we try to get away with using variations on just a few examples, the AI may end up solving the problem in a way that only holds true for those few examples. This problem is known as overfitting and is a huge pain.
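Here’s the rebalancing idea in the same toy setup: copy the rare fraud examples, with replacement, until the two categories are about the same size. (Real data augmentation would also vary the copies rather than just duplicating them; this sketch shows only the resampling step.)

```python
import numpy as np
from sklearn.utils import resample

# Same made-up fraud data as the previous sketch: 1,000 rows, 10 of them fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:10] = 1

# Oversample the 10 fraud rows (with replacement) until they match the 990
# normal rows, giving the training set roughly equal numbers of each class.
X_fraud, X_normal = X[y == 1], X[y == 0]
X_fraud_upsampled = resample(X_fraud, replace=True,
                             n_samples=len(X_normal), random_state=0)
X_balanced = np.vstack([X_normal, X_fraud_upsampled])
y_balanced = np.hstack([np.zeros(len(X_normal), dtype=int),
                        np.ones(len(X_fraud_upsampled), dtype=int)])
```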
OVERFITTING
I discussed overfitting in chapter 4—the case of an ice cream flavor–producing AI that memorized the flavors in its short training list. It turns out that overfitting is common in all kinds of AIs, not just in text generators.
In 2016 a team at the University of Washington set out to create a deliberately faulty husky-versus-wolf classifier. Their goal was to test a new tool called LIME, which they’d designed to detect mistakes in classifier algorithms. They collected training images in which all the wolves were photographed against snowy backgrounds and all the husky dogs against grassy backgrounds. Sure enough, their classifier had trouble telling wolves from huskies in new images, and LIME revealed that it was indeed looking at the backgrounds rather than at the animals themselves.1
This happens not just in carefully staged scenarios but in real life as well.
Researchers at the University of Tübingen trained an AI to identify a variety of images, including the fish pictured below, called a tench.
When they looked to see what parts of the image their AI was using to identify the tench, it showed them that it was looking for human fingers against a green background. Why? Because most of the tench pictures in the training data looked like this:
The tench AI’s finger-finding trick would help it identify trophy fish in human hands, but it was going to be ill prepared when looking for the fish in the wild.
Similar problems may lurk in medical datasets, even those that were released for the research community to use in designing new algorithms. When a radiologist looked carefully at the ChestXray14 dataset of chest X-rays, he discovered that many of the images of the condition pneumothorax showed patients who had already been treated for the condition with a highly visible chest drain. He warned that a machine learning algorithm trained on this dataset would probably learn to look for chest drains when trying to diagnose pneumothorax rather than looking for patients who hadn’t already been treated.2 He also found many images that had been mislabeled, which could further confuse an image recognition algorithm. Remember the ruler example from chapter 1: an AI was supposed to learn to identify pictures of skin cancer but learned to identify rulers instead, because many tumors in the training data were photographed with rulers for scale.
Another likely example of overfitting is the Google Flu algorithm, which made headlines in the early 2010s for its ability to anticipate flu outbreaks by tracking how often people searched for information on flu symptoms. At first, Google Flu appeared to be an impressive tool, since its information arrived in nearly real time, much faster than the Centers for Disease Control and Prevention (CDC) could compile and release its official numbers. But after the initial excitement, people started noticing that Google Flu was not that accurate. In 2011–12, it vastly overestimated the number of flu cases and turned out to be generally less useful than a simple projection based on already released CDC data. The phenomena that had let Google Flu match the CDC’s official records at first had only been true for a couple of years—in other words, its reported success is now thought to have been the result of overfitting,3 making faulty assumptions about future flu epidemics based on the specifics of outbreaks in the past.
In a 2017 competition to program an AI that could identify specific species of fish from photographs, contestants found that their algorithms had impressive success on small sets of test data yet did terribly when trying to identify fish from a larger dataset. It turned out that in the small dataset, many of the photos of a given type of fish had been taken by a single camera in a single boat. The algorithms discovered that it was much easier to identify the individual camera views than to identify the subtleties of a fish’s shape, so they ignored the fish and looked at the boats.4
HACKING THE MATRIX ONLY WORKS IN THE MATRIX
In chapter 6 I wrote about AIs that found neat ways to solve problems in simulation by hacking the simulation itself, exploiting weird physics or math errors. This is another example of overfitting, since the AIs would be surprised to find that their tricks only work in their simulations, not in the real world.
Algorithms that learn in simulations or on simulated data are especially prone to overfitting. Remember that it’s really hard to make a simulation detailed enough to allow a machine learning algorithm’s strategies to work both in the simulation and in real life. For the models that learn to ride bicycles, swim, or walk in simulated environments, some kind of overfitting is almost guaranteed. The virtual robots in chapter 5 who developed silly walks as a way of getting around (walking backwards, hopping on one foot, or even somersaulting) had discovered these strategies in a simulation that didn’t include any obstacles to watch out for or any penalties for exhausting gaits. The swimming robots who learned to twitch rapidly for free energy were harvesting this energy from mathematical flaws in their simulation—in other words, it only worked because there was a Matrix they could hack. In the real world, they would have been shocked to find that their hacks no longer worked—that hopping on one foot is a lot more tiring than they had anticipated.
Here’s one of my favorite examples of overfitting, which happened not in a simulation but in a lab. In 2002 researchers tasked an AI with evolving a circuit that could produce an oscillating signal. Instead, it cheated. Rather than producing its own signal, it evolved a radio that could pick up an oscillating signal from nearby computers.5 This is a clear example of overfitting, since the circuit would only have worked in its original lab environment.
A self-driving car that freaked out when it went over a bridge for the first time is also an example of overfitting. Based on its training data, it thought that all roads had grass on both sides, and when the grass was gone it didn’t know what to do.6
The way to detect overfitting is to test the model against data and situations it hasn’t seen. Bring the cheating radio circuit into a new lab, for example, and watch its radio fail to grab the signal it had been counting on. Test the fish-identifying algorithm on photos of fish in a new boat and watch it start guessing randomly. Image-identifying algorithms can also highlight the pixels they used in their decisions, which can give their programmers a clue that something’s wrong when the “dog” the program identifies is actually a patch of grass.
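In code, that check usually takes the form of a held-out test set. Here’s a minimal sketch with made-up data and a generic scikit-learn model (not any of the systems described above): if the model scores far better on the data it trained on than on the data it held back, it has overfit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hold back a quarter of the (made-up) data, train on the rest, and compare.
# A large gap between training accuracy and held-out accuracy is the telltale
# sign that the model has memorized its training set rather than generalized.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))  # often near 1.0
print("held-out accuracy:", model.score(X_test, y_test))    # the honest number
```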