You Look Like a Thing and I Love You
Page 12
COMPUTER GAMES ARE CONFUSING
A popular test problem for AI is learning to play computer games. Games are fun: they make good demonstrations, and many of the earliest computer games can run very quickly on a modern machine so the AIs can go through thousands of hours of game play in sped-up time.
But even the simplest of computer games can be very difficult for an AI to beat—often because it needs very specific goals. The best are those in which the algorithm can get feedback right away on whether it’s doing the right thing. So “win the game” is not a good goal, but “increase your score” and even “stay alive for as long as possible” might both be. Even with good goals, however, machine learning algorithms can still struggle to understand the task at hand.
In 2013, a researcher designed an algorithm to play classic computer games. When playing Tetris, it would place the blocks seemingly at random, letting them pile up nearly to the top of the screen. The algorithm would then realize that it would lose as soon as the next block appeared, and so it… paused the game forever.13,14
In fact, “pause the game so a bad thing won’t happen,” “stay at the very beginning of the level, where it’s safe,” or even “die at the end of level 1 so level 2 doesn’t kill you” are all strategies that machine learning algorithms will use if you let them. It’s as if the games were being played by very literal-minded toddlers.
If the AI is not told it has to avoid losing lives, it has no way of knowing that it shouldn’t die. A researcher managed to train a Super Mario–playing AI that made it all the way through level 2 only to immediately jump into a pit and die at the beginning of level 3. The programmer concluded that the AI—which had not been specifically told not to lose lives—had no idea that it had done something bad. It got sent back to the beginning of the level when it died, but since it was so close to the beginning of the level already, it didn’t really see what the problem was.15
Another AI was supposed to play a sailing race.16 The AI controlled a boat that would collect markers as it progressed through the racecourse. But crucially, the goal was to collect the shiny markers, not specifically to finish the race. And once a marker was collected it would eventually reappear in its original spot. The AI discovered that it could collect lots of points by circling endlessly between three markers, collecting them over and over again as they reappeared.
Many game developers rely on AI to power the nonplayer characters (NPCs) in complex computer games—but they often find that it’s difficult to teach an AI to move in a virtual world without disrupting the game. When developing the game Oblivion, Bethesda Softworks wanted its NPCs to have varied, interesting behaviors rather than acting out a preprogrammed, repetitive routine. The developers tested Radiant AI, a program that uses machine learning to simulate the inner lives and motivations of the background characters. However, Bethesda found that these new AI-driven NPCs could sometimes break the game. In one case, there was a drug dealer who was supposed to be part of a quest but who sometimes would fail to show up to play his part. It turned out that the drug dealer’s customers were murdering him rather than paying for their drugs, since there was nothing in the game to prevent them from doing so.17 In another case, players entering a store found that there was nothing on the shelves to buy because an NPC had come by earlier and bought everything.18 The game designers ended up having to tone down the system considerably so the NPCs wouldn’t cause havoc.
DON’T WALK
Why walk when you can fall?
Let’s say you want to use machine learning to create a robot that can walk. So you give an AI the task of designing a robot body and using it to travel from point A to point B.
If you give that problem to a human, you would expect them to use robot parts to make a robot with legs, then program it to walk from A to B. If you program a computer step-by-step to solve this problem, that’s also what you’d tell it to do.
But if you give the problem to an AI, it has to come up with its own strategy for solving it. And it turns out that if you tell an AI to go from A to B and don’t tell it what to build, what you tend to get is something like this:
It assembles itself into a tower and falls over.
Technically, that solves the problem: get from A to B. But it definitely doesn’t solve the problem of learning to walk. And, it turns out, AIs love to fall over. Give them the task of moving at a high average speed, and you can bet they’ll do it by falling over if you let them. Sometimes the robots even learn to somersault for extra travel distance. Technically, this is an excellent solution, though this isn’t what the humans had in mind.
It’s not just AIs that figure out how to fall. It turns out that some prairie grasses move from generation to generation by falling over at the end of their life cycles and thus dropping their seed heads one stem length from the place where they started. Walking palms are said to use a similar strategy, falling over and then resprouting from their crowns.
High-speed versions of somersaulting have evolved as well. There’s a spider called the flic-flac spider that normally walks in the usual spider fashion. But when it needs to put on a burst of speed, it will start somersaulting instead.19 Virtual AI evolution and biological evolution sometimes come up with eerily similar strategies.
Why jump when you can cancan?
There was once a team of researchers trying to train simulated robots to jump. To give the robots a value to maximize, they defined their jumping height as the maximum height attained by the robot’s center of gravity. But rather than learn to jump, some of the robots became very tall and simply stood there, being tall. Technically this was success, since their center of gravity was very high.
The researchers discovered this problem and altered their program so that the goal instead was to maximize the height of the part of the body that had been the lowest at the start of the simulation. Rather than learn to jump, the robots instead learned to cancan. They became compact robots perched on the top of a skinny pole. When the simulation started, they would kick the pole high up above their heads, reaching a huge height as they fell to the ground.20
Why drive when you can spin?
Another research team was trying to build light-seeking robots. These were simple robots that had two wheels, two eyes (simple light sensors), and two motors. The robots were given the goal of spotting a light and driving toward it.
The human-designed solution to this problem is a well-known robotics strategy called the Braitenberg solution: tie the right and left light sensors to the right and left wheels so the robot drives in a mostly straight line toward the light source.
The researchers gave AIs the task of controlling the cars and were curious to see if the AIs could figure out the Braitenberg solution. Instead, the cars began to spin toward the light source in giant loops. And the spinning worked pretty well. In fact, spinning turned out to be a better solution in many ways than the solution the humans had expected. It worked better at high speed and was even more adaptable to different types of vehicles. Machine learning researchers live for moments like this—when the algorithm comes up with a solution that’s both unusual and effective. (Though perhaps the spinning car won’t catch on for human transport.)
In fact, spinning in place is something AIs often use as a sneaky alternative to traveling. After all, moving can be inconvenient—the AIs risk falling over or running into obstacles. A team trained a virtual bicycle to travel toward a goal only to discover that the bicycle was circling the goal forever instead. They had forgotten to penalize the bicycle for driving away from the goal.21
Silly Walks
Robots, real or simulated, tend to solve the problem of locomotion in all kinds of strange ways. Even when they’re given a two-legged body design and told that their goal is to walk, their definition of walk can vary. A team of researchers from the University of California at Berkeley used OpenAI’s DeepMind Control Suite22 to test strategies for teaching humanoid robots to walk.23 They found that their simulated robots were
coming up with high-scoring solutions for getting around on two legs, but the solutions were weird. For one, nobody had told the robots that they had to face forward when they walk, so some of the robots were walking backwards or even sideways. One slowly rotated in a circle as it walked (it might enjoy riding in that spinning car). Another traveled forward but did so while hopping on one leg—the simulation didn’t seem to be detailed enough to penalize solutions that might be rather tiring.
They weren’t the only team to find the DeepMind Control Suite robots acting weirdly; the team that first released the program also released a video of some of the gaits their robots had developed. The robots, not having any other purpose for their arms, used them vigorously as counterweights for their own deeply strange running styles. One arched its back and leaned way forward as it ran but maintained its balance by clasping its hands to its neck as if it were dramatically clutching pearls. Another ran sideways with its arms held high over its head. Another robot traveled rapidly by stumbling backwards with its arms flung out, somersaulting, then rolling to its feet, only to stumble backwards and somersault again.
The Terminator robots probably should have been a lot weirder. Maybe they should have had extra limbs, strange hopping or spinning gaits, a design like a pile of garbage rather than a sleek humanoid—if there’s no reason to care about aesthetics, an evolved machine will take any shape that gets the job done.
When in doubt, do nothing
It’s surprisingly common to develop a sophisticated machine learning algorithm that does absolutely nothing.
Sometimes it’s because it discovers that doing nothing is truly the best solution—like that AI from the beginning of the chapter that was supposed to place bets on horse races but learned that the best strategy for avoiding losing bets was not to bet at all.24
Other times it’s because the programmer accidentally set things up so that the algorithm thinks doing nothing is the best solution. For example, a machine learning algorithm was supposed to build simple computer programs that could do tasks like sorting lists of numbers or looking for bugs in other computer programs. To make the program small and lean, the people setting up the AI decided to penalize it for the computing resources it used. In response, it produced programs that just slept forever so they would use zero computing resources.25
Another program was supposed to learn to sort a list of numbers. It learned instead to delete the list so that there wouldn’t be any numbers out of order.26
So we’ve seen that one of the most important tasks a machine learning programmer can undertake is to specify exactly what problem the algorithm should be trying to solve—that is, the reward function. Should it maximize its ability to predict the next letter in a sequence or tomorrow’s number in a spreadsheet? Should it maximize its score in a video game, the distance it can fly, or the length of time a pancake stays in the air? A faulty reward function could result in a robot that refuses to move just so it doesn’t incur a penalty for hitting a wall.
But there’s also a way to get machine learning algorithms to solve problems without ever being told the goal at all. Rather, you give them a single, very broad goal: satisfy curiosity.
CURIOSITY
A curiosity-driven AI makes observations about the world, then makes predictions about the future. If the thing that happens next is not what it predicted, it counts that as a reward. As it learns to predict better, it has to seek out new situations in which it doesn’t yet know how to predict the outcome.
Why would curiosity work as a reward function all by itself? Because when you’re playing a video game, death is boring. It returns you to the start of the level, which you’ve already seen. A curiosity-driven AI will learn to move through a video-game level so it can see new stuff, avoiding fireballs, monsters, and death pits because when it gets hit by those, it sees the same boring death sequence. It isn’t specifically told to avoid dying—as far as it knows, death is just like moving to a different level. A boring one. It wants to see level 2 instead.
But a curiosity-driven strategy doesn’t work for every game. In some games, the curious AI will invent its own goals, which are not the same as what the game makers intended. In one experiment, an AI player was supposed to learn to control a spider-shaped robot, coordinating the legs to walk to the finish line.27 The curious AI learned to stand up and walk (standing still is boring) but had no reason to travel along the racetrack toward the finish line. It trundled off in another direction instead.
Another game, Venture, looked a lot like Pac-Man: a maze with randomly moving ghosts that the player was supposed to avoid while collecting lighted floor tiles. The problem was that because the ghosts moved randomly, their movements were impossible to predict—and therefore very interesting to the curiosity-based AI. No matter what it did, it got maximum rewards just by observing the unpredictable ghosts. Rather than collect floor tiles, the player darted around in apparent ecstasy, perhaps exploiting some unpredictable (and therefore interesting) controller glitches. The game was heaven for a curiosity-driven AI.
The researchers also tried putting the AI in a 3-D maze. Sure enough, it learned to navigate the maze so it could see interesting new sections it hadn’t explored yet. Then they put a TV on one of the maze walls, a TV that showed random unpredictable images. As soon as the AI found the TV, it was transfixed. It stopped exploring the maze and focused on the superinteresting TV.
The researchers had neatly demonstrated a well-known glitch of curiosity-driven AI known as the noisy TV problem. The way they had designed it, the AI was chaos-seeking rather than truly curious. It would be just as mesmerized by random static as by movies. So one way of combating the noisy TV problem is to reward the AI not just for being surprised but also for actually learning something.28
BEWARE THE FAULTY REWARD FUNCTION
Designing the reward function is one of the hardest things about machine learning, and real-life AIs end up with faulty reward functions all the time. And as I mentioned, the consequences can range from annoying to serious.
In the cute-but-annoying category: an AI was supposed to learn to convert a satellite image into a road map, then turn the map back into a satellite image. But instead of learning to turn road maps into satellite images, the AI found it was easier to hide the original satellite image data in the map it made so it could extract it later. Researchers were tipped off when the algorithm not only did suspiciously well at converting the map back to a satellite image but was also able to reproduce features like skylights that didn’t make it into the maps at all.29
That faulty reward function never made it past the troubleshooting stage. But there are also faulty reward functions in products that have serious effects on millions of people.
YouTube has tried multiple times to improve the reward function in the AI that suggests videos for its users to watch. In 2012 the company reported that it had discovered problems with its previous algorithm, which had sought to maximize the number of views. The result was that content creators poured their effort into producing enticing preview thumbnail images rather than videos that people actually wanted to watch. A click was a view, even if viewers immediately clicked away when they saw that the videos were not what the previews promised. So YouTube announced it was going to improve its reward function so that the algorithm suggested videos that would encourage longer viewing times. “If viewers are watching more YouTube,” the company wrote, “it signals to us that they’re happier with the content they’ve found.”30
By 2018, however, it was clear that YouTube’s new reward function also had problems. A longer viewing time didn’t necessarily mean that viewers were happy with the suggested videos—it often meant that they were appalled, outraged, or couldn’t tear themselves away. It turned out that YouTube’s algorithm was increasingly suggesting disturbing videos, conspiracy theories, and bigotry. As a former YouTube engineer noted,31 the problem seemed to be that videos like these do tend to make people watch more of them, even if the effect
of watching them is terrible. In fact, the ideal YouTube users, as far as the AI is concerned, are the ones who have been sucked into a vortex of YouTube conspiracy videos and now spend their entire lives on YouTube. The AI is going to start suggesting whatever they’re watching to other people so that more people will act like them. In early 2019, YouTube announced that it was going to change its reward function again, this time to recommend harmful videos less often.32 What will change? As of this writing, it remains to be seen.
One problem is that platforms like YouTube, as well as Facebook and Twitter, derive their income from clicks and viewing time, not from user enjoyment. So an AI that sucks people into addictive conspiracy-theory vortexes may be optimizing correctly, at least as far as its corporation is concerned. Without some form of moral oversight, corporations can sometimes act like AIs with faulty reward functions.
In the next chapter, we’ll look at faulty reward functions taken to the extreme: AIs that would rather break the laws of physics than solve the problem the way you want them to.
CHAPTER 6
Hacking the Matrix, or AI finds a way
An evolutionary algorithm found in an early version of the Robocup soccer simulator that if it held onto the ball, repeatedly kicking it, the ball would build up energy and, when released, would fly into the goal at the speed of light.