You Look Like a Thing and I Love You
The creators of Microsoft’s Azure image recognition algorithm (the same AI that saw sheep in every field) designed it to accurately caption any user-uploaded image file, whether a photograph, a painting, or even a line drawing. So I gave it some sketches to identify.
Now, my art isn’t great, but it isn’t that bad. This is just a case of an algorithm trying to do too much. Identifying any image file is pretty much the opposite of the narrow tasks we know AIs excel at. Most of the images Azure saw during training were photographs, so it relies a lot on textures to understand the image—is it fur? Grass? In my line drawings, there are no textures to help it, and the algorithm just doesn’t have enough experience to understand them. (The Azure algorithm fared better than many other image recognition algorithms, though, which when faced with any kind of line drawing will identify it as some kind of “UNK”—an unknown.) Researchers are working on training image recognition algorithms on cartoons and drawings as well as on photographs with highly altered textures, reasoning that if the AI understands what it’s looking at as well as a human does, it ought to be able to figure out cartoons.
There is an algorithm that specializes in recognizing simple sketches. Researchers at Google trained their Quick Draw algorithm on millions of sketches by having people play a kind of Pictionary game against the computer. As a result, the algorithm can recognize sketches of more than three hundred different objects, even with people’s highly variable drawing ability. Here’s just a small sampling of the sketches in its training data for kangaroo:11
Quick Draw recognized my kangaroo right away.12 It also recognized the fork and the ice cream cone. The pipe gave it some trouble, since that wasn’t one of the 345 objects it knew about. It decided it was either a swan or a garden hose.
In fact, since Quick Draw only knew how to recognize those 345 things, its response to a lot of my sketches was utter weirdness.
This is all fine and good if, like me, you establish weirdness as your goal. But this incomplete picture of the world does lead to problems in some applications—for example, autocomplete. As we learned in chapter 3, the autocomplete function in smartphones is usually powered by a kind of machine learning called a Markov chain. But companies have a tough time stopping the AI from blithely making depressing or offensive suggestions. As Daan van Esch, project manager for the Android system’s autocorrect app, called GBoard, told internet linguist Gretchen McCulloch, “For a while, when you typed ‘I’m going to my Grandma’s,’ GBoard would actually suggest ‘funeral.’ It’s not wrong, per se. Maybe this is more common than ‘my Grandma’s rave party.’ But at the same time, it’s not something that you want to be reminded about. So it’s better to be a bit careful.” The AI doesn’t know that this perfectly accurate prediction is nonetheless not the right answer, so human engineers have to step in to teach it not to supply that word.13
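To make the mechanics concrete, here is a minimal sketch of the idea behind Markov-chain word suggestion. It is not GBoard's actual code, and the three-sentence corpus is invented, but it shows why a statistically accurate suggestion can still be an unwelcome one.

```python
# A toy word-level Markov suggester. The corpus is invented for illustration;
# a real keyboard model is trained on vastly more text and more context.
from collections import Counter, defaultdict

corpus = [
    "i'm going to my grandma's funeral tomorrow",
    "i'm going to my grandma's funeral on friday",
    "i'm going to my grandma's house for dinner",
]

# Count which word follows each word in the training sentences.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def suggest(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(suggest("grandma's"))  # -> "funeral": statistically accurate, socially unwelcome
```

The fix is exactly the kind of human intervention van Esch describes: someone has to decide that certain accurate predictions should never be shown.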
THERE ARE FOUR GIRAFFES
There are a lot of interesting data-related quirks that crop up in Visual Chatbot, an AI that was trained to answer questions about images. The researchers who made the bot trained it on a crowdsourced dataset of questions and answers relating to a set of pictures. As we know now, bias in the dataset can skew the AI’s responses, so the programmers set up their training data collection to avoid some known biases. One bias they set out to avoid was visual priming—that is, humans asking questions about an image tend to ask questions to which the answer is yes. Humans very rarely ask “Do you see a tiger?” about an image in which there are no tigers. As a result, an AI trained on that data would learn that the answer to most questions is yes. In one case, an algorithm trained on a biased dataset found that answering yes to any question that begins with “Do you see a…” would result in 87 percent accuracy. If this sounds familiar, remember the class imbalance problem from chapter 3—a big batch of mostly terrible sandwiches resulted in an AI that had concluded the answer was Humans Hate All Sandwiches.
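To see how little understanding it takes to score well on a yes-heavy dataset, here is a toy sketch. The 80/20 split is invented (the real figure cited above was 87 percent for one dataset), and the "model" below never looks at the image at all:

```python
# A question-answering "model" that ignores the image entirely.
# On a yes-heavy dataset, it still looks impressively accurate.
dataset = [("do you see a dog?", "yes")] * 80 + [("do you see a tiger?", "no")] * 20

def always_yes(question, image=None):
    return "yes"                      # never inspects the image

correct = sum(always_yes(q) == answer for q, answer in dataset)
print(f"accuracy: {correct / len(dataset):.0%}")  # 80% without understanding anything
```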
So to avoid visual priming, when they collected their crowdsourced set of questions, the programmers hid the image from the humans asking the question. By forcing the humans to ask generic yes-or-no questions that could apply to any image, they managed to achieve a rough balance between yes answers and no answers in the dataset.14 But even this wasn’t enough to eliminate problems.
One of the most entertaining quirks of the dataset is that no matter the content of the picture, if you ask Visual Chatbot how many giraffes there are, it will almost always answer that there is at least one. It may be doing relatively well with a picture of people in a meeting, or surfers on a wave, up to the point where it’s asked about the number of giraffes. Then, pretty much no matter what, Visual Chatbot will report that the image contains one giraffe, or maybe four, or even “too many to count.”
The source of the problem? Humans who asked questions during dataset collection rarely asked the question “How many giraffes are there?” when the answer was zero. Why would they? In normal conversation people don’t start quizzing each other about the number of giraffes when they both know there aren’t any. In this way, Visual Chatbot was prepared for normal human conversation, bounded by the rules of politeness, but it wasn’t prepared for weird humans who ask about random giraffes.
As a result of the AIs’ training on normal conversations between normal humans, they’re completely unprepared for other forms of weirdness as well. Show Visual Chatbot a blue apple, and it will answer the question “What color is the apple?” with “red” or “yellow” or some normal apple color. Rather than learning to recognize the color of the object, a difficult job, Visual Chatbot has learned that the answer to “What color is the apple?” is almost always “red.” Similarly, if Visual Chatbot sees a picture of a sheep dyed bright blue or orange, its response to “What color is the sheep?” is to report a standard sheep color, such as “black and white” or “white and brown.”
In fact, Visual Chatbot doesn’t have very many tools with which it can express uncertainty. In the training data, humans usually knew what was going on in the picture, even if some details like “What does the sign say?” were unanswerable because the sign was blocked. To the question “What color is the X?” Visual Chatbot learned to answer “I can’t tell; it’s in black and white,” even if the picture was very obviously not in black and white. It will answer “I can’t tell; I can’t see her feet” to questions like “What color is her hat?” It gives plausible excuses for confusion but in completely the wrong context. One thing it doesn’t usually do, however, is express general confusion—because the humans it learned from weren’t confused. Show it a picture of BB-8, the ball-shaped robot from Star Wars, and Visual Chatbot will declare that it is a dog and begin answering questions about it as if it were a dog. In other words, it bluffs.
There’s only so much an AI has seen during training, and that’s a problem for applications like self-driving cars, which have to encounter the limitless weirdness of the human world and decide how to deal with it. As I mentioned in the section on self-driving cars in chapter 2, driving on real roads is a very broad problem. So is dealing with the huge range of things a human might say or draw. The result: the AI makes its best guess based on its limited model of the world and sometimes guesses hilariously, or tragically, wrong.
In the next chapter, we’ll look at AIs that did a great job solving the problems we asked them to solve—only we accidentally asked them to solve the wrong problems.
CHAPTER 5
What are you really asking for?
I tried to write a neural network to maximise profit from betting on horse races once. It determined that the best strategy was *drumroll* to place zero bets.
—@citizen_of_now1
I tried to evolve a robot to not run into walls:
1) It evolved to not move, and thus didn’t hit walls
2) Added fitness for moving: it spun
3) Added fitness for lateral moves: went in small circles
4) etc.
Resulting book title: “How to Evolve a Programmer”
—@DougBlank2
I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back.
—@smingleigh3
My goal is to train a robotic arm to make pancakes. As a first test, [I tried] to get the arm to toss a pancake onto a plate… The first reward system was simple—a small reward was given for every frame in the session, and the session ends when the pancake hits the floor. I thought this would incentivize the algorithm to keep the pancake in the pan as long as possible. What it actually did was try to fling the pancake as far as it possibly could, maximizing its time in the air… Score—PancakeBot: 1, Me: 0.
—Christine Barron4
As we’ve seen, there are many ways to accidentally sabotage an AI by giving it faulty or inadequate data. But there’s another kind of AI failure, one in which we discover that the AI has succeeded in doing what we asked, but what we asked it to do isn’t what we actually wanted it to do.
Why are AIs so prone to solving the wrong problem?
1. They develop their own ways of solving a problem rather than relying on step-by-step instructions from a programmer.
2. They lack the contextual knowledge to understand when their solutions are not what humans would have preferred.
Even though the AI does the work of figuring out how to solve the problem, the programmer still has to make sure the AI has actually solved the correct problem. That usually involves a lot of work in:
1. Defining the goal clearly enough to constrain the AI to useful answers.
2. Checking to see whether the AI has, nevertheless, managed to come up with a solution that’s not useful.
It’s really tricky to come up with a goal that the AI isn’t going to accidentally misinterpret. Especially if its misinterpreted version of the task is easier than what you want it to do.
The problem is that, as we’ve seen throughout this book, AIs don’t understand nearly enough about their tasks to be able to consider context or ethics or basic biology. AIs can classify images of lungs as healthy versus diseased without ever understanding how a lung works, what size it is, or even that it’s found inside a human—whatever a human is. They don’t have common sense, and they don’t know when to ask for clarification. Give them a goal—data to imitate or a reward function to maximize (such as distance traveled or points collected in a video game)—and they’ll do it, whether or not they’ve actually solved your problem.
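As a toy illustration, in the spirit of the horse-betting tweet that opens this chapter, here is a sketch of an optimizer maximizing exactly the reward it was given. The odds and win rate are invented; the point is that on a losing game, the literal maximum of "profit" is to never bet:

```python
# Search for the bet size with the highest average profit on a losing game.
import random

def average_profit(bet_size, trials=10_000):
    """Average profit from always betting `bet_size` at 3-to-1 odds, winning 20% of the time."""
    total = 0.0
    for _ in range(trials):
        total += bet_size * 3 if random.random() < 0.20 else -bet_size
    return total / trials

candidate_bets = [0, 1, 2, 5, 10, 20]
best = max(candidate_bets, key=average_profit)
print(best)  # 0 -- the goal we wrote down has been achieved perfectly
```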
Programmers who work with AI have learned to be philosophical about this.
“I’ve taken to imagining [AI] as a demon that’s deliberately misinterpreting your reward and actively searching for the laziest possible local optima. It’s a bit ridiculous, but I’ve found it’s actually a productive mindset to have,” writes Alex Irpan, an AI researcher at Google.5
Another frustrated programmer’s attempts to train virtual robot dogs to walk resulted in dogs that twitched across the ground, did weird push-ups with their back legs crossed, and even hacked the simulation’s physics so they could hover.6 As engineer Sterling Crispin wrote on Twitter:
I thought I was making progress… but these JERKS just found a flaw in the physics simulation and they’re using it to glide across the floor like total cheaters.
Battling with the robots’ tendencies to do anything but walk, Crispin kept tweaking their reward function, introducing a “tap dancing penalty” to stop them from shuffling rapidly in place and a “touch the damn ground reward” to, well, stop the hovering problem. In reaction, they started scooting ineffectually across the ground. Crispin then introduced a reward for keeping their bodies off the ground and, when they started shuffling around with their rears stuck in the air, a reward for keeping their bodies level. To stop them from insisting on walking with their rear legs crossed, Crispin rewarded them for keeping their lower legs off the ground, and to stop them from lurching around, he introduced another reward for keeping their bodies level, and so forth. It was hard to tell if it was a case of a benevolent programmer trying to give the robodogs hints on how to use their legs or a test of wills between the programmer and robodogs that Did. Not. Want. To. Walk. (There was also a slight difficulty the first time the robodogs encountered anything other than the perfectly flat, smooth terrain they’d seen in training. Faced with slightly textured dirt, they would face-plant.)
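Crispin's actual code isn't shown here, but the shape of the struggle is easy to sketch: each tweak amounts to bolting another hand-tuned term onto the reward function. The term names and weights below are invented for illustration:

```python
# A hypothetical composite reward for a simulated robodog. Every term is a
# patch over a loophole the dogs found in an earlier version of the reward.
def robodog_reward(state):
    r = 0.0
    r += 1.0 * state["forward_velocity"]           # what we actually want: walking forward
    r -= 0.5 * state["foot_contact_switches"]      # "tap dancing penalty"
    r += 0.3 * min(state["feet_on_ground"], 1)     # "touch the damn ground reward"
    r += 0.2 * float(state["body_height"] > 0.2)   # keep the body off the ground
    r -= 0.4 * abs(state["body_tilt"])             # keep the body level
    return r

# One snapshot of a dog that glides forward without ever lifting a foot:
glider = {"forward_velocity": 2.0, "foot_contact_switches": 0,
          "feet_on_ground": 4, "body_height": 0.3, "body_tilt": 0.0}
print(robodog_reward(glider))  # scores nicely while doing no walking at all
```

Nothing in such a function says "walk." It only says which numbers to push up or down, and the optimizer is free to push them however the simulated physics, bugs included, will allow.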
It turns out that training a machine learning algorithm has a lot in common with training dogs. Even if the dog really wants to cooperate, people can accidentally train them to do the wrong thing. For example, dogs have such excellent senses of smell that they can detect the odor of cancer in humans. But the people who train cancer-sniffing dogs have to be careful to train them on a variety of patients, otherwise they will learn to identify individual patients rather than cancer.7 During World War II there was a rather grim Soviet project that involved training dogs to bring bombs to enemy tanks.8 A couple of difficulties arose:
1. The dogs were trained to retrieve food from under the tanks, but to save fuel and ammunition, the tanks had not been moving or firing. The dogs didn’t know what to do with moving tanks, and the firing was scary.
2. The Soviet tanks the dogs had trained on smelled different from the German tanks that the dogs were supposed to seek out—they burned gasoline rather than the diesel that the Soviet tanks burned.
As a result, in battle situations, the dogs tended to avoid German tanks, to return to Soviet soldiers in confusion, and even to seek out Soviet tanks. This was less than okay with the Soviet soldiers, since the dogs were still carrying their bombs.
In the language of machine learning, this is overfitting: the dogs were prepared for the conditions they saw in training, but these conditions didn’t match those of the real world. Similarly, the robodogs also overfit the weird physics of their simulation, using hovering and gliding strategies that would never have worked in the real world.
There’s another way that training animals can be like training machine learning algorithms, and that is the devastating effect of a faulty reward function.
REWARD FUNCTION HACKING
Dolphin trainers have learned that it’s handy to get the dolphins to help with keeping their tanks clean. All they have to do is teach the dolphins to fetch trash and bring it to their keepers in exchange for a fish. It doesn’t always work well, however. Some dolphins learn that the exchange rate is the same no matter how large the bit of trash is, and they learn to hoard trash instead of returning it, tearing off small pieces to bring to their keepers for a fish apiece.9
Humans, of course, also hack their reward functions. In chapter 4 I mentioned that people who hire humans to generate training data through remote services like Amazon Mechanical Turk sometimes find that their jobs are completed by bots instead. This could be considered a case of a faulty reward function—if the pay is based on the number of questions answered rather than on the quality of the answers, then it does indeed make financial sense to build bots that can answer lots of questions for you rather than answering a few questions yourself. By that same token, many kinds of crime and fraud could be thought of as reward function hacking. Even doctors can hack their reward functions. In the United States, doctor report cards are supposed to help patients choose high-performing doctors and avoid those with worse-than-average surgery survival rates. They’re also supposed to encourage doctors to improve their performance. Instead, some doctors have started turning away patients whose surgeries will be risky so that their report cards won’t suffer.10
Humans, however, usually have some idea of what the reward function was supposed to encourage, even if they don’t always choose to play along. AIs have no such concept. It’s not that they’re out to get us or that they’re trying to cheat—it’s that their virtual brains are roughly the size of a worm’s, and they can only learn one narrow task at a time. Train an AI to answer questions about human ethics, and that’s all it can do—it won’t be able to drive a car, recognize faces, or screen resumes. It won’t even be able to recognize ethical dilemmas in stories and consider them—story comprehension is an entirely different task.
That’s why you’ll get algorithms like the navigation app that, during the California wildfires of December 2017, directed cars toward neighborhoods that were on fire. It wasn’t trying to kill people: it just saw that those neighborhoods had less traffic. Nobody had told it about fire.11
That’s why when computer scientist Joel Simon used a genetic algorithm to design a new, more efficient layout for an elementary school, its first designs had windowless classrooms buried deep in the center of a complex of round-walled caves. Nobody had told it about windows or fire escape plans or that walls should be straight.12
That’s also why you’ll get algorithms like the RNN I trained to generate new My Little Pony names by imitating a list of existing pony names—it knew which letter combinations are found in pony names, but it didn’t know that certain combinations of those are best avoided. As a result, I ended up with ponies like these:
Rade Slime
Blue Cuss
Starlich
Derdy Star
Pocky Mire
Raspberry Turd
Parpy Stink
Swill Brick
Colona
Star Sh*tter
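For the curious, here is a rough sketch of the kind of character-level model that produces names like these. It is not my actual pony-name network; the training list is a tiny stand-in, and a real run would use far more names and far more training. But the limitation is visible right in the code: the model only ever sees which character tends to follow which, never what any of the resulting words mean.

```python
# A minimal character-level RNN (PyTorch) that learns letter combinations
# from a list of names and samples new ones. Toy data, hypothetical setup.
import torch
import torch.nn as nn

names = ["Twilight Sparkle", "Rainbow Dash", "Pinkie Pie", "Fluttershy",
         "Applejack", "Rarity", "Starlight Glimmer"]     # stand-in training list

chars = sorted(set("".join(names)) | {"\n"})             # "\n" marks end-of-name
stoi = {c: i for i, c in enumerate(chars)}

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, h=None):
        y, h = self.rnn(self.embed(x), h)
        return self.out(y), h

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: predict each next character from the characters before it.
for epoch in range(200):
    for name in names:
        seq = torch.tensor([[stoi[c] for c in name + "\n"]])
        logits, _ = model(seq[:, :-1])
        loss = loss_fn(logits.squeeze(0), seq[0, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()

# Sampling: feed the model's own guesses back in until it emits end-of-name.
def sample(first_letter="S", max_len=30):
    idx = torch.tensor([[stoi[first_letter]]])
    h, out = None, [first_letter]
    for _ in range(max_len):
        logits, h = model(idx, h)
        probs = torch.softmax(logits[0, -1], dim=0)
        idx = torch.multinomial(probs, 1).view(1, 1)
        c = chars[idx.item()]
        if c == "\n":
            break
        out.append(c)
    return "".join(out)

print([sample() for _ in range(5)])
```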
And that’s why you’ll get algorithms that learn that racial and gender discrimination are handy ways to imitate the humans in their datasets. They don’t know that imitating the bias is wrong. They just know that this is a pattern that helps them achieve their goal. It’s up to the programmer to supply the ethics and the common sense.