You Look Like a Thing and I Love You
Dealing with the full range of things a human can say or ask is a very broad task. The mental capacity of AI is still tiny compared to that of humans, and as tasks become broad, AIs begin to struggle.
For example, I recently trained an AI to generate recipes. This particular AI is set up to imitate text, but it started from a blank slate—no idea what recipes are, no idea that various letters are referring to ingredients and things that happen to them, no idea even what English is. There’s a lot to keep track of, but it tried its best to figure out how to place one letter after another and imitate the recipes it saw. When I gave it only recipes for cake to learn from, here’s the recipe it produced.
Carrot Cake (Vera Ladies” cakes, alcohol
1 pkg yellow cake mix
3 cup flour
1 teaspoon baking powder
1 ½ teaspoon baking soda
¼ teaspoon salt
1 teaspoon ground cinnamon
1 teaspoon ground ginger
½ teaspoon ground cloves
1 teaspoon baking powder
½ teaspoon salt
1 teaspoon vanilla
1 egg, room temperature
1 cup sugar
1 teaspoon vanilla
1 cup chopped pecans
Preheat oven to 350 degrees. Grease a 9-inch springform pan.
To make the cake: Beat eggs at high speed until thick and yellow color and set aside. In a separate bowl, beat the egg whites until stiff. Speed the first like the mixture into the prepared pan and smooth the batter. Bake in the oven for about 40 minutes or until a wooden toothpick inserted into centre comes out clean. Cool in the pan for 10 minutes. Turn out onto a wire rack to cool completely.
Remove the cake from the pan to cool completely. Serve warm.
HereCto Cookbook (1989) From the Kitchen & Hawn inthe Canadian Living
Yield: 16 servings
Now, the recipe isn’t perfect, but at least it’s a recipe that’s identifiably cake (even if, when you look at the instructions closely, you realize that it only produces a single baked egg yolk).
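For readers who want to see the letter-by-letter idea in action, here is a tiny sketch in Python. It is not the neural network that produced the recipe above; it is a much simpler stand-in, a character-level Markov model that just counts which letter tends to follow which and then samples new text one character at a time. The two-line toy corpus and the three-character memory are assumptions made to keep the example short.

```python
import random
from collections import defaultdict, Counter

# Toy stand-in for a recipe corpus (made up for this example).
corpus = [
    "1 cup sugar\n1 teaspoon vanilla\nPreheat oven to 350 degrees.",
    "1 cup chopped pecans\n1 teaspoon baking powder\nGrease a 9-inch pan.",
]

ORDER = 3  # how many previous characters the model gets to "remember"

# Count which character tends to follow each three-character context.
counts = defaultdict(Counter)
for text in corpus:
    padded = " " * ORDER + text
    for i in range(len(text)):
        context = padded[i:i + ORDER]
        counts[context][padded[i + ORDER]] += 1

def sample(length=120):
    """Generate text one character at a time, like the recipe imitator."""
    out = " " * ORDER
    for _ in range(length):
        options = counts.get(out[-ORDER:])
        if not options:
            break
        chars, weights = zip(*options.items())
        out += random.choices(chars, weights=weights)[0]
    return out.strip()

print(sample())
```

With only two toy “recipes” to learn from, it mostly parrots its training text; feed it a few thousand real recipes (or swap the counting table for an actual neural network) and it starts producing original, recipe-flavored text of its own.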
Next, I asked the AI to learn to generate not just cake recipes but also recipes for soup, barbecue, cookies, and salads. It had about ten times more data to learn from—24,043 general recipes as opposed to just 2,431 recipes in the cake-only dataset. Here’s a recipe it generated.
Spread Chicken Rice cheese/eggs, salads, cheese
2 lb hearts, seeded
1 cup shredded fresh mint or raspberry pie
½ cup catrimas, grated
1 tablespoon vegetable oil
1 salt
1 pepper
2 ½ tb sugar, sugar
Combine unleaves, and stir until the mixture is thick. Then add eggs, sugar, honey, and caraway seeds, and cook over low heat. Add the corn syrup, oregano, and rosemary and the white pepper. Put in the cream by heat. Cook add the remaining 1 teaspoon baking powder and salt. Bake at 350F for 2 to 1 hour. Serve hot.
Yield: 6 servings
This time, the recipe is a total disaster. The AI had to try to figure out when to use chocolate and when to use potatoes. Some recipes required baking, some required slow simmering, and the salads required no cooking at all. With all these rules to try to learn and keep track of, the AI spread its brainpower too thin.
So people who train AIs to solve commercial or research problems have discovered that it makes sense to train them to specialize. If an algorithm seems to be better at its job than the AI that invented Spread Chicken Rice, the main difference is probably that it has been given a narrower, better-chosen problem. The narrower the AI, the smarter it seems.
C-3PO VERSUS YOUR TOASTER
This is why AI researchers like to draw a distinction between artificial narrow intelligence (ANI), the kind we have now, and artificial general intelligence (AGI), the kind we usually find in books and movies. We’re used to stories about superintelligent computer systems like Skynet and HAL or very human robots like WALL-E, C-3PO, Data, and so forth. The AIs in these stories may struggle to understand the fine points of human emotion, but they’re able to understand and react to a huge range of objects and situations. An AGI could beat you at chess, tell you a story, bake you a cake, describe a sheep, and name three things larger than a lobster. It’s also solidly the stuff of science fiction, and most experts agree that AGI is many decades away from becoming reality—if it ever becomes a reality at all.
The ANI that we have today is less sophisticated. Considerably less sophisticated. Compared to C-3PO, it’s basically a toaster.
The algorithms that make headlines when they beat people at games like chess and Go, for example, surpass humans’ ability at a single specialized task. But machines have been superior to humans at specific tasks for a while now. A calculator has always exceeded humans’ ability to perform long division—but it still can’t walk down a flight of stairs.
Actually, plenty of sci-fi AGIs are for some reason unable to walk down stairs: the Daleks, C-3PO, and the ED-209 robot from RoboCop, for example.
What problems are narrow enough to be suitable for today’s ANI algorithms? Unfortunately (see warning sign number 1 of AI doom: Problem Is Too Hard), often a real-world problem is broader than it first appears. In our video-interview-analyzing AI from chapter 1, the problem at first glance seems relatively narrow: a simple matter of detecting emotion in human faces. But what about applicants who have had a stroke or who have facial scarring or who don’t emote in neurotypical ways? A human could understand an applicant’s situation and adjust their expectations accordingly, but to do the same, an AI would have to know what words the applicant is saying (speech-to-text is an entire AI problem in itself), understand what those words mean (current AIs can only interpret the meaning of limited kinds of sentences in limited subject areas and don’t do well with nuance), and use that knowledge and understanding to alter how it interprets emotional data. Today’s AIs, incapable of such a complicated task, would most likely screen all these people out before they got to a human.
As we’ll see below, self-driving cars may be another example of a problem that is broader than it at first appears.
INSUFFICIENT DATA DOES NOT COMPUTE
AIs are slow learners. If you showed a human a picture of some new animal called a wug, then gave them a big batch of pictures and told them to identify all the pictures that contain wugs, they could probably do a decent job just based on that one picture. An AI, however, might need thousands or hundreds of thousands of wug pictures before it could even semireliably identify wugs. And the wug pictures need to be varied enough for the algorithm to figure out that “wug” refers to an animal, not to the checkered floor it’s standing on or to the human hand patting its head.
Researchers are working on designing AIs that can master a topic with fewer examples (an ability called one-shot learning), but for now, if you want to solve a problem with AI, you’ll need tons and tons of training data. The popular ImageNet set of training data for image generation or image recognition currently has 14,197,122 images in only one thousand different categories. Similarly, while a human driver may only need to accumulate a few hundred hours of driving experience before they’re allowed to drive on their own, as of 2018 the self-driving car company Waymo’s cars have collected data from driving more than six million road miles plus five billion more miles driven in simulation.11 And we’re still a ways off from a widespread rollout of self-driving car technology. AI’s data hungriness is a big reason why the age of “big data,” where people collect and analyze huge sets of data, goes hand in hand with the age of AI.
Sometimes AIs learn so slowly that it’s impractical to let them do their learning in real time. Instead, they learn in sped-up time, amassing hundreds of years’ worth of training in just a few hours. A program called OpenAI Five, which learned to play the computer game Dota (an online fantasy game in which teams have to work together to take over a map), was able to beat some of the world’s best human players by playing games against itself rather than against humans. It challenged itself to tens of thousands of simultaneous games, accumulating 180 years of gaming time each day.12 Even if the goal is to do something in the real world, it can make sense to build a simulation of that task to save time and effort.
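To see what self-play looks like in miniature (and this is definitely not how OpenAI Five works internally, just an illustration of the idea), here is a sketch in which two copies of the same simple agent play a toy stick-taking game against each other thousands of times, sharing a single table of learned move values. The game, the learning rule, and all the numbers are assumptions chosen to keep the example small.

```python
import random
from collections import defaultdict

# Toy game: 21 sticks, players alternate taking 1-3, and whoever takes the
# last stick wins. Both "players" are the same agent with a shared table.
values = defaultdict(float)   # (sticks_remaining, sticks_taken) -> learned score

def choose(sticks, explore=0.1):
    moves = [m for m in (1, 2, 3) if m <= sticks]
    if random.random() < explore:          # occasionally try something new
        return random.choice(moves)
    return max(moves, key=lambda m: values[(sticks, m)])

for game in range(20000):                  # thousands of fast simulated games
    sticks, player, history = 21, 0, {0: [], 1: []}
    while sticks > 0:
        move = choose(sticks)
        history[player].append((sticks, move))
        sticks -= move
        winner, loser = player, 1 - player  # whoever moved last so far
        player = 1 - player
    # Nudge the winner's moves up and the loser's moves down.
    for state in history[winner]:
        values[state] += 0.1 * (1.0 - values[state])
    for state in history[loser]:
        values[state] += 0.1 * (-1.0 - values[state])

# With 6 sticks left, taking 2 (leaving a multiple of 4) is the winning move;
# after enough self-play it should carry the highest learned value.
print({m: round(values[(6, m)], 2) for m in (1, 2, 3)})
```

Real systems like OpenAI Five use enormous neural networks and far more elaborate training, but the underlying trick is the same: the agent generates its own opponents, so it never has to wait around for a human.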
Another AI’s task was to learn to balance a bicycle. It was a bit of a slow learner, though. The programmers kept track of all the paths the bicycle’s front wheel took as it repeatedly wobbled and crashed. It took more than a hundred crashes before the AI could ride more than a few meters without falling, and thousands more before it could go more than a few tens of meters.
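Here is a sketch of that same trial-and-error flavor (not the actual bicycle experiment): a controller is dropped into a crude simulated balancing task, its crashes are counted, and whichever setting survives longest is kept. The toy physics, the single “gain” knob, and all the numbers are invented for illustration.

```python
import random

def simulate(gain, steps=500):
    """Crude balance simulation: a pole-like angle that the controller
    pushes back toward upright. Returns how many steps it survived."""
    angle, velocity = 0.05, 0.0
    for step in range(steps):
        push = -gain * angle               # simple proportional controller
        velocity += 0.02 * angle + 0.01 * push
        angle += velocity
        if abs(angle) > 0.5:               # it "crashed"
            return step
    return steps

# Trial and error in fast simulation: try thousands of controllers and
# keep whichever one stays upright the longest.
best_gain, best_score = None, -1
for trial in range(5000):
    gain = random.uniform(0.0, 10.0)
    score = simulate(gain)
    if score > best_score:
        best_gain, best_score = gain, score

print(f"best gain {best_gain:.2f} survived {best_score} steps")
```

This blind random search is far cruder than the learning rule used in the real experiment, but it shows why simulation matters: thousands of crashes cost nothing but computer time.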
Training an AI in simulation is convenient, but it also comes with risks. Because of the limited computing power of the computers that run them, simulations aren’t nearly as detailed as the real world and are by necessity held together with all sorts of hacks and shortcuts. That can sometimes be a problem if the AI notices the shortcuts and begins to exploit them (more on that later).
PIGGYBACKING ON OTHER PROGRESS
If you don’t have lots of training data, you might still be able to solve your problem with AI if you or someone else has already solved a similar problem. If the AI starts not from scratch but from a configuration it learned from a previous dataset, it can reuse a lot of what it learned. For example, say I already have an AI that I’ve trained to generate the names of metal bands. If my next task is to build an AI that can generate ice cream flavors, I may get results more quickly, and need fewer examples, if I start with the metal-band AI. After all, from learning to generate metal bands, the AI already knows
• approximately how long each name should be,
• that it should capitalize the first letter of each line,
• common letter combinations—ch and va and str and pis (it is already partway to spelling chocolate, vanilla, strawberry, and pistachio!)—and
• commonly occurring words, such as the and, um… death?
So a few short rounds of training can retrain the AI from a model that produces this:
Dragonred of Blood
Stäggabash
Deathcrack
Stormgarden
Vermit
Swiil
Inbumblious
Inhuman Sand
Dragonsulla and Steelgosh
Chaosrug
Sespessstion Sanicilevus
into a model that produces this:
Lemon-Oreo
Strawberry Churro
Cherry Chai
Malted Black Madnesss
Pumpkin Pomegranate Chocolate Bar
Smoked Cocoa Nibe
Toasted Basil
Mountain Fig n Strawberry Twist
Chocolate Chocolate Chocolate Chocolate Road
Chocolate Peanut Chocolate Chocolate Chocolate
(There’s only a miiinor awkward phase in between, when it’s generating things like this:)
Swirl of Hell
Person Cream
Nightham Toffee
Feethberrardern’s Death
Necrostar with Chocolate Person
Dirge of Fudge
Beast Cream
End All
Death Cheese
Blood Pecan
Silence of Coconut
The Butterfire
Spider and Sorbeast
Blackberry Burn
Maybe I should have started with pie instead.
As it turns out, AI models get reused a lot, a process called transfer learning. Not only can you get away with using less data by starting with an AI that’s already partway to its goal, but you can also save a lot of time. It can take days or even weeks to train the most complex algorithms with the largest datasets, even on very powerful computers. But it takes only minutes or seconds to use transfer learning to train the same AI to do a similar task.
People use transfer learning a lot in image recognition in particular, since training a new image recognition algorithm from scratch requires a lot of time and a lot of data. Often people will start with an algorithm that’s already been trained to recognize general sorts of objects in generic images, then use that algorithm as a starting point for specialized object recognition. For example, if an algorithm already knows rules that help it recognize pictures of trucks, cats, and footballs, it already has a head start on the task of distinguishing different kinds of produce for, say, a grocery scanner. A lot of the rules a generic image recognition algorithm has to discover—rules that help it find edges, identify shapes, and classify textures—will be helpful for the grocery scanner.
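To make that concrete, here is roughly what transfer learning looks like in code, using the PyTorch and torchvision libraries. This is a generic sketch rather than anyone’s actual grocery scanner: it borrows a network pretrained on generic photos, freezes the general-purpose layers, and trains only a small new final layer for the produce categories. The number of produce classes and the produce_loader of labeled grocery photos are assumptions for the example.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network already trained on millions of generic photos.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the general-purpose layers (edges, shapes, textures).
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new one sized for our produce categories.
num_produce_classes = 20                  # assumption for this sketch
model.fc = nn.Linear(model.fc.in_features, num_produce_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# produce_loader is a hypothetical DataLoader of labeled grocery photos.
for images, labels in produce_loader:
    logits = model(images)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because only the new final layer is being trained, this needs far fewer labeled grocery photos, and far less computer time, than training the whole network from scratch.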
DON’T ASK IT TO REMEMBER
A problem is more easily solvable with AI if it doesn’t require much memory. Because of their limited brainpower, AIs are particularly bad at remembering things. This shows up, for example, when AIs try to play computer games. They tend to be extravagant with their characters’ lives and other resources (like powerful attacks that they only have in limited numbers). They’ll burn through lots of lives and spells at first until their numbers get critically low, at which point they’ll suddenly start being cautious.13
One AI learned to play the game Karate Kid, but it always squandered all its powerful Crane Kick moves at the beginning of the game. Why? It only had enough memory to look forward to the next six seconds of game play. As Tom Murphy, who trained the algorithm, put it, “Anything that you are gonna need 6 seconds later, well, too bad. Wasting lives and other resources is a common failure mode.”14
Even sophisticated algorithms like OpenAI’s Dota-playing bot have only a limited time frame over which they can remember and predict. OpenAI Five can predict an impressive two minutes into the future (impressive for a game with so many complex things happening so quickly), but Dota matches can last for forty-five minutes or more. Although OpenAI Five can play with a terrifying level of aggression and precision, it also seems not to know how to use techniques that will pay off in the much longer term.15 Like the simple Karate Kid bot that employs the Crane Kick too early, it tends to use up a character’s most powerful attacks early on rather than saving them for later, when they will count the most.
This failure to plan ahead shows up fairly often. In level 2 of Super Mario Bros., there is an infamous ledge, the bane of all game-playing algorithms. This ledge has lots of shiny coins on it! By the time they get to level 2, AIs usually know coins are good. The AIs also usually know that they have to keep moving to the right so they can reach the end of the level before time runs out. But if the AI jumps onto the ledge, it then has to go backwards to get down off the ledge. The AIs have never had to go backwards before. They can’t figure it out, and they get stuck on the ledge until time runs out. “I literally spent about six weekends and thousands of hours of CPU on the problem,” said Tom Murphy, who eventually got past the ledge with some improvements to his AI’s skills at long-term planning.16
Text generation is another place where the short memory of AI can be a problem. For example, Heliograf, the journalism algorithm that translates individual lines of a spreadsheet into sentences in a formulaic sports story, works because it can write each sentence more or less independently. It doesn’t need to remember the entire article at once.
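That sentence-at-a-time approach is easy to picture as code. The sketch below is a made-up miniature, not Heliograf itself: each sentence comes from its own template, filled in with one game’s numbers, so no sentence has to remember what any other sentence said. The team names and statistics are invented.

```python
# One row of a (made-up) sports spreadsheet.
game = {"winner": "Wildcats", "loser": "Hornets", "winner_score": 31,
        "loser_score": 17, "star": "J. Alvarez", "touchdowns": 3}

# Each sentence is generated independently from its own template.
templates = [
    "The {winner} defeated the {loser} {winner_score}-{loser_score}.",
    "{star} led the way with {touchdowns} touchdowns.",
]

story = " ".join(t.format(**game) for t in templates)
print(story)
```

Because every sentence stands on its own, the program never has to hold the whole article in memory, which is exactly why this kind of narrow task suits today’s forgetful AIs.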
Language-translating neural networks, like the kind that power Google Translate, don’t need to remember entire paragraphs, either. Sentences, or even parts of sentences, can usually be individually translated from one language to another without any memory of the previous sentence. When there is some kind of long-term dependence, such as an ambiguity that might have been resolved with information from a previous sentence, the AI usually can’t make use of it.
Other kinds of tasks make AI’s terrible memory even more obvious. One example is algorithmically generated stories. There’s a reason AI doesn’t write books or TV shows (though people are, of course, working on this).
If you’re ever wondering whether a bit of text was written by a machine learning algorithm or a human (or at least heavily curated by a human), one way to tell is to look for major problems with memory. As of 2019, only some AIs are starting to be able to keep track of long-term information in a story—and even then, they’ll tend to lose track of some bits of crucial information.
Many text-generating AIs can only keep track of a few words at a time. For example, here’s what a recurrent neural network (RNN) wrote after it was trained on nineteen thousand descriptions of people’s dreams from dreamresearch.net:
I get up and walk down the hall to his house and see a bird in the very narrow drawer and it is a group of people in the hand doors. At home like an older man is going to buy some keys. He looks at his head with a cardboard device and then my legs are parked on the table.
Now, dreams are notoriously incoherent, switching settings and mood and even characters midstream. These neural-net dreams, however, don’t maintain coherence for more than a sentence or so—sometimes considerably less. Characters who are never introduced are referred to as if they had been there all along. The whole dream forgets where it is. Individual phrases may make sense, and the rhythm of the words sounds okay if you don’t pay attention to what’s going on. Matching the surface qualities of human speech while lacking any deeper meaning is a hallmark of neural-net-generated text.