On the next page is another example, this time a recipe, where it’s even easier to see the effects of memory limitation. This recipe was generated by the same recurrent neural network, or machine learning algorithm, that generated the recipes here. (As you can see, this is the one that learned from a variety of recipes, including, apparently, recipes for black pudding, a type of blood sausage.) This neural network builds a recipe letter by letter, looking at the letters it’s already generated to decide which one comes next. But each extra letter that it looks at requires more memory, and there’s only so much memory available on the computer that’s running it. So to make the memory demands manageable, the neural network looks only at the most recent characters, a few at a time. For this particular algorithm and my computer, the largest memory I could give it was sixty-five characters. So every time it had to come up with the next letter of the recipe, it only had information about the previous sixty-five characters.* You can tell where in the recipe it ran out of memory and forgot it was making a chocolate dessert—about when it decided to add black pepper and whatever “rice cream” is.
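If you’re curious what that sliding-window forgetfulness looks like in code, here’s a toy sketch in Python. It is not the actual recurrent neural network (whose memory is a learned internal state rather than a lookup table); it just shows the key limitation, which is that the next letter gets chosen using only the last few characters of whatever has been written so far.

import random
from collections import defaultdict, Counter

WINDOW = 5   # the book's network looked back 65 characters; 5 keeps this toy small
training_text = "1 cup butter, softened 1 cup sugar 2 eggs 1 cup flour " * 20

# Count which character tends to follow each WINDOW-character chunk.
follows = defaultdict(Counter)
for i in range(len(training_text) - WINDOW):
    chunk = training_text[i:i + WINDOW]
    follows[chunk][training_text[i + WINDOW]] += 1

def next_char(text_so_far):
    # Everything earlier than the last WINDOW characters has been forgotten.
    chunk = text_so_far[-WINDOW:]
    options = follows.get(chunk)
    if not options:
        return random.choice(training_text)
    letters, counts = zip(*options.items())
    return random.choices(letters, weights=counts)[0]

generated = "1 cup"
for _ in range(80):
    generated = generated + next_char(generated)
print(generated)

With a window this tiny, the toy version loses its train of thought almost instantly; it’s the same forgetfulness, in miniature, that turned a chocolate dessert into black pepper and rice cream.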
This memory limitation is beginning to change. Researchers are working on neural networks that can look at both short-term and long-term features when predicting the next letters in a text. The idea is similar to the algorithms that look at small-scale features in images first (edges and textures, for example), then zoom out to look at the big picture. This strategy is called convolution. A neural network that uses convolution (and that is also hundreds of times larger than the one I trained on my laptop) can keep track of information long enough to remain on topic. The following recipe is from a neural network called GPT-2, which OpenAI trained on a huge selection of webpages, and which I then fine-tuned by training it on all kinds of recipes.
Chunk Cake cakes, deserts
8 cup flour
4 lb butter; room temperature
2 ¼ cup corn syrup; divided
2 eggs; pureed and cooled
1 teaspoon cream of tartar
½ cup m&m’s
8 oz chunky whites
1 chocolate sifted
Cream 2 ¼ cups of flour at medium speed until thickened.
Lightly grease and flour two greased and waxed paper-lined box ingredients; combine flour, syrup, and eggs. Add cream of tartar. Pour into a gallon-size loaf-pan. Bake at 450 degrees for 35 minutes. Meanwhile, in large bowl, combine syrup, whites, and chocolate; stir which until thoroughly mixed. Cool pan. Pour 2 tb chocolate mixture over whole cake. Refrigerate until serving time.
Yield: 20 servings
With its memory improved by convolution, the GPT-2 neural net remembers to use most of its ingredients, and even remembers that it’s supposed to be making cake. Its directions are still somewhat improbable—plain flour won’t thicken no matter how long you cream it, and the flour/syrup/egg mixture is unlikely to turn into cake, even with the addition of cream of tartar. It’s still an impressive improvement compared to the Chocolate Butterbroth Black Pudding.
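For the curious, here’s a rough sketch in Python of the convolution idea mentioned above: small filters that each look at a handful of neighboring characters, stacked so that deeper layers take in a wider stretch of text. (The real GPT-2 is vastly larger and more elaborate than this, and here random numbers stand in for everything a real network would learn; this is only the zoom-out-gradually trick in miniature.)

import numpy as np

rng = np.random.default_rng(0)
text = "butter, sugar, eggs, flour"

# Give each character a random 8-number vector, standing in for the
# character embeddings a real network would learn.
embeddings = {ch: rng.normal(size=8) for ch in sorted(set(text))}
x = np.stack([embeddings[ch] for ch in text])        # shape: (26, 8)

def conv_layer(inputs, filters, width=3):
    # Slide each filter across every window of `width` neighboring positions.
    outputs = []
    for i in range(len(inputs) - width + 1):
        window = inputs[i:i + width].ravel()          # flatten the window
        outputs.append(filters @ window)
    return np.maximum(np.stack(outputs), 0)           # ReLU nonlinearity

filters1 = rng.normal(size=(8, 3 * 8))   # each layer has 8 filters of width 3
filters2 = rng.normal(size=(8, 3 * 8))

layer1 = conv_layer(x, filters1)        # each output "sees" 3 characters
layer2 = conv_layer(layer1, filters2)   # each output now "sees" 5 characters
print(layer1.shape, layer2.shape)       # (24, 8) (22, 8)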
Here’s another example from GPT-2, this time its attempt at writing Harry Potter fan fiction. The algorithm was able to keep track of which characters were in the scene and even remember recurring motifs—in this case, remembering that there was already a snake on Snape’s head.
Snape: I understand.
[A snake appears and Snape puts it on his head and it appears to do the talking. It says ‘I forgive you.’]
HARRY: You can’t go back if you don’t forgive.
Snape: [sighing] Hermione.
HARRY: Okay, listen.
Snape: I want to apologize to you for getting angry and upset over this.
HARRY: It’s not your fault.
HARRY: That’s not what I meant to imply.
[Another snake appears then it says ‘And I forgive you.’]
HERMIONE: And I forgive you.
Snape: Yes.
Another strategy for dealing with memory limits is to group basic units together so the neural network can achieve coherence while remembering fewer things. Rather than remembering sixty-five letters, it might remember sixty-five entire words, or even sixty-five plot elements. If I had restricted my neural network to a specially crafted set of required ingredients and allowable ranges—as a team at Google did when trying to design a new gluten-free chocolate chip cookie—it would have produced valid recipes every time.17 Unfortunately, Google’s result, though more cookielike than anything my algorithm could have produced, was reportedly still terrible.18
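Here’s a tiny Python illustration of why bigger chunks help. Real systems use learned tokenizers rather than a simple split on spaces, so treat this as a sketch of the idea only:

recipe = ("Cream the butter and sugar until fluffy. Beat in the eggs. "
          "Stir in the flour and bake at 350 degrees for 30 minutes.")

letter_window = recipe[-65:]          # the last 65 characters
word_window = recipe.split()[-65:]    # the last 65 words (here, the whole recipe)

print(len(letter_window), "characters remembered:", letter_window)
print(len(word_window), "words remembered:", " ".join(word_window))

Sixty-five characters reaches back only as far as the eggs; sixty-five words covers the whole recipe with room to spare.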
IS THERE A SIMPLER WAY OF SOLVING THIS PROBLEM?
This leads us to one of the final things that determine whether a problem is a good one for AI (although it doesn’t determine whether people will try to use AI to solve the problem anyway): is AI really the simplest way of solving it?
Some problems were tough to make progress on before we had big AI models and lots of data. AI revolutionized image recognition and language translation, making smart photo tagging and Google Translate ubiquitous. Those problems are hard for people to write down general rules for, but an AI approach can analyze lots of information and form its own rules. Or an AI can look at one hundred characteristics of phone customers who switched to a different provider, then figure out how to guess which customers are likely to switch in the future. Maybe the volatile customers are young, live in areas with poorer than average coverage, and have been customers for less than six months.
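In code, that kind of guessing game can be surprisingly compact. Here’s a sketch using scikit-learn, with a handful of entirely made-up customers, three characteristics instead of one hundred, and a plain logistic regression standing in for anything fancier:

from sklearn.linear_model import LogisticRegression

# Each row is one (entirely made-up) customer:
# [age, months as a customer, local coverage quality from 0 to 1]
customers = [
    [22,  3, 0.4],
    [24,  5, 0.3],
    [30,  2, 0.5],
    [45, 60, 0.7],
    [51, 48, 0.9],
    [60, 72, 0.8],
]
switched = [1, 1, 1, 0, 0, 0]   # 1 means the customer left for another provider

model = LogisticRegression().fit(customers, switched)

# Estimated chance that a young, new customer with spotty coverage will leave.
print(model.predict_proba([[25, 4, 0.35]])[0][1])

Whether you need a giant neural network or something this small depends entirely on how tangled the real patterns are.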
The danger, however, is misapplying a complex AI solution to a situation that would be better handled by a bit of common sense. Maybe the customers who leave are the ones on the weekly cockroach delivery plan—that plan is terrible.
LET THE AI DRIVE?
What about self-driving cars? There are many reasons why this is an attractive problem for AI. We would love to automate driving, of course—many people find it tedious or at times even impossible. A competent AI driver would have lightning-fast reflexes, would never weave or drift in its lane, and would never drive aggressively. In fact, self-driving cars can sometimes be too timid, struggling to merge into rush-hour traffic or turn left onto a busy road.19 The AI would never get tired, though, and could take the wheel for endless hours while the humans nap or party.
We can also accumulate lots of example data as long as we can afford to pay human drivers to drive around for millions of miles. We can easily build virtual driving simulations so that the AI can test and refine its strategies in sped-up time.
The memory requirements for driving are modest, too. This moment’s steering and velocity don’t depend on things that happened five minutes ago. Navigation takes care of planning for the future. Road hazards like pedestrians and wildlife come and go in a matter of seconds.
And finally, controlling a self-driving car is so difficult that we don’t have other good solutions. AI is the solution that’s gotten us the furthest so far.
Yet it is an open question whether driving is a narrow enough problem to be solved with today’s AI or whether it will require something more like the human-level artificial general intelligence (AGI) I mentioned earlier. So far, AI-driven cars have proved themselves able to drive millions of miles on their own, and some companies report that a human needed to intervene on test drives only once every few thousand miles. It’s that rare need for intervention, however, that’s proving tough to eliminate fully.
Humans have needed to rescue the AIs of self-driving cars from a variety of situations. Usually companies don’t disclose the reasons for these so-called disengagements, only the number of them, which is required by law in some places. This may be in part because the reasons for disengagement can be frighteningly mundane. In 2015 a research paper20 listed some of them. The cars in question, among other things,
• saw overhanging branches as an obstacle,
• got confused about which lane another car was in,
• decided that the intersection had too many pedestrians for it to handle,
• didn’t see a car exit
ing a parking garage, and
• didn’t see a car that pulled out in front of it.
A fatal accident in March 2018 was the result of a situation like this—a self-driving car’s AI had trouble identifying a pedestrian, classifying her first as an unknown object, then as a bicycle, and then finally, with only 1.3 seconds left for braking, as a pedestrian. (The problem was further compounded by the fact that the car’s emergency braking systems were disabled in favor of alerting the car’s backup driver, yet the system was not designed to actually alert the backup driver. The backup driver had also spent many, many hours riding with no intervention needed, a situation that would make the vast majority of humans less than alert.)21 A fatal accident in 2016 also happened because of an obstacle-identification error—in this case, a self-driving car failed to recognize a flatbed truck as an obstacle (see the box on the next page).
In 2016 there was a fatal accident when a driver used Tesla’s autopilot feature on city streets instead of the highway driving that it had been intended for. A truck crossed in front of the car, and the autopilot’s AI failed to brake—it didn’t register the truck as an obstacle that needed to be avoided. According to analysis by Mobileye (who designed the collision-avoidance system), because their system had been designed for highway driving, it had only been trained to avoid rear-end collisions. That is, it had only been trained to recognize trucks from behind, not from the side. Tesla reported that when the AI detected the truck, it recognized it as an overhead sign and decided it didn’t need to brake.22
That’s not to mention the more unusual situations that can occur. When Volvo tested its AI in Australia for the first time, the company discovered it was confused by kangaroos. Apparently it had never before encountered anything that hopped.23
Given the sheer variety of things that can happen on a road—parades, escaped emus, downed electrical lines, lava, emergency signs with unusual instructions, molasses floods, and sinkholes—it’s inevitable that something will occur that an AI never saw in training. It’s a tough problem to make an AI that can deal with something completely unexpected—that would know that an escaped emu is likely to run wildly around while a sinkhole will stay put and to understand intuitively that just because lava flows and pools sort of like water does, it doesn’t mean you can drive through a puddle of it.
Car companies are trying to adapt their strategies to the inevitability of mundane glitches or freak weirdness on the road. They’re looking into limiting self-driving cars to closed, controlled routes (this doesn’t necessarily solve the emu problem; they are wily) or having self-driving trucks caravan behind a lead human driver. In other words, the compromises are leading us toward solutions that look very much like mass public transportation.
As of right now, when the AIs get confused, they disengage—that is, they suddenly hand control back to the human behind the wheel. Automation level 2, partial automation, is the highest level of car autonomy commercially available—in Tesla’s autopilot mode, for example, the car can drive for hours unguided, but a human driver can be called to take over at any moment. The problem with this level of automation is that the human had better be behind the wheel and paying attention, not in the back seat decorating cookies. And humans are very, very bad at being alert after boring hours of idly watching the road. Human rescue is often a decent option for bridging the gap between the AI performance we have and the performance we need, but humans are pretty bad at rescuing self-driving cars.
So making self-driving cars is at once an attractive and very difficult AI problem. To get mainstream self-driving cars, we may need to make compromises (like creating controlled routes and sticking with automation level 4), or we may need AI that’s significantly more flexible than the AI we have now.
In the next chapter, we’ll look at the types of AI that are behind things like self-driving cars—modeled after brains, evolution, and even the game of call my bluff.
CHAPTER 3
How does it actually learn?
Remember that in this book I’m using the term AI to mean “machine learning programs.” (Refer to the handy chart here for a list of stuff that I am or am not considering to be AI. Sorry, person in a robot suit.) A machine learning program, as I explained in chapter 1, uses trial and error to solve a problem. But how does that process work? How does a program go from producing a jumble of random letters to writing recognizable knock-knock jokes, all without a human telling it how words work or what a joke even is?
There are lots of different methods of machine learning, many of which have been around for decades, often long before people started calling them AI. Today, these technologies are combined or remixed or made ever more powerful by faster processing and bigger datasets. In this chapter we’ll look at a few of the most common types, peeking under the hood to see how they learn.
NEURAL NETWORKS
These days, when people talk about AI, or deep learning, what they’re often referring to are artificial neural networks (ANNs). (ANNs have also been known as cybernetics, or connectionism.)
There are lots of ways to build artificial neural networks, each meant for a particular application. Some are specialized for image recognition, some for language processing, some for generating music, some for optimizing the productivity of a cockroach farm, some for writing confusing jokes. But they’re all loosely modeled after the way the brain works. That’s why they’re called artificial neural networks—their cousins, biological neural networks, are the original, far more complex models. In fact, when programmers made the first artificial neural networks, in the 1950s, the goal was to test theories about how the brain works.
In other words, artificial neural networks are imitation brains.
They’re built from a bunch of simple chunks of software, each able to perform very simple math. These chunks are usually called cells or neurons, an analogy with the neurons that make up our own brains. The power of the neural network lies in how these cells are connected.
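Here’s roughly what that “very simple math” looks like for one cell, sketched in Python. (Real networks use a variety of cell types and squashing functions; this is just the classic textbook version.)

import math

def neuron(inputs, weights, bias):
    # Multiply each input by its weight, add everything up, then squash
    # the total into a number between 0 and 1.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

print(neuron([1.0, 0.0, 0.5], weights=[2.0, -1.0, 0.5], bias=-1.0))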
Now, compared to actual human brains, artificial neural networks aren’t that powerful. The ones I use for a lot of the text generation in this book have as many neurons as… a worm.
Unlike a human, the neural net is at least able to devote its entire one-worm-power brain to the task at hand (if I don’t accidentally distract it with extraneous data). But how can you solve problems using a bunch of interconnected cells?
The most powerful neural networks, the ones that take months and tens of thousands of dollars’ worth of computing time to train, have far more neurons than my laptop’s neural net, some even exceeding the neuron count of a single honeybee. Looking at how the size of the world’s largest neural networks has increased over time, a leading researcher estimated in 2016 that artificial neural networks might be able to approach the number of neurons in the human brain by around 2050.1 Will this mean that AI will approach the intelligence of a human then? Probably not even close. Each neuron in the human brain is much more complex than the neurons in an artificial neural network—so complex that each human neuron is more like a complete many-layered neural network all by itself. So rather than being a neural network made of eighty-six billion neurons, the human brain is a neural network made of eighty-six billion neural networks. And there are far more complexities to our brains than there are to ANNs, including many we don’t fully understand yet.
THE MAGIC SANDWICH HOLE
Let’s say, hypothetically, that we have discovered a magic hole in the ground that produces a random sandwich every few seconds. (Okay, this is very hypothetical.) The problem is that the sandwiches are very, very random. Ingredients include jam, ice cubes, and old socks. If we want to find the good ones, we’ll have to sit in front of the hole all day and sort them.
But that’s going to get tedious. Good sandwiches are only one in a thousand. However, they are very, very good sandwiches. Let’s try to automate the job.
To save ourselves time and effort, we want to build a neural network that can look at each sandwich and decide whether it’s good. For now, let’s ignore the problem of how to get the neural network to recognize the ingredients the sandwiches are made of—that’s a really hard problem. And let’s ignore the problem of how the neural network is going to pick up each sandwich. That’s also really, really hard—not just recognizing the motion of the sandwich as it flies from the hole but also instructing a robot arm to grab a slim paper-and-motor-oil sandwich or a thick bowling-ball-and-mustard sandwich. Let’s assume, then, that the neural net knows what’s in each sandwich and that we’ve solved the problem of physically moving the sandwiches. It just has to decide whether to save this sandwich for human consumption or throw it into the recycling chute. (We’re also going to ignore the mechanism of the recycling chute—let’s say it’s another magic hole.)
This reduces our task to something simple and narrow—as we discovered in chapter 2, that makes it a good candidate for automation with a machine learning algorithm. We have a bunch of inputs (the names of the ingredients), and we want to build an algorithm that will use them to figure out our single output, a number that indicates whether the sandwich is good. We can draw a simple “black box” picture of our algorithm, and it looks like this:
We want the “deliciousness” output to change depending on the combination of ingredients in the sandwich. So if a sandwich contains eggshells and mud, our black box should do this:
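In code, a sketch of that black box might look something like the function below. The rules here are written by hand purely for illustration; the whole point of the neural network is that it will learn the mapping from ingredients to deliciousness on its own instead of having us type in the rules.

def deliciousness(ingredients):
    # Hand-written stand-in for the black box: 1.0 means keep the sandwich,
    # 0.0 means send it down the recycling chute.
    terrible = {"eggshells", "mud", "old socks", "motor oil"}
    return 0.0 if terrible & set(ingredients) else 1.0

print(deliciousness(["eggshells", "mud"]))       # 0.0, straight to recycling
print(deliciousness(["peanut butter", "jam"]))   # 1.0, worth keeping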