Architects of Intelligence

by Martin Ford


  Once you have that kind of memory, other things come with it, such as confirmation bias. Confirmation bias is where you remember facts that are consistent with your theory better than facts that are inconsistent with your theory. A computer doesn’t need to do that. A computer can search for everything that matches a zip code or everything that doesn’t match a zip code. It can use NOT operators. Using a computer, I can search for everybody that is male and over 40 in my zip code, or equally everybody that doesn’t match those criteria. The human brain, using cue-addressable memory, can only search for matches within data. Everything else is much harder.
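
  As a minimal sketch of the contrast being drawn here, the short Python snippet below (the record fields and values are invented for illustration) queries a small data set with a positive cue and with its negation; the negated query is just as cheap for a computer, whereas cue-addressable memory supports only the first kind of search.

    people = [
        {"name": "Ann",  "sex": "F", "age": 52, "zip": "10001"},
        {"name": "Bob",  "sex": "M", "age": 45, "zip": "10001"},
        {"name": "Carl", "sex": "M", "age": 31, "zip": "94110"},
    ]

    def criteria(p):
        # The positive cue: male, over 40, in a given zip code.
        return p["sex"] == "M" and p["age"] > 40 and p["zip"] == "10001"

    matches = [p["name"] for p in people if criteria(p)]          # ['Bob']
    non_matches = [p["name"] for p in people if not criteria(p)]  # ['Ann', 'Carl'] -- the NOT operator

    print(matches, non_matches)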

  If I have a theory then I can find material that matches my theory, but anything that doesn’t match doesn’t come to mind as easily. I can’t systematically search for it. That’s confirmation bias.

  Another example is the focusing illusion, where I ask you two questions in one of two orders. I either ask you how happy are you with your marriage and then how happy are you with your life, or in the other order. If I ask you first how happy you are with your marriage, that influences how you think about your life in general. You should be able to keep the two things completely separate.

  MARTIN FORD: That sounds like Daniel Kahneman’s anchoring theory, where he talks about how you can give people a random number, and then that number will influence their guess about anything.

  GARY MARCUS: Yes, it’s a variation. If I ask you when the Magna Carta was signed after first asking you to look at the last three digits on a dollar bill, those three digits anchor your estimate.

  MARTIN FORD: Your career trajectory is quite different from a lot of other people in the field of AI. Your early work focused on understanding human language and the way children learn it, and more recently you co-founded a startup company and helped launch Uber’s AI labs.

  GARY MARCUS: I feel a bit like Joseph Conrad (1857-1924), who spoke Polish but wrote in English. While he wasn’t a native speaker of English, he had a lot of insights into the workings of it. In the same way, I think of myself as not a native speaker of machine learning or AI, but as someone who is coming to AI from the cognitive sciences and has fresh insights.

  I did a lot of computer programming throughout my childhood and thought a lot about artificial intelligence, but I went to graduate school more interested in the cognitive sciences than artificial intelligence. During my time at graduate school, I studied with the cognitive scientist Steven Pinker, and together we looked at how children learn the past tense of a language, examining it using the precursors to deep learning that we had at the time, namely two-layer and multi-layer perceptrons.

  In 1986, David Rumelhart and James L. McClelland published Parallel Distributed Processing: Explorations in the Microstructure of Cognition, which included a model showing that a neural network could learn the past tense of English. Pinker and I looked at that work in some detail, and although it was true that you could get a neural network to overregularize and say things like “goed” or “breaked” just as kids do, the facts about when and how the network made those errors were actually quite different from the children’s. In response, we hypothesized that kids use a hybrid of rules and neural networks.

  MARTIN FORD: You’re talking about irregular word endings, where kids will sometimes make them regular by mistake.

  GARY MARCUS: Right, kids sometimes regularize irregular verbs. I once did an automated analysis of 11,000 past-tense utterances drawn from transcripts of kids talking to their parents. In that study, I looked at when kids made these overregularization errors, plotting their time course and which verbs were most vulnerable to them.

  The argument that we made was that children seem to have a rule for the regular verbs: they add -ed. At the same time, they also have something of an associative memory, which you might think of nowadays as a neural network, to handle the irregular verbs. The idea is that if you’re inflecting the verb “sing” as “sang” in the past tense, you might just be using your memory for that. If your memory stores “sing” and “sang,” it will also help you remember “ring” and “rang.”

  However, if you inflect a word that doesn’t sound like anything you’ve heard before, like the verb to “rouge,” meaning to apply rouge to your face, you’ll still know to add -ed to it. You’d say, “Diane rouged her face yesterday.”
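
  A minimal sketch of that rule-plus-memory idea, in Python; the irregular list, the rhyme-based analogy, and the helper names are illustrative inventions, not the model Marcus and Pinker actually published.

    IRREGULARS = {"sing": "sang", "ring": "rang", "break": "broke", "go": "went"}

    def rhymes(a, b):
        # Crude similarity cue: the two verbs share their last three letters.
        return a[-3:] == b[-3:]

    def past_tense(verb):
        # 1. Associative memory: an exact match on a stored irregular.
        if verb in IRREGULARS:
            return IRREGULARS[verb]
        # 2. Analogy: a novel verb that sounds like a stored irregular may be
        #    inflected the same way ("spling" comes out as "splang").
        for known, past in IRREGULARS.items():
            if len(known) >= 3 and rhymes(verb, known):
                return verb[:-3] + past[-3:]
        # 3. Default rule: add -ed (or -d), even to a verb that sounds like
        #    nothing you have heard before, such as "rouge".
        return verb + "d" if verb.endswith("e") else verb + "ed"

    print(past_tense("sing"))    # sang   (memory)
    print(past_tense("spling"))  # splang (analogy to "sing"/"ring")
    print(past_tense("rouge"))   # rouged (default rule)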

  The point of that was that while neural networks are very good at things that work by similarity, they’re very weak at things where you don’t have similarity but where you still understand the rule. That was in 1992; 25 years later, that basic point still holds. Most neural networks still have the problem that they’re very data-driven, and they don’t induce a high level of abstraction relative to what they’ve been trained on.

  Neural networks are able to capture a lot of the garden-variety cases, but if you think about a long-tail distribution, they’re very weak at the tail. Here’s an example from a captioning system: a system might be able to tell you that a particular image is of a group of kids playing frisbee, simply because there are a lot of pictures that are like that, but if you show it a parking sign covered with stickers then it might say it’s a refrigerator filled with food and drinks. That was an actual Google captioning result. That’s because there aren’t that many examples of parking signs covered with stickers in the database, so the system performs miserably.

  That key problem of neural networks not being able to generalize well outside of some core situations has been something that’s interested me for my entire career. From my point of view, it’s something that the machine learning field has still not really come to grips with.

  MARTIN FORD: Understanding human language and learning is clearly one of the pillars of your research. I was wondering if you could delve into some real-life experiments that you’ve undertaken?

  GARY MARCUS: During my years of studying this from the perspective of understanding human generalization, I did research with children, adults, and ultimately with babies in 1999, all of which pointed to humans being very good at abstraction.

  The experiment with babies showed that seven-month-olds could hear two minutes of an artificial grammar and recognize the rules of sentences constructed by that grammar. Babies listened for two minutes to sentences with an A-B-B grammar, like “la ta ta” and “ga na na,” and would then notice that “wo fe wo” followed a different grammar (an A-B-A grammar), as opposed to “wo fe fe,” which followed the same grammar as the sentences they’d been trained on.

  This was measured by how long they would look. We found that they would look longer if we changed the grammar. That experiment really nailed down that, from very early in life, babies have an ability to recognize pretty deep abstractions in the language domain. Another researcher later showed that newborns can do the same thing.
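
  As an illustrative sketch only (the helper function and the “other” category below are my own framing, not part of the study), the two artificial grammars contrasted in that experiment can be distinguished in a few lines of Python.

    def grammar_of(sentence):
        # Sentences are three space-separated syllables, e.g. "la ta ta".
        a, b, c = sentence.split()
        if a != b and b == c:
            return "ABB"
        if a != b and a == c:
            return "ABA"
        return "other"

    habituation = ["la ta ta", "ga na na"]   # what the babies heard for two minutes
    test_items = ["wo fe fe", "wo fe wo"]    # novel syllables at test

    print([grammar_of(s) for s in habituation])  # ['ABB', 'ABB']
    print([grammar_of(s) for s in test_items])   # ['ABB', 'ABA'] -- only the second violates the familiar grammar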

  MARTIN FORD: I know that you have a great interest in IBM’s Watson, and that it drew you back into the field of AI. Could you talk about why Watson reignited your interest in artificial intelligence?

  GARY MARCUS: I was skeptical about Watson, so I was surprised when it first won at Jeopardy in 2011. As a scientist, I’ve trained myself to pay attention to the things that I get wrong, and I thought natural language understanding was too hard for a contemporary AI to do. Watson should not have been able to beat a human at Jeopardy, and yet it did. That made me start thinking about AI again.

  I eventually figured out that the reason Watson won is that it was actually a narrower AI problem than it first appeared to be. That’s almost always the answer. In Watson’s case, it’s because about 95% of the answers in Jeopardy turn out to be the titles of Wikipedia pages. Instead of understanding language, reasoning about it, and so forth, it was mostly doing information retrieval from a restricted set, namely the pages that are Wikipedia titles. It was actually not as hard a problem as it looked to the untutored eye, but it was interesting enough that it got me to think about AI again.
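
  A deliberately crude sketch of what “information retrieval over a restricted set” can look like; the page snippets and the word-overlap scoring below are invented for illustration and are not a description of how Watson actually worked.

    PAGES = {
        "Magna Carta": "charter of rights agreed to by King John of England in 1215",
        "Joseph Conrad": "Polish-British writer regarded as one of the greatest novelists in English",
        "Jeopardy!": "American television game show featuring a quiz competition",
    }

    def answer(clue):
        # Return the page title whose snippet shares the most words with the clue.
        clue_words = set(clue.lower().split())
        return max(PAGES, key=lambda title: len(clue_words & set(PAGES[title].lower().split())))

    print(answer("This charter was agreed to by King John in 1215"))  # Magna Carta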

  Around the same time, I started writing for The New Yorker, where I was producing a lot of pieces about neuroscience, linguistics, psychology, and also AI. In my pieces, I was trying to use what I knew about cognitive science and everything around that—how the mind and language work, how children’s minds develop, etc.—in order to give me a better understanding of AI and the mistakes people were making.

  I was also writing and thinking a lot more about AI during that period. One piece was a critical review of one of Ray Kurzweil’s books. Another was about self-driving cars and how they would make a decision if an out-of-control school bus were hurtling toward them. Another, very prescient, piece criticized deep learning, arguing that, as a community, we should understand it as one tool among many, not as a complete solution to AI. When I wrote that piece five years ago, I said that I didn’t think deep learning would be able to do things like abstraction and causal reasoning, and if you look carefully, you’ll see that deep learning is still struggling with exactly that set of problems.

  MARTIN FORD: Let’s talk about the company you started in 2014, Geometric Intelligence. I know that was eventually bought by Uber, shortly after which you moved to Uber and became the head of their AI labs. Can you take us through that journey?

  GARY MARCUS: Back in January 2014 it occurred to me that instead of writing about AI I should actually try to start a company of my own. I recruited some great people, including my friend Zoubin Ghahramani, who is one of the best machine learning people in the world, and I spent the next couple of years running a machine learning company. I learned a lot about machine learning and we built on some ideas of how to generalize better. That became our company’s core intellectual property. We spent a lot of time trying to make algorithms learn more efficiently from data.

  Deep learning is incredibly greedy in terms of the amount of data that it needs in order to solve a problem. That works well in artificial worlds, such as the game of Go, but it doesn’t work that well in the real world, where data is often expensive or difficult to obtain. We spent a lot of our time trying to do better in that area and had some nice results. For example, we could learn tasks like the MNIST character-recognition task with half as much data as deep learning required.

  Word got around, and eventually we sold to Uber in December 2016. This entire process taught me quite a bit about machine learning, including its strengths and weaknesses. I worked briefly at Uber, helping with the launch of Uber AI Labs, and then moved on. Since then, I’ve been researching how AI and medicine can be combined, and also thinking a lot about robotics.

  In January of 2018, I wrote two papers (https://arxiv.org/abs/1801.00631) as well as a couple of pieces on Medium. The first paper was about deep learning and how, although it’s very popular and our best tool for AI at the moment, it’s not going to get us to AGI (artificial general intelligence). The second was about innateness, arguing that, at least in biology, systems start with a lot of inherent structure, whether you’re talking about the heart, the kidney, or the brain. The brain’s initial structure is important for how we go about understanding the world.

  People talk about Nature versus Nurture, but it’s really Nature and Nurture working together. Nature is what constructs the learning mechanisms that allow us to make use of our experience in interesting ways.

  MARTIN FORD: That’s something that’s demonstrated by experiments with very young babies. They haven’t had time to learn anything, but they can still do essential things like recognize faces.

  GARY MARCUS: That’s right. My research with seven-month-olds also bears that out, and a recent paper in Science suggests that children are able to do logical reasoning after just the first year of life. Keep in mind that innate doesn’t mean present exactly at birth. My ability to grow a beard is not something I had at birth; it was timed to hormones and puberty. A lot of the human brain actually develops outside the womb, but relatively early in life.

  If you look at precocial species like horses, they can walk almost right away after being born, and they have fairly sophisticated vision and obstacle detection. Some of those mechanisms for humans get wired up in the first year of life. You’ll often hear people say that a baby learns to walk, but I don’t think that’s actually the case. There’s certainly some learning and calibration of muscle forces and so on, but some of it is maturation. A head containing a fully developed human brain would be too big to pass through the birth canal.

  MARTIN FORD: Even if you had an innate ability to walk, you’d have to wait for the muscles to develop before you could put it into operation.

  GARY MARCUS: Right, and those aren’t fully developed either. We come out not quite fully hatched, and I think that confuses people. A lot of what’s going on in the first few months is still pretty much genetically controlled. It’s not about learning per se.

  Look at a baby ibex. After a couple of days, it can scramble down the side of a mountain. It doesn’t learn that by trial-and-error—if it falls off the side of a mountain then it’s dead—yet it can do spectacular feats of navigation and motor control.

  I think our genomes wire a very rich first draft of how our brains should operate, then there’s lots of learning on top of that. Some of that first draft is, of course, about making the learning mechanisms themselves.

  People in AI often try to build things with as little prior knowledge as they can get away with, and I think that’s foolish. There’s actually lots of knowledge about the world that’s been gathered by scientists and ordinary people that we should be building into our AI systems, instead of insisting, for no really good reason, that we should start from scratch.

  MARTIN FORD: Any innateness that exists in the brain has to be the result of evolution, so with an AI you could either hardcode that innateness, or perhaps you could use an evolutionary algorithm to generate it automatically.

  GARY MARCUS: The problem with that idea is that evolution is pretty slow and inefficient. It works over trillions of organisms and billions of years to get great results. It’s not clear that you’d get far enough with evolution in a lab in a reasonable timeframe.

  One way to think about this problem is that the first 900 million years of evolution were not that exciting. Mostly you had different versions of bacteria. No offense to the bacteria.

  Then suddenly things pick up and you get vertebrates, then mammals, then primates, and finally, you get us. The reason that the pace of evolution increased is that it’s like having more subroutines and more library code in your programming. The more subroutines you have, the quicker you can build more complicated things on top of them. It’s one thing to build a human on top of a primate brain with 100 or 1,000 important genetic changes, but you wouldn’t be able to make a similar leap from bacteria to a human brain.

  People working on evolutionary neural networks often start too close to the bone. They’re trying to evolve individual neurons and connections between them, when my belief is that in the biological evolution of, say, humans, you already had very sophisticated sets of genetic routines. Essentially, you’ve got cascades of genes on which to operate and people haven’t really figured out how to do that in the evolutionary programming context.

  I think they will eventually, but partly because of prejudice they haven’t so far. The prejudice is, “I want to start from scratch in my lab and show that I can be God by creating this in seven days.” That’s ridiculous; it’s not going to happen.

  MARTIN FORD: If you were going to build this innateness into an AI system, do you have a sense of what that would look like?

  GARY MARCUS: There are two parts to it. One is functionally what it should do, and the other is mechanically how you should do it.

  At the functional level, I have some clear proposals drawing from my own work and that of Elizabeth Spelke at Harvard. I laid this out in a paper that I wrote early in 2018, where I talked about ten different things that would be required (https://arxiv.org/abs/1801.05667). I won’t go into them in depth here, but things like symbol manipulation and the ability to represent abstract variables, which computer programs are based on; operations over those variables, which is what computer programs are; a type-token distinction, recognizing this bottle as opposed to bottles in general; causality; spatial translation or translation invariance; the knowledge that objects tend to move on paths that are connected in space and time; the realization that there are sets of things, places, and so on.
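
  To make two of those items concrete, here is a small Python sketch (my own illustration, not drawn from the paper) of an operation over an abstract variable and of the type-token distinction.

    def reduplicate(x):
        # An operation defined over a variable applies to ANY value, including
        # one never seen before -- that is what symbol manipulation buys you.
        return f"{x} {x}"

    print(reduplicate("wug"))  # 'wug wug', even though "wug" is novel

    class Bottle:
        # The class is the type; each instance is a token of that type.
        def __init__(self, label):
            self.label = label

    this_bottle = Bottle("the one on my desk")   # a particular token
    print(isinstance(this_bottle, Bottle))       # True: the token falls under the type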

  If you had things like that, then you could learn about what particular kinds of objects do when they’re in particular kinds of places and they’re manipulated by particular kinds of agents. That would be better than just learning everything from pixels, which is a very popular but I think ultimately inadequate idea that we are seeing in the field right now.

  What we see at the moment is people doing deep reinforcement learning over pixels of, for example, the Atari game Breakout, and while you get results that look impressive, they’re incredibly fragile.

  DeepMind trained an AI to play Breakout, and when you watch it, it looks like it’s doing great. It’s supposedly learned the concept of breaking through the wall and trapping the ball at the top so it can ricochet across a lot of blocks. However, if you were to move the paddle three pixels up, the whole system breaks because it doesn’t really know what a wall is or what a ricochet is. It’s really just learned contingencies, and it’s interpolating between the contingencies that it’s memorized. The programs are not learning the abstraction that you need, and this is the problem with doing everything from pixels and very low-level representations.
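
  That fragility can be caricatured in a few lines; the toy “policy” below is entirely invented and nothing like DeepMind’s actual agent, but it shows how a table of memorized observations fails under a small shift, while a policy stated over the abstraction “where is the ball relative to the paddle” does not care.

    # (ball_x, paddle_x) -> action, memorized from specific training frames
    memorized = {(36, 37): "left", (37, 37): "stay", (41, 37): "right"}

    def lookup_policy(ball_x, paddle_x):
        return memorized.get((ball_x, paddle_x), "no idea")  # unseen pixels: failure

    def abstract_policy(ball_x, paddle_x):
        if ball_x > paddle_x:
            return "right"
        if ball_x < paddle_x:
            return "left"
        return "stay"

    # Shift the whole scene by three pixels: the table breaks, the abstraction doesn't.
    print(lookup_policy(41 + 3, 37 + 3))    # 'no idea'
    print(abstract_policy(41 + 3, 37 + 3))  # 'right'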

  MARTIN FORD: It needs a higher level of abstraction to understand objects and concepts.

  GARY MARCUS: Exactly. You may also need to actually build in certain notions, like “object.” One way to think about it is by analogy with learning to process color. You don’t start with black-and-white vision and eventually learn that there is color. You start by having two different color-receptor pigments that are sensitive to particular parts of the spectrum. Then, from there, you can learn about particular colors. You need some piece to be innate before you can do the rest. Maybe in a similar way, you might need to have innately the notion that there’s an object, and maybe the constraint that objects don’t just randomly appear and disappear.

 
