I really got into the field in a serious way after taking a class on neural networks in my sophomore year in college, which would have been 1991. During that time, my dad introduced me to a friend and colleague of his at Stanford named Roger Shepard, who was one of the great cognitive psychologists of all time. Although he’s long retired, he was one of the people who pioneered the scientific and mathematical study of mental processes through the 1960s, ’70s, and ’80s, and into the ’90s, when I worked with him. I wound up getting a summer job with him, programming neural network implementations of a theory that Roger had been working on. The theory was about how humans, and many other organisms, solve the basic problem of generalization, which turned out to be an incredibly deep problem.
Philosophers have thought about this for hundreds, if not thousands, of years. Plato and Aristotle considered this, as did Hume, Mill, and Compton, not to mention many 20th-century philosophers of science. The basic problem is, how do we go beyond specific experiences to general truths? Or from the past to the future? In the case that Roger Shepard was thinking about, he was working on the basic mathematics of how an organism, having experienced a certain stimulus with some good or bad consequence, might figure out which other things in the world are likely to have that same consequence.
Roger had introduced some mathematics based on Bayesian statistics for solving that problem, which was a very elegant formulation of the general theory of how organisms could generalize from experience, and he was looking to neural networks to try to take that theory and implement it in a more scalable way. Somehow, I wound up working with him on this project. Through that, I was exposed early on both to neural networks and to Bayesian analyses of cognition, and you can view most of my career since then as working through those same ideas and methods. I was just very lucky, from an early age, to have been exposed to exciting ideas from great thinkers and to people who wanted to work with me, and that led to me going to graduate school in the field.
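To make Roger Shepard’s idea concrete, here is a minimal sketch in Python of the kind of Bayesian generalization computation he proposed. This is an illustration only, not Shepard’s or Tenenbaum’s actual model: the learner assumes the observed stimulus lies in some unknown “consequential region” on a single stimulus dimension, averages over all candidate regions consistent with that one observation, and asks how likely a new stimulus is to share the consequence. The grids, the range of region sizes, and the 1/size weighting are all assumptions chosen for the example.

    # Toy sketch of Bayesian generalization from a single observation
    # (illustrative assumptions throughout; not code from the interview).
    import numpy as np

    def generalization_curve(x0, ys,
                             sizes=np.linspace(0.1, 10.0, 100),
                             centers=np.linspace(-20.0, 20.0, 400)):
        probs = np.zeros_like(ys, dtype=float)
        total = 0.0
        for s in sizes:                       # candidate region sizes
            for c in centers:                 # candidate region centers
                lo, hi = c - s / 2.0, c + s / 2.0
                if not (lo <= x0 <= hi):
                    continue                  # region must contain the observed stimulus
                w = 1.0 / s                   # smaller regions make the lone example more likely
                total += w
                probs += w * ((ys >= lo) & (ys <= hi))
        return probs / total                  # P(new stimulus shares the consequence)

    ys = np.linspace(-5.0, 5.0, 11)
    print(np.round(generalization_curve(0.0, ys), 3))  # gradient peaks at x0 and decays with distance

Under fairly broad assumptions about the prior over regions, this kind of averaging yields Shepard’s well-known result that generalization falls off approximately exponentially with distance in psychological space.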
I ended up going to graduate school at MIT—in the same department that I am now a professor in. After my PhD, Roger was very supportive and helped to bring me to Stanford, where I spent a couple of years as an assistant professor in psychology before I moved back to MIT and the Department of Brain and Cognitive Sciences, where I am now. A key feature of this route is that I came to AI from the natural science side, thinking about how human minds and brains work, or biological intelligence more generally. I was trying to understand human intelligence in mathematical, computational, and engineering terms.
I describe what I do as “reverse engineering the mind,” and what that means is trying to approach the basic science of how intelligence works in the human mind as an engineer would. The goal is to understand the mind, and to build models of it, in the language and with the technical tools of engineering. We view the mind as an incredible machine that has been engineered through various processes, including biological and cultural evolution, learning, and development, to solve the problems it faces. We think the best way to formulate our science is to approach the mind like an engineer: to try to understand what problems it has been designed to solve and how it solves them.
MARTIN FORD: If you were advising a younger person who was considering a career in AI research, would you say that studying brain science and human cognition is important? Do you think that there is too much emphasis put on pure computer science?
JOSH TENENBAUM: I always saw these two things as being two sides of the same coin; it just made sense to me. I was interested in computer programming, and I was interested in the idea that you could program an intelligent machine. But I was always more animated by the scientific question of how the human mind works, which is clearly one of the biggest scientific and even philosophical questions of all time. The idea that that question could be linked up and share a common purpose with building intelligent machines was the most exciting idea to me, as well as being a promising one.
My background and training are not especially in biology, but more in what you might call psychology or cognitive science. It’s more about the software of the mind than about the hardware of the brain, although the only reasonable scientific view is to see those as deeply connected because, of course, they are. That’s partly what led me to MIT, where we have this Department of Brain and Cognitive Sciences. Until the mid-1980s, it was called the Department of Psychology, but it was always a very biologically grounded psychology department.
To me, the most interesting and biggest questions are the scientific ones. The engineering side is a way toward building more intelligent machines, but to me the value of that is as a proof of concept that our scientific models are doing the work they’re supposed to be doing. It’s a very important test, a sanity check, and a rigor check, because there are so many models on the scientific side that may fit a data set somebody collected on human behavior or neural data, but if those models don’t solve the problems that the brain and the mind must solve, then they probably aren’t right.
To me it’s always been an important source of constraint that we want our models of how the brain and mind work to fit all of the data that we have on the scientific side, but also to actually be implementable as engineering models that take the same kinds of inputs that come into the brain and give the same kinds of outputs. That is also going to lead to all sorts of applications and payoffs. If we can understand how intelligence works in the mind and brain in engineering terms, then that is one direct route for translating the insights from neuroscience and cognitive science into various kinds of AI technologies.
More generally, I think that if you approach science like an engineer, and you say that the point of neuroscience and cognitive science is not just to collect a bunch of data, but to understand the basic principles—the engineering principles by which brains and minds work—then that’s a certain viewpoint on how to do the science, and your insights become directly translatable into useful ideas for AI.
If you look at the history of the field, I think it’s not unreasonable to say that many, if not most, of the best, most interesting, and most original ideas in artificial intelligence came from people who were trying to understand how human intelligence works. That includes the basic mathematics of what we now call deep learning and reinforcement learning, but it also goes much further back, to Boole as one of the inventors of mathematical logic, or to Laplace in his work on probability theory. In more recent times, Judea Pearl, in particular, was fundamentally interested in understanding the mathematics of cognition and the way people reason under uncertainty, and that led to his seminal work on Bayesian networks for probabilistic inference and causal modeling in AI.
MARTIN FORD: You described your work as an attempt to “reverse engineer the mind.” Tell me about your actual methodology for attempting that. How are you going about it? I know you do a lot of work with children.
JOSH TENENBAUM: In the first part of my career, the big question that I would always start from and come back to was the question of, how do we get so much from so little? How do humans learn concepts not from hundreds or thousands of examples, as machine learning systems have always been built for, but from just one example?
You can see that in adults, but you can also see that in children when they are learning the meaning of a word. Children can often learn a new word from seeing just one example of that word used in the right context, whether it’s a word like a noun that refers to an object, or a verb that refers to an action. You can show a young child their first giraffe, and now they know what a giraffe looks like; you can show them a new gesture or dance move, or how you use a new tool, and right away they’ve got it; they may not be able to make that move themselves, or use that tool, but they start to grasp what’s going on.
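As a rough illustration of how a single labeled example can already be so informative, here is a toy Bayesian concept-learning sketch in Python. The objects, the nested candidate meanings, and the uniform prior are all made up for the example; the point is just that the “size principle” (a random example is more likely to come from a smaller hypothesis) lets one example of a giraffe favor the narrow category over broader ones like “animal” or “thing.”

    # Toy word-learning sketch (made-up objects and hypotheses, illustrative only).
    objects = ["giraffe1", "giraffe2", "zebra", "dog", "car"]
    hypotheses = {
        "giraffes": {"giraffe1", "giraffe2"},
        "animals":  {"giraffe1", "giraffe2", "zebra", "dog"},
        "things":   set(objects),
    }
    prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

    def posterior(examples):
        scores = {}
        for h, members in hypotheses.items():
            if all(x in members for x in examples):
                # size principle: each example is assumed drawn from the hypothesis set
                scores[h] = prior[h] * (1.0 / len(members)) ** len(examples)
            else:
                scores[h] = 0.0               # hypothesis ruled out by the data
        z = sum(scores.values())
        return {h: s / z for h, s in scores.items()}

    print(posterior(["giraffe1"]))                 # the narrow "giraffes" meaning already leads
    print(posterior(["giraffe1", "giraffe2"]))     # and pulls further ahead with a second example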
Or think about learning causality, for example. We learn in basic statistics classes that correlation and causation are not the same thing, and correlation doesn’t always imply causation. You can take a dataset, and you can measure that two variables are correlated, but it doesn’t mean that one causes the other. It could be that A causes B, B causes A, or some third variable causes both.
The fact that correlation doesn’t uniquely imply causation is often cited to show how difficult it is to take observational data and infer the underlying causal structure of the world, and yet humans do this. In fact, we solve a much harder version of this problem. Even young children can often infer a new causal relation from just one or a few examples—they don’t even need to see enough data to detect a statistically significant correlation. Think about the first time you saw a smartphone, whether it was an iPhone or some other device with a touchscreen where somebody swipes their finger across a little glass panel, and suddenly something lights up or moves. You had never seen anything like that before, but you only needed to see it once or a couple of times to understand that there’s this new causal relation, and that’s just your first step into learning how to control it and to get all sorts of useful things done. Even a very young child can learn this new causal relation between moving your finger in a certain way and a screen lighting up, and that opens up all sorts of other possibilities for action.
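To see in miniature why one or two demonstrations can license a causal conclusion, here is a back-of-the-envelope comparison with toy numbers chosen purely for illustration (they are not from the interview). If the screen almost never lights up on its own, then a single trial in which a swipe is immediately followed by the screen responding already strongly favors the hypothesis that the swipe caused it.

    # Toy one-shot causal inference: compare two hypotheses about a single trial
    # in which a swipe was followed by the screen lighting up (assumed numbers).
    p_effect_if_causal = 0.95   # H1: swiping causes the screen to respond, reliably
    base_rate = 0.01            # H0: the screen just lights up spontaneously, rarely

    likelihood_H1 = p_effect_if_causal
    likelihood_H0 = base_rate
    print(likelihood_H1 / likelihood_H0)      # Bayes factor of ~95 from one observation

    # With equal prior odds on the two hypotheses:
    posterior_H1 = likelihood_H1 / (likelihood_H1 + likelihood_H0)
    print(round(posterior_H1, 3))             # ~0.99 in favor of the causal hypothesis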
These problems of how we generalize from just one or a few examples are what I started working on with Roger Shepard when I was an undergraduate. Early on, we used ideas from Bayesian statistics, Bayesian inference, and Bayesian networks, drawing on the mathematics of probability theory to formulate how people’s mental models of the causal structure of the world might work.
It turns out that tools developed by mathematicians, physicists, and statisticians to make inferences from very sparse data in a statistical setting were being deployed in the 1990s in machine learning and AI, and they revolutionized the field. It was part of the move from an earlier symbolic paradigm for AI to a more statistical paradigm. To me, that was a very, very powerful way to think about how our minds are able to make inferences from sparse data.
In the last ten years or so, our interests have turned more to where these mental models come from. We’re looking at the minds and brains of babies and young children, and really trying to understand the most basic kind of learning processes that build our basic common-sense understanding of the world. For the first ten years or so of my career, so from the late 1990s until the late 2000s, we made a lot of progress modeling individual aspects of cognition using these Bayesian models, such as certain aspects of perception, causal reasoning, how people judge similarity, how people learn the meanings of words, and how people make certain kinds of plans, decisions, or understand other people’s decisions, and so on.
However, it seemed like we still didn’t really have a handle on what intelligence is really about—a flexible, general-purpose intelligence that allows you to do all of those things that you can do. Ten years ago in cognitive science, we had a bunch of really satisfying models of individual cognitive capacities using this mathematics of how people make inferences from sparse data, but we didn’t have a unifying theory. We had tools, but we didn’t have any kind of model of common sense.
If you look at machine learning and AI technologies, and this is as true now as it was ten years ago, we have increasingly been getting machine systems that do remarkable things that we used to think only humans could do. In that sense, we had real AI technologies, but we didn’t have, and still don’t have, real AI in the sense of the original vision of the founders of the field, of what I think you might refer to as AGI—machines that have that same kind of flexible, general-purpose, common-sense intelligence that every human uses to solve problems for themselves. But we are starting to lay the foundations for that now.
MARTIN FORD: Is AGI something that you’re focused on?
JOSH TENENBAUM: Yes, in the last few years general-purpose intelligence has really been what I’ve been interested in. I’m trying to understand what that would be like, and how we could capture it in engineering terms. I’ve been heavily influenced by a few colleagues like Susan Carey and Elizabeth Spelke, who are both professors at Harvard now and who have studied these questions in babies and young children. I believe that’s where we ought to look for this: it’s where all our intelligence starts, and it’s where our deepest and most interesting forms of learning happen.
Elizabeth Spelke is one of the most important people for anybody in AI to know about if they’re going to look to humans. She has very famously shown that from the age of two to three months, babies already understand certain basic things about the world, like how it’s made of physical objects in three dimensions that don’t wink in and out of existence. It’s what we typically call object permanence. It used to be thought that that was something kids only came to learn by the time they were one year old, but Spelke and others have shown that in many ways our brains are born already prepared to understand the world in terms of physical objects, and in terms of what we call intentional agents.
MARTIN FORD: There’s a debate over the importance of innate structure in AI. Is this evidence that that kind of structure is very important?
JOSH TENENBAUM: The idea that you could build machine intelligence by looking at how humans grow into intelligence—a machine that starts as a baby and learns like a child—was famously introduced by Alan Turing in the same paper where he introduced the Turing test, so it could really be the oldest good idea in AI. Back in 1950, this was Turing’s only suggestion on how you might build a machine that would pass a Turing test because back then nobody knew how to do that. Turing suggested that instead of trying to build a machine brain that was like an adult, we might build a child brain and then teach it the way we teach children.
In making his proposal, Turing was effectively taking a position on the nature-nurture question. His thinking was that children’s brains presumably start off much simpler than adults’ brains. He said, more or less, “Presumably a child’s brain is something like a notebook when you buy it from the stationers: a rather little mechanism, and lots of blank sheets.” So, building a child machine would be a sensible starting place on a scaling route to AI. Turing was probably right there. But he didn’t know what we know now about the actual starting state of the human mind. What we now know from the work of people like Elizabeth Spelke, Renee Baillargeon, Laura Schulz, Alison Gopnik, and Susan Carey is that babies start off with a lot more structure than we might have thought. We also know that the learning mechanisms children have are a lot smarter and more sophisticated than we used to think. So, in some sense, our current understanding from the scientific side is that the possibilities of both nature and nurture are greater than we thought when the notion of AI was first proposed.
If you look not just at Turing’s suggestions, but at the way many AI people have since invoked that idea, you see that they are not looking at the real science of how babies’ brains work; rather, they’re appealing to that intuitive, but incorrect, idea that babies’ brains start off very simple, or that some kind of simple trial and error or unsupervised learning takes place. These are often the ways that people in AI will talk about how children learn. Children do learn from trial and error, and they do learn in an unsupervised way, but what they do is much more sophisticated, especially in the ways they learn from much less data and with much deeper kinds of understanding and explanatory frameworks. If you look at what machine learning usually means by trial-and-error learning or unsupervised learning, you’re still talking about very data-hungry methods that learn relatively superficial kinds of patterns.
I’ve been inspired by the insights coming from cognitive scientists and developmental psychologists into how we explain and understand what we see, how we imagine things that we haven’t seen, how we make plans and solve problems in the course of trying to make those things actually exist, and how learning is about taking these mental models that guide our explaining, understanding, planning, and imagining, and refining them, debugging them, and building new ones. Our minds don’t just find patterns in big data.
MARTIN FORD: Is this what you’ve been focusing on in your recent work with children?
JOSH TENENBAUM: Yes, I’m trying to understand the ways in which even young children are able to build models of the world in their heads from very sparse data. It’s fundamentally a different kind of approach from the one that most machine learning is taking right now. To me, just as Turing suggested, and as many people in AI have since realized, it’s not the only way you might think about building a human-like AI system, but it’s the only way that we know works.
If you look at human children, they’re the only scaling path to AI in the known universe that we know works: a path that reliably, reproducibly, and robustly starts out knowing far less than a full adult human and then develops into adult-human-level intelligence. If we could understand how humans learn, then that would certainly be a route to building much more real AI. It would also address some of the greatest scientific questions of all time, questions that cut right to our identity, like what it means to be human.
MARTIN FORD: How does all of that thinking relate to the current overwhelming focus on deep learning? Clearly, deep neural networks have transformed AI, but lately I’ve been hearing more pushback against deep learning hype, and even some suggestions that we could be facing a new AI Winter. Is deep learning really the primary path forward, or is it just one tool in the toolbox?
JOSH TENENBAUM: What most people think of as deep learning is one tool in the toolbox, and a lot of deep learning people realize that too. The term “deep learning” has expanded beyond its original definition.
MARTIN FORD: I would define deep learning broadly as any approach using sophisticated neural networks with lots of layers, rather than using a very technical definition involving specific algorithms like backpropagation or gradient descent.
JOSH TENENBAUM: To me, the idea of using neural networks with lots of layers is also just one tool in the toolkit. What that’s good at is pattern recognition, and it has proven to be a practical, scalable route to solving pattern recognition problems. Where that kind of deep learning has really had success is either in problems that are traditionally seen as pattern recognition problems, like speech recognition and object recognition, or in problems that can in some way be coerced or turned into pattern recognition problems.