I carried on working on neural nets on my own, and I did a PhD in 1987 titled, Modèles connexionnistes de l’apprentissage (Connectionist learning models). My advisor, Maurice Milgram, was not actually working on this topic, and he told me outright, “I can be your official advisor, but I can’t help you technically.”
I discovered through my work that in the early 1980s, there was a community of people around the world who were working on neural nets, and I connected with them and ended up discovering things like backpropagation in parallel with people like David Rumelhart and Geoffrey Hinton.
MARTIN FORD: So, in the early 1980s there was a lot of research in this area going on in Canada?
YANN LECUN: No, this was the United States. Canada was not on the map for this type of research yet. In the early 1980s, Geoffrey Hinton was a postdoc at the University of California, San Diego, where he was working with cognitive scientists like David Rumelhart and Jay McClelland. Eventually they published a book explaining psychology in terms of simple neural nets and models of computation. Geoffrey then became an Associate Professor at Carnegie Mellon University, and only moved to Toronto in 1987. That’s when I also moved to Toronto, where I was a postdoc in his lab for one year.
MARTIN FORD: I was an undergraduate studying computer engineering in the early 1980s, and I don’t recall much exposure to neural networks at all. It was a concept that was out there, but it was definitely very much marginalized. Now, in 2018, that has changed dramatically.
YANN LECUN: It was worse than marginalized. In the ‘70s and early ‘80s it was anathema within the community. You couldn’t publish a paper that even mentioned the phrase neural networks because it would immediately be rejected by your peers.
In fact, Geoffrey Hinton and Terry Sejnowski published a very famous paper in 1983 called, Optimal Perceptual Inference, which described an early deep learning or neural network model. Hinton and Sejnowski had to use code words to avoid mentioning that it was a neural network. Even the title of their paper was cryptic; it was all very strange!
MARTIN FORD: One of the main innovations you’re known for is the convolutional neural network. Could you explain what that is and how it’s different from other approaches in deep learning?
YANN LECUN: The motivation for convolutional neural networks was building a neural network that was appropriate for recognizing images. It turned out to be useful for a wide range of tasks, such as speech recognition and language translation. It’s somewhat inspired by the architecture of the visual cortex in animals and humans.
David Hubel and Torsten Wiesel did some Nobel Prize-winning work in neuroscience in the 1950s and 1960s on the types of functions that the neurons in the visual cortex perform and how they’re connected with each other.
A convolutional network is a particular way of connecting the neurons with each other in such a way that the processing that takes place is appropriate for things like images. I should add that we don’t normally call them neurons because they’re not really an accurate reflection of biological neurons.
The basic principle of how the neurons are connected is that they’re organized in multiple layers and each neuron in the first layer is connected with a small patch of pixels in the input image. Each neuron computes a weighted sum of its inputs. The weights are the quantities that are modified by learning. The neurons only see a tiny window of pixels of the input, and there’s a whole bunch of neurons that look at the same little window. Then, there’s a whole bunch of neurons that look at another slightly shifted window, but this bunch performs the same operation as the other bunch. If you have a neuron that detects a particular motif in one window, you’re going to have another neuron that detects exactly the same motif in the next window and other neurons for all windows across the image.
Once you put all those neurons together and you realize what kind of mathematical operation they do, that operation is called a discrete convolution, which is why this is called a convolutional net.
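To make that concrete, here is a minimal sketch of the operation LeCun describes: every output unit computes a weighted sum over a small, shifted window of the input, using the same weights everywhere. The NumPy implementation and the edge-detecting kernel are illustrative assumptions rather than code from the interview, and like most deep learning libraries it computes the unflipped (cross-correlation) form of the convolution.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid discrete convolution: every output unit applies the same weights
    (the kernel) to a small window of the input, shifted across the image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kH, j:j + kW]    # the tiny window of pixels this unit sees
            out[i, j] = np.sum(window * kernel)   # weighted sum with shared weights
    return out

# Example: the same 3x3 "vertical edge" motif detector applied at every window.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
feature_map = conv2d(np.random.rand(8, 8), edge_kernel)   # shape (6, 6)
```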
That’s the first layer, and then there’s a second layer, which is a non-linearity layer—basically a threshold where each neuron turns on or off depending on whether the weighted sum computed by the convolution layer is above or below the threshold.
Finally, there’s a third layer that performs what’s called a pooling operation. I’m not going to cover it in detail, but it basically plays a role in making sure that when the input image is slightly shifted or deformed, the output responses don’t change that much. It’s a way of building in a bit of invariance to shifts, distortions, or deformations of the object in the input image.
The convolutional net is basically a stack of layers of this type—convolution, non-linearity, pooling. You stack multiple layers of those, and by the time you get to the top, you have neurons that are supposed to detect individual objects.
You might have a neuron that turns on if there is a horse in the image, and then you have one for cars, people, chairs, and all the other categories you might want to recognize.
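A minimal sketch of that stack in PyTorch follows; the layer sizes and the ten output categories are arbitrary choices made here for illustration, not the specific architecture being described.

```python
import torch
import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),   # convolution: shared-weight windows over the image
    nn.ReLU(),                         # non-linearity: units turn on above a threshold
    nn.MaxPool2d(2),                   # pooling: tolerance to small shifts and deformations
    nn.Conv2d(16, 32, kernel_size=5),  # another convolution / non-linearity / pooling block
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),         # top units: one score per category (horse, car, ...)
)

scores = convnet(torch.randn(1, 3, 32, 32))   # one 32x32 color image -> 10 category scores
```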
The trick is that the function this neural network computes is determined by the strength of the connections between the neurons, the weights, and those are not programmed; they’re trained.
This is what is learned when you train the neural net. You show it the image of a horse, and if it doesn’t say “horse,” you tell it that it’s wrong and give it the answer it should have produced. Then, using the backpropagation algorithm, it adjusts all the weights of all the connections in the network so that the next time you show it the same image of a horse, the output will be closer to the one you want, and you keep doing this for thousands of images.
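As a rough sketch of that training loop, reusing the `convnet` from the previous example; the made-up random batches and the optimizer settings are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical labelled data: in practice these would be thousands of real
# images of horses, cars, people, and so on, with their correct categories.
training_batches = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
                    for _ in range(100)]

criterion = nn.CrossEntropyLoss()                            # how wrong the current answer is
optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01)

for images, labels in training_batches:
    optimizer.zero_grad()
    outputs = convnet(images)              # what the network says right now
    loss = criterion(outputs, labels)      # compare with the answers it should have given
    loss.backward()                        # backpropagation: a gradient for every weight
    optimizer.step()                       # adjust the weights toward the right answers
```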
MARTIN FORD: That process of training a network by giving it images of cats or horses, and so on, is what’s called supervised learning, correct? Is it true to say that supervised learning is the dominant approach today, and that it takes huge amounts of data?
YANN LECUN: Exactly. Almost all of the applications of deep learning today use supervised learning.
Supervised learning is when you give the correct answer to the machine when you’re training it, and then it corrects itself to give the correct answer. The magic of it is that after it’s been trained, it produces a correct answer most of the time in categories that it’s been trained on, even for images it’s never seen before. You’re correct, that does typically require a lot of samples, at least the first time you train the network.
MARTIN FORD: How do you see the field moving forward in the future? Supervised learning is very different from the way a human child learns. You could point at a cat once and say, “there’s a cat,” and that one sample might be enough for a child to learn. That’s dramatically different from where AI is today.
YANN LECUN: Well, yes and no. As I said, the first time you train a convolutional network you train it with thousands, possibly even millions, of images of various categories. If you then want to add a new category, for example if the machine has never seen a cat and you want to train it to recognize cats, then it only requires a few samples of cats. That is because it has already been trained to recognize images of any type and it knows how to represent images; it knows what an object is, and it knows a lot of things about various objects. So, to train it to recognize a new object, you only need to show it a few samples and retrain a couple of the top layers.
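A sketch of that idea with an off-the-shelf pretrained network; the choice of ResNet-18 from torchvision (version 0.13 or later) and a two-class “cat / not cat” head are assumptions for illustration, not the specific setup being referred to.

```python
import torch.nn as nn
import torchvision.models as models

# Start from a network already trained on millions of images of many categories.
model = models.resnet18(weights="DEFAULT")

# Freeze the lower layers: they already know how to represent images in general.
for param in model.parameters():
    param.requires_grad = False

# Replace and retrain only the top layer for the new category, using a few samples.
model.fc = nn.Linear(model.fc.in_features, 2)   # "cat" vs. "not cat"
```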
MARTIN FORD: So, if you trained a network to recognize other kinds of animals like dogs and bears, then would it only take a small amount of data to get to a cat? That seems not so different from what a child is probably doing.
YANN LECUN: But it is different, and that’s the unfortunate thing. The way a child learns (and animals, for that matter) is that most of the learning they do is before you can tell them, “this is a cat.” In the first few months of life, babies learn a huge amount by observation without having any notion of language. They learn an enormous amount of knowledge about how the world works just by observation and with a little interaction with the world.
This sort of accumulation of enormous amounts of background knowledge about the world is what we don’t know how to do with machines. We don’t know what to call this; some people call it unsupervised learning, but it’s a loaded term. It’s sometimes called predictive learning, or imputative learning. I call it self-supervised learning. It’s the kind of learning where you don’t train for a task, you just observe the world and figure out how it works, essentially.
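One toy way to picture self-supervised learning, as a sketch rather than any specific method from LeCun’s group: hide part of an observation and train a network to predict the hidden part from the visible part, with no labels involved. The half-frame prediction task and the tiny network below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Predict the bottom half of a small grayscale "frame" from its top half.
predictor = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 32, 16 * 32),
    nn.Unflatten(1, (1, 16, 32)),
)
optimizer = torch.optim.Adam(predictor.parameters())

for _ in range(100):
    frames = torch.rand(8, 1, 32, 32)                 # unlabeled observations of "the world"
    visible, hidden = frames[:, :, :16, :], frames[:, :, 16:, :]
    loss = F.mse_loss(predictor(visible), hidden)     # the training signal comes from the data itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```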
MARTIN FORD: Would reinforcement learning, or learning by practice with a reward for succeeding, be in the category of unsupervised learning?
YANN LECUN: No, that’s a different category altogether. There are three categories essentially; it’s more of a continuum, but there is reinforcement learning, supervised learning, and self-supervised learning.
Reinforcement learning is learning by trial and error, getting rewards when you succeed and not getting rewards when you don’t succeed. That form of learning in its purest form is incredibly inefficient in terms of samples, and as a consequence works well for games, where you can try things as many times as you want, but doesn’t work in many real-world scenarios.
You can use reinforcement learning to train a machine to play Go or chess. That works really well, as we’ve seen with AlphaGo, for example, but it requires a ridiculous number of samples or trials. A machine basically has to play more games than all of humanity has played in the last 3,000 years to reach good performance. It works really well if you can do that, but it is often impractical in the real world.
If you want to use reinforcement learning to train a robot to grab objects, it will take a ridiculous amount of time to achieve that. A human can learn to drive a car in 15 hours of training without crashing into anything. If you want to use the current reinforcement learning methods to train a car to drive itself, the machine will have to drive off cliffs 10,000 times before it figures out how not to do that.
MARTIN FORD: I guess that’s the argument for simulation.
YANN LECUN: I don’t agree. It might be an argument for simulation, but it’s also an argument for the fact that the kind of learning that we can do as humans is very, very different from pure reinforcement learning.
It’s more akin to what people call model-based reinforcement learning. This is where you have your internal model of the world that allows you to predict that when you turn the wheel in a particular direction then the car is going to go in a particular direction, and if another car comes in front you’re going to hit it, or if there is a cliff you are going to fall off that cliff. You have this predictive model that allows you to predict in advance the consequence of your actions. As a result, you can plan ahead and not take the actions that result in bad outcomes.
Learning to drive in this context is called model-based reinforcement learning, and that’s one of the things we don’t really know how to do. There is a name for it, but there’s no real way to make it work reliably! Most of the learning is not in the reinforcement, it’s in learning the predictive models in a self-supervised manner, and that’s the main problem we don’t know how to solve today.
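A toy sketch of the planning idea behind model-based reinforcement learning: a predictive model of the consequences of actions (here hand-written rather than learned) lets the agent imagine each candidate action and avoid the one that leads off the cliff, without ever trying it for real. The one-dimensional car-and-cliff setup is purely illustrative.

```python
def predict_next_position(position, steering):
    """Stand-in for a learned world model: predicts where the car ends up."""
    return position + steering

def plan(position, candidate_steerings, cliff_at=10.0):
    """Imagine each action with the model and keep only the ones that stay on the road."""
    safe = [s for s in candidate_steerings
            if predict_next_position(position, s) < cliff_at]
    return max(safe) if safe else min(candidate_steerings)

action = plan(position=9.0, candidate_steerings=[-1.0, 0.0, 0.5, 2.0])
# -> 0.5: the model predicts that 2.0 would go off the cliff, so it is never tried for real.
```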
MARTIN FORD: Is this an area that you’re focused on with your work at Facebook?
YANN LECUN: Yes, it is one of the things that we’re working on at Facebook. We’re working on a lot of different things, including getting machines to learn by observation from different data sources—learning how the world works. We’re building a model of the world in the hope that some form of common sense will emerge, and that the model could be used as a kind of predictive model that would allow a machine to learn the way people do, without having to try and fail 10,000 times before it succeeds.
MARTIN FORD: Some people argue that deep learning alone is not going to be enough, or that there needs to be more structure in the networks, some kind of intelligent design from the outset. You seem to be a strong believer in the idea that intelligence will emerge organically from relatively generic neural networks.
YANN LECUN: I think that would be an exaggeration. Everybody agrees that there is a need for some structure, the question is how much, and what kind of structure is needed. I guess when you say that some people believe that there should be structures such as logic and reasoning, you’re probably referring to Gary Marcus and maybe Oren Etzioni.
I actually had a debate with Gary Marcus on this earlier today. Gary’s view isn’t particularly well accepted in the community because he’s been writing critically about deep learning, but he’s not been contributing to it. That’s not the case for Oren Etzioni because he’s been in the field for a while, but his view is considerably milder than Gary’s. The one thing all of us agree on, though, is that there is a need for some structure.
In fact, the very idea of convolutional networks is to put a structure in neural networks. Convolutional networks are not a blank slate, they do have a little bit of structure. The question is, if we want AI to emerge, and we’re talking general intelligence or human-level AI, how much structure do we need? That’s where people’s views may differ, like whether we need explicit structures that will allow a machine to manipulate symbols, or if we need explicit structures for representing hierarchical structures in language.
A lot of my colleagues, like Geoffrey Hinton and Yoshua Bengio, agree that in the long run we won’t need precise specific structures for this. It might be useful in the short term because we may not have figured out a general learning method for self-supervised learning. So, one way to cut corners is to hardwire the architecture; that is a perfectly fine thing to do. In the long run, though, it’s not clear how much of that we need. The microstructure of the cortex seems to be very, very uniform all over, whether you’re looking at the visual or prefrontal cortex.
MARTIN FORD: Does the brain use something like backpropagation?
YANN LECUN: We don’t really know. There are more fundamental questions than that, though. Most of the learning algorithms that people have come up with essentially consist of minimizing some objective function.
We don’t even know if the brain minimizes an objective function. If the brain does minimize an objective function, does it do it through a gradient-based method? Does the brain have some way of estimating in which direction to modify all of its synaptic connections in such a way as to improve this objective function? We don’t know that. If it estimates that gradient, does it do it by some form of backpropagation?
It’s probably not backpropagation as we know it, but it could be a form of approximation of gradient estimation that is very similar to backpropagation. Yoshua Bengio has been working on biologically plausible forms of gradient estimation, so it’s not entirely impossible that the brain does some sort of gradient estimation of some objective function, we just simply don’t know.
MARTIN FORD: What other important topics are you working on at Facebook?
YANN LECUN: We’re working on a lot of fundamental research and questions on machine learning, so things that have more to do with applied mathematics and optimization. We are working on reinforcement learning, and we are also working on something called generative models, which are a form of self-supervised or predictive learning.
MARTIN FORD: Is Facebook working on building systems that can actually carry out a conversation?
YANN LECUN: What I’ve mentioned so far are the fundamental topics of research, but there are a whole bunch of application areas.
Facebook is very active in computer vision, and I think we can claim to have the best computer vision research group in the world. It’s a mature group and there are a lot of really cool activities there. We’re putting quite a lot of work into natural language processing, and that includes translation, summarization, text categorization—figuring out what topic a text talks about, as well as dialog systems. Actually, dialog systems are a very important area of research for virtual assistants, question and answering systems, and so on.
MARTIN FORD: Do you anticipate the creation of an AI that someday could pass the Turing test?
YANN LECUN: It’s going to happen at some point, but the Turing test is not actually an interesting test. In fact, I don’t think a lot of people in the AI field at the moment consider the Turing test to be a good test. It’s too easy to trick it, and to some extent, the Turing test has already been and gone.
We give a lot of importance to language as humans because we are used to discussing intelligent topics with other humans through language. However, language is sort of an epiphenomenon of intelligence, and when I say this, my colleagues who work on natural language processing disagree vehemently!
Look at orangutans, which are almost as smart as we are. They have a huge amount of common sense and very good models of the world, and they can build tools, just like humans. However, they don’t have language, they’re not social animals, and they barely interact with other members of their species outside the non-linguistic mother-and-child interaction. There is a whole component of intelligence that has nothing to do with language, and we are ignoring this if we reduce AI to just satisfying the Turing test.
MARTIN FORD: What is the path to artificial general intelligence and what do we need to overcome to get there?
YANN LECUN: There are probably other problems that we do not see at the moment that we’re going to eventually encounter, but one thing I think we’ll need to figure out is the ability that babies and animals have to learn how the world works by observation in the first few days, weeks, and months of life.
In that time, you learn that the world is three-dimensional. You learn that there are objects that move in front of others in different ways when you move your head. You learn object permanence, so you learn that when an object is hidden behind another one, it’s still there. As time goes on, you learn about gravity, inertia, and rigidity—very basic concepts that are learnt essentially by observation.