by Michio Kaku
In 1997 IBM’s Deep Blue accomplished a historic breakthrough by decisively beating the world chess champion Garry Kasparov. Deep Blue was an engineering marvel, computing 11 billion operations per second. However, instead of opening the floodgates of artificial intelligence research and ushering in a new age, it did precisely the opposite. It highlighted only the primitiveness of AI research. Upon reflection, it was obvious to many that Deep Blue could not think. It was superb at chess but would score 0 on an IQ exam. After this victory, it was the loser, Kasparov, who did all the talking to the press, since Deep Blue could not talk at all. Grudgingly, AI researchers began to appreciate the fact that brute computational power does not equal intelligence. AI researcher Richard Heckler says, “Today, you can buy chess programs for $49 that will beat all but world champions, yet no one thinks they’re intelligent.”
But with Moore’s law spewing out new generations of computers every eighteen months, sooner or later the old pessimism of the past generation will be gradually forgotten and a new generation of bright enthusiasts will take over, creating renewed optimism and energy in the once-dormant field. Thirty years after the last AI winter set in, computers have advanced enough so that the new generation of AI researchers are again making hopeful predictions about the future. The time has finally come for AI, say its supporters. This time, it’s for real. The third try is the lucky charm. But if they are right, are humans soon to be obsolete?
IS THE BRAIN A DIGITAL COMPUTER?
One fundamental problem, as mathematicians now realize, is that they made a crucial error fifty years ago in thinking the brain was analogous to a large digital computer. But now it is painfully obvious that it isn’t. The brain has no Pentium chip, no Windows operating system, no application software, no CPU, no programming, and no subroutines that typify a modern digital computer. In fact, the architecture of digital computers is quite different from that of the brain, which is a learning machine of some sort, a collection of neurons that constantly rewires itself every time it learns a task. (A PC, however, does not learn at all. Your computer is just as dumb today as it was yesterday.)
So there are at least two approaches to modeling the brain. The first, the traditional top-down approach, is to treat robots like digital computers, and program all the rules of intelligence from the very beginning. A digital computer, in turn, can be broken down into something called a Turing machine, a hypothetical device introduced by the great British mathematician Alan Turing. A Turing machine consists of three basic components: an input, a central processor that digests this data, and an output. All digital computers are based on this simple model. The goal of this approach is to have a CD-ROM that has all the rules of intelligence codified on it. By inserting this disk, the computer suddenly springs to life and becomes intelligent. So this mythical CD-ROM contains all the software necessary to create intelligent machines.
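To make the idea concrete, here is a minimal sketch of a Turing machine in Python. It is an illustration only: in the usual formal description the "input" and "output" are a tape of symbols, and the "central processor" is a table of rules read by a head that moves along the tape. The function name, the rule table, and the bit-flipping task are all assumptions chosen for this example.

```python
def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    """Run a rule table over a tape and return the tape contents at the end."""
    cells = dict(enumerate(tape))  # tape position -> symbol (the input)
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        # The rule table plays the role of the "central processor".
        new_symbol, move, state = rules[(state, symbol)]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    # What is left on the tape is the output.
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical rule table: flip every bit, then halt at the first blank cell.
INVERT_RULES = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

result = run_turing_machine("1011", INVERT_RULES)
print(result)
```

Everything the machine "knows" sits in the rule table, which is written in advance by a programmer. That is the top-down approach in miniature.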
However, our brain has no programming or software at all. Our brain is more like a “neural network,” a complex jumble of neurons that constantly rewires itself.
Neural networks follow Hebb’s rule: every time a correct decision is made, those neural pathways are reinforced. The network does this by simply changing the strength of certain electrical connections between neurons every time it successfully performs a task. (Hebb’s rule can be expressed by the old question: How does a musician get to Carnegie Hall? Answer: practice, practice, practice. For a neural network, practice makes perfect. Hebb’s rule also explains why bad habits are so difficult to break, since the neural pathway for a bad habit is so well-worn.)
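A rough sketch of Hebb’s rule in Python may help. It assumes the common textbook form, in which the connection between two neurons is strengthened by a small amount whenever both are active together. The function name, the learning rate, and the two-neuron example are assumptions made for illustration, not a model of any real brain circuit.

```python
def hebbian_update(weights, pre, post, rate=0.25):
    """Return new weights after one Hebbian step.

    weights[i][j] is the strength of the connection from input neuron j
    to output neuron i. It grows by rate * post[i] * pre[j], so it only
    changes when both neurons are active at the same time.
    """
    return [
        [w + rate * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Two inputs, two outputs, all connections starting at zero strength.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]   # only the first input neuron fires
post = [1.0, 0.0]  # only the first output neuron fires

# "Practice, practice, practice": repeat the same activity four times.
for _ in range(4):
    weights = hebbian_update(weights, pre, post)

print(weights)
```

Only the pathway that was actually used gets stronger; the unused connections stay at zero. With each repetition the used pathway becomes more worn in, which is the sense in which practice makes perfect.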
Neural networks are based on the bottom-up approach. Instead of being spoon-fed all the rules of intelligence, neural networks learn them the way a baby learns, by bumping into things and learning by experience. Instead of being programmed, neural networks learn the old-fashioned way, through the “school of hard knocks.”
Neural networks have a completely different architecture from that of digital computers. If you remove a single transistor in the digital computer’s central processor, the computer will fail. However, if you remove large chunks of the human brain, it can still function, with other parts taking over for the missing pieces. Also, it is possible to localize precisely where the digital computer “thinks”: its central processor. However, scans of the human brain clearly show that thinking is spread out over large parts of the brain. Different sectors light up in precise sequence, as if thoughts were being bounced around like a Ping-Pong ball.
Digital computers can calculate at nearly the speed of light. The human brain, by contrast, is incredibly slow. Nerve impulses travel at an excruciatingly slow pace of about 200 miles per hour. But the brain more than makes up for this because it is massively parallel, that is, it has 100 billion neurons operating at the same time, each one performing a tiny bit of computation, with each neuron connected to 10,000 other neurons. In a race, a superfast single processor is left in the dust by a superslow parallel processor. (This goes back to the old riddle: if one cat can eat one mouse in one minute, how long does it take a million cats to eat a million mice? Answer: one minute.)
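The arithmetic behind the riddle can be written out in a few lines of Python. This is only a sketch of the idealized case, assuming every cat works independently and every mouse takes the same time to eat; the function name is made up for this example.

```python
def minutes_to_finish(mice, cats, minutes_per_mouse=1):
    """Time for `cats` working in parallel to eat `mice` mice."""
    rounds = -(-mice // cats)  # ceiling division: rounds each cat must eat
    return rounds * minutes_per_mouse


one_cat = minutes_to_finish(1_000_000, 1)
million_cats = minutes_to_finish(1_000_000, 1_000_000)
print(one_cat, million_cats)
```

A single cat needs a million minutes for the job, while a million cats need one. The brain plays the part of the million slow cats.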
In addition, the brain is not purely digital. Transistors are gates that can be either open or closed, represented by a 1 or 0. Neurons, too, are digital (they can fire or not fire), but they can also be analog, transmitting continuous signals as well as discrete ones.
TWO PROBLEMS WITH ROBOTS
Given the glaring limitations of computers compared to the human brain, one can appreciate why computers have not been able to accomplish two key tasks that humans perform effortlessly: pattern recognition and common sense. These two problems have defied solution for the past half century. This is the main reason why we do not have robot maids, butlers, and secretaries.
The first problem is pattern recognition. Robots can see much better than a human, but they don’t understand what they are seeing. When a robot walks into a room, it converts the image into a jumble of dots. By processing these dots, it can recognize a collection of lines, circles, squares, and rectangles. Then a robot tries to match this jumble, one by one, with objects stored in its memory—an extraordinarily tedious task even for a computer. After many hours of calculation, the robot may match these lines with chairs, tables, and people. By contrast, when we walk into a room, within a fraction of a second, we recognize chairs, tables, desks, and people. Indeed, our brains are mainly pattern-recognizing machines.
Second, robots do not have common sense. Although robots can hear much better than a human, they don’t understand what they are hearing. For example, consider the following statements:
• Children like sweets but not punishment
• Strings can pull but not push
• Sticks can push but not pull
• Animals cannot speak and understand English
• Spinning makes people feel dizzy
For us, each of these statements is just common sense. But not to robots. There is no line of logic or programming that proves that strings can pull but not push. We have learned the truth of these “obvious” statements by experience, not because they were programmed into our memories.
The problem with the top-down approach is that there are simply too many lines of code for common sense necessary to mimic human thought. Hundreds of millions of lines of code, for example, are necessary to describe the laws of common sense that a six-year-old child knows. Hans Moravec, former director of the AI laboratory at Carnegie Mellon, laments, “To this day, AI programs exhibit no shred of common sense—a medical diagnosis program, for instance, may prescribe an antibiotic when presented a broken bicycle because it lacks a model of people, disease, or bicycles.”
Some scientists, however, cling to the belief that the only obstacle to mastering common sense is brute force. They feel that a new Manhattan Project, like the program that built the atomic bomb, would surely crack the commonsense problem. The crash program to create this “encyclopedia of thought” is called CYC, started in 1984. It was to be the crowning achievement of AI, the project to encode all the secrets of common sense into a single program. However, after several decades of hard work, the CYC project has failed to live up to its own goals.
CYC’s goal is simple: master “100 million things, about the number a typical person knows about the world, by 2007.” That deadline, and many previous ones, have slipped by without success. Each of the milestones laid out by CYC engineers has come and gone without scientists being any closer to mastering the essence of intelligence.
MAN VERSUS MACHINE
I once had a chance to match wits with a robot built by MIT’s Tomaso Poggio. Although robots cannot recognize simple patterns as we can, Poggio was able to create a computer program that can calculate every bit as fast as a human in one specific area: “immediate recognition.” This is our uncanny ability to instantly recognize an object even before we are aware of it. (Immediate recognition was important for our evolution, since our ancestors had only a split second to determine if a tiger was lurking in the bushes, even before they were fully aware of it.) For the first time, a robot consistently scored higher than a human on a specific vision recognition test.
The contest between me and the machine was simple. First, I sat in a chair and stared at an ordinary computer screen. Then a picture flashed on the screen for a split second, and I was supposed to press one of two keys as fast as I could to indicate whether or not I saw an animal in the picture. I had to make a decision as quickly as possible, even before I had a chance to digest the picture. The computer would also make a decision for the same picture.
Embarrassingly enough, after many rapid-fire tests, the machine and I performed about equally. But there were times when the machine scored significantly higher than I did, leaving me in the dust. I was beaten by a machine. (It was some consolation when I was told that the computer gets the right answer 82 percent of the time, but humans score only 80 percent on average.)
The key to Poggio’s machine is that it copies lessons from Mother Nature. Many scientists are realizing the truth in the statement, “The wheel has already been invented, so why not copy it?” For example, normally when a robot looks at a picture, it tries to divide it up into a series of lines, circles, squares, and other geometric shapes. But Poggio’s program is different.
When we see a picture, we might first see the outlines of various objects, then see various features within each object, then shading within these features, etc. So we split up the image into many layers. As soon as the computer processes one layer of the image, it integrates it with the next layer, and so on. In this way, step by step, layer by layer, it mimics the hierarchical way that our brains process images. (Poggio’s program cannot perform all the feats of pattern recognition that we take for granted, such as visualizing objects in 3-D, recognizing thousands of objects from different angles, etc., but it does represent a major milestone in pattern recognition.)
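The layer-by-layer idea can be sketched in a few lines of Python. This is a toy, not Poggio’s actual program: it assumes a one-dimensional row of pixel brightness values, a first layer that finds outlines (sudden changes in brightness), and a second layer that summarizes neighboring outlines. All of the function names are invented for the example.

```python
def edge_layer(row):
    """Layer 1: mark outlines as jumps in brightness between neighbors."""
    return [abs(b - a) for a, b in zip(row, row[1:])]


def pool_layer(values, width=2):
    """Layer 2: keep the strongest response in each small neighborhood."""
    return [max(values[i:i + width]) for i in range(0, len(values), width)]


def hierarchy(row, layers):
    """Feed the output of each layer into the next one."""
    for layer in layers:
        row = layer(row)
    return row


# A dark background (0) with one bright object (9) in the middle.
pixels = [0, 0, 9, 9, 0, 0, 0]
summary = hierarchy(pixels, [edge_layer, pool_layer])
print(summary)
```

Each layer works only on what the previous layer produced, so the description of the picture becomes shorter and more abstract at every step.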
Later, I had an opportunity to see both the top-down and bottom-up approaches in action. I first went to Stanford University’s artificial intelligence center, where I met STAIR (Stanford artificial intelligence robot), which uses the top-down approach. STAIR is about 4 feet tall, with a huge mechanical arm that can swivel and grab objects off a table. STAIR is also mobile, so it can wander around an office or home. The robot has a 3-D camera that locks onto an object and feeds the 3-D image into a computer, which then guides the mechanical arm to grab the object. Robots have been grabbing objects like this since the 1960s, and we see them in Detroit auto factories.
But appearances are deceptive. STAIR can do much more. Unlike the robots in Detroit, STAIR is not scripted. It operates by itself. If you ask it to pick up an orange, for example, it can analyze a collection of objects on a table, compare them with the thousands of images already stored in its memory, then identify the orange and pick it up. It can also identify objects more precisely by grabbing them and turning them around.
To test its ability, I scrambled a group of objects on a table, and then watched what happened after I asked for a specific one. I saw that STAIR correctly analyzed the new arrangement and then reached out and grabbed the correct thing. Eventually, the goal is to have STAIR navigate in home and office environments, pick up and interact with various objects and tools, and even converse with people in a simplified language. In this way, it will be able to do anything that a gofer can in an office. STAIR is an example of the top-down approach: everything is programmed into STAIR from the very beginning. (Although STAIR can recognize objects from different angles, it is still limited in the number of objects it can recognize. It would be paralyzed if it had to walk outside and recognize random objects.)
Later, I had a chance to visit New York University, where Yann LeCun is experimenting with an entirely different design, the LAGR (learning applied to ground robots). LAGR is an example of the bottom-up approach: it has to learn everything from scratch, by bumping into things. It is the size of a small golf cart and has two stereo color cameras that scan the landscape, identifying objects in its path. It then moves among these objects, carefully avoiding them, and learns with each pass. It is equipped with GPS and has two infrared sensors that can detect objects in front of it. It contains three high-power Pentium chips and is connected to a gigabit Ethernet network. We went to a nearby park, where the LAGR robot could roam around various obstacles placed in its path. Every time it went over the course, it got better at avoiding the obstacles.
One important difference between LAGR and STAIR is that LAGR is specifically designed to learn. Every time LAGR bumps into something, it moves around the object and learns to avoid that object the next time. While STAIR has thousands of images stored in its memory, LAGR has hardly any images in its memory but instead creates a mental map of all the obstacles it meets, and constantly refines that map with each pass. Unlike the driverless car, which is programmed and follows a route set previously by GPS, LAGR moves all by itself, without any instructions from a human. You tell it where to go, and it takes off. Eventually, robots like these may be found on Mars, the battlefield, and in our homes.
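The bump-and-learn idea can be caricatured in Python. This is a toy sketch under strong assumptions, not the software that runs on LAGR: the course is a list of grid cells, the robot’s "mental map" is a set of cells where it has collided before, and a known obstacle is simply steered around. All names are hypothetical.

```python
def run_pass(route, obstacles, obstacle_map):
    """Drive the route once and return the number of collisions.

    obstacle_map is the robot's memory; it is updated in place so the
    next pass benefits from what was learned on this one.
    """
    bumps = 0
    for cell in route:
        if cell in obstacle_map:
            continue  # already on the map: steer around it
        if cell in obstacles:
            bumps += 1              # learn the hard way...
            obstacle_map.add(cell)  # ...and remember for next time
    return bumps


route = [(0, 0), (1, 0), (2, 0), (3, 0)]
obstacles = {(1, 0), (3, 0)}
memory = set()  # the robot starts out knowing nothing

first = run_pass(route, obstacles, memory)
second = run_pass(route, obstacles, memory)
print(first, second)
```

The robot hits both obstacles on its first pass and neither on its second. Nothing about the course was programmed in beforehand; the map exists only because of the collisions.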
On one hand, I was impressed by the enthusiasm and energy of these researchers. In their hearts, they believe that they are laying the foundation for artificial intelligence, and that their work will one day impact society in ways we can only begin to understand. But from a distance, I could also appreciate how far they have to go. Even cockroaches can identify objects and learn to go around them. We are still at the stage where Mother Nature’s lowliest creatures can outsmart our most intelligent robots.
EXPERT SYSTEMS
Today, many people have simple robots in their homes that can vacuum their carpets. There are also robot security guards patrolling buildings at night, robot guides, and robot factory workers. In 2006, it was estimated that there were 950,000 industrial robots and 3,540,000 service robots working in homes and buildings. In the coming decades, the field of robotics may blossom in several directions, but these robots won’t look like the ones of science fiction.
The greatest impact may be felt in what are called expert systems, software programs that have encoded in them the wisdom and experience of a human being. As we saw in the last chapter, one day, we may talk to the Internet on our wall screens and converse with the friendly face of a robodoc or robolawyer.
This field is called heuristics, that is, following a formal, rule-based system. When we need to plan a vacation, we will talk to the face in the wall screen and give it our preferences for the vacation: how long, where to, which hotels, what price range. The expert system will already know our preferences from past experiences and then contact hotels, airlines, etc., and give us the best options. But instead of talking to it in a chatty, gossipy way, we will have to use a fairly formal, stylized language that it understands. Such a system can rapidly perform any number of useful chores. You just give it orders, and it makes a reservation at a restaurant, checks for the location of stores, orders groceries and takeout, reserves a plane ticket, etc.
It is precisely because of the advances in heuristics over the past decades that we now have some of the rather simple search engines of today. But they are still crude. It is obvious to everyone that you are dealing with a machine and not a human. In the future, however, robots will become so sophisticated that they will almost appear to be humanlike, operating seamlessly with nuance and sophistication.
Perhaps the most practical application will be in medical care. For example, at the present time if you feel sick, you may have to wait hours in an emergency room before you see a doctor. In the near future, you may simply go to your wall screen and talk to robodoc. You will be able to change the face, and even the personality, of the robodoc that you see with the push of a button. The friendly face you see in your wall screen will ask a simple set of questions: How do you feel? Where does it hurt? When did the pain start? How often does it hurt?
Each time, you will respond by choosing from a simple set of answers. You will answer not by typing on a keyboard but by speaking.
Each of your answers, in turn, will prompt the next set of questions. After a series of such questions, the robodoc will be able to give you a diagnosis based on the best experience of the world’s doctors. Robodoc will also analyze the data from your bathroom, your clothes, and furniture, which have been continually monitoring your health via DNA chips. And it might ask you to examine your body with a portable MRI scanner, which is then analyzed by supercomputers. (Some primitive versions of these heuristic programs already exist, such as WebMD, but they lack the nuances and full power of heuristics.)
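The question-and-answer flow described above, in which each answer selects the next question, can be sketched as a small rule tree in Python. This is an illustration of the rule-based idea only. The tree, the allowed answers, and the placeholder outcomes are invented for the example and carry no medical meaning.

```python
# Each node is either a question with a fixed set of allowed answers,
# or a plain string, which stands for the final outcome.
RULE_TREE = {
    "question": "Where does it hurt?",
    "answers": {
        "head": {
            "question": "When did the pain start?",
            "answers": {
                "today": "placeholder outcome A",
                "last week": "placeholder outcome B",
            },
        },
        "knee": "placeholder outcome C",
    },
}


def consult(node, replies):
    """Walk the rule tree, letting each reply choose the next question."""
    replies = iter(replies)
    asked = []
    while isinstance(node, dict):
        asked.append(node["question"])
        reply = next(replies)
        node = node["answers"][reply]  # a reply outside the set raises KeyError
    return asked, node


asked, outcome = consult(RULE_TREE, ["head", "today"])
print(asked, outcome)
```

The sketch also shows why the language has to be formal and stylized: the program understands only the answers that already appear in its rules, and anything else is an error.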