Machines of Loving Grace
“I know someone you need to meet,” she told him.
That someone turned out to be Mitch Kapor, the founder and chief executive of Lotus Development Corporation, the publisher of the 1-2-3 spreadsheet program. Kapor came by and Kaplan pitched him on his AI-for-the-masses vision. The Lotus founder was enthusiastic about the idea: “I’ve got money, why don’t you propose a product you want to build for me,” he said.
Kaplan’s first idea was to invent an inexpensive version of the Teknowledge expert system to be called ABC, as a play on 1-2-3. The idea attracted little enthusiasm. Not long afterward, however, he was flying on Kapor’s private jet. The Lotus founder sat down with paper notes and a bulky Compaq computer the size of a sewing machine and began typing. That gave Kaplan a new idea. He proposed a free-form note-taking program that would act as a calendar and a repository for all the odds and ends of daily life. Kapor loved the idea, and together with Ed Belove, another Lotus software designer, the three men outlined a set of ideas for the program.
Kaplan again retreated to his cottage, this time for a year and a half, writing the code for the program with Belove while Kapor helped with the overall design. Lotus Agenda was the first of a new breed of packaged software, known as a Personal Information Manager, which was in some ways a harbinger of the World Wide Web. Information could be stored in free form and would be automatically organized into categories. It came to be described as a “spreadsheet for words,” and it was a classic example of a new generation of software tools that empowered their users in the Engelbart tradition.
Introduced in 1988 to glowing reviews from industry analysts like Esther Dyson, it would go on to gather a cult following. The American AI Winter was just arriving and most of the new wave of AI companies would soon wilt, but Kaplan had been early to see the writing on the wall. Like Breiner, he went quickly from being an AI ninja to a convert to Engelbart’s world of augmentation. PCs were the most powerful intellectual tool in history. It was becoming clear that it was equally possible to design humans into and out of the systems being created with computers. Just as AI stumbled commercially, personal computing and thus intelligence augmentation shot ahead. In the late 1970s and early 1980s the personal computer industry exploded on the American scene. Almost overnight, the idea that computing could be both a “fantasy amplifier” at home and a productivity tool at the office replaced the view of computing as an impersonal bureaucratic tool of governments and corporations. By 1982 personal computing had become such a cultural phenomenon that Time magazine put the PC on its cover as “Machine of the Year” in place of its usual Man of the Year.
It was the designers themselves who made the choice of IA over AI. Kaplan would go on to found Go Corp. and design the first pen-based computers that would anticipate the iPhone and iPad by more than a decade. Like Sheldon Breiner, who was also driven away from artificial intelligence by the 1980s AI Winter, he would become part of the movement toward human-centered design in a coming post-PC era.
The quest to build a working artificial intelligence was marked from the outset by false hopes and bitter technical and philosophical quarrels. In 1958, two years after the Dartmouth Summer Research Project on Artificial Intelligence, the New York Times published a brief UPI wire story buried on page 25 of the paper. The headline read NEW NAVY DEVICE LEARNS BY DOING: PSYCHOLOGIST SHOWS EMBRYO OF COMPUTER DESIGNED TO READ AND GROW WISER.43
The article was an account of a demonstration given by Cornell psychologist Frank Rosenblatt, describing the “embryo” of an electronic computer that the navy expected would one day “walk, talk, see, write, reproduce itself and be conscious of its existence.” The device at that point was actually a simulation running on the Weather Bureau’s IBM 704 computer, one that had learned to tell right from left after some fifty attempts, according to the report. Within a year, the navy reportedly planned to build a “thinking machine” based on these circuits at a cost of $100,000.
Dr. Rosenblatt told the reporters that this would be the first device to think “as the human brain,” and that it would make mistakes initially but would grow wiser with experience. He suggested that one application for the new mechanical brain might be as a proxy for space exploration in lieu of humans. The article concluded that the first perceptron, an electronic or software effort to model biological neurons, would have about a thousand electronic “association cells” receiving electrical impulses from four hundred photocells—“eye-like” scanning devices. In contrast, it noted, the human brain was composed of ten billion responsive cells and a hundred million connections with the eyes.
The earliest work on artificial neural networks dates back to the 1940s, and in 1949 that research caught the eye of Marvin Minsky, then a young Harvard mathematics student, who would go on to build early electronic learning networks, one as an undergraduate and a second, named the Stochastic Neural Analog Reinforcement Calculator, or SNARC, as a graduate student at Princeton. He would later write his doctoral thesis on neural networks. These mathematical constructs are networks of nodes, or “neurons,” interconnected by numerical values that serve as “weights.” They can be trained on a series of patterns, such as images or sounds, so that they can later recognize similar patterns.
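To make the idea concrete, here is a minimal sketch, in Python, of the kind of single-layer learning rule Rosenblatt’s perceptron embodied: weighted connections that are nudged after each mistake. The tiny dataset, the learning rate, and the function names are illustrative inventions, not code from any of the systems described here.

```python
# A minimal sketch of a single-layer perceptron: weighted connections
# that are adjusted whenever the prediction disagrees with the label.
# The dataset and learning rate below are invented for illustration.

def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum of inputs crosses the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total + bias > 0 else 0

def train(samples, labels, epochs=20, learning_rate=0.1):
    """Nudge the weights toward each misclassified example."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

if __name__ == "__main__":
    # Learn a simple linearly separable pattern (logical OR) from examples.
    samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [0, 1, 1, 1]
    w, b = train(samples, labels)
    print([predict(w, b, s) for s in samples])  # expected: [0, 1, 1, 1]
```

The essential point, which the narrative returns to below, is that nothing in such a network is programmed with explicit rules; the behavior lives entirely in the learned weights.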
During the 1960s a number of competing paths toward building thinking machines emerged, and the dominant direction became the logic- and rule-based approach favored by John McCarthy. During the same period, however, groups around the country were experimenting with competing analog approaches based on the earlier neural network ideas. It is ironic that Minsky, one of the ten attendees at the Dartmouth conference, would in 1969 precipitate a legendary controversy by writing the book Perceptrons with Seymour Papert, an analysis of neural networks that is widely believed to have stalled the field: there is general agreement that the critique the two MIT artificial intelligence researchers set forth significantly delayed the young research area for many years.
In fact, it was just one of a series of heated intellectual battles within the AI community during the sixties. Minsky and Papert have since argued that the criticism was unfair and that their book was a more balanced analysis of neural networks than its critics conceded. The dispute was further complicated by the fact that Rosenblatt, one of the main figures in the field, would die two years later in a sailing accident, leaving a vacuum in neural network research.
Early neural network research included work done at Stanford University as well as the research led by Charlie Rosen at SRI, but the Stanford group refocused its attention on telecommunications and Rosen would shift his Shakey work toward the dominant AI framework. Interest in neural networks would not reemerge until 1978, with the work of Terry Sejnowski, a postdoctoral student in neurobiology at Harvard. Sejnowski had given up his early focus on physics and turned to neuroscience. After taking a summer course in Woods Hole, Massachusetts, he found himself captivated by the mystery of the brain. That year a British postdoctoral psychologist, Geoffrey Hinton, was studying at the University of California at San Diego under David Rumelhart. The older UC scientist had created the parallel-distributed processing group with Donald Norman, the founder of the cognitive psychology department at the school.
Hinton, the great-great-grandson of logician George Boole, had come to the United States as a “refugee,” a direct consequence of the original AI Winter in England. The Lighthill report had asserted that most AI research had significantly underdelivered on its promise, the exception being computational neuroscience. In a televised Lighthill-BBC “hearing,” both sides made their arguments based on the then state-of-the-art performance of computers; neither side seemed to have taken note of Moore’s law and the accelerating speed of computing.
As a graduate student Hinton felt personally victimized by Minsky and Papert’s attacks on neural networks. When he told people in England that he was working on artificial neural networks, the response would be, “Don’t you get it? Those things are no good.” His advisor told him to forget his interests and read Terry Winograd’s thesis; the future was going to be symbolic logic. But Hinton was on a different path. He was beginning to form a perspective he would later describe as “neuro-inspired” engineering. He did not go to the extreme of some in the new realm of biological computing, for he thought that slavishly copying biology would be a mistake. Decades later the same issue remains hotly disputed. In 2013 the European Union committed more than $1 billion to a project led by Henry Markram, a researcher in Switzerland, to model a human brain at the tiniest level of detail, and Hinton was certain that the project was doomed to failure.
In 1982 Hinton organized a summer workshop on parallel models of associative memory, and Terry Sejnowski applied to attend. Independently, the young physicist had been thinking about how the brain might be modeled using some of the new schemes that were being developed. It was the first scientific meeting that Hinton had organized. He was aware that the invited crowd had met repeatedly in the past; people he thought of as “elderly professors in their forties” would come and give their same old talks. So he drew up a flyer, sent it to computer science and psychology departments, and offered to pay expenses for those with new ideas. He was predictably disappointed when most of the responses attacked the problem with traditional approaches from computer science and psychology. But one of the proposals stood out. It was from a young scientist who claimed to have figured out the “machine code of the brain.”
At roughly the same time Hinton attended a conference with David Marr, the well-known MIT vision researcher, and asked him whether the author of the proposal was crazy. Marr responded that he knew him, that he was very bright, and that he had no idea whether he was crazy or not. What was clear was that Sejnowski was pursuing a set of new ideas about cognition.
At the meeting Hinton and Sejnowski met for the first time. UCSD was already alive with new ideas about how to model the way the brain worked. Known as Parallel Distributed Processing, or PDP, the approach was a break from the symbol processing that then dominated artificial intelligence and the cognitive sciences. The two quickly realized they had been thinking about the problem from a similar perspective. They could both see the power of a new approach based on webs of sensors, or “neurons,” interconnected by a lattice of values representing connection strengths. In this new direction, if you wanted the network to interpret an image, you described the image in terms of a web of weighted connections. It proved to be a vastly more effective approach than the original symbolic model for artificial intelligence.
Everything changed in 1982 when Sejnowski’s former physics advisor at Princeton, John Hopfield, invented what would become known as the Hopfield Network. Hopfield’s approach broke from the earlier neural network models created by the designers of the first perceptrons by allowing the individual neurons to update their values independently, one at a time. The fresh approach inspired both Hinton and Sejnowski to begin an intense collaboration.
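A minimal sketch of that update rule, in Python, may help: patterns are stored in a symmetric web of weights, and each neuron updates its own value independently of the others until the network settles into a stored memory. The particular pattern, the Hebbian storage rule shown, and the amount of noise are all illustrative choices rather than details from the text.

```python
import random

# A sketch of a Hopfield-style network: symmetric weights store a pattern,
# and neurons update their +/-1 states one at a time, independently,
# until the network settles. Values here are invented for illustration.

def store(patterns, n):
    """Build a symmetric weight matrix from +/-1 patterns (Hebbian rule)."""
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, state, steps=100):
    """Asynchronously update randomly chosen neurons, one at a time."""
    state = list(state)
    n = len(state)
    for _ in range(steps):
        i = random.randrange(n)
        activation = sum(weights[i][j] * state[j] for j in range(n))
        state[i] = 1 if activation >= 0 else -1
    return state

if __name__ == "__main__":
    pattern = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = store([pattern], len(pattern))
    noisy = list(pattern)
    noisy[0], noisy[3] = -noisy[0], -noisy[3]   # corrupt two bits
    print(recall(weights, noisy))               # usually recovers the pattern
```

The appeal for memory researchers was that a partial or corrupted cue settles back into the stored pattern, which is why Hopfield Networks served as an early model for human memory.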
The two young scientists had both taken their first teaching positions by that time, Hinton at Carnegie Mellon and Sejnowski at Johns Hopkins, but they had become friends and the two campuses were close enough that they could make the four-hour drive back and forth on weekends. They realized they had found a way to transform the original neural network model into a more powerful learning algorithm. They knew that humans learn by seeing examples and generalizing, and mimicking that process became their focus. They created a new kind of multilayered network, which they called a Boltzmann Machine, an homage to the Austrian physicist Ludwig Boltzmann. The new model offered a more powerful approach to machine learning and represented the most significant advance in design since Rosenblatt’s original single-layer learning algorithm.
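What distinguished the Boltzmann Machine from Hopfield’s model was that its units flip on and off stochastically, with a probability set by an “energy” and a temperature, which is what lets hidden layers be trained at all. The compact sketch below shows only that stochastic update step; the actual training procedure, and the weights, biases, and temperature used here, are omitted or invented for illustration.

```python
import math
import random

# A compact sketch of the stochastic unit update used in a Boltzmann
# Machine: each binary unit turns on with a probability determined by
# its energy gap and a "temperature." Training is not shown; the
# weights, biases, and temperature below are placeholders.

def energy_gap(weights, biases, state, i):
    """Energy difference between unit i being on versus off."""
    return biases[i] + sum(weights[i][j] * state[j]
                           for j in range(len(state)) if j != i)

def sample_step(weights, biases, state, temperature=1.0):
    """One sweep of Gibbs sampling: stochastically update every unit."""
    for i in range(len(state)):
        gap = energy_gap(weights, biases, state, i)
        p_on = 1.0 / (1.0 + math.exp(-gap / temperature))
        state[i] = 1 if random.random() < p_on else 0
    return state

if __name__ == "__main__":
    n = 4
    weights = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
    biases = [-0.25] * n
    state = [random.randint(0, 1) for _ in range(n)]
    for _ in range(10):
        state = sample_step(weights, biases, state)
    print(state)
```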
Sejnowski had missed the entire political debate over the perceptron. As a physics graduate student he had been outside the world of artificial intelligence in the late 1960s when Minsky and Papert made their attacks. Yet he had read the original Perceptrons book, and he had loved it for its beautiful geometric insights. He had basically ignored the argument that the perceptron approach would not generalize to multilayer systems. Now he was able to prove them wrong.
Hinton and Sejnowski had developed an alternative model, but they needed to prove its power against the rule-based systems popular at the time. During the summer, with the help of a graduate student, Sejnowski settled on a language problem to demonstrate the new technique: training his neural net to pronounce English text, as an alternative to a rule-based approach. At the time he had no experience in linguistics, so he went to the school library and checked out a textbook that was a large compendium of pronunciation rules. The book documented the incredibly complex set of rules and exceptions required to pronounce English correctly.
Halfway through their work on a neural network able to learn to pronounce English correctly, Hinton came to Baltimore for a visit. He was skeptical.
“This probably won’t work,” he said. “English is an incredibly complicated language and your simple network won’t be able to absorb it.”
So they decided to begin with a subset of the language. They went to the library again and found a children’s book with a very small set of words. They brought up the network and set it to work absorbing the language in the children’s book. Spookily, within an hour it began to work. At first the sounds it generated were gibberish, like the babbling of an infant, but as it was trained it improved continuously. Initially it got a couple of words correct, and then it kept improving until its pronunciation was nearly perfect. It learned both the general rules and the special cases.
They went back to the library and got another linguistics text containing, on one side of each page, a transcription of a story told by a fifth grader about what school was like and a trip to his grandmother’s house; on the other side, a phonologist had transcribed the actual sounds for each word. It was a perfect teacher for their artificial neurons, so they ran that information through the neural network. It was a relatively small corpus, but the network began to speak just like the fifth grader. The researchers were amazed and their appetite was whetted.
Next they got a copy of a twenty-thousand-word dictionary and decided to see how far they could push their prototype neural network. This time they let the program run for a week on what was a powerful computer for its day, a Digital Equipment Corporation VAX minicomputer. It learned and learned, and ultimately it was able to pronounce new words it had never seen before, and to pronounce them remarkably well.
They called the program NETtalk. It was built out of three hundred simulated circuits they called neurons, arranged in three layers: an input layer to capture the words, an output layer to generate the speech sounds, and a “hidden layer” connecting the two. The neurons were connected to one another by eighteen thousand “synapses,” links with numeric values that could be represented as weights. If these simple networks could “learn” to hear, see, speak, and generally mimic the range of things that humans do, they were obviously a powerful new direction for both artificial intelligence and augmentation.
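The architecture just described, roughly three hundred units in three layers joined by about eighteen thousand weighted links, can be sketched in a few lines of Python. The layer sizes below are chosen only to roughly match those figures, and the random weights and single forward pass shown are illustrative; this is not Sejnowski and Hinton’s actual program or its training procedure.

```python
import math
import random

# An illustrative three-layer network in the spirit of the NETtalk
# description: an input layer, a "hidden layer," and an output layer
# joined by weighted links. Sizes and weights are placeholders; only a
# forward pass is shown, not the original learning procedure.

def make_layer(n_inputs, n_outputs):
    """Random weights (plus a bias per unit) connecting two layers."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_outputs)]

def forward(layer, inputs):
    """Weighted sums pushed through a sigmoid squashing function."""
    outputs = []
    for weights in layer:
        total = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

if __name__ == "__main__":
    # A window of text encoded as input units, a hidden layer, and
    # output units standing in for speech-sound features.
    input_units, hidden_units, output_units = 203, 80, 26
    hidden_layer = make_layer(input_units, hidden_units)
    output_layer = make_layer(hidden_units, output_units)

    window = [random.choice([0.0, 1.0]) for _ in range(input_units)]
    phoneme_features = forward(output_layer, forward(hidden_layer, window))
    print(len(phoneme_features), "output activations")
```

With untrained random weights the output is meaningless babble, which is exactly where the real system started before its eighteen thousand weights were adjusted on example pronunciations.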
After the success of NETtalk, Sejnowski’s and Hinton’s careers diverged. Sejnowski moved to California and joined the Salk Institute, where his research focused on theoretical problems in neuroscience. In exploring the brain he became a deep believer in the power of diversity as a basic principle of biology, a fundamental divergence from the way modern digital computing evolved. Hinton joined the computer science department at the University of Toronto, and over the next two decades he would develop the original Boltzmann Machine approach. From the initial supervised model, he found ways to add unsupervised (automatic) learning. The Internet became a godsend, providing vast troves of data in the form of crowd-sourced images, videos, and snippets of speech, both labeled and unlabeled. The advances would eventually underpin a dramatic new tool for companies like Google, Microsoft, and Apple that were anxious to deploy Internet services based on vision, speech, and pattern recognition.
The complete reversal of the perceptron’s fate was also due in part to a clever public relations campaign, years in the making. Before Sejnowski and Hinton’s first encounter in San Diego, a cerebral young French student, Yann LeCun, had stumbled across Seymour Papert’s dismissive discussion of the perceptron, and it sparked his interest. After reading the account, LeCun headed to the library to learn everything he could about machines that were capable of learning. The son of an aerospace engineer, he had grown up tinkering with aviation hardware and was steeped in electronics before going to college. He would have studied astrophysics, but he enjoyed hacking too much. He read the entire literature on the perceptron going back to the fifties and concluded that almost no one was working on the subject in the early 1980s. It was the heyday of expert systems, and no one was writing about neural networks.
In Europe his journey began as a lonely crusade. As an undergraduate he studied electrical engineering, and he began his Ph.D. work with an advisor who knew nothing about the topic he planned to pursue. Then, shortly after he began his graduate studies, he stumbled across an obscure article on Boltzmann Machines by Hinton and Sejnowski. “I’ve got to talk to these guys!” he thought to himself. “They are the only people who seem to understand.”
Serendipitously, they were able to meet in the winter of 1985 in the French Alps, at a scientific conference on the convergence of ideas in physics and neuroscience. Hopfield Networks, which served as an early model for human memory, had sparked a new academic community of interest. Although Sejnowski attended the meeting, he actually missed LeCun’s talk. It was the first time the young French scientist had presented in English, and he had been terrified, mostly because a Bell Laboratories physicist at the conference often arrogantly shot down each talk with criticisms. The people LeCun was sitting next to told him that was the Bell Labs style: either the ideas were subpar, or the laboratory’s scientists had already thought of them. To his shock, when he gave his talk in broken English, the Bell Labs scientist stood up and endorsed it. A year later, Bell Labs offered LeCun a job.