*
“Organized complexity” proved to be a constructive way of thinking about urban life, but Jacobs’s book was a work of social theory, not science. Was it possible to model and explain the behavior of self-organizing systems using more rigorous methods? Could the developing technology of digital computing be usefully applied to this problem? Partially thanks to Shannon’s work in the late forties, the biological sciences had made a number of significant breakthroughs in understanding pattern recognition and feedback by the time Jacobs published her masterpiece. Shortly after his appointment to the Harvard faculty in 1956, the entomologist Edward O. Wilson convincingly proved that ants communicate with one another—and coordinate overall colony behavior—by recognizing patterns in pheromone trails left by fellow ants, not unlike the cyclic AMP signals of the slime mold. At the Free University of Brussels in the fifties, Ilya Prigogine was making steady advances in his understanding of nonequilibrium thermodynamics, environments where the laws of entropy are temporarily overcome, and higher-level order may spontaneously emerge out of underlying chaos. And at MIT’s Lincoln Laboratory, a twenty-five-year-old researcher named Oliver Selfridge was experimenting with a model for teaching a computer how to learn.
There is a world of difference between a computer that passively receives the information you supply and a computer that actively learns on its own. The very first generation of computers such as ENIAC had processed information fed to them by their masters, and they had been capable of performing various calculations with that data, based on the instruction sets programmed into them. This was a startling enough development at a time when “computer” meant a person with a slide rule and an eraser. But even in those early days, the digital visionaries had imagined a machine capable of more open-ended learning. Turing and Shannon had argued over the future musical tastes of the “electronic brain” during lunch hour at Bell Labs, while their colleague Norbert Wiener had written a bestselling paean to the self-regulatory powers of feedback in his 1948 manifesto Cybernetics.
“Mostly my participation in all of this is a matter of good luck for me,” Selfridge says today, sitting in his cramped, windowless MIT office. Born in England, Selfridge enrolled at Harvard at the age of fifteen and started his doctorate three years later at MIT, where Norbert Wiener was his dissertation adviser. As a precocious twenty-one-year-old, Selfridge suggested a few corrections to a paper that his mentor had published on heart flutters, corrections that Wiener graciously acknowledged in the opening pages of Cybernetics. “I think I now have the honor of being one of the few living people mentioned in that book,” Selfridge says, laughing.
After a sojourn working on military control projects in New Jersey, Selfridge returned to MIT in the midfifties. His return coincided with an explosion of interest in artificial intelligence (AI), a development that introduced him to a then-junior fellow at Harvard named Marvin Minsky. “My concerns in AI,” Selfridge says now, “were not so much the actual processing as they were in how systems change, how they evolve—in a word, how they learn.” Exploring the possibilities of machine learning brought Selfridge back to memories of his own education in England. “At school in England I had read John Milton’s Paradise Lost,” he says, “and I’d been struck by the image of Pandemonium—it’s Greek for ‘all the demons.’ Then after my second son, Peter, was born, I went over Paradise Lost again, and the shrieking of the demons awoke something in me.” The pattern recognizer in Selfridge’s brain had hit upon a way of teaching a computer to recognize patterns.
“We are proposing here a model of a process which we claim can adaptively improve itself to handle certain pattern-recognition problems which cannot be adequately specified in advance.” These were the first words Selfridge delivered at a symposium in late 1958, held at the very same National Physical Laboratory from which Turing had escaped a decade before. Selfridge’s presentation had the memorable title “Pandemonium: A Paradigm for Learning,” and while it had little impact outside the nascent computer-science community, the ideas Selfridge outlined that day would eventually become part of our everyday life—each time we enter a name in our PalmPilots or use voice-recognition software to ask for information over the phone. Pandemonium, as Selfridge outlined it in his talk, was not so much a specific piece of software as it was a way of approaching a problem. The problem was an ambitious one, given the limited computational resources of the day: how to teach a computer to recognize patterns that were ill-defined or erratic, like the sound waves that comprise spoken language.
The brilliance of Selfridge’s new paradigm lay in the fact that it relied on a distributed, bottom-up intelligence, and not a unified, top-down one. Rather than build a single smart program, Selfridge created a swarm of limited miniprograms, which he called demons. “The idea was, we have a bunch of these demons shrieking up the hierarchy,” he explains. “Lower-level demons shrieking to higher-level demons shrieking to higher ones.”
To understand what that “shrieking” means, imagine a system with twenty-six individual demons, each trained to recognize a letter of the alphabet. The pool of demons is shown a series of words, and each demon “votes” as to whether each letter displayed represents its chosen letter. If the first letter is a, the a-recognizing demon reports that it is highly likely that it has recognized a match. Because of the similarities in shape, the o-recognizer might report a possible match, while the b-recognizer would emphatically declare that the letter wasn’t intelligible to it. All the letter-recognizing demons would report to a master demon, who would tally up the votes for each letter and choose the demon that expressed the highest confidence. Then the software would move on to the next letter in the sequence, and the process would begin again. At the end of the transmission, the master demon would have a working interpretation of the text that had been transmitted, based on the assembled votes of the demon democracy.
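To make the arithmetic of that demon democracy concrete, here is a minimal sketch in Python. The particular demons, their feature sets, and their confidence scores are all hypothetical stand-ins; Selfridge’s demons derived their confidence from lower-level reports rather than hand-coded rules.

```python
# A toy demon democracy: each letter demon reports a confidence that the
# features it is shown match its letter, and the master demon picks the
# loudest shriek. The features and scores are illustrative stand-ins.

def master_demon(features, demons):
    """Return the letter whose demon votes with the highest confidence."""
    votes = {letter: demon(features) for letter, demon in demons.items()}
    return max(votes, key=votes.get)

# Each demon maps a crude feature description to a confidence score.
demons = {
    "a": lambda f: 0.9 if "closed loop" in f else 0.1,
    "o": lambda f: 0.7 if "closed loop" in f else 0.1,
    "b": lambda f: 0.9 if {"vertical line", "closed loop"} <= f else 0.05,
}

print(master_demon({"closed loop"}, demons))  # "a" wins the vote
```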
Of course, the accuracy of that interpretation depended on the accuracy of the letter recognizers. If you were trying to teach a computer how to read, it was cheating to assume from the outset that you could find twenty-six accurate letter recognizers. Selfridge was after a larger goal: How do you teach a machine to recognize letters—or vowel sounds, minor chords, fingerprints—in the first place? The answer involved adding another layer of demons, and a feedback mechanism whereby the various demon guesses could be graded. This lower level was populated by even less sophisticated miniprograms, trained only to recognize raw physical shapes (or sounds, in the case of Morse code or spoken language). Some demons recognized parallel lines, others perpendicular ones. Some demons looked for circles, others for dots. None of these shapes were associated with any particular letter; these bottom-dwelling demons were like two-year-old children—capable of reporting on the shapes they witnessed, but not perceiving them as letters or words.
Using these minimally equipped demons, the system could be trained to recognize letters, without “knowing” anything about the alphabet in advance. The recipe was relatively simple: Present the letter b to the bottom-level demons, and see which ones respond, and which ones don’t. In the case of the letter b, the vertical-line recognizers might respond, along with the circle recognizers. Those lower-level demons would report to a letter-recognizer one step higher in the chain. Based on the information gathered from its lieutenants, that recognizer would make a guess as to the letter’s identity. Those guesses are then “graded” by the software. If the guess is wrong, the software learns to dissociate those particular lieutenants from the letter in question; if the guess happens to be right, it strengthens the connection between the lieutenants and the letter.
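A rough sketch of that grading loop, assuming a simple additive weight between each shape demon and each letter; the update rule and learning rate are illustrative guesses, not Selfridge’s exact scheme.

```python
import random

# Grading step: each letter keeps a weight for every shape-demon
# "lieutenant." Right guesses strengthen the active links; wrong
# guesses weaken them. Weights start random: the system "knows"
# nothing about the alphabet in advance.

SHAPES = ["vertical line", "horizontal line", "circle", "dot"]
LETTERS = ["a", "b", "c"]

weights = {l: {s: random.random() for s in SHAPES} for l in LETTERS}

def guess(active_shapes):
    """Sum the weights of the shape demons that fired; pick the best letter."""
    return max(LETTERS, key=lambda l: sum(weights[l][s] for s in active_shapes))

def grade(active_shapes, true_letter, rate=0.1):
    """Strengthen or weaken links between the active demons and the guess."""
    g = guess(active_shapes)
    for s in active_shapes:
        if g == true_letter:
            weights[g][s] += rate            # reinforce a correct guess
        else:
            weights[g][s] -= rate            # dissociate a wrong guess
            weights[true_letter][s] += rate  # nudge the right letter up

# Train on the letter b: a vertical line plus a circle.
for _ in range(1000):
    grade(["vertical line", "circle"], "b")
print(guess(["vertical line", "circle"]))    # "b", after enough grading
```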
The results are close to random at first, but if you repeat the process a thousand times, or ten thousand, the system learns to associate specific assemblies of shape-recognizers with specific letters and soon enough is capable of translating entire sentences with remarkable accuracy. The system doesn’t come with any predefined conceptions about the shapes of letters—you train the system to associate letters with specific shapes in the grading phase. (This is why handwriting-recognition software can adapt to so many different types of penmanship, but can’t adapt to penmanship that changes day to day.) That mix of random beginnings organizing into more complicated results reminded Selfridge of another process, whose own underlying code was just then being deciphered in the form of DNA. “The scheme sketched is really a natural selection on the processing demons,” Selfridge explained. “If they serve a useful function they survive and perhaps are even the source for other subdemons who are themselves judged on their merits. It is perfectly reasonable to conceive of this taking place on a broader scale … instead of having but one Pandemonium we might have some crowd of them, all fairly similarly constructed, and employ natural selection on the crowd of them.”
The system Selfridge described—with its bottom-up learning, and its evaluating feedback loops—belongs in the history books as the first practical description of an emergent software program. The world now swarms with millions of his demons.
*
Among the students at MIT in the late forties was a transplanted midwesterner named John Holland. Holland was also a pupil of Norbert Wiener’s, and he spent a great deal of his undergraduate years stealing time on the early computer prototypes being built in Cambridge at that time. His unusual expertise at computer programming led IBM to hire him in the fifties to help develop their first commercial calculator, the 701. As a student of Wiener’s, he was naturally inclined to experiment with ways to make the sluggish 701 machine learn in a more organic, bottom-up fashion—not unlike Selfridge’s Pandemonium—and Holland and a group of like-minded colleagues actually programmed a crude simulation of neurons interacting. But IBM was in the business of selling adding machines then, and so Holland’s work went largely ignored and underfunded. After a few years Holland returned to academia to get his doctorate at the University of Michigan, where the Logic of Computers Group had just been formed.
In the sixties, after graduating as the first computer science Ph.D. in the country, Holland began a line of inquiry that would dominate his work for the rest of his life. Like Turing, Holland wanted to explore the way simple rules could lead to complex behavior; like Selfridge, he wanted to create software that would be capable of open-ended learning. Holland’s great breakthrough was to harness the forces of another bottom-up, open-ended system: natural selection. Building on Selfridge’s Pandemonium model, Holland took the logic of Darwinian evolution and built it into code. He called his new creation the genetic algorithm.
A traditional software program is a series of instructions that tells the computer what to do: paint the screen with red pixels, multiply a set of numbers, delete a file. Usually those instructions are encoded as a series of branching paths: do this first, and if you get result A, do one thing; if you get result B, do another thing. The art of programming lay in figuring out how to construct the most efficient sequence of instructions, the sequence that would get the most done with the shortest amount of code—and with the least likelihood of a crash. Normally that was done using the raw intellectual firepower of the programmer’s mind. You thought about the problem, sketched out the best solution, fed it into the computer, evaluated its success, and then tinkered with it to make it better. But Holland imagined another approach: set up a gene pool of possible software and let successful programs evolve out of the soup.
Holland’s system revolved around a series of neat parallels between computer programs and earth’s life-forms. Each depends on a master code for its existence: the zeros and ones of computer programming, and the coiled strands of DNA lurking in all of our cells (usually called the genotype). Those two kinds of codes dictate some kind of higher-level form or behavior (the phenotype): growing red hair or multiplying two numbers together. With DNA-based organisms, natural selection works by creating a massive pool of genetic variation, then evaluating the success rate of the assorted behaviors unleashed by all those genes. Successful variations get passed down to the next generation, while unsuccessful ones disappear. Sexual reproduction ensures that innovative combinations of genes find each other. Occasionally, random mutations appear in the gene pool, introducing completely new avenues for the system to explore. Run through enough cycles, and you have a recipe for engineering masterworks like the human eye—without a bona fide engineer in sight.
The genetic algorithm was an attempt to capture that process in silicon. Software already has a genotype and a phenotype, Holland recognized; there’s the code itself, and then there’s what the code actually does. What if you created a gene pool of different code combinations, then evaluated the success rate of the phenotypes, eliminating the least successful strands? Natural selection relies on a brilliantly simple, but somewhat tautological, criterion for evaluating success: your genes get to pass on to the next generation if you survive long enough to produce a next generation. Holland decided to make that evaluation step more precise: his programs would be admitted to the next generation if they did a better job of accomplishing a specific task—doing simple math, say, or recognizing patterns in visual images. The programmer could decide what the task was; he or she just couldn’t directly instruct the software how to accomplish it. He or she would set up the parameters that defined genetic fitness, then let the software evolve on its own.
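A minimal genetic algorithm in this spirit might look like the following sketch. The fitness task here (matching a target bit string), the population size, and the mutation and crossover details are all arbitrary illustrative choices, not Holland’s own; the one faithful constraint is that the programmer supplies only the fitness test, never the solution.

```python
import random

# A minimal genetic algorithm: the genotype is a bit string, the
# phenotype is whatever behavior gets scored, and selection, crossover,
# and mutation do the rest. The target-matching task is illustrative.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

def fitness(genome):
    """Count the positions where the genome matches the target."""
    return sum(g == t for g, t in zip(genome, TARGET))

def crossover(mom, dad):
    """Sexual reproduction: splice two parent genomes at a random cut."""
    cut = random.randrange(len(mom))
    return mom[:cut] + dad[cut:]

def mutate(genome, rate=0.01):
    """Occasionally flip a bit, opening new avenues for the system."""
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(100)]

for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    survivors = population[:50]              # the fittest half reproduce
    children = [mutate(crossover(random.choice(survivors),
                                 random.choice(survivors)))
                for _ in range(50)]
    population = survivors + children

print(generation, population[0])             # generations needed, best genome
```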
Holland developed his ideas in the sixties and seventies using mostly paper and pencil—even the more advanced technology of that era was far too slow to churn through the thousandfold generations of evolutionary time. But the massively parallel, high-speed computers introduced in the eighties—such as Danny Hillis’s Connection Machine—were ideally suited for exploring the powers of the genetic algorithm. And one of the most impressive GA systems devised for the Connection Machine focused exclusively on simulating the behavior of ants.
It was a program called Tracker, designed in the mideighties by two UCLA professors, David Jefferson and Chuck Taylor. (Jefferson was in the computer science department, while Taylor was a biologist.) “I got the idea from reading Richard Dawkins’s first book, The Selfish Gene,” Jefferson says today. “That book really transformed me. He makes the point that in order to watch Darwinian evolution in action, all you need are objects that are capable of reproducing themselves, and reproducing themselves imperfectly, and having some sort of resource limitation so that there’s competition. And nothing else matters—it’s a very tiny, abstract axiom that is required to make evolution work. And so it occurred to me that programs have those properties—programs can reproduce themselves. Except that they usually reproduce themselves exactly. But I recognized that if there was a way to have them reproduce imperfectly, and if you had not just one program but a whole population of them, then you could simulate evolution with the software instead of organisms.”
After a few small-scale experiments, Jefferson and Taylor decided to simulate the behavior of ants learning to follow a pheromone trail. “Ants were on my mind—I was looking for simple creatures, and E. O. Wilson’s opus on ants had just come out,” Jefferson explains. “What we were really looking for was a simple task that simple creatures perform where it wasn’t obvious how to make a program do it. Somehow we came up with the idea of following a trail—and not just a clean trail, a noisy trail, a broken trail.” The two scientists created a virtual grid of squares, drawing a meandering path of eighty-two squares across it. Their goal was to evolve a simple program, a virtual ant, that could navigate the length of the path in a finite amount of time, using only limited information about the path’s twists and turns. At each cycle, an ant had the option of “sniffing” the square ahead of him, advancing forward one square, or turning right or left ninety degrees. Jefferson and Taylor gave their ants one hundred cycles to navigate the path; once an ant used up his hundred cycles, the software tallied up the number of squares on the trail he had successfully landed on and gave him a score. An ant that lost his way after square one would be graded 1; an ant that successfully completed the trail before the hundred cycles were up would get a perfect score, 82.
The scoring system allowed Jefferson and Taylor to create fitness criteria that determined which ants were allowed to reproduce. Tracker began by simulating sixteen thousand ants—one for each of the Connection Machine’s processors—with sixteen thousand more or less random strategies for trail navigation. One ant might begin with the strategy of marching straight across the grid; another by switching back and forth between ninety-degree rotations and sniffings; another following more baroque rules. The great preponderance of these strategies would be complete disasters, but a few would, by sheer chance, let an ant stumble across a larger portion of the trail. Those more successful ants would be allowed to mate and reproduce, creating a new generation of sixteen thousand ants ready to tackle the trail.
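A toy version of that evolutionary loop appears below. The real Tracker genomes encoded finite-state machines with internal memory; here, as a deliberate simplification, a genome is just two rules keyed on whether the ant smells trail ahead, sniffing is folded into every cycle rather than costing one, and the trail is a tiny stand-in for the eighty-two-square original.

```python
import random

# A toy Tracker: run each ant for a fixed number of cycles, score it by
# the distinct trail squares it lands on, then let the fitter half of
# the population survive and spawn mutated offspring.

ACTIONS = ["forward", "left", "right"]
TRAIL = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 2), (4, 2)}
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # east, south, west, north

def score(genome, cycles=100):
    """Run one ant for its allotted cycles; count trail squares hit."""
    x, y, h = 0, 0, 0                          # start on the trail, facing east
    visited = {(0, 0)}
    for _ in range(cycles):
        ahead = (x + HEADINGS[h][0], y + HEADINGS[h][1])
        action = genome[ahead in TRAIL]        # sniff ahead, then act
        if action == "forward":
            x, y = ahead
            visited.add((x, y))
        elif action == "left":
            h = (h - 1) % 4
        else:
            h = (h + 1) % 4
    return len(visited & TRAIL)

# Sixteen thousand ants is overkill for this toy; one hundred will do.
population = [{True: random.choice(ACTIONS), False: random.choice(ACTIONS)}
              for _ in range(100)]
for _ in range(30):
    ranked = sorted(population, key=score, reverse=True)
    survivors = ranked[:50]                    # fitter ants reproduce
    mutants = [{k: random.choice(ACTIONS) if random.random() < 0.2 else v
                for k, v in p.items()} for p in survivors]
    population = survivors + mutants

print(score(max(population, key=score)))       # best ant's trail coverage
```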
The path—dubbed the John Muir Trail after the famous environmentalist—began with a relatively straightforward section, with a handful of right-hand turns and longer straight sections, then steadily grew more complicated. Jefferson says now that he designed it that way because he was worried that early generations would be so incompetent that a more challenging path would utterly confound them. “You have to remember that we had no idea when we started this experiment whether sixteen thousand was anywhere near a large enough population to see Darwinian evolution,” he explains. “And I didn’t know if it was going to take ten generations, or one hundred generations, or ten thousand generations. There was no theory to guide us quantitatively about either the size of the population in space or the length of the experiment in time.”