by Ray Kurzweil
In the novel usr/bin/god, Cory Doctorow, a leading science-fiction writer, uses an intriguing variation of a GA to evolve an AI. The GA generates a large number of intelligent systems based on various intricate combinations of techniques, with each combination characterized by its genetic code. These systems then evolve using a GA.
The evaluation function works as follows: each system logs on to various human chat rooms and tries to pass for a human, basically a covert Turing test. If one of the humans in a chat room says something like “What are you, a chatterbot?” (chatterbot meaning an automatic program, which at today’s level of development is expected to not understand language at a human level), the evaluation is over, that system ends its interactions, and reports its score to the GA. The score is determined by how long it was able to pass for human without being challenged in this way. The GA evolves more and more intricate combinations of techniques that are increasingly capable of passing for human.
The main difficulty with this idea is that the evaluation function is fairly slow, although it will take an appreciable amount of time only after the systems are reasonably intelligent. Also, the evaluations can take place largely in parallel. It’s an interesting idea and may actually be a useful method to finish the job of passing the Turing test, once we get to the point where we have sufficiently sophisticated algorithms to feed into such a GA, so that evolving a Turing-capable AI is feasible.
Recursive Search. Often we need to search through a vast number of combinations of possible solutions to solve a given problem. A classic example is in playing games such as chess. As a player considers her next move, she can list all of her possible moves, and then, for each such move, all possible countermoves by the opponent, and so on. It is difficult, however, for human players to keep a huge “tree” of move-countermove sequences in their heads, and so they rely on pattern recognition—recognizing situations based on prior experience—whereas machines use logical analysis of millions of moves and countermoves.
Such a logical tree is at the heart of most game-playing programs. Consider how this is done. We construct a program called Pick Best Next Step to select each move. Pick Best Next Step starts by listing all of the possible moves from the current state of the board. (If the problem was solving a mathematical theorem, rather than game moves, the program would list all of the possible next steps in a proof.) For each move the program constructs a hypothetical board that reflects what would happen if we made this move. For each such hypothetical board, we now need to consider what our opponent would do if we made this move. Now recursion comes in, because Pick Best Next Step simply calls Pick Best Next Step (in other words, itself) to pick the best move for our opponent. In calling itself, Pick Best Next Step then lists all of the legal moves for our opponent.
The program keeps calling itself, looking ahead as many moves as we have time to consider, which results in the generation of a huge move-countermove tree. This is another example of exponential growth, because to look ahead an additional move (or countermove) requires multiplying the amount of available computation by about five. Key to the success of the recursive formula is pruning this huge tree of possibilities and ultimately stopping its growth. In the game context, if a board looks hopeless for either side, the program can stop the expansion of the move-countermove tree from that point (called a “terminal leaf” of the tree) and consider the most recently considered move to be a likely win or loss. When all of these nested program calls are completed, the program will have determined the best possible move for the current actual board within the limits of the depth of recursive expansion that it had time to pursue and the quality of its pruning algorithm. (For an algorithmic description of recursive search, see this note:177)
The recursive formula is often effective at mathematics. Rather than game moves, the “moves” are the axioms of the field of math being addressed, as well as previously proved theorems. The expansion at each point is the possible axioms (or previously proved theorems) that can be applied to a proof at each step. (This was the approach used by Newell, Shaw, and Simons’s General Problem Solver.)
From these examples it may appear that recursion is well suited only for problems in which we have crisply defined rules and objectives. But it has also shown promise in computer generation of artistic creations. For example, a program I designed called Ray Kurzweil’s Cybernetic Poet uses a recursive approach.178 The program establishes a set of goals for each word—achieving a certain rhythmic pattern, poem structure, and word choice that is desirable at that point in the poem. If the program is unable to find a word that meets these criteria, it backs up and erases the previous word it has written, reestablishes the criteria it had originally set for the word just erased, and goes from there. If that also leads to a dead end, it backs up again, thus moving backward and forward. Eventually, it forces itself to make up its mind by relaxing some of the constraints if all paths lead to dead ends.
“Thinking Machines 2” by mathematician Martin Wattenberg with Marek Walczak displays the move-countermove sequences it is evaluating as it considers its next move.
Deep Fritz Draws: Are Humans Getting Smarter, or Are Computers Getting Stupider?
We find one example of qualitative improvements in software in the world of computer chess, which, according to common wisdom, is governed only by the brute-force expansion of computer hardware. In a chess tournament in October 2002 with top-ranking human player Vladimir Kramnik, the Deep Fritz software achieved a draw. I point out that Deep Fritz has available only about 1.3 percent of the brute-force computation as the previous computer champion, Deep Blue. Despite that, it plays chess at about the same level because of its superior pattern recognition—based pruning algorithm (see below). In six years a program like Deep Fritz will again achieve Deep Blue’s ability to analyze two hundred million board positions per second. Deep Fritz—like chess programs running on ordinary personal computers will routinely defeat all humans later in this decade.
In The Age of Intelligent Machines, which I wrote between 1986 and 1989, I predicted that a computer would defeat the human world chess champion by the end of the 1990s. I also noted that computers were gaining about forty-five points per year in their chess ratings, whereas the best human playing was essentially fixed, so this projected a crossover point in 1998. Indeed, Deep Blue did defeat Gary Kasparov in a highly publicized tournament in 1997.
Yet in the Deep Fritz–Kramnik match, the current reigning computer program was able to achieve only a tie. Five years had passed since Deep Blue’s victory, so what are we to make of this situation? Should we conclude that:
1. Humans are getting smarter, or at least better at chess?
2. Computers are getting worse at chess? If so, should we conclude that the much-publicized improvement in computer speed over the past five years was not all it was cracked up to be? Or, that computer software is getting worse, at least in chess?
The Specialized-Hardware Advantage
Neither of the above conclusions is warranted. The correct conclusion is that software is getting better because Deep Fritz essentially matched the performance of Deep Blue, yet with far smaller computational resources. To gain some insight into these questions, we need to examine a few essentials. When I wrote my predictions of computer chess in the late 1980s, Carnegie Mellon University was embarked on a program to develop specialized chips for conducting the “minimax” algorithm (the standard game-playing method that relies on building trees of move-countermove sequences, and then evaluating the terminal-leaf positions of each branch of the tree) specifically for chess moves.
Based on this specialized hardware CMU’s 1988 chess machine, HiTech, was able to analyze 175,000 board positions per second. It achieved a chess rating of 2,359, only about 440 points below the human world champion.
A year later, in 1989, CMU’s Deep Thought machine increased this capacity to one million board positions per second and achieved a rating of 2,400. IBM eventually took over the project and ren
amed it Deep Blue but kept the basic CMU architecture. The version of Deep Blue that defeated Kasparov in 1997 had 256 special-purpose chess processors working in parallel, which analyzed two hundred million board positions per second.
It is important to note the use of specialized hardware to accelerate the specific calculations needed to generate the minimax algorithm for chess moves. It’s well known to computer-systems designers that specialized hardware can generally implement a specific algorithm at least one hundred times faster than a general-purpose computer. Specialized ASICs (application-specific integrated circuits) require significant development efforts and costs, but for critical calculations that are needed on a repetitive basis (for example, decoding MP3 files or rendering graphics primitives for video games), this expenditure can be well worth the investment.
Deep Blue Versus Deep Fritz
Because there had always been a great deal of focus on the milestone of a computer’s being able to defeat a human opponent, support was available for investing in special-purpose chess circuits. Although there was some lingering controversy regarding the parameters of the Deep Blue–Kasparov match, the level of interest in computer chess waned considerably after 1997. After all, the goal had been achieved, and there was little point in beating a dead horse. IBM canceled work on the project, and there has been no work on specialized chess chips since that time. The focus of research in the various domains spun out of AI has been placed instead on problems of greater consequence, such as guiding airplanes, missiles, and factory robots, understanding natural language, diagnosing electrocardiograms and blood-cell images, detecting credit-card fraud, and a myriad of other successful narrow AI applications.
Computer hardware has nonetheless continued its exponential increase, with personal-computer speeds doubling every year since 1997. Thus the general-purpose Pentium processors used by Deep Fritz are about thirty-two times faster than processors in 1997. Deep Fritz uses a network of only eight personal computers, so the hardware is equivalent to 256 1997-class personal computers. Compare that to Deep Blue, which used 256 specialized chess processors, each of which was about one hundred times faster than 1997 personal computers (of course, only for computing chess minimax). So Deep Blue was 25,600 times faster than a 1997 PC and one hundred times faster than Deep Fritz. This analysis is confirmed by the reported speeds of the two systems: Deep Blue can analyze 200 million board positions per second compared to only about 2.5 million for Deep Fritz.
Significant Software Gains
So what can we say about the software in Deep Fritz? Although chess machines are usually referred to as examples of brute-force calculation, there is one important aspect of these systems that does require qualitative judgment. The combinatorial explosion of possible move-countermove sequences is rather formidable.
In The Age of Intelligent Machines I estimated that it would take about forty billion years to make one move if we failed to prune the move-countermove tree and attempted to make a “perfect” move in a typical game. (Assuming about thirty moves each in a typical game and about eight possible moves per play, we have 830 possible move sequences; analyzing one billion move sequences per second would take 1018 seconds or forty billion years.) Thus a practical system needs to continually prune away unpromising lines of action. This requires insight and is essentially a pattern-recognition judgment.
Humans, even world-class chess masters, perform the minimax algorithm extremely slowly, generally performing less than one move-countermove analysis per second. So how is it that a chess master can compete at all with computer systems? The answer is that we possess formidable powers of pattern recognition, which enable us to prune the tree with great insight.
It’s precisely in this area that Deep Fritz has improved considerably over Deep Blue. Deep Fritz has only slightly more computation available than CMU’s Deep Thought yet is rated almost 400 points higher.
Are Human Chess Players Doomed?
Another prediction I made in The Age of Intelligent Machines was that once computers did perform as well or better as humans in chess, we would either think more of computer intelligence, less of human intelligence, or less of chess, and that if history is a guide, the last of these would be the likely outcome. Indeed, that is precisely what happened. Soon after Deep Blue’s victory we began to hear a lot about how chess is really just a simple game of calculating combinations and that the computer victory just demonstrated that it was a better calculator.
The reality is slightly more complex. The ability of humans to perform well in chess is clearly not due to our calculating prowess, at which we are in fact rather poor. We use instead a quintessentially human form of judgment. For this type of qualitative judgment, Deep Fritz represents genuine progress over earlier systems. (Incidentally, humans have made no progress in the last five years, with the top human scores remaining just below 2,800. As of 2004, Kasparov is rated at 2,795 and Kramnik at 2,794.)
Where do we go from here? Now that computer chess is relying on software running on ordinary personal computers, chess programs will continue to benefit from the ongoing acceleration of computer power. By 2009 a program like Deep Fritz will again achieve Deep Blue’s ability to analyze two hundred million board positions per second. With the opportunity to harvest computation on the Internet, we will be able to achieve this potential several years sooner than 2009. (Internet harvesting of computers will require more ubiquitous broadband communication, but that’s coming, too.)
With these inevitable speed increases, as well as ongoing improvements in pattern recognition, computer chess ratings will continue to edge higher. Deep Fritz–like programs running on ordinary personal computers will routinely defeat all humans later in this decade. Then we’ll really lose interest in chess.
Combining Methods. The most powerful approach to building robust AI systems is to combine approaches, which is how the human brain works. As we discussed, the brain is not one big neural net but instead consists of hundreds of regions, each of which is optimized for processing information in a different way. None of these regions by itself operates at what we would consider human levels of performance, but clearly by definition the overall system does exactly that.
I’ve used this approach in my own AI work, especially in pattern recognition. In speech recognition, for example, we implemented a number of different pattern-recognition systems based on different paradigms. Some were specifically programmed with knowledge of phonetic and linguistic constraints from experts. Some were based on rules to parse sentences (which involves creating sentence diagrams showing word usage, similar to the diagrams taught in grade school). Some were based on self-organizing techniques, such as Markov models, trained on extensive libraries of recorded and annotated human speech. We then programmed a software “expert manager” to learn the strengths and weaknesses of the different “experts” (recognizers) and combine their results in optimal ways. In this fashion, a particular technique that by itself might produce unreliable results can nonetheless contribute to increasing the overall accuracy of the system.
There are many intricate ways to combine the varied methods in AI’s toolbox. For example, one can use a genetic algorithm to evolve the optimal topology (organization of nodes and connections) for a neural net or a Markov model. The final output of the GA-evolved neural net can then be used to control the parameters of a recursive search algorithm. We can add in powerful signal- and image-processing techniques that have been developed for pattern-processing systems. Each specific application calls for a different architecture. Computer science professor and AI entrepreneur Ben Goertzel has written a series of books and articles that describe strategies and architectures for combining the diverse methods underlying intelligence. His Novamente architecture is intended to provide a framework for general-purpose AI.179
The above basic descriptions provide only a glimpse into how increasingly sophisticated current AI systems are designed. It’s beyond the scope of this book to provide a comprehensive d
escription of the techniques of AI, and even a doctoral program in computer science is unable to cover all of the varied approaches in use today.
Many of the examples of real-world narrow AI systems described in the next section use a variety of methods integrated and optimized for each particular task. Narrow AI is strengthening as a result of several concurrent trends: continued exponential gains in computational resources, extensive real-world experience with thousands of applications, and fresh insights into how the human brain makes intelligent decisions.
A Narrow AI Sampler
When I wrote my first AI book, The Age of Intelligent Machines, in the late 1980s, I had to conduct extensive investigations to find a few successful examples of AI in practice. The Internet was not yet prevalent, so I had to go to real libraries and visit the AI research centers in the United States, Europe, and Asia. I included in my book pretty much all of the reasonable examples I could identify. In my research for this book my experience has been altogether different. I have been inundated with thousands of compelling examples. In our reporting on the KurzweilAI.net Web site, we feature one or more dramatic systems almost every day.180
A 2003 study by Business Communications Company projected a $21 billion market by 2007 for AI applications, with average annual growth of 12.2 percent from 2002 to 2007.181 Leading industries for AI applications include business intelligence, customer relations, finance, defense and domestic security, and education. Here is a small sample of narrow AI in action.