by Ray Kurzweil
164. Sir Arthur Conan Doyle, “The Red-Headed League,” 1890, available at http://www.eastoftheweb.com/short-stories/UBooks/RedHead.shtml.
165. V. Yu et al., “Antimicrobial Selection by a Computer: A Blinded Evaluation by Infectious Diseases Experts,” JAMA 242.12 (1979): 1279–82.
166. Gary H. Anthes, “Computerizing Common Sense,” Computerworld, April 8, 2002, http://www.computerworld.com/news/2002/story/0,11280,69881,00.html.
167. Kristen Philipkoski, “Now Here’s a Really Big Idea,” Wired News, November 25, 2002, http://www.wired.com/news/technology/0,1282,56374,00.html, reporting on Darryl Macer, “The Next Challenge Is to Map the Human Mind,” Nature 420 (November 14, 2002): 121; see also a description of the project at http://www.biol.tsukuba.ac.jp/~macer/index.html.
168. Thomas Bayes, “An Essay Towards Solving a Problem in the Doctrine of Chances,” published in 1763, two years after his death in 1761.
169. SpamBayes spam filter, http://spambayes.sourceforge.net.
170. Lawrence R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE 77 (1989): 257–86. For a mathematical treatment of Markov models, see http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html.
171. Kurzweil Applied Intelligence (KAI), founded by the author in 1982, was sold in 1997 for $100 million and is now part of ScanSoft (formerly called Kurzweil Computer Products, the author’s first company, which was sold to Xerox in 1980), now a public company. KAI introduced the first commercially marketed large-vocabulary speech-recognition system in 1987 (Kurzweil Voice Report, with a ten-thousand-word vocabulary).
172. Here is the basic schema for a neural net algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
Creating a neural-net solution to a problem involves the following steps:
Define the input.
Define the topology of the neural net (i.e., the layers of neurons and the connections between the neurons).
Train the neural net on examples of the problem.
Run the trained neural net to solve new examples of the problem.
Take your neural-net company public.
These steps (except for the last one) are detailed below:
The Problem Input
The problem input to the neural net consists of a series of numbers. This input can be:
In a visual pattern-recognition system, a two-dimensional array of numbers representing the pixels of an image; or
In an auditory (e.g., speech) recognition system, a two-dimensional array of numbers representing a sound, in which the first dimension represents parameters of the sound (e.g., frequency components) and the second dimension represents different points in time; or
In an arbitrary pattern-recognition system, an n-dimensional array of numbers representing the input pattern.
Defining the Topology
To set up the neural net, the architecture of each neuron consists of:
Multiple inputs in which each input is “connected” to either the output of another neuron, or one of the input numbers.
Generally, a single output, which is connected either to the input of another neuron (which is usually in a higher layer), or to the final output.
Set Up the First Layer of Neurons
Create N0 neurons in the first layer. For each of these neurons, “connect” each of the multiple inputs of the neuron to “points” (i.e., numbers) in the problem input. These connections can be determined randomly or using an evolutionary algorithm (see below).
Assign an initial “synaptic strength” to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).
Set Up the Additional Layers of Neurons
Set up a total of M layers of neurons. For each layer, set up the neurons in that layer.
For layeri:
Create Ni neurons in layeri. For each of these neurons, “connect” each of the multiple inputs of the neuron to the outputs of the neurons in layeri–1 (see variations below).
Assign an initial “synaptic strength” to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).
The outputs of the neurons in layerM are the outputs of the neural net (see variations below).
The Recognition Trials
How Each Neuron Works
Once the neuron is set up, it does the following for each recognition trial:
Each weighted input to the neuron is computed by multiplying the output of the other neuron (or initial input) that the input to this neuron is connected to by the synaptic strength of that connection.
All of these weighted inputs to the neuron are summed.
If this sum is greater than the firing threshold of this neuron, then this neuron is considered to fire and its output is 1. Otherwise, its output is 0 (see variations below).
Do the Following for Each Recognition Trial
For each layer, from layer0 to layerM:
For each neuron in the layer:
Sum its weighted inputs (each weighted input = the output of the other neuron [or initial input] that the input to this neuron is connected to multiplied by the synaptic strength of that connection).
If this sum of weighted inputs is greater than the firing threshold for this neuron, set the output of this neuron = 1, otherwise set it to 0.
To Train the Neural Net
Run repeated recognition trials on sample problems.
After each trial, adjust the synaptic strengths of all the interneuronal connections to improve the performance of the neural net on this trial (see the discussion below on how to do this).
Continue this training until the accuracy rate of the neural net is no longer improving (i.e., reaches an asymptote).
Key Design Decisions
In the simple schema above, the designer of this neural-net algorithm needs to determine at the outset:
What the input numbers represent.
The number of layers of neurons.
The number of neurons in each layer. (Each layer does not necessarily need to have the same number of neurons.)
The number of inputs to each neuron in each layer. The number of inputs (i.e., interneuronal connections) can also vary from neuron to neuron and from layer to layer.
The actual “wiring” (i.e., the connections). For each neuron in each layer, this consists of a list of other neurons, the outputs of which constitute the inputs to this neuron. This represents a key design area. There are a number of possible ways to do this:
(i) Wire the neural net randomly; or
(ii) Use an evolutionary algorithm (see below) to determine an optimal wiring; or
(iii) Use the system designer’s best judgment in determining the wiring.
The initial synaptic strengths (i.e., weights) of each connection. There are a number of possible ways to do this:
(i) Set the synaptic strengths to the same value; or
(ii) Set the synaptic strengths to different random values; or
(iii) Use an evolutionary algorithm to determine an optimal set of initial values; or
(iv) Use the system designer’s best judgment in determining the initial values.
The firing threshold of each neuron.
The output. The output can be:
(i) the outputs of layerM of neurons; or
(ii) the output of a single output neuron, the inputs of which are the outputs of the neurons in layerM;or
(iii) a function of (e.g., a sum of) the outputs of the neurons in layerM;or
(iv) another function of neuron outputs in multiple layers.
How the synaptic strengths of all the connections are adjusted during the training of this neural net. This is a key design decision and is the subject of a great deal of research and discussion. There are a number of possibl
e ways to do this:
(i) For each recognition trial, increment or decrement each synaptic strength by a (generally small) fixed amount so that the neural net’s output more closely matches the correct answer. One way to do this is to try both incrementing and decrementing and see which has the more desirable effect. This can be time-consuming, so other methods exist for making local decisions on whether to increment or decrement each synaptic strength.
(ii) Other statistical methods exist for modifying the synaptic strengths after each recognition trial so that the performance of the neural net on that trial more closely matches the correct answer.
Note that neural-net training will work even if the answers to the training trials are not all correct. This allows using real-world training data that may have an inherent error rate. One key to the success of a neural net–based recognition system is the amount of data used for training. Usually a very substantial amount is needed to obtain satisfactory results. Just like human students, the amount of time that a neural net spends learning its lessons is a key factor in its performance.
Variations
Many variations of the above are feasible:
There are different ways of determining the topology. In particular, the interneuronal wiring can be set either randomly or using an evolutionary algorithm.
There are different ways of setting the initial synaptic strengths.
The inputs to the neurons in layeri do not necessarily need to come from the outputs of the neurons in layeri 1. Alternatively, the inputs to the neurons in each layer can come from any lower layer or any layer.
There are different ways to determine the final output.
The method described above results in an “all or nothing” (1 or 0) firing called a nonlinearity. There are other nonlinear functions that can be used. Commonly a function is used that goes from 0 to 1 in a rapid but more gradual fashion. Also, the outputs can be numbers other than 0 and 1.
The different methods for adjusting the synaptic strengths during training represent key design decisions.
The above schema describes a “synchronous” neural net, in which each recognition trial proceeds by computing the outputs of each layer, starting with layer 0through layerM. In a true parallel system, in which each neuron is operating independently of the others, the neurons can operate “asynchronously” (that is, independently). In an asynchronous approach, each neuron is constantly scanning its inputs and fires whenever the sum of its weighted inputs exceeds its threshold (or whatever its output function specifies).
173. See chapter 4 for a detailed discussion of brain reverse engineering. As one example of the progression, S. J. Thorpe writes: “We have really only just begun what will certainly be a long term project aimed at reverse engineering the primate visual system. For the moment, we have only explored some very simple architectures, involving essentially just feed-forward architectures involving a relatively small numbers of layers. . . . In the years to come, we will strive to incorporate as many of the computational tricks used by the primate and human visual system as possible. More to the point, it seems that by adopting the spiking neuron approach, it will soon be possible to develop sophisticated systems capable of simulating very large neuronal networks in real time.” S. J. Thorpe et al., “Reverse Engineering of the Visual System Using Networks of Spiking Neurons,” Proceedings of the IEEE 2000 International Symposium on Circuits and Systems IV (IEEE Press), pp. 405–8, http://www.sccn.ucsd.edu/~arno/mypapers/thorpe.pdf.
174. T. Schoenauer et al. write: “Over the past years a huge diversity of hardware for artificial neural networks (ANN) has been designed. . . . Today one can choose from a wide range of neural network hardware. Designs differ in terms of architectural approaches, such as neurochips, accelerator boards and multi-board neurocomputers, as well as concerning the purpose of the system, such as the ANN algorithm(s) and the system’s versatility. . . . Digital neurohardware can be classified by the:[sic] system architecture, degree of parallelism, typical neural network partition per processor, inter-processor communication network and numerical representation.” T. Schoenauer, A. Jahnke, U. Roth, and H. Klar, “Digital Neurohardware: Principles and Perspectives,” in Proc. Neuronale Netze in der Anwendung—Neural Networks in Applications NN’98, Magdeburg, invited paper (February 1998): 101–6, http://bwrc.eecs.berkeley.edu/People/kcamera/neural/papers/
schoenauer98digital.pdf. See also Yihua Liao, “Neural Networks in Hardware: A Survey” (2001), http://ailab.das.ucdavis.edu/~yihua/research/NNhardware.pdf.
175. Here is the basic schema for a genetic (evolutionary) algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
THE EVOLUTIONARY ALGORITHM
Create N solution “creatures.” Each one has:
A genetic code: a sequence of numbers that characterize a possible solution to the problem. The numbers can represent critical parameters, steps to a solution, rules, etc.
For each generation of evolution, do the following:
Do the following for each of the N solution creatures:
(i) Apply this solution creature’s solution (as represented by its genetic code) to the problem, or simulated environment.
(ii) Rate the solution.
Pick the L solution creatures with the highest ratings to survive into the next generation.
Eliminate the (N L) nonsurviving solution creatures.
Create (N L) new solution creatures from the L surviving solution creatures by:
(i) Making copies of the L surviving creatures. Introduce small random variations into each copy; or
(ii) Creating additional solution creatures by combining parts of the genetic code (using “sexual” reproduction, or otherwise combining portions of the chromosomes) from the L surviving creatures; or
(iii) Doing a combination of (i) and (ii).
Determine whether or not to continue evolving: Improvement = (highest rating in this generation) – (highest rating in the previous generation).
If Improvement The solution creature with the highest rating from the last generation of evolution has the best solution. Apply the solution defined by its genetic code to the problem.
Key Design Decisions
In the simple schema above, the designer needs to determine at the outset:
Key parameters:
N
L
Improvement threshold
What the numbers in the genetic code represent and how the solution is computed from the genetic code.
A method for determining the N solution creatures in the first generation. In general, these need only be “reasonable” attempts at a solution. If these first-generation solutions are too far afield, the evolutionary algorithm may have difficulty converging on a good solution. It is often worthwhile to create the initial solution creatures in such a way that they are reasonably diverse. This will help prevent the evolutionary process from just finding a “locally” optimal solution.
How the solutions are rated.
How the surviving solution creatures reproduce.
Variations
Many variations of the above are feasible. For example:
There does not need to be a fixed number of surviving solution creatures (L) from each generation. The survival rule(s) can allow for a variable number of survivors.
There does not need to be a fixed number of new solution creatures created in each generation (N L). The procreation rules can be independent of the size of the population. Procreation can be related to survival, thereby allowing the fittest solution creatures to procreate the most.
The decision as to whether or not to continue evolving can be varied. It can consider more than just the highest-rated solution creature from the most recent generation(s). It can also consider a trend that goes beyond just the last two generations.
176. Sam Williams, “When Machines Breed,” August 12, 2004, http://www.salon.com/tech/feature/2004/08/1
2/evolvable_hardware/
index_np.html.
177. Here is the basic scheme (algorithm description) of recursive search. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
THE RECURSIVE ALGORITHM
Define a function (program) “Pick Best Next Step.” The function returns a value of “SUCCESS” (we’ve solved the problem) or “FAILURE” (we didn’t solve it). If it returns with a value of SUCCESS, then the function also returns the sequence of steps that solved the problem.
PICK BEST NEXT STEP does the following:
Determine if the program can escape from continued recursion at this point. This bullet, and the next two bullets deal with this escape decision.
First, determine if the problem has now been solved. Since this call to Pick Best Next Step probably came from the program calling itself, we may now have a satisfactory solution. Examples are: