by Ray Kurzweil
There are a number of other key ways in which the brain differs from a conventional computer:
The brain’s circuits are very slow. Synaptic-reset and neuron-stabilization times (the amount of time required for a neuron and its synapses to reset themselves after the neuron fires) are so slow that there are very few neuron-firing cycles available to make pattern-recognition decisions. Functional magnetic-resonance imaging (fMRI) and magneto-encephalography (MEG) scans show that judgments that do not require resolving ambiguities appear to be made in a single neuron-firing cycle (less than twenty milliseconds), involving essentially no iterative (repeated) processes. Recognition of objects occurs in about 150 milliseconds, so that even if we “think something over,” the number of cycles of operation is measured in hundreds or thousands at most, not billions, as with a typical computer.
But it’s massively parallel. The brain has on the order of one hundred trillion interneuronal connections, each potentially processing information simultaneously. These two factors (slow cycle time and massive parallelism) result in a certain level of computational capacity for the brain, as we discussed earlier.
Today our largest supercomputers are approaching this range. The leading supercomputers (including those used by the most popular search engines) measure over 10^14 cps, which matches the lower range of the estimates I discussed in chapter 3 for functional simulation. It is not necessary, however, to use the same granularity of parallel processing as the brain itself so long as we match the overall computational speed and memory capacity needed and otherwise simulate the brain’s massively parallel architecture.
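As a back-of-the-envelope check on these figures, the arithmetic can be sketched as follows. The 20-millisecond firing cycle, the 150-millisecond recognition time, and the 10^14 connections come from the text; treating every connection as performing exactly one operation per cycle is an assumption made only for illustration, and the function names are hypothetical.

```python
def serial_cycles(task_ms, cycle_ms):
    """Number of sequential firing cycles that fit into a task."""
    return task_ms / cycle_ms


def parallel_ops_per_second(connections, cycle_ms):
    """Operations per second if every connection acts once per cycle
    (an illustrative assumption, not a measured figure)."""
    cycles_per_second = 1000 / cycle_ms
    return connections * cycles_per_second


CYCLE_MS = 20          # upper bound on one firing cycle, from the text
RECOGNITION_MS = 150   # approximate object-recognition time, from the text
CONNECTIONS = 10**14   # about one hundred trillion connections, from the text

cycles = serial_cycles(RECOGNITION_MS, CYCLE_MS)
ops = parallel_ops_per_second(CONNECTIONS, CYCLE_MS)

print(f"serial cycles per recognition: {cycles}")
print(f"parallel operations per second: {ops:.1e}")
```

Because the text gives twenty milliseconds as an upper bound on the cycle time, both results are best read as lower bounds under this simplified assumption.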
The brain combines analog and digital phenomena. The topology of connections in the brain is essentially digital—a connection exists, or it doesn’t. An axon firing is not entirely digital but closely approximates a digital process. Most every function in the brain is analog and is filled with nonlinearities (sudden shifts in output, rather than levels changing smoothly) that are substantially more complex than the classical model that we have been using for neurons. However, the detailed, nonlinear dynamics of a neuron and all of its constituents (dendrites, spines, channels, and axons) can be modeled through the mathematics of nonlinear systems. These mathematical models can then be simulated on a digital computer to any desired degree of accuracy. As I mentioned, if we simulate the neural regions using transistors in their native analog mode rather than through digital computation, this approach can provide improved capacity by three or four orders of magnitude, as Carver Mead has demonstrated.13
The brain rewires itself. Dendrites are continually exploring new spines and synapses. The topology and conductance of dendrites and synapses are also continually adapting. The nervous system is self-organizing at all levels of its organization. While the mathematical techniques used in computerized pattern-recognition systems such as neural nets and Markov models are much simpler than those used in the brain, we do have substantial engineering experience with self-organizing models.14 Contemporary computers don’t literally rewire themselves (although emerging “self-healing systems” are starting to do this), but we can effectively simulate this process in software.15 In the future, we can implement this in hardware, as well, although there may be advantages to implementing most self-organization in software, which provides more flexibility for programmers.
Most of the details in the brain are random. While there is a great deal of stochastic (random within carefully controlled constraints) process in every aspect of the brain, it is not necessary to model every “dimple” on the surface of every dendrite, any more than it is necessary to model every tiny variation in the surface of every transistor in understanding the principles of operation of a computer. But certain details are critical in decoding the principles of operation of the brain, which compels us to distinguish between them and those that comprise stochastic “noise” or chaos. The chaotic (random and unpredictable) aspects of neural function can be modeled using the mathematical techniques of complexity theory and chaos theory.16
The brain uses emergent properties. Intelligent behavior is an emergent property of the brain’s chaotic and complex activity. Consider the analogy to the apparently intelligent design of termite and ant colonies, with their delicately constructed interconnecting tunnels and ventilation systems. Despite their clever and intricate design, ant and termite hills have no master architects; the architecture emerges from the unpredictable interactions of all the colony members, each following relatively simple rules.
The brain is imperfect. It is the nature of complex adaptive systems that the emergent intelligence of their decisions is suboptimal. (That is, it reflects a lower level of intelligence than would be represented by an optimal arrangement of their elements.) It needs only to be good enough, which in the case of our species meant a level of intelligence sufficient to enable us to outwit the competitors in our ecological niche (for example, primates who also combine a cognitive function with an opposable appendage but whose brains are not as developed as those of humans and whose hands do not work as well).
We contradict ourselves. A variety of ideas and approaches, including conflicting ones, leads to superior outcomes. Our brains are quite capable of holding contradictory views. In fact, we thrive on this internal diversity. Consider the analogy to a human society, particularly a democratic one, with its constructive ways of resolving multiple viewpoints.
The brain uses evolution. The basic learning paradigm used by the brain is an evolutionary one: the patterns of connections that are most successful in making sense of the world and contributing to recognitions and decisions survive. A newborn’s brain contains mostly randomly linked interneuronal connections, and only a portion of those survive in the two-year-old brain.17
The patterns are important. Certain details of these chaotic self-organizing methods, expressed as model constraints (rules defining the initial conditions and the means for self-organization), are crucial, whereas many details within the constraints are initially set randomly. The system then self-organizes and gradually represents the invariant features of the information that has been presented to the system. The resulting information is not found in specific nodes or connections but rather is a distributed pattern.
The brain is holographic. There is an analogy between distributed information in a hologram and the method of information representation in brain networks. We find this also in the self-organizing methods used in computerized pattern recognition, such as neural nets, Markov models, and genetic algorithms.18
The brain is deeply connected. The brain gets its resilience from being a deeply connected network in which information has many ways of navigating from one point to another. Consider the analogy to the Internet, which has become increasingly stable as the number of its constituent nodes has increased. Nodes, even entire hubs of the Internet, can become inoperative without ever bringing down the entire network. Similarly, we continually lose neurons without affecting the integrity of the entire brain.
The brain does have an architecture of regions. Although the details of connections within a region are initially random within constraints and self-organizing, there is an architecture of several hundred regions that perform specific functions, with specific patterns of connections between regions.
The design of a brain region is simpler than the design of a neuron. Models often get simpler at a higher level, not more complex. Consider an analogy with a computer. We do need to understand the detailed physics of semiconductors to model a transistor, and the equations underlying a single real transistor are complex. However, a digital circuit that multiplies two numbers, although involving hundreds of transistors, can be modeled far more simply, with only a few formulas. An entire computer with billions of transistors can be modeled through its instruction set and register description, which can be described on a handful of written pages of text and mathematical transformations.
The software programs for an operating system, language compilers, and assemblers are reasonably complex, but modeling a particular program, for example a speech-recognition program based on Markov modeling, may be described in only a few pages of equations. Nowhere in such a description would be found the details of semiconductor physics. A similar observation also holds true for the brain. A particular neural arrangement that detects a particular invariant visual feature (such as a face) or that performs a band-pass filtering (restricting input to a specific frequency range) operation on auditory information or that evaluates the temporal proximity of two events can be described with far greater simplicity than the actual physics and chemical relations controlling the neurotransmitters and other synaptic and dendritic variables involved in the respective processes. Although all of this neural complexity will have to be carefully considered before advancing to the next higher level (modeling the brain), much of it can be simplified once the operating principles of the brain are understood.
Trying to Understand Our Own Thinking
The Accelerating Pace of Research
We are now approaching the knee of the curve (the period of rapid exponential growth) in the accelerating pace of understanding the human brain, but our attempts in this area have a long history. Our ability to reflect on and build models of our thinking is a unique attribute of our species. Early mental models were of necessity based on simply observing our external behavior (for example, Aristotle’s analysis of the human ability to associate ideas, written 2,350 years ago).19
At the beginning of the twentieth century we developed the tools to examine the physical processes inside the brain. An early breakthrough was the measurement of the electrical output of nerve cells, developed in 1928 by neuroscience pioneer E. D. Adrian, which demonstrated that there were electrical processes taking place inside the brain.20 As Adrian wrote, “I had arranged electrodes on the optic nerve of a toad in connection with some experiments on the retina. The room was nearly dark and I was puzzled to hear repeated noises in the loudspeaker attached to the amplifier, noises indicating that a great deal of impulse activity was going on. It was not until I compared the noises with my own movements around the room that I realized I was in the field of vision of the toad’s eye and that it was signaling what I was doing.”
Adrian’s key insight from this experiment remains a cornerstone of neuroscience today: the frequency of the impulses from the sensory nerve is proportional to the intensity of the sensory phenomena being measured. For example, the higher the intensity of light, the higher the frequency (pulses per second) of the neural impulses from the retina to the brain. It was a student of Adrian, Horace Barlow, who contributed another lasting insight, “trigger features” in neurons, with the discovery that the retinas of frogs and rabbits had single neurons that would trigger on “seeing” specific shapes, directions, or velocities. In other words, perception involves a series of stages, with each layer of neurons recognizing more sophisticated features of the image.
In 1939 we began to develop an idea of how neurons perform: by accumulating (adding) their inputs and then producing a spike of membrane conductance (a sudden increase in the ability of the neuron’s membrane to conduct a signal) and voltage along the neuron’s axon (which connects to other neurons via a synapse). A. L. Hodgkin and A. F. Huxley described their theory of the axon’s “action potential” (voltage).21 They also made an actual measurement of an action potential on an animal neural axon in 1952.22 They chose squid neurons because of their size and accessible anatomy.
Building on Hodgkin and Huxley’s insight, W. S. McCulloch and W. Pitts developed in 1943 a simplified model of neurons and neural nets that motivated a half century of work on artificial (simulated) neural nets (using a computer program to simulate the way neurons work in the brain as a network). This model was further refined by Hodgkin and Huxley in 1952. Although we now realize that actual neurons are far more complex than these early models, the original concept has held up well. This basic neural-net model has a neural “weight” (representing the “strength” of the connection) for each synapse and a nonlinearity (firing threshold) in the neuron soma (cell body).
As the sum of the weighted inputs to the neuron soma increases, there is relatively little response from the neuron until a critical threshold is reached, at which point the neuron rapidly increases the output of its axon and fires. Different neurons have different thresholds. Although recent research shows that the actual response is more complex than this, the McCulloch-Pitts and Hodgkin-Huxley models remain essentially valid.
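The weighted-sum-and-threshold behavior just described can be sketched in a few lines. This is a minimal illustration in the spirit of the McCulloch-Pitts model, not a reproduction of their formalism; the function name, the unit weights, and the threshold values are assumptions chosen for the example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) only when the weighted sum of the inputs
    reaches the neuron's threshold; otherwise stay silent (return 0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# The same two synaptic weights with two different thresholds.
WEIGHTS = [1, 1]
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# A high threshold fires only when both inputs are active.
high_threshold = [threshold_neuron(p, WEIGHTS, threshold=2) for p in PATTERNS]
# A low threshold fires when either input is active.
low_threshold = [threshold_neuron(p, WEIGHTS, threshold=1) for p in PATTERNS]

print(high_threshold)
print(low_threshold)
```

With identical weights, changing only the threshold changes which input patterns make the unit fire, which is the sense in which different neurons with different thresholds respond differently to the same inputs.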
These insights led to an enormous amount of early work in creating artificial neural nets, in a field that became known as connectionism. This was perhaps the first self-organizing paradigm introduced to the field of computation.
A key requirement for a self-organizing system is a nonlinearity: some means of creating outputs that are not simple weighted sums of the inputs. The early neural-net models provided this nonlinearity in their replica of the neuron nucleus.23 (The basic neural-net method is straightforward.)24 Work initiated by Alan Turing on theoretical models of computation around the same time also showed that computation requires a nonlinearity. A system that simply creates weighted sums of its inputs cannot perform the essential requirements of computation.
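One narrow way to see why weighted sums alone are limiting: stacking one layer of weighted sums on another yields nothing more than a single layer of weighted sums. The sketch below illustrates only that point and is not a proof of the broader claim about computation; the weight values and function names are hypothetical.

```python
def linear_layer(weights, inputs):
    """Each output is a plain weighted sum of the inputs (no threshold)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def compose(outer, inner):
    """Collapse two weighted-sum layers into one equivalent layer
    (the matrix product outer x inner)."""
    inner_columns = list(zip(*inner))
    return [
        [sum(a * b for a, b in zip(row, col)) for col in inner_columns]
        for row in outer
    ]


W1 = [[1, 2], [3, 4]]  # two inputs feeding two intermediate units
W2 = [[1, -1]]         # two intermediate units feeding one output
x = [5, 7]

two_layers = linear_layer(W2, linear_layer(W1, x))
one_layer = linear_layer(compose(W2, W1), x)

print(two_layers, one_layer)
```

Both routes give the same output for any input, so adding more purely linear layers never adds new behavior. Inserting a threshold between the layers is what breaks this equivalence.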
We now know that actual biological neurons have many other nonlinearities resulting from the electrochemical action of the synapses and the morphology (shape) of the dendrites. Different arrangements of biological neurons can perform computations, including adding, subtracting, multiplying, dividing, averaging, filtering, normalizing, and thresholding signals, among other types of transformations.
The ability of neurons to perform multiplication is important because it allows the behavior of one network of neurons in the brain to be modulated (influenced) by the result of computations of another network. Experiments using electrophysiological measurements on monkeys provide evidence that the rate of signaling by neurons in the visual cortex when processing an image is increased or decreased by whether or not the monkey is paying attention to a particular area of that image.25 Human fMRI studies have also shown that paying attention to a particular area of an image increases the responsiveness of the neurons processing that image in a cortical region called V5, which is responsible for motion detection.26
Another key breakthrough occurred in 1949 when Donald Hebb presented his seminal theory of neural learning, the “Hebbian response”: if a synapse (or group of synapses) is stimulated repeatedly, that synapse becomes stronger. Over time this conditioning of the synapse produces a learning response. The connectionism movement designed simulated neural nets based on this model, and this gave momentum to such experiments during the 1950s and 1960s.
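The strengthening of a repeatedly stimulated synapse can be sketched with a simple update rule. The product form used here (weight change proportional to presynaptic activity times postsynaptic activity) is a common textbook formalization rather than Hebb's own formula, and the learning rate and starting weight are assumptions chosen for illustration.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Strengthen the synapse in proportion to joint activity of the
    presynaptic (pre) and postsynaptic (post) neurons."""
    return weight + rate * pre * post


weight = 0.5
history = []
for _ in range(5):
    # Both neurons are active together on every trial.
    weight = hebbian_update(weight, pre=1.0, post=1.0)
    history.append(weight)

# If the presynaptic neuron is silent, the weight does not change.
unchanged = hebbian_update(0.5, pre=0.0, post=1.0)

print(history)
print(unchanged)
```

Each paired stimulation nudges the weight upward, so the connection grows stronger with repetition. Note that this bare rule has no upper bound on the weight; practical models usually add some form of normalization or decay.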
The connectionism movement experienced a setback in 1969 with the publication of the book Perceptrons by MIT’s Marvin Minsky and Seymour Papert.27 It included a key theorem demonstrating that the most common (and simplest) type of neural net used at the time (called a Perceptron, pioneered by Cornell’s Frank Rosenblatt) was unable to solve the simple problem of determining whether or not a line drawing was fully connected.28 The neural-net movement had a resurgence in the 1980s using a method called “backpropagation,” in which the strength of each simulated synapse was determined using a learning algorithm that adjusted the weight (the strength of the output) of each artificial neuron after each training trial so the network could “learn” to more correctly match the right answer.
However, backpropagation is not a feasible model of training synaptic weights in an actual biological neural network, because backward connections to actually adjust the strength of the synaptic connections do not appear to exist in mammalian brains. In computers, however, this type of self-organizing system can solve a wide range of pattern-recognition problems, and the power of this simple model of self-organizing interconnected neurons has been demonstrated.
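The idea of adjusting weights after each training trial, and the kind of limitation Minsky and Papert identified, can both be seen in a single-layer Perceptron. The sketch below uses the classic error-correction rule, not backpropagation, and it uses the exclusive-or (XOR) function as a small, commonly cited example of a problem a single-layer unit cannot solve, rather than the connectedness problem mentioned above. All names and parameter values are illustrative assumptions.

```python
def predict(weights, bias, x):
    """Single threshold unit: fire when the weighted sum exceeds zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=100, rate=1):
    """Adjust the weights after every trial that produces a wrong answer.
    Returns (weights, bias, converged)."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                errors += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if errors == 0:  # a full pass with no mistakes: training is done
            return weights, bias, True
    return weights, bias, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)

print("AND learned:", and_converged)
print("XOR learned:", xor_converged)
```

The unit learns AND because a single weighted sum and threshold can separate its cases, but no setting of the weights separates the XOR cases, so training never completes an error-free pass. Multilayer networks trained with methods such as backpropagation are one way around this limitation.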
Less well known is Hebb’s second form of learning: a hypothesized loop in which the excitation of a neuron would feed back on itself (possibly through other layers), causing a reverberation (a continued reexcitation of the neurons in the loop). Hebb theorized that this type of reverberation could be the source of short-term learning. He also suggested that this short-term reverberation could lead to long-term memories: “Let us assume then that the persistence or repetition of a reverberatory activity (or ‘trace’) tends to induce lasting cellular changes that add to its stability. The assumption can be precisely stated as follows: When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
Although Hebbian reverberatory memory is not as well established as Hebb’s synaptic learning, instances have recently been discovered. For example, sets of excitatory neurons (ones that stimulate a synapse) and inhibitory neurons (ones that block a stimulus) begin an oscillation when certain visual patterns are presented.29 And researchers at MIT and Lucent Technologies’ Bell Labs have created an electronic integrated circuit, composed of transistors, that simulates the action of sixteen excitatory neurons and one inhibitory neuron to mimic the biological circuitry of the cerebral cortex.30
These early models of neurons and neural information processing, although overly simplified and inaccurate in some respects, were remarkable, given the lack of data and tools when these theories were developed.
Peering into the Brain
We’ve been able to reduce drift and noise in our instruments to such an extent that we can see the tiniest motions of these molecules, through distances that are less than their own diameters. . . . [T]hese kinds of experiments were just pipedreams 15 years ago.