Borderlands of Science

by Charles Sheffield


  How long before our style of personal computer shares history with the slide rule? With that cautionary note in the back of our minds, we begin with some known facts.

  In the years since 1946, the speed of electronic computers has increased by a factor of two every two years. There is no evidence that the rate of increase is slowing down. By 2006, machines a billion times as fast as ENIAC will exist.

  Between 1946 and 1950, a handful of machines were built. One famous projection, by the management of IBM, estimated that a total of five computers would serve the entire needs of the United States. Today, restricting ourselves to general purpose computers, and ignoring the ubiquitous special purpose machines that inhabit everything from thermostats to automobile fuel injection systems, several hundred million machines are in use around the world. Many ordinary households have more than one computer. I had to pause to count, before I realized that here in my house as I write (on a computer, of course) there are seven working machines. It was eight machines until a couple of months ago, when I threw away a computer purchased in 1981 which, although still working, was hopelessly out of date.

  Not only speeds and numbers of machines have changed. Consider storage. The first machine for which I ever wrote a program was called DEUCE. It was big enough to walk inside. The engineers would do just that, tapping at suspect vacuum tubes with a screwdriver when the electronics were proving balky. Machine errors were as common a cause of trouble as programming errors; and the latter were dreadfully frequent, because we were working at a level so close to basic machine logic that today the machine would be considered impossible to program.

  DEUCE had 402 words of high-speed (mercury delay line) memory, and 8,192 words of back-up (rotating drum) memory. The machine that I just threw away had several times that much storage. There were no tapes or disks on DEUCE, only punched cards for input and for intermediate or final output.

  Today, it is a poor personal computer that does not have several hundred million bytes of disk storage. Applications programs have expanded to fill the space available. The word processing language I am using requires five million bytes. I'm not sure what it does with them, because I stay several layers of programming languages away from the basic machine instructions.

  Costs have been coming down too, to the point where the choice, to buy or to wait, is not an easy one. We know that in a year's time we will receive several times as much capability for today's price. Costs have decreased in about the same way that speeds have gone up. We receive a billion times as much computing for a dollar as we did in 1946.

  Where will all this end? There must be limits to speed of performance and memory, but the past half century is no help at all in defining them. We must look elsewhere for the limiting variables.

  Physics provides two seeming limits. First, there is a definite relationship between the frequency with which something oscillates, and the associated energy. Specifically, E = hf, where E is energy, f is frequency, and h is Planck's constant.

  If we are performing 10¹⁵ operations per second, the associated energy is about one electron volt. Recalling from Chapter 5 that the binding energy of electrons to atoms is just a few electron volts, we see that higher speeds than 10¹⁵ operations a second will ionize atoms and rip them apart. Here, then, is one apparent limit that we will not be able to surmount.

  Actually, we could do better by a factor of a million if we were to build a computer completely from nuclear matter. That would be possible on the surface of a neutron star, but the location might cause other problems.

  The speed of light provides a second limit. In order for the computer to function, electrical signals (which travel at light speed) must be able to pass from one component of the computer to another. Since the speed of light is about 300,000 kilometers a second, in 10⁻¹⁵ seconds, the time for one operation, a light-speed signal will travel only 0.3 micrometers. A hydrogen atom is about 10⁻⁴ micrometers across, and all other atoms are bigger; therefore, in 10⁻¹⁵ seconds light travels a distance of at most a few thousand hydrogen atoms, and correspondingly fewer of any larger atom. If we could make components that small (currently we cannot) then again we would have a limit of 10¹⁵ operations a second. A computer made entirely from nuclear matter could push the size limit a million times smaller.
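  For readers who like to check such numbers, here is a short Python sketch (my illustration, not the book's) that evaluates E = hf and the light-travel distance for f = 10¹⁵ operations per second, using standard values of the constants. The energy comes out at a few electron volts, the atomic-binding scale discussed above.

  # Sanity check on the two physical limits (illustrative sketch).
  PLANCK_H = 6.626e-34       # Planck's constant h, in joule-seconds
  ELECTRON_VOLT = 1.602e-19  # one electron volt, in joules
  LIGHT_SPEED = 2.998e8      # speed of light c, in meters per second

  f = 1e15                                   # operations (oscillations) per second
  energy_ev = PLANCK_H * f / ELECTRON_VOLT   # E = hf, expressed in electron volts
  distance_um = LIGHT_SPEED / f * 1e6        # light travel per operation, in micrometers
  hydrogen_um = 1e-4                         # rough diameter of a hydrogen atom

  print(f"{energy_ev:.1f} eV per operation")            # 4.1 eV: a few electron volts
  print(f"{distance_um:.2f} micrometers per cycle")     # 0.30
  print(f"{distance_um / hydrogen_um:.0f} hydrogen-atom widths")  # ~3000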

  To set this in perspective, there are computers today whose circuit speeds exceed tens of billions of operations a second. We seem, at last, to be able to make a definite statement: computers have a limiting speed of operations which is certainly less than a hundred thousand times what we can achieve today.

  But wait. This speed limit assumes that operations are done sequentially, one after another. This is known as serial operation. Many of today's fastest machines employ parallel operation, in which many computations are done at the same time. For example, suppose we have to multiply one column of N numbers by another column. Since the N results are independent, with N multiplier units in our computer we can do all N multiplications in the same time as one multiplication.
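  As a toy illustration in Python (my sketch, not from the book), the five products below are mutually independent, so a machine with five multiplier units could form all of them in the time of a single multiplication; Python's map is, of course, still serial.

  # Column-times-column multiplication: each product depends on only one pair.
  from operator import mul

  xs = [2, 3, 5, 7, 11]
  ys = [13, 17, 19, 23, 29]
  products = list(map(mul, xs, ys))  # N independent multiplications
  print(products)                    # [26, 51, 95, 161, 319]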

  There are machines today that perform over a hundred thousand operations in parallel. This pushes the potential speed of operation of a machine up by a factor of 10⁵. Is there a limit to the degree of parallelism that can be achieved in computation?

  It is difficult to see one. We do know that the limit is far higher than 10⁵; the human brain, with its 10¹¹ neurons, has slow "hardware." The discharge/recharge rate of a neuron is no more than a thousand times a second. But a human (and many animals) can process an image received from the eyes, and recognize a particular individual's face, in less than a tenth of a second. That is less than one hundred "hardware cycles." Our computers require hundreds of millions of hardware cycles to do the same task, and do it less well. We have to conclude that some huge parallelism is at work in both human and animal brains.

  Even without considering radically different methods of computing (which we are about to do), we see no limit to computational speed.

  10.2 Biological computers. We now consider two new approaches to achieving high parallelism. Both are at the limit of today's science. Some people would argue that the biological computing of this section, and the quantum computing of the next, are beyond the limit.

  In Chapter 6, we noted that the DNA molecule can be thought of as an information-storing device. That the DNA of our chromosomes happens to carry our own genetic code makes it of special interest to us, which is why we have a human genome mapping project, but in fact, any information at all can be stored in DNA. For example, let us treat the nucleotide bases, A, C, G, and T, as codes for the numbers 0 to 3. Then we can write, using binary notation, A=(00), C=(01), G=(10), T=(11). The long string of a DNA molecule may then be thought of as a general purpose storage device. The computer stores information in binary form, as 0s and 1s. Any sequence of binary digits can be mapped exactly onto a sequence of nucleotide bases, and vice versa. For example, the digit string (11011001101010001111) is equivalent to (TCGCGGGATT). This is an extremely compact form of storage, since a few cubic centimeters can hold as many as 10¹⁸ DNA molecules. We also have techniques readily available for snipping, joining, and replicating DNA molecules.
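  To make the mapping concrete, here is a small Python sketch (my illustration, not from the book; the helper names are arbitrary) that reproduces the example just given:

  # Two bits per base: A=00, C=01, G=10, T=11.
  TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
  TO_BITS = {base: bits for bits, base in TO_BASE.items()}

  def bits_to_dna(bits):
      """Map an even-length binary string onto bases, two bits per base."""
      return "".join(TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

  def dna_to_bits(seq):
      """The inverse mapping, from bases back to bits."""
      return "".join(TO_BITS[base] for base in seq)

  print(bits_to_dna("11011001101010001111"))  # TCGCGGGATT, as in the text
  print(dna_to_bits("TCGCGGGATT"))            # 11011001101010001111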

  So much for storage. This all seems interesting, but not particularly useful, especially since the process of storing and retrieving DNA-stored information is slow compared with electronic storage and retrieval. How can DNA serve as an actual computer, and how can it solve real problems?

  The first DNA computer calculation was performed by Leonard Adleman (Adleman, 1994). The problem that he addressed was one in directed graph theory. We will consider here a slightly different and perhaps more familiar application, well known and notorious in operations research: the Traveling Salesman Problem.

  Suppose that we are given a set of towns, and know the distance between every pair of them. We assume it may not be the same distance both ways, because of bypasses and one-way streets. If a traveling salesman wants to visit each town just once, and return to his starting place, what is the shortest distance that he must travel?

  For two towns, the answer is trivial. Similarly for three, where the town-to-town distances define a unique triangle. With four, five, and six towns, the problem becomes a bit more complicated, but it still seems easy. Here, for example, is the case with five towns, Hull, Hornsea, Beverley, Weighton, Driffield, and their associated distance pairs:

               Hull   Hornsea   Beverley   Weighton   Driffield

  Hull            0        17         10         15          17
  Hornsea        18         0          6         12          20
  Beverley       12         5          0         14          19
  Weighton       12        11         14          0           7
  Driffield      16        21         18          6           0

  We read this as telling us that the distance from Hull to Weighton is 15 miles (first row, fourth column), while the distance from Weighton to Hull is only 12 miles (fourth row, first column).

  There is a simple and absolutely guaranteed way of generating the best solution. Write down the five towns in every possible order, add up the distances for each case, and pick the shortest sum. We can reduce the work somewhat, by noting that it does not matter where we start; so the closed circuit Hull-Beverley-Hornsea-Weighton-Driffield-Hull must give the same total distance as Hornsea-Weighton-Driffield-Hull-Beverley-Hornsea. However, a path may not give the same distance as the reverse path.

  Applying the writing-out method to this case, we have to look at 4 × 3 × 2 × 1 = 24 possible routes. The salesman's minimum distance is 50 miles, and his route is Hull-Beverley-Hornsea-Weighton-Driffield-Hull.

  It may seem hard to believe, but if we specify not five but a hundred towns, the method we have proposed is too much even for today's fastest computers. It is easy to see why. It does not matter where we begin. So pick a town. Then we have 99 choices for the second town; for each of those we have 98 choices for the third town; then 97 choices for the fourth town, and so on. We have to evaluate a total of 99 × 98 × 97 × . . . × 2 × 1 cases. This number is called factorial 99, and it equals about 10¹⁵⁶. The lifetime of the universe does not provide enough computing time to enumerate all the routes.
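  Here is a Python sketch (my illustration, not from the book) that carries out the writing-out method on the five-town table above, confirming the 50-mile answer, and then shows why a hundred towns is hopeless:

  # Brute-force enumeration of the five-town Traveling Salesman Problem.
  from itertools import permutations
  from math import factorial

  towns = ["Hull", "Hornsea", "Beverley", "Weighton", "Driffield"]
  dist = {   # row town -> column town, from the distance table above
      "Hull":      {"Hornsea": 17, "Beverley": 10, "Weighton": 15, "Driffield": 17},
      "Hornsea":   {"Hull": 18, "Beverley": 6, "Weighton": 12, "Driffield": 20},
      "Beverley":  {"Hull": 12, "Hornsea": 5, "Weighton": 14, "Driffield": 19},
      "Weighton":  {"Hull": 12, "Hornsea": 11, "Beverley": 14, "Driffield": 7},
      "Driffield": {"Hull": 16, "Hornsea": 21, "Beverley": 18, "Weighton": 6},
  }

  def tour_length(order):
      """Total distance around the closed loop order[0] -> ... -> order[0]."""
      return sum(dist[a][b] for a, b in zip(order, order[1:] + order[:1]))

  # Fix the starting town and permute the rest: 4 x 3 x 2 x 1 = 24 routes.
  best = min((["Hull"] + list(p) for p in permutations(towns[1:])), key=tour_length)
  print(tour_length(best), "-".join(best))  # 50 Hull-Beverley-Hornsea-Weighton-Driffield

  # With 100 towns the same method needs 99! route evaluations:
  print(f"{factorial(99):.2e}")             # about 9.33e+155, i.e. roughly 10 to the 156th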

  Of course, direct enumeration is a safe but not a sensible method. A practical attack on the problem would use some variation of a method known as linear programming. Even this, however, will not give us an exact answer. We are forced to go to approximate methods that provide good (but not necessarily best) solutions.

  Now let us see how to solve the problem with DNA computation. First, we give to each town a string of 20 DNA nucleotides. The number 20 is long enough to prevent confusion in the calculation. There would be no difficulty if we preferred 40, or 100, or any other even number, but 20 is sufficient for this small example.

  Here are suitable single DNA strands, where we have deliberately written each one as two ten-nucleotide pieces; the gaps after the tenth nucleotide are introduced only for convenience of reading:

  Hull        TCGCGGGATT AGACTGTAAG
  Hornsea     GTTCGAAGTC AGTCGTACCT
  Beverley    AGCTTATATC GGTATATGGC
  Weighton    ATATGGCGAA CAGTCGTGCG
  Driffield   CGGGATTAGA TAATCAGGTA

  It does not matter what the particular strings look like, only that they be different. They can be chosen at random.

  As we know, to every DNA string there corresponds a complementary string, in which the nucleotides (A,T) and (C,G) are interchanged. Together, a string and its complementary string make up a portion of DNA double helix.

  The complementary single DNA strands for Hull, Hornsea, Beverley, Weighton, and Driffield, with gaps after the tenth nucleotide introduced for convenience, are:

  Hull        AGCGCCCTAA TCTGACATTC
  Hornsea     CAAGCTTCAG TCAGCATGGA
  Beverley    TCGAATATAG CCATATACCG
  Weighton    TATACCGCTT GTCAGCACGC
  Driffield   GCCCTAATCT ATTAGTCCAT
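  A few lines of Python (an illustrative sketch, not the authors' code) can check these tables. Note that, like the text, it pairs strands site by site and ignores the antiparallel reading order of real DNA:

  # Complementary strands swap (A,T) and (C,G) at every site.
  PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

  def complement(strand):
      """Site-by-site complement of a strand, in the book's convention."""
      return "".join(PAIR[base] for base in strand)

  print(complement("TCGCGGGATTAGACTGTAAG"))  # Hull     -> AGCGCCCTAA TCTGACATTC
  print(complement("ATATGGCGAACAGTCGTGCG"))  # Weighton -> TATACCGCTT GTCAGCACGC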

  Now we proceed as follows:

  1) We copy the single strands for each town billions of times.

  2) For each town pair, we make a 20-element strand using the last 10 elements of the complementary string of the first town, and the first 10 elements of the complementary string of the second town; thus, for Hull to Weighton the string is:

  TCTGACATTC TATACCGCTT

  3) We copy these complementary strands billions of times.

  4) For each town-to-town pair, say Hull to Weighton, we take the associated distance (in this case, from the table of distances we see that this distance is 15).

  5) Into the middle of each 20-element single strand made for each town pair, insert a line of paired nucleotides equal in length to the distance between that pair. For Hull to Weighton, we then have the string:

  TCTGACATTC AAAAAAAAAAAAAAA TATACCGCTT
             TTTTTTTTTTTTTTT

  The string is a double strand in its middle, a single strand at its ends. Note that the strand corresponding to travel from Weighton to Hull is different from the one for Hull to Weighton.

  The strings were written with spaces only to make the structure easier to see. Leaving out the gaps now, we have for Hull to Weighton:

  TCTGACATTCAAAAAAAAAAAAAAATATACCGCTT
            TTTTTTTTTTTTTTT

  We make such a DNA strand for travel from every town to every other. The length of any strand will be 20 plus the distance between the pair (taken in the right order).

  In our five-town example, there will be exactly 20 strands, one for each off-diagonal element of the distance table.
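  Steps 2 and 5 are easy to mimic in software. Here is a Python sketch (my illustration, not the book's; the helper names are arbitrary) that builds the Hull-to-Weighton connecting strand shown above:

  # Build a town-to-town connecting strand: the last ten bases of the first
  # town's complement, a run of 'A's as long as the distance, then the first
  # ten bases of the second town's complement, with a paired run of 'T's.
  PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

  def complement(strand):
      return "".join(PAIR[base] for base in strand)

  HULL = "TCGCGGGATTAGACTGTAAG"
  WEIGHTON = "ATATGGCGAACAGTCGTGCG"

  def connector(town_a, town_b, distance):
      top = complement(town_a)[10:] + "A" * distance + complement(town_b)[:10]
      middle = " " * 10 + "T" * distance   # pairs against the middle 'A' run
      return top, middle

  top, middle = connector(HULL, WEIGHTON, 15)
  print(top)     # TCTGACATTCAAAAAAAAAAAAAAATATACCGCTT  (20 + 15 = 35 bases)
  print(middle)  #           TTTTTTTTTTTTTTT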

  6) Now we are ready to start work. We put all our town strands and our town-to-town complementary strands into a beaker, and stir gently.

  What happens? As we know, complementary pairs have a natural affinity for each other. Any string, such as AGTC, will seek to match with its complementary string, TCAG, so as to form a double strand. A longer string will have to match base pairs along a greater length before a double strand can form.

  Consider the strand for a particular town, say Weighton. Its strand is ATATGGCGAA CAGTCGTGCG. The last ten elements of the strand will seek to match their complementary strand, GTCAGCACGC. Such a sequence will be found at one end of every town-to-town string originating at Weighton, and it will link up to form a double strand over those ten sites. Suppose it links to the Weighton-to-Beverley connecting strand. The result will look like this:

  (Weighton to Beverley strand, 34 bases)

             GTCAGCACGC AAAAAAAAAAAAAA TCGAATATAG
  ATATGGCGAA CAGTCGTGCG TTTTTTTTTTTTTT
  (Weighton strand)     (distance 14)  (match Beverley strand)

  The far end of the Weighton-to-Beverley strand will have an affinity for the Beverley strand, so it will link up with the first ten sites of that town. The last ten sites of Beverley will in turn seek out the first ten sites of another compatible town-to-town strand, whose last ten sites will link to another town (which could, of course, be Weighton again). The process will go on, adding town after town, until it terminates by a string "catching its tail," linking the final site back to the first one to form a closed loop.

  Here, for example, is Weighton to Beverley to Hull, with Hull at the right-hand end all set to connect with some other town-to-town strand:

  (Weighton to Beverley)

             GTCAGCACGC AAAAAAAAAAAAAA TCGAATATAG . . .
  ATATGGCGAA CAGTCGTGCG TTTTTTTTTTTTTT AGCTTATATC . . .
  (Weighton)                           (Beverley . . .)

  (Beverley to Hull)

  . . . CCATATACCG AAAAAAAAAAAA AGCGCCCTAA
  . . . GGTATATGGC TTTTTTTTTTTT TCGCGGGATT AGACTGTAAG
  (Beverley continued)           (Hull)

  7) In our beaker we have endless billions of different DNA strings. We now sort them out chemically, according to the number of nucleotide bases in each. We can put an upper limit on the length of "interesting" strings. Any string that has more nucleotide base pairs than 100 (20 for each of the five towns) plus the sum of the five maximum distances cannot be the solution we want. We reject all those strings and retain the others.

  We also have a lower limit. Any string less than a hundred bases long cannot contain all the towns, so it can be rejected.

  We return only strands of acceptable length to our beaker.

  8) Now we unwind each of the double strands into its two separate strings. We want to find the shortest suitable strand, but to be suitable it must include a sequence of bases from each of the five towns (we might find a loop that went from Beverley to Hornsea to Beverley, total 11 miles and therefore total string length 51 bases, but it would be no good).
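  The bookkeeping behind these two steps can be sketched in a few lines of Python (my illustration, not from the book; the numbers come from the distance table above). A closed loop through all five towns occupies 20 bases per town plus the route's total distance, so strand length encodes tour length, and the length filter discards most of the junk:

  # Strand length for a closed loop: 20 bases per town plus the distance runs.
  def loop_strand_bases(n_towns, leg_distances):
      return 20 * n_towns + sum(leg_distances)

  # The winning tour Hull-Beverley-Hornsea-Weighton-Driffield-Hull:
  legs = [10, 5, 12, 7, 16]                  # leg distances from the table
  length = loop_strand_bases(5, legs)        # 100 + 50 = 150 bases
  lower = 20 * 5                             # shorter strands cannot contain all five towns
  upper = 20 * 5 + (17 + 20 + 19 + 14 + 21)  # 100 plus the sum of the five row maxima
  print(length, lower <= length <= upper)    # 150 True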

 
