The Science of Discworld II - The Globe


by Terry Pratchett

An excellent way to approach 'why' questions is to consider alternatives and rule them out. 'Why did you park the car round the corner down a side-street?' 'Because if I'd parked outside the front door on the double yellow lines, a traffic warden would have given me a parking ticket.' This particular 'why' question is a story, a piece of fiction: a hypothetical discussion of the likely consequences of an action that never occurred. Humans invented their own brand of narrativium as an aid to the exploration of I-space, the space of 'insteads'. Narrative provides I-space with a geography: if I did this instead of that, then what would happen would be ...

On Discworld, phase spaces are real. The fictitious alternatives to the one actual state exist, too, and you can get inside the phase space and roam over its landscape, provided you know the right spells, secret entrances and other magical paraphernalia. L-space is a case in point. On Roundworld, we can pretend that phase space exists, and we can imagine exploring its geography. This pretence has turned out to be extraordinarily insightful.

Associated with any physical system, then, is a phase space, a space of the possible. If you're studying the solar system, then the phase space comprises all possible ways to arrange one star, nine planets, a considerable number of moons and a gigantic number of asteroids in space. If you're studying a sand-pile, then the phase space comprises all possible ways to arrange several million grains of sand. If you're studying thermodynamics, then the phase space comprises all possible positions and velocities for a large number of gas molecules. Indeed, for each molecule there are three position coordinates and three velocity coordinates, because the molecule lives in three-dimensional space. So with N molecules there are 6N coordinates altogether. If you're looking at games of chess, then the phase space consists of all possible positions of the pieces on the board. If you're thinking about all possible books, then the phase space is L-space. And if you're thinking about all possible universes, you're contemplating U-space. Each 'point' of U-space is an entire universe (and you have to invent the multiverse to hold them all ...)
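To make the 6N count concrete, here is a minimal Python sketch. It assumes each molecule is described by a position triple and a velocity triple; the helper `phase_point` and the random toy gas are hypothetical illustrations, not anything from the text. The point it shows is that one complete state of the whole gas is a single point with 6N coordinates.

```python
import random

def phase_point(molecules):
    """Flatten a gas into one point of phase space (hypothetical helper).

    Each molecule is assumed to be a pair of 3-tuples: (position, velocity).
    The result is a single list of 6N numbers.
    """
    point = []
    for position, velocity in molecules:
        point.extend(position)  # x, y, z
        point.extend(velocity)  # vx, vy, vz
    return point

# A made-up gas of N = 4 molecules with random positions and velocities.
rng = random.Random(0)
N = 4
gas = [
    (tuple(rng.uniform(0, 1) for _ in range(3)),
     tuple(rng.gauss(0, 1) for _ in range(3)))
    for _ in range(N)
]
state = phase_point(gas)
print(len(state))  # 24, i.e. 6 * N coordinates
```

Changing any single number in `state` gives a different point, that is, a different possible state of the same gas.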

When cosmologists think about varying the natural constants, as we described in Chapter 2 in connection with the carbon resonance in stars, they are thinking about one tiny and rather obvious piece of U-space, the part that can be derived from our universe by changing the fundamental constants but otherwise keeping the laws the same. There are infinitely many other ways to set up an alternative universe: they range from having 101 dimensions and totally different laws to being identical with our universe except for six atoms of dysprosium in the core of the star Procyon that change into iodine on Thursdays.

As this example suggests, the first thing to appreciate about phase spaces is that they are generally rather big. What the universe actually does is a tiny proportion of all the things it could have done instead. For instance, suppose that a car park has one hundred parking slots, and that cars are either red, blue, green, white, or black. When the car park is full, how many different patterns of colour are there? Ignore the make of car, ignore how well or badly it is parked; focus solely on the pattern of colours.

Mathematicians call this kind of question 'combinatorics', and they have devised all sorts of clever ways to find answers. Roughly speaking, combinatorics is the art of counting things without actually counting them. Many years ago a mathematical acquaintance of ours came across a university administrator counting light bulbs in the roof of a lecture hall. The lights were arranged in a perfect rectangular grid, 10 by 20. The administrator was staring at the ceiling, going '49, 50, 51 ...'

'Two hundred,' said the mathematician.

'How do you know that?'

'Well, it's a 10 by 20 grid, and 10 times 20 is 200.'

'No, no,' replied the administrator. 'I want the exact number.'[13]

Back to those cars. There are five colours, and each slot can be filled by just one of them. So there are five ways to fill the first slot, five ways to fill the second, and so on. Any way to fill the first slot can be combined with any way to fill the second, so those two slots can be filled in 5 × 5 = 25 ways. Each of those can be combined with any of the five ways to fill the third slot, so now we have 25 × 5 = 125 possibilities. By the same reasoning, the total number of ways to fill the whole car park is 5 × 5 × 5 × ... × 5, with a hundred fives. This is 5^100, which is rather big. To be precise, it is

78886090522101180541172856528278622 96732064351090230047702789306640625

(we've broken the number in two so that it fits the page width) which has 70 digits. It took a computer algebra system about five seconds to work that out, by the way, and about 4.999 of those seconds were taken up with giving it the instructions. And most of the rest was used up printing the result to the screen. Anyway, you now see why combinatorics is the art of counting without actually counting; if you listed all the possibilities and counted them '1, 2, 3, 4 ...' you'd never finish. So it's a good job that the university administrator wasn't in charge of car parking.
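The difference between counting and 'counting without counting' can be seen in a short Python sketch. For a tiny car park it lists every colour pattern one by one and checks that the total agrees with the multiplication rule; for the full hundred-slot car park it uses only the rule. The colour list comes from the text; the function names are hypothetical.

```python
from itertools import product

COLOURS = ["red", "blue", "green", "white", "black"]

def count_by_listing(slots):
    """The administrator's way: generate every pattern and count them."""
    return sum(1 for _ in product(COLOURS, repeat=slots))

def count_by_rule(slots):
    """Counting principle: each slot independently takes one of 5 colours."""
    return len(COLOURS) ** slots

# For tiny car parks, listing and the rule agree.
for slots in range(1, 6):
    assert count_by_listing(slots) == count_by_rule(slots)

# For 100 slots only the rule is feasible.
full = count_by_rule(100)
print(full)
print(len(str(full)))  # 70 digits
```

Listing the five-slot case already means generating 3,125 patterns; listing the hundred-slot case would never finish, which is the whole point.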

How big is L-space? The Librarian said it is infinite, which is true if you use infinity to mean 'a much larger number than I can envisage', or if you don't place an upper limit on how big a book can be[14], or if you allow all possible alphabets, syllabaries, and pictograms. If we stick to 'ordinary-sized' English books, we can reduce the estimate. A typical book is 100,000 words long, or about 600,000 characters (letters and spaces; we'll ignore punctuation marks). There are 26 letters in the English alphabet, plus a space, making 27 characters that can go into each of the 600,000 possible positions. The counting principle that we used to solve the car-parking problem now implies that the maximum number of books of this length is 27^600,000, which is roughly 10^860,000 (that is, an 860,000-digit number).

Of course, most of those 'books' make very little sense, because we've not yet insisted that the letters make sensible words. If we assume that the words are drawn from a list of 10,000 standard ones, and calculate the number of ways to arrange 100,000 words in order, then the figure changes. 10,000^100,000 is equal to 10^400,000, and this is quite a bit smaller ... but still enormous. Mind you, most of those books wouldn't make much sense either; they'd read something like 'Cabbage patronymic forgotten prohibit hostile quintessence' continuing at book length[15]. So maybe we ought to work with sentences ... At any rate, even if we cut the numbers down in that manner, it turns out that the universe is not big enough to contain that many physical books. So it's a good job that L-space is available, and now we know why there's never enough shelf space. We like to think that our major libraries, such as the British Library or the Library of Congress, are pretty big. But, in fact, the space of those books that actually exist is a tiny, tiny fraction of L-space, all the books that could have existed. In particular, we're never going to run out of new books to write.
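Numbers like 27^600,000 are too big to print comfortably, but their digit counts follow from logarithms: a positive whole number n has floor(log10 n) + 1 digits, and log10(b^e) = e × log10(b). A minimal Python sketch of that arithmetic, with a hypothetical helper name and relying on ordinary floating point (which is adequate for these figures but could misjudge a value lying almost exactly on a whole number):

```python
import math

def digits_of_power(base, exponent):
    """Number of decimal digits in base**exponent, without computing it.

    Uses floor(exponent * log10(base)) + 1. Floating-point rounding is
    not a problem for the figures used here.
    """
    return math.floor(exponent * math.log10(base)) + 1

letters_only = digits_of_power(27, 600_000)      # all strings of 600,000 characters
word_strings = digits_of_power(10_000, 100_000)  # all strings of 100,000 words
print(letters_only)  # 858819, i.e. roughly 860,000 digits
print(word_strings)  # 400001: 10^400,000 is a 1 followed by 400,000 zeros
```

The same helper gives 70 for 5^100, agreeing with the car-park number.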

Poincaré's phase space viewpoint has proved to be so useful that nowadays you'll find it in every area of science, and in areas that aren't science at all. A major consumer of phase spaces is economics. Suppose that a national economy involves a million different goods: cheese, bicycles, rats-on-a-stick, and so on. Associated with each good is a price, say £2.35 for a lump of cheese, £449.99 for a bicycle, £15.00 for a rat-on-a-stick. So the state of the economy is a list of one million numbers. The phase space consists of all possible lists of a million numbers, including many lists that make no economic sense at all, such as lists that include the £0.02 bicycle or the £999,999,999.95 rat. The economist's job is to discover the principles that select, from the space of all possible lists of numbers, the actual list that is observed.

The classic principle of this kind is the Law of Supply and Demand, which says that if goods are in short supply and you really, really want them, then the price goes up. It sometimes works, but it often doesn't. Finding such laws is something of a black art, and the results are not totally convincing, but that just tells us that economics is hard. Poor results notwithstanding, the economist's way of thinking is a phase space point of view.

Here's a little tale that shows just how far removed economic theory is from reality. The basis of conventional economics is the idea of a rational agent with perfect information, who maximises utility. According to these assumptions, a taxi-driver, for example, will arrange his activities to generate the most money for the least effort.

Now, the income of a taxi-driver depends on circumstances. On good days, with lots of passengers around, he will do well; on bad days, he won't. A rational taxi-driver will therefore work longer on good days and give up early on bad ones. However, a study of taxi-drivers in New York carried out by Colin Camerer and others shows the exact opposite. The taxi-drivers seem to set themselves a daily target, and stop working once they reach it. So they work shorter hours on good days, and longer hours on bad ones. They could increase their earnings by 8 per cent just by working the same number of hours every day, for the same total working time. If they worked longer on good days and shorter on bad ones, they could increase their earnings by 15 per cent. But they don't have a good enough intuition for economic phase space to appreciate this. They are adopting a common human trait of placing too much value on what they have today, and too little on what they may gain tomorrow.
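Why the daily target loses money can be seen in a toy two-day model. The hourly rates, the target and the total hours below are invented round numbers, not data from Camerer's study, so the percentages come out different from the 8 and 15 per cent quoted; only the ordering of the three strategies carries over. All three strategies use the same total working time.

```python
# Hypothetical toy numbers, NOT taken from the New York study.
GOOD_RATE = 30.0    # earnings per hour on the busy day (assumed)
BAD_RATE = 15.0     # earnings per hour on the quiet day (assumed)
TOTAL_HOURS = 18.0  # same total working time for every strategy

def earnings(good_hours, bad_hours):
    return GOOD_RATE * good_hours + BAD_RATE * bad_hours

# Daily target: stop once the day's takings reach TARGET.
# TARGET is chosen so that the hours add up to TOTAL_HOURS.
TARGET = 180.0
target_good = TARGET / GOOD_RATE  # 6 hours on the busy day
target_bad = TARGET / BAD_RATE    # 12 hours on the quiet day
assert target_good + target_bad == TOTAL_HOURS

strategies = {
    "daily target": (target_good, target_bad),
    "same hours every day": (TOTAL_HOURS / 2, TOTAL_HOURS / 2),
    "longer on good days": (target_bad, target_good),  # 12 busy, 6 quiet
}
baseline = earnings(target_good, target_bad)
for name, (g, b) in strategies.items():
    e = earnings(g, b)
    print(f"{name:22s} {g:4.1f}h + {b:4.1f}h -> {e:6.1f} "
          f"({(e / baseline - 1) * 100:+.1f}%)")
```

With these made-up rates the target strategy earns 360, equal hours earn 405 and working longer on busy days earns 450: each hour moved from a quiet day to a busy one is worth more.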

Biology, too, has been invaded by phase spaces. The first of these to gain widespread currency was DNA-space. Associated with every living organism is its genome, a string of chemical molecules called DNA. The DNA molecule is a double helix, two spirals wrapped round a common core. Each spiral is made up of a string of 'bases' or 'nucleotides', which come in four varieties: cytosine, guanine, adenine, thymine, normally abbreviated to their initials C, G, A, T.

The sequences on the two strings are 'complementary': wherever C appears on one string, you get G on the other, and similarly for A and T. The DNA contains two copies of the sequence, one positive and one negative, so to speak. In the abstract, then, the genome can be thought of as a single sequence of these four letters, something like AATGGCCTCAG ... going on for rather a long time. The human genome, for example, goes on for about three billion letters.
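The pairing rule is simple enough to write down directly. In this Python sketch `complement` applies the rule letter by letter, exactly as described in the text. The second function, `reverse_complement`, adds a detail the text glosses over: the two strands run in opposite directions, so biologists usually quote the partner strand read backwards. Both function names are illustrative.

```python
# Base pairing: C with G, A with T.
PAIR = {"C": "G", "G": "C", "A": "T", "T": "A"}

def complement(strand):
    """Partner strand, position by position."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Partner strand read in its own (opposite) direction."""
    return complement(strand)[::-1]

seq = "AATGGCCTCAG"
print(complement(seq))          # TTACCGGAGTC
print(reverse_complement(seq))  # CTGAGGCCATT
```

Because complementing twice gives back the original, either strand determines the other, which is why the genome can be treated as a single sequence.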

The phase space for genomes, DNA-space, consists of all possible sequences of a given length. If we're thinking about human beings, the relevant DNA-space comprises all possible sequences of three billion code letters C, G, A, T. How big is that space? It's the same problem as the cars in the car park, mathematically speaking, so the answer is 4 × 4 × 4 × ... × 4 with three billion 4s. That is 4^3,000,000,000. This number is a lot bigger than the 70-digit number we got for the car-parking problem. It's a lot bigger than L-space for normal-size books, too. In fact, it has about 1,800,000,000 digits. If you wrote it out with 3,000 digits per page, you'd need a 600,000-page book to hold it.
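The digit count and the page count follow from the same logarithm trick used for L-space. A minimal Python sketch, taking the genome length as exactly three billion for simplicity:

```python
import math

GENOME_LENGTH = 3_000_000_000  # "about three billion letters", rounded
DIGITS_PER_PAGE = 3_000

# 4**GENOME_LENGTH has floor(GENOME_LENGTH * log10(4)) + 1 digits.
digits = math.floor(GENOME_LENGTH * math.log10(4)) + 1
pages = math.ceil(digits / DIGITS_PER_PAGE)
print(f"{digits:,} digits")  # 1,806,179,974 digits
print(f"{pages:,} pages")    # 602,060 pages
```

The text rounds these to 1.8 billion digits and 600,000 pages.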

The image of DNA-space is very useful for geneticists who are considering possible changes to DNA sequences, such as 'point mutations' where one code letter is changed, say as the result of a copying error. Or an incoming high-energy cosmic ray. Viruses, in particular, mutate so rapidly that it makes little sense to talk of a viral species as a fixed thing. Instead, biologists talk of quasi-species, and visualise these as clusters of related sequences in DNA-space. The clusters slosh around as time passes, but they stay together as one cluster, which allows the virus to retain its identity.
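Point mutations give DNA-space a natural notion of 'nearby': two sequences are neighbours when they differ in exactly one position, and each position can change to any of the three other letters, so a sequence of length L has 3L immediate neighbours. One simple way to picture a quasi-species is as a cloud of sequences within a small Hamming distance (number of differing positions) of one another. The Python sketch below, with hypothetical function names, only enumerates neighbours and measures distance; it is not a model of viral evolution.

```python
BASES = "ACGT"

def point_mutations(seq):
    """All sequences differing from seq in exactly one position."""
    neighbours = []
    for i, base in enumerate(seq):
        for other in BASES:
            if other != base:
                neighbours.append(seq[:i] + other + seq[i + 1:])
    return neighbours

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

seq = "AATGGCCTCAG"
mutants = point_mutations(seq)
print(len(mutants))  # 33, i.e. 3 * 11
```

For a human-length genome the same rule gives about nine billion single-letter neighbours for every sequence.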

In the whole of human history, the total number of people has been no more than ten billion, a mere 11-digit number. This is an incredibly tiny fraction of all those possibilities. So actual human beings have explored the tiniest portion of DNA-space, just as actual books have explored the tiniest portion of L-space. Of course, the interesting questions are not as straightforward as that. Most sequences of letters do not make up a sensible book; most DNA sequences do not correspond to a viable organism, let alone a human being.
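How tiny is 'incredibly tiny'? Working in logarithms again, a short Python sketch compares ten billion people with the 4^3,000,000,000 points of human DNA-space. It generously assumes every person who ever lived had a distinct genome, so the result is an upper bound on the fraction explored.

```python
import math

PEOPLE_EVER = 10 ** 10         # "no more than ten billion"
GENOME_LENGTH = 3_000_000_000  # rounded, as in the text

log10_people = math.log10(PEOPLE_EVER)       # 10.0
log10_space = GENOME_LENGTH * math.log10(4)  # about 1.8 billion
log10_fraction = log10_people - log10_space

print(len(str(PEOPLE_EVER)))  # 11 digits
print(f"fraction explored: about 10^{log10_fraction:,.0f}")
```

The fraction is a decimal point followed by roughly 1.8 billion zeros before the first non-zero digit.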

And now we come to the crunch for phase spaces. In physics, it is reasonable to assume that the sensible phase space can be 'pre-stated' before tackling questions about the corresponding system. We can imagine rearranging the bodies of the solar system into any configuration in that imaginary phase space. We lack the engineering capacity to do that, but we have no difficulty imagining it done, and we see no physical reason to remove any particular configuration from consideration.

When it comes to DNA-space, however, the important questions are not about the whole of that vast space of all possible sequences. Nearly all of those sequences correspond to no organism whatsoever, not even a dead one. What we really need to consider is 'viable-DNA-space', the space of all DNA sequences that could be realised within some viable organism. This is some immensely complicated but very thin part of DNA-space, and we don't know what it is. We have no idea how to look at a hypothetical DNA sequence and decide whether it can occur in a viable organism.

The same problem arises in connection with L-space, but there's a twist. A literate human can look at a sequence of letters and spaces and decide whether it constitutes a story; they know how to 'read' the code and work out its meaning, if it's in a language they understand. They can even make a stab at deciding whether it's a good story or a bad one. However, we do not know how to transfer this ability to a computer. The rules that our minds use, to decide whether what we're reading is a story, are implicit in the networks of nerve cells in our brains. Nobody has yet been able to make those rules explicit. We don't know how to characterise the 'readable books' subset of L-space.

For DNA, the problem is compounded because there isn't some kind of fixed rule that 'translates' a DNA code into an organism. Biologists used to think there would be, and had high hopes of learning the 'language' involved. Then the DNA for a genuine (potential) organism would be a code sequence that told a coherent story of biological development, and all other DNA sequences would be gibberish. In effect, the biologists expected to be able to look at the DNA sequence of a tiger and see the bit that specified the stripes, the bit that specified the claws, and so on. This was a bit optimistic. The current state of the art is that we can see the bit of DNA that specifies the protein from which claws are made, or the bits that make the orange, black and white pigments on the fur that show up as stripes, but that's about as far as our understanding of DNA narrative goes. It is now becoming clear that many non-genetic factors go into the growth of an organism, too, so even in principle there may not be a 'language' that translates DNA into living creatures.

For example, tiger DNA turns into a baby tiger only in the presence of an egg, supplied by a mother tiger. The same DNA, in the presence of a mongoose egg, would not make a tiger at all.

Now, it could be that this is just a technical problem: that for each DNA code there is a unique kind of mother-organism that turns it into a living creature, so that the form of that creature is still implicit in the code. But theoretically, at least, the same DNA code could make two totally different organisms. We give an example in The Collapse of Chaos where the developing organism first 'looks' to see what kind of mother it is in, and then develops in different ways depending on what it sees.

Complexity guru Stuart Kauffman has taken this difficulty a stage further. He points out that while in physics we can expect to pre-state the phase space of a system, the same is never true in biology. Biological systems are more creative than physical ones: the organisation of matter within living creatures is of a different qualitative nature from the organisation we find in inorganic matter. In particular, organisms can evolve, and when they do that they often become more complicated. The fish-like ancestor of humans was less complicated than we are today, for example. (We've not specified a measure of complexity here, but that statement will be reasonable for most sensible measures of complexity, so let's not worry about definitions. Evolution does not necessarily increase complexity, but it's at its most puzzling when it does.)

Kauffman contrasts two systems. One is the traditional thermodynamic model in physics, of N gas molecules (modelled as hard spheres) bouncing around inside their 6N-dimensional phase space. Here we know the phase space in advance, we can specify the dynamic precisely, and we can deduce general laws. Among them is the Second Law of Thermodynamics, which states that with overwhelming probability the system will become more disordered as time passes, and the molecules will distribute themselves uniformly throughout their container.
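The drift towards a uniform spread can be illustrated with the Ehrenfest urn model, a classic simplification swapped in here for the hard-sphere gas: it throws away positions and velocities and keeps only which half of the box each molecule is in. All molecules start in the left half; at each step one molecule, chosen at random, hops to the other half. The function name and parameters are illustrative.

```python
import random

def ehrenfest(n_molecules, steps, seed=0):
    """Ehrenfest urn model: fraction of molecules in the left half over time.

    Starts with every molecule on the left. Each step moves one randomly
    chosen molecule to the other half.
    """
    rng = random.Random(seed)
    in_left = [True] * n_molecules
    left = n_molecules
    history = []
    for _ in range(steps):
        i = rng.randrange(n_molecules)
        left += -1 if in_left[i] else 1
        in_left[i] = not in_left[i]
        history.append(left / n_molecules)
    return history

history = ehrenfest(n_molecules=1000, steps=20_000)
print(history[0], history[-1])  # starts at 0.999, ends close to 0.5
```

Nothing in the rule forbids all the molecules drifting back to the left, but with 1,000 of them that outcome is so improbable it is never seen in practice: this is the 'overwhelming probability' in the Second Law.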

The second system is the 'biosphere', an evolving ecology. Here, it is not at all clear which phase space to use. Potential choices are either much too big, or much too limited. Suppose for a moment that the old biologists' dream of a DNA language for organisms was true. Then we might hope to employ DNA-space as our phase space.

However, as we've just seen, only a tiny, intricate subset of that space would really be of interest, but we can't work out which subset. When you add to that the probable non-existence of any such language, the whole approach falls apart. On the other hand, if the phase space is too small, entirely reasonable changes might take the organisms outside it altogether. For example, tiger-space might be defined in terms of the number of stripes on the big cat's body. But if one day a big cat evolves that has spots instead of stripes, there's no place for it in the tiger phase space.

Sure, it's not a tiger ... but its mother was. We can't sensibly exclude this kind of innovation if we want to understand real biology.

As organisms evolve, they change. Sometimes evolution can be seen as the opening-up of a region of phase space that was sitting there waiting, but was not occupied by organisms. If the colours and patterns on an insect change a bit, all that we're seeing is the exploration of new regions of a fairly well-defined 'insect-space'. But when an entirely new trick, wings, appears, even the phase space seems to have changed.
