29
At a rate of a billion routes per second, that search would take you longer
30
than the age of the observable universe.
31
The trick, then, isn’t just to find any old search algorithm, but to find ef-
32
ficient ones. And very often, the number of choices is so high that we’re happy
33
to find solutions that are just pretty good, rather than absolutely perfect.
34
Natural selection can be thought of as a search algorithm. The problem
S35
being tackled by evolution is: “What organism would survive and
N36
279
Big Picture - UK final proofs.indd 279
20/07/2016 10:02:50
T H E B IG PIC T U R E
01
reproduce most effectively in this particular environment?” Except it’s not
02
really “organisms” that are being searched, it’s genomes, or particular strings
03
of nucleotides in a strand of DNA. The human genome contains about 3
04
billion nucleotides. That’s a lot, compared to, for example, a bacterium,
05
which might have a few million. But let’s not be too proud; there are flower-
06
ing plants with over 100 billion nucleotide base pairs in their DNA. Some
07
organisms will survive and reproduce, while some will not. How, over the
08
course of generations, do we find the DNA sequences that lead to organ-
09
isms with the highest chance of survival?
10
This counts as a hard problem, from the perspective of computational re-
11
sources. Each of our 3 billion nucleotides is 1 of 4 possible choices: A, C, G,
12
or T. The total number of possible arrangements of human- sized DNA is not
13
four times 3 billion (which wouldn’t be so bad); it’s four to the power of 3 bil-14
lion: 43,000,000,000, or roughly 1 followed by 2 billion zeros. That’s a stupen-
15
dously, hilariously large number. It’s also an overestimate; some sequences of
16
nucleotides have the same functional impact as others, and the vast majority
17
of sequences wouldn’t even lead to an organism. We could count genes rather
18
than nucleotides; that would cut down the number of dimensions consider-
19
ably, although each gene can take many more than four possible forms, so the
20
number is still enormous, and the interdependence of different gene func-
21
tions makes any such counting a little uncertain. By any possible measure the
22
problem of finding the “best” organism by searching through all of the pos-
23
sible genomes an organism could have is a daunting one.
24
Evolution provides a strategy for searching for high- fitness genomes in a
25
ridiculously big space of possibilities. Computer scientists have recently
26
shown that a simplified model of evolution (allowing for mixing via sexual
27
reproduction, but not for mutations) is mathematically equivalent to an
28
algorithm devised by game theorists years ago, known as multiplicative
29
weight updates. Good ideas tend to show up in a variety of places.
30
The phrase “search algorithm” isn’t meant to imply that anyone wrote an
31
algorithm, or anyone is specifying a goal that evolution is supposed to search
32
for. Evolution doesn’t have any goals in mind; evolution simply happens, with
33
Laplacian equanimity, each step following from the previous one. In the spirit
34
of poetic naturalism, “search algorithm” is simply a useful way of talking
35S
about the process of evolution. Under the appropriate circumstances, they are
36N
formally mathematically equivalent, and the connection provides some nice
2 80
Big Picture - UK final proofs.indd 280
20/07/2016 10:02:50
S E A RC h I n g t h R O ug h t h E l A n d S C A PE
visual intuition. Don’t let the language trick you into believing there is any
01
agency guiding the course of evolution, or setting up goals ahead of time; at
02
the same time, don’t let the fear of sounding like you believe in agency prevent
03
you from using a language that gives significant insight into the process.
04
05
•
06
One way of visualizing evolution’s search problem is in terms of a fitness
07
landscape. The idea is that we can assign, to any particular genome in a
08
specific environment, a numerical value called its “fitness.” This number
09
characterizes how likely it is that an organism based on that genome will
10
reproduce in that environment. We can visualize the fitness in terms of a
11
rolling landscape, with hills and valleys, where the role of “directions in
12
space” is played by different forms each gene can take, and the role of
13
“height above ground” is played by the fitness. (When we actually draw a
14
fitness landscape, we typically look at only one or two genes at a time, but
15
in the back of your mind you should remember that this is really a 25, 000-
16
dimensional space we’re thinking of, one for each gene.) A high- fitness hill
17
corresponds to a genome that produces an organism that is very likely to
18
reproduce (the more offspring, the better), while a low- fitness valley is a
19
genome that is unlikely to make it to subsequent generations.
20
21
22
Fitness
23
24
25
26
27
28
29
30
31
Gene 1
32
33
34
Gene 2
S35
N36
2 81
Big Picture - UK final proofs.indd 281
20/07/2016 10:02:50
T H E B IG PIC T U R E
01
We can think of evolution as nudging populations toward higher eleva-
02
tions in the fitness landscape, favoring genes that lead to more fit organisms.
03
That’s a simplification, of course. There isn’t a single fixed fitness landscape
04
appropriate to all species and all circumstances for all times; at best we
05
should think of a particular population in a fixed environment. The shape
06
of the landscape will depend on all of the properties of that environment.
07
Other species come and go, the physical surroundings change, so the land-
08
scape changes over time. But some aspects of the
environment can be stable
09
enough over a sufficiently long time period that a fixed landscape is a useful
10
metaphor for visualizing what goes on.
11
Biologists see the world differently from physicists. The concept of a
12
landscape also appears in physics, for example, when we are asking what
13
phase a system settles down to at a given temperature and pressure. But in
14
the back of their minds, physicists are always thinking about a ball rolling
15
on a hill. Consequently, the favored points on the landscape are the mini-
16
mum values of the function being plotted (typically the energy), since balls
17
roll downward. Biologists are thinking about wily mountain goats, or chil-
18
dren playing a game of King of the Castle. To them, the favored points on
19
the landscape are the maximum values of fitness.
20
Here’s how evolution searches through the fitness landscape, looking for
21
higher peaks: We have a population of organisms of a certain species, so
22
they occupy a set of nearby points on the landscape. Individuals are born,
23
hopefully reproduce, and die. Their offspring have slightly different ge-
24
nomes, so they are located somewhere else on the landscape— not too far
25
away, but not at exactly the same place either. The ones that end up lower
26
on a slope are less likely to reproduce than those that find themselves higher
27
up. As generations pass by, the population finds itself gradually moving
28
toward higher ground.
29
We draw two- dimensional plots, but in reality the number of genes can
30
be very large indeed, so it can take an extremely long time for a population
31
to climb up the landscape. Species may never get to the top of a single hill,
32
much less the highest mountain around, though individual traits may do
33
so. Some parts of the landscape are relatively flat; that’s where different ge-
34
nomes don’t have very different levels of fitness, and genetic drift can be the
35S
dominant feature in evolution. A more realistic portrayal would have a
36N
time- varying landscape, as both physical and biological features of the
2 82
Big Picture - UK final proofs.indd 282
20/07/2016 10:02:50
S E A RC h I n g t h R O ug h t h E l A n d S C A PE
environment continually shift around. When that happens, it’s literally im-
01
possible to simply find the top of a hill and just sit there; one day’s maxi-
02
mum might be a valley the next day.
03
Finally, there’s no sense in which evolution’s algorithm is guaranteed to
04
find the best possible result. Most variations are small, and allow us to ex-
05
plore only nearby points in the landscape. Occasionally, there might be a
06
rare mutation that enables us to skip from one peak to another, but only
07
with peaks that are close together to begin with. Just as for the traveling-
08
salesman problem, finding a good- enough solution can be extremely useful
09
for all practical purposes.
10
11
•
12
The search procedure employed by evolution is so efficient that real human
13
computer programmers often use an analogous process to develop their
14
own strategies. This is a technique known as genetic algorithms. As with
15
genomes, we can imagine the set of all possible algorithms of a certain
16
length, at least within a fixed computer language. There will be a large num-
17
ber of them, and in principle we want to know which one is the best at
18
solving some specified problem. The genetic- algorithm approach works like
19
natural selection, except that the role of the fitness landscape is put in by
20
the programmer. In biology this would be called directed evolution, to em-
21
phasize the difference with natural selection, where the fitness landscape is
22
fixed by nature without any particular agenda.
23
Start with some randomly chosen algorithms, and let them tackle the
24
problem. Take some fraction that do the best, and then “mutate” them,
25
possibly also allowing them to mix with other successful algorithms. Throw
26
away the unsuccessful strategies, and repeat the process. The population of
27
algorithms being studied will gradually climb up the relevant fitness land-
28
scape, defined as how well each strategy does at finding a good solution to
29
its problem. (It’s the virtual equivalent of what Bartel and Szostak did to
30
find RNA configurations that could act as catalysts.)
31
Genetic algorithms provide a nice illustration of some of the interesting
32
features of evolution as a strategy inventor. One such example was invented
33
by computer scientist Melanie Mitchell. She asks us to consider Robby, a
34
virtual robot who lives in a simple world, a ten-by-ten grid of squares.
S35
Robby threw a party last night, and there are empty cans scattered across
N36
2 83
Big Picture - UK final proofs.indd 283
20/07/2016 10:02:50
T H E B IG PIC T U R E
01
the grid. Robby wants to clean them up in a hurry, being as efficient as pos-
02
sible with only a finite amount of time available. Our task is to invent a
03
strategy— an unambiguous set of instructions about what to do at every
04
step— that will help Robby the Robot pick up all the cans on the grid.
05
You might think that Robby can just walk from one can to the next, and
06
the challenge is to find the shortest path. But Robby is burdened with two
07
significant handicaps, perhaps due to partying a little too hard the previous
08
night. First, he can’t see very far. Standing on any one square, Robby can see
09
whether there’s a can on his own square, and he can see whether there’s a
10
can on any of the squares immediately north, south, east, or west of him.
11
But that’s all; he can’t see whether there are any cans diagonally, or on the
12
squares farther away.
13
14
15
16
17
18
19
20
21
22
23
/> 24
25
The world of Robby the Robot, on the left: a grid of squares, some empty and some littered 26
with cans. Robby’s field of view is highlighted. On the right, a situation where Robby is 27
on a square with a can, and with multiple cans nearby.
28
29
Your next thought is therefore that Robby should walk in some kind of
30
pattern, systematically scanning the grid and picking up any cans he sees.
31
But he has a second handicap: Robby has absolutely no memory at all. He
32
doesn’t know where he’s been, what cans he’s picked up, or what he was do-
33
ing even one moment ago. His strategy can only refer to what he must do
34
next based on his situation right now; it can’t include anything like “move
35S
east, and next time move south,” since that would encompass two moves
36N
in a row.
2 84
Big Picture - UK final proofs.indd 284
20/07/2016 10:02:50
S E A RC h I n g t h R O ug h t h E l A n d S C A PE
Given these limitations, it’s straightforward to enumerate every possible
01
strategy that Robby could follow. There are five squares he knows about: his
02
own, and the four neighbors corresponding to each cardinal direction.
03
Each square is in one of three conditions: it can be empty, have a can, or be
04
behind the wall (where he can’t go). Robby’s “state” is a list of what’s on each
05
of the five squares he knows about: a total of 35 = 243 states. There are seven
06
possible actions he can take: he can pick up a can (if one is there), he can
07
move in one of the four cardinal directions, he can move in a random direc-
08
tion, or he can stay put and do nothing.
09
A strategy for Robby is just a specification of one of the seven actions for
10
each of the 243 states. The number of possible strategies is thus 7243, or
11
about 10205. You’re not going to try out every strategy just to find the best
12
possible one.
13
You can try to be clever, and design a strategy you think will do a
14
good job. Mitchell did just that, choosing a baseline strategy for what
15
would count as “pretty good, even if not necessarily the best.” It was a
16
simple approach: if Robby is on a square with a can, pick it up. If not,
17
look for cans on the four nearby squares. If there is one can, move in that
18
direction. If there are no cans, move in a random direction. If there is more
19
than one can, move in a specified direction. Call this the “benchmark strat-
The Big Picture Page 48