The Big Picture Page 48 Read online free by Carroll, Sean M.

Home > Other > The Big Picture > Page 48

The Big Picture Page 48

29

At a rate of a billion routes per second, that search would take you longer

30

than the age of the observable universe.

31

The trick, then, isn’t just to find any old search algorithm, but to find ef-

32

ficient ones. And very often, the number of choices is so high that we’re happy

33

to find solutions that are just pretty good, rather than absolutely perfect.

34

Natural selection can be thought of as a search algorithm. The problem

S35

being tackled by evolution is: “What organism would survive and

N36

279

Big Picture - UK final proofs.indd 279

20/07/2016 10:02:50

T H E B IG PIC T U R E

01

reproduce most effectively in this particular environment?” Except it’s not

02

really “organisms” that are being searched, it’s genomes, or particular strings

03

of nucleotides in a strand of DNA. The human genome contains about 3

04

billion nucleotides. That’s a lot, compared to, for example, a bacterium,

05

which might have a few million. But let’s not be too proud; there are flower-

06

ing plants with over 100 billion nucleotide base pairs in their DNA. Some

07

organisms will survive and reproduce, while some will not. How, over the

08

course of generations, do we find the DNA sequences that lead to organ-

09

isms with the highest chance of survival?

10

This counts as a hard problem, from the perspective of computational re-

11

sources. Each of our 3 billion nucleotides is 1 of 4 possible choices: A, C, G,

12

or T. The total number of possible arrangements of human- sized DNA is not

13

four times 3 billion (which wouldn’t be so bad); it’s four to the power of 3 bil-14

lion: 43,000,000,000, or roughly 1 followed by 2 billion zeros. That’s a stupen-

15

dously, hilariously large number. It’s also an overestimate; some sequences of

16

nucleotides have the same functional impact as others, and the vast majority

17

of sequences wouldn’t even lead to an organism. We could count genes rather

18

than nucleotides; that would cut down the number of dimensions consider-

19

ably, although each gene can take many more than four possible forms, so the

20

number is still enormous, and the interdependence of different gene func-

21

tions makes any such counting a little uncertain. By any possible measure the

22

problem of finding the “best” organism by searching through all of the pos-

23

sible genomes an organism could have is a daunting one.

24

Evolution provides a strategy for searching for high- fitness genomes in a

25

ridiculously big space of possibilities. Computer scientists have recently

26

shown that a simplified model of evolution (allowing for mixing via sexual

27

reproduction, but not for mutations) is mathematically equivalent to an

28

algorithm devised by game theorists years ago, known as multiplicative

29

weight updates. Good ideas tend to show up in a variety of places.

30

The phrase “search algorithm” isn’t meant to imply that anyone wrote an

31

algorithm, or anyone is specifying a goal that evolution is supposed to search

32

for. Evolution doesn’t have any goals in mind; evolution simply happens, with

33

Laplacian equanimity, each step following from the previous one. In the spirit

34

of poetic naturalism, “search algorithm” is simply a useful way of talking

35S

about the process of evolution. Under the appropriate circumstances, they are

36N

formally mathematically equivalent, and the connection provides some nice

2 80

Big Picture - UK final proofs.indd 280

20/07/2016 10:02:50

S E A RC h I n g t h R O ug h t h E l A n d S C A PE

visual intuition. Don’t let the language trick you into believing there is any

01

agency guiding the course of evolution, or setting up goals ahead of time; at

02

the same time, don’t let the fear of sounding like you believe in agency prevent

03

you from using a language that gives significant insight into the process.

04

05

•

06

One way of visualizing evolution’s search problem is in terms of a fitness

07

landscape. The idea is that we can assign, to any particular genome in a

08

specific environment, a numerical value called its “fitness.” This number

09

characterizes how likely it is that an organism based on that genome will

10

reproduce in that environment. We can visualize the fitness in terms of a

11

rolling landscape, with hills and valleys, where the role of “directions in

12

space” is played by different forms each gene can take, and the role of

13

“height above ground” is played by the fitness. (When we actually draw a

14

fitness landscape, we typically look at only one or two genes at a time, but

15

in the back of your mind you should remember that this is really a 25, 000-

16

dimensional space we’re thinking of, one for each gene.) A high- fitness hill

17

corresponds to a genome that produces an organism that is very likely to

18

reproduce (the more offspring, the better), while a low- fitness valley is a

19

genome that is unlikely to make it to subsequent generations.

20

21

22

Fitness

23

24

25

26

27

28

29

30

31

Gene 1

32

33

34

Gene 2

S35

N36

2 81

Big Picture - UK final proofs.indd 281

20/07/2016 10:02:50

T H E B IG PIC T U R E

01

We can think of evolution as nudging populations toward higher eleva-

02

tions in the fitness landscape, favoring genes that lead to more fit organisms.

03

That’s a simplification, of course. There isn’t a single fixed fitness landscape

04

appropriate to all species and all circumstances for all times; at best we

05

should think of a particular population in a fixed environment. The shape

06

of the landscape will depend on all of the properties of that environment.

07

Other species come and go, the physical surroundings change, so the land-

08

scape changes over time. But some aspects of the
environment can be stable

09

enough over a sufficiently long time period that a fixed landscape is a useful

10

metaphor for visualizing what goes on.

11

Biologists see the world differently from physicists. The concept of a

12

landscape also appears in physics, for example, when we are asking what

13

phase a system settles down to at a given temperature and pressure. But in

14

the back of their minds, physicists are always thinking about a ball rolling

15

on a hill. Consequently, the favored points on the landscape are the mini-

16

mum values of the function being plotted (typically the energy), since balls

17

roll downward. Biologists are thinking about wily mountain goats, or chil-

18

dren playing a game of King of the Castle. To them, the favored points on

19

the landscape are the maximum values of fitness.

20

Here’s how evolution searches through the fitness landscape, looking for

21

higher peaks: We have a population of organisms of a certain species, so

22

they occupy a set of nearby points on the landscape. Individuals are born,

23

hopefully reproduce, and die. Their offspring have slightly different ge-

24

nomes, so they are located somewhere else on the landscape— not too far

25

away, but not at exactly the same place either. The ones that end up lower

26

on a slope are less likely to reproduce than those that find themselves higher

27

up. As generations pass by, the population finds itself gradually moving

28

toward higher ground.

29

We draw two- dimensional plots, but in reality the number of genes can

30

be very large indeed, so it can take an extremely long time for a population

31

to climb up the landscape. Species may never get to the top of a single hill,

32

much less the highest mountain around, though individual traits may do

33

so. Some parts of the landscape are relatively flat; that’s where different ge-

34

nomes don’t have very different levels of fitness, and genetic drift can be the

35S

dominant feature in evolution. A more realistic portrayal would have a

36N

time- varying landscape, as both physical and biological features of the

2 82

Big Picture - UK final proofs.indd 282

20/07/2016 10:02:50

S E A RC h I n g t h R O ug h t h E l A n d S C A PE

environment continually shift around. When that happens, it’s literally im-

01

possible to simply find the top of a hill and just sit there; one day’s maxi-

02

mum might be a valley the next day.

03

Finally, there’s no sense in which evolution’s algorithm is guaranteed to

04

find the best possible result. Most variations are small, and allow us to ex-

05

plore only nearby points in the landscape. Occasionally, there might be a

06

rare mutation that enables us to skip from one peak to another, but only

07

with peaks that are close together to begin with. Just as for the traveling-

08

salesman problem, finding a good- enough solution can be extremely useful

09

for all practical purposes.

10

11

•

12

The search procedure employed by evolution is so efficient that real human

13

computer programmers often use an analogous process to develop their

14

own strategies. This is a technique known as genetic algorithms. As with

15

genomes, we can imagine the set of all possible algorithms of a certain

16

length, at least within a fixed computer language. There will be a large num-

17

ber of them, and in principle we want to know which one is the best at

18

solving some specified problem. The genetic- algorithm approach works like

19

natural selection, except that the role of the fitness landscape is put in by

20

the programmer. In biology this would be called directed evolution, to em-

21

phasize the difference with natural selection, where the fitness landscape is

22

fixed by nature without any particular agenda.

23

Start with some randomly chosen algorithms, and let them tackle the

24

problem. Take some fraction that do the best, and then “mutate” them,

25

possibly also allowing them to mix with other successful algorithms. Throw

26

away the unsuccessful strategies, and repeat the process. The population of

27

algorithms being studied will gradually climb up the relevant fitness land-

28

scape, defined as how well each strategy does at finding a good solution to

29

its problem. (It’s the virtual equivalent of what Bartel and Szostak did to

30

find RNA configurations that could act as catalysts.)

31

Genetic algorithms provide a nice illustration of some of the interesting

32

features of evolution as a strategy inventor. One such example was invented

33

by computer scientist Melanie Mitchell. She asks us to consider Robby, a

34

virtual robot who lives in a simple world, a ten-by-ten grid of squares.

S35

Robby threw a party last night, and there are empty cans scattered across

N36

2 83

Big Picture - UK final proofs.indd 283

20/07/2016 10:02:50

T H E B IG PIC T U R E

01

the grid. Robby wants to clean them up in a hurry, being as efficient as pos-

02

sible with only a finite amount of time available. Our task is to invent a

03

strategy— an unambiguous set of instructions about what to do at every

04

step— that will help Robby the Robot pick up all the cans on the grid.

05

You might think that Robby can just walk from one can to the next, and

06

the challenge is to find the shortest path. But Robby is burdened with two

07

significant handicaps, perhaps due to partying a little too hard the previous

08

night. First, he can’t see very far. Standing on any one square, Robby can see

09

whether there’s a can on his own square, and he can see whether there’s a

10

can on any of the squares immediately north, south, east, or west of him.

11

But that’s all; he can’t see whether there are any cans diagonally, or on the

12

squares farther away.

13

14

15

16

17

18

19

20

21

22

23
/> 24

25

The world of Robby the Robot, on the left: a grid of squares, some empty and some littered 26

with cans. Robby’s field of view is highlighted. On the right, a situation where Robby is 27

on a square with a can, and with multiple cans nearby.

28

29

Your next thought is therefore that Robby should walk in some kind of

30

pattern, systematically scanning the grid and picking up any cans he sees.

31

But he has a second handicap: Robby has absolutely no memory at all. He

32

doesn’t know where he’s been, what cans he’s picked up, or what he was do-

33

ing even one moment ago. His strategy can only refer to what he must do

34

next based on his situation right now; it can’t include anything like “move

35S

east, and next time move south,” since that would encompass two moves

36N

in a row.

2 84

Big Picture - UK final proofs.indd 284

20/07/2016 10:02:50

S E A RC h I n g t h R O ug h t h E l A n d S C A PE

Given these limitations, it’s straightforward to enumerate every possible

01

strategy that Robby could follow. There are five squares he knows about: his

02

own, and the four neighbors corresponding to each cardinal direction.

03

Each square is in one of three conditions: it can be empty, have a can, or be

04

behind the wall (where he can’t go). Robby’s “state” is a list of what’s on each

05

of the five squares he knows about: a total of 35 = 243 states. There are seven

06

possible actions he can take: he can pick up a can (if one is there), he can

07

move in one of the four cardinal directions, he can move in a random direc-

08

tion, or he can stay put and do nothing.

09

A strategy for Robby is just a specification of one of the seven actions for

10

each of the 243 states. The number of possible strategies is thus 7243, or

11

about 10205. You’re not going to try out every strategy just to find the best

12

possible one.

13

You can try to be clever, and design a strategy you think will do a

14

good job. Mitchell did just that, choosing a baseline strategy for what

15

would count as “pretty good, even if not necessarily the best.” It was a

16

simple approach: if Robby is on a square with a can, pick it up. If not,

17

look for cans on the four nearby squares. If there is one can, move in that

18

direction. If there are no cans, move in a random direction. If there is more

19

than one can, move in a specified direction. Call this the “benchmark strat-

‹ Prev Next ›