The Big Picture

Home > Other > The Big Picture > Page 48
The Big Picture Page 48

by Carroll, Sean M.


  29

  At a rate of a billion routes per second, that search would take you longer

  30

  than the age of the observable universe.

  31

  The trick, then, isn’t just to find any old search algorithm, but to find ef-

  32

  ficient ones. And very often, the number of choices is so high that we’re happy

  33

  to find solutions that are just pretty good, rather than absolutely perfect.

  34

  Natural selection can be thought of as a search algorithm. The problem

  S35

  being tackled by evolution is: “What organism would survive and

  N36

  279

  Big Picture - UK final proofs.indd 279

  20/07/2016 10:02:50

  T H E B IG PIC T U R E

  01

  reproduce most effectively in this particular environment?” Except it’s not

  02

  really “organisms” that are being searched, it’s genomes, or particular strings

  03

  of nucleotides in a strand of DNA. The human genome contains about 3

  04

  billion nucleotides. That’s a lot, compared to, for example, a bacterium,

  05

  which might have a few million. But let’s not be too proud; there are flower-

  06

  ing plants with over 100 billion nucleotide base pairs in their DNA. Some

  07

  organisms will survive and reproduce, while some will not. How, over the

  08

  course of generations, do we find the DNA sequences that lead to organ-

  09

  isms with the highest chance of survival?

  10

  This counts as a hard problem, from the perspective of computational re-

  11

  sources. Each of our 3 billion nucleotides is 1 of 4 possible choices: A, C, G,

  12

  or T. The total number of possible arrangements of human- sized DNA is not

  13

  four times 3 billion (which wouldn’t be so bad); it’s four to the power of 3 bil-14

  lion: 43,000,000,000, or roughly 1 followed by 2 billion zeros. That’s a stupen-

  15

  dously, hilariously large number. It’s also an overestimate; some sequences of

  16

  nucleotides have the same functional impact as others, and the vast majority

  17

  of sequences wouldn’t even lead to an organism. We could count genes rather

  18

  than nucleotides; that would cut down the number of dimensions consider-

  19

  ably, although each gene can take many more than four possible forms, so the

  20

  number is still enormous, and the interdependence of different gene func-

  21

  tions makes any such counting a little uncertain. By any possible measure the

  22

  problem of finding the “best” organism by searching through all of the pos-

  23

  sible genomes an organism could have is a daunting one.

  24

  Evolution provides a strategy for searching for high- fitness genomes in a

  25

  ridiculously big space of possibilities. Computer scientists have recently

  26

  shown that a simplified model of evolution (allowing for mixing via sexual

  27

  reproduction, but not for mutations) is mathematically equivalent to an

  28

  algorithm devised by game theorists years ago, known as multiplicative

  29

  weight updates. Good ideas tend to show up in a variety of places.

  30

  The phrase “search algorithm” isn’t meant to imply that anyone wrote an

  31

  algorithm, or anyone is specifying a goal that evolution is supposed to search

  32

  for. Evolution doesn’t have any goals in mind; evolution simply happens, with

  33

  Laplacian equanimity, each step following from the previous one. In the spirit

  34

  of poetic naturalism, “search algorithm” is simply a useful way of talking

  35S

  about the process of evolution. Under the appropriate circumstances, they are

  36N

  formally mathematically equivalent, and the connection provides some nice

  2 80

  Big Picture - UK final proofs.indd 280

  20/07/2016 10:02:50

  S E A RC h I n g t h R O ug h t h E l A n d S C A PE

  visual intuition. Don’t let the language trick you into believing there is any

  01

  agency guiding the course of evolution, or setting up goals ahead of time; at

  02

  the same time, don’t let the fear of sounding like you believe in agency prevent

  03

  you from using a language that gives significant insight into the process.

  04

  05

  •

  06

  One way of visualizing evolution’s search problem is in terms of a fitness

  07

  landscape. The idea is that we can assign, to any particular genome in a

  08

  specific environment, a numerical value called its “fitness.” This number

  09

  characterizes how likely it is that an organism based on that genome will

  10

  reproduce in that environment. We can visualize the fitness in terms of a

  11

  rolling landscape, with hills and valleys, where the role of “directions in

  12

  space” is played by different forms each gene can take, and the role of

  13

  “height above ground” is played by the fitness. (When we actually draw a

  14

  fitness landscape, we typically look at only one or two genes at a time, but

  15

  in the back of your mind you should remember that this is really a 25, 000-

  16

  dimensional space we’re thinking of, one for each gene.) A high- fitness hill

  17

  corresponds to a genome that produces an organism that is very likely to

  18

  reproduce (the more offspring, the better), while a low- fitness valley is a

  19

  genome that is unlikely to make it to subsequent generations.

  20

  21

  22

  Fitness

  23

  24

  25

  26

  27

  28

  29

  30

  31

  Gene 1

  32

  33

  34

  Gene 2

  S35

  N36

  2 81

  Big Picture - UK final proofs.indd 281

  20/07/2016 10:02:50

  T H E B IG PIC T U R E

  01

  We can think of evolution as nudging populations toward higher eleva-

  02

  tions in the fitness landscape, favoring genes that lead to more fit organisms.

  03

  That’s a simplification, of course. There isn’t a single fixed fitness landscape

  04

  appropriate to all species and all circumstances for all times; at best we

  05

  should think of a particular population in a fixed environment. The shape

  06

  of the landscape will depend on all of the properties of that environment.

  07

  Other species come and go, the physical surroundings change, so the land-

  08

  scape changes over time. But some aspects of the
environment can be stable

  09

  enough over a sufficiently long time period that a fixed landscape is a useful

  10

  metaphor for visualizing what goes on.

  11

  Biologists see the world differently from physicists. The concept of a

  12

  landscape also appears in physics, for example, when we are asking what

  13

  phase a system settles down to at a given temperature and pressure. But in

  14

  the back of their minds, physicists are always thinking about a ball rolling

  15

  on a hill. Consequently, the favored points on the landscape are the mini-

  16

  mum values of the function being plotted (typically the energy), since balls

  17

  roll downward. Biologists are thinking about wily mountain goats, or chil-

  18

  dren playing a game of King of the Castle. To them, the favored points on

  19

  the landscape are the maximum values of fitness.

  20

  Here’s how evolution searches through the fitness landscape, looking for

  21

  higher peaks: We have a population of organisms of a certain species, so

  22

  they occupy a set of nearby points on the landscape. Individuals are born,

  23

  hopefully reproduce, and die. Their offspring have slightly different ge-

  24

  nomes, so they are located somewhere else on the landscape— not too far

  25

  away, but not at exactly the same place either. The ones that end up lower

  26

  on a slope are less likely to reproduce than those that find themselves higher

  27

  up. As generations pass by, the population finds itself gradually moving

  28

  toward higher ground.

  29

  We draw two- dimensional plots, but in reality the number of genes can

  30

  be very large indeed, so it can take an extremely long time for a population

  31

  to climb up the landscape. Species may never get to the top of a single hill,

  32

  much less the highest mountain around, though individual traits may do

  33

  so. Some parts of the landscape are relatively flat; that’s where different ge-

  34

  nomes don’t have very different levels of fitness, and genetic drift can be the

  35S

  dominant feature in evolution. A more realistic portrayal would have a

  36N

  time- varying landscape, as both physical and biological features of the

  2 82

  Big Picture - UK final proofs.indd 282

  20/07/2016 10:02:50

  S E A RC h I n g t h R O ug h t h E l A n d S C A PE

  environment continually shift around. When that happens, it’s literally im-

  01

  possible to simply find the top of a hill and just sit there; one day’s maxi-

  02

  mum might be a valley the next day.

  03

  Finally, there’s no sense in which evolution’s algorithm is guaranteed to

  04

  find the best possible result. Most variations are small, and allow us to ex-

  05

  plore only nearby points in the landscape. Occasionally, there might be a

  06

  rare mutation that enables us to skip from one peak to another, but only

  07

  with peaks that are close together to begin with. Just as for the traveling-

  08

  salesman problem, finding a good- enough solution can be extremely useful

  09

  for all practical purposes.

  10

  11

  •

  12

  The search procedure employed by evolution is so efficient that real human

  13

  computer programmers often use an analogous process to develop their

  14

  own strategies. This is a technique known as genetic algorithms. As with

  15

  genomes, we can imagine the set of all possible algorithms of a certain

  16

  length, at least within a fixed computer language. There will be a large num-

  17

  ber of them, and in principle we want to know which one is the best at

  18

  solving some specified problem. The genetic- algorithm approach works like

  19

  natural selection, except that the role of the fitness landscape is put in by

  20

  the programmer. In biology this would be called directed evolution, to em-

  21

  phasize the difference with natural selection, where the fitness landscape is

  22

  fixed by nature without any particular agenda.

  23

  Start with some randomly chosen algorithms, and let them tackle the

  24

  problem. Take some fraction that do the best, and then “mutate” them,

  25

  possibly also allowing them to mix with other successful algorithms. Throw

  26

  away the unsuccessful strategies, and repeat the process. The population of

  27

  algorithms being studied will gradually climb up the relevant fitness land-

  28

  scape, defined as how well each strategy does at finding a good solution to

  29

  its problem. (It’s the virtual equivalent of what Bartel and Szostak did to

  30

  find RNA configurations that could act as catalysts.)

  31

  Genetic algorithms provide a nice illustration of some of the interesting

  32

  features of evolution as a strategy inventor. One such example was invented

  33

  by computer scientist Melanie Mitchell. She asks us to consider Robby, a

  34

  virtual robot who lives in a simple world, a ten-by-ten grid of squares.

  S35

  Robby threw a party last night, and there are empty cans scattered across

  N36

  2 83

  Big Picture - UK final proofs.indd 283

  20/07/2016 10:02:50

  T H E B IG PIC T U R E

  01

  the grid. Robby wants to clean them up in a hurry, being as efficient as pos-

  02

  sible with only a finite amount of time available. Our task is to invent a

  03

  strategy— an unambiguous set of instructions about what to do at every

  04

  step— that will help Robby the Robot pick up all the cans on the grid.

  05

  You might think that Robby can just walk from one can to the next, and

  06

  the challenge is to find the shortest path. But Robby is burdened with two

  07

  significant handicaps, perhaps due to partying a little too hard the previous

  08

  night. First, he can’t see very far. Standing on any one square, Robby can see

  09

  whether there’s a can on his own square, and he can see whether there’s a

  10

  can on any of the squares immediately north, south, east, or west of him.

  11

  But that’s all; he can’t see whether there are any cans diagonally, or on the

  12

  squares farther away.

  13

  14

  15

  16

  17

  18

  19

  20

  21

  22

  23
/>   24

  25

  The world of Robby the Robot, on the left: a grid of squares, some empty and some littered 26

  with cans. Robby’s field of view is highlighted. On the right, a situation where Robby is 27

  on a square with a can, and with multiple cans nearby.

  28

  29

  Your next thought is therefore that Robby should walk in some kind of

  30

  pattern, systematically scanning the grid and picking up any cans he sees.

  31

  But he has a second handicap: Robby has absolutely no memory at all. He

  32

  doesn’t know where he’s been, what cans he’s picked up, or what he was do-

  33

  ing even one moment ago. His strategy can only refer to what he must do

  34

  next based on his situation right now; it can’t include anything like “move

  35S

  east, and next time move south,” since that would encompass two moves

  36N

  in a row.

  2 84

  Big Picture - UK final proofs.indd 284

  20/07/2016 10:02:50

  S E A RC h I n g t h R O ug h t h E l A n d S C A PE

  Given these limitations, it’s straightforward to enumerate every possible

  01

  strategy that Robby could follow. There are five squares he knows about: his

  02

  own, and the four neighbors corresponding to each cardinal direction.

  03

  Each square is in one of three conditions: it can be empty, have a can, or be

  04

  behind the wall (where he can’t go). Robby’s “state” is a list of what’s on each

  05

  of the five squares he knows about: a total of 35 = 243 states. There are seven

  06

  possible actions he can take: he can pick up a can (if one is there), he can

  07

  move in one of the four cardinal directions, he can move in a random direc-

  08

  tion, or he can stay put and do nothing.

  09

  A strategy for Robby is just a specification of one of the seven actions for

  10

  each of the 243 states. The number of possible strategies is thus 7243, or

  11

  about 10205. You’re not going to try out every strategy just to find the best

  12

  possible one.

  13

  You can try to be clever, and design a strategy you think will do a

  14

  good job. Mitchell did just that, choosing a baseline strategy for what

  15

  would count as “pretty good, even if not necessarily the best.” It was a

  16

  simple approach: if Robby is on a square with a can, pick it up. If not,

  17

  look for cans on the four nearby squares. If there is one can, move in that

  18

  direction. If there are no cans, move in a random direction. If there is more

  19

  than one can, move in a specified direction. Call this the “benchmark strat-

 

‹ Prev