Rationality: From AI to Zombies

by Eliezer Yudkowsky

  Yet when the blind idiot god created protein computers, its monomaniacal focus on inclusive genetic fitness was not faithfully transmitted. Its optimization criterion did not successfully quine. We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us. One pure utility function splintered into a thousand shards of desire.

  Why? Above all, because evolution is stupid in an absolute sense. But also because the first protein computers weren’t anywhere near as general as the blind idiot god, and could only utilize short-term desires.

  In the final analysis, asking why evolution didn’t build humans to maximize inclusive genetic fitness is like asking why evolution didn’t hand humans a ribosome and tell them to design their own biochemistry. Because evolution can’t refactor code that fast, that’s why. But maybe in a billion years of continued natural selection that’s exactly what would happen, if intelligence were foolish enough to allow the idiot god continued reign.

  The Mote in God’s Eye by Niven and Pournelle depicts an intelligent species that stayed biological a little too long, slowly becoming truly enslaved by evolution, gradually turning into true fitness maximizers obsessed with outreproducing each other. But thankfully that’s not what happened. Not here on Earth. At least not yet.

  So humans love the taste of sugar and fat, and we love our sons and daughters. We seek social status, and sex. We sing and dance and play. We learn for the love of learning.

  A thousand delicious tastes, matched to ancient reinforcers that once correlated with reproductive fitness—now sought whether or not they enhance reproduction. Sex with birth control, chocolate, the music of long-dead Bach on a CD.

  And when we finally learn about evolution, we think to ourselves: “Obsess all day about inclusive genetic fitness? Where’s the fun in that?”

  The blind idiot god’s single monomaniacal goal splintered into a thousand shards of desire. And this is well, I think, though I’m a human who says so. Or else what would we do with the future? What would we do with the billion galaxies in the night sky? Fill them with maximally efficient replicators? Should our descendants deliberately obsess about maximizing their inclusive genetic fitness, regarding all else only as a means to that end?

  Being a thousand shards of desire isn’t always fun, but at least it’s not boring. Somewhere along the line, we evolved tastes for novelty, complexity, elegance, and challenge—tastes that judge the blind idiot god’s monomaniacal focus, and find it aesthetically unsatisfying.

  And yes, we got those very same tastes from the blind idiot’s godshatter.

  So what?


  Part M

  Fragile Purposes


  Belief in Intelligence

  I don’t know what moves Garry Kasparov would make in a chess game. What, then, is the empirical content of my belief that “Kasparov is a highly intelligent chess player”? What real-world experience does my belief tell me to anticipate? Is it a cleverly masked form of total ignorance?

  To sharpen the dilemma, suppose Kasparov plays against some mere chess grandmaster Mr. G, who’s not in the running for world champion. My own ability is far too low to distinguish between these levels of chess skill. When I try to guess Kasparov’s move, or Mr. G’s next move, all I can do is try to guess “the best chess move” using my own meager knowledge of chess. Then I would produce exactly the same prediction for Kasparov’s move or Mr. G’s move in any particular chess position. So what is the empirical content of my belief that “Kasparov is a better chess player than Mr. G”?

  The empirical content of my belief is the testable, falsifiable prediction that the final chess position will occupy the class of chess positions that are wins for Kasparov, rather than drawn games or wins for Mr. G. (Counting resignation as a legal move that leads to a chess position classified as a loss.) The degree to which I think Kasparov is a “better player” is reflected in the amount of probability mass I concentrate into the “Kasparov wins” class of outcomes, versus the “drawn game” and “Mr. G wins” class of outcomes. These classes are extremely vague in the sense that they refer to vast spaces of possible chess positions—but “Kasparov wins” is more specific than maximum entropy, because it can be definitely falsified by a vast set of chess positions.

  The outcome of Kasparov’s game is predictable because I know, and understand, Kasparov’s goals. Within the confines of the chess board, I know Kasparov’s motivations—I know his success criterion, his utility function, his target as an optimization process. I know where Kasparov is ultimately trying to steer the future and I anticipate he is powerful enough to get there, although I don’t anticipate much about how Kasparov is going to do it.

  Imagine that I’m visiting a distant city, and a local friend volunteers to drive me to the airport. I don’t know the neighborhood. Each time my friend approaches a street intersection, I don’t know whether my friend will turn left, turn right, or continue straight ahead. I can’t predict my friend’s move even as we approach each individual intersection—let alone predict the whole sequence of moves in advance.

  Yet I can predict the result of my friend’s unpredictable actions: we will arrive at the airport. Even if my friend’s house were located elsewhere in the city, so that my friend made a completely different sequence of turns, I would just as confidently predict our arrival at the airport. I can predict this long in advance, before I even get into the car. My flight departs soon, and there’s no time to waste; I wouldn’t get into the car in the first place, if I couldn’t confidently predict that the car would travel to the airport along an unpredictable pathway.

  Isn’t this a remarkable situation to be in, from a scientific perspective? I can predict the outcome of a process, without being able to predict any of the intermediate steps of the process.

  How is this even possible? Ordinarily one predicts by imagining the present and then running the visualization forward in time. If you want a precise model of the Solar System, one that takes into account planetary perturbations, you must start with a model of all major objects and run that model forward in time, step by step.

  Sometimes simpler problems have a closed-form solution, where calculating the future at time T takes the same amount of work regardless of T. A coin rests on a table, and after each minute, the coin turns over. The coin starts out showing heads. What face will it show a hundred minutes later? Obviously you did not answer this question by visualizing a hundred intervening steps. You used a closed-form solution that worked to predict the outcome, and would also work to predict any of the intervening steps.
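  The coin example can be sketched in a few lines of Python. This is only an illustration under assumed names (face_by_simulation and face_closed_form are hypothetical helpers, not anything from the text): one function steps through every minute, the other uses the parity of the elapsed minutes, and both agree at every intermediate step.

```python
def face_by_simulation(minutes: int, start: str = "heads") -> str:
    """Step-by-step prediction: flip the coin once per elapsed minute."""
    face = start
    for _ in range(minutes):
        face = "tails" if face == "heads" else "heads"
    return face


def face_closed_form(minutes: int, start: str = "heads") -> str:
    """Closed-form prediction: only the parity of the minute count matters."""
    other = "tails" if start == "heads" else "heads"
    return start if minutes % 2 == 0 else other


if __name__ == "__main__":
    # The work done by the closed form does not grow with the horizon.
    print(face_closed_form(100))
    # The same formula also predicts every intervening step.
    print(all(face_by_simulation(t) == face_closed_form(t) for t in range(101)))
```

  The closed form costs the same for a hundred minutes as for a hundred million, and it answers questions about any intermediate minute equally well.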

  But when my friend drives me to the airport, I can predict the outcome successfully using a strange model that won’t work to predict any of the intermediate steps. My model doesn’t even require me to input the initial conditions—I don’t need to know where we start out in the city!

  I do need to know something about my friend. I must know that my friend wants me to make my flight. I must credit that my friend is a good enough planner to successfully drive me to the airport (if he wants to). These are properties of my friend’s initial state—properties which let me predict the final destination, though not any intermediate turns.

  I must also credit that my friend knows enough about the city to drive successfully. This may be regarded as a relation between my friend and the city; hence, a property of both. But an extremely abstract property, which does not require any specific knowledge about either the city, or about my friend’s knowledge about the city.
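  The airport example can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not a claim about how any real driver plans: the grid CITY, the start coordinates, and the helpers plan_route and drive are all hypothetical, and breadth-first search is simply one convenient planner. The point it shows is that an observer who knows only the goal and the planner's competence can predict the final cell without predicting any single turn.

```python
from collections import deque

# Hypothetical city map: '.' is road, '#' is blocked, 'A' is the airport.
CITY = [
    "..#....",
    ".##.#..",
    "....#.A",
    ".#.....",
    "...#.#.",
]

MOVES = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def find_airport(city):
    """Return the (row, column) of the airport cell."""
    for r, row in enumerate(city):
        c = row.find("A")
        if c != -1:
            return (r, c)
    raise ValueError("no airport on map")


def plan_route(city, start):
    """Breadth-first search from start to the airport; returns a list of moves."""
    goal = find_airport(city)
    queue = deque([start])
    came_from = {start: None}  # cell -> (previous cell, move taken to reach it)
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for name, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < len(city) and 0 <= nxt[1] < len(city[0])
            if in_bounds and city[nxt[0]][nxt[1]] != "#" and nxt not in came_from:
                came_from[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in came_from:
        raise ValueError("airport unreachable from start")
    # Walk backward from the goal to recover the sequence of moves.
    moves = []
    cell = goal
    while came_from[cell] is not None:
        cell, name = came_from[cell]
        moves.append(name)
    return moves[::-1]


def drive(start, moves):
    """Apply a sequence of moves and return the final cell."""
    r, c = start
    for name in moves:
        dr, dc = MOVES[name]
        r, c = r + dr, c + dc
    return (r, c)


if __name__ == "__main__":
    for start in [(0, 0), (4, 0)]:  # two hypothetical starting houses
        route = plan_route(CITY, start)
        print(start, "".join(route), drive(start, route) == find_airport(CITY))
```

  The two starting points yield different sequences of turns, yet the passenger's prediction ("we end at the airport") holds for both. That prediction rests only on the goal and on the planner being good enough, which are the abstract properties the text describes.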

  This is one way of viewing the subject matter to which I’ve devoted my life—these remarkable situations which place us in such odd epistemic positions. And my work, in a sense, can be viewed as unraveling the exact form of that strange abstract knowledge we can possess; whereby, not knowing the actions, we can justifiably know the consequence.

  “Intelligence” is too narrow a term to describe these remarkable situations in full generality. I would say rather “optimization process.” A similar situation accompanies the study of biological natural selection, for example; we can’t predict the exact form of the next organism observed.

  But my own specialty is the kind of optimization process called “intelligence”; and even narrower, a particular kind of intelligence called “Friendly Artificial Intelligence”—of which, I hope, I will be able to obtain especially precise abstract knowledge.



  Humans in Funny Suits

  Many times the human species has travelled into space, only to find the stars inhabited by aliens who look remarkably like humans in funny suits—or even humans with a touch of makeup and latex—or just beige Caucasians in fee simple.

  Star Trek: The Original Series, “Arena,” © CBS Corporation

  It’s remarkable how the human form is the natural baseline of the universe, from which all other alien species are derived via a few modifications.

  What could possibly explain this fascinating phenomenon? Convergent evolution, of course! Even though these alien life-forms evolved on a thousand alien planets, completely independently from Earthly life, they all turned out the same.

  Don’t be fooled by the fact that a kangaroo (a mammal) resembles us rather less than does a chimp (a primate), nor by the fact that a frog (amphibians, like us, are tetrapods) resembles us less than the kangaroo. Don’t be fooled by the bewildering variety of the insects, who split off from us even longer ago than the frogs; don’t be fooled that insects have six legs, and their skeletons on the outside, and a different system of optics, and rather different sexual practices.

  You might think that a truly alien species would be more different from us than we are from insects. As I said, don’t be fooled. For an alien species to evolve intelligence, it must have two legs with one knee each attached to an upright torso, and must walk in a way similar to us. You see, any intelligence needs hands, so you’ve got to repurpose a pair of legs for that—and if you don’t start with a four-legged being, it can’t develop a running gait and walk upright, freeing the hands.

  . . . Or perhaps we should consider, as an alternative theory, that it’s the easy way out to use humans in funny suits.

  But the real problem is not shape; it is mind. “Humans in funny suits” is a well-known term in literary science-fiction fandom, and it does not refer to something with four limbs that walks upright. An angular creature of pure crystal is a “human in a funny suit” if she thinks remarkably like a human—especially a human of an English-speaking culture of the late-twentieth/early-twenty-first century.

  I don’t watch a lot of ancient movies. When I was watching the movie Psycho (1960) a few years back, I was taken aback by the cultural gap between the Americans on the screen and my America. The buttoned-shirted characters of Psycho are considerably more alien than the vast majority of so-called “aliens” I encounter on TV or the silver screen.

  To write a culture that isn’t just like your own culture, you have to be able to see your own culture as a special case—not as a norm which all other cultures must take as their point of departure. Studying history may help—but then it is only little black letters on little white pages, not a living experience. I suspect that it would help more to live for a year in China or Dubai or among the !Kung . . . this I have never done, being busy. Occasionally I wonder what things I might not be seeing (not there, but here).

  Seeing your humanity as a special case is very much harder than this.

  In every known culture, humans seem to experience joy, sadness, fear, disgust, anger, and surprise. In every known culture, these emotions are indicated by the same facial expressions. Next time you see an “alien”—or an “AI,” for that matter—I bet that when it gets angry (and it will get angry), it will show the human-universal facial expression for anger.

  We humans are very much alike under our skulls—that goes with being a sexually reproducing species; you can’t have everyone using different complex adaptations, they wouldn’t assemble. (Do the aliens reproduce sexually, like humans and many insects? Do they share small bits of genetic material, like bacteria? Do they form colonies, like fungi? Does the rule of psychological unity apply among them?)

  The only intelligences your ancestors had to manipulate—complexly so, and not just tame or catch in nets—the only minds your ancestors had to model in detail—were minds that worked more or less like their own. And so we evolved to predict Other Minds by putting ourselves in their shoes, asking what we would do in their situations; for that which was to be predicted, was similar to the predictor.

  “What?” you say. “I don’t assume other people are just like me! Maybe I’m sad, and they happen to be angry! They believe other things than I do; their personalities are different from mine!” Look at it this way: a human brain is an extremely complicated physical system. You are not modeling it neuron-by-neuron or atom-by-atom. If you came across a physical system as complex as the human brain which was not like you, it would take scientific lifetimes to unravel it. You do not understand how human brains work in an abstract, general sense; you can’t build one, and you can’t even build a computer model that predicts other brains as well as you predict them.

  The only reason you can try at all to grasp anything as physically complex and poorly understood as the brain of another human being is that you configure your own brain to imitate it. You empathize (though perhaps not sympathize). You impose on your own brain the shadow of the other mind’s anger and the shadow of its beliefs. You may never think the words, “What would I do in this situation?,” but that little shadow of the other mind that you hold within yourself is something animated within your own brain, invoking the same complex machinery that exists in the other person, synchronizing gears you don’t understand. You may not be angry yourself, but you know that if you were angry at you, and you believed that you were godless scum, you would try to hurt you . . .

  This “empathic inference” (as I shall call it) works for humans, more or less.

  But minds with different emotions—minds that feel emotions you’ve never felt yourself, or that fail to feel emotions you would feel? That’s something you can’t grasp by putting your brain into the other brain’s shoes. I can tell you to imagine an alien that grew up in a universe with four spatial dimensions, instead of three spatial dimensions, but you won’t be able to reconfigure your visual cortex to see like that alien would see. I can try to write a story about aliens with different emotions, but you won’t be able to feel those emotions, and neither will I.

  Imagine an alien watching a video of the Marx Brothers and having absolutely no idea what was going on, or why you would actively seek out such a sensory experience, because the alien has never conceived of anything remotely like a sense of humor. Don’t pity them for missing out; you’ve never antled.

  You might ask: Maybe the aliens do have a sense of humor, but you’re not telling funny enough jokes? This is roughly the equivalent of trying to speak English very loudly, and very slowly, in a foreign country, on the theory that those foreigners must have an inner ghost that can hear the meaning dripping from your words, inherent in your words, if only you can speak them loud enough to overcome whatever strange barrier stands in the way of your perfectly sensible English.

  It is important to appreciate that laughter can be a beautiful and valuable thing, even if it is not universalizable, even if it is not possessed by all possible minds. It would be our own special part of the gift we give to tomorrow. That can count for something too.

  It had better, because universalizability is one metaethical notion that I can’t salvage for you. Universalizability among humans, maybe; but not among all possible minds.

  And what about minds that don’t run on emotional architectures like your own—that don’t have things analogous to emotions? No, don’t bother explaining why any intelligent mind powerful enough to build complex machines must inevitably have states analogous to emotions. Natural selection builds complex machines without itself having emotions. Now there’s a Real Alien for you—an optimization process that really Does Not Work Like You Do.

  Much of the progress in biology since the 1960s has consisted of trying to enforce a moratorium on anthropomorphizing evolution. That was a major academic slap-fight, and I’m not sure that sanity would have won the day if not for the availability of crushing experimental evidence backed up by clear math. Getting people to stop putting themselves in alien shoes is a long, hard, uphill slog. I’ve been fighting that battle on AI for years.

  Our anthropomorphism runs very deep in us; it cannot be excised by a simple act of will, a determination to say, “Now I shall stop thinking like a human!” Humanity is the air we breathe; it is our generic, the white paper on which we begin our sketches. And we do not think of ourselves as being human when we are being human.

  It is proverbial in literary science fiction that the true test of an author is their ability to write Real Aliens. (And not just conveniently incomprehensible aliens who, for their own mysterious reasons, do whatever the plot happens to require.) Jack Vance was one of the great masters of this art. Vance’s humans, if they come from a different culture, are more alien than most “aliens.” (Never read any Vance? I would recommend starting with City of the Chasch.) Niven and Pournelle’s The Mote in God’s Eye also gets a standard mention here.

  And conversely—well, I once read a science fiction author (I think Orson Scott Card) say that the all-time low point of television science fiction was the Star Trek episode where parallel evolution has proceeded to the extent of producing aliens who not only look just like humans, who not only speak English, but have also independently rewritten, word for word, the preamble to the US Constitution.


