Rationality: From AI to Zombies


by Eliezer Yudkowsky


  This doesn’t mean you program in every decision explicitly. Deep Blue was a chess player far superior to its programmers. Deep Blue made better chess moves than anything its makers could have explicitly programmed—but not because the programmers shrugged and left it up to the ghost. Deep Blue moved better than its programmers . . . at the end of a chain of cause and effect that began in the programmers’ code and proceeded lawfully from there. Nothing happened just because it was so obviously a good move that Deep Blue’s ghostly free will took over, without the code and its lawful consequences being involved.

  If you try to wash your hands of constraining the AI, you aren’t left with a free ghost like an emancipated slave. You are left with a heap of sand that no one has purified into silicon, shaped into a CPU and programmed to think.

  Go ahead, try telling a computer chip “Do whatever you want!” See what happens? Nothing. Because you haven’t constrained it to understand freedom.

  All it takes is one single step that is so obvious, so logical, so self-evident that your mind just skips right over it, and you’ve left the path of the AI programmer. It takes an effort like the one I illustrate in Grasping Slippery Things to prevent your mind from doing this.

  *

  147

  Artificial Addition

  Suppose that human beings had absolutely no idea how they performed arithmetic. Imagine that human beings had evolved, rather than having learned, the ability to count sheep and add sheep. People using this built-in ability have no idea how it works, the way Aristotle had no idea how his visual cortex supported his ability to see things. Peano Arithmetic as we know it has not been invented. There are philosophers working to formalize numerical intuitions, but they employ notations such as

  Plus-Of(Seven,Six) = Thirteen

  to formalize the intuitively obvious fact that when you add “seven” plus “six,” of course you get “thirteen.”

  In this world, pocket calculators work by storing a giant lookup table of arithmetical facts, entered manually by a team of expert Artificial Arithmeticians, for starting values that range between zero and one hundred. While these calculators may be helpful in a pragmatic sense, many philosophers argue that they’re only simulating addition, rather than really adding. No machine can really count—that’s why humans have to count thirteen sheep before typing “thirteen” into the calculator. Calculators can recite back stored facts, but they can never know what the statements mean—if you type in “two hundred plus two hundred” the calculator says “Error: Outrange,” when it’s intuitively obvious, if you know what the words mean, that the answer is “four hundred.”
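  For concreteness, here is a minimal Python sketch of such a lookup-table calculator. The names ARITHMETIC_FACTS and artificial_addition are illustrative, and a comprehension stands in for the hand-entry that the expert Artificial Arithmeticians perform in the story:

  # Table of hand-entered "facts" for starting values between zero and one hundred.
  # (In the story each entry is typed in manually; the comprehension is shorthand.)
  ARITHMETIC_FACTS = {(a, b): a + b for a in range(101) for b in range(101)}

  def artificial_addition(a, b):
      """Recite a stored fact if one exists; otherwise fail."""
      if (a, b) in ARITHMETIC_FACTS:
          return str(ARITHMETIC_FACTS[(a, b)])
      return "Error: Outrange"

  print(artificial_addition(7, 6))      # "13"
  print(artificial_addition(200, 200))  # "Error: Outrange"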

  Some philosophers, of course, are not so naive as to be taken in by these intuitions. Numbers are really a purely formal system—the label “thirty-seven” is meaningful, not because of any inherent property of the words themselves, but because the label refers to thirty-seven sheep in the external world. A number is given this referential property by its semantic network of relations to other numbers. That’s why, in computer programs, the LISP token for “thirty-seven” doesn’t need any internal structure—it’s only meaningful because of reference and relation, not some computational property of “thirty-seven” itself.

  No one has ever developed an Artificial General Arithmetician, though of course there are plenty of domain-specific, narrow Artificial Arithmeticians that work on numbers between “twenty” and “thirty,” and so on. And if you look at how slow progress has been on numbers in the range of “two hundred,” then it becomes clear that we’re not going to get Artificial General Arithmetic any time soon. The best experts in the field estimate it will be at least a hundred years before calculators can add as well as a human twelve-year-old.

  But not everyone agrees with this estimate, or with merely conventional beliefs about Artificial Arithmetic. It’s common to hear statements such as the following:

  “It’s a framing problem—what ‘twenty-one plus’ equals depends on whether it’s ‘plus three’ or ‘plus four.’ If we can just get enough arithmetical facts stored to cover the common-sense truths that everyone knows, we’ll start to see real addition in the network.”

  “But you’ll never be able to program in that many arithmetical facts by hiring experts to enter them manually. What we need is an Artificial Arithmetician that can learn the vast network of relations between numbers that humans acquire during their childhood by observing sets of apples.”

  “No, what we really need is an Artificial Arithmetician that can understand natural language, so that instead of having to be explicitly told that twenty-one plus sixteen equals thirty-seven, it can get the knowledge by exploring the Web.”

  “Frankly, it seems to me that you’re just trying to convince yourselves that you can solve the problem. None of you really know what arithmetic is, so you’re floundering around with these generic sorts of arguments. ‘We need an AA that can learn X,’ ‘We need an AA that can extract X from the Internet.’ I mean, it sounds good, it sounds like you’re making progress, and it’s even good for public relations, because everyone thinks they understand the proposed solution—but it doesn’t really get you any closer to general addition, as opposed to domain-specific addition. Probably we will never know the fundamental nature of arithmetic. The problem is just too hard for humans to solve.”

  “That’s why we need to develop a general arithmetician the same way Nature did—evolution.”

  “Top-down approaches have clearly failed to produce arithmetic. We need a bottom-up approach, some way to make arithmetic emerge. We have to acknowledge the basic unpredictability of complex systems.”

  “You’re all wrong. Past efforts to create machine arithmetic were futile from the start, because they just didn’t have enough computing power. If you look at how many trillions of synapses there are in the human brain, it’s clear that calculators don’t have lookup tables anywhere near that large. We need calculators as powerful as a human brain. According to Moore’s Law, this will occur in the year 2031 on April 27 between 4:00 and 4:30 in the morning.”

  “I believe that machine arithmetic will be developed when researchers scan each neuron of a complete human brain into a computer, so that we can simulate the biological circuitry that performs addition in humans.”

  “I don’t think we have to wait to scan a whole brain. Neural networks are just like the human brain, and you can train them to do things without knowing how they do them. We’ll create programs that will do arithmetic without us, their creators, ever understanding how they do arithmetic.”

  “But Gödel’s Theorem shows that no formal system can ever capture the basic properties of arithmetic. Classical physics is formalizable, so to add two and two, the brain must take advantage of quantum physics.”

  “Hey, if human arithmetic were simple enough that we could reproduce it in a computer, we wouldn’t be able to count high enough to build computers.”

  “Haven’t you heard of John Searle’s Chinese Calculator Experiment? Even if you did have a huge set of rules that would let you add ‘twenty-one’ and ‘sixteen,’ just imagine translating all the words into Chinese, and you can see that there’s no genuine addition going on. There are no real numbers anywhere in the system, just labels that humans use for numbers . . .”

  There is more than one moral to this parable, and I have told it with different morals in different contexts. It illustrates the idea of levels of organization, for example—a CPU can add two large numbers because the numbers aren’t black-box opaque objects, they’re ordered structures of 32 bits.
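  As a rough illustration of that point, the following Python sketch adds two numbers purely by walking their 32-bit structure, combining individual bits with AND, OR, and XOR rather than calling any built-in addition on the numbers as wholes (the function name add_u32 is illustrative):

  def add_u32(x, y):
      """Ripple-carry addition over 32 bit positions, one bit at a time."""
      result, carry = 0, 0
      for i in range(32):
          a = (x >> i) & 1                              # i-th bit of x
          b = (y >> i) & 1                              # i-th bit of y
          result |= (a ^ b ^ carry) << i                # sum bit at position i
          carry = (a & b) | (a & carry) | (b & carry)   # carry into position i + 1
      return result                                     # overflow past bit 31 is discarded

  assert add_u32(21, 16) == 37
  assert add_u32(200, 200) == 400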

  But for purposes of overcoming bias, let us draw two morals:

  First, the danger of believing assertions you can’t regenerate from your own knowledge.

  Second, the danger of trying to dance around basic confusions.

  Lest anyone accuse me of generalizing from fictional evidence, both lessons may be drawn from the real history of Artificial Intelligence as well.

  The first danger is the object-level problem that the AA devices ran into: they functioned as tape recorders playing back “knowledge” generated from outside the system, using a process they couldn’t capture internally. A human could tell the AA device that “twenty-one plus sixteen equals thirty-seven,” and the AA devices could record this sentence and play it back, or even pattern-match “twenty-one plus sixteen” to output “thirty-seven!”—but the AA devices couldn’t generate such knowledge for themselves.

  Which is strongly reminiscent of believing a physicist who tells you “Light is waves,” recording the fascinating words and playing them back when someone asks “What is light made of?,” without being able to generate the knowledge for yourself.

  The second moral is the meta-level danger that consumed the Artificial Arithmetic researchers and opinionated bystanders—the danger of dancing around confusing gaps in your knowledge. The tendency to do just about anything except grit your teeth and buckle down and fill in the damn gap.

  Whether you say, “It is emergent!,” or whether you say, “It is unknowable!,” in neither case are you acknowledging that there is a basic insight required which is possessable, but unpossessed by you.

  There’s no way to know when you’ll have a new basic insight, and no way to get one except by banging your head against the problem, learning everything you can about it, studying it from as many angles as possible, perhaps for years. It’s not a pursuit that academia is set up to permit, when you need to publish at least one paper per month. It’s certainly not something that venture capitalists will fund. You want to either go ahead and build the system now, or give up and do something else instead.

  Look at the comments above: none are aimed at setting out on a quest for the missing insight which would make numbers no longer mysterious, make “twenty-seven” more than a black box. None of the commenters realized that their difficulties arose from ignorance or confusion in their own minds, rather than an inherent property of arithmetic. They were not trying to achieve a state where the confusing thing ceased to be confusing.

  If you read Judea Pearl’s Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,1 then you will see that the basic insight behind graphical models is indispensable to problems that require it. (It’s not something that fits on a T-shirt, I’m afraid, so you’ll have to go and read the book yourself. I haven’t seen any online popularizations of Bayesian networks that adequately convey the reasons behind the principles, or the importance of the math being exactly the way it is, but Pearl’s book is wonderful.) There were once dozens of “non-monotonic logics” awkwardly trying to capture intuitions such as “If my burglar alarm goes off, there was probably a burglar, but if I then learn that there was a small earthquake near my home, there was probably not a burglar.” With the graphical-model insight in hand, you can give a mathematical explanation of exactly why first-order logic has the wrong properties for the job, and express the correct solution in a compact way that captures all the common-sense details in one elegant swoop. Until you have that insight, you’ll go on patching the logic here, patching it there, adding more and more hacks to force it into correspondence with everything that seems “obviously true.”
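  To see the “explaining away” pattern in miniature, here is a Python sketch of a three-node network, Burglar → Alarm ← Earthquake, with made-up probabilities (none of the numbers below come from Pearl), queried by brute-force enumeration:

  P_BURGLAR, P_QUAKE = 0.01, 0.02    # illustrative prior probabilities

  def p_alarm(burglar, quake):
      """Illustrative table for P(alarm | burglar, earthquake)."""
      if burglar and quake:
          return 0.99
      if burglar:
          return 0.95
      if quake:
          return 0.30
      return 0.001

  def p_burglar_given_alarm(quake=None):
      """P(burglar | alarm), or P(burglar | alarm, earthquake) if quake is specified."""
      numerator = denominator = 0.0
      for b in (True, False):
          for q in (True, False):
              if quake is not None and q != quake:
                  continue                              # condition on the observed earthquake value
              joint = ((P_BURGLAR if b else 1 - P_BURGLAR)
                       * (P_QUAKE if q else 1 - P_QUAKE)
                       * p_alarm(b, q))                 # condition on alarm = True
              denominator += joint
              if b:
                  numerator += joint
      return numerator / denominator

  print(p_burglar_given_alarm())            # about 0.58: the alarm alone makes a burglar likely
  print(p_burglar_given_alarm(quake=True))  # about 0.03: the earthquake explains the alarm away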

  You won’t know the Artificial Arithmetic problem is unsolvable without its key. If you don’t know the rules, you don’t know the rule that says you need to know the rules to do anything. And so there will be all sorts of clever ideas that seem like they might work, like building an Artificial Arithmetician that can read natural language and download millions of arithmetical assertions from the Internet.

  And yet somehow the clever ideas never work. Somehow it always turns out that you “couldn’t see any reason it wouldn’t work” because you were ignorant of the obstacles, not because no obstacles existed. Like shooting blindfolded at a distant target—you can fire blind shot after blind shot, crying, “You can’t prove to me that I won’t hit the center!” But until you take off the blindfold, you’re not even in the aiming game. When “no one can prove to you” that your precious idea isn’t right, it means you don’t have enough information to strike a small target in a vast answer space. Until you know your idea will work, it won’t.

  From the history of previous key insights in Artificial Intelligence, and the grand messes that were proposed prior to those insights, I derive an important real-life lesson: When the basic problem is your ignorance, clever strategies for bypassing your ignorance lead to shooting yourself in the foot.

  *

  1. Pearl, Probabilistic Reasoning in Intelligent Systems.

  148

  Terminal Values and Instrumental Values

  On a purely instinctive level, any human planner behaves as if they distinguish between means and ends. Want chocolate? There’s chocolate at the Publix supermarket. You can get to the supermarket if you drive one mile south on Washington Ave. You can drive if you get into the car. You can get into the car if you open the door. You can open the door if you have your car keys. So you put your car keys into your pocket, and get ready to leave the house . . .

  . . . when suddenly the word comes on the radio that an earthquake has destroyed all the chocolate at the local Publix. Well, there’s no point in driving to the Publix if there’s no chocolate there, and no point in getting into the car if you’re not driving anywhere, and no point in having car keys in your pocket if you’re not driving. So you take the car keys out of your pocket, and call the local pizza service and have them deliver a chocolate pizza. Mm, delicious.
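  The whole chain can be rendered as a toy backward-chaining planner; the dictionary and function names below are purely illustrative, but they show how every earlier step exists only for the sake of the later ones:

  # Each step is wanted only because it leads to the next step in the chain.
  SUBGOAL_OF = {
      "have chocolate": "be at Publix",
      "be at Publix": "drive one mile south on Washington Ave",
      "drive one mile south on Washington Ave": "be in the car",
      "be in the car": "open the car door",
      "open the car door": "have car keys in pocket",
      "have car keys in pocket": None,          # directly achievable
  }

  def plan_for(goal, goal_still_worthwhile):
      """Chain backward from the goal; if the end loses its point, so does every means."""
      if not goal_still_worthwhile:
          return []                             # the earthquake news empties the plan
      steps = []
      while goal is not None:
          steps.append(goal)
          goal = SUBGOAL_OF[goal]
      return list(reversed(steps))              # execute from the keys up to the chocolate

  print(plan_for("have chocolate", goal_still_worthwhile=True))
  print(plan_for("have chocolate", goal_still_worthwhile=False))   # []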

  I rarely notice people losing track of plans they devised themselves. People usually don’t drive to the supermarket if they know the chocolate is gone. But I’ve also noticed that when people begin explicitly talking about goal systems instead of just wanting things, mentioning “goals” instead of using them, they often become confused. Humans are experts at planning, not experts on planning, or there’d be a lot more AI developers in the world.

  In particular, I’ve noticed people get confused when—in abstract philosophical discussions rather than everyday life—they consider the distinction between means and ends; more formally, between “instrumental values” and “terminal values.”

  Part of the problem, it seems to me, is that the human mind uses a rather ad-hoc system to keep track of its goals—it works, but not cleanly. English doesn’t embody a sharp distinction between means and ends: “I want to save my sister’s life” and “I want to administer penicillin to my sister” use the same word “want.”

  Can we describe, in mere English, the distinction that is getting lost?

  As a first stab:

  “Instrumental values” are desirable strictly conditional on their anticipated consequences. “I want to administer penicillin to my sister,” not because a penicillin-filled sister is an intrinsic good, but in anticipation of penicillin curing her flesh-eating pneumonia. If instead you anticipated that injecting penicillin would melt your sister into a puddle like the Wicked Witch of the West, you’d fight just as hard to keep her penicillin-free.

  “Terminal values” are desirable without conditioning on other consequences: “I want to save my sister’s life” has nothing to do with your anticipating whether she’ll get injected with penicillin after that.

  This first attempt suffers from obvious flaws. If saving my sister’s life would cause the Earth to be swallowed up by a black hole, then I would go off and cry for a while, but I wouldn’t administer penicillin. Does this mean that saving my sister’s life was not a “terminal” or “intrinsic” value, because it’s theoretically conditional on its consequences? Am I only trying to save her life because of my belief that a black hole won’t consume the Earth afterward? Common sense should say that’s not what’s happening.

  So forget English. We can set up a mathematical description of a decision system in which terminal values and instrumental values are separate and incompatible types—like integers and floating-point numbers, in a programming language with no automatic conversion between them.
  An ideal Bayesian decision system can be set up using only four elements:

  Outcomes : type Outcome[]
    list of possible outcomes
    {sister lives, sister dies}

  Actions : type Action[]
    list of possible actions
    {administer penicillin, don’t administer penicillin}

  Utility_function : type Outcome -> Utility
    utility function that maps each outcome onto a utility
    (a utility being representable as a real number between negative and positive infinity)
    {sister lives → 1, sister dies → 0}

  Conditional_probability_function : type Action -> (Outcome -> Probability)
    conditional probability function that maps each action onto a probability distribution over outcomes
    (a probability being representable as a real number between 0 and 1)
    {administer penicillin → (sister lives → 0.9, sister dies → 0.1),
     don’t administer penicillin → (sister lives → 0.3, sister dies → 0.7)}

  If you can’t read the type system directly, don’t worry, I’ll always translate into English. For programmers, seeing it described in distinct statements helps to set up distinct mental objects.

  And the decision system itself?

  Expected_Utility : Action A -> (Sum O in Outcomes: Utility(O) * Probability(O|A))
    The “expected utility” of an action equals the sum, over all outcomes, of the utility of that outcome times the conditional probability of that outcome given that action.
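  Transcribed directly into Python with the numbers from the example above (the variable names are illustrative), the whole decision system fits in a few lines:

  OUTCOMES = ["sister lives", "sister dies"]
  ACTIONS = ["administer penicillin", "don't administer penicillin"]

  UTILITY = {"sister lives": 1.0, "sister dies": 0.0}

  CONDITIONAL_PROBABILITY = {
      "administer penicillin": {"sister lives": 0.9, "sister dies": 0.1},
      "don't administer penicillin": {"sister lives": 0.3, "sister dies": 0.7},
  }

  def expected_utility(action):
      """Sum, over all outcomes, of Utility(O) * Probability(O | action)."""
      return sum(UTILITY[o] * CONDITIONAL_PROBABILITY[action][o] for o in OUTCOMES)

  for action in ACTIONS:
      print(action, expected_utility(action))
  # administer penicillin 0.9
  # don't administer penicillin 0.3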

 
