Rationality: From AI to Zombies


by Eliezer Yudkowsky


  The few philosophers present did not extract him from his difficulties. It’s not as if a philosopher will say, “Sorry, morality is understood, it is a settled issue in cognitive science and philosophy, and your viewpoint is simply wrong.” The nature of morality is still an open question in philosophy; the debate is still going on. A philosopher will feel obligated to present you with a list of classic arguments on all sides—most of which Eliezer1996 is quite intelligent enough to knock down, and so he concludes that philosophy is a wasteland.

  But wait. It gets worse.

  I don’t recall exactly when—it might have been 1997—but the younger me, let’s call him Eliezer1997, set out to argue inescapably that creating superintelligence is the right thing to do.

  *

  296

  The Sheer Folly of Callow Youth

  There speaks the sheer folly of callow youth; the rashness of an ignorance so abysmal as to be possible only to one of your ephemeral race . . .

  —Gharlane of Eddore1

  Once upon a time, years ago, I propounded a mysterious answer to a mysterious question—as I’ve hinted on several occasions. The mysterious question to which I propounded a mysterious answer was not, however, consciousness—or rather, not only consciousness. No, the more embarrassing error was that I took a mysterious view of morality.

  I held off on discussing that until now, after the series on metaethics, because I wanted it to be clear that Eliezer1997 had gotten it wrong.

  When we last left off, Eliezer1997, not satisfied with arguing in an intuitive sense that superintelligence would be moral, was setting out to argue inescapably that creating superintelligence was the right thing to do.

  Well (said Eliezer1997) let’s begin by asking the question: Does life have, in fact, any meaning?

  “I don’t know,” replied Eliezer1997 at once, with a certain note of self-congratulation for admitting his own ignorance on this topic where so many others seemed certain.

  “But,” he went on—

  (Always be wary when an admission of ignorance is followed by “But.”)

  “But, if we suppose that life has no meaning—that the utility of all outcomes is equal to zero—that possibility cancels out of any expected utility calculation. We can therefore always act as if life is known to be meaningful, even though we don’t know what that meaning is. How can we find out that meaning? Considering that humans are still arguing about this, it’s probably too difficult a problem for humans to solve. So we need a superintelligence to solve the problem for us. As for the possibility that there is no logical justification for one preference over another, then in this case it is no righter or wronger to build a superintelligence, than to do anything else. This is a real possibility, but it falls out of any attempt to calculate expected utility—we should just ignore it. To the extent someone says that a superintelligence would wipe out humanity, they are either arguing that wiping out humanity is in fact the right thing to do (even though we see no reason why this should be the case) or they are arguing that there is no right thing to do (in which case their argument that we should not build intelligence defeats itself).”
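
  (In modern notation, and purely as my own reconstruction rather than anything Eliezer1997 actually wrote down, the skeleton of the paragraph above is roughly this: let M stand for “life is meaningful,” and suppose that if M is false then U(o) = 0 for every outcome o. Then for any action a,

\[
\mathbb{E}[U(a)] \;=\; P(M)\,\mathbb{E}[U(a)\mid M] \;+\; P(\neg M)\cdot 0 \;=\; P(M)\,\mathbb{E}[U(a)\mid M],
\]

  so the ranking of actions is fixed entirely by the branch in which life is meaningful, and one may as well act as though M were known to be true.)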

  Ergh. That was a really difficult paragraph to write. My past self is always my own most concentrated Kryptonite, because my past self is exactly precisely all those things that the modern me has installed allergies to block. Truly is it said that parents do all the things they tell their children not to do, which is how they know not to do them; it applies between past and future selves as well.

  How flawed is Eliezer1997’s argument? I couldn’t even count the ways. I know memory is fallible, reconstructed each time we recall, and so I don’t trust my assembly of these old pieces using my modern mind. Don’t ask me to read my old writings; that’s too much pain.

  But it seems clear that I was thinking of utility as a sort of stuff, an inherent property. So that “life is meaningless” corresponded to utility = 0. But of course the argument works equally well with utility = 100, so that if everything is meaningful but it is all equally meaningful, that should fall out too . . . Certainly I wasn’t then thinking of a utility function as an affine structure in preferences. I was thinking of “utility” as an absolute level of inherent value.
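
  (The modern statement of that last point, in notation Eliezer1997 never used: a von Neumann–Morgenstern utility function is only defined up to a positive affine transformation, so

\[
U'(o) \;=\; a\,U(o) + b, \qquad a > 0,
\]

  represents exactly the same preferences as U. A utility function that is identically 0 and one that is identically 100 therefore describe the same trivial ordering, in which every outcome ties with every other; neither level carries any special significance that could stand in for “meaninglessness.”)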

  I was thinking of should as a kind of purely abstract essence of compellingness, that-which-makes-you-do-something; so that clearly any mind that derived a should would be bound by it. Hence the assumption, which Eliezer1997 did not even think to explicitly note, that a logic that compels an arbitrary mind to do something is exactly the same as that which human beings mean and refer to when they utter the word “right” . . .

  But now I’m trying to count the ways, and if you’ve been following along, you should be able to handle that yourself.

  An important aspect of this whole failure was that, because I’d proved that the case “life is meaningless” wasn’t worth considering, I didn’t think it was necessary to rigorously define “intelligence” or “meaning.” I’d previously come up with a clever reason for not trying to go all formal and rigorous when trying to define “intelligence” (or “morality”)—namely all the bait-and-switches that past AI folk, philosophers, and moralists had pulled with definitions that missed the point.

  I draw the following lesson: No matter how clever the justification for relaxing your standards, or evading some requirement of rigor, it will blow your foot off just the same.

  And another lesson: I was skilled in refutation. If I’d applied the same level of rejection-based-on-any-flaw to my own position as I used to defeat arguments brought against me, then I would have zeroed in on the logical gap and rejected the position—if I’d wanted to. If I’d had the same level of prejudice against it as I’d had against other positions in the debate.

  But this was before I’d heard of Kahneman, before I’d heard the term “motivated skepticism,” before I’d integrated the concept of an exactly correct state of uncertainty that summarizes all the evidence, and before I knew the deadliness of asking “Am I allowed to believe?” for liked positions and “Am I forced to believe?” for disliked positions. I was a mere Traditional Rationalist who thought of the scientific process as a referee between people who took up positions and argued them, may the best side win.

  My ultimate flaw was not a liking for “intelligence,” nor any amount of technophilia and science fiction exalting the siblinghood of sentience. It surely wasn’t my ability to spot flaws. None of these things could have led me astray, if I had held myself to a higher standard of rigor throughout, and adopted no position otherwise. Or even if I’d just scrutinized my preferred vague position, with the same demand-of-rigor I applied to counterarguments.

  But I wasn’t much interested in trying to refute my belief that life had meaning, since my reasoning would always be dominated by cases where life did have meaning.

  And with the intelligence explosion at stake, I thought I just had to proceed at all speed using the best concepts I could wield at the time, not pause and shut down everything while I looked for a perfect definition that so many others had screwed up . . .

  No.

  No, you don’t use the best concepts you can wield at the time.

  It’s Nature that judges you, and Nature does not accept even the most righteous excuses. If you don’t meet the standard, you fail. It’s that simple. There is no clever argument for why you have to make do with what you have, because Nature won’t listen to that argument, won’t forgive you because there were so many excellent justifications for speed.

  We all know what happened to Donald Rumsfeld, when he went to war with the army he had, instead of the army he needed.

  Maybe Eliezer1997 couldn’t have conjured the correct model out of thin air. (Though who knows what would have happened, if he’d really tried . . .) And it wouldn’t have been prudent for him to stop thinking entirely, until rigor suddenly popped out of nowhere.

  But neither was it correct for Eliezer1997 to put his weight down on his “best guess,” in the absence of precision. You can use vague concepts in your own interim thought processes, as you search for a better answer, unsatisfied with your current vague hints, and unwilling to put your weight down on them. You don’t build a superintelligence based on an interim understanding. No, not even the “best” vague understanding you have. That was my mistake—thinking that saying “best guess” excused anything. There was only the standard I had failed to meet.

  Of course Eliezer1997 didn’t want to slow down on the way to the intelligence explosion, with so many lives at stake, and the very survival of Earth-originating intelligent life, if we got to the era of nanoweapons before the era of superintelligence—

  Nature doesn’t care about such righteous reasons. There’s just the astronomically high standard needed for success. Either you match it, or you fail. That’s all.

  The apocalypse does not need to be fair to you.

  The apocalypse does not need to offer you a chance of success

  In exchange for what you’ve already brought to the table.

  The apocalypse’s difficulty is not matched to your skills.

  The apocalypse’s price is not matched to your resources.

  If the apocalypse asks you for something unreasonable

  And you try to bargain it down a little

  (Because everyone has to compromise now and then)

  The apocalypse will not try to negotiate back up.

  And, oh yes, it gets worse.

  How did Eliezer1997 deal with the obvious argument that you couldn’t possibly derive an “ought” from pure logic, because “ought” statements could only be derived from other “ought” statements?

  Well (observed Eliezer1997), this problem has the same structure as the argument that a cause only proceeds from another cause, or that a real thing can only come of another real thing, whereby you can prove that nothing exists.

  Thus (he said) there are three “hard problems”: the hard problem of conscious experience, in which we see that qualia cannot arise from computable processes; the hard problem of existence, in which we ask how any existence enters apparently from nothingness; and the hard problem of morality, which is to get to an “ought.”

  These problems are probably linked. For example, the qualia of pleasure are one of the best candidates for something intrinsically desirable. We might not be able to understand the hard problem of morality, therefore, without unraveling the hard problem of consciousness. It’s evident that these problems are too hard for humans—otherwise someone would have solved them over the last 2,500 years since philosophy was invented.

  It’s not as if they could have complicated solutions—they’re too simple for that. The problem must just be outside human concept-space. Since we can see that consciousness can’t arise on any computable process, it must involve new physics—physics that our brain uses, but can’t understand. That’s why we need superintelligence in order to solve this problem. Probably it has to do with quantum mechanics, maybe with a dose of tiny closed timelike curves from out of General Relativity; temporal paradoxes might have some of the same irreducibility properties that consciousness seems to demand . . .

  Et cetera, ad nauseam. You may begin to perceive, in the arc of my Overcoming Bias posts, the letter I wish I could have written to myself.

  Of this I learn the lesson: You cannot manipulate confusion. You cannot make clever plans to work around the holes in your understanding. You can’t even make “best guesses” about things which fundamentally confuse you, and relate them to other confusing things. Well, you can, but you won’t get it right, until your confusion dissolves. Confusion exists in the mind, not in the reality, and trying to treat it like something you can pick up and move around will only result in unintentional comedy.

  Similarly, you cannot come up with clever reasons why the gaps in your model don’t matter. You cannot draw a border around the mystery, put on neat handles that let you use the Mysterious Thing without really understanding it—like my attempt to make the possibility that life is meaningless cancel out of an expected utility formula. You can’t pick up the gap and manipulate it.

  If the blank spot on your map conceals a land mine, then putting your weight down on that spot will be fatal, no matter how good your excuse for not knowing. Any black box could contain a trap, and there’s no way to know except opening up the black box and looking inside. If you come up with some righteous justification for why you need to rush on ahead with the best understanding you have—the trap goes off.

  It’s only when you know the rules,

  That you realize why you needed to learn;

  What would have happened otherwise,

  How much you needed to know.

  Only knowledge can foretell the cost of ignorance. The ancient alchemists had no logical way of knowing the exact reasons why it was hard for them to turn lead into gold. So they poisoned themselves and died. Nature doesn’t care.

  But there did come a time when realization began to dawn on me.

  *

  1. Edward Elmer Smith, Second Stage Lensmen (Old Earth Books, 1998).

  297

  That Tiny Note of Discord

  When we last left Eliezer1997, he believed that any superintelligence would automatically do what was “right,” and indeed would understand that better than we could—even though, he modestly confessed, he did not understand the ultimate nature of morality. Or rather, after some debate had passed, Eliezer1997 had evolved an elaborate argument, which he fondly claimed to be “formal,” that we could always condition upon the belief that life has meaning; and so cases where superintelligences did not feel compelled to do anything in particular would fall out of consideration. (The flaw being the unconsidered and unjustified equation of “universally compelling argument” with “right.”)

  So far, the young Eliezer is well on the way toward joining the “smart people who are stupid because they’re skilled at defending beliefs they arrived at for unskilled reasons” club. All his dedication to “rationality” has not saved him from this mistake, and you might be tempted to conclude that it is useless to strive for rationality.

  But while many people dig holes for themselves, not everyone succeeds in clawing their way back out.

  And from this I learn my lesson: That it all began—

  —with a small, small question; a single discordant note; one tiny lonely thought . . .

  As our story starts, we advance three years to Eliezer2000, who in most respects resembles his self of 1997. He currently thinks he’s proven that building a superintelligence is the right thing to do if there is any right thing at all. From which it follows that there is no justifiable conflict of interest over the intelligence explosion among the peoples and persons of Earth.

  This is an important conclusion for Eliezer2000, because he finds the notion of fighting over the intelligence explosion to be unbearably stupid. (Sort of like the notion of God intervening in fights between tribes of bickering barbarians, only in reverse.) Eliezer2000’s self-concept does not permit him—he doesn’t even want—to shrug and say, “Well, my side got here first, so we’re going to seize the banana before anyone else gets it.” It’s a thought too painful to think.

  And yet then the notion occurs to him:

  Maybe some people would prefer an AI do particular things, such as not kill them, even if life is meaningless?

  His immediately following thought is the obvious one, given his premises:

  In the event that life is meaningless, nothing is the “right” thing to do; therefore it wouldn’t be particularly right to respect people’s preferences in this event.

  This is the obvious dodge. The thing is, though, Eliezer2000 doesn’t think of himself as a villain. He doesn’t go around saying, “What bullets shall I dodge today?” He thinks of himself as a dutiful rationalist who tenaciously follows lines of inquiry. Later, he’s going to look back and see a whole lot of inquiries that his mind somehow managed to not follow—but that’s not his current self-concept.

  So Eliezer2000 doesn’t just grab the obvious out. He keeps thinking.

  But if people believe they have preferences in the event that life is meaningless, then they have a motive to dispute my intelligence explosion project and go with a project that respects their wish in the event life is meaningless. This creates a present conflict of interest over the intelligence explosion, and prevents right things from getting done in the mainline event that life is meaningful.

  Now, there’s a lot of excuses Eliezer2000 could have potentially used to toss this problem out the window. I know, because I’ve heard plenty of excuses for dismissing Friendly AI. “The problem is too hard to solve” is one I get from AGI wannabes who imagine themselves smart enough to create true Artificial Intelligence, but not smart enough to solve a really difficult problem like Friendly AI. Or “worrying about this possibility would be a poor use of resources, what with the incredible urgency of creating AI before humanity wipes itself out—you’ve got to go with what you have,” this being uttered by people who just basically aren’t interested in the problem.

  But Eliezer2000 is a perfectionist. He’s not perfect, obviously, and he doesn’t attach as much importance as I do to the virtue of precision, but he is most certainly a perfectionist. The idea of metaethics that Eliezer2000 espouses, in which superintelligences know what’s right better than we do, previously seemed to wrap up all the problems of justice and morality in an airtight wrapper.

  The new objection seems to poke a minor hole in the airtight wrapper. This is worth patching. If you have something that’s perfect, are you really going to let one little possibility compromise it?

 
