
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  So Eliezer2000 doesn’t even want to drop the issue; he wants to patch the problem and restore perfection. How can he justify spending the time? By thinking thoughts like:

  What about Brian Atkins? [Brian Atkins being the startup funder of the Machine Intelligence Research Institute, then called the Singularity Institute.] He would probably prefer not to die, even if life were meaningless. He’s paying for MIRI right now; I don’t want to taint the ethics of our cooperation.

  Eliezer2000’s sentiment doesn’t translate very well—English doesn’t have a simple description for it, nor does any other culture I know of. Maybe the closest is the passage in the Old Testament: “Thou shalt not boil a young goat in its mother’s milk.” Someone who helps you out of altruism shouldn’t regret helping you; you owe them, not so much fealty, but rather, that they’re actually doing what they think they’re doing by helping you.

  Well, but how would Brian Atkins find out, if I don’t tell him? Eliezer2000 doesn’t even think this except in quotation marks, as the obvious thought that a villain would think in the same situation. And Eliezer2000 has a standard counter-thought ready too, a ward against temptations to dishonesty—an argument that justifies honesty in terms of expected utility, not just a personal love of personal virtue:

  Human beings aren’t perfect deceivers; it’s likely that I’ll be found out. Or what if genuine lie detectors are invented before the Singularity, sometime over the next thirty years? I wouldn’t be able to pass a lie detector test.

  Eliezer2000 lives by the rule that you should always be ready to have your thoughts broadcast to the whole world at any time, without embarrassment. Otherwise, clearly, you’ve fallen from grace: either you’re thinking something you shouldn’t be thinking, or you’re embarrassed by something that shouldn’t embarrass you.

  (These days, I don’t espouse quite such an extreme viewpoint, mostly for reasons of Fun Theory. I see a role for continued social competition between intelligent life-forms, at least as far as my near-term vision stretches. I admit, these days, that it might be all right for human beings to have a self; as John McCarthy put it, “If everyone were to live for others all the time, life would be like a procession of ants following each other around in a circle.” If you’re going to have a self, you may as well have secrets, and maybe even conspiracies. But I do still try to abide by the principle of being able to pass a future lie detector test, with anyone else who’s also willing to go under the lie detector, if the topic is a professional one. Fun Theory needs a commonsense exception for global catastrophic risk management.)

  Even taking honesty for granted, there are other excuses Eliezer2000 could use to flush the question down the toilet. “The world doesn’t have the time” or “It’s unsolvable” would still work. But Eliezer2000 doesn’t know that this problem, the “backup” morality problem, is going to be particularly difficult or time-consuming. He’s just now thought of the whole issue.

  And so Eliezer2000 begins to really consider the question: Supposing that “life is meaningless” (that superintelligences don’t produce their own motivations from pure logic), then how would you go about specifying a fallback morality? Synthesizing it, inscribing it into the AI?

  There’s a lot that Eliezer2000 doesn’t know, at this point. But he has been thinking about self-improving AI for three years, and he’s been a Traditional Rationalist for longer than that. There are techniques of rationality that he has practiced, methodological safeguards he’s already devised. He already knows better than to think that all an AI needs is the One Great Moral Principle. Eliezer2000 already knows that it is wiser to think technologically than politically. He already knows the saying that AI programmers are supposed to think in code, to use concepts that can be inscribed in a computer. Eliezer2000 already has a concept that there is something called “technical thinking” and it is good, though he hasn’t yet formulated a Bayesian view of it. And he’s long since noticed that suggestively named LISP tokens don’t really mean anything, et cetera. These injunctions prevent him from falling into some of the initial traps, the ones that I’ve seen consume other novices on their own first steps into the Friendly AI problem . . . though technically this was my second step; I well and truly failed on my first.

  But in the end, what it comes down to is this: For the first time, Eliezer2000 is trying to think technically about inscribing a morality into an AI, without the escape-hatch of the mysterious essence of rightness.

  That’s the only thing that matters, in the end. His previous philosophizing wasn’t enough to force his brain to confront the details. This new standard is strict enough to require actual work. Morality slowly starts being less mysterious to him—Eliezer2000 is starting to think inside the black box.

  His reasons for pursuing this course of action—those don’t matter at all.

  Oh, there’s a lesson in his being a perfectionist. There’s a lesson in the part about how Eliezer2000 initially thought this was a tiny flaw, and could have dismissed it out of hand if that had been his impulse.

  But in the end, the chain of cause and effect goes like this: Eliezer2000 investigated in more detail, therefore he got better with practice. Actions screen off justifications. If your arguments happen to justify not working things out in detail, like Eliezer1996, then you won’t get good at thinking about the problem. If your arguments call for you to work things out in detail, then you have an opportunity to start accumulating expertise.

  That was the only choice that mattered, in the end—not the reasons for doing anything.

  I say all this, as you may well guess, because of the AI wannabes I sometimes run into who have their own clever reasons for not thinking about the Friendly AI problem. Our clever reasons for doing what we do tend to matter a lot less to Nature than they do to ourselves and our friends. If your actions don’t look good when they’re stripped of all their justifications and presented as mere brute facts . . . then maybe you should re-examine them.

  A diligent effort won’t always save a person. There is such a thing as lack of ability. Even so, if you don’t try, or don’t try hard enough, you don’t get a chance to sit down at the high-stakes table—never mind the ability ante. That’s cause and effect for you.

  Also, perfectionism really matters. The end of the world doesn’t always come with trumpets and thunder and the highest priority in your inbox. Sometimes the shattering truth first presents itself to you as a small, small question; a single discordant note; one tiny lonely thought, that you could dismiss with one easy effortless touch . . .

  . . . and so, over succeeding years, understanding begins to dawn on that past Eliezer, slowly. That Sun rose slower than it could have risen.

  *

  298

  Fighting a Rearguard Action Against the Truth

  When we last left Eliezer2000, he was just beginning to investigate the question of how to inscribe a morality into an AI. His reasons for doing this don’t matter at all, except insofar as they happen to historically demonstrate the importance of perfectionism. If you practice something, you may get better at it; if you investigate something, you may find out about it; the only thing that matters is that Eliezer2000 is, in fact, focusing his full-time energies on thinking technically about AI morality—rather than, as previously, finding any justification for not spending his time this way. In the end, this is all that turns out to matter.

  But as our story begins—as the sky lightens to gray and the tip of the Sun peeks over the horizon—Eliezer2001 hasn’t yet admitted that Eliezer1997 was mistaken in any important sense. He’s just making Eliezer1997’s strategy even better by including a contingency plan for “the unlikely event that life turns out to be meaningless” . . .

  . . . which means that Eliezer2001 now has a line of retreat away from his mistake.

  I don’t just mean that Eliezer2001 can say “Friendly AI is a contingency plan,” rather than screaming “OOPS!”

  I mean that Eliezer2001 now actually has a contingency plan. If Eliezer2001 starts to doubt his 1997 metaethics, the intelligence explosion has a fallback strategy, namely Friendly AI. Eliezer2001 can question his metaethics without it signaling the end of the world.

  And his gradient has been smoothed; he can admit a 10% chance of having previously been wrong, then a 20% chance. He doesn’t have to cough out his whole mistake in one huge lump.

  If you think this sounds like Eliezer2001 is too slow, I quite agree.

  Eliezer1996–2000’s strategies had been formed in the total absence of “Friendly AI” as a consideration. The whole idea was to get a superintelligence, any superintelligence, as fast as possible—codelet soup, ad-hoc heuristics, evolutionary programming, open-source, anything that looked like it might work—preferably all approaches simultaneously in a Manhattan Project. (“All parents did the things they tell their children not to do. That’s how they know to tell them not to do it.”1) It’s not as if adding one more approach could hurt.

  His attitudes toward technological progress have been formed—or more accurately, preserved from childhood-absorbed technophilia—around the assumption that any/all movement toward superintelligence is a pure good without a hint of danger.

  Looking back, what Eliezer2001 needed to do at this point was declare an HMC event—Halt, Melt, and Catch Fire. One of the foundational assumptions on which everything else has been built has been revealed as flawed. This calls for a mental brake to a full stop: take your weight off all beliefs built on the wrong assumption, do your best to rethink everything from scratch. This is an art I need to write more about—it’s akin to the convulsive effort required to seriously clean house, after an adult religionist notices for the first time that God doesn’t exist.

  But what Eliezer2001 actually did was rehearse his previous technophilic arguments for why it’s difficult to ban or governmentally control new technologies—the standard arguments against “relinquishment.”

  It does seem, even to my modern self, that the awful consequences which technophiles argue will follow from various kinds of government regulation are more or less real—it’s much easier to say what someone is doing wrong, than to say the way that is right. My modern viewpoint hasn’t shifted to think that technophiles are wrong about the downsides of technophobia; but I do tend to be a lot more sympathetic to what technophobes say about the downsides of technophilia. What previous Eliezers said about the difficulties of, e.g., the government doing anything sensible about Friendly AI, still seems pretty true. It’s just that a lot of his hopes for science, or private industry, etc., now seem equally wrongheaded.

  Still, let’s not get into the details of the technovolatile viewpoint. Eliezer2001 has just tossed a major foundational assumption—that AI can’t be dangerous, unlike other technologies—out the window. You would intuitively suspect that this should have some kind of large effect on his strategy.

  Well, Eliezer2001 did at least give up on his 1999 idea of an open-source AI Manhattan Project using self-modifying heuristic soup, but overall . . .

  Overall, he’d previously wanted to charge in, guns blazing, immediately using his best idea at the time; and afterward he still wanted to charge in, guns blazing. He didn’t say, “I don’t know how to do this.” He didn’t say, “I need better knowledge.” He didn’t say, “This project is not yet ready to start coding.” It was still all, “The clock is ticking, gotta move now! MIRI will start coding as soon as it’s got enough money!”

  Before, he’d wanted to focus as much scientific effort as possible with full information-sharing, and afterward he still thought in those terms. Scientific secrecy = bad guy, openness = good guy. (Eliezer2001 hadn’t read up on the Manhattan Project and wasn’t familiar with the similar argument that Leó Szilárd had with Enrico Fermi.)

  That’s the problem with converting one big “Oops!” into a gradient of shifting probability. It means there isn’t a single watershed moment—a visible huge impact—to hint that equally huge changes might be in order.

  Instead, there are all these little opinion shifts . . . that give you a chance to repair the arguments for your strategies; to shift the justification a little, but keep the “basic idea” in place. Small shocks that the system can absorb without cracking, because each time, it gets a chance to go back and repair itself. It’s just that in the domain of rationality, cracking = good, repair = bad. In the art of rationality it’s far more efficient to admit one huge mistake, than to admit lots of little mistakes.

  There’s some kind of instinct humans have, I think, to preserve their former strategies and plans, so that they aren’t constantly thrashing around and wasting resources; and of course an instinct to preserve any position that we have publicly argued for, so that we don’t suffer the humiliation of being wrong. And though the younger Eliezer has striven for rationality for many years, he is not immune to these impulses; they waft gentle influences on his thoughts, and this, unfortunately, is more than enough damage.

  Even in 2002, the earlier Eliezer isn’t yet sure that Eliezer1997’s plan couldn’t possibly have worked. It might have gone right. You never know, right?

  But there came a time when it all fell crashing down.

  *

  1. John Moore, Slay and Rescue (Xlibris Corp, 2000).

  299

  My Naturalistic Awakening

  In the previous episode, Eliezer2001 is fighting a rearguard action against the truth. Only gradually shifting his beliefs, admitting an increasing probability in a different scenario, but never saying outright, “I was wrong before.” He repairs his strategies as they are challenged, finding new justifications for just the same plan he pursued before.

  (Of which it is therefore said: “Beware lest you fight a rearguard retreat against the evidence, grudgingly conceding each foot of ground only when forced, feeling cheated. Surrender to the truth as quickly as you can. Do this the instant you realize what you are resisting; the instant you can see from which quarter the winds of evidence are blowing against you.”)

  Memory fades, and I can hardly bear to look back upon those times—no, seriously, I can’t stand reading my old writing. I’ve already been corrected once in my recollections, by those who were present. And so, though I remember the important events, I’m not really sure what order they happened in, let alone what year.

  But if I had to pick a moment when my folly broke, I would pick the moment when I first comprehended, in full generality, the notion of an optimization process. That was the point at which I first looked back and said, “I’ve been a fool.”

  Previously, in 2002, I’d been writing a bit about the evolutionary psychology of human general intelligence—though at the time, I thought I was writing about AI; at this point I thought I was against anthropomorphic intelligence, but I was still looking to the human brain for inspiration. (The paper in question is “Levels of Organization in General Intelligence,” a requested chapter for the volume Artificial General Intelligence,1 which finally came out in print in 2007.)

  So I’d been thinking (and writing) about how natural selection managed to cough up human intelligence; I saw a dichotomy between them, the blindness of natural selection and the lookahead of intelligent foresight, reasoning by simulation versus playing everything out in reality, abstract versus concrete thinking. And yet it was natural selection that created human intelligence, so that our brains, though not our thoughts, are entirely made according to the signature of natural selection.

  To this day, this still seems to me like a reasonably shattering insight, and so it drives me up the wall when people lump together natural selection and intelligence-driven processes as “evolutionary.” They really are almost absolutely different in a number of important ways—though there are concepts in common that can be used to describe them, like consequentialism and cross-domain generality.

  But that Eliezer2002 is thinking in terms of a dichotomy between evolution and intelligence tells you something about the limits of his vision—like someone who thinks of politics as a dichotomy between conservative and liberal stances, or someone who thinks of fruit as a dichotomy between apples and strawberries.

  After the “Levels of Organization” draft was published online, Emil Gilliam pointed out that my view of AI seemed pretty similar to my view of intelligence. Now, of course Eliezer2002 doesn’t espouse building an AI in the image of a human mind; Eliezer2002 knows very well that a human mind is just a hack coughed up by natural selection. But Eliezer2002 has described these levels of organization in human thinking, and he hasn’t proposed using different levels of organization in the AI. Emil Gilliam asks whether I think I might be hewing too close to the human line. I dub the alternative the “Completely Alien Mind Design” and reply that a CAMD is probably too difficult for human engineers to create, even if it’s possible in theory, because we wouldn’t be able to understand something so alien while we were putting it together.

  I don’t know if Eliezer2002 invented this reply on his own, or if he read it somewhere else. Needless to say, I’ve heard this excuse plenty of times since then. In reality, what you genuinely understand, you can usually reconfigure in almost any sort of shape, leaving some structural essence inside; but when you don’t understand flight, you suppose that a flying machine needs feathers, because you can’t imagine departing from the analogy of a bird.

  So Eliezer2002 is still, in a sense, attached to humanish mind designs—he imagines improving on them, but the human architecture is still in some sense his point of departure.

  What is it that finally breaks this attachment?

  It’s an embarrassing confession: It came from a science fiction story I was trying to write. (No, you can’t see it; it’s not done.) The story involved a non-cognitive non-evolutionary optimization process, something like an Outcome Pump. Not intelligence, but a cross-temporal physical effect—that is, I was imagining it as a physical effect—that narrowly constrained the space of possible outcomes. (I can’t tell you any more than that; it would be a spoiler, if I ever finished the story. Just see the essay on Outcome Pumps.) It was “just a story,” and so I was free to play with the idea and elaborate it out logically: C was constrained to happen, therefore B (in the past) was constrained to happen, therefore A (which led to B) was constrained to happen.

 
