
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Drawing a line through one point is generally held to be dangerous. Two points make a dichotomy; you imagine them opposed to one another. But when you’ve got three different points—that’s when you’re forced to wake up and generalize.

  Now I had three points: Human intelligence, natural selection, and my fictional plot device.

  And so that was the point at which I generalized the notion of an optimization process, of a process that squeezes the future into a narrow region of the possible.

  This may seem like an obvious point, if you’ve been following Overcoming Bias this whole time; but if you look at Shane Legg’s collection of 71 definitions of intelligence, you’ll see that “squeezing the future into a constrained region” is a less obvious reply than it seems.

  Many of the definitions of “intelligence” by AI researchers do talk about “solving problems” or “achieving goals.” But from the viewpoint of past Eliezers, at least, it is only hindsight that makes this the same thing as “squeezing the future.”

  A goal is a mentalistic object; electrons have no goals, and solve no problems either. When a human imagines a goal, they imagine an agent imbued with wanting-ness—it’s still empathic language.

  You can espouse the notion that intelligence is about “achieving goals”—and then turn right around and argue about whether some “goals” are better than others—or talk about the wisdom required to judge between goals themselves—or talk about a system deliberately modifying its goals—or talk about the free will needed to choose plans that achieve goals—or talk about an AI realizing that its goals aren’t what the programmers really meant to ask for. If you imagine something that squeezes the future into a narrow region of the possible, like an Outcome Pump, those seemingly sensible statements somehow don’t translate.

  So for me at least, seeing through the word “mind” to a physical process that would, just by naturally running, just by obeying the laws of physics, end up squeezing its future into a narrow region, was a naturalistic enlightenment over and above the notion of an agent trying to achieve its goals.

  It was like falling out of a deep pit, falling into the ordinary world, strained cognitive tensions relaxing into unforced simplicity, confusion turning to smoke and drifting away. I saw the work performed by intelligence; smart was no longer a property, but an engine. Like a knot in time, echoing the outer part of the universe in the inner part, and thereby steering it. I even saw, in a flash of the same enlightenment, that a mind had to output waste heat in order to obey the laws of thermodynamics.

  Previously, Eliezer2001 had talked about Friendly AI as something you should do just to be sure—if you didn’t know whether AI design X was going to be Friendly, then you really ought to go with AI design Y that you did know would be Friendly. But Eliezer2001 didn’t think he knew whether you could actually have a superintelligence that turned its future light cone into paperclips.

  Now, though, I could see it—the pulse of the optimization process, sensory information surging in, motor instructions surging out, steering the future. In the middle, the model that linked up possible actions to possible outcomes, and the utility function over the outcomes. Put in the corresponding utility function, and the result would be an optimizer that would steer the future anywhere.
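  To make that generic shape concrete, here is a minimal sketch in Python of the structure just described: observations in, a model linking candidate actions to predicted outcomes, a utility function over outcomes, and an action out. It is an illustration of the idea only, not any particular AI design discussed in these essays; every name and the toy "world" in it are invented for the example.

```python
# Minimal sketch of a generic optimization process:
# sensory information in, a model linking actions to predicted outcomes,
# a utility function over outcomes, motor instructions out.
# All names here are illustrative, not any specific AI design.

def choose_action(observation, actions, model, utility):
    """Pick the action whose predicted outcome scores highest under `utility`."""
    best_action, best_score = None, float("-inf")
    for action in actions:
        predicted_outcome = model(observation, action)  # world-model: action -> outcome
        score = utility(predicted_outcome)              # preference over outcomes
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Toy usage: a "world" where the outcome is just the observation plus the action,
# and a utility that prefers outcomes near a target value of 10.
if __name__ == "__main__":
    model = lambda obs, act: obs + act
    utility = lambda outcome: -abs(outcome - 10)
    print(choose_action(observation=3, actions=range(0, 11),
                        model=model, utility=utility))  # -> 7
```

  Swap in a different utility function and the identical machinery steers toward a different target, which is the point of the paragraph above.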

  Up until that point, I’d never quite admitted to myself that Eliezer1997’s AI goal system design would definitely, no two ways about it, pointlessly wipe out the human species. Now, however, I looked back, and I could finally see what my old design really did, to the extent it was coherent enough to be talked about. Roughly, it would have converted its future light cone into generic tools—computers without programs to run, stored energy without a use . . .

  . . . how on Earth had I, the fine and practiced rationalist—how on Earth had I managed to miss something that obvious, for six damned years?

  That was the point at which I awoke clear-headed, and remembered; and thought, with a certain amount of embarrassment: I’ve been stupid.

  *

  1. Ben Goertzel and Cassio Pennachin, eds., Artificial General Intelligence, Cognitive Technologies (Berlin: Springer, 2007), doi:10.1007/978-3-540-68677-4.

  300

  The Level Above Mine

  I once lent Xiaoguang “Mike” Li my copy of Probability Theory: The Logic of Science. Mike Li read some of it, and then came back and said:

  Wow . . . it’s like Jaynes is a thousand-year-old vampire.

  Then Mike said, “No, wait, let me explain that—” and I said, “No, I know exactly what you mean.” It’s a convention in fantasy literature that the older a vampire gets, the more powerful they become.

  I’d enjoyed math proofs before I encountered Jaynes. But E. T. Jaynes was the first time I picked up a sense of formidability from mathematical arguments. Maybe because Jaynes was lining up “paradoxes” that had been used to object to Bayesianism, and then blasting them to pieces with overwhelming firepower—power being used to overcome others. Or maybe the sense of formidability came from Jaynes not treating his math as a game of aesthetics; Jaynes cared about probability theory, it was bound up with other considerations that mattered, to him and to me too.

  For whatever reason, the sense I get of Jaynes is one of terrifying swift perfection—something that would arrive at the correct answer by the shortest possible route, tearing all surrounding mistakes to shreds in the same motion. Of course, when you write a book, you get a chance to show only your best side. But still.

  It spoke well of Mike Li that he was able to sense the aura of formidability surrounding Jaynes. It’s a general rule, I’ve observed, that you can’t discriminate between levels too far above your own. E.g., someone once earnestly told me that I was really bright, and “ought to go to college.” Maybe anything more than around one standard deviation above you starts to blur together, though that’s just a cool-sounding wild guess.

  So, having heard Mike Li compare Jaynes to a thousand-year-old vampire, one question immediately popped into my mind:

  “Do you get the same sense off me?” I asked.

  Mike shook his head. “Sorry,” he said, sounding somewhat awkward, “it’s just that Jaynes is . . .”

  “No, I know,” I said. I hadn’t thought I’d reached Jaynes’s level. I’d only been curious about how I came across to other people.

  I aspire to Jaynes’s level. I aspire to become as much the master of Artificial Intelligence / reflectivity, as Jaynes was master of Bayesian probability theory. I can even plead that the art I’m trying to master is more difficult than Jaynes’s, making a mockery of deference. Even so, and embarrassingly, there is no art of which I am as much the master now, as Jaynes was of probability theory.

  This is not, necessarily, to place myself beneath Jaynes as a person—to say that Jaynes had a magical aura of destiny, and I don’t.

  Rather I recognize in Jaynes a level of expertise, of sheer formidability, which I have not yet achieved. I can argue forcefully in my chosen subject, but that is not the same as writing out the equations and saying: DONE.

  For so long as I have not yet achieved that level, I must acknowledge the possibility that I can never achieve it, that my native talent is not sufficient. When Marcello Herreshoff had known me for long enough, I asked him if he knew of anyone who struck him as substantially more natively intelligent than myself. Marcello thought for a moment and said “John Conway—I met him at a summer math camp.” Darn, I thought, he thought of someone, and worse, it’s some ultra-famous old guy I can’t grab. I inquired how Marcello had arrived at the judgment. Marcello said, “He just struck me as having a tremendous amount of mental horsepower,” and started to explain a math problem he’d had a chance to work on with Conway.

  Not what I wanted to hear.

  Perhaps, relative to Marcello’s experience of Conway and his experience of me, I haven’t had a chance to show off on any subject that I’ve mastered as thoroughly as Conway had mastered his many fields of mathematics.

  Or it might be that Conway’s brain is specialized off in a different direction from mine, and that I could never approach Conway’s level on math, yet Conway wouldn’t do so well on AI research.

  Or . . .

  . . . or I’m strictly dumber than Conway, dominated by him along all dimensions. Maybe, if I could find a young proto-Conway and tell them the basics, they would blaze right past me, solve the problems that have weighed on me for years, and zip off to places I can’t follow.

  Is it damaging to my ego to confess that last possibility? Yes. It would be futile to deny that.

  Have I really accepted that awful possibility, or am I only pretending to myself to have accepted it? Here I will say: “No, I think I have accepted it.” Why do I dare give myself so much credit? Because I’ve invested specific effort into that awful possibility. I am writing here for many reasons, but a major one is the vision of some younger mind reading these words and zipping off past me. It might happen, it might not.

  Or sadder: Maybe I just wasted too much time on setting up the resources to support me, instead of studying math full-time through my whole youth; or I wasted too much youth on non-mathy ideas. And this choice, my past, is irrevocable. I’ll hit a brick wall at 40, and there won’t be anything left but to pass on the resources to another mind with the potential I wasted, still young enough to learn. So to save them time, I should leave a trail to my successes, and post warning signs on my mistakes.

  Such specific efforts predicated on an ego-damaging possibility—that’s the only kind of humility that seems real enough for me to dare credit myself. Or giving up my precious theories, when I realized that they didn’t meet the standard Jaynes had shown me—that was hard, and it was real. Modest demeanors are cheap. Humble admissions of doubt are cheap. I’ve known too many people who, presented with a counterargument, say, “I am but a fallible mortal, of course I could be wrong,” and then go on to do exactly what they had planned to do previously.

  You’ll note that I don’t try to modestly say anything like, “Well, I may not be as brilliant as Jaynes or Conway, but that doesn’t mean I can’t do important things in my chosen field.”

  Because I do know . . . that’s not how it works.

  *

  301

  The Magnitude of His Own Folly

  In the years before I met a would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.

  In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn’t just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary algorithms.)

  And the one said: Oh, woe! Oh, alas! What a fool I’ve been! Through my carelessness, I almost destroyed the world! What a villain I once was!

  Now, there’s a trap I knew better than to fall into—

  —at the point where, in late 2002, I looked back to Eliezer1997’s AI proposals and realized what they really would have done, insofar as they were coherent enough for me to talk about what they “really would have done.”

  When I finally saw the magnitude of my own folly, everything fell into place at once. The dam against realization cracked; and the unspoken doubts that had been accumulating behind it crashed through all together. There wasn’t a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid. I already knew how.

  And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.

  It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach. I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.

  And by the same token, I didn’t fall into the conjugate trap of saying: Oh, well, it’s not as if I had code and was about to run it; I didn’t really come close to destroying the world. For that, too, would have minimized the force of the punch. It wasn’t really loaded? I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.

  I didn’t make a grand emotional drama out of it. That would have wasted the force of the punch, averted it into mere tears.

  I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn’t been updating.

  And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead.

  I knew I had to stop.

  Halt, melt, and catch fire.

  Say, “I’m not ready.” Say, “I don’t know how to do this yet.”

  These are terribly difficult words to say, in the field of AGI. Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play. Failing that, they may give you some credit for saying, “I’m ready to write code; just give me the funding.”

  Say, “I’m not ready to write code,” and your status drops like a depleted uranium balloon.

  What distinguishes you, then, from six billion other people who don’t know how to create Artificial General Intelligence? If you don’t have neat code (that does something other than be humanly intelligent, obviously; but at least it’s code), or at minimum your own startup that’s going to write code as soon as it gets funding—then who are you and what are you doing at our conference?

  Maybe later I’ll write on where this attitude comes from—the excluded middle between “I know how to build AGI!” and “I’m working on narrow AI because I don’t know how to build AGI,” the nonexistence of a concept for “I am trying to get from an incomplete map of FAI to a complete map of FAI.”

  But this attitude does exist, and so the loss of status associated with saying “I’m not ready to write code” is very great. (If the one doubts this, let them name any other who simultaneously says “I intend to build an Artificial General Intelligence,” “Right now I can’t build an AGI because I don’t know X,” and “I am currently trying to figure out X.”)

  (And never mind AGI folk who’ve already raised venture capital, promising returns in five years.)

  So there’s a huge reluctance to say, “Stop.” You can’t just say, “Oh, I’ll swap back to figure-out-X mode,” because that mode doesn’t exist.

  Was there more to that reluctance than just loss of status, in my case? Eliezer2001 might also have flinched away from slowing his perceived forward momentum into the intelligence explosion, which was so right and so necessary . . .

  But mostly, I think I flinched away from not being able to say, “I’m ready to start coding.” Not just for fear of others’ reactions, but because I’d been inculcated with the same attitude myself.

  Above all, Eliezer2001 didn’t say, “Stop”—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.

  “Teenagers think they’re immortal,” the proverb goes. Obviously this isn’t true in the literal sense that if you ask them, “Are you indestructible?” they will reply “Yes, go ahead and try shooting me.” But perhaps wearing seat belts isn’t deeply emotionally compelling for them, because the thought of their own death isn’t quite real—they don’t really believe it’s allowed to happen. It can happen in principle but it can’t actually happen.

  Personally, I always wore my seat belt. As an individual, I understood that I could die.

  But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.

  Even when I acknowledged that nanotech could wipe out humanity, I still believed the intelligence explosion was invulnerable. That if humanity survived, the intelligence explosion would happen, and the resultant AI would be too smart to be corrupted or lost.

  Even after that, when I acknowledged Friendly AI as a consideration, I didn’t emotionally believe in the possibility of failure, any more than that teenager who doesn’t wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.

  It wasn’t until my insight into optimization let me look back and see Eliezer1997 in plain light that I realized that Nature was allowed to kill me.

  “The thought you cannot think controls you more than thoughts you speak aloud.” But we flinch away from only those fears that are real to us.

  AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them. The ones who have started companies know that they are allowed to run out of venture capital. That possibility is real to them, very real; it has a power of emotional compulsion over them.

  I don’t think that “Oops” followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.

  It is unsafe to say what other people are thinking. But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, “If you delay development to work on safety, other projects that don’t care at all about Friendly AI will beat you to the punch,” the prospect that they themselves might make a mistake, followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.

 
