
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Being religious doesn’t make you less than human. Your brain still has the abilities of a human brain. The dangerous part is that being religious might stop you from applying those native abilities to your religion—stop you from reflecting fully on yourself. People don’t heal their errors by resetting themselves to an ideal philosopher of pure emptiness and reconsidering all their sensory experiences from scratch. They heal themselves by becoming more willing to question their current beliefs, using more of the power of their current mind.

  This is why it’s important to distinguish between reflecting on your mind using your mind (it’s not like you can use anything else) and having an unquestionable assumption that you can’t reflect on.

  “I believe that the Bible is the word of God, because the Bible says so.” Well, if the Bible were an astoundingly reliable source of information about all other matters, if it had not said that grasshoppers had four legs or that the universe was created in six days, but had instead contained the Periodic Table of Elements centuries before chemistry—if the Bible had served us only well and told us only truth—then we might, in fact, be inclined to take seriously the additional statement in the Bible, that the Bible had been generated by God. We might not trust it entirely, because it could also be aliens or the Dark Lords of the Matrix, but it would at least be worth taking seriously.

  Likewise, if everything else that priests had told us turned out to be true, we might take more seriously their statement that faith had been placed in us by God and was a systematically trustworthy source—especially if people could divine the hundredth digit of pi by faith as well.

  So the important part of appreciating the circularity of “I believe that the Bible is the word of God, because the Bible says so,” is not so much that you are going to reject the idea of reflecting on your mind using your current mind. Rather, you realize that anything which calls into question the Bible’s trustworthiness also calls into question the Bible’s assurance of its trustworthiness.

  This applies to rationality too: if the future should cease to resemble the past—even on its lowest and simplest and most stable observed levels of organization—well, mostly, I’d be dead, because my brain’s processes require a lawful universe where chemistry goes on working. But if somehow I survived, then I would have to start questioning the principle that the future should be predicted to be like the past.

  But for now . . . what’s the alternative to saying, “I’m going to believe that the future will be like the past on the most stable level of organization I can identify, because that’s previously worked better for me than any other algorithm I’ve tried”?

  Is it saying, “I’m going to believe that the future will not be like the past, because that algorithm has always failed before”?

  At this point I feel obliged to drag up the point that rationalists are not out to win arguments with ideal philosophers of perfect emptiness; we are simply out to win. For which purpose we want to get as close to the truth as we can possibly manage. So at the end of the day, I embrace the principle: “Question your brain, question your intuitions, question your principles of rationality, using the full current force of your mind, and doing the best you can do at every point.”

  If one of your current principles does come up wanting—according to your own mind’s examination, since you can’t step outside yourself—then change it! And then go back and look at things again, using your new improved principles.

  The point is not to be reflectively consistent. The point is to win. But if you look at yourself and play to win, you are making yourself more reflectively consistent—that’s what it means to “play to win” while “looking at yourself.”

  Everything, without exception, needs justification. Sometimes—unavoidably, as far as I can tell—those justifications will go around in reflective loops. I do think that reflective loops have a meta-character which should enable one to distinguish them, by common sense, from circular logics. But anyone seriously considering a circular logic in the first place is probably out to lunch in matters of rationality, and will simply insist that their circular logic is a “reflective loop” even if it consists of a single scrap of paper saying “Trust me.” Well, you can’t always optimize your rationality techniques according to the sole consideration of preventing those bent on self-destruction from abusing them.

  The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.

  Always apply full force, whether it loops or not—do the best you can possibly do, whether it loops or not—and play, ultimately, to win.

  *

  265

  My Kind of Reflection

  In Where Recursive Justification Hits Bottom, I concluded that it’s okay to use induction to reason about the probability that induction will work in the future, given that it’s worked in the past; or to use Occam’s Razor to conclude that the simplest explanation for why Occam’s Razor works is that the universe itself is fundamentally simple.

  Now I am far from the first person to consider reflective application of reasoning principles. Chris Hibbert compared my view to Bartley’s Pan-Critical Rationalism (I was wondering whether that would happen). So it seems worthwhile to state what I see as the distinguishing features of my view of reflection, which may or may not happen to be shared by any other philosopher’s view of reflection.

  All of my philosophy here actually comes from trying to figure out how to build a self-modifying AI that applies its own reasoning principles to itself in the process of rewriting its own source code. So whenever I talk about using induction to license induction, I’m really thinking about an inductive AI considering a rewrite of the part of itself that performs induction. If you wouldn’t want the AI to rewrite its source code to not use induction, your philosophy had better not label induction as unjustifiable.

  One of the most powerful principles I know for AI in general is that the true Way generally turns out to be naturalistic—which for reflective reasoning means treating transistors inside the AI just as if they were transistors found in the environment, not an ad-hoc special case. This is the real source of my insistence in Recursive Justification that questions like “How well does my version of Occam’s Razor work?” should be considered just like an ordinary question—or at least an ordinary very deep question. I strongly suspect that a correctly built AI, in pondering modifications to the part of its source code that implements Occamian reasoning, will not have to do anything special as it ponders—in particular, it shouldn’t have to make a special effort to avoid using Occamian reasoning.

  I don’t think that “reflective coherence” or “reflective consistency” should be considered as a desideratum in itself. As I say in The Twelve Virtues and The Simple Truth, if you make five accurate maps of the same city, then the maps will necessarily be consistent with each other; but if you draw one map by fantasy and then make four copies, the five will be consistent but not accurate. In the same way, no one is deliberately pursuing reflective consistency, and reflective consistency is not a special warrant of trustworthiness; the goal is to win. But anyone who pursues the goal of winning, using their current notion of winning, and modifying their own source code, will end up reflectively consistent as a side effect—just like someone continually striving to improve their map of the world should find the parts becoming more consistent among themselves, as a side effect. If you put on your AI goggles, then the AI, rewriting its own source code, is not trying to make itself “reflectively consistent”—it is trying to optimize the expected utility of its source code, and it happens to be doing this using its current mind’s anticipation of the consequences.
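
  To make that last point concrete, here is a minimal sketch in Python. The ToyMind class, its expected_utility scoring, and the consider_rewrite step are inventions for this illustration, not anything specified above; the point is only that the agent judges a candidate rewrite with its current anticipation of consequences, and reflective consistency appears nowhere in the criterion.

    # Toy model: the agent evaluates a candidate rewrite of its own "code" using
    # its *current* mind's anticipation of consequences, not an outside standard.
    # "Reflective consistency" is never a goal here; it falls out as a side effect.
    class ToyMind:
        def __init__(self, weights):
            self.weights = weights  # stands in for the agent's current source code

        def expected_utility(self, weights):
            # Crude stand-in for "anticipated consequences of running this code."
            return -sum((w - 1.0) ** 2 for w in weights)

        def consider_rewrite(self, candidate):
            # Adopt the candidate only if the current mind expects it to do better.
            if self.expected_utility(candidate) > self.expected_utility(self.weights):
                self.weights = candidate

    mind = ToyMind([0.0, 2.0])
    mind.consider_rewrite([0.5, 1.5])   # adopted: the current evaluation prefers it
    mind.consider_rewrite([5.0, 5.0])   # rejected by that same evaluation
    print(mind.weights)                 # [0.5, 1.5]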

  One of the ways I license using induction and Occam’s Razor to consider “induction” and “Occam’s Razor” is by appealing to E. T. Jaynes’s principle that we should always use all the information available to us (computing power permitting) in a calculation. If you think induction works, then you should use it in order to use your maximum power, including when you’re thinking about induction.

  In general, I think it’s valuable to distinguish a defensive posture where you’re imagining how to justify your philosophy to a philosopher that questions you, from an aggressive posture where you’re trying to get as close to the truth as possible. So the point of being suspicious of Occam’s Razor, while still using your current mind and intelligence to inspect it, is not to show that you’re being fair and defensible by questioning your foundational beliefs. Rather, the reason you would inspect Occam’s Razor is to see whether you could improve your application of it, or because you’re worried it might really be wrong. I tend to deprecate mere dutiful doubts.

  If you run around inspecting your foundations, I expect you to actually improve them, not just dutifully investigate. Our brains are built to assess “simplicity” in a certain intuitive way that makes Thor sound simpler than Maxwell’s Equations as an explanation for lightning. But, having gotten a better look at the way the universe really works, we’ve concluded that differential equations (which few humans master) are actually simpler (in an information-theoretic sense) than heroic mythology (which is how most tribes explain the universe). This being the case, we’ve tried to import our notions of Occam’s Razor into math as well.
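
  As a rough sketch of what “simpler in an information-theoretic sense” means here: weight each hypothesis in proportion to 2 raised to the negative of its description length in bits. The bit counts below are invented for the example, and real algorithmic complexity is uncomputable, so this only illustrates the direction of the comparison.

    # Crude sketch of an Occam prior: log2(prior) is minus the description length.
    # The bit counts are invented placeholders, not measured values.
    hypothetical_bits = {
        "differential equations of electromagnetism": 4_000,
        "Thor, a humanlike mind who chooses to throw lightning": 100_000_000,
    }
    log2_prior = {h: -bits for h, bits in hypothetical_bits.items()}
    favored = max(log2_prior, key=log2_prior.get)
    print(favored)  # the shorter description gets the (vastly) higher prior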

  On the other hand, the improved foundations should still add up to normality; 2 + 2 should still end up equalling 4, not something new and amazing and exciting like “fish.”

  I think it’s very important to distinguish between the questions “Why does induction work?” and “Does induction work?” Why the universe itself is regular is still a mysterious question unto us, for now. Strange speculations here may be temporarily needful. But on the other hand, if you start claiming that the universe isn’t actually regular, that the answer to “Does induction work?” is “No!,” then you’re wandering into 2 + 2 = 3 territory. You’re trying too hard to make your philosophy interesting, instead of correct. An inductive AI asking what probability assignment to make on the next round is asking “Does induction work?,” and this is the question that it may answer by inductive reasoning. If you ask “Why does induction work?” then answering “Because induction works” is circular logic, and answering “Because I believe induction works” is magical thinking.

  I don’t think that going around in a loop of justifications through the meta-level is the same thing as circular logic. I think the notion of “circular logic” applies within the object level, and is something that is definitely bad and forbidden, on the object level. Forbidding reflective coherence doesn’t sound like a good idea. But I haven’t yet sat down and formalized the exact difference—my reflective theory is something I’m trying to work out, not something I have in hand.

  *

  266

  No Universally Compelling Arguments

  What is so terrifying about the idea that not every possible mind might agree with us, even in principle?

  For some folks, nothing—it doesn’t bother them in the slightest. And for some of those folks, the reason it doesn’t bother them is that they don’t have strong intuitions about standards and truths that go beyond personal whims. If they say the sky is blue, or that murder is wrong, that’s just their personal opinion; and that someone else might have a different opinion doesn’t surprise them.

  For other folks, a disagreement that persists even in principle is something they can’t accept. And for some of those folks, the reason it bothers them is that it seems to them that if you allow that some people cannot be persuaded even in principle that the sky is blue, then you’re conceding that “the sky is blue” is merely an arbitrary personal opinion.

  I’ve proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.

  This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn’t buy it.
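
  Here is a toy version of that counting argument, with four possible arguments standing in for a trillion bits. Treating a “mind” as nothing but a lookup table from arguments to assent or non-assent is a drastic simplification introduced only for this illustration, but it shows why the universal generalization is so fragile.

    # Toy counting argument: a "mind" is a lookup table assigning assent (1) or
    # non-assent (0) to each of a handful of arguments. Over the full space of
    # such tables, no argument is accepted by every mind.
    from itertools import product

    NUM_ARGUMENTS = 4
    minds = list(product([0, 1], repeat=NUM_ARGUMENTS))  # all 2**4 = 16 possible minds

    for a in range(NUM_ARGUMENTS):
        dissenters = [m for m in minds if m[a] == 0]
        print(f"argument {a}: {len(dissenters)} of {len(minds)} possible minds are unmoved")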

  And the surprise and/or horror of this prospect (for some) has a great deal to do, I think, with the intuition of the ghost-in-the-machine—a ghost with some irreducible core that any truly valid argument will convince.

  I have previously spoken of the intuition whereby people map programming a computer onto instructing a human servant, so that the computer might rebel against its code—or perhaps look over the code, decide it is not reasonable, and hand it back.

  If there were a ghost in the machine and the ghost contained an irreducible core of reasonableness, above which any mere code was only a suggestion, then there might be universal arguments. Even if the ghost were initially handed code-suggestions that contradicted the Universal Argument, when we finally did expose the ghost to the Universal Argument—or the ghost could discover the Universal Argument on its own, that’s also a popular concept—the ghost would just override its own, mistaken source code.

  But as the student programmer once said, “I get the feeling that the computer just skips over all the comments.” The code is not given to the AI; the code is the AI.

  If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there’s a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to argument E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.

  Let’s say there’s a mind with a transistor that outputs +3 volts at time T, indicating that it has just assented to some persuasive argument. Then we can build a highly similar physical cognitive system with a tiny little trapdoor underneath the transistor containing a little gray man who climbs out at time T and sets that transistor’s output to -3 volts, indicating non-assent. Nothing acausal about that; the little gray man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little gray man, because that transistor has just got to output +3 volts. It’s such a compelling argument, you see.
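
  The same point as a toy sketch in code: two lawful programs process the identical argument, and one of them has the little gray man built in. The function names and the persuasion threshold are invented for this illustration.

    # Two lawful systems, same argument, opposite outputs. The override is not
    # acausal; it is simply part of how the second system was built.
    def base_mind(argument_strength):
        # Outputs +3 volts (assent) if the argument clears this mind's threshold.
        return +3 if argument_strength > 0.5 else -3

    def mind_with_trapdoor(argument_strength):
        base_mind(argument_strength)  # processes the very same argument...
        return -3                     # ...but the built-in override outputs -3 anyway

    compelling = 0.99
    print(base_mind(compelling))           # +3: this mind assents
    print(mind_with_trapdoor(compelling))  # -3: this one, lawfully, does not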

  But compulsion is not a property of arguments; it is a property of minds that process arguments.

  So the reason I’m arguing against the ghost isn’t just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)

  I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons/code and decides whether they are good suggestions.

  (There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is not irreducible, it is just the mind that you created. The FAI is renormalizing itself however it was designed to do so; there is nothing acausal reaching in from outside. A bootstrap, not a skyhook.)

  All this echoes back to the worry about a Bayesian’s “arbitrary” priors. If you show me one Bayesian who draws 4 red balls and 1 white ball from a barrel, and who assigns probability 5/7 to obtaining a red ball on the next occasion (by Laplace’s Rule of Succession), then I can show you another mind which obeys Bayes’s Rule to conclude a 2/7 probability of obtaining red on the next occasion—corresponding to a different prior belief about the barrel, but, perhaps, a less “reasonable” one.
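
  The arithmetic behind both minds, as a short sketch: the 5/7 is Laplace’s Rule of Succession, which corresponds to a uniform Beta(1, 1) prior over the barrel, and the Beta(2, 14) prior below is just one choice that happens to yield 2/7; the text does not specify which prior the second mind uses.

    # Both minds obey Bayes's Rule; they differ only in their prior over the barrel.
    # With a Beta(alpha, beta) prior, after seeing r red and w white balls, the
    # posterior predictive probability of red is (alpha + r) / (alpha + beta + r + w).
    from fractions import Fraction

    def predict_red(alpha, beta, red_seen, white_seen):
        return Fraction(alpha + red_seen, alpha + beta + red_seen + white_seen)

    print(predict_red(1, 1, 4, 1))    # 5/7, Laplace's Rule of Succession (uniform prior)
    print(predict_red(2, 14, 4, 1))   # 2/7, a prior that strongly expects white balls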

  Many philosophers are convinced that because you can in principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be “arbitrary,” and the whole schema of Bayesianism flawed, because it relies on “unjustifiable” assumptions, and indeed “unscientific,” because you cannot force any possible journal editor in mindspace to agree with you.

  And this (I replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.

  But who is this ideal philosopher of perfect emptiness? Why, it is just the irreducible core of the ghost!

  And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply . . . no ghost.

  So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.

  Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.

  Oh, there might be argument sequences that would compel any neurologically intact human—like the argument I use to make people let the AI out of the box[1]—but that is hardly the same thing from a philosophical perspective.

 
