
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Later I will say more upon this subject, but I can go ahead and tell you one of the guiding principles: If you meet someone who says that their AI will do XYZ just like humans, do not give them any venture capital. Say to them rather: “I’m sorry, I’ve never seen a human brain, or any other intelligence, and I have no reason as yet to believe that any such thing can exist. Now please explain to me what your AI does, and why you believe it will do it, without pointing to humans as an example.” Planes would fly just as well, given a fixed design, if birds had never existed; they are not kept aloft by analogies.

  So now you perceive, I hope, why, if you wanted to teach someone to do fundamental work on strong AI—bearing in mind that this is demonstrably a very difficult art, which is not learned by a supermajority of students who are just taught existing reductions such as search trees—then you might go on for some length about such matters as the fine art of reductionism, about playing rationalist’s Taboo to excise problematic words and replace them with their referents, about anthropomorphism, and, of course, about early stopping on mysterious answers to mysterious questions.

  *


  263

  The Design Space of Minds-in-General

  People ask me, “What will Artificial Intelligences be like? What will they do? Tell us your amazing story about the future.”

  And lo, I say unto them, “You have asked me a trick question.”

  ATP synthase is a molecular machine—one of three known occasions when evolution has invented the freely rotating wheel—that is essentially the same in animal mitochondria, plant chloroplasts, and bacteria. ATP synthase has not changed significantly since the rise of eukaryotic life two billion years ago. It’s something we all have in common—thanks to the way that evolution strongly conserves certain genes; once many other genes depend on a gene, any mutation in that gene will tend to break all the dependencies at once.

  Any two AI designs might be less similar to each other than you are to a petunia. Asking what “AIs” will do is a trick question because it implies that all AIs form a natural class. Humans do form a natural class because we all share the same brain architecture. But when you say “Artificial Intelligence,” you are referring to a vastly larger space of possibilities than when you say “human.” When we talk about “AIs,” we are really talking about minds-in-general, or optimization processes in general. Having a word for “AI” is like having a word for everything that isn’t a duck.

  Imagine a map of mind design space . . . this is one of my standard diagrams . . .

  All humans, of course, fit into a tiny little dot—as a sexually reproducing species, we can’t be too different from one another.

  This tiny dot belongs to a wider ellipse, the space of transhuman mind designs—things that might be smarter than us, or much smarter than us, but that in some sense would still be people as we understand people.

  This transhuman ellipse is within a still wider volume, the space of posthuman minds, which is everything that a transhuman might grow up into.

  And then the rest of the sphere is the space of minds-in-general, including possible Artificial Intelligences so odd that they aren’t even posthuman.

  But wait—natural selection designs complex artifacts and selects among complex strategies. So where is natural selection on this map?

  So this entire map really floats in a still vaster space, the space of optimization processes. At the bottom of this vaster space, below even humans, is natural selection as it first began in some tidal pool: mutate, replicate, and sometimes die, no sex.

  Are there any powerful optimization processes, with strength comparable to a human civilization or even a self-improving AI, which we would not recognize as minds? Arguably Marcus Hutter’s AIXI should go in this category: for a mind of infinite power, it’s awfully stupid—poor thing can’t even recognize itself in a mirror. But that is a topic for another time.

  My primary moral is to resist the temptation to generalize over all of mind design space.

  If we focus on the bounded subspace of mind design space that contains all those minds whose makeup can be specified in a trillion bits or less, then every universal generalization that you make has two to the trillionth power chances to be falsified.

  Conversely, every existential generalization—“there exists at least one mind such that X”—has two to the trillionth power chances to be true.
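
  To make the counting explicit, here is a minimal sketch of the argument in LaTeX; the set S (the trillion-bit subspace) and the property φ are symbols of my own, not the text’s.

```latex
% Counting sketch for the trillion-bit subspace of mind designs.
% S = minds specifiable in at most 10^12 bits; phi = any property you might assert of them.
\[
  |S| \;\le\; \sum_{k=0}^{10^{12}} 2^{k} \;=\; 2^{10^{12}+1} - 1 \;\approx\; 2^{10^{12}}
\]
\[
  \text{``}\forall m \in S:\ \phi(m)\text{'' fails if even one of the roughly } 2^{10^{12}} \text{ candidates is a counterexample;}
\]
\[
  \text{``}\exists m \in S:\ \phi(m)\text{'' holds if even one of them is a witness.}
\]
```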

  So you want to resist the temptation to say either that all minds do something, or that no minds do something.

  The main reason you could find yourself thinking that you know what a fully generic mind will (won’t) do is if you put yourself in that mind’s shoes—imagine what you would do in that mind’s place—and get back a generally wrong, anthropomorphic answer. (Albeit the answer is true in at least one case, since you are yourself an example.) Or if you imagine a mind doing something, and then imagine the reasons you wouldn’t do it—so that you conclude a mind of that type can’t exist, that the ghost in the machine will look over the corresponding source code and hand it back.

  Somewhere in mind design space is at least one mind with almost any kind of logically consistent property you care to imagine.

  And this is important because it forces the discussion onto what happens, lawfully, and why, as a causal result of a mind’s particular constituent makeup; somewhere in mind design space is a mind that does it differently.

  Of course, you could always say that anything that doesn’t do it your way is “by definition” not a mind; after all, it’s obviously stupid. I’ve seen people try that one too.

  *

  Part V

  Value Theory

  264

  Where Recursive Justification Hits Bottom

  Why do I believe that the Sun will rise tomorrow?

  Because I’ve seen the Sun rise on thousands of previous days.

  Ah . . . but why do I believe the future will be like the past?

  Even if I go past the mere surface observation of the Sun rising, to the apparently universal and exceptionless laws of gravitation and nuclear physics, then I am still left with the question: “Why do I believe this will also be true tomorrow?”

  I could appeal to Occam’s Razor, the principle of using the simplest theory that fits the facts . . . but why believe in Occam’s Razor? Because it’s been successful on past problems? But who says that this means Occam’s Razor will work tomorrow?

  And lo, the one said:

  Science also depends on unjustified assumptions. Thus science is ultimately based on faith, so don’t you criticize me for believing in [silly-belief-#238721].

  As I’ve previously observed:

  It’s a most peculiar psychology—this business of “Science is based on faith too, so there!” Typically this is said by people who claim that faith is a good thing. Then why do they say “Science is based on faith too!” in that angry-triumphal tone, rather than as a compliment?

  Arguing that you should be immune to criticism is rarely a good sign.

  But this doesn’t answer the legitimate philosophical dilemma: If every belief must be justified, and those justifications in turn must be justified, then how is the infinite recursion terminated?

  And if you’re allowed to end in something assumed-without-justification, then why aren’t you allowed to assume any old thing without justification?

  A similar critique is sometimes leveled against Bayesianism—that it requires assuming some prior—by people who apparently think that the problem of induction is a particular problem of Bayesianism, which you can avoid by using classical statistics.

  But first, let it be clearly admitted that the rules of Bayesian updating do not of themselves solve the problem of induction.

  Suppose you’re drawing red and white balls from an urn. You observe that, of the first 9 balls, 3 are red and 6 are white. What is the probability that the next ball drawn will be red?

  That depends on your prior beliefs about the urn. If you think the urn-maker generated a uniform random number between 0 and 1, and used that number as the fixed probability of each ball being red, then the answer is 4/11 (by Laplace’s Law of Succession). If you think the urn originally contained 10 red balls and 10 white balls, then the answer is 7/11.
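
  As a quick sanity check on the arithmetic, here is a small sketch of both calculations in Python; the code and variable names are mine, but the two priors are exactly the ones described above.

```python
from fractions import Fraction

# Observed so far: 3 red and 6 white balls out of 9 draws.
red_seen, total_seen = 3, 9

# Prior 1: the urn-maker picked a fixed probability of red uniformly at random
# on [0, 1].  Laplace's Rule of Succession then gives
# P(next red) = (red_seen + 1) / (total_seen + 2).
laplace = Fraction(red_seen + 1, total_seen + 2)

# Prior 2: the urn started with exactly 10 red and 10 white balls, drawn
# without replacement, so 7 red and 4 white balls remain.
finite_urn = Fraction(10 - red_seen, 20 - total_seen)

print(laplace)     # 4/11
print(finite_urn)  # 7/11
```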

  Which goes to say that with the right prior—or rather the wrong prior—the chance of the Sun rising tomorrow would seem to go down with each succeeding day . . . if you were absolutely certain, a priori, that there was a great barrel out there from which, on each day, there was drawn a little slip of paper that determined whether the Sun rose or not; and that the barrel contained only a limited number of slips saying “Yes,” and the slips were drawn without replacement.
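
  Spelling that prior out (the symbols Y and N are mine, purely for illustration): suppose you are certain the barrel began with Y slips saying “Yes” and N slips saying “No,” drawn without replacement, one per day. Then after k observed sunrises:

```latex
% P(sunrise) under the barrel-of-slips prior: Y "Yes" slips and N "No" slips,
% drawn without replacement, one per day.
\[
  P(\text{sunrise on day } k+1 \mid k \text{ sunrises so far}) \;=\; \frac{Y - k}{(Y + N) - k}
\]
```

  This quantity strictly decreases as k grows (so long as N > 0): every sunrise uses up one of the finitely many “Yes” slips, so the more often the Sun has risen, the less probable this prior makes the next sunrise.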

  There are possible minds in mind design space who have anti-Occamian and anti-Laplacian priors; they believe that simpler theories are less likely to be correct, and that the more often something happens, the less likely it is to happen again.

  And when you ask these strange beings why they keep using priors that never seem to work in real life . . . they reply, “Because it’s never worked for us before!”

  Now, one lesson you might derive from this is “Don’t be born with a stupid prior.” This is an amazingly helpful principle on many real-world problems, but I doubt it will satisfy philosophers.

  Here’s how I treat this problem myself: I try to approach questions like “Should I trust my brain?” or “Should I trust Occam’s Razor?” as though they were nothing special—or at least, nothing special as deep questions go.

  Should I trust Occam’s Razor? Well, how well does (any particular version of) Occam’s Razor seem to work in practice? What kind of probability-theoretic justifications can I find for it? When I look at the universe, does it seem like the kind of universe in which Occam’s Razor would work well?
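
  One standard probability-theoretic observation, offered as a gloss of my own rather than anything argued in the text: a prior over infinitely many competing hypotheses must sum to 1, so it cannot take them all equally seriously; only finitely many can sit above any fixed threshold, and penalizing hypotheses by description length is one simple way to pay that unavoidable tax.

```latex
% Normalization alone forces something Occam-like: above any threshold epsilon,
% there is room for fewer than 1/epsilon hypotheses.
\[
  \sum_{h \in \mathcal{H}} P(h) = 1
  \quad\Longrightarrow\quad
  \#\{\, h \in \mathcal{H} : P(h) > \varepsilon \,\} \;<\; \frac{1}{\varepsilon}
  \qquad \text{for every } \varepsilon > 0.
\]
```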

  Should I trust my brain? Obviously not; it doesn’t always work. But nonetheless, the human brain seems much more powerful than the most sophisticated computer programs I could consider trusting otherwise. How well does my brain work in practice, on which sorts of problems?

  When I examine the causal history of my brain—its origins in natural selection—I find, on the one hand, all sorts of specific reasons for doubt; my brain was optimized to run on the ancestral savanna, not to do math. But on the other hand, it’s also clear why, loosely speaking, it’s possible that the brain really could work. Natural selection would have quickly eliminated brains so completely unsuited to reasoning, so anti-helpful, as brains built around anti-Occamian or anti-Laplacian priors.

  So what I did in practice does not amount to declaring a sudden halt to questioning and justification. I’m not halting the chain of examination at the point that I encounter Occam’s Razor, or my brain, or some other unquestionable. The chain of examination continues—but it continues, unavoidably, using my current brain and my current grasp on reasoning techniques. What else could I possibly use?

  Indeed, no matter what I did with this dilemma, it would be me doing it. Even if I trusted something else, like some computer program, it would be my own decision to trust it.

  The technique of rejecting beliefs that have absolutely no justification is in general an extremely important one. I sometimes say that the fundamental question of rationality is “Why do you believe what you believe?” I don’t even want to say something that sounds like it might allow a single exception to the rule that everything needs justification.

  Which is, itself, a dangerous sort of motivation; you can’t always avoid everything that might be risky, and when someone annoys you by saying something silly, you can’t reverse that stupidity to arrive at intelligence.

  But I would nonetheless emphasize the difference between saying:

  Here is this assumption I cannot justify, which must be simply taken, and not further examined.

  Versus saying:

  Here the inquiry continues to examine this assumption, with the full force of my present intelligence—as opposed to the full force of something else, like a random number generator or a magic 8-ball—even though my present intelligence happens to be founded on this assumption.

  Still . . . wouldn’t it be nice if we could examine the problem of how much to trust our brains without using our current intelligence? Wouldn’t it be nice if we could examine the problem of how to think, without using our current grasp of rationality?

  When you phrase it that way, it starts looking like the answer might be “No.”

  E. T. Jaynes used to say that you must always use all the information available to you—he was a Bayesian probability theorist, and had to clean up the paradoxes other people generated when they used different information at different points in their calculations. The principle of “Always put forth your true best effort” has at least as much appeal as “Never do anything that might look circular.” After all, the alternative to putting forth your best effort is presumably doing less than your best.

  But still . . . wouldn’t it be nice if there were some way to justify using Occam’s Razor, or justify predicting that the future will resemble the past, without assuming that those methods of reasoning which have worked on previous occasions are better than those which have continually failed?

  Wouldn’t it be nice if there were some chain of justifications that neither ended in an unexaminable assumption, nor was forced to examine itself under its own rules, but, instead, could be explained starting from absolute scratch to an ideal philosophy student of perfect emptiness?

  Well, I’d certainly be interested, but I don’t expect to see it done any time soon. There is no perfectly empty ghost-in-the-machine; there is no argument that you can explain to a rock.

  Even if someone cracks the First Cause problem and comes up with the actual reason the universe is simple, which does not itself presume a simple universe . . . then I would still expect that the explanation could only be understood by a mindful listener, and not by, say, a rock. A listener that didn’t start out already implementing modus ponens might be out of luck.

  So, at the end of the day, what happens when someone keeps asking me “Why do you believe what you believe?”

  At present, I start going around in a loop at the point where I explain, “I predict the future as though it will resemble the past on the simplest and most stable level of organization I can identify, because previously, this rule has usually worked to generate good results; and using the simple assumption of a simple universe, I can see why it generates good results; and I can even see how my brain might have evolved to be able to observe the universe with some degree of accuracy, if my observations are correct.”

  But then . . . haven’t I just licensed circular logic?

  Actually, I’ve just licensed reflecting on your mind’s degree of trustworthiness, using your current mind as opposed to something else.

  Reflection of this sort is, indeed, the reason we reject most circular logic in the first place. We want to have a coherent causal story about how our mind comes to know something, a story that explains how the process we used to arrive at our beliefs is itself trustworthy. This is the essential demand behind the rationalist’s fundamental question, “Why do you believe what you believe?”

  Now suppose you write on a sheet of paper: “(1) Everything on this sheet of paper is true, (2) The mass of a helium atom is 20 grams.” If that trick actually worked in real life, you would be able to know the true mass of a helium atom just by believing some circular logic that asserted it. Which would enable you to arrive at a true map of the universe sitting in your living room with the blinds drawn. Which would violate the Second Law of Thermodynamics by generating information from nowhere. Which would not be a plausible story about how your mind could end up believing something true.

  Even if you started out believing the sheet of paper, it would not seem that you had any reason for why the paper corresponded to reality. It would just be a miraculous coincidence that (a) the mass of a helium atom was 20 grams, and (b) the paper happened to say so.

  Believing self-validating statement sets does not in general seem like it should work to map external reality—when we reflect on it as a causal story about minds—using, of course, our current minds to do so.

  But what about evolving to give more credence to simpler beliefs, and to believe that algorithms which have worked in the past are more likely to work in the future? Even when we reflect on this as a causal story of the origin of minds, it still seems like this could plausibly work to map reality.

  And what about trusting reflective coherence in general? Wouldn’t most possible minds, randomly generated and allowed to settle into a state of reflective coherence, be incorrect? Ah, but we evolved by natural selection; we were not generated randomly.

  If trusting this argument seems worrisome to you, then forget about the problem of philosophical justifications, and ask yourself whether it’s really truly true.

  (You will, of course, use your own mind to do so.)

  Is this the same as the one who says, “I believe that the Bible is the word of God, because the Bible says so”?

  Couldn’t they argue that their blind faith must also have been placed in them by God, and is therefore trustworthy?

  In point of fact, when religious people finally come to reject the Bible, they do not do so by magically jumping to a non-religious state of pure emptiness, and then evaluating their religious beliefs in that non-religious state of mind, and then jumping back to a new state with their religious beliefs removed.

  People go from being religious to being non-religious because even in a religious state of mind, doubt seeps in. They notice their prayers (and worse, the prayers of seemingly much worthier people) are not being answered. They notice that God, who speaks to them in their heart in order to provide seemingly consoling answers about the universe, is not able to tell them the hundredth digit of pi (which would be a lot more reassuring, if God’s purpose were reassurance). They examine the story of God’s creation of the world and damnation of unbelievers, and it doesn’t seem to make sense even under their own religious premises.

 
