
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Some said, “Yes, because when an Earth person and a Twin Earth person utter the word ‘water,’ they have the same sensory test in mind.”

  Some said, “No, because ‘water’ in our Earth means H2O and ‘water’ in the Twin Earth means XYZ.”

  If you think of “water” as a concept that begins by eating a world to find out the empirical true nature of that transparent flowing stuff, and returns a new fixed concept Water42 or H2O, then this world-eating concept is the same in our Earth and the Twin Earth; it just returns different answers in different places.

  If you think of “water” as meaning H2O, then the concept does nothing different when we transport it between worlds, and the Twin Earth contains no H2O.

  And of course there is no point in arguing over what the sound of the syllables “wa-ter” really means.

  So should you pick one definition and use it consistently? But it’s not that easy to save yourself from confusion. You have to train yourself to be deliberately aware of the distinction between the curried and uncurried forms of concepts.

  When you take the uncurried water concept and apply it in a different world, it is the same concept but it refers to a different thing; that is, we are applying a constant world-eating function to a different world and obtaining a different return value. In the Twin Earth, XYZ is “water” and H2O is not; in our Earth, H2O is “water” and XYZ is not.

  On the other hand, if you take “water” to refer to what the prior thinker would call “the result of applying ‘water’ to our Earth,” then in the Twin Earth, XYZ is not water and H2O is.

  The whole confusingness of the subsequent philosophical debate rested on a tendency to instinctively curry concepts or instinctively uncurry them.
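  For readers who think in code, the distinction can be put in a few lines of Python. The names and the toy representation of a “world” as a lookup table are placeholders of my own, nothing more:

    def water_uncurried(world):
        # The "world-eating" concept: inspect a world and return whatever
        # its transparent flowing stuff empirically turns out to be.
        return world["transparent_flowing_stuff"]

    earth = {"transparent_flowing_stuff": "H2O"}
    twin_earth = {"transparent_flowing_stuff": "XYZ"}

    # The fixed concept: the result of having already applied "water" to our Earth.
    water_fixed = water_uncurried(earth)        # "H2O"

    water_uncurried(twin_earth)                 # "XYZ" -- same concept, different referent
    water_fixed == water_uncurried(twin_earth)  # False -- the Twin Earth contains no H2O

  The dispute is then just a dispute over which of these two objects, water_uncurried or water_fixed, gets to wear the label “water.”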

  Similarly, it takes an extra step for Fred to realize that other agents, like the Bug-Eyed-Monster agent, will choose kidnappees for ravishing based on Sexiness_BEM(Woman), not Sexiness_Fred(Woman). To do this, Fred must consciously re-envision Sexiness as a function with two arguments. All Fred’s brain does by instinct is evaluate Woman.sexiness (that is, Sexiness_Fred(Woman)); but it’s simply labeled Woman.sexiness.

  The fixed mathematical function Sexiness_20934 makes no mention of Fred or the BEM, only women, so Fred does not instinctively see why the BEM would evaluate “sexiness” any differently. And indeed the BEM would not evaluate Sexiness_20934 any differently, if for some odd reason it cared about the result of that particular function; but it is an empirical fact about the BEM that it uses a different function to decide whom to kidnap.
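  In the same sketch form (the particular preference criteria below are arbitrary placeholders, invented only to make the two-argument structure visible):

    def sexiness_fred(woman):
        # What Fred's brain computes by instinct (placeholder criterion).
        return woman.get("wit", 0)

    def sexiness_bem(woman):
        # The BEM's different, equally instinctive function (placeholder criterion).
        return woman.get("number_of_eyes", 0)

    def sexiness(evaluator, woman):
        # The consciously re-envisioned two-argument form: who is doing
        # the evaluating is now an explicit argument.
        return evaluator(woman)

    woman = {"wit": 9, "number_of_eyes": 2}

    sexiness(sexiness_fred, woman)  # 9 -- labeled simply Woman.sexiness inside Fred's head
    sexiness(sexiness_bem, woman)   # 2 -- what actually predicts whom the BEM kidnaps

    # The fixed function Sexiness_20934 mentions only the woman, not Fred or the BEM;
    # the BEM would compute the same value for it, it just doesn't act on that value.
    sexiness_20934 = sexiness_fred
    sexiness_20934(woman)           # 9, no matter who runs the computation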

  If you’re wondering about the point of this analysis, try using the above distinctions to Taboo such confusing words as “objective,” “subjective,” and “arbitrary.”

  *

  270

  What Would You Do Without Morality?

  To those who say “Nothing is real,” I once replied, “That’s great, but how does the nothing work?”

  Suppose you learned, suddenly and definitively, that nothing is moral and nothing is right; that everything is permissible and nothing is forbidden.

  Devastating news, to be sure—and no, I am not telling you this in real life. But suppose I did tell it to you. Suppose that, whatever you think is the basis of your moral philosophy, I convincingly tore it apart, and moreover showed you that nothing could fill its place. Suppose I proved that all utilities equaled zero.

  I know that Your-Moral-Philosophy is as true and undisprovable as 2 + 2 = 4. But still, I ask that you do your best to perform the thought experiment, and concretely envision the possibilities even if they seem painful, or pointless, or logically incapable of any good reply.

  Would you still tip cabdrivers? Would you cheat on your Significant Other? If a child lay fainted on the train tracks, would you still drag them off?

  Would you still eat the same kinds of foods—or would you only eat the cheapest food, since there’s no reason you should have fun—or would you eat very expensive food, since there’s no reason you should save money for tomorrow?

  Would you wear black and write gloomy poetry and denounce all altruists as fools? But there’s no reason you should do that—it’s just a cached thought.

  Would you stay in bed because there was no reason to get up? What about when you finally got hungry and stumbled into the kitchen—what would you do after you were done eating?

  Would you go on reading Overcoming Bias, and if not, what would you read instead? Would you still try to be rational, and if not, what would you think instead?

  Close your eyes, take as long as necessary to answer:

  What would you do, if nothing were right?

  *

  271

  Changing Your Metaethics

  If you say, “Killing people is wrong,” that’s morality. If you say, “You shouldn’t kill people because God prohibited it,” or “You shouldn’t kill people because it goes against the trend of the universe,” that’s metaethics.

  Just as there’s far more agreement on Special Relativity than there is on the question “What is science?,” people find it much easier to agree “Murder is bad” than to agree what makes it bad, or what it means for something to be bad.

  People do get attached to their metaethics. Indeed they frequently insist that if their metaethic is wrong, all morality necessarily falls apart. It might be interesting to set up a panel of metaethicists—theists, Objectivists, Platonists, etc.—all of whom agree that killing is wrong; all of whom disagree on what it means for a thing to be “wrong”; and all of whom insist that if their metaethic is untrue, then morality falls apart.

  Clearly a good number of people, if they are to make philosophical progress, will need to shift metaethics at some point in their lives. You may have to do it.

  At that point, it might be useful to have an open line of retreat—not a retreat from morality, but a retreat from Your-Current-Metaethic. (You know, the one that, if it is not true, leaves no possible basis for not killing people.)

  And so I have summarized below some possible lines of retreat. For I have learned that to change metaethical beliefs is nigh-impossible in the presence of an unanswered attachment.

  If, for example, someone believes the authority of “Thou Shalt Not Kill” derives from God, then there are several well-known things to say that can help set up a line of retreat—as opposed to immediately attacking the plausibility of God. You can say, “Take personal responsibility! Even if you got orders from God, it would be your own decision to obey those orders. Even if God didn’t order you to be moral, you could just be moral anyway.”

  The above argument actually generalizes to quite a number of metaethics—you just substitute Their-Favorite-Source-Of-Morality, or even the word “morality,” for “God.” Even if your particular source of moral authority failed, couldn’t you just drag the child off the train tracks anyway? And indeed, who is it but you that ever decided to follow this source of moral authority in the first place? What responsibility are you really passing on?

  So the most important line of retreat is: If your metaethic stops telling you to save lives, you can just drag the kid off the train tracks anyway. To paraphrase Piers Anthony, only those who have moralities worry over whether or not they have them. If your metaethic tells you to kill people, why should you even listen? Maybe that which you would do even if there were no morality, is your morality.

  The point being, of course, not that no morality exists; but that you can hold your will in place, and not fear losing sight of what’s important to you, while your notions of the nature of morality change.

  I’ve written some essays to set up lines of retreat specifically for more naturalistic metaethics. Joy in the Merely Real and Explaining vs. Explaining Away argue that you shouldn’t be disappointed in any facet of life, just because it turns out to be explicable instead of inherently mysterious: for if we cannot take joy in the merely real, our lives shall be empty indeed.

  No Universally Compelling Arguments sets up a line of retreat from the desire to have everyone agree with our moral arguments. There’s a strong moral intuition which says that if our moral arguments are right, by golly, we ought to be able to explain them to people. This may be valid among humans, but you can’t explain moral arguments to a rock. There is no ideal philosophy student of perfect emptiness who can be persuaded to implement modus ponens, starting without modus ponens. If a mind doesn’t contain that which is moved by your moral arguments, it won’t respond to them.

  But then isn’t all morality circular logic, in which case it falls apart? Where Recursive Justification Hits Bottom and My Kind of Reflection explain the difference between a self-consistent loop through the meta-level, and actual circular logic. You shouldn’t find yourself saying “The universe is simple because it is simple,” or “Murder is wrong because it is wrong”; but neither should you try to abandon Occam’s Razor while evaluating the probability that Occam’s Razor works, nor should you try to evaluate “Is murder wrong?” from somewhere outside your brain. There is no ideal philosophy student of perfect emptiness to which you can unwind yourself—try to find the perfect rock to stand upon, and you’ll end up as a rock. So instead use the full force of your intelligence, your full rationality and your full morality, when you investigate the foundations of yourself.

  We can also set up a line of retreat for those afraid to allow a causal role for evolution, in their account of how morality came to be. (Note that this is extremely distinct from granting evolution a justificational status in moral theories.) Love has to come into existence somehow—for if we cannot take joy in things that can come into existence, our lives will be empty indeed. Evolution may not be a particularly pleasant way for love to evolve, but judge the end product—not the source. Otherwise you would be committing what is known (appropriately) as The Genetic Fallacy: causation is not the same concept as justification. It’s not like you can step outside the brain evolution gave you; rebelling against nature is only possible from within nature.

  The earlier series on Evolutionary Psychology should dispense with the metaethical confusion of believing that any normal human being thinks about their reproductive fitness, even unconsciously, in the course of making decisions. Only evolutionary biologists even know how to define genetic fitness, and they know better than to think it defines morality.

  Alarming indeed is the thought that morality might be computed inside our own minds—doesn’t this imply that morality is a mere thought? Doesn’t it imply that whatever you think is right, must be right?

  No. Just because a quantity is computed inside your head doesn’t mean that the quantity computed is about your thoughts. There’s a difference between a calculator that calculates “What is 2 + 3?” and one that outputs “What do I output when someone presses ‘2,’ ‘+,’ and ‘3’?”
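  The same point in sketch form (the function names are mine, purely illustrative):

    def calculator(a, b):
        # The computation happens inside this function, but the quantity
        # it computes is a fact about the numbers, not about the function.
        return a + b

    def navel_gazing_calculator(a, b):
        # This one outputs a claim about its own output, not about the numbers.
        return "What do I output when someone presses '{}', '+', and '{}'?".format(a, b)

    calculator(2, 3)               # 5 -- a quantity that is about arithmetic
    navel_gazing_calculator(2, 3)  # a question about the calculator itself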

  Finally, if life seems painful, reductionism may not be the real source of your problem—if living in a world of mere particles seems too unbearable, maybe your life isn’t exciting enough right now?

  And if you’re wondering why I deem this business of metaethics important, when it is all going to end up adding up to moral normality . . . telling you to pull the child off the train tracks, rather than the converse . . .

  Well, there is opposition to rationality from people who think it drains meaning from the universe.

  And this is a special case of a general phenomenon, in which many, many people get messed up by misunderstanding where their morality comes from. Poor metaethics forms part of the teachings of many a cult, including the big ones. My target audience is not just people who are afraid that life is meaningless, but also those who’ve concluded that love is a delusion because real morality has to involve maximizing your inclusive fitness, or those who’ve concluded that unreturned kindness is evil because real morality arises only from selfishness, etc.

  *

  272

  Could Anything Be Right?

  Years ago, Eliezer1999 was convinced that he knew nothing about morality.

  For all he knew, morality could require the extermination of the human species; and if so he saw no virtue in taking a stand against morality, because he thought that, by definition, if he postulated that moral fact, that meant human extinction was what “should” be done.

  I thought I could figure out what was right, perhaps, given enough reasoning time and enough facts, but that I currently had no information about it. I could not trust evolution, which had built me. What foundation did that leave on which to stand?

  Well, indeed Eliezer1999 was massively mistaken about the nature of morality, so far as his explicitly represented philosophy went.

  But as Davidson once observed, if you believe that “beavers” live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs about beavers, true or false. You must get at least some of your beliefs right, before the remaining ones can be wrong about anything.1

  My belief that I had no information about morality was not internally consistent.

  Saying that I knew nothing felt virtuous, for I had once been taught that it was virtuous to confess my ignorance. “The only thing I know is that I know nothing,” and all that. But in this case I would have been better off considering the admittedly exaggerated saying, “The greatest fool is the one who is not aware they are wise.” (This is nowhere near the greatest kind of foolishness, but it is a kind of foolishness.)

  Was it wrong to kill people? Well, I thought so, but I wasn’t sure; maybe it was right to kill people, though that seemed less likely.

  What kind of procedure would answer whether it was right to kill people? I didn’t know that either, but I thought that if you built a generic superintelligence (what I would later label a “ghost of perfect emptiness”) then it could, you know, reason about what was likely to be right and wrong; and since it was superintelligent, it was bound to come up with the right answer.

  The problem that I somehow managed not to think too hard about was where the superintelligence would get the procedure that discovered the procedure that discovered the procedure that discovered morality—if I couldn’t write it into the start state that wrote the successor AI that wrote the successor AI.

  As Marcello Herreshoff later put it, “We never bother running a computer program unless we don’t know the output and we know an important fact about the output.” If I knew nothing about morality, and did not even claim to know the nature of morality, then how could I construct any computer program whatsoever—even a “superintelligent” one or a “self-improving” one—and claim that it would output something called “morality”?

  There are no-free-lunch theorems in computer science—in a maxentropy universe, no plan is better on average than any other. If you have no knowledge at all about “morality,” there’s also no computational procedure that will seem more likely than others to compute “morality,” and no meta-procedure that’s more likely than others to produce a procedure that computes “morality.”

  I thought that surely even a ghost of perfect emptiness, finding that it knew nothing of morality, would see a moral imperative to think about morality.

  But the difficulty lies in the word think. Thinking is not an activity that a ghost of perfect emptiness is automatically able to carry out. Thinking requires running some specific computation that is the thought. For a reflective AI to decide to think requires that it know some computation which it believes is more likely to tell it what it wants to know than consulting an Ouija board; the AI must also have a notion of how to interpret the output.

  If one knows nothing about morality, what does the word “should” mean, at all? If you don’t know whether death is right or wrong—and don’t know how you can discover whether death is right or wrong—and don’t know whether any given procedure might output the procedure for saying whether death is right or wrong—then what do these words, “right” and “wrong,” even mean?

  If the words “right” and “wrong” have nothing baked into them—no starting point—if everything about morality is up for grabs, not just the content but the structure and the starting point and the determination procedure—then what is their meaning? What distinguishes “I don’t know what is right” from “I don’t know what is wakalixes”?

  A scientist may say that everything is up for grabs in science, since any theory may be disproven; but then they have some idea of what would count as evidence that could disprove the theory. Could there be something that would change what a scientist regarded as evidence?

  Well, yes, in fact; a scientist who read some Karl Popper and thought they knew what “evidence” meant could be presented with the coherence and uniqueness proofs underlying Bayesian probability, and that might change their definition of evidence. They might not have had any explicit notion in advance that such a proof could exist. But they would have had an implicit notion. It would have been baked into their brains, if not explicitly represented therein, that such-and-such an argument would in fact persuade them that Bayesian probability gave a better definition of “evidence” than the one they had been using.

  In the same way, you could say, “I don’t know what morality is, but I’ll know it when I see it,” and make sense.

  But then you are not rebelling completely against your own evolved nature. You are supposing that whatever has been baked into you to recognize “morality,” is, if not absolutely trustworthy, then at least your initial condition with which you start debating. Can you trust your moral intuitions to give you any information about morality at all, when they are the product of mere evolution?

 
