
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  It is an undeniable fact that we tend to do things that make us happy, but this doesn’t mean we should regard the happiness as the only reason for so acting. First, this would make it difficult to explain how we could care about anyone else’s happiness—how we could treat people as ends in themselves, rather than instrumental means of obtaining a warm glow of satisfaction.

  Second, just because something is a consequence of my action doesn’t mean it was the sole justification. If I’m writing a blog post, and I get a headache, I may take an ibuprofen. One of the consequences of my action is that I experience less pain, but this doesn’t mean it was the only consequence, or even the most important reason for my decision. I do value the state of not having a headache. But I can value something for its own sake and also value it as a means to an end.

  For all value to be reducible to happiness, it’s not enough to show that happiness is involved in most of our decisions—it’s not even enough to show that happiness is the most important consequent in all of our decisions—it must be the only consequent. That’s a tough standard to meet. (I originally found this point in a Sober and Wilson paper, not sure which one.)

  If I claim to value art for its own sake, then would I value art that no one ever saw? A screensaver running in a closed room, producing beautiful pictures that no one ever saw? I’d have to say no. I can’t think of any completely lifeless object that I would value as an end, not just a means. That would be like valuing ice cream as an end in itself, apart from anyone eating it. Everything I value, that I can think of, involves people and their experiences somewhere along the line.

  The best way I can put it is that my moral intuition appears to require both the objective and subjective component to grant full value.

  The value of scientific discovery requires both a genuine scientific discovery, and a person to take joy in that discovery. It may seem difficult to disentangle these values, but the pills make it clearer.

  I would be disturbed if people retreated into holodecks and fell in love with mindless wallpaper. I would be disturbed even if they weren’t aware it was a holodeck, which is an important ethical issue if some agents can potentially transport people into holodecks and substitute zombies for their loved ones without their awareness. Again, the pills make it clearer: I’m not just concerned with my own awareness of the uncomfortable fact. I wouldn’t put myself into a holodeck even if I could take a pill to forget the fact afterward. That’s simply not where I’m trying to steer the future.

  I value freedom: When I’m deciding where to steer the future, I take into account not only the subjective states that people end up in, but also whether they got there as a result of their own efforts. The presence or absence of an external puppet master can affect my valuation of an otherwise fixed outcome. Even if people wouldn’t know they were being manipulated, it would matter to my judgment of how well humanity had done with its future. This is an important ethical issue, if you’re dealing with agents powerful enough to helpfully tweak people’s futures without their knowledge.

  So my values are not strictly reducible to happiness: There are properties I value about the future that aren’t reducible to activation levels in anyone’s pleasure center; properties that are not strictly reducible to subjective states even in principle.

  Which means that my decision system has a lot of terminal values, none of them strictly reducible to anything else. Art, science, love, lust, freedom, friendship . . .

  And I’m okay with that. I value a life complicated enough to be challenging and aesthetic—not just the feeling that life is complicated, but the actual complications—so turning into a pleasure center in a vat doesn’t appeal to me. It would be a waste of humanity’s potential, which I value actually fulfilling, not just having the feeling that it was fulfilled.

  *


  258

  Fake Selfishness

  Once upon a time, I met someone who proclaimed himself to be purely selfish, and told me that I should be purely selfish as well. I was feeling mischievous¹ that day, so I said, “I’ve observed that with most religious people, at least the ones I meet, it doesn’t matter much what their religion says, because whatever they want to do, they can find a religious reason for it. Their religion says they should stone unbelievers, but they want to be nice to people, so they find a religious justification for that instead. It looks to me like when people espouse a philosophy of selfishness, it has no effect on their behavior, because whenever they want to be nice to people, they can rationalize it in selfish terms.”

  And the one said, “I don’t think that’s true.”

  I said, “If you’re genuinely selfish, then why do you want me to be selfish too? Doesn’t that make you concerned for my welfare? Shouldn’t you be trying to persuade me to be more altruistic, so you can exploit me?” The one replied: “Well, if you become selfish, then you’ll realize that it’s in your rational self-interest to play a productive role in the economy, instead of, for example, passing laws that infringe on my private property.”

  And I said, “But I’m a small-‘l’ libertarian already, so I’m not going to support those laws. And since I conceive of myself as an altruist, I’ve taken a job that I expect to benefit a lot of people, including you, instead of a job that pays more. Would you really benefit more from me if I became selfish? Besides, is trying to persuade me to be selfish the most selfish thing you could be doing? Aren’t there other things you could do with your time that would bring much more direct benefits? But what I really want to know is this: Did you start out by thinking that you wanted to be selfish, and then decide this was the most selfish thing you could possibly do? Or did you start out by wanting to convert others to selfishness, then look for ways to rationalize that as self-benefiting?”

  And the one said, “You may be right about that last part,” so I marked him down as intelligent.

  *

  1. Other mischievous questions to ask self-proclaimed Selfishes: “Would you sacrifice your own life to save the entire human species?” (If they notice that their own life is strictly included within the human species, you can specify that they can choose between dying immediately to save the Earth, or living in comfort for one more year and then dying along with Earth.) Or, taking into account that scope insensitivity leads many people to be more concerned over one life than the Earth, “If you had to choose one event or the other, would you rather that you stubbed your toe, or that the stranger standing near the wall there gets horribly tortured for fifty years?” (If they say that they’d be emotionally disturbed by knowing, specify that they won’t know about the torture.) “Would you steal a thousand dollars from Bill Gates if you could be guaranteed that neither he nor anyone else would ever find out about it?” (Selfish libertarians only.)

  259

  Fake Morality

  God, say the religious fundamentalists, is the source of all morality; there can be no morality without a Judge who rewards and punishes. If we did not fear hell and yearn for heaven, then what would stop people from murdering each other left and right?

  Suppose Omega makes a credible threat that if you ever step inside a bathroom between 7 a.m. and 10 a.m., Omega will kill you. Would you be panicked by the prospect of Omega withdrawing its threat? Would you cower in existential terror and cry: “If Omega withdraws its threat, then what’s to keep me from going to the bathroom?” No; you’d probably be quite relieved at your increased opportunity to, ahem, relieve yourself.

  Which is to say: The very fact that a religious person would be afraid of God withdrawing Its threat to punish them for committing murder shows that they have a revulsion of murder that is independent of whether God punishes murder or not. If they had no sense that murder was wrong independently of divine retribution, the prospect of God not punishing murder would be no more existentially horrifying than the prospect of God not punishing sneezing.

  If Overcoming Bias has any religious readers left, I say to you: it may be that you will someday lose your faith; and on that day, you will not lose all sense of moral direction. For if you fear the prospect of God not punishing some deed, that is a moral compass. You can plug that compass directly into your decision system and steer by it. You can simply not do whatever you are afraid God may not punish you for doing. The fear of losing a moral compass is itself a moral compass. Indeed, I suspect you are steering by that compass, and that you always have been. As Piers Anthony once said, “Only those with souls worry over whether or not they have them.” s/soul/morality/ and the point carries.

  You don’t hear religious fundamentalists using the argument: “If we did not fear hell and yearn for heaven, then what would stop people from eating pork?” Yet by their assumptions—that we have no moral compass but divine reward and retribution—this argument should sound just as forceful as the other.

  Even the notion that God threatens you with eternal hellfire, rather than cookies, piggybacks on a pre-existing negative value for hellfire. Consider the following, and ask which of these two philosophers is really the altruist, and which is really selfish?

  “You should be selfish, because when people set out to improve society, they meddle in their neighbors’ affairs and pass laws and seize control and make everyone unhappy. Take whichever job pays the most money: the reason the job pays more is that the efficient market thinks it produces more value than its alternatives. Take a job that pays less, and you’re second-guessing what the market thinks will benefit society most.”

  “You should be altruistic, because the world is an iterated Prisoner’s Dilemma, and the strategy that fares best is Tit for Tat with initial cooperation. People don’t like jerks. Nice guys really do finish first. Studies show that people who contribute to society and have a sense of meaning in their lives are happier than people who don’t; being selfish will only make you unhappy in the long run.”
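  As an illustrative aside, the iterated Prisoner’s Dilemma dynamic that the second philosopher appeals to can be made concrete. The minimal sketch below assumes the standard textbook payoff matrix; the strategy names and helper functions are chosen only for this illustration:

```python
# Minimal illustrative iterated Prisoner's Dilemma sketch. Tit for Tat
# cooperates on the first round and thereafter mirrors the opponent's
# previous move. Payoff values are the standard textbook ones.

PAYOFFS = {  # (my_move, their_move) -> my_score
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_history, their_history):
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation: (30, 30)
print(play(tit_for_tat, always_defect))  # exploited once, then retaliates: (9, 14)
```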

  Blank out the recommendations of these two philosophers, and you can see that the first philosopher is using strictly prosocial criteria to justify their recommendations; to the first philosopher, what validates an argument for selfishness is showing that selfishness benefits everyone. The second philosopher appeals to strictly individual and hedonic criteria; to them, what validates an argument for altruism is showing that altruism benefits them as an individual—higher social status, or more intense feelings of pleasure.

  So which of these two is the actual altruist? Whichever one actually holds open doors for little old ladies.

  *

  260

  Fake Utility Functions

  Every now and then, you run across someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.

  I run across more of these people than you do. Only in my case, it’s people who know the amazingly simple utility function that is all you need to program into an artificial superintelligence and then everything will turn out fine.

  Some people, when they encounter the how-to-program-a-superintelligence problem, try to solve the problem immediately. Norman R. F. Maier: “Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any.” Robyn Dawes: “I have often used this edict with groups I have led—particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately.” Friendly AI is an extremely tough problem, so people solve it extremely fast.

  There are several major classes of fast wrong solutions I’ve observed, and one of these is the Incredibly Simple Utility Function That Is All A Superintelligence Needs For Everything To Work Out Just Fine.

  I may have contributed to this problem with a really poor choice of phrasing, years ago when I first started talking about “Friendly AI.” I referred to the optimization criterion of an optimization process—the region into which an agent tries to steer the future—as the “supergoal.” I’d meant “super” in the sense of “parent,” the source of a directed link in an acyclic graph. But it seems the effect of my phrasing was to send some people into happy death spirals as they tried to imagine the Superest Goal Ever, the Goal That Overrides All Other Goals, the Single Ultimate Rule From Which All Ethics Can Be Derived.

  But a utility function doesn’t have to be simple. It can contain an arbitrary number of terms. We have every reason to believe that insofar as humans can be said to have values, there are lots of them—high Kolmogorov complexity. A human brain implements a thousand shards of desire, though this fact may not be appreciated by one who has not studied evolutionary psychology. (Try to explain this without a full, long introduction, and the one hears “humans are trying to maximize fitness,” which is exactly the opposite of what evolutionary psychology says.)

  So far as descriptive theories of morality are concerned, the complicatedness of human morality is a known fact. It is a descriptive fact about human beings that the love of a parent for a child, and the love of a child for a parent, and the love of a man for a woman, and the love of a woman for a man, have not been cognitively derived from each other or from any other value. A mother doesn’t have to do complicated moral philosophy to love her daughter, nor extrapolate the consequences to some other desideratum. There are many such shards of desire, all different values.

  Leave out just one of these values from a superintelligence, and even if you successfully include every other value, you could end up with a hyperexistential catastrophe, a fate worse than death. If there’s a superintelligence that wants everything for us that we want for ourselves, except the human values relating to controlling your own life and achieving your own goals, that’s one of the oldest dystopias in the book. (Jack Williamson’s “With Folded Hands . . . ,” in this case.)

  So how does the one constructing the Amazingly Simple Utility Function deal with this objection?

  Objection? Objection? Why would they be searching for possible objections to their lovely theory? (Note that the process of searching for real, fatal objections isn’t the same as performing a dutiful search that amazingly hits on only questions to which they have a snappy answer.) They don’t know any of this stuff. They aren’t thinking about burdens of proof. They don’t know the problem is difficult. They heard the word “supergoal” and went off in a happy death spiral around “complexity” or whatever.

  Press them on some particular point, like the love a mother has for her children, and they reply, “But if the superintelligence wants ‘complexity,’ it will see how complicated the parent-child relationship is, and therefore encourage mothers to love their children.” Goodness, where do I start?

  Begin with the motivated stopping: A superintelligence actually searching for ways to maximize complexity wouldn’t conveniently stop if it noticed that a parent-child relation was complex. It would ask if anything else was more complex. This is a fake justification; the one trying to argue the imaginary superintelligence into a policy selection didn’t really arrive at that policy proposal by carrying out a pure search for ways to maximize complexity.

  The whole argument is a fake morality. If what you really valued was complexity, then you would be justifying the parental-love drive by pointing to how it increases complexity. If you justify a complexity drive by alleging that it increases parental love, it means that what you really value is the parental love. It’s like giving a prosocial argument in favor of selfishness.

  But if you consider the affective death spiral, then it doesn’t increase the perceived niceness of “complexity” to say “A mother’s relationship to her daughter is only important because it increases complexity; consider that if the relationship became simpler, we would not value it.” What does increase the perceived niceness of “complexity” is saying, “If you set out to increase complexity, mothers will love their daughters—look at the positive consequence this has!”

  This point applies whenever you run across a moralist who tries to convince you that their One Great Idea is all that anyone needs for moral judgment, and proves this by saying, “Look at all these positive consequences of this Great Thingy,” rather than saying, “Look at how all these things we think of as ‘positive’ are only positive when their consequence is to increase the Great Thingy.” The latter being what you’d actually need to carry such an argument.

  But if you’re trying to persuade others (or yourself) of your theory that the One Great Idea is “bananas,” you’ll sell a lot more bananas by arguing how bananas lead to better sex, rather than claiming that you should only want sex when it leads to bananas.

  Unless you’re so far gone into the Happy Death Spiral that you really do start saying “Sex is only good when it leads to bananas.” Then you’re in trouble. But at least you won’t convince anyone else.

  In the end, the only process that reliably regenerates all the local decisions you would make given your morality is your morality. Anything else—any attempt to substitute instrumental means for terminal ends—ends up losing purpose and requiring an infinite number of patches because the system doesn’t contain the source of the instructions you’re giving it. You shouldn’t expect to be able to compress a human morality down to a simple utility function, any more than you should expect to compress a large computer file down to 10 bits.
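  To spell out the compression analogy as a counting argument: a 10-bit input can take only 2^10 = 1024 distinct values, so no decompressor, however clever, can map 10-bit inputs onto more than 1,024 distinct large files. A minimal sketch of this pigeonhole point, with names chosen only for illustration:

```python
# Pigeonhole sketch: a decompressor that reads a 10-bit input is a function
# on at most 2**10 = 1024 possible inputs, so it can reproduce at most 1024
# distinct files. The same counting logic bounds how many distinct judgments
# a very simple utility function could regenerate.
from itertools import product

ten_bit_inputs = list(product("01", repeat=10))
assert len(ten_bit_inputs) == 2 ** 10
print(len(ten_bit_inputs))  # 1024: the ceiling on distinct recoverable files
```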

  *

  261

  Detached Lever Fallacy

  This fallacy gets its name from an ancient sci-fi TV show, which I never saw myself, but was reported to me by a reputable source (some guy at a science fiction convention). If anyone knows the exact reference, do leave a comment.

 
