
Rationality- From AI to Zombies


by Eliezer Yudkowsky


  If you start from well-calibrated priors, and you apply Bayesian reasoning, you’ll end up with well-calibrated conclusions. Imagine that two million entities, scattered across different planets in the universe, have the opportunity to encounter something so strange as waking up with a tentacle (or—gasp!—ten fingers). One million of these entities say “one in a thousand” for the prior probability of some hypothesis X, and each hypothesis X says “one in a hundred” for the likelihood of waking up with a tentacle. And one million of these entities say “one in a hundred” for the prior probability of some hypothesis Y, and each hypothesis Y says “one in ten” for the likelihood of waking up with a tentacle. If we suppose that all entities are well-calibrated, then we shall look across the universe and find ten entities who wound up with a tentacle because of hypotheses of plausibility class X, and a thousand entities who wound up with tentacles because of hypotheses of plausibility class Y. So if you find yourself with a tentacle, and if your probabilities are well-calibrated, then the tentacle is more likely to stem from a hypothesis you would class as probable than a hypothesis you would class as improbable. (What if your probabilities are poorly calibrated, so that when you say “million-to-one” it happens one time out of twenty? Then you’re grossly overconfident, and we adjust your probabilities in the direction of less discrimination and greater entropy.)
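  The counting argument above can be checked directly. Here is a minimal Python sketch, using the numbers from the text (the function name is illustrative):

```python
# Expected number of entities who wake up with a tentacle, per plausibility
# class: (number of entities) x (prior of hypothesis) x (likelihood of tentacle).

def expected_tentacles(n_entities, prior, likelihood):
    """Expected count of entities whose hypothesis holds and who grow a tentacle."""
    return n_entities * prior * likelihood

# Class X: one million entities, prior 1/1000, likelihood 1/100.
x_count = expected_tentacles(1_000_000, 1 / 1000, 1 / 100)  # about 10
# Class Y: one million entities, prior 1/100, likelihood 1/10.
y_count = expected_tentacles(1_000_000, 1 / 100, 1 / 10)    # about 1000
```

  So a tentacled observer with well-calibrated probabilities is a hundred times more likely to belong to class Y than to class X, matching the counts in the paragraph above.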

  The hypothesis of being transported into a webcomic, even if it “explains” the scenario of waking up with a blue tentacle, is a poor explanation because of its low prior probability. The webcomic hypothesis doesn’t contribute to explaining the tentacle, because it doesn’t make you anticipate waking up with a tentacle.

  If we start with a quadrillion sentient minds scattered across the universe, quite a lot of entities will encounter events that are very likely, only about a mere million entities will experience events with lifetime likelihoods of a billion-to-one (as we would anticipate, surveying with infinite eyes and perfect calibration), and not a single entity will experience the impossible.

  If, somehow, you really did wake up with a tentacle, it would likely be because of something much more probable than “being transported into a webcomic,” some perfectly normal reason to wake up with a tentacle which you just didn’t see coming. A reason like what? I don’t know. Nothing. I don’t anticipate waking up with a tentacle, so I can’t give any good explanation for it. Why should I bother crafting excuses that I don’t expect to use? If I were worried I might someday need a clever excuse for waking up with a tentacle, the reason I was nervous about the possibility would be my explanation.

  Reality dishes out experiences using probability, not plausibility. If you find out that your laptop doesn’t obey Conservation of Momentum, then reality must think that a perfectly normal thing to do to you. How could violating Conservation of Momentum possibly be perfectly normal? I anticipate that question has no answer and will never need answering. Similarly, people do not wake up with tentacles, so apparently it is not perfectly normal.

  * * *

  There is a shattering truth, so surprising and terrifying that people resist the implications with all their strength. Yet there are a lonely few with the courage to accept this satori. Here is wisdom, if you would be wise:

  Since the beginning

  Not one unusual thing

  Has ever happened.

  Alas for those who turn their eyes from zebras and dream of dragons! If we cannot learn to take joy in the merely real, our lives shall be empty indeed.

  *

  1. Edwin T. Jaynes, Probability Theory: The Logic of Science, ed. George Larry Bretthorst (New York: Cambridge University Press, 2003), doi:10.2277/0521592712.

  2. Feynman, Leighton, and Sands, The Feynman Lectures on Physics.

  3. Readers with calculus may verify that in the simpler case of a light that has only two colors, with p being the bet on the first color and f the frequency of the first color, the expected payoff f × (1 - (1 - p)2) + (1 - f) × (1 - p2), with p variable and f constant, has its global maximum when we set p = f.
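  Readers without calculus can verify the same claim numerically; a small Python sketch (the choice f = 0.7 is an arbitrary illustration):

```python
# Expected payoff for betting p on the first color when it appears with
# frequency f, per the formula in note 3: f*(1-(1-p)^2) + (1-f)*(1-p^2).

def payoff(p, f):
    return f * (1 - (1 - p) ** 2) + (1 - f) * (1 - p ** 2)

f = 0.7  # arbitrary frequency of the first color
# Grid-search over bets p and confirm the maximum lands at p = f.
best_p = max((i / 10_000 for i in range(10_001)), key=lambda p: payoff(p, f))
assert abs(best_p - f) < 1e-3  # the best bet matches the frequency
```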

  4. Don’t remember how to read P(A|B)? See An Intuitive Explanation of Bayesian Reasoning.

  5. J. Frank Yates et al., “Probability Judgment Across Cultures,” in Gilovich, Griffin, and Kahneman, Heuristics and Biases, 271–291.

  6. Karl R. Popper, The Logic of Scientific Discovery (New York: Basic Books, 1959).

  7. Jaynes, Probability Theory.

  8. Imagination Engines, Inc., “The Imagination Engine™; or Imagitron™,” 2011, http://www.imagination-engines.com/ie.htm.

  9. Friedrich Spee, Cautio Criminalis; or, A Book on Witch Trials, ed. and trans. Marcus Hellyer, Studies in Early Modern German History (1631; Charlottesville: University of Virginia Press, 2003).

  10. Quoted in Dave Robinson and Judy Groves, Philosophy for Beginners, 1st ed. (Cambridge: Icon Books, 1998).

  11. TalkOrigins Foundation, “Frequently Asked Questions about Creationism and Evolution,” http://www.talkorigins.org/origins/faqs-qa.html.

  12. Daniel C. Dennett, Darwin’s Dangerous Idea: Evolution and the Meanings of Life (Simon & Schuster, 1995).

  13. Quoted in Lyle Zapato, “Lord Kelvin Quotations,” 2008, http://zapatopi.net/kelvin/quotes/.

  14. Charles Darwin, On the Origin of Species by Means of Natural Selection; or, The Preservation of Favoured Races in the Struggle for Life, 1st ed. (London: John Murray, 1859), http://darwin-online.org.uk/content/frameset?viewtype=text&itemID=F373&pageseq=1; Charles Darwin, The Descent of Man, and Selection in Relation to Sex, 2nd ed. (London: John Murray, 1874), http://darwin-online.org.uk/content/frameset?itemID=F944&viewtype=text&pageseq=1.

  15. Williams, Adaptation and Natural Selection.

  16. Carl Sagan, The Demon-Haunted World: Science as a Candle in the Dark, 1st ed. (New York: Random House, 1995).

  17. Kevin Brown, Reflections On Relativity (Raleigh, NC: printed by author, 2011), 405–414, http://www.mathpages.com/rr/rrtoc.htm.

  18. Ibid., 405–414.

  19. Stephen Thornton, “Karl Popper,” in The Stanford Encyclopedia of Philosophy, Winter 2002, ed. Edward N. Zalta (Stanford University), http://plato.stanford.edu/archives/win2002/entries/popper/.

  20. John Baez, “The Crackpot Index,” 1998, http://math.ucr.edu/home/baez/crackpot.html.

  Book V

  Mere Goodness

  Ends: An Introduction

  U. Fake Preferences

  257. Not for the Sake of Happiness (Alone)

  258. Fake Selfishness

  259. Fake Morality

  260. Fake Utility Functions

  261. Detached Lever Fallacy

  262. Dreams of AI Design

  263. The Design Space of Minds-in-General

  V. Value Theory

  264. Where Recursive Justification Hits Bottom

  265. My Kind of Reflection

  266. No Universally Compelling Arguments

  267. Created Already in Motion

  268. Sorting Pebbles into Correct Heaps

  269. 2-Place and 1-Place Words

  270. What Would You Do Without Morality?

  271. Changing Your Metaethics

  272. Could Anything Be Right?

  273. Morality as Fixed Computation

  274. Magical Categories

  275. The True Prisoner’s Dilemma

  276. Sympathetic Minds

  277. High Challenge

  278. Serious Stories

  279. Value is Fragile

  280. The Gift We Give to Tomorrow

  W. Quantified Humanism

  281. Scope Insensitivity

  282. One Life Against the World

  283. The Allais Paradox

  284. Zut Allais!

  285. Feeling Moral

  286. The “Intuitions” Behind “Utilitarianism”

  287. Ends Don’t Justify Means (Among Humans)

  288. Ethical Injunctions

  289. Something to Protect

  290. When (Not) to Use Probabilities

  291. Newcomb’s Problem and Regret of Rationality

  Interlude: The Twelve Virtues of Rationality

  Ends: An Introduction

  by Rob Bensinger

  Value theory is the study of what people care about. It’s the study of our goals, our tastes, our pleasures and pains, our fears and our ambitions.

  That includes conventional morality. Value theory subsumes things we wish we cared about, or would care about if we were wiser and better people—not just things we already do care about.

  Value theory also subsumes mundane, everyday values: art, food, sex, friendship, and everything else that gives life its affective valence. Going to the movies with your friend Sam can be something you value even if it’s not a moral value.

  We find it useful to reflect upon and debate our values because how we act is not always how we wish we’d act. Our preferences can conflict with each other. We can desire to have a different set of desires. We can lack the will, the attention, or the insight needed to act the way we’d like to.

  Humans do care about their actions’ consequences, but not consistently enough to formally qualify as agents with utility functions. That humans don’t act the way they wish they would is what we mean when we say “humans aren’t instrumentally rational.”

  Theory and Practice

  Adding to the difficulty, there exists a gulf between how we think we wish we’d act, and how we actually wish we’d act.

  Philosophers disagree wildly about what we want—as do psychologists, and as do politicians—and about what we ought to want. They disagree even about what it means to “ought” to want something. The history of moral theory, and the history of human efforts at coordination, is piled high with the corpses of failed Guiding Principles to True Ultimate No-Really-This-Time-I-Mean-It Normativity.

  If you’re trying to come up with a reliable and pragmatically useful specification of your goals—not just for winning philosophy debates, but (say) for designing safe autonomous adaptive AI, or for building functional institutions and organizations, or for making it easier to decide which charity to donate to, or for figuring out what virtues you should be cultivating—humanity’s track record with value theory does not bode well for you.

  Mere Goodness collects three sequences of blog posts on human value: “Fake Preferences” (on failed attempts at theories of value), “Value Theory” (on obstacles to developing a new theory, and some intuitively desirable features of such a theory), and “Quantified Humanism” (on the tricky question of how we should apply such theories to our ordinary moral intuitions and decision-making).

  The last of these topics is the most important. The cash value of a normative theory is how well it translates into normative practice. Acquiring a deeper and fuller understanding of your values should make you better at actually fulfilling them. At a bare minimum, your theory shouldn’t get in the way of your practice. What good would it be, then, to know what’s good?

  Reconciling this art of applied ethics (and applied aesthetics, and applied economics, and applied psychology) with our best available data and theories often comes down to the question of when we should trust our snap judgments, and when we should ditch them.

  In many cases, our explicit models of what we care about are so flimsy or impractical that we’re better off trusting our vague initial impulses. In many other cases, we can do better with a more informed and systematic approach. There is no catch-all answer. We will just have to scrutinize examples and try to notice the different warning signs for “sophisticated theories tend to fail here” and “naive feelings tend to fail here.”

  Journey and Destination

  A recurring theme in the pages to come will be the question: Where shall we go? What outcomes are actually valuable?

  To address this question, Yudkowsky coined the term “fun theory.” Fun theory is the attempt to figure out what our ideal vision of the future would look like—not just the system of government or moral code we’d ideally live under, but the kinds of adventures we’d ideally go on, the kinds of music we’d ideally compose, and everything else we ultimately want out of life.

  Stretched into the future, questions of fun theory intersect with questions of transhumanism, the view that we can radically improve the human condition if we make enough scientific and social progress.1 Transhumanism occasions a number of debates in moral philosophy, such as whether the best long-term outcomes for sentient life would be based on hedonism (the pursuit of pleasure) or on more complex notions of eudaimonia (general well-being). Other futurist ideas discussed at various points in Rationality: From AI to Zombies include cryonics (storing your body in a frozen state after death, in case future medical technology finds a way to revive you), mind uploading (implementing human minds in synthetic hardware), and large-scale space colonization.

  Perhaps surprisingly, fun theory is one of the more neglected applications of value theory. Utopia-planning has become rather passé—partly because it smacks of naïveté, and partly because we’re empirically terrible at translating utopias into realities. Even the word utopia reflects this cynicism; it is derived from the Greek for “non-place.”

  Yet if we give up on the quest for a true, feasible utopia (or eutopia, “good place”), it’s not obvious that the cumulative effect of our short-term pursuit of goals will be a future we find valuable over the long term. Value is not an inevitable feature of the world. Creating it takes work. Preserving it takes work.

  This invites a second question: How shall we get there? What is the relationship between good ends and good means?

  When we play a game, we want to enjoy the process. We don’t generally want to just skip ahead to being declared the winner. Sometimes, the journey matters more than the destination. Sometimes, the journey is all that matters.

  Yet there are other cases where the reverse is true. Sometimes the end-state is just too important for “the journey” to factor into our decisions. If you’re trying to save a family member’s life, it’s not necessarily a bad thing to get some enjoyment out of the process; but if you can increase your odds of success in a big way by picking a less enjoyable strategy . . .

  In many cases, our values are concentrated in the outcomes of our actions, and in our future. We care about the way the world will end up looking—especially those parts of the world that can love and hurt and want.

  How do detached, abstract theories stack up against vivid, affect-laden feelings in those cases? More generally: What is the moral relationship between actions and consequences?

  Those are hard questions, but perhaps we can at least make progress on determining what we mean by them. What are we building into our concept of what’s “valuable” at the very start of our inquiry?

  *

  1. One example of a transhumanist argument is: “We could feasibly abolish aging and disease within a few decades or centuries. This would effectively end death by natural causes, putting us in the same position as organisms with negligible senescence—lobsters, Aldabra giant tortoises, etc. Therefore we should invest in disease prevention and anti-aging technologies.” This idea qualifies as transhumanist because eliminating the leading causes of injury and death would drastically change human life.

  Bostrom and Savulescu survey arguments for and against radical human enhancement, e.g., Sandel’s objection that tampering with our biology too much would make life feel like less of a “gift.”2,3 Bostrom’s “History of Transhumanist Thought” provides context for the debate.4

  2. Nick Bostrom, “A History of Transhumanist Thought,” Journal of Evolution and Technology 14, no. 1 (2005): 1–25, http://www.nickbostrom.com/papers/history.pdf.

  3. Michael Sandel, “What’s Wrong With Enhancement,” background material for the President’s Council on Bioethics (2002).

  4. Nick Bostrom and Julian Savulescu, “Human Enhancement Ethics: The State of the Debate,” in Human Enhancement, ed. Nick Bostrom and Julian Savulescu (2009).

  Part U

  Fake Preferences
  257

  Not for the Sake of Happiness (Alone)

  When I met the futurist Greg Stock some years ago, he argued that the joy of scientific discovery would soon be replaced by pills that could simulate the joy of scientific discovery. I approached him after his talk and said, “I agree that such pills are probably possible, but I wouldn’t voluntarily take them.”

  And Stock said, “But they’ll be so much better that the real thing won’t be able to compete. It will just be way more fun for you to take the pills than to do all the actual scientific work.”

  And I said, “I agree that’s possible, so I’ll make sure never to take them.”

  Stock seemed genuinely surprised by my attitude, which genuinely surprised me. One often sees ethicists arguing as if all human desires are reducible, in principle, to the desire for ourselves and others to be happy. (In particular, Sam Harris does this in The End of Faith, which I just finished perusing—though Harris’s reduction is more of a drive-by shooting than a major topic of discussion.)1

  This isn’t the same as arguing whether all happinesses can be measured on a common utility scale—different happinesses might occupy different scales, or be otherwise non-convertible. And it’s not the same as arguing that it’s theoretically impossible to value anything other than your own psychological states, because it’s still permissible to care whether other people are happy.

  The question, rather, is whether we should care about the things that make us happy, apart from any happiness they bring.

  We can easily list many cases of moralists going astray by caring about things besides happiness. The various states and countries that still outlaw oral sex make a good example; these legislators would have been better off if they’d said, “Hey, whatever turns you on.” But this doesn’t show that all values are reducible to happiness; it just argues that in this particular case it was an ethical mistake to focus on anything else.

 
