Rationality: From AI to Zombies


by Eliezer Yudkowsky


  After she said “I believe people are nicer than they are,” I asked, “So, are you consistently surprised when people undershoot your expectations?” There was a long silence, and then, slowly: “Well . . . am I surprised when people . . . undershoot my expectations?”

  I didn’t understand this pause at the time. I’d intended it to suggest that if she was constantly disappointed by reality, then this was a downside of believing falsely. But she seemed, instead, to be taken aback at the implications of not being surprised.

  I now realize that the whole essence of her philosophy was her belief that she had deceived herself, and the possibility that her estimates of other people were actually accurate, threatened the Dark Side Epistemology that she had built around beliefs such as “I benefit from believing people are nicer than they actually are.”

  She has taken the old idol off its throne, and replaced it with an explicit worship of the Dark Side Epistemology that was once invented to defend the idol; she worships her own attempt at self-deception. The attempt failed, but she is honestly unaware of this.

  And so humanity’s token guardians of sanity (motto: “pooping your deranged little party since Epicurus”) must now fight the active worship of self-deception—the worship of the supposed benefits of faith, in place of God.

  This actually explains a fact about myself that I didn’t really understand earlier—the reason why I’m annoyed when people talk as if self-deception is easy, and why I write entire essays arguing that making a deliberate choice to believe the sky is green is harder to get away with than people seem to think.

  It’s because—while you can’t just choose to believe the sky is green—if you don’t realize this fact, then you actually can fool yourself into believing that you’ve successfully deceived yourself.

  And since you then sincerely expect to receive the benefits that you think come from self-deception, you get the same sort of placebo benefit that would actually come from a successful self-deception.

  So by going around explaining how hard self-deception is, I’m actually taking direct aim at the placebo benefits that people get from believing that they’ve deceived themselves, and targeting the new sort of religion that worships only the worship of God.

  Will this battle, I wonder, generate a new list of reasons why, not belief, but belief in belief, is itself a good thing? Why people derive great benefits from worshipping their worship? Will we have to do this over again with belief in belief in belief and worship of worship of worship? Or will intelligent theists finally just give up on that line of argument?

  I wish I could believe that no one could possibly believe in belief in belief in belief, but the Zombie World argument in philosophy has gotten even more tangled than this and its proponents still haven’t abandoned it.

  *


  85

  Moore’s Paradox

  Moore’s Paradox is the standard term for saying “It’s raining outside but I don’t believe that it is.” Hat tip to painquale on MetaFilter.

  I think I understand Moore’s Paradox a bit better now, after reading some of the comments on Less Wrong. Jimrandomh suggests:

  Many people cannot distinguish between levels of indirection. To them, “I believe X” and “X” are the same thing, and therefore, reasons why it is beneficial to believe X are also reasons why X is true.

  I don’t think this is correct—relatively young children can understand the concept of having a false belief, which requires separate mental buckets for the map and the territory. But it points in the direction of a similar idea:

  Many people may not consciously distinguish between believing something and endorsing it.

  After all—“I believe in democracy” means, colloquially, that you endorse the concept of democracy, not that you believe democracy exists. The word “belief,” then, has more than one meaning. We could be looking at a confused word that causes confused thinking (or maybe it just reflects pre-existing confusion).

  So: in the original example, “I believe people are nicer than they are,” she came up with some reasons why it would be good to believe people are nice—health benefits and such—and since she now had some warm affect on “believing people are nice,” she introspected on this warm affect and concluded, “I believe people are nice.” That is, she mistook the positive affect attached to the quoted belief, as signaling her belief in the proposition. At the same time, the world itself seemed like people weren’t so nice. So she said, “I believe people are nicer than they are.”

  And that verges on being an honest mistake—sort of—since people are not taught explicitly how to know when they believe something. As in the parable of the dragon in the garage: the one who says “There is a dragon in my garage—but it’s invisible” does not recognize their anticipation of seeing no dragon as indicating that they possess an (accurate) model with no dragon in it.

  It’s not as if people are trained to recognize when they believe something. It’s not like they’re ever taught in high school: “What it feels like to actually believe something—to have that statement in your belief pool—is that it just seems like the way the world is. You should recognize this feeling, which is actual (unquoted) belief, and distinguish it from having good feelings about a belief that you recognize as a belief (which means that it’s in quote marks).”

  This goes a long way toward making this real-life case of Moore’s Paradox seem less alien, and providing another mechanism whereby people can be simultaneously right and wrong.

  Likewise Kurige, who wrote:

  I believe that there is a God—and that He has instilled a sense of right and wrong in us by which we are able to evaluate the world around us. I also believe a sense of morality has been evolutionarily programmed into us—a sense of morality that is most likely a result of the formation of meta-political coalitions in Bonobo communities a very, very long time ago. These two beliefs are not contradictory, but the complexity lies in reconciling the two.

  I suspect, Kurige, you have decided that you have reasons to endorse the quoted belief that God has instilled a sense of right and wrong in us. And also that you have reasons to endorse the verdict of science. They both seem like good communities to join, right? There are benefits to both sets of beliefs? You introspect and find that you feel good about both beliefs?

  But you did not say:

  “God instilled a sense of right and wrong in us, and also a sense of morality has been evolutionarily programmed into us. The two states of reality are not inconsistent, but the complexity lies in reconciling the two.”

  If you’re reading this, Kurige, you should very quickly say the above out loud, so you can notice that it seems at least slightly harder to swallow—notice the subjective difference—before you go to the trouble of rerationalizing.

  This is the subjective difference between having reasons to endorse two different beliefs, and your mental model of a single world, a single way-things-are.

  *

  86

  Don’t Believe You’ll Self-Deceive

  I don’t mean to seem like I’m picking on Kurige, but I think you have to expect a certain amount of questioning if you show up on Less Wrong and say:

  One thing I’ve come to realize that helps to explain the disparity I feel when I talk with most other Christians is the fact that somewhere along the way my world-view took a major shift away from blind faith and landed somewhere in the vicinity of Orwellian double-think.

  “If you know it’s double-think . . .

  . . . how can you still believe it?” I helplessly want to say.

  Or:

  I chose to believe in the existence of God—deliberately and consciously. This decision, however, has absolutely zero effect on the actual existence of God.

  If you know your belief isn’t correlated to reality, how can you still believe it?

  Shouldn’t the gut-level realization, “Oh, wait, the sky really isn’t green” follow from the realization “My map that says ‘the sky is green’ has no reason to be correlated with the territory”?

  Well . . . apparently not.

  One part of this puzzle may be my explanation of Moore’s Paradox (“It’s raining, but I don’t believe it is”)—that people introspectively mistake positive affect attached to a quoted belief, for actual credulity.

  But another part of it may just be that—contrary to the indignation I initially wanted to put forward—it’s actually quite easy not to make the jump from “The map that reflects the territory would say ‘X’” to actually believing “X.” It takes some work to explain the ideas of minds as map-territory correspondence builders, and even then, it may take more work to get the implications on a gut level.

  I realize now that when I wrote “You cannot make yourself believe the sky is green by an act of will,” I wasn’t just a dispassionate reporter of the existing facts. I was also trying to instill a self-fulfilling prophecy.

  It may be wise to go around deliberately repeating “I can’t get away with double-thinking! Deep down, I’ll know it’s not true! If I know my map has no reason to be correlated with the territory, that means I don’t believe it!”

  Because that way—if you’re ever tempted to try—the thoughts “But I know this isn’t really true!” and “I can’t fool myself!” will always rise readily to mind; and that way, you will indeed be less likely to fool yourself successfully. You’re more likely to get, on a gut level, that telling yourself X doesn’t make X true: and therefore, really truly not-X.

  If you keep telling yourself that you can’t just deliberately choose to believe the sky is green—then you’re less likely to succeed in fooling yourself on one level or another; either in the sense of really believing it, or of falling into Moore’s Paradox, belief in belief, or belief in self-deception.

  If you keep telling yourself that deep down you’ll know—

  If you keep telling yourself that you’d just look at your elaborately constructed false map, and just know that it was a false map without any expected correlation to the territory, and therefore, despite all its elaborate construction, you wouldn’t be able to invest any credulity in it—

  If you keep telling yourself that reflective consistency will take over and make you stop believing on the object level, once you come to the meta-level realization that the map is not reflecting—

  Then when push comes to shove—you may, indeed, fail.

  When it comes to deliberate self-deception, you must believe in your own inability!

  Tell yourself the effort is doomed—and it will be!

  Is that the power of positive thinking, or the power of negative thinking? Either way, it seems like a wise precaution.

  *

  Part I

  Seeing with Fresh Eyes

  87

  Anchoring and Adjustment

  Suppose I spin a Wheel of Fortune device as you watch, and it comes up pointing to 65. Then I ask: Do you think the percentage of African countries in the UN is above or below this number? What do you think is the percentage of African countries in the UN? Take a moment to consider these two questions yourself, if you like, and please don’t Google.

  Also, try to guess, within five seconds, the value of the following arithmetical expression. Five seconds. Ready? Set . . . Go!

  1 × 2 × 3 × 4 × 5 × 6 × 7 × 8

  Tversky and Kahneman recorded the estimates of subjects who saw the Wheel of Fortune showing various numbers.1 The median estimate of subjects who saw the wheel show 65 was 45%; the median estimate of subjects who saw 10 was 25%.

  The current theory for this and similar experiments is that subjects take the initial, uninformative number as their starting point, or anchor; they then adjust upward or downward from that anchor until they reach an answer that “sounds plausible,” and then they stop adjusting. This typically results in under-adjustment from the anchor—more distant numbers could also be “plausible,” but one stops at the first satisfying-sounding answer.

  Similarly, students shown “1 × 2 × 3 × 4 × 5 × 6 × 7 × 8” made a median estimate of 512, while students shown “8 × 7 × 6 × 5 × 4 × 3 × 2 × 1” made a median estimate of 2,250. The motivating hypothesis was that students would try to multiply (or guess-combine) the first few factors of the product, then adjust upward. In both cases the adjustments were insufficient, relative to the true value of 40,320; but the first set of guesses were much more insufficient because they started from a lower anchor.
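
  As a rough illustration of that hypothesis—my own toy Python sketch, not part of Tversky and Kahneman’s study—suppose a hurried subject multiplies only the first four factors and then adjusts upward by some fixed, too-small amount. The “first four factors” cutoff and the adjustment multiplier are assumptions invented for the example; the point is only that any insufficient upward adjustment leaves the ascending group far below the descending group, in the same direction as the observed medians.

    # Toy sketch of anchoring-and-adjustment on the multiplication problem.
    # The "first four factors" cutoff and the adjustment multiplier are invented
    # for illustration; only the ordering of the outcomes matters.
    from math import prod

    FACTORS = [1, 2, 3, 4, 5, 6, 7, 8]

    def partial_anchor(factors, seen=4):
        """Product of the first few factors a rushed subject has time to compute."""
        return prod(factors[:seen])

    def insufficient_adjustment(anchor, multiplier=8):
        """Adjust upward from the anchor, but not nearly enough."""
        return anchor * multiplier

    true_value = prod(FACTORS)                    # 40,320
    low_anchor = partial_anchor(FACTORS)          # 1 * 2 * 3 * 4 = 24
    high_anchor = partial_anchor(FACTORS[::-1])   # 8 * 7 * 6 * 5 = 1,680

    print(true_value)                             # 40320
    print(insufficient_adjustment(low_anchor))    # 192 -- far too low
    print(insufficient_adjustment(high_anchor))   # 13440 -- still too low, but higher

  Both simulated answers fall well short of 40,320, and the ascending-order anchor falls shorter—the same ordering as the medians of 512 and 2,250 reported above.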

  Tversky and Kahneman report that offering payoffs for accuracy did not reduce the anchoring effect.

  Strack and Mussweiler asked for the year Einstein first visited the United States.2 Completely implausible anchors, such as 1215 or 1992, produced anchoring effects just as large as more plausible anchors such as 1905 or 1939.

  There are obvious applications in, say, salary negotiations, or buying a car. I won’t suggest that you exploit it, but watch out for exploiters.

  And watch yourself thinking, and try to notice when you are adjusting a figure in search of an estimate.

  Debiasing manipulations for anchoring have generally proved not very effective. I would suggest these two: First, if the initial guess sounds implausible, try to throw it away entirely and come up with a new estimate, rather than sliding from the anchor. But this in itself may not be sufficient—subjects instructed to avoid anchoring still seem to do so.3 So, second, even if you are trying the first method, try also to think of an anchor in the opposite direction—an anchor that is clearly too small or too large, instead of too large or too small—and dwell on it briefly.

  *

  1. Amos Tversky and Daniel Kahneman, “Judgment Under Uncertainty: Heuristics and Biases,” Science 185, no. 4157 (1974): 1124–1131, doi:10.1126/science.185.4157.1124.

  2. Fritz Strack and Thomas Mussweiler, “Explaining the Enigmatic Anchoring Effect: Mechanisms of Selective Accessibility,” Journal of Personality and Social Psychology 73, no. 3 (1997): 437–446.

  3. George A. Quattrone et al., “Explorations in Anchoring: The Effects of Prior Range, Anchor Extremity, and Suggestive Hints” (Unpublished manuscript, Stanford University, 1981).

  88

  Priming and Contamination

  Suppose you ask subjects to press one button if a string of letters forms a word, and another button if the string does not form a word (e.g., “banack” vs. “banner”). Then you show them the string “water.” Later, they will more quickly identify the string “drink” as a word. This is known as “cognitive priming”; this particular form would be “semantic priming” or “conceptual priming.”

  The fascinating thing about priming is that it occurs at such a low level—priming speeds up identifying letters as forming a word, which one would expect to take place before you deliberate on the word’s meaning.

  Priming also reveals the massive parallelism of spreading activation: if seeing “water” activates the word “drink,” it probably also activates “river,” or “cup,” or “splash” . . . and this activation spreads, from the semantic linkage of concepts, all the way back to recognizing strings of letters.
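
  As a rough sketch of what that might look like—my own toy model in Python, with invented association strengths and reaction times, not anything measured in these experiments—imagine a tiny graph in which a prime boosts the activation of its neighbors, and boosted words are then recognized slightly faster in a lexical-decision step:

    # Toy model of spreading activation and priming (illustrative only).
    # The association graph, decay rate, and timings are invented numbers.
    ASSOCIATIONS = {
        "water": {"drink": 0.8, "river": 0.7, "cup": 0.5, "splash": 0.6},
        "drink": {"cup": 0.6, "thirst": 0.7},
    }

    def spread(prime, decay=0.5, depth=2):
        """Activation levels reachable from the prime, weakening with each hop."""
        activation = {prime: 1.0}
        frontier = [prime]
        for _ in range(depth):
            next_frontier = []
            for word in frontier:
                for neighbor, strength in ASSOCIATIONS.get(word, {}).items():
                    boost = activation[word] * strength * decay
                    if boost > activation.get(neighbor, 0.0):
                        activation[neighbor] = boost
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return activation

    def recognition_time_ms(word, activation, base=600, speedup=200):
        """Pretend lexical-decision time: activated words are identified faster."""
        return base - speedup * activation.get(word, 0.0)

    acts = spread("water")
    print(recognition_time_ms("drink", acts))    # 520.0 -- primed, faster
    print(recognition_time_ms("banner", acts))   # 600.0 -- unprimed baseline

  The only point of the sketch is the shape of the mechanism: activation leaks outward along semantic links, so a related word starts the recognition race with a head start.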

  Priming is subconscious and unstoppable, an artifact of the human neural architecture. Trying to stop yourself from priming is like trying to stop the spreading activation of your own neural circuits. Try to say aloud the color—not the meaning, but the color—of the following letter-string:

  GREEN

  In Mussweiler and Strack’s experiment, subjects were asked an anchoring question: “Is the annual mean temperature in Germany higher or lower than 5°C / 20°C?”1 Afterward, on a word-identification task, subjects presented with the 5°C anchor were faster on identifying words like “cold” and “snow,” while subjects with the high anchor were faster to identify “hot” and “sun.” This shows a non-adjustment mechanism for anchoring: priming compatible thoughts and memories.

  The more general result is that completely uninformative, known false, or totally irrelevant “information” can influence estimates and decisions. In the field of heuristics and biases, this more general phenomenon is known as contamination.2

  Early research in heuristics and biases discovered anchoring effects, such as subjects giving lower (higher) estimates of the percentage of UN countries found within Africa, depending on whether they were first asked if the percentage was more or less than 10 (65). This effect was originally attributed to subjects adjusting from the anchor as a starting point, stopping as soon as they reached a plausible value, and under-adjusting because they were stopping at one end of a confidence interval.3

  Tversky and Kahneman’s early hypothesis still appears to be the correct explanation in some circumstances, notably when subjects generate the initial estimate themselves.4 But modern research seems to show that most anchoring is actually due to contamination, not sliding adjustment. (Hat tip to Unnamed for reminding me of this—I’d read the Epley and Gilovich paper years ago, as a chapter in Heuristics and Biases, but forgotten it.)

  Your grocery store probably has annoying signs saying “Limit 12 per customer” or “5 for $10.” Are these signs effective at getting customers to buy in larger quantities? You probably think you’re not influenced. But someone must be, because these signs have been shown to work, which is why stores keep putting them up.5

  Yet the most fearsome aspect of contamination is that it serves as yet another of the thousand faces of confirmation bias. Once an idea gets into your head, it primes information compatible with it—and thereby ensures its continued existence. Never mind the selection pressures for winning political arguments; confirmation bias is built directly into our hardware, associational networks priming compatible thoughts and memories. An unfortunate side effect of our existence as neural creatures.

 
