Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Even better: A colony is made of ants. We can successfully predict some aspects of colony behavior using models that include only individual ants, without any global colony variables, showing that we understand how those colony behaviors arise from ant behaviors.

  Another fun exercise is to replace the word “emergent” with the old word, the explanation that people had to use before emergence was invented:

  Before: Life is an emergent phenomenon.

  After: Life is a magical phenomenon.

  Before: Human intelligence is an emergent product of neurons firing.

  After: Human intelligence is a magical product of neurons firing.

  Does not each statement convey exactly the same amount of knowledge about the phenomenon’s behavior? Does not each hypothesis fit exactly the same set of outcomes?

  “Emergence” has become very popular, just as saying “magic” used to be very popular. “Emergence” has the same deep appeal to human psychology, for the same reason. “Emergence” is such a wonderfully easy explanation, and it feels good to say it; it gives you a sacred mystery to worship. Emergence is popular because it is the junk food of curiosity. You can explain anything using emergence, and so people do just that; for it feels so wonderful to explain things. Humans are still humans, even if they’ve taken a few science classes in college. Once they find a way to escape the shackles of settled science, they get up to the same shenanigans as their ancestors—dressed up in the literary genre of “science,” but humans are still humans, and human psychology is still human psychology.

  *

  37

  Say Not “Complexity”

  Once upon a time . . .

  This is a story from when I first met Marcello, with whom I would later work for a year on AI theory; but at this point I had not yet accepted him as my apprentice. I knew that he competed at the national level in mathematical and computing olympiads, which sufficed to attract my attention for a closer look; but I didn’t know yet if he could learn to think about AI.

  I had asked Marcello to say how he thought an AI might discover how to solve a Rubik’s Cube. Not in a preprogrammed way, which is trivial, but rather how the AI itself might figure out the laws of the Rubik universe and reason out how to exploit them. How would an AI invent for itself the concept of an “operator,” or “macro,” which is the key to solving the Rubik’s Cube?

  At some point in this discussion, Marcello said: “Well, I think the AI needs complexity to do X, and complexity to do Y—”

  And I said, “Don’t say ‘complexity.’”

  Marcello said, “Why not?”

  I said, “Complexity should never be a goal in itself. You may need to use a particular algorithm that adds some amount of complexity, but complexity for the sake of complexity just makes things harder.” (I was thinking of all the people whom I had heard advocating that the Internet would “wake up” and become an AI when it became “sufficiently complex.”)

  And Marcello said, “But there’s got to be some amount of complexity that does it.”

  I closed my eyes briefly, and tried to think of how to explain it all in words. To me, saying “complexity” simply felt like the wrong move in the AI dance. No one can think fast enough to deliberate, in words, about each sentence of their stream of consciousness; for that would require an infinite recursion. We think in words, but our stream of consciousness is steered below the level of words, by the trained-in remnants of past insights and harsh experience . . .

  I said, “Did you read A Technical Explanation of Technical Explanation?”

  “Yes,” said Marcello.

  “Okay,” I said. “Saying ‘complexity’ doesn’t concentrate your probability mass.”

  “Oh,” Marcello said, “like ‘emergence.’ Huh. So . . . now I’ve got to think about how X might actually happen . . .”

  That was when I thought to myself, “Maybe this one is teachable.”

  Complexity is not a useless concept. It has mathematical definitions attached to it, such as Kolmogorov complexity and Vapnik-Chervonenkis complexity. Even on an intuitive level, complexity is often worth thinking about—you have to judge the complexity of a hypothesis and decide if it’s “too complicated” given the supporting evidence, or look at a design and try to make it simpler.

  But concepts are not useful or useless of themselves. Only usages are correct or incorrect. In the step Marcello was trying to take in the dance, he was trying to explain something for free, get something for nothing. It is an extremely common misstep, at least in my field. You can join a discussion on Artificial General Intelligence and watch people doing the same thing, left and right, over and over again—constantly skipping over things they don’t understand, without realizing that’s what they’re doing.

  In an eyeblink it happens: putting a non-controlling causal node behind something mysterious, a causal node that feels like an explanation but isn’t. The mistake takes place below the level of words. It requires no special character flaw; it is how human beings think by default, how they have thought since ancient times.

  What you must avoid is skipping over the mysterious part; you must linger at the mystery to confront it directly. There are many words that can skip over mysteries, and some of them would be legitimate in other contexts—“complexity,” for example. But the essential mistake is that skip-over, regardless of what causal node goes behind it. The skip-over is not a thought, but a microthought. You have to pay close attention to catch yourself at it. And when you train yourself to avoid skipping, it will become a matter of instinct, not verbal reasoning. You have to feel which parts of your map are still blank, and more importantly, pay attention to that feeling.

  I suspect that in academia there is a huge pressure to sweep problems under the rug so that you can present a paper with the appearance of completeness. You’ll get more kudos for a seemingly complete model that includes some “emergent phenomena,” versus an explicitly incomplete map where the label says “I got no clue how this part works” or “then a miracle occurs.” A journal may not even accept the latter paper, since who knows but that the unknown steps are really where everything interesting happens? And yes, it sometimes happens that all the non-magical parts of your map turn out to also be non-important. That’s the price you sometimes pay, for entering into terra incognita and trying to solve problems incrementally. But that makes it even more important to know when you aren’t finished yet. Mostly, people don’t dare to enter terra incognita at all, for the deadly fear of wasting their time.

  And if you’re working on a revolutionary AI startup, there is an even huger pressure to sweep problems under the rug; or you will have to admit to yourself that you don’t know how to build an AI yet, and your current life plans will come crashing down in ruins around your ears. But perhaps I am over-explaining, since skip-over happens by default in humans; if you’re looking for examples, just watch people discussing religion or philosophy or spirituality or any science in which they were not professionally trained.

  Marcello and I developed a convention in our AI work: when we ran into something we didn’t understand, which was often, we would say “magic”—as in, “X magically does Y”—to remind ourselves that here was an unsolved problem, a gap in our understanding. It is far better to say “magic,” than “complexity” or “emergence”; the latter words create an illusion of understanding. Wiser to say “magic,” and leave yourself a placeholder, a reminder of work you will have to do later.

  *

  38

  Positive Bias: Look into the Dark

  I am teaching a class, and I write upon the blackboard three numbers: 2-4-6. “I am thinking of a rule,” I say, “which governs sequences of three numbers. The sequence 2-4-6, as it so happens, obeys this rule. Each of you will find, on your desk, a pile of index cards. Write down a sequence of three numbers on a card, and I’ll mark it ‘Yes’ for fits the rule, or ‘No’ for not fitting the rule. Then you can write down another set of three numbers and ask whether it fits again, and so on. When you’re confident that you know the rule, write down the rule on a card. You can test as many triplets as you like.”

  Here’s the record of one student’s guesses:

  4-6-2 No

  4-6-8 Yes

  10-12-14 Yes

  At this point the student wrote down their guess at the rule. What do you think the rule is? Would you have wanted to test another triplet, and if so, what would it be? Take a moment to think before continuing.

  The challenge above is based on a classic experiment due to Peter Wason, the 2-4-6 task. Although subjects given this task typically expressed high confidence in their guesses, only 21% of the subjects successfully guessed the experimenter’s real rule, and replications since then have continued to show success rates of around 20%.1

  The study was called “On the failure to eliminate hypotheses in a conceptual task.” Subjects who attempt the 2-4-6 task usually try to generate positive examples, rather than negative examples—they apply the hypothetical rule to generate a representative instance, and see if it is labeled “Yes.”

  Thus, someone who forms the hypothesis “numbers increasing by two” will test the triplet 8-10-12, hear that it fits, and confidently announce the rule. Someone who forms the hypothesis X-2X-3X will test the triplet 3-6-9, discover that it fits, and then announce that rule.

  In every case the actual rule is the same: the three numbers must be in ascending order.

  But to discover this, you would have to generate triplets that shouldn’t fit, such as 20-23-26, and see if they are labeled “No.” Which people tend not to do, in this experiment. In some cases, subjects devise, “test,” and announce rules far more complicated than the actual answer.
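
  To make the trap concrete, here is a minimal sketch of my own (not from Wason’s paper; the function names are hypothetical) showing why positive-only testing fails: every triplet generated from the narrower hypothesis also satisfies the true rule, so only a triplet the hypothesis predicts should fail, such as 20-23-26, carries any information.

    # Illustrative only: the true rule and a narrower hypothesis agree on every
    # triplet the hypothesis generates, so positive tests cannot tell them apart.
    def true_rule(t):
        a, b, c = t
        return a < b < c                     # ascending order

    def hypothesis(t):
        a, b, c = t
        return b == a + 2 and c == b + 2     # "numbers increasing by two"

    for t in [(4, 6, 8), (8, 10, 12), (10, 12, 14)]:
        print(t, hypothesis(t), true_rule(t))    # both True: no information gained

    t = (20, 23, 26)                         # a triplet the hypothesis says should fail
    print(t, hypothesis(t), true_rule(t))    # False vs. True: the hypothesis is refuted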

  This cognitive phenomenon is usually lumped in with “confirmation bias.” However, it seems to me that the phenomenon of trying to test positive rather than negative examples, ought to be distinguished from the phenomenon of trying to preserve the belief you started with. “Positive bias” is sometimes used as a synonym for “confirmation bias,” and fits this particular flaw much better.

  It once seemed that phlogiston theory could explain a flame going out in an enclosed box (the air became saturated with phlogiston and no more could be released), but phlogiston theory could just as well have explained the flame not going out. To notice this, you have to search for negative examples instead of positive examples, look into zero instead of one; which goes against the grain of what experiment has shown to be human instinct.

  For by instinct, we human beings only live in half the world.

  One may be lectured on positive bias for days, and yet overlook it in-the-moment. Positive bias is not something we do as a matter of logic, or even as a matter of emotional attachment. The 2-4-6 task is “cold,” logical, not affectively “hot.” And yet the mistake is sub-verbal, on the level of imagery, of instinctive reactions. Because the problem doesn’t arise from following a deliberate rule that says “Only think about positive examples,” it can’t be solved just by knowing verbally that “We ought to think about both positive and negative examples.” Which example automatically pops into your head? You have to learn, wordlessly, to zag instead of zig. You have to learn to flinch toward the zero, instead of away from it.

  I have been writing for quite some time now on the notion that the strength of a hypothesis is what it can’t explain, not what it can—if you are equally good at explaining any outcome, you have zero knowledge. So to spot an explanation that isn’t helpful, it’s not enough to think of what it does explain very well—you also have to search for results it couldn’t explain, and this is the true strength of the theory.

  So I said all this, and then I challenged the usefulness of “emergence” as a concept. One commenter cited superconductivity and ferromagnetism as examples of emergence. I replied that non-superconductivity and non-ferromagnetism were also examples of emergence, which was the problem. But far be it from me to criticize the commenter! Despite having read extensively on “confirmation bias,” I didn’t spot the “gotcha” in the 2-4-6 task the first time I read about it. It’s a subverbal blink-reaction that has to be retrained. I’m still working on it myself.

  So much of a rationalist’s skill is below the level of words. It makes for challenging work in trying to convey the Art through words. People will agree with you, but then, in the next sentence, do something subdeliberative that goes in the opposite direction. Not that I’m complaining! A major reason I’m writing this is to observe what my words haven’t conveyed.

  Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to not see? Did you look toward light or darkness?

  *

  1. Peter Cathcart Wason, “On the Failure to Eliminate Hypotheses in a Conceptual Task,” Quarterly Journal of Experimental Psychology 12, no. 3 (1960): 129–140, doi:10.1080/17470216008416717.

  39

  Lawful Uncertainty

  In Rational Choice in an Uncertain World, Robyn Dawes describes an experiment by Tversky:1,2

  Many psychological experiments were conducted in the late 1950s and early 1960s in which subjects were asked to predict the outcome of an event that had a random component but yet had base-rate predictability—for example, subjects were asked to predict whether the next card the experimenter turned over would be red or blue in a context in which 70% of the cards were blue, but in which the sequence of red and blue cards was totally random.

  In such a situation, the strategy that will yield the highest proportion of success is to predict the more common event. For example, if 70% of the cards are blue, then predicting blue on every trial yields a 70% success rate.

  What subjects tended to do instead, however, was match probabilities—that is, predict the more probable event with the relative frequency with which it occurred. For example, subjects tended to predict 70% of the time that the blue card would occur and 30% of the time that the red card would occur. Such a strategy yields a 58% success rate, because the subjects are correct 70% of the time when the blue card occurs (which happens with probability .70) and 30% of the time when the red card occurs (which happens with probability .30); (.70 × .70) + (.30 × .30) = .58.

  In fact, subjects predict the more frequent event with a slightly higher probability than that with which it occurs, but do not come close to predicting its occurrence 100% of the time, even when they are paid for the accuracy of their predictions . . . For example, subjects who were paid a nickel for each correct prediction over a thousand trials . . . predicted [the more common event] 76% of the time.
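
  The arithmetic in the passage is easy to verify; what follows is a small sketch of my own (the names are hypothetical) comparing the expected accuracy of probability matching with simply predicting the majority color on every trial.

    # Expected accuracy when 70% of cards are blue and the subject's predictions
    # are independent of the (random) card sequence.
    P_BLUE = 0.70

    def expected_accuracy(q_blue):
        """q_blue: fraction of trials on which the subject predicts blue."""
        return P_BLUE * q_blue + (1 - P_BLUE) * (1 - q_blue)

    print(expected_accuracy(1.00))   # always predict blue: 0.70
    print(expected_accuracy(0.70))   # probability matching: 0.58
    print(expected_accuracy(0.76))   # the paid subjects' 76%: about 0.60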

  Do not think that this experiment is about a minor flaw in gambling strategies. It compactly illustrates the most important idea in all of rationality.

  Subjects just keep guessing red, as if they think they have some way of predicting the random sequence. Of this experiment Dawes goes on to say, “Despite feedback through a thousand trials, subjects cannot bring themselves to believe that the situation is one in which they cannot predict.”

  But the error must go deeper than that. Even if subjects think they’ve come up with a hypothesis, they don’t have to actually bet on that prediction in order to test their hypothesis. They can say, “Now if this hypothesis is correct, the next card will be red”—and then just bet on blue. They can pick blue each time, accumulating as many nickels as they can, while mentally noting their private guesses for any patterns they thought they spotted. If their predictions come out right, then they can switch to the newly discovered sequence.

  I wouldn’t fault a subject for continuing to invent hypotheses—how could they know the sequence is truly beyond their ability to predict? But I would fault a subject for betting on the guesses, when this wasn’t necessary to gather information, and literally hundreds of earlier guesses had been disconfirmed.

  Can even a human be that overconfident?

  I would suspect that something simpler is going on—that the all-blue strategy just didn’t occur to the subjects.

  People see a mix of mostly blue cards with some red, and suppose that the optimal betting strategy must be a mix of mostly blue cards with some red.

  It is a counterintuitive idea that, given incomplete information, the optimal betting strategy does not resemble a typical sequence of cards.

  It is a counterintuitive idea that the optimal strategy is to behave lawfully, even in an environment that has random elements.

  It seems like your behavior ought to be unpredictable, just like the environment—but no! A random key does not open a random lock just because they are “both random.”

  You don’t fight fire with fire; you fight fire with water. But this thought involves an extra step, a new concept not directly activated by the problem statement, and so it’s not the first idea that comes to mind.

  In the dilemma of the blue and red cards, our partial knowledge tells us—on each and every round—that the best bet is blue. This advice of our partial knowledge is the same on each and every round. If 30% of the time we go against our partial knowledge and bet on red instead, then we will do worse thereby—because now we’re being outright stupid, betting on what we know is the less probable outcome.

  If you bet on red every round, you would do as badly as you could possibly do; you would be 100% stupid. If you bet on red 30% of the time, faced with 30% red cards, then you’re making yourself 30% stupid.
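
  Put as a one-line calculation (my own framing, not the book’s): expected accuracy is linear in how often you bet blue, so every round spent betting red costs the same fixed amount, and no mixture beats betting blue every time.

    # accuracy(q) = 0.7*q + 0.3*(1 - q) = 0.3 + 0.4*q, strictly increasing in q,
    # so any fraction of red bets is a pure loss in expectation.
    for q in (0.0, 0.3, 0.7, 1.0):
        print(q, round(0.3 + 0.4 * q, 2))
    # 0.0 -> 0.30 (always red), 0.7 -> 0.58 (matching), 1.0 -> 0.70 (always blue)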

  When your knowledge is incomplete—meaning that the world will seem to you to have an element of randomness—randomizing your actions doesn’t solve the problem. Randomizing your actions takes you further from the target, not closer. In a world already foggy, throwing away your intelligence just makes things worse.

 
