Rationality: From AI to Zombies

by Eliezer Yudkowsky

The art is closely related to:

Pragmatism, because seeing in this way often gives you a much closer connection to anticipated experience, rather than propositional belief;

Reductionism, because seeing in this way often forces you to drop down to a lower level of organization, look at the parts instead of your eye skipping over the whole;

Hugging the query, because words often distract you from the question you really want to ask;

Avoiding cached thoughts, which will rush in using standard words, so you can block them by tabooing standard words;

The writer’s rule of “Show, don’t tell!,” which has power among rationalists;

And not losing sight of your original purpose.

How could tabooing a word help you keep your purpose?

From Lost Purposes:

As you read this, some young man or woman is sitting at a desk in a university, earnestly studying material they have no intention of ever using, and no interest in knowing for its own sake. They want a high-paying job, and the high-paying job requires a piece of paper, and the piece of paper requires a previous master’s degree, and the master’s degree requires a bachelor’s degree, and the university that grants the bachelor’s degree requires you to take a class in twelfth-century knitting patterns to graduate. So they diligently study, intending to forget it all the moment the final exam is administered, but still seriously working away, because they want that piece of paper.

Why are you going to “school”? To get an “education” ending in a “degree.” Blank out the forbidden words and all their obvious synonyms, visualize the actual details, and you’re much more likely to notice that “school” currently seems to consist of sitting next to bored teenagers listening to material you already know, that a “degree” is a piece of paper with some writing on it, and that “education” is forgetting the material as soon as you’re tested on it.

Leaky generalizations often manifest through categorizations: People who actually learn in classrooms are categorized as “getting an education,” so “getting an education” must be good; but then anyone who actually shows up at a college will also match against the concept “getting an education,” whether or not they learn.

Students who understand math will do well on tests, but if you require schools to produce good test scores, they’ll spend all their time teaching to the test. A mental category, that imperfectly matches your goal, can produce the same kind of incentive failure internally. You want to learn, so you need an “education”; and then as long as you’re getting anything that matches against the category “education,” you may not notice whether you’re learning or not. Or you’ll notice, but you won’t realize you’ve lost sight of your original purpose, because you’re “getting an education” and that’s how you mentally described your goal.

To categorize is to throw away information. If you’re told that a falling tree makes a “sound,” you don’t know what the actual sound is; you haven’t actually heard the tree falling. If a coin lands “heads,” you don’t know its radial orientation. A blue egg-shaped thing may be a “blegg,” but what if the exact egg shape varies, or the exact shade of blue? You want to use categories to throw away irrelevant information, to sift gold from dust, but often the standard categorization ends up throwing out relevant information too. And when you end up in that sort of mental trouble, the first and most obvious solution is to play Taboo.
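
One way to picture "to categorize is to throw away information" is as a many-to-one function. The sketch below is only an illustration of the coin example: the function name `categorize_coin` and the rotation values are invented here, not taken from the text.

```python
from collections import defaultdict

def categorize_coin(face_up: str, rotation_degrees: float) -> str:
    """Map a detailed coin state to its usual label, discarding the rotation."""
    return face_up  # the rotation never reaches the label

# Hypothetical detailed states: (which face is up, rotation in degrees).
states = [("heads", 0.0), ("heads", 90.0), ("heads", 217.5), ("tails", 45.0)]

buckets = defaultdict(list)
for face_up, rotation in states:
    buckets[categorize_coin(face_up, rotation)].append(rotation)

# Several distinct states share one label; the label alone cannot recover the rotation.
print({label: len(rotations) for label, rotations in buckets.items()})
```

Whether the discarded rotation was irrelevant depends on the question being asked, which is the point of the paragraph above.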

For example: “Play Taboo” is itself a leaky generalization. Hasbro’s version is not the rationalist version; they only list five additional banned words on the card, and that’s not nearly enough coverage to exclude thinking in familiar old words. What rationalists do would count as playing Taboo—it would match against the “play Taboo” concept—but not everything that counts as playing Taboo works to force original seeing. If you just think “play Taboo to force original seeing,” you’ll start thinking that anything that counts as playing Taboo must count as original seeing.

The rationalist version isn’t a game, which means that you can’t win by trying to be clever and stretching the rules. You have to play Taboo with a voluntary handicap: Stop yourself from using synonyms that aren’t on the card. You also have to stop yourself from inventing a new simple word or phrase that functions as an equivalent mental handle to the old one. You are trying to zoom in on your map, not rename the cities; dereference the pointer, not allocate a new pointer; see the events as they happen, not rewrite the cliché in a different wording.

By visualizing the problem in more detail, you can see the lost purpose: Exactly what do you do when you “play Taboo”? What purpose does each and every part serve?

If you see your activities and situation originally, you will be able to originally see your goals as well. If you can look with fresh eyes, as though for the first time, you will see yourself doing things that you would never dream of doing if they were not habits.

Purpose is lost whenever the substance (learning, knowledge, health) is displaced by the symbol (a degree, a test score, medical care). To heal a lost purpose, or a lossy categorization, you must do the reverse:

Replace the symbol with the substance; replace the signifier with the signified; replace the property with the membership test; replace the word with the meaning; replace the label with the concept; replace the summary with the details; replace the proxy question with the real question; dereference the pointer; drop into a lower level of organization; mentally simulate the process instead of naming it; zoom in on your map.

The Simple Truth was generated by an exercise of this discipline to describe “truth” on a lower level of organization, without invoking terms like “accurate,” “correct,” “represent,” “reflect,” “semantic,” “believe,” “knowledge,” “map,” or “real.” (And remember that the goal is not really to play Taboo—the word “true” appears in the text, but not to define truth. It would get a buzzer in Hasbro’s game, but we’re not actually playing that game. Ask yourself whether the document fulfilled its purpose, not whether it followed the rules.)

Bayes’s Rule itself describes “evidence” in pure math, without using words like “implies,” “means,” “supports,” “proves,” or “justifies.” Set out to define such philosophical terms, and you’ll just go in circles.
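
For reference, one standard way to write this is the odds form of Bayes's Rule. The symbols H (a hypothesis) and E (an observation) are notation chosen here for illustration, not taken from the text:

```latex
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)}
```

On this reading, E counts as evidence for H exactly when the likelihood ratio P(E | H) / P(E | ¬H) is greater than 1, and the definition never needs a word like "supports" or "proves."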

And then there’s the most important word of all to Taboo. I’ve often warned that you should be careful not to overuse it, or even avoid the concept in certain cases. Now you know the real reason why. It’s not a bad subject to think about. But your true understanding is measured by your ability to describe what you’re doing and why, without using that word or any of its synonyms.

*

169

Fallacies of Compression

“The map is not the territory,” as the saying goes. The only life-size, atomically detailed, 100% accurate map of California is California. But California has important regularities, such as the shape of its highways, that can be described using vastly less information—not to mention vastly less physical material—than it would take to describe every atom within the state borders. Hence the other saying: “The map is not the territory, but you can’t fold up the territory and put it in your glove compartment.”

A paper map of California, at a scale of 10 kilometers to 1 centimeter (a million to one), doesn’t have room to show the distinct position of two fallen leaves lying a centimeter apart on the sidewalk. Even if the map tried to show the leaves, the leaves would appear as the same point on the map; or rather the map would need a feature size of 10 nanometers, which is a finer resolution than most book printers handle, not to mention human eyes.
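
The arithmetic behind the 10-nanometer figure can be checked in a few lines. This is a minimal sketch; the names `SCALE` and `map_separation_m` are hypothetical and exist only for this example.

```python
SCALE = 1_000_000  # 10 km on the ground per 1 cm on paper

def map_separation_m(ground_separation_m: float, scale: int = SCALE) -> float:
    """Distance on paper, in metres, for a given distance on the ground."""
    return ground_separation_m / scale

# Two leaves lying 1 cm (0.01 m) apart on the sidewalk.
leaf_gap_on_map_m = map_separation_m(0.01)
leaf_gap_on_map_nm = leaf_gap_on_map_m * 1e9  # metres to nanometres

print(f"{leaf_gap_on_map_nm:.0f} nm")
```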

Reality is very large—just the part we can see is billions of lightyears across. But your map of reality is written on a few pounds of neurons, folded up to fit inside your skull. I don’t mean to be insulting, but your skull is tiny. Comparatively speaking.

Inevitably, then, certain things that are distinct in reality, will be compressed into the same point on your map.

But what this feels like from inside is not that you say, “Oh, look, I’m compressing two things into one point on my map.” What it feels like from inside is that there is just one thing, and you are seeing it.

A sufficiently young child, or a sufficiently ancient Greek philosopher, would not know that there were such things as “acoustic vibrations” or “auditory experiences.” There would just be a single thing that happened when a tree fell; a single event called “sound.”

To realize that there are two distinct events, underlying one point on your map, is an essentially scientific challenge—a big, difficult scientific challenge.

Sometimes fallacies of compression result from confusing two known things under the same label—you know about acoustic vibrations, and you know about auditory processing in brains, but you call them both “sound” and so confuse yourself. But the more dangerous fallacy of compression arises from having no idea whatsoever that two distinct entities even exist. There is just one mental folder in the filing system, labeled “sound,” and everything thought about “sound” drops into that one folder. It’s not that there are two folders with the same label; there’s just a single folder. By default, the map is compressed; why would the brain create two mental buckets where one would serve?

Or think of a mystery novel in which the detective’s critical insight is that one of the suspects has an identical twin. In the course of the detective’s ordinary work, their job is just to observe that Carol is wearing red, that she has black hair, that her sandals are leather—but all these are facts about Carol. It’s easy enough to question an individual fact, like WearsRed(Carol) or BlackHair(Carol). Maybe BlackHair(Carol) is false. Maybe Carol dyes her hair. Maybe BrownHair(Carol). But it takes a subtler detective to wonder if the Carol in WearsRed(Carol) and BlackHair(Carol)—the Carol file into which their observations drop—should be split into two files. Maybe there are two Carols, so that the Carol who wore red is not the same woman as the Carol who had black hair.

Here it is the very act of creating two different buckets that is the stroke of genius insight. ’Tis easier to question one’s facts than one’s ontology.
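
The difference between questioning a fact and questioning the filing system can be sketched in code. Everything here is an assumption made for illustration: the observation tuples, the idea of splitting the file by where Carol was seen, and both function names.

```python
from collections import defaultdict

# Hypothetical observations: (name on the file, fact, where it was observed).
observations = [
    ("Carol", "wears red", "station"),
    ("Carol", "black hair", "cafe"),
    ("Carol", "leather sandals", "station"),
]

def file_by_name(obs):
    """Default filing: every fact about 'Carol' drops into one folder."""
    folders = defaultdict(list)
    for name, fact, _place in obs:
        folders[name].append(fact)
    return dict(folders)

def file_by_name_and_place(obs):
    """The twin hypothesis: split the folder using an extra distinguishing key."""
    folders = defaultdict(list)
    for name, fact, place in obs:
        folders[(name, place)].append(fact)
    return dict(folders)

one_carol = file_by_name(observations)             # one folder, three facts
two_carols = file_by_name_and_place(observations)  # two folders
```

Doubting BlackHair(Carol) only edits a value inside an existing folder. The twin hypothesis changes the keys themselves, and nothing in the first filing scheme hints that a second key is needed.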

The map of reality contained in a human brain, unlike a paper map of California, can expand dynamically when we write down more detailed descriptions. But what this feels like from inside is not so much zooming in on a map, as fissioning an indivisible atom—taking one thing (it felt like one thing) and splitting it into two or more things.

Often this manifests in the creation of new words, like “acoustic vibrations” and “auditory experiences” instead of just “sound.” Something about creating the new name seems to allocate the new bucket. The detective is liable to start calling one of their suspects “Carol-2” or “the Other Carol” almost as soon as they realize that there are two Carols.

But expanding the map isn’t always as simple as generating new city names. It is a stroke of scientific insight to realize that such things as acoustic vibrations, or auditory experiences, even exist.

The obvious modern-day illustration would be words like “intelligence” or “consciousness.” Every now and then one sees a press release claiming that a research study has “explained consciousness” because a team of neurologists investigated a 40Hz electrical rhythm that might have something to do with cross-modality binding of sensory information, or because they investigated the reticular activating system that keeps humans awake. That’s an extreme example, and the usual failures are more subtle, but they are of the same kind. The part of “consciousness” that people find most interesting is reflectivity, self-awareness, realizing that the person I see in the mirror is “me”; that and the hard problem of subjective experience as distinguished by David Chalmers. We also label “conscious” the state of being awake, rather than asleep, in our daily cycle. But they are all different concepts going under the same name, and the underlying phenomena are different scientific puzzles. You can explain being awake without explaining reflectivity or subjectivity.

Fallacies of compression also underlie the bait-and-switch technique in philosophy—you argue about “consciousness” under one definition (like the ability to think about thinking) and then apply the conclusions to “consciousness” under a different definition (like subjectivity). Of course it may be that the two are the same thing, but if so, genuinely understanding this fact would require first a conceptual split and then a genius stroke of reunification.

Expanding your map is (I say again) a scientific challenge: part of the art of science, the skill of inquiring into the world. (And of course you cannot solve a scientific challenge by appealing to dictionaries, nor master a complex skill of inquiry by saying “I can define a word any way I like.”) Where you see a single confusing thing, with protean and self-contradictory attributes, it is a good guess that your map is cramming too much into one point—you need to pry it apart and allocate some new buckets. This is not like defining the single thing you see, but it does often follow from figuring out how to talk about the thing without using a single mental handle.

So the skill of prying apart the map is linked to the rationalist version of Taboo, and to the wise use of words; because words often represent the points on our map, the labels under which we file our propositions and the buckets into which we drop our information. Avoiding a single word, or allocating new ones, is often part of the skill of expanding the map.

*

170

Categorizing Has Consequences

Among the many genetic variations and mutations you carry in your genome, there are a very few alleles you probably know—including those determining your blood type: the presence or absence of the A, B, and + antigens. If you receive a blood transfusion containing an antigen you don’t have, it will trigger an allergic reaction. It was Karl Landsteiner’s discovery of this fact, and how to test for compatible blood types, that made it possible to transfuse blood without killing the patient. (1930 Nobel Prize in Medicine.) Also, if a mother with blood type A (for example) bears a child with blood type A+, the mother may acquire an allergic reaction to the + antigen; if she has another child with blood type A+, the child will be in danger, unless the mother takes an allergic suppressant during pregnancy. Thus people learn their blood types before they marry.

Oh, and also: people with blood type A are earnest and creative, while people with blood type B are wild and cheerful. People with type O are agreeable and sociable, while people with type AB are cool and controlled. (You would think that O would be the absence of A and B, while AB would just be A plus B, but no . . .) All this, according to the Japanese blood type theory of personality. It would seem that blood type plays the role in Japan that astrological signs play in the West, right down to blood type horoscopes in the daily newspaper.

This fad is especially odd because blood types have never been mysterious, not in Japan and not anywhere. We only know blood types even exist thanks to Karl Landsteiner. No mystic witch doctor, no venerable sorcerer, ever said a word about blood types; there are no ancient, dusty scrolls to shroud the error in the aura of antiquity. If the medical profession claimed tomorrow that it had all been a colossal hoax, we layfolk would not have one scrap of evidence from our unaided senses to contradict them.

There’s never been a war between blood types. There’s never even been a political conflict between blood types. The stereotypes must have arisen strictly from the mere existence of the labels.

Now, someone is bound to point out that this is a story of categorizing humans. Does the same thing happen if you categorize plants, or rocks, or office furniture? I can’t recall reading about such an experiment, but of course, that doesn’t mean one hasn’t been done. (I’d expect the chief difficulty of doing such an experiment would be finding a protocol that didn’t mislead the subjects into thinking that, since the label was given you, it must be significant somehow.) So while I don’t mean to update on imaginary evidence, I would predict a positive result for the experiment: I would expect them to find that mere labeling had power over all things, at least in the human imagination.

You can see this in terms of similarity clusters: once you draw a boundary around a group, the mind starts trying to harvest similarities from the group. And unfortunately the human pattern-detectors seem to operate in such overdrive that we see patterns whether they’re there or not; a weakly negative correlation can be mistaken for a strong positive one with a bit of selective memory.
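
To see how selective memory could turn a weakly negative correlation into a strong positive one, consider a toy calculation. The counts and the "remember one disconfirming case in five" rule are invented for this sketch; nothing here comes from real data.

```python
import math

def phi(both: int, group_only: int, trait_only: int, neither: int) -> float:
    """Phi coefficient (Pearson correlation for a 2x2 table of counts)."""
    a, b, c, d = both, group_only, trait_only, neither
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: in-group with trait, in-group without,
# out-group with trait, out-group without.
observed = (24, 26, 26, 24)

def remembered(counts, keep_one_in: int = 5):
    """Keep every confirming case; keep only a fraction of the disconfirming ones."""
    a, b, c, d = counts
    return (a, b // keep_one_in, c // keep_one_in, d)

print(round(phi(*observed), 3))              # -0.04
print(round(phi(*remembered(observed)), 3))  # 0.655
```

The underlying cases never change; only the sample that survives in memory does.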

You can see this in terms of neural algorithms: creating a name for a set of things is like allocating a subnetwork to find patterns in them.

You can see this in terms of a compression fallacy: things given the same name end up dumped into the same mental bucket, blurring them together into the same point on the map.

Or you can see this in terms of the boundless human ability to make stuff up out of thin air and believe it because no one can prove it’s wrong. As soon as you name the category, you can start making up stuff about it. The named thing doesn’t have to be perceptible; it doesn’t have to exist; it doesn’t even have to be coherent.

And no, it’s not just Japan: Here in the West, a blood-type-based diet book called Eat Right 4 Your Type was a bestseller.

Any way you look at it, drawing a boundary in thingspace is not a neutral act. Maybe a more cleanly designed, more purely Bayesian AI could ponder an arbitrary class and not be influenced by it. But you, a human, do not have that option. Categories are not static things in the context of a human brain; as soon as you actually think of them, they exert force on your mind. One more reason not to believe you can define a word any way you like.

*
