Rationality: From AI to Zombies

by Eliezer Yudkowsky

All of these are cases of Mind Projection Fallacy, and what I call “naive philosophical realism”—the confusion of philosophical intuitions for direct, veridical information about reality. Your inability to imagine something is just a computational fact about what your brain can or can’t imagine. Another brain might work differently.

*

215

Angry Atoms

Fundamental physics—quarks ’n’ stuff—is far removed from the levels we can see, like hands and fingers. At best, you can know how to replicate the experiments that show that your hand (like everything else) is composed of quarks, and you may know how to derive a few equations for things like atoms and electron clouds and molecules.

At worst, the existence of quarks beneath your hand may just be something you were told. In which case it’s questionable in what sense you can be said to “know” it at all, even if you repeat back the same word “quark” that a physicist would use to convey knowledge to another physicist.

Either way, you can’t actually see the identity between levels—no one has a brain large enough to visualize avogadros of quarks and recognize a hand-pattern in them.

But we at least understand what hands do. Hands push on things, exert forces on them. When we’re told about atoms, we visualize little billiard balls bumping into each other. This makes it seem obvious that “atoms” can push on things too, by bumping into them.

Now this notion of atoms is not quite correct. But so far as human imagination goes, it’s relatively easy to imagine our hand being made up of a little galaxy of swirling billiard balls, pushing on things when our “fingers” touch them. Democritus imagined this 2,400 years ago, and there was a time, roughly 1803–1922, when Science thought he was right.

But what about, say, anger?

How could little billiard balls be angry? Tiny frowny faces on the billiard balls?

Put yourself in the shoes of, say, a hunter-gatherer—someone who may not even have a notion of writing, let alone the notion of using base matter to perform computations—someone who has no idea that such a thing as neurons exist. Then you can imagine the functional gap that your ancestors might have perceived between billiard balls and “Grrr! Aaarg!”

Forget about subjective experience for the moment, and consider the sheer behavioral gap between anger and billiard balls. The difference between what little billiard balls do, and what anger makes people do. Anger can make people raise their fists and hit someone—or say snide things behind their backs—or plant scorpions in their tents at night. Billiard balls just push on things.

Try to put yourself in the shoes of the hunter-gatherer who’s never had the “Aha!” of information-processing. Try to avoid hindsight bias about things like neurons and computers. Only then will you be able to see the uncrossable explanatory gap:

How can you explain angry behavior in terms of billiard balls?

Well, the obvious materialist conjecture is that the little billiard balls push on your arm and make you hit someone, or push on your tongue so that insults come out.

But how do the little billiard balls know how to do this—or how to guide your tongue and fingers through long-term plots—if they aren’t angry themselves?

And besides, if you’re not seduced by—gasp!—scientism, you can see from a first-person perspective that this explanation is obviously false. Atoms can push on your arm, but they can’t make you want anything.

Someone may point out that drinking wine can make you angry. But who says that wine is made exclusively of little billiard balls? Maybe wine just contains a potency of angerness.

Clearly, reductionism is just a flawed notion.

(The novice goes astray and says “The art failed me”; the master goes astray and says “I failed my art.”)

What does it take to cross this gap? It’s not just the idea of “neurons” that “process information”—if you say only this and nothing more, it just inserts a magical, unexplained level-crossing rule into your model, where you go from billiards to thoughts.

But an Artificial Intelligence programmer who knows how to create a chess-playing program out of base matter has taken a genuine step toward crossing the gap. If you understand concepts like consequentialism, backward chaining, utility functions, and search trees, you can make merely causal/mechanical systems compute plans.

The trick goes something like this: For each possible chess move, compute the moves your opponent could make, then your responses to those moves, and so on; evaluate the furthest position you can see using some local algorithm (you might simply count up the material); then trace back using minimax to find the best move on the current board; then make that move.
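The search just described can be sketched in a few lines of Python. This is only an illustrative sketch, not how any real chess engine is written: the function names (`minimax`, `legal_moves`, `apply_move`, `evaluate`) are made up for this example, and a tiny stone-taking game stands in for chess so the whole tree fits in memory. The evaluator plays the role of "counting up the material": it only knows who has won a finished game and returns 0 otherwise.

```python
from typing import Callable, List, Optional, Tuple


def minimax(
    state: int,
    depth: int,
    maximizing: bool,
    legal_moves: Callable[[int], List[int]],
    apply_move: Callable[[int, int], int],
    evaluate: Callable[[int, bool], float],
) -> Tuple[float, Optional[int]]:
    """Return (value, best_move) for the side to move, looking `depth` plies ahead."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        # Search horizon or end of game: fall back on the local evaluation.
        return evaluate(state, maximizing), None

    best_value = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in moves:
        value, _ = minimax(apply_move(state, move), depth - 1, not maximizing,
                           legal_moves, apply_move, evaluate)
        better = value > best_value if maximizing else value < best_value
        if better:
            best_value, best_move = value, move
    return best_value, best_move


# Hypothetical toy game standing in for chess: players alternately take
# 1 to 3 stones from a pile, and whoever takes the last stone wins.
def legal_moves(pile: int) -> List[int]:
    return [n for n in (1, 2, 3) if n <= pile]


def apply_move(pile: int, taken: int) -> int:
    return pile - taken


def evaluate(pile: int, maximizer_to_move: bool) -> float:
    if pile == 0:
        # The player who just moved took the last stone and won.
        return -1.0 if maximizer_to_move else 1.0
    return 0.0  # unfinished game: this crude evaluator has no opinion


value, move = minimax(5, 5, True, legal_moves, apply_move, evaluate)
print(value, move)  # 1.0 1  (take one stone, leaving the opponent a losing pile of 4)
```

Nothing in the code "wants" to win. The preference lives entirely in how the numbers returned by `evaluate` are compared and traced back up the tree.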

More generally: If you have chains of causality inside the mind that have a kind of mapping—a mirror, an echo—to what goes on in the environment, then you can run a utility function over the end products of imagination, and find an action that achieves something that the utility function rates highly, and output that action. It is not necessary for the chains of causality inside the mind, that are similar to the environment, to be made out of billiard balls that have little auras of intentionality. Deep Blue’s transistors do not need little chess pieces carved on them, in order to work. See also The Simple Truth.
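As a rough illustration of that general pattern, here is a minimal Python sketch of a planner that imagines the results of action sequences with an internal model and scores only the imagined end states. Everything in it is an assumption made for the example: the one-dimensional world, the names `plan`, `model`, and `utility`, and the goal position of 3.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def plan(
    state: int,
    actions: Sequence[str],
    model: Callable[[int, str], int],
    utility: Callable[[int], float],
    horizon: int,
) -> Tuple[str, ...]:
    """Try every action sequence of length `horizon` in imagination; return the best."""
    best_score = float("-inf")
    best_sequence: Tuple[str, ...] = ()
    for sequence in product(actions, repeat=horizon):
        imagined = state
        for action in sequence:
            # The model mirrors the environment; nothing in the real world moves yet.
            imagined = model(imagined, action)
        score = utility(imagined)
        if score > best_score:
            best_score, best_sequence = score, sequence
    return best_sequence


# Hypothetical toy world: a position on a line.
STEP = {"left": -1, "stay": 0, "right": 1}


def model(position: int, action: str) -> int:
    return position + STEP[action]


def utility(position: int) -> float:
    return -abs(position - 3)  # imagined end states closer to 3 rate higher


print(plan(0, ("left", "stay", "right"), model, utility, horizon=3))
# ('right', 'right', 'right')
```

The model is just arithmetic on integers. It resembles the environment only in the sense that its outputs track what the real actions would do.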

All this is still tremendously oversimplified, but it should, at least, reduce the apparent length of the gap. If you can understand all that, you can see how a planner built out of base matter can be influenced by alcohol to output more angry behaviors. The billiard balls in the alcohol push on the billiard balls making up the utility function.
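A deliberately crude Python sketch of that last point: the chooser below is identical in both runs, and only the numbers inside its utility function are nudged. The action names, outcome features, and weights are all invented for illustration and carry no claim about real neurology or pharmacology.

```python
from typing import Dict

# Hypothetical predicted outcomes of two actions (illustrative numbers only).
OUTCOMES: Dict[str, Dict[str, float]] = {
    "walk_away": {"satisfaction": 0.0, "social_cost": 0.0},
    "snide_remark": {"satisfaction": 1.0, "social_cost": 2.0},
}


def utility(outcome: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted score of a predicted outcome."""
    return (weights["satisfaction"] * outcome["satisfaction"]
            - weights["social_cost"] * outcome["social_cost"])


def choose_action(weights: Dict[str, float]) -> str:
    """Pick the action whose predicted outcome the utility function rates highest."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action], weights))


sober = {"satisfaction": 1.0, "social_cost": 1.0}
# Assumption for the sketch: the perturbation shrinks the weight on social cost.
perturbed = {"satisfaction": 1.0, "social_cost": 0.25}

print(choose_action(sober))      # walk_away
print(choose_action(perturbed))  # snide_remark
```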

But even if you know how to write small AIs, you can’t visualize the level-crossing between transistors and chess. There are too many transistors, and too many moves to check.

Likewise, even if you knew all the facts of neurology, you would not be able to visualize the level-crossing between neurons and anger—let alone the level-crossing between atoms and anger. Not the way you can visualize a hand consisting of fingers, thumb, and palm.

And suppose a cognitive scientist just flatly tells you “Anger is hormones”? Even if you repeat back the words, it doesn’t mean you’ve crossed the gap. You may believe you believe it, but that’s not the same as understanding what little billiard balls have to do with wanting to hit someone.

So you come up with interpretations like, “Anger is mere hormones, it’s caused by little molecules, so it must not be justified in any moral sense—that’s why you should learn to control your anger.”

Or, “There isn’t really any such thing as anger—it’s an illusion, a quotation with no referent, like a mirage of water in the desert, or looking in the garage for a dragon and not finding one.”

These are both tough pills to swallow (not that you should swallow them) and so it is a good deal easier to profess them than to believe them.

I think this is what non-reductionists/non-materialists think they are criticizing when they criticize reductive materialism.

But materialism isn’t that easy. It’s not as cheap as saying, “Anger is made out of atoms—there, now I’m done.” That wouldn’t explain how to get from billiard balls to hitting. You need the specific insights of computation, consequentialism, and search trees before you can start to close the explanatory gap.

All this was a relatively easy example by modern standards, because I restricted myself to talking about angry behaviors. Talking about outputs doesn’t require you to appreciate how an algorithm feels from inside (cross a first-person/third-person gap) or dissolve a wrong question (untangle places where the interior of your own mind runs skew to reality).

Going from material substances that bend and break, burn and fall, push and shove, to angry behavior, is just a practice problem by the standards of modern philosophy. But it is an important practice problem. It can only be fully appreciated, if you realize how hard it would have been to solve before writing was invented. There was once an explanatory gap here, though it may not seem that way in hindsight, now that it’s been bridged for generations.

Explanatory gaps can be crossed, if you accept help from science, and don’t trust the view from the interior of your own mind.

*

216

Heat vs. Motion

After the last essay, it occurred to me that there’s a much simpler example of reductionism jumping a gap of apparent-difference-in-kind: the reduction of heat to motion.

Today, the equivalence of heat and motion may seem too obvious in hindsight—everyone says that “heat is motion,” therefore, it can’t be a “weird” belief.

But there was a time when the kinetic theory of heat was a highly controversial scientific hypothesis, contrasting with belief in a caloric fluid that flowed from hot objects to cold objects. Still earlier, the main theory of heat was “Phlogiston!”

Suppose you’d separately studied kinetic theory and caloric theory. You now know something about kinetics: collisions, elastic rebounds, momentum, kinetic energy, gravity, inertia, free trajectories. Separately, you know something about heat: temperatures, pressures, combustion, heat flows, engines, melting, vaporization.

Not only is this state of knowledge a plausible one, it is the state of knowledge possessed by e.g. Sadi Carnot, who, working strictly from within the caloric theory of heat, developed the principle of the Carnot cycle—a heat engine of maximum efficiency, whose existence implies the Second Law of Thermodynamics. This in 1824, when kinetics was a highly developed science.

Suppose, like Carnot, you know a great deal about kinetics, and a great deal about heat, as separate entities. Separate entities of knowledge, that is: your brain has separate filing baskets for beliefs about kinetics and beliefs about heat. But from the inside, this state of knowledge feels like living in a world of moving things and hot things, a world where motion and heat are independent properties of matter.

Now a Physicist From The Future comes along and tells you: “Where there is heat, there is motion, and vice versa. That’s why, for example, rubbing things together makes them hotter.”

There are (at least) two possible interpretations you could attach to this statement, “Where there is heat, there is motion, and vice versa.”

First, you could suppose that heat and motion exist separately—that the caloric theory is correct—but that among our universe’s physical laws is a “bridging law” which states that, where objects are moving quickly, caloric will come into existence. And conversely, another bridging law says that caloric can exert pressure on things and make them move, which is why a hotter gas exerts more pressure on its enclosure (thus a steam engine can use steam to drive a piston).

Second, you could suppose that heat and motion are, in some as-yet-mysterious sense, the same thing.

“Nonsense,” says Thinker 1, “the words ‘heat’ and ‘motion’ have two different meanings; that is why we have two different words. We know how to determine when we will call an observed phenomenon ‘heat’—heat can melt things, or make them burst into flame. We know how to determine when we will say that an object is ‘moving quickly’—it changes position; and when it crashes, it may deform, or shatter. Heat is concerned with change of substance; motion, with change of position and shape. To say that these two words have the same meaning is simply to confuse yourself.”

“Impossible,” says Thinker 2. “It may be that, in our world, heat and motion are associated by bridging laws, so that it is a law of physics that motion creates caloric, and vice versa. But I can easily imagine a world where rubbing things together does not make them hotter, and gases don’t exert more pressure at higher temperatures. Since there are possible worlds where heat and motion are not associated, they must be different properties—this is true a priori.”

Thinker 1 is confusing the quotation and the referent: 2 + 2 = 4, but “2 + 2” ≠ “4.” The string “2 + 2” contains five characters (including whitespace) and the string “4” contains only one character. If you type the two strings into a Python interpreter, they yield the same output, 4. So you can’t conclude, from looking at the strings “2 + 2” and “4,” that just because the strings are different, they must have different “meanings” relative to the Python Interpreter.
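The distinction can be checked directly. The sketch below compares the two strings as strings, then passes each through a tiny evaluator built on Python's standard `ast` module. The `interpret` helper is made up for this example and handles only integer literals and `+`, which is all the two strings need.

```python
import ast


def interpret(source: str) -> int:
    """Evaluate a string containing only integer literals joined by '+'."""
    return _evaluate(ast.parse(source, mode="eval").body)


def _evaluate(node: ast.expr) -> int:
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return _evaluate(node.left) + _evaluate(node.right)
    raise ValueError("unsupported expression")


quotation_a = "2 + 2"
quotation_b = "4"

# As quotations (strings), the two differ.
print(quotation_a == quotation_b)          # False
print(len(quotation_a), len(quotation_b))  # 5 1

# As referents (what the interpreter yields), they are the same.
print(interpret(quotation_a) == interpret(quotation_b))  # True
```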

The words “heat” and “kinetic energy” can be said to “refer to” the same thing, even before we know how heat reduces to motion, in the sense that we don’t know yet what the referent is, but the referents are in fact the same. You might imagine an Idealized Omniscient Science Interpreter that would give the same output when we typed in “heat” and “kinetic energy” on the command line.

I talk about the Science Interpreter to emphasize that, to dereference the pointer, you’ve got to step outside cognition. The end result of the dereference is something out there in reality, not in anyone’s mind. So you can say “real referent” or “actual referent,” but you can’t evaluate the words locally, from the inside of your own head. You can’t reason using the actual heat-referent—if you thought using real heat, thinking “one million Kelvin” would vaporize your brain. But, by forming a belief about your belief about heat, you can talk about your belief about heat, and say things like “It’s possible that my belief about heat doesn’t much resemble real heat.” You can’t actually perform that comparison right there in your own mind, but you can talk about it.

Hence you can say, “My beliefs about heat and motion are not the same beliefs, but it’s possible that actual heat and actual motion are the same thing.” It’s just like being able to acknowledge that “the morning star” and “the evening star” might be the same planet, while also understanding that you can’t determine this just by examining your beliefs—you’ve got to haul out the telescope.

Thinker 2’s mistake follows similarly. A physicist told them, “Where there is heat, there is motion” and Thinker 2 mistook this for a statement of physical law: The presence of caloric causes the existence of motion. What the physicist really means is more akin to an inferential rule: Where you are told there is “heat,” deduce the presence of “motion.”

From this basic projection of a multilevel model into a multilevel reality follows another, distinct error: the conflation of conceptual possibility with logical possibility. To Sadi Carnot, it is conceivable that there could be another world where heat and motion are not associated. To Richard Feynman, armed with specific knowledge of how to derive equations about heat from equations about motion, this idea is not only inconceivable, but so wildly inconsistent as to make one’s head explode.
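For a sense of what such a derivation looks like, here is the standard textbook result for an ideal gas, offered as an illustration and not as anything taken from this essay. The symbols are assumptions introduced here: N molecules of mass m in a volume V at pressure P and absolute temperature T, with mean squared speed ⟨v²⟩ and Boltzmann's constant k_B.

```latex
\begin{align}
P V &= \tfrac{1}{3}\, N m \langle v^2 \rangle
  && \text{(kinetic theory: pressure from molecular impacts on the walls)} \\
P V &= N k_B T
  && \text{(ideal gas law)} \\
\Rightarrow\quad \tfrac{1}{2}\, m \langle v^2 \rangle &= \tfrac{3}{2}\, k_B T
  && \text{(equate the right-hand sides and divide by } \tfrac{2}{3} N \text{)}
\end{align}
```

Under these assumptions, the temperature of the gas simply is, up to a constant factor, the average translational kinetic energy of its molecules.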

I should note, in fairness to philosophers, that there are philosophers who have said these things. For example, Hilary Putnam, writing on the “Twin Earth” thought experiment:1

Once we have discovered that water (in the actual world) is H2O, nothing counts as a possible world in which water isn’t H2O. In particular, if a “logically possible” statement is one that holds in some “logically possible world,” it isn’t logically possible that water isn’t H2O.

On the other hand, we can perfectly well imagine having experiences that would convince us (and that would make it rational to believe that) water isn’t H2O. In that sense, it is conceivable that water isn’t H2O. It is conceivable but it isn’t logically possible! Conceivability is no proof of logical possibility.

It appears to me that “water” is being used in two different senses in these two paragraphs—one in which the word “water” refers to what we type into the Science Interpreter, and one in which “water” refers to what we get out of the Science Interpreter when we type “water” into it. In the first paragraph, Hilary seems to be saying that after we do some experiments and find out that water is H2O, water becomes automatically redefined to mean H2O. But you could coherently hold a different position about whether the word “water” now means “H2O” or “whatever is really in that bottle next to me,” so long as you use your terms consistently.

I believe the above has already been said as well? Anyway . . .

It is quite possible for there to be only one thing out-there-in-the-world, but for it to take on sufficiently different forms, and for you yourself to be sufficiently ignorant of the reduction, that it feels like living in a world containing two entirely different things. Knowledge concerning these two different phenomena may be taught in two different classes, and studied by two different academic fields, located in two different buildings of your university.

You’ve got to put yourself quite a ways back, into a historically realistic frame of mind, to remember how different heat and motion once seemed. Though, depending on how much you know today, it may not be as hard as all that, if you can look past the pressure of conventionality (that is, “heat is motion” is an un-weird belief, “heat is not motion” is a weird belief). I mean, suppose that tomorrow the physicists stepped forward and said, “Our popularizations of science have always contained one lie. Actually, heat has nothing to do with motion.” Could you prove they were wrong?

Saying “Maybe heat and motion are the same thing!” is easy. The difficult part is explaining how. It takes a great deal of detailed knowledge to get yourself to the point where you can no longer conceive of a world in which the two phenomena go separate ways. Reduction isn’t cheap, and that’s why it buys so much.

Or maybe you could say: “Reductionism is easy, reduction is hard.” But it does kinda help to be a reductionist, I think, when it comes time to go looking for a reduction.

*
