Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Our map, then, is very much unlike the territory; our maps are multi-level, the territory is single-level. Since the representation is so incredibly unlike the referent, in what sense can a belief like “I am wearing socks” be called true, when in reality itself, there are only quarks?

  In case you’ve forgotten what the word “true” means, the classic definition was given by Alfred Tarski:

  The statement “snow is white” is true if and only if snow is white.

  In case you’ve forgotten what the difference is between the statement “I believe ‘snow is white’” and “‘Snow is white’ is true,” see Qualitatively Confused. Truth can’t be evaluated just by looking inside your own head—if you want to know, for example, whether “the morning star = the evening star,” you need a telescope; it’s not enough just to look at the beliefs themselves.

  This is the point missed by the postmodernist folks screaming, “But how do you know your beliefs are true?” When you do an experiment, you actually are going outside your own head. You’re engaging in a complex interaction whose outcome is causally determined by the thing you’re reasoning about, not just your beliefs about it. I once defined “reality” as follows:

  Even when I have a simple hypothesis, strongly supported by all the evidence I know, sometimes I’m still surprised. So I need different names for the thingies that determine my predictions and the thingy that determines my experimental results. I call the former thingies “belief,” and the latter thingy “reality.”

  The interpretation of your experiment still depends on your prior beliefs. I’m not going to talk, for the moment, about Where Priors Come From, because that is not the subject of this essay. My point is that truth refers to an ideal comparison between a belief and reality. Because we understand that planets are distinct from beliefs about planets, we can design an experiment to test whether the belief “the morning star and the evening star are the same planet” is true. This experiment will involve telescopes, not just introspection, because we understand that “truth” involves comparing an internal belief to an external fact; so we use an instrument, the telescope, whose perceived behavior we believe to depend on the external fact of the planet.

  Believing that the telescope helps us evaluate the “truth” of “morning star = evening star” relies on our prior beliefs about the telescope interacting with the planet. Again, I’m not going to address that in this particular essay, except to quote one of my favorite Raymond Smullyan lines: “If the more sophisticated reader objects to this statement on the grounds of its being a mere tautology, then please at least give the statement credit for not being inconsistent.” Similarly, I don’t see the use of a telescope as circular logic, but as reflective coherence; for every systematic way of arriving at truth, there ought to be a rational explanation for how it works.

  The question on the table is what it means for “snow is white” to be true, when, in reality, there are just quarks.

  There’s a certain pattern of neural connections making up your beliefs about “snow” and “whiteness”—we believe this, but we do not know, and cannot concretely visualize, the actual neural connections. Which are, themselves, embodied in a pattern of quarks even less known. Out there in the world, there are water molecules whose temperature is low enough that they have arranged themselves in tiled repeating patterns; they look nothing like the tangles of neurons. In what sense, comparing one (ever-fluctuating) pattern of quarks to the other, is the belief “snow is white” true?

  Obviously, neither I nor anyone else can offer an Ideal Quark Comparer Function that accepts a quark-level description of a neurally embodied belief (including the surrounding brain) and a quark-level description of a snowflake (and the surrounding laws of optics), and outputs “true” or “false” over “snow is white.” And who says the fundamental level is really about particle fields?

  On the other hand, throwing out all beliefs because they aren’t written as gigantic unmanageable specifications about quarks we can’t even see . . . doesn’t seem like a very prudent idea. Not the best way to optimize our goals.

  It seems to me that a word like “snow” or “white” can be taken as a kind of promissory note—not a known specification of exactly which physical quark configurations count as “snow,” but, nonetheless, there are things you call snow and things you don’t call snow, and even if you got a few items wrong (like plastic snow), an Ideal Omniscient Science Interpreter would see a tight cluster in the center and redraw the boundary to have a simpler definition.

  In a single-layer universe whose bottom layer is unknown, or uncertain, or just too large to talk about, the concepts in a multi-layer mind can be said to represent a kind of promissory note—we don’t know what they correspond to, out there. But it seems to us that we can distinguish positive from negative cases, in a predictively productive way, so we think—perhaps in a fully general sense—that there is some difference of quarks, some difference of configurations at the fundamental level, that explains the differences that feed into our senses, and ultimately result in our saying “snow” or “not snow.”

  I see this white stuff, and it is the same on several occasions, so I hypothesize a stable latent cause in the environment—I give it the name “snow”; “snow” is then a promissory note referring to a believed-in simple boundary that could be drawn around the unseen causes of my experience.

  Hilary Putnam’s “Twin Earth” thought experiment (where water is not H2O but some strange other substance denoted XYZ, otherwise behaving much like water), and the subsequent philosophical debate, helps to highlight this issue. “Snow” doesn’t have a logical definition known to us—it’s more like an empirically determined pointer to a logical definition. This is true even if you believe that snow is ice crystals is low-temperature tiled water molecules. The water molecules are made of quarks. What if quarks turn out to be made of something else? What is a snowflake, then? You don’t know—but it’s still a snowflake, not a fire hydrant.

  And of course, these very paragraphs I have just written are likewise far above the level of quarks. “Sensing white stuff, visually categorizing it, and thinking ‘snow’ or ‘not snow’”—this is also talking very far above the quarks. So my meta-beliefs are also promissory notes, for things that an Ideal Omniscient Science Interpreter might know about which configurations of the quarks (or whatever) making up my brain correspond to “believing ‘snow is white.’”

  But then, the entire grasp that we have upon reality is made up of promissory notes of this kind. So, rather than calling it circular, I prefer to call it self-consistent.

  This can be a bit unnerving—maintaining a precarious epistemic perch, in both object-level beliefs and reflection, far above a huge unknown underlying fundamental reality, and hoping one doesn’t fall off.

  On reflection, though, it’s hard to see how things could be any other way.

  So at the end of the day, the statement “reality does not contain hands as fundamental, additional, separate causal entities, over and above quarks” is not the same statement as “hands do not exist” or “I don’t have any hands.” There are no fundamental hands; hands are made of fingers, palm, and thumb, which in turn are made of muscle and bone, all the way down to elementary particle fields, which are the fundamental causal entities, so far as we currently know.

  This is not the same as saying, “there are no ‘hands.’” It is not the same as saying, “the word ‘hands’ is a promissory note that will never be paid, because there is no empirical cluster that corresponds to it”; or “the ‘hands’ note will never be paid, because it is logically impossible to reconcile its supposed characteristics”; or “the statement ‘humans have hands’ refers to a sensible state of affairs, but reality is not in that state.”

  Just: There are patterns that exist in reality where we see “hands,” and these patterns have something in common, but they are not fundamental.

  If I really had no hands—if reality suddenly transitioned to be in a state that we would describe as “Eliezer has no hands”—reality would shortly thereafter correspond to a state we would describe as “Eliezer screams as blood jets out of his wrist stumps.”

  And this is true, even though the above paragraph hasn’t specified any quark positions.

  The previous sentence is likewise meta-true.

  The map is multilevel, the territory is single-level. This doesn’t mean that the higher levels “don’t exist,” like looking in your garage for a dragon and finding nothing there, or like seeing a mirage in the desert and forming an expectation of drinkable water when there is nothing to drink. The higher levels of your map are not false, without referent; they have referents in the single level of physics. It’s not that the wings of an airplane unexist—then the airplane would drop out of the sky. The “wings of an airplane” exist explicitly in an engineer’s multilevel model of an airplane, and the wings of an airplane exist implicitly in the quantum physics of the real airplane. Implicit existence is not the same as nonexistence. The exact description of this implicitness is not known to us—is not explicitly represented in our map. But this does not prevent our map from working, or even prevent it from being true.

  Though it is a bit unnerving to contemplate that every single concept and belief in your brain, including these meta-concepts about how your brain works and why you can form accurate beliefs, is perched orders and orders of magnitude above reality . . .

  *

  221. Zombies! Zombies?

  Your “zombie,” in the philosophical usage of the term, is putatively a being that is exactly like you in every respect—identical behavior, identical speech, identical brain; every atom and quark in exactly the same position, moving according to the same causal laws of motion—except that your zombie is not conscious.

  It is furthermore claimed that if zombies are “possible” (a term over which battles are still being fought), then, purely from our knowledge of this “possibility,” we can deduce a priori that consciousness is extra-physical, in a sense to be described below; the standard term for this position is “epiphenomenalism.”

  (For those unfamiliar with zombies, I emphasize that this is not a strawman. See, for example, the Stanford Encyclopedia of Philosophy entry on Zombies. The “possibility” of zombies is accepted by a substantial fraction, possibly a majority, of academic philosophers of consciousness.)

  I once read somewhere, “You are not the one who speaks your thoughts—you are the one who hears your thoughts.” In Hebrew, the word for the highest soul, that which God breathed into Adam, is N’Shama—“the hearer.”

  If you conceive of “consciousness” as a purely passive listening, then the notion of a zombie initially seems easy to imagine. It’s someone who lacks the N’Shama, the hearer.

  (Warning: Very long 6,600-word essay involving David Chalmers ahead. This may be taken as my demonstrative counterexample to Richard Chappell’s Arguing with Eliezer Part II, in which Richard accuses me of not engaging with the complex arguments of real philosophers.)

  When you open a refrigerator and find that the orange juice is gone, you think “Darn, I’m out of orange juice.” The sound of these words is probably represented in your auditory cortex, as though you’d heard someone else say it. (Why do I think this? Because native Chinese speakers can remember longer digit sequences than English-speakers. Chinese digits are all single syllables, and so Chinese speakers can remember around ten digits, versus the famous “seven plus or minus two” for English speakers. There appears to be a loop of repeating sounds back to yourself, a size limit on working memory in the auditory cortex, which is genuinely phoneme-based.)

  Let’s suppose the above is correct; as a postulate, it should certainly present no problem for advocates of zombies. Even if humans are not like this, it seems easy enough to imagine an AI constructed this way (and imaginability is what the zombie argument is all about). It’s not only conceivable in principle, but quite possible in the next couple of decades, that surgeons will lay a network of neural taps over someone’s auditory cortex and read out their internal narrative. (Researchers have already tapped the lateral geniculate nucleus of a cat and reconstructed recognizable visual inputs.)

  So your zombie, being physically identical to you down to the last atom, will open the refrigerator and form auditory cortical patterns for the phonemes “Darn, I’m out of orange juice.” On this point, epiphenomenalists would willingly agree.

  But, says the epiphenomenalist, in the zombie there is no one inside to hear; the inner listener is missing. The internal narrative is spoken, but unheard. You are not the one who speaks your thoughts. You are the one who hears them.

  It seems a lot more straightforward (they would say) to make an AI that prints out some kind of internal narrative, than to show that an inner listener hears it.

  The Zombie Argument is that if the Zombie World is possible—not necessarily physically possible in our universe, just “possible in theory,” or “imaginable,” or something along those lines—then consciousness must be extra-physical, something over and above mere atoms. Why? Because even if you somehow knew the positions of all the atoms in the universe, you would still have to be told, as a separate and additional fact, that people were conscious—that they had inner listeners—that we were not in the Zombie World, as seems possible.

  Zombie-ism is not the same as dualism. Descartes thought there was a body-substance and a wholly different kind of mind-substance, but Descartes also thought that the mind-substance was a causally active principle, interacting with the body-substance, controlling our speech and behavior. Subtracting out the mind-substance from the human would leave a traditional zombie, of the lurching and groaning sort.

  And though the Hebrew word for the innermost soul is N’Shama, that-which-hears, I can’t recall hearing a rabbi arguing for the possibility of zombies. Most rabbis would probably be aghast at the idea that the divine part which God breathed into Adam doesn’t actually do anything.

  The technical term for the belief that consciousness is there, but has no effect on the physical world, is epiphenomenalism.

  Though there are other elements to the zombie argument (I’ll deal with them below), I think that the intuition of the passive listener is what first seduces people to zombie-ism. In particular, it’s what seduces a lay audience to zombie-ism. The core notion is simple and easy to access: The lights are on but no one’s home.

  Philosophers are appealing to the intuition of the passive listener when they say “Of course the zombie world is imaginable; you know exactly what it would be like.”

  One of the great battles in the Zombie Wars is over what, exactly, is meant by saying that zombies are “possible.” Early zombie-ist philosophers (in the 1970s) just thought it was obvious that zombies were “possible,” and didn’t bother to define what sort of possibility was meant.

  Because of my reading in mathematical logic, what instantly comes into my mind is logical possibility. If you have a collection of statements like {(A ⇒ B), (B ⇒ C), (C ⇒ ¬A)}, then the compound belief is logically possible if it has a model—which, in the simple case above, reduces to finding a value assignment to {A, B, C} that makes all of the statements (A ⇒ B), (B ⇒ C), and (C ⇒ ¬A) true. In this case, A = B = C = 0 works, as does {A = 0, B = C = 1} or {A = B = 0, C = 1}.
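
  (As a concrete aside—this sketch is mine, not the essay’s—here is what that model search looks like as a brute-force check in Python. It simply tries all 2³ value assignments and keeps the ones that make every implication true:

    from itertools import product

    def implies(p, q):
        # Material implication: p => q is false only when p is true and q is false.
        return (not p) or q

    # The compound belief {(A => B), (B => C), (C => not A)}
    def satisfied(a, b, c):
        return implies(a, b) and implies(b, c) and implies(c, not a)

    models = [bits for bits in product([0, 1], repeat=3) if satisfied(*bits)]
    print(models)  # [(0, 0, 0), (0, 0, 1), (0, 1, 1)] -- the three assignments named above

  Finding a model by exhaustive enumeration is trivial with three variables; the point of what follows is that it stops being trivial very quickly.)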

  Something will seem possible—will seem “conceptually possible” or “imaginable”—if you can consider the collection of statements without seeing a contradiction. But it is, in general, a very hard problem to see contradictions or to find a full specific model! If you limit yourself to simple Boolean propositions of the form ((A or B or C) and (B or ¬C or D) and (D or ¬A or ¬C)…), conjunctions of disjunctions of three variables, then this is a very famous problem called 3-SAT, which is one of the first problems ever to be proven NP-complete.
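
  (Again, my own illustration rather than the essay’s: the same exhaustive search applied to the example formula above. With n variables there are 2ⁿ assignments to try, and that exponential blow-up is why 3-SAT is hard in general—every known algorithm still takes exponential time in the worst case:

    from itertools import product

    # (A or B or C) and (B or not C or D) and (D or not A or not C)
    # Each clause is a list of (variable_index, negated) literals; A, B, C, D are indices 0..3.
    clauses = [
        [(0, False), (1, False), (2, False)],
        [(1, False), (2, True), (3, False)],
        [(3, False), (0, True), (2, True)],
    ]

    def satisfies(assignment):
        # A clause is satisfied if at least one of its literals comes out true.
        return all(any(assignment[var] != neg for var, neg in clause) for clause in clauses)

    n = 4
    models = [bits for bits in product([False, True], repeat=n) if satisfies(bits)]
    print(f"{len(models)} of {2**n} assignments are models")  # 11 of 16

  Sixteen assignments is nothing; 2⁵⁰⁰ is not something you check by trying them all, which is why “I looked and didn’t see a contradiction” is weak evidence.)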

  So just because you don’t see a contradiction in the Zombie World at first glance, it doesn’t mean that no contradiction is there. It’s like not seeing a contradiction in the Riemann Hypothesis at first glance. From conceptual possibility (“I don’t see a problem”) to logical possibility, in the full technical sense, is a very great leap. It’s easy to make it an NP-complete leap, and with first-order theories you can make it arbitrarily hard to compute even for finite questions. And it’s logical possibility of the Zombie World, not conceptual possibility, that is needed to suppose that a logically omniscient mind could know the positions of all the atoms in the universe, and yet need to be told as an additional non-entailed fact that we have inner listeners.

  Just because you don’t see a contradiction yet is no guarantee that you won’t see a contradiction in another thirty seconds. “All odd numbers are prime. Proof: 3 is prime, 5 is prime, 7 is prime . . .”
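
  (One more aside of my own: the “proof” survives exactly three test cases before the counterexample shows up at 9 = 3 × 3.

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    print([(n, is_prime(n)) for n in range(3, 11, 2)])
    # [(3, True), (5, True), (7, True), (9, False)] -- the contradiction arrives on the fourth case

  Three confirming instances bought no guarantee about the fourth.)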

  So let us ponder the Zombie Argument a little longer: Can we think of a counterexample to the assertion “Consciousness has no third-party-detectable causal impact on the world”?

  If you close your eyes and concentrate on your inward awareness, you will begin to form thoughts, in your internal narrative, that go along the lines of “I am aware” and “My awareness is separate from my thoughts” and “I am not the one who speaks my thoughts, but the one who hears them” and “My stream of consciousness is not my consciousness” and “It seems like there is a part of me that I can imagine being eliminated without changing my outward behavior.”

  You can even say these sentences out loud, as you meditate. In principle, someone with a super-fMRI could probably read the phonemes out of your auditory cortex; but saying it out loud removes all doubt about whether you have entered the realms of testability and physical consequences.

 
