
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Which is to say: I hold that it is an empirical fact, given what the word “consciousness” actually refers to, that it is logically impossible to eliminate consciousness without moving any atoms. What it would mean to eliminate “consciousness” from a world, rather than consciousness, I will not speculate.

  (2) It’s misleading to say it’s “miraculous” (on the property dualist view) that our qualia line up so neatly with the physical world. There’s a natural law which guarantees this, after all. So it’s no more miraculous than any other logically contingent nomic necessity (e.g. the constants in our physical laws).

  It is the natural law itself that is “miraculous”—counts as an additional complex-improbable element of the theory to be postulated, without having been itself justified in terms of things already known. One postulates (a) an inner world that is conscious, (b) a malfunctioning outer world that talks about consciousness for no reason, and (c) that the two align perfectly. Statement (c) does not follow from (a) and (b), and so is a separate postulate.

  I agree that this usage of “miraculous” conflicts with the philosophical sense of violating a natural law; I meant it in the sense of improbability appearing from no apparent source, a la perpetual motion belief. Hence the word was ill-chosen in context. But is this not intuitively the sort of thing we should call a miracle? Your consciousness doesn’t really cause you to say you’re conscious, there’s a separate physical thing that makes you say you’re conscious, but also there’s a law aligning the two—this is indeed an event on a similar order of wackiness to a cracker taking on the substance of Christ’s flesh while possessing the exact appearance and outward behavior of a cracker, there’s just a natural law which guarantees this, you know.

  That is, Zombie (or “Outer”) Chalmers doesn’t actually conclude anything, because his utterances are meaningless. A fortiori, he doesn’t conclude anything unwarrantedly. He’s just making noises; these are no more susceptible to epistemic assessment than the chirps of a bird.

  Looking at this from an AI-design standpoint, it seems to me like you should be able to build an AI that systematically refines an inner part of itself that correlates (in the sense of mutual information or systematic relations) to the environment, perhaps including floating-point numbers of a sort that I would call “probabilities” because they obey the internal relations mandated by Cox’s Theorems when the AI encounters new information—pardon me, new sense inputs.
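The update rule such floating-point numbers would have to follow is just Bayes's rule, which satisfies the product and sum rules that Cox's theorems single out as the coherent way to manage degrees of belief. A minimal sketch (the sensor and its reliability figures are invented for illustration, not drawn from any particular AI design):

```python
# Minimal sketch: floating-point "beliefs" updated by Bayes's rule,
# which obeys the internal coherence constraints (product and sum
# rules) mandated by Cox's theorems.

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one sense input."""
    joint_true = prior * likelihood_if_true
    joint_false = (1.0 - prior) * likelihood_if_false
    return joint_true / (joint_true + joint_false)

# A hypothetical sensor that reports "wet" 90% of the time when it
# rains, and 20% of the time when it doesn't.
belief_rain = 0.5
belief_rain = bayes_update(belief_rain, 0.9, 0.2)  # sensor said "wet"
print(belief_rain)  # belief rises above the 0.5 prior
```

The point is only that the numbers' relations to each other and to sense inputs are fixed by the mathematics; nothing in the update step mentions anything beyond physical causality.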

  You will say that, unless the AI is more than mere transistors—unless it has the dual aspect—the AI has no beliefs.

  I think my views on this were expressed pretty clearly in The Simple Truth.

  To me, it seems pretty straightforward to construct maps that correlate to territories in systematic ways, without mentioning anything other than things of pure physical causality. The AI outputs a map of Texas. Another AI flies with the map to Texas and checks to see if the highways are in the corresponding places, chirping “True” when it detects a match and “False” when it detects a mismatch. You can refuse to call this “a map of Texas” but the AIs themselves are still chirping “True” or “False,” and the said AIs are going to chirp “False” when they look at Chalmers’s belief in an epiphenomenal inner core, and I for one would agree with them.
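The chirping procedure can be written out in a few lines, and nowhere does it invoke anything but comparisons between physical states (the highways and cities below are placeholder data, not a real map of Texas):

```python
# Hypothetical sketch of the "chirping" verifier: one system emits a
# map (claimed facts), another checks each claim against the territory
# and chirps "True" or "False" -- pure physical causality throughout.

def verify(map_claims, territory):
    """Compare each claimed highway location against observed reality."""
    chirps = []
    for highway, claimed_place in map_claims.items():
        actual_place = territory.get(highway)
        chirps.append("True" if actual_place == claimed_place else "False")
    return chirps

texas_map = {"I-35": "Austin", "I-10": "Houston"}
territory = {"I-35": "Austin", "I-10": "San Antonio"}
print(verify(texas_map, territory))  # one match, one mismatch
```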

  It’s clear that the function of mapping reality is performed strictly by Outer Chalmers. The whole business of producing belief representations is handled by Bayesian structure in causal interactions. There’s nothing left for the Inner Chalmers to do, but bless the whole affair with epiphenomenal meaning. Where now “meaning” is something entirely unrelated to systematic map-territory correspondence or the ability to use that map to navigate reality. So when it comes to talking about “accuracy,” let alone “systematic accuracy,” it seems to me like we should be able to determine it strictly by looking at the Outer Chalmers.

  (B) In yesterday’s text, I left out an assumption when I wrote:

  If a self-modifying AI looks at a part of itself that concludes “B” on condition A—a part of itself that writes “B” to memory whenever condition A is true—and the AI inspects this part, determines how it (causally) operates in the context of the larger universe, and the AI decides that this part systematically tends to write false data to memory, then the AI has found what appears to be a bug, and the AI will self-modify not to write “B” to the belief pool under condition A.

  . . .

  But there’s no possible warrant for the outer Chalmers or any reflectively coherent self-inspecting AI to believe in this mysterious correctness. A good AI design should, I think, be a reflectively coherent intelligence with a testable theory of how it operates as a causal system, hence with a testable theory of how that causal system produces systematically accurate beliefs on the way to achieving its goals.

  Actually, you need an additional assumption to the above, which is that a “good AI design” (the kind I was thinking of, anyway) judges its own rationality in a modular way; it enforces global rationality by enforcing local rationality. If there is a piece that, relative to its context, is locally systematically unreliable—for some possible beliefs “Bi” and conditions Ai, it adds some “Bi” to the belief pool under local condition Ai, where reflection by the system indicates that Bi is not true (or in the case of probabilistic beliefs, not accurate) when the local condition Ai is true—then this is a bug. This kind of modularity is a way to make the problem tractable, and it’s how I currently think about the first-generation AI design. [Edit 2013: The actual notion I had in mind here has now been fleshed out and formalized in Tiling Agents for Self-Modifying AI, section 6.]

  The notion is that a causally closed cognitive system—such as an AI designed by its programmers to use only causally efficacious parts; or an AI whose theory of its own functioning is entirely testable; or the outer Chalmers that writes philosophy papers—that believes that it has an epiphenomenal inner self, must be doing something systematically unreliable because it would conclude the same thing in a Zombie World. A mind all of whose parts are systematically locally reliable, relative to their contexts, would be systematically globally reliable. Ergo, a mind that is globally unreliable must contain at least one locally unreliable part. So a causally closed cognitive system inspecting itself for local reliability must discover that at least one step involved in adding the belief of an epiphenomenal inner self is unreliable.

  If there are other ways for minds to be reflectively coherent that avoid this proof of disbelief in zombies, philosophers are welcome to try and specify them.

  The reason why I have to specify all this is that otherwise you get a kind of extremely cheap reflective coherence where the AI can never label itself unreliable. E.g., if the AI finds a part of itself that computes 2 + 2 = 5 (in the surrounding context of counting sheep) the AI will reason: “Well, this part malfunctions and says that 2 + 2 = 5 . . . but by pure coincidence, 2 + 2 is equal to 5, or so it seems to me . . . so while the part looks systematically unreliable, I better keep it the way it is, or it will handle this special case wrong.” That’s why I talk about enforcing global reliability by enforcing local systematic reliability—if you just compare your global beliefs to your global beliefs, you don’t go anywhere.
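To make the contrast concrete, here is a toy sketch (the function names and the sheep-counting framing are illustrative, not an actual AI architecture): the audit recomputes each part's answers independently, instead of comparing the belief pool to itself.

```python
# Sketch (hypothetical design): judge each part's local reliability by
# recomputing its answers independently, rather than comparing the
# belief pool to itself -- which would let a buggy adder "confirm"
# that 2 + 2 = 5 by consulting the very beliefs it corrupted.

def buggy_adder(a, b):
    """A locally unreliable part: off by one when counting sheep."""
    return a + b + 1

def locally_reliable(part, cases):
    """Reflective audit: does the part match an independent recomputation?"""
    return all(part(a, b) == a + b for a, b in cases)

test_cases = [(2, 2), (3, 5), (0, 0)]
print(locally_reliable(buggy_adder, test_cases))          # flags the bug
print(locally_reliable(lambda a, b: a + b, test_cases))   # passes
```

The local audit catches the part that writes 5 to memory, whereas a "global" check that asked the corrupted belief pool whether 2 + 2 = 5 would happily answer yes.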

  This does have a general lesson: Show your arguments are globally reliable by virtue of each step being locally reliable; don’t just compare the arguments’ conclusions to your intuitions. [Edit 2013: See Proofs, Implications, and Models for a discussion of the fact that valid logic is locally valid.]

  (C) An anonymous poster wrote:

  A sidepoint, this, but I believe your etymology for “n’shama” is wrong. It is related to the word for “breath,” not “hear.” The root for “hear” contains an ayin, which n’shama does not.

  Now that’s what I call a miraculously misleading coincidence—although the word N’Shama arose for completely different reasons, it sounded exactly the right way to make me think it referred to an inner listener.

  Oops.


  *

  223

  The Generalized Anti-Zombie Principle

  Each problem that I solved became a rule which served afterwards to solve other problems.

  —René Descartes, Discours de la Méthode

  “Zombies” are putatively beings that are atom-by-atom identical to us, governed by all the same third-party-visible physical laws, except that they are not conscious.

  Though the philosophy is complicated, the core argument against zombies is simple: When you focus your inward awareness on your inward awareness, your internal narrative (the little voice inside your head that speaks your thoughts) says “I am aware of being aware” soon after, and then you say it out loud, and then you type it into a computer keyboard, and create a third-party visible blog post.

  Consciousness, whatever it may be—a substance, a process, a name for a confusion—is not epiphenomenal; your mind can catch the inner listener in the act of listening, and say so out loud. The fact that I have typed this paragraph would at least seem to refute the idea that consciousness has no experimentally detectable consequences.

  I hate to say “So now let’s accept this and move on,” over such a philosophically controversial question, but it seems like a considerable majority of Overcoming Bias commenters do accept this. And there are other conclusions you can only get to after you accept that you cannot subtract consciousness and leave the universe looking exactly the same. So now let’s accept this and move on.

  The form of the Anti-Zombie Argument seems like it should generalize, becoming an Anti-Zombie Principle. But what is the proper generalization?

  Let’s say, for example, that someone says: “I have a switch in my hand, which does not affect your brain in any way; and if this switch is flipped, you will cease to be conscious.” Does the Anti-Zombie Principle rule this out as well, with the same structure of argument?

  It appears to me that in the case above, the answer is yes. In particular, you can say: “Even after your switch is flipped, I will still talk about consciousness for exactly the same reasons I did before. If I am conscious right now, I will still be conscious after you flip the switch.”

  Philosophers may object, “But now you’re equating consciousness with talking about consciousness! What about the Zombie Master, the chatbot that regurgitates a remixed corpus of amateur human discourse on consciousness?”

  But I did not equate “consciousness” with verbal behavior. The core premise is that, among other things, the true referent of “consciousness” is also the cause in humans of talking about inner listeners.

  As I argued (at some length) in the sequence on words, what you want in defining a word is not always a perfect Aristotelian necessary-and-sufficient definition; sometimes you just want a treasure map that leads you to the extensional referent. So “that which does in fact make me talk about an unspeakable awareness” is not a necessary-and-sufficient definition. But if what does in fact cause me to discourse about an unspeakable awareness is not “consciousness,” then . . .

  . . . then the discourse gets pretty futile. That is not a knockdown argument against zombies—an empirical question can’t be settled by mere difficulties of discourse. But if you try to defy the Anti-Zombie Principle, you will have problems with the meaning of your discourse, not just its plausibility.

  Could we define the word “consciousness” to mean “whatever actually makes humans talk about ‘consciousness’”? This would have the powerful advantage of guaranteeing that there is at least one real fact named by the word “consciousness.” Even if our belief in consciousness is a confusion, “consciousness” would name the cognitive architecture that generated the confusion. But to establish a definition is only to promise to use a word consistently; it doesn’t settle any empirical questions, such as whether our inner awareness makes us talk about our inner awareness.

  Let’s return to the Off-Switch.

  If we allow that the Anti-Zombie Argument applies against the Off-Switch, then the Generalized Anti-Zombie Principle does not say only, “Any change that is not in-principle experimentally detectable (IPED) cannot remove your consciousness.” The switch’s flipping is experimentally detectable, but it still seems highly unlikely to remove your consciousness.

  Perhaps the Anti-Zombie Principle says, “Any change that does not affect you in any IPED way cannot remove your consciousness”?

  But is it a reasonable stipulation to say that flipping the switch does not affect you in any IPED way? All the particles in the switch are interacting with the particles composing your body and brain. There are gravitational effects—tiny, but real and IPED. The gravitational pull from a one-gram switch ten meters away is around 6 × 10⁻¹⁶ m/s². That’s around half a neutron diameter per second per second, far below thermal noise, but way above the Planck level.
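The figure follows from Newton's law of gravitation, a = Gm/r²; a quick check of the arithmetic:

```python
# Checking the arithmetic in the text: gravitational acceleration from
# a one-gram switch ten meters away, via Newton's law a = G*m / r**2.

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
mass = 1e-3     # one gram, in kg
r = 10.0        # ten meters

a = G * mass / r**2
print(f"{a:.1e} m/s^2")  # prints "6.7e-16 m/s^2"

# Compare with a neutron diameter (~1.7e-15 m): the acceleration is
# indeed around half a neutron diameter per second per second.
neutron_diameter = 1.7e-15
print(a / neutron_diameter)  # roughly 0.4
```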

  We could flip the switch light-years away, in which case the flip would have no immediate causal effect on you (whatever “immediate” means in this case) (if the Standard Model of physics is correct).

  But it doesn’t seem like we should have to alter the thought experiment in this fashion. It seems that, if a disconnected switch is flipped on the other side of a room, you should not expect your inner listener to go out like a light, because the switch “obviously doesn’t change” that which is the true cause of your talking about an inner listener. Whatever you really are, you don’t expect the switch to mess with it.

  This is a large step.

  If you deny that it is a reasonable step, you had better never go near a switch again. But still, it’s a large step.

  The key idea of reductionism is that our maps of the universe are multi-level to save on computing power, but physics seems to be strictly single-level. All our discourse about the universe takes place using references far above the level of fundamental particles.

  The switch’s flip does change the fundamental particles of your body and brain. It nudges them by whole neutron diameters away from where they would have otherwise been.

  In ordinary life, we gloss a change this small by saying that the switch “doesn’t affect you.” But it does affect you. It changes everything by whole neutron diameters! What could possibly be remaining the same? Only the description that you would give of the higher levels of organization—the cells, the proteins, the spikes traveling along a neural axon. As the map is far less detailed than the territory, it must map many different states to the same description.

  Any reasonable sort of humanish description of the brain that talks about neurons and activity patterns (or even the conformations of individual microtubules making up axons and dendrites) won’t change when you flip a switch on the other side of the room. Nuclei are larger than neutrons, atoms are larger than nuclei, and by the time you get up to talking about the molecular level, that tiny little gravitational force has vanished from the list of things you bother to track.

  But if you add up enough tiny little gravitational pulls, they will eventually yank you across the room and tear you apart by tidal forces, so clearly a small effect is not “no effect at all.”

  Maybe the tidal force from that tiny little pull, by an amazing coincidence, pulls a single extra calcium ion just a tiny bit closer to an ion channel, causing it to be pulled in just a tiny bit sooner, making a single neuron fire infinitesimally sooner than it would otherwise have done, a difference which amplifies chaotically, finally making a whole neural spike occur that otherwise wouldn’t have occurred, sending you off on a different train of thought, that triggers an epileptic fit, that kills you, causing you to cease to be conscious . . .

  If you add up a lot of tiny quantitative effects, you get a big quantitative effect—big enough to mess with anything you care to name. And so claiming that the switch has literally zero effect on the things you care about, is taking it too far.

  But with just one switch, the force exerted is vastly less than thermal uncertainties, never mind quantum uncertainties. If you don’t expect your consciousness to flicker in and out of existence as the result of thermal jiggling, then you certainly shouldn’t expect to go out like a light when someone sneezes a kilometer away.

  The alert Bayesian will note that I have just made an argument about expectations, states of knowledge, justified beliefs about what can and can’t switch off your consciousness.

  This doesn’t necessarily destroy the Anti-Zombie Argument. Probabilities are not certainties, but the laws of probability are theorems; if rationality says you can’t believe something on your current information, then that is a law, not a suggestion.

  Still, this version of the Anti-Zombie Argument is weaker. It doesn’t have the nice, clean, absolutely clear-cut status of, “You can’t possibly eliminate consciousness while leaving all the atoms in exactly the same place.” (Or for “all the atoms” substitute “all causes with in-principle experimentally detectable effects,” and “same wavefunction” for “same place,” etc.)

  But the new version of the Anti-Zombie Argument still carries. You can say, “I don’t know what consciousness really is, and I suspect I may be fundamentally confused about the question. But if the word refers to anything at all, it refers to something that is, among other things, the cause of my talking about consciousness. Now, I don’t know why I talk about consciousness. But it happens inside my skull, and I expect it has something to do with neurons firing. Or maybe, if I really understood consciousness, I would have to talk about an even more fundamental level than that, like microtubules, or neurotransmitters diffusing across a synaptic channel. But still, that switch you just flipped has an effect on my neurotransmitters and microtubules that’s much, much less than thermal noise at 310 Kelvin. So whatever the true cause of my talking about consciousness may be, I don’t expect it to be hugely affected by the gravitational pull from that switch. Maybe it’s just a tiny little infinitesimal bit affected? But it’s certainly not going to go out like a light. I expect to go on talking about consciousness in almost exactly the same way afterward, for almost exactly the same reasons.”

 
