
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  “What does the sorting scanner tell you?”

  “It tells you whether to put the object into the blegg bin or the rube bin. That’s why it’s called a sorting scanner.”

  At this point you fall silent.

  “Incidentally,” Susan says casually, “it may interest you to know that bleggs contain small nuggets of vanadium ore, and rubes contain shreds of palladium, both of which are useful industrially.”

  “Susan, you are pure evil.”

  “Thank you.”

  So now it seems we’ve discovered the heart and essence of bleggness: a blegg is an object that contains a nugget of vanadium ore. Surface characteristics, like blue color and furredness, do not determine whether an object is a blegg; surface characteristics only matter because they help you infer whether an object is a blegg, that is, whether the object contains vanadium.

  Containing vanadium is a necessary and sufficient definition: all bleggs contain vanadium and everything that contains vanadium is a blegg: “blegg” is just a shorthand way of saying “vanadium-containing object.” Right?

  Not so fast, says Susan: Around 98% of bleggs contain vanadium, but 2% contain palladium instead. To be precise (Susan continues) around 98% of blue egg-shaped furred flexible opaque objects contain vanadium. For unusual bleggs, it may be a different percentage: 95% of purple bleggs contain vanadium, 92% of hard bleggs contain vanadium, etc.

  Now suppose you find a blue egg-shaped furred flexible opaque object, an ordinary blegg in every visible way, and just for kicks you take it to the sorting scanner, and the scanner says “palladium”—this is one of the rare 2%. Is it a blegg?

  At first you might answer that, since you intend to throw this object in the rube bin, you might as well call it a “rube.” However, it turns out that almost all bleggs, if you switch off the lights, glow faintly in the dark, while almost all rubes do not glow in the dark. And the percentage of bleggs that glow in the dark is not significantly different for blue egg-shaped furred flexible opaque objects that contain palladium, instead of vanadium. Thus, if you want to guess whether the object glows like a blegg, or remains dark like a rube, you should guess that it glows like a blegg.

  So is the object really a blegg or a rube?

  On one hand, you’ll throw the object in the rube bin no matter what else you learn. On the other hand, if there are any unknown characteristics of the object you need to infer, you’ll infer them as if the object were a blegg, not a rube—group it into the similarity cluster of blue egg-shaped furred flexible opaque things, and not the similarity cluster of red cube-shaped smooth hard translucent things.

  The question “Is this object a blegg?” may stand in for different queries on different occasions.

  If it weren’t standing in for some query, you’d have no reason to care.

  Is atheism a “religion”? Is transhumanism a “cult”? People who argue that atheism is a religion “because it states beliefs about God” are really trying to argue (I think) that the reasoning methods used in atheism are on a par with the reasoning methods used in religion, or that atheism is no safer than religion in terms of the probability of causally engendering violence, etc. . . . What’s really at stake is an atheist’s claim of substantial difference and superiority relative to religion, which the religious person is trying to reject by denying the difference rather than the superiority(!).

  But that’s not the a priori irrational part: The a priori irrational part is where, in the course of the argument, someone pulls out a dictionary and looks up the definition of “atheism” or “religion.” (And yes, it’s just as silly whether an atheist or religionist does it.) How could a dictionary possibly decide whether an empirical cluster of atheists is really substantially different from an empirical cluster of theologians? How can reality vary with the meaning of a word? The points in thingspace don’t move around when we redraw a boundary.

  But people often don’t realize that their argument about where to draw a definitional boundary is really a dispute over whether to infer a characteristic shared by most things inside an empirical cluster . . .

  Hence the phrase, “disguised query.”

  *

  161

  Neural Categories

  In Disguised Queries, I talked about a classification task of “bleggs” and “rubes.” The typical blegg is blue, egg-shaped, furred, flexible, opaque, glows in the dark, and contains vanadium. The typical rube is red, cube-shaped, smooth, hard, translucent, unglowing, and contains palladium. For the sake of simplicity, let us forget the characteristics of flexibility/hardness and opaqueness/translucency. This leaves five dimensions in thingspace: color, shape, texture, luminance, and interior.

  Suppose I want to create an Artificial Neural Network (ANN) to predict unobserved blegg characteristics from observed blegg characteristics. And suppose I’m fairly naive about ANNs: I’ve read excited popular science books about how neural networks are distributed, emergent, and parallel just like the human brain!! but I can’t derive the differential equations for gradient descent in a non-recurrent multilayer network with sigmoid units (which is actually a lot easier than it sounds).

  Then I might design a neural network that looks something like Figure 161.1.

  Figure 161.1: Network 1

  Network 1 is for classifying bleggs and rubes. But since “blegg” is an unfamiliar and synthetic concept, I’ve also included a similar Network 1b in Figure 161.2 for distinguishing humans from Space Monsters, with input from Aristotle (“All men are mortal”) and Plato’s Academy (“A featherless biped with broad nails”).

  Figure 161.2: Network 1b

  A neural network needs a learning rule. The obvious idea is that when two nodes are often active at the same time, we should strengthen the connection between them—this is one of the first rules ever proposed for training a neural network, known as Hebb’s Rule.

  Thus, if you often saw things that were both blue and furred—thus simultaneously activating the “color” node in the + state and the “texture” node in the + state—the connection would strengthen between color and texture, so that + colors activated + textures, and vice versa. If you saw things that were blue and egg-shaped and vanadium-containing, that would strengthen positive mutual connections between color and shape and interior.
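  As a concrete sketch of that rule (my own toy Python, not anything from the book; the ±1 coding, the learning rate, and the number of training examples are all arbitrary assumptions), each observation nudges every pair of co-active nodes toward a stronger mutual connection:

```python
import numpy as np

# Five observables for Network 1, each coded +1 (blegg-like) or -1 (rube-like):
# color, shape, texture, luminance, interior.
N = 5
W = np.zeros((N, N))  # pairwise connection strengths; no node connects to itself

def hebb_update(W, x, lr=0.1):
    """Hebb's Rule: nodes that fire together wire together.
    Pairs that are active in the same direction get a more positive connection."""
    W = W + lr * np.outer(x, x)
    np.fill_diagonal(W, 0.0)
    return W

blegg = np.array([+1, +1, +1, +1, +1])   # blue, egg, furred, glows, vanadium
rube  = np.array([-1, -1, -1, -1, -1])   # red, cube, smooth, dark, palladium
for _ in range(50):                      # plenty of objects off the conveyor belt
    W = hebb_update(W, blegg)
    W = hebb_update(W, rube)
# Every off-diagonal weight is now strongly positive: + colors excite + textures, and so on.
```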

  Let’s say you’ve already seen plenty of bleggs and rubes come off the conveyor belt. But now you see something that’s furred, egg-shaped, and—gasp!—reddish purple (which we’ll model as a “color” activation level of -2/3). You haven’t yet tested the luminance, or the interior. What to predict, what to predict?

  What happens then is that the activation levels in Network 1 bounce around a bit. Positive activation flows to luminance from shape, negative activation flows to interior from color, negative activation flows from interior to luminance . . . Of course all these messages are passed in parallel!! and asynchronously!! just like the human brain . . .

  Finally Network 1 settles into a stable state, which has high positive activation for “luminance” and “interior.” The network may be said to “expect” (though it has not yet seen) that the object will glow in the dark, and that it contains vanadium.

  And lo, Network 1 exhibits this behavior even though there’s no explicit node that says whether the object is a blegg or not. The judgment is implicit in the whole network!! Bleggness is an attractor!! which arises as the result of emergent behavior!! from the distributed!! learning rule.
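  Here is a minimal version of that settling process, in the same toy style (again my own sketch; the uniform weights and the tanh squashing stand in for whatever the Hebbian training actually produced): clamp the observed nodes, start the unobserved ones at zero, and let activation bounce around until it stops changing.

```python
import numpy as np

# Assume Hebbian training (as in the previous sketch) has left every pairwise
# connection positive; here we simply fill W with a uniform positive weight.
N = 5
W = np.full((N, N), 1.0)
np.fill_diagonal(W, 0.0)

def settle(W, x, clamped, steps=20):
    """Relax the unclamped nodes under the learned weights until activations stabilize."""
    x = x.astype(float)
    for _ in range(steps):
        for i in range(N):
            if not clamped[i]:
                x[i] = np.tanh(W[i] @ x)   # each free node moves toward its weighted input
    return x

# Observed: reddish-purple (color = -2/3), egg-shaped, furred.
# Unobserved: luminance and interior, initialized at zero.
x0      = np.array([-2/3, +1, +1, 0.0, 0.0])
clamped = np.array([True, True, True, False, False])
print(settle(W, x0, clamped))
# Luminance and interior settle at strongly positive values: the network "expects"
# the object to glow in the dark and to contain vanadium.
```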

  Now in real life, this kind of network design—however faddish it may sound—runs into all sorts of problems. Recurrent networks don’t always settle right away: They can oscillate, or exhibit chaotic behavior, or just take a very long time to settle down. This is a Bad Thing when you see something big and yellow and striped, and you have to wait five minutes for your distributed neural network to settle into the “tiger” attractor. Asynchronous and parallel it may be, but it’s not real-time.

  And there are other problems, like double-counting the evidence when messages bounce back and forth: If you suspect that an object glows in the dark, your suspicion will activate belief that the object contains vanadium, which in turn will activate belief that the object glows in the dark.

  Plus if you try to scale up the Network 1 design, it requires O(N²) connections, where N is the total number of observables.

  So what might be a more realistic neural network design?

  Figure 161.3: Network 2

  In Network 2 of Figure 161.3, a wave of activation converges on the central node from any clamped (observed) nodes, and then surges back out again to any unclamped (unobserved) nodes. Which means we can compute the answer in one step, rather than waiting for the network to settle—an important requirement in biology when the neurons only run at 20 Hz. And the network architecture scales as O(N), rather than O(N²).
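  A sketch of that converge-and-broadcast pass (my own toy code; Figure 161.3 is not reproduced here, so the uniform weights and the tanh squashing are assumptions): activation flows in from whatever is clamped, then fans back out to whatever is not. For five observables that is 5 connections instead of the 10 pairwise connections Network 1 needs; for a hundred observables, 100 instead of 4,950.

```python
import numpy as np

def predict_network2(v, x, clamped):
    """One converge-and-broadcast pass through a central category node.
    v: weight from each observable to the central node (positive = blegg-like).
    x: observed values for clamped nodes; ignored for unclamped ones.
    Returns (predictions for all nodes, central activation)."""
    central = np.tanh(np.sum(v[clamped] * x[clamped]))  # step 1: converge on the center
    out = x.astype(float)
    out[~clamped] = np.tanh(v[~clamped] * central)      # step 2: surge back out
    return out, central

v       = np.ones(5)                          # color, shape, texture, luminance, interior
x       = np.array([-2/3, +1, +1, 0.0, 0.0])  # the same reddish-purple furred egg as before
clamped = np.array([True, True, True, False, False])
preds, central = predict_network2(v, x, clamped)
# central is about 0.87 (blegg-ish); luminance and interior come back positive in one step.
```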

  Admittedly, there are some things you can notice more easily with the first network architecture than the second. Network 1 has a direct connection between every two nodes. So if red objects never glow in the dark, but red furred objects usually have the other blegg characteristics like egg-shape and vanadium, Network 1 can easily represent this: it just takes a very strong direct negative connection from color to luminance, but more powerful positive connections from texture to all other nodes except luminance.

  Nor is this a “special exception” to the general rule that bleggs glow—remember, in Network 1, there is no unit that represents blegg-ness; blegg-ness emerges as an attractor in the distributed network.

  So yes, those O(N²) connections were buying us something. But not very much. Network 1 is not more useful on most real-world problems, where you rarely find an animal stuck halfway between being a cat and a dog.

  (There are also facts that you can’t easily represent in Network 1 or Network 2. Let’s say sea-blue color and spheroid shape, when found together, always indicate the presence of palladium; but when found individually, without the other, they are each very strong evidence for vanadium. This is hard to represent, in either architecture, without extra nodes. Both Network 1 and Network 2 embody implicit assumptions about what kind of environmental structure is likely to exist; the ability to read this off is what separates the adults from the babes, in machine learning.)

  Make no mistake: Neither Network 1 nor Network 2 is biologically realistic. But it still seems like a fair guess that however the brain really works, it is in some sense closer to Network 2 than Network 1. Fast, cheap, scalable, works well to distinguish dogs and cats: natural selection goes for that sort of thing like water running down a fitness landscape.

  It seems like an ordinary enough task to classify objects as either bleggs or rubes, tossing them into the appropriate bin. But would you notice if sea-blue objects never glowed in the dark?

  Maybe, if someone presented you with twenty objects that were alike only in being sea-blue, and then switched off the light, and none of the objects glowed. If you got hit over the head with it, in other words. Perhaps by presenting you with all these sea-blue objects in a group, your brain forms a new subcategory, and can detect the “doesn’t glow” characteristic within that subcategory. But you probably wouldn’t notice if the sea-blue objects were scattered among a hundred other bleggs and rubes. It wouldn’t be easy or intuitive to notice, the way that distinguishing cats and dogs is easy and intuitive.

  Or: “Socrates is human, all humans are mortal, therefore Socrates is mortal.” How did Aristotle know that Socrates was human? Well, Socrates had no feathers, and broad nails, and walked upright, and spoke Greek, and, well, was generally shaped like a human and acted like one. So the brain decides, once and for all, that Socrates is human; and from there, infers that Socrates is mortal like all other humans thus yet observed. It doesn’t seem easy or intuitive to ask how much wearing clothes, as opposed to using language, is associated with mortality. Just, “things that wear clothes and use language are human” and “humans are mortal.”

  Are there biases associated with trying to classify things into categories once and for all? Of course there are. See e.g. Cultish Countercultishness.

  *

  162

  How An Algorithm Feels From Inside

  “If a tree falls in the forest, and no one hears it, does it make a sound?” I remember seeing an actual argument get started on this subject—a fully naive argument that went nowhere near Berkeleian subjectivism. Just:

  “It makes a sound, just like any other falling tree!”

  “But how can there be a sound that no one hears?”

  The standard rationalist view would be that the first person is speaking as if “sound” means acoustic vibrations in the air; the second person is speaking as if “sound” means an auditory experience in a brain. If you ask “Are there acoustic vibrations?” or “Are there auditory experiences?,” the answer is at once obvious. And so the argument is really about the definition of the word “sound.”

  I think the standard analysis is essentially correct. So let’s accept that as a premise, and ask: Why do people get into such arguments? What’s the underlying psychology?

  A key idea of the heuristics and biases program is that mistakes are often more revealing of cognition than correct answers. Getting into a heated dispute about whether, if a tree falls in a deserted forest, it makes a sound, is traditionally considered a mistake.

  So what kind of mind design corresponds to that error?

  In Disguised Queries I introduced the blegg/rube classification task, in which Susan the Senior Sorter explains that your job is to sort objects coming off a conveyor belt, putting the blue eggs or “bleggs” into one bin, and the red cubes or “rubes” into the rube bin. This, it turns out, is because bleggs contain small nuggets of vanadium ore, and rubes contain small shreds of palladium, both of which are useful industrially.

  Except that around 2% of blue egg-shaped objects contain palladium instead. So if you find a blue egg-shaped thing that contains palladium, should you call it a “rube” instead? You’re going to put it in the rube bin—why not call it a “rube”?

  But when you switch off the light, nearly all bleggs glow faintly in the dark. And blue egg-shaped objects that contain palladium are just as likely to glow in the dark as any other blue egg-shaped object.

  So if you find a blue egg-shaped object that contains palladium and you ask “Is it a blegg?,” the answer depends on what you have to do with the answer. If you ask “Which bin does the object go in?,” then you choose as if the object is a rube. But if you ask “If I turn off the light, will it glow?,” you predict as if the object is a blegg. In one case, the question “Is it a blegg?” stands in for the disguised query, “Which bin does it go in?” In the other case, the question “Is it a blegg?” stands in for the disguised query, “Will it glow in the dark?”
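  One way to make the difference vivid (a toy illustration of my own, not anything from the text) is to route the two disguised queries through two different functions, and notice that neither of them ever needs to answer "Is it a blegg?" at all:

```python
def which_bin(obj):
    """Sorting decision: what matters is the interior, since that's what is industrially useful."""
    return "rube bin" if obj["interior"] == "palladium" else "blegg bin"

def will_glow_in_the_dark(obj):
    """Prediction: glow tracks the blue/egg/furred surface cluster, and is about
    equally likely whether the interior turns out to be vanadium or palladium."""
    return obj["color"] == "blue" and obj["shape"] == "egg" and obj["texture"] == "furred"

odd_object = {"color": "blue", "shape": "egg", "texture": "furred", "interior": "palladium"}
which_bin(odd_object)              # -> "rube bin": treat it as a rube for sorting
will_glow_in_the_dark(odd_object)  # -> True: treat it as a blegg for prediction
```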

  Now suppose that you have an object that is blue and egg-shaped and contains palladium; and you have already observed that it is furred, flexible, opaque, and glows in the dark.

  This answers every query, observes every observable introduced. There’s nothing left for a disguised query to stand for.

  So why might someone feel an impulse to go on arguing whether the object is really a blegg?

  Figure 162.1: Network 1

  Figure 162.2: Network 2

  These diagrams from Neural Categories show two different neural networks that might be used to answer questions about bleggs and rubes. Network 1 (Figure 162.1) has a number of disadvantages—such as potentially oscillating/chaotic behavior, or requiring O(N²) connections—but Network 1’s structure does have one major advantage over Network 2: every unit in the network corresponds to a testable query. If you observe every observable, clamping every value, there are no units in the network left over.

  Network 2 (Figure 162.2), however, is a far better candidate for being something vaguely like how the human brain works: It’s fast, cheap, scalable—and has an extra dangling unit in the center, whose activation can still vary, even after we’ve observed every single one of the surrounding nodes.

  Which is to say that even after you know whether an object is blue or red, egg or cube, furred or smooth, bright or dark, and whether it contains vanadium or palladium, it feels like there’s a leftover, unanswered question: But is it really a blegg?
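  In the Network 2 sketch from earlier (still my own toy code), you can clamp all five observables for exactly this object, and the central node still computes an activation of its own, even though no observable remains for that number to predict:

```python
import numpy as np

v = np.ones(5)                       # weights from the five observables to the central node
x = np.array([+1, +1, +1, +1, -1])   # blue, egg, furred, glows in the dark, but palladium inside
central = np.tanh(np.sum(v * x))     # about 0.995: the central unit still takes a value
# Every surrounding node is already observed; nothing is left for this number to predict.
# Arguing over whether the object is *really* a blegg is arguing over this leftover value.
```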

  Usually, in our daily experience, acoustic vibrations and auditory experience go together. But a tree falling in a deserted forest unbundles this common association. And even after you know that the falling tree creates acoustic vibrations but not auditory experience, it feels like there’s a leftover question: Did it make a sound?

  We know where Pluto is, and where it’s going; we know Pluto’s shape, and Pluto’s mass—but is it a planet?

  Now remember: When you look at Network 2, as I’ve laid it out here, you’re seeing the algorithm from the outside. People don’t think to themselves, “Should the central unit fire, or not?” any more than you think “Should neuron #12,234,320,242 in my visual cortex fire, or not?”

  It takes a deliberate effort to visualize your brain from the outside—and then you still don’t see your actual brain; you imagine what you think is there. Hopefully based on science, but regardless, you don’t have any direct access to neural network structures from introspection. That’s why the ancient Greeks didn’t invent computational neuroscience.

  When you look at Network 2, you are seeing from the outside; but the way that neural network structure feels from the inside, if you yourself are a brain running that algorithm, is that even after you know every characteristic of the object, you still find yourself wondering: “But is it a blegg, or not?”

  This is a great gap to cross, and I’ve seen it stop people in their tracks. Because we don’t instinctively see our intuitions as “intuitions,” we just see them as the world. When you look at a green cup, you don’t think of yourself as seeing a picture reconstructed in your visual cortex—although that is what you are seeing—you just see a green cup. You think, “Why, look, this cup is green,” not, “The picture in my visual cortex of this cup is green.”

 
