Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Fundamentals should be simple. “Life” is not a good fundamental, “oxygen” is a good fundamental, and “electromagnetic field” is a better fundamental. Life might look simple to a vitalist—it’s the simple, magical ability of your muscles to move under your mental direction. Why shouldn’t life be explained by a simple, magical fundamental substance like élan vital? But phenomena that seem psychologically very simple—little dots of light in the sky, orangey-bright hot flame, flesh moving under mental direction—often conceal vast depths of underlying complexity. The proposition that life is a complex phenomenon may seem incredible to the vitalist, staring at a blankly opaque mystery with no obvious handles; but yes, Virginia, there is underlying complexity. The criterion of simplicity that is relevant to Occam’s Razor is mathematical or computational simplicity. Once we render down our model into mathematically simple fundamental elements, not in themselves sharing the mysterious qualities of the mystery, interacting in clearly defined ways to produce the formerly mysterious phenomenon as a detailed prediction, that is as non-mysterious as humanity has ever figured out how to make anything.

  * * *

  Many people in this world believe that after dying they will face a stern-eyed fellow named St. Peter, who will examine their actions in life and accumulate a score for morality. Presumably St. Peter’s scoring rule is unique and invariant under trivial changes of perspective. Unfortunately, believers cannot obtain a quantitative, precisely computable specification of the scoring rule, which seems rather unfair.

  The religion of Bayesianity holds that your eternal fate depends on the probability judgments you made in life. Unlike lesser faiths, Bayesianity can give a quantitative, precisely computable specification of how your eternal fate is determined.

  Our proper Bayesian scoring rule provides a way to accumulate scores across experiments, and the score is invariant regardless of how we slice up the “experiments” or in what order we accumulate the results. We add up the logarithms of the probabilities. This corresponds to multiplying together the probability assigned to the outcome in each experiment, to find the joint probability of all the experiments together. We take the logarithm to simplify our intuitive understanding of the accumulated score, to maintain our grip on the tiny fractions involved, and to ensure we maximize our expected score by stating our honest probabilities rather than placing all our play money on the most probable bet.
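  As a concrete illustration (a minimal Python sketch, not part of the original text), here is the scoring rule in both of its equivalent forms; the last few lines assume a made-up true frequency of 0.7 simply to show that honest reporting maximizes your expected score:

    import math

    # Probabilities you assigned to the outcomes that actually occurred,
    # one per "experiment"; how we slice the experiments doesn't matter.
    assigned = [0.9, 0.999, 0.25]

    joint = math.prod(assigned)                       # multiply the probabilities...
    log_score = sum(math.log2(p) for p in assigned)   # ...or add their logarithms
    print(joint, log_score, 2 ** log_score)           # 2**log_score equals joint

    # The rule is "proper": if the outcome really occurs 70% of the time,
    # your expected log score is highest when you report exactly 0.7.
    p_true = 0.7
    def expected_score(q):
        return p_true * math.log2(q) + (1 - p_true) * math.log2(1 - q)
    print(max(range(1, 100), key=lambda q: expected_score(q / 100)) / 100)  # 0.7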

  Bayesianity states that when you die, Pierre-Simon Laplace examines every single event in your life, from finding your shoes next to your bed in the morning to finding your workplace in its accustomed spot. Every losing lottery ticket means you cared enough to play. Laplace assesses the advance probability you assigned to each event. Where you did not assign a precise numerical probability in advance, Laplace examines your degree of anticipation or surprise, extrapolates other possible outcomes and your extrapolated reactions, and renormalizes your extrapolated emotions to a likelihood distribution over possible outcomes. (Hence the phrase “Laplacian superintelligence.”)

  Then Laplace takes every event in your life, and every probability you assigned to each event, and multiplies all the probabilities together. This is your Final Judgment—the probability you assigned to your life.

  Those who follow Bayesianity strive all their lives to maximize their Final Judgment. This is the sole virtue of Bayesianity. The rest is just math.

  Mark you: the path of Bayesianity is strict. What probability shall you assign each morning, to the proposition, “The Sun shall rise?” (We shall discount such quibbles as cloudy days, and that the Earth orbits the Sun.) Perhaps one who did not follow Bayesianity would be humble, and give a probability of 99.9%. But we who follow Bayesianity shall discard all considerations of modesty and arrogance, and scheme only to maximize our Final Judgment. Like an obsessive video-game player, we care only about this numerical score. We’re going to face this Sun-shall-rise issue 365 times per year, so we might be able to improve our Final Judgment considerably by tweaking our probability assignment.

  As it stands, even if the Sun rises every morning, every year our Final Judgment will decrease by a factor of 0.999^365 ≈ 0.7, roughly -0.52 bits. Every two years, our Final Judgment will decrease more than if we found ourselves ignorant of a coinflip’s outcome! Intolerable. If we increase our daily probability of sunrise to 99.99%, then each year our Final Judgment will decrease only by a factor of 0.964. Better. Still, in the unlikely event that we live exactly 70 years and then die, our Final Judgment will only be 7.75% of what it might have been. What if we assign a 99.999% probability to the sunrise? Then after 70 years, our Final Judgment will be multiplied by 77.4%.
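  The arithmetic is easy to check. A minimal Python sketch, assuming 365.25 sunrises per year (my assumption; the text just says 365 times per year), reproduces the figures above:

    days_per_year = 365.25
    print(0.999 ** days_per_year)            # ≈ 0.69   -- roughly half a bit per year
    print(0.9999 ** (days_per_year * 70))    # ≈ 0.0775 -- 7.75% after 70 years
    print(0.99999 ** (days_per_year * 70))   # ≈ 0.774  -- 77.4% after 70 years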

  Why not assign a probability of 1.0?

  One who follows Bayesianity will never assign a probability of 1.0 to anything. Assigning a probability of 1.0 to some outcome uses up all your probability mass. If you assign a probability of 1.0 to some outcome, and reality delivers a different answer, you must have assigned the actual outcome a probability of zero. This is Bayesianity’s sole mortal sin. Zero times anything is zero. When Laplace multiplies together all the probabilities of your life, the combined probability will be zero. Your Final Judgment will be doodly-squat, zilch, nada, nil. No matter how rational your guesses during the rest of your life, you’ll spend eternity next to some guy who believed in flying saucers and got all his information from the Weekly World News. Again we find it helpful to take the logarithm, revealing the innocent-sounding “zero” in its true form. Risking an outcome probability of zero is like accepting a bet with a payoff of negative infinity.
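  In Python terms (a sketch, not from the text), the difference between a small probability and a zero probability is the difference between a finite penalty and ruin:

    import math

    careful       = [0.9, 0.999, 0.00001]  # a bad day, but a recoverable one
    overconfident = [0.9, 0.999, 0.0]      # reality delivered the outcome you called impossible

    print(math.prod(careful))        # tiny, but nonzero: the rest of your life still counts
    print(math.prod(overconfident))  # 0.0 -- zero times anything is zero
    print(sum(math.log2(p) for p in careful))  # a large but finite loss in bits
    # math.log2(0.0) raises a ValueError; in the limit the log score is minus
    # infinity -- the "payoff of negative infinity" described above.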

  What if humanity decides to take apart the Sun for mass (stellar engineering), or to switch off the Sun because it’s wasting entropy? Well, you say, you’ll see that coming, you’ll have a chance to alter your probability assignment before the actual event. What if an Artificial Intelligence in someone’s basement recursively self-improves to superintelligence, stealthily develops nanotechnology, and one morning it takes apart the Sun? If on the last night of the world you assign a probability of 99.999% to tomorrow’s sunrise, your Final Judgment will go down by a factor of 100,000. Minus 50 decibels! Awful, isn’t it?

  So what is your best strategy? Well, suppose you 50% anticipate that a basement-spawned AI superintelligence will disassemble the Sun sometime in the next ten years, and you figure there’s about an equal chance of this happening on any given day between now and then. On any given night, you would 99.98% anticipate the Sun rising tomorrow. If this is really what you anticipate, then you have no motive to say anything except 99.98% as your probability. If you feel nervous that this anticipation is too low, or too high, it must not be what you anticipate after your nervousness is taken into account.
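  The 99.98% comes out of simple division (a sketch under the stated assumptions; spreading the risk uniformly over the decade is the simplest reading of “about an equal chance on any given day,” and the 365.25 days per year is my assumption):

    p_sun_disassembled_this_decade = 0.5
    nights = 10 * 365.25
    p_no_sunrise_tonight = p_sun_disassembled_this_decade / nights
    print(1 - p_no_sunrise_tonight)   # ≈ 0.99986 -- quoted above as 99.98%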

  But the deeper truth of Bayesianity is this: You cannot game the system. You cannot give a humble answer, nor a confident one. You must figure out exactly how much you anticipate the Sun rising tomorrow, and say that number. You must shave away every hair of modesty or arrogance, and ask whether you expect to end up being scored on the Sun rising, or failing to rise. Look not to your excuses, but ask which excuses you expect to need. After you arrive at your exact degree of anticipation, the only way to further improve your Final Judgment is to improve the accuracy, calibration, and discrimination of your anticipation. You cannot do better except by guessing better and anticipating more precisely.

  Er, well, except that you could commit suicide when you turned five, thereby preventing your Final Judgment from decreasing any further. Or if we patch a new sin onto the utility function, enjoining against suicide, you could flee from mystery, avoiding all situations in which you thought you might not know everything. So much for that religion.

  * * *

  Ideally, we predict the outcome of the experiment in advance, using our model, and then we perform the experiment to see if the outcome accords with our model. Unfortunately, we can’t always control the information stream. Sometimes Nature throws experiences at us, and by the time we think of an explanation, we’ve already seen the data we’re supposed to explain. This was one of the scientific sins committed by nineteenth century evolutionism; Darwin observed the similarity of many species, and their adaptation to particular local environments, before the hypothesis of natural selection occurred to him. Nineteenth century evolutionism began life as a post facto explanation, not an advance prediction.

  Nor is this a trouble only of semitechnical theories. In 1846, the successful deduction of Neptune’s existence from gravitational perturbations in the orbit of Uranus was considered a grand triumph for Newton’s theory of gravitation. Why? Because Neptune’s existence was the first observation that confirmed an advance prediction of Newtonian gravitation. All the other phenomena that Newton explained, such as orbits and orbital perturbations and tides, had been observed in great detail before Newton explained them. No one seriously doubted that Newton’s theory was correct. Newton’s theory explained too much too precisely, and it replaced a collection of ad hoc models with a single unified mathematical law. Even so, the advance prediction of Neptune’s existence, followed by the observation of Neptune at almost exactly the predicted location, was considered the first grand triumph of Newton’s theory at predicting what no previous model could predict. Considerable time elapsed between widespread acceptance of Newton’s theory and the first impressive advance prediction of Newtonian gravitation. By the time Newton came up with his theory, scientists had already observed, in great detail, most of the phenomena that Newtonian gravitation predicted.

  But the rule of advance prediction is a morality of science, not a law of probability theory. If you have already seen the data you must explain, then Science may darn you to heck, but your predicament doesn’t collapse the laws of probability theory. What does happen is that it becomes much more difficult for a hapless human to obey the laws of probability theory. When you’re deciding how to rate a hypothesis according to the Bayesian scoring rule, you need to figure out how much probability mass that hypothesis assigns to the observed outcome. If we must make our predictions in advance, then it’s easier to notice when someone is trying to claim every possible outcome as an advance prediction, using too much probability mass, being deliberately vague to avoid falsification, and so on.

  No numerologist can predict next week’s winning lottery numbers, but they will be happy to explain the mystical significance of last week’s winning lottery numbers. Say the winning Mega Ball was seven in last week’s lottery, out of 52 possible outcomes. Obviously this happened because seven is the lucky number. So will the Mega Ball in next week’s lottery also come up seven? We understand that it’s not certain, of course, but if it’s the lucky number, you ought to assign a probability of higher than 1/52 . . . and then we’ll score your guesses over the course of a few years, and if your score is too low we’ll have you flogged . . . what’s that you say? You want to assign a probability of exactly 1/52? But that’s the same probability as every other number; what happened to seven being lucky? No, sorry, you can’t assign a 90% probability to seven and also a 90% probability to eleven. We understand they’re both lucky numbers. Yes, we understand that they’re very lucky numbers. But that’s not how it works.
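  The numerologist’s trouble is simply that probability mass has to sum to one. A minimal sketch (the 90% figures are the ones from the exchange above; the uniform 1/52 on the remaining numbers is my filler assumption):

    lucky = {7: 0.9, 11: 0.9}    # "they're both very lucky numbers"
    rest = {n: 1 / 52 for n in range(1, 53) if n not in lucky}
    print(sum(lucky.values()) + sum(rest.values()))  # ≈ 2.76 -- far more mass than exists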

  Even if the listener does not know the way of Bayes and does not ask for formal probabilities, they will probably become suspicious if you try to cover too many bases. Suppose they ask you to predict next week’s winning Mega Ball, and you use numerology to explain why the number one ball would fit your theory very well, and why the number two ball would fit your theory very well, and why the number three ball would fit your theory very well . . . even the most credulous listener might begin to ask questions by the time you got to twelve. Maybe you could tell us which numbers are unlucky and definitely won’t win the lottery? Well, thirteen is unlucky, but it’s not absolutely impossible (you hedge, anticipating in advance which excuse you might need).

  But if we ask you to explain last week’s lottery numbers, why, the seven was practically inevitable. That seven should definitely count as a major success for the “lucky numbers” model of the lottery. And it couldn’t possibly have been thirteen; luck theory rules that straight out.

  * * *

  Imagine that you wake up one morning and your left arm has been replaced by a blue tentacle. The blue tentacle obeys your motor commands—you can use it to pick up glasses, drive a car, etc. How would you explain this hypothetical scenario? Take a moment to ponder this puzzle before continuing.

  (Spoiler space . . .)

  How would I explain the event of my left arm being replaced by a blue tentacle? The answer is that I wouldn’t. It isn’t going to happen.

  It would be easy enough to produce a verbal explanation that “fit” the hypothetical. There are many explanations that can “fit” anything, including (as a special case of “anything”) my arm’s being replaced by a blue tentacle. Divine intervention is a good all-purpose explanation. Or aliens with arbitrary motives and capabilities. Or I could be mad, hallucinating, dreaming my life away in a hospital. Such explanations “fit” all outcomes equally well, and equally poorly, equating to hypotheses of complete ignorance.

  The test of whether a model of reality “explains” my arm’s turning into a blue tentacle is whether the model concentrates significant probability mass into that particular outcome. Why that dream, in the hospital? Why would aliens do that particular thing to me, as opposed to the other billion things they might do? Why would my arm turn into a tentacle on that morning, after remaining an arm through every other morning of my life? And in all cases I must look for an argument compelling enough to make that particular prediction in advance, not mere compatibility. Once I already knew the outcome, it would become far more difficult to sift through hypotheses to find good explanations. Whatever hypothesis I tried, I would be hard-pressed not to allocate more probability mass to yesterday’s blue-tentacle outcome than if I extrapolated blindly, seeking the model’s most likely prediction for tomorrow.
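  One way to make “concentrates significant probability mass” concrete (a sketch; the billion-things figure is from the text, and the one-half is an invented stand-in for a model that genuinely predicts the tentacle):

    import math
    vague = 1 / 1_000_000_000   # "aliens" spread over the billion things they might do
    specific = 0.5              # a model that actually singles out the blue tentacle
    print(math.log2(vague))     # ≈ -29.9 bits: compatible with the data, but a terrible score
    print(math.log2(specific))  # -1.0 bit: a real prediction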

  A model does not always predict all the features of the data. Nature has no privileged tendency to present me with solvable challenges. Perhaps a deity toys with me, and the deity’s mind is computationally intractable. If I flip a fair coin there is no way to further explain the outcome, no model that makes a better prediction than the maximum-entropy hypothesis. But if I guess a model with no internal detail or a model that makes no further predictions, I not only have no reason to believe that guess, I have no reason to care. Last night my arm was replaced with a blue tentacle. Why? Aliens! So what will they do tomorrow? Similarly, if I attribute the blue tentacle to a hallucination as I dream my life away in a coma, I still don’t know any more about what I’ll hallucinate tomorrow. So why do I care whether it was aliens or hallucination?

  What might be a good explanation, then, if I woke up one morning and found my arm transformed into a blue tentacle? To claim a “good explanation” for this hypothetical experience would require an argument such that, contemplating the hypothetical argument now, before my arm has transformed into a blue tentacle, I would go to sleep worrying that my arm really would transform into a tentacle.

  People play games with plausibility, explaining events they expect to never actually encounter, yet this necessarily violates the laws of probability theory. How many people who thought they could “explain” the hypothetical experience of waking up with their arm replaced by a tentacle, would go to sleep wondering if it might really happen to them? Had they the courage of their convictions, they would say: I do not expect to ever encounter this hypothetical experience, and therefore I cannot explain, nor have I a motive to try. Such things only happen in webcomics, and I need not prepare explanations, for in real life I shall never have a chance to use them. If I ever find myself in this impossible situation, let me miss no jot or tittle of my valuable bewilderment.

  To a Bayesian, probabilities are anticipations, not mere beliefs to proclaim from the rooftops. If I have a model that assigns probability mass to waking up with a blue tentacle, then I am nervous about waking up with a blue tentacle. What if the model is a fanciful one, like a witch casting a spell that transports me into a randomly selected webcomic? Then the prior probability of webcomic witchery is so low that my real-world understanding doesn’t assign any significant weight to that hypothesis. The witchcraft hypothesis, if taken as a given, might assign non-insignificant likelihood to waking up with a blue tentacle. But my anticipation of that hypothesis is so low that I don’t anticipate any of the predictions of that hypothesis. That I can conceive of a witchcraft hypothesis should in no wise diminish my stark bewilderment if I actually wake up with a tentacle, because the real-world probability I assign to the witchcraft hypothesis is effectively zero. My zero-probability hypothesis wouldn’t help me explain waking up with a tentacle, because the argument isn’t good enough to make me anticipate waking up with a tentacle.
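  In other words (a sketch with invented illustrative numbers), anticipation of an outcome sums prior times likelihood over hypotheses, and a hypothesis with an effectively-zero prior contributes effectively nothing:

    prior_witchcraft = 1e-20                # "effectively zero" real-world credence
    likelihood_tentacle_if_witch = 0.1      # webcomic witchery would make tentacles likely-ish
    prior_mundane = 1 - prior_witchcraft
    likelihood_tentacle_if_mundane = 1e-12  # ordinary biology all but rules it out

    p_tentacle = (prior_witchcraft * likelihood_tentacle_if_witch
                  + prior_mundane * likelihood_tentacle_if_mundane)
    print(p_tentacle)                       # ≈ 1e-12: the witch hypothesis changes nothing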

  In the laws of probability theory, likelihood distributions are fixed properties of a hypothesis. In the art of rationality, to explain is to anticipate. To anticipate is to explain. Suppose I am a medical researcher, and in the ordinary course of pursuing my research, I notice that my clever new theory of anatomy seems to permit a small and vague possibility that my arm will transform into a blue tentacle. “Ha ha!” I say, “how remarkable and silly!” and feel ever so slightly nervous. That would be a good explanation for waking up with a tentacle, if it ever happened.

  If a chain of reasoning doesn’t make me nervous, in advance, about waking up with a tentacle, then that reasoning would be a poor explanation if the event did happen, because the combination of prior probability and likelihood was too low to make me allocate any significant real-world probability mass to that outcome.

 
