Rationality: From AI to Zombies

by Eliezer Yudkowsky


  Humans often seem to want to have their cake and eat it too. Whichever result we witness is the one that proves our theory. As Spee, the priest in Conservation of Expected Evidence, put it, “The investigating committee would feel disgraced if it acquitted a woman; once arrested and in chains, she has to be guilty, by fair means or foul.” [9]

  The way human psychology seems to work is that first we see something happen, and then we try to argue that it matches whatever hypothesis we had in mind beforehand. Rather than a conserved probability mass to distribute over advance predictions, we have a feeling of compatibility—the degree to which the explanation and the event seem to “fit.” “Fit” is not conserved. There is no equivalent of the rule that probability mass must sum to one. A psychoanalyst may explain any possible behavior of a patient by constructing an appropriate structure of “rationalizations” and “defenses”; it fits, therefore it must be true.

  Now consider the fable told in Fake Explanations—the students seeing a radiator, and a metal plate next to the radiator. The students would never predict in advance that the side of the plate near the radiator would be cooler. Yet, seeing the fact, they managed to make their explanations “fit.” They lost their precious chance at bewilderment, the chance to realize that their models did not predict the phenomenon they observed. They sacrificed their ability to be more confused by fiction than by truth. And they did not realize that “heat conduction, blah blah, therefore the near side is cooler” is a vague and verbal prediction, spread across an enormously wide range of possible values for specific measured temperatures. Applying equations of diffusion and equilibrium would give a sharp prediction for possible joint values. It might not specify the first values you measured, but once you knew a few values you could generate a sharp prediction for the rest. The score for the entire experimental outcome would be far better than that of any less precise alternative, especially a vague and verbal prediction.
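  To make the scoring idea concrete, here is a minimal Python sketch, assuming both explanations can be written as normal distributions over the temperatures at four points on the plate. The readings, the predicted means, and the spreads (0.5 degrees for the diffusion calculation, 10 degrees for the verbal explanation) are all invented for illustration, and treating the readings as independent is a further simplifying assumption. The score for the whole outcome is the sum of the log densities.

```python
import math

def gaussian_log_density(x, mean, sd):
    """Natural-log density of x under a normal distribution with this mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

def joint_log_score(observed, means, sd):
    """Score for the whole outcome: sum of per-point log densities (independence assumed)."""
    return sum(gaussian_log_density(x, m, sd) for x, m in zip(observed, means))

# Hypothetical readings in deg C, from the edge nearest the radiator to the far edge.
observed = [18.1, 19.0, 20.2, 21.1]

# Sharp model: a diffusion/equilibrium calculation predicting every point closely
# (the predicted means and the 0.5 deg spread are assumed).
sharp = joint_log_score(observed, means=[18.0, 19.1, 20.0, 21.0], sd=0.5)

# Vague model: "heat conduction, blah blah" commits only to room temperature,
# give or take about ten degrees at every point (also assumed).
vague = joint_log_score(observed, means=[20.0] * 4, sd=10.0)

print(f"sharp: {sharp:.2f} nats   vague: {vague:.2f} nats")
print(f"likelihood ratio, sharp : vague = {math.exp(sharp - vague):.3g} : 1")
```

  Most of the vague model's loss comes from the normalization term: a distribution ten degrees wide cannot put much density on any particular reading, however well the data seems to “fit” it.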

  * * *

  You now have a technical explanation of the difference between a verbal explanation and a technical explanation. It is a technical explanation because it enables you to calculate exactly how technical an explanation is. Vague hypotheses may be so vague that only a superhuman intelligence could calculate exactly how vague. Perhaps a sufficiently huge intelligence could extrapolate every possible experimental result, and extrapolate every possible verdict of the vague guesser for how well the vague hypothesis “fit,” and then renormalize the “fit” distribution into a likelihood distribution that summed to one. But in principle one can still calculate exactly how vague is a vague hypothesis. The calculation is just not computationally tractable, the way that calculating airplane trajectories via quantum mechanics is not computationally tractable.
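  As a rough illustration of that renormalization, the Python sketch below takes a psychoanalyst's hypothetical “fit” ratings for several possible patient behaviors, rescales them to sum to one, and measures how thinly the result is spread using Shannon entropy. Using entropy as the yardstick of vagueness is an assumption of this example, not a definition taken from the text; the outcome list, the ratings, and the bold comparison model are all made up.

```python
import math

def renormalize(fit):
    """Rescale unconserved 'fit' ratings into probabilities that sum to one."""
    total = sum(fit.values())
    return {outcome: rating / total for outcome, rating in fit.items()}

def entropy_bits(dist):
    """Shannon entropy in bits; the maximum, log2(len(dist)), means no concentration at all."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical 0-10 ratings of how well "rationalizations and defenses" fit each behavior.
# They sum to 35, not 1: nothing forces fit to be conserved.
analyst_fit = {"arrives early": 9, "arrives on time": 8, "arrives late": 9, "skips session": 9}

# A hypothetical bold model that commits most of its probability in advance.
bold_model = {"arrives early": 0.05, "arrives on time": 0.85, "arrives late": 0.07, "skips session": 0.03}

analyst = renormalize(analyst_fit)
print(f"analyst: {entropy_bits(analyst):.2f} bits (max {math.log2(len(analyst)):.2f})")
print(f"bold:    {entropy_bits(bold_model):.2f} bits")
```

  Once renormalized, the explanation that “fits” every behavior sits almost at the maximum entropy, which is another way of saying it predicted nearly nothing.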

  I hold that everyone needs to learn at least one technical subject: physics, computer science, evolutionary biology, Bayesian probability theory, or something. Someone with no technical subjects under their belt has no referent for what it means to “explain” something. They may think “All is Fire” is an explanation, as did the Greek philosopher Heraclitus. Therefore do I advocate that Bayesian probability theory should be taught in high school. Bayesian probability theory is the sole piece of math I know that is accessible at the high school level, and that permits a technical understanding of a subject matter—the dynamics of belief—that is an everyday real-world domain and has emotionally meaningful consequences. Studying Bayesian probability would give students a referent for what it means to “explain” something.

  Too many academics think that being “technical” means speaking in dry polysyllabisms. Here’s a “technical” explanation of technical explanation:

  The equations of probability theory favor hypotheses that strongly predict the exact observed data. Strong models boldly concentrate their probability density into precise outcomes, making them falsifiable if the data hits elsewhere, and giving them tremendous likelihood advantages over models less bold, less precise. Verbal explanation runs on psychological evaluation of unconserved post facto compatibility instead of conserved ante facto probability density. And verbal explanation does not paint sharply detailed pictures, implying a smooth likelihood distribution in the vicinity of the data.
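  A toy version of the likelihood advantage, and of the penalty when the data hits elsewhere: in this hypothetical Python sketch, a bold model puts 91% of its probability on one of ten possible outcomes, a timid model spreads its probability evenly, and Bayes' rule in log-odds form updates the odds between them. The outcomes, the numbers, and the even prior are assumptions chosen for illustration.

```python
import math

OUTCOMES = range(10)  # ten possible discrete results (hypothetical)

def bold(outcome):
    """Bold model: 91% on outcome 3, 1% on each of the other nine."""
    return 0.91 if outcome == 3 else 0.01

def timid(outcome):
    """Timid model: 10% on every outcome."""
    return 0.1

# Sanity check: both models are proper distributions (mass sums to one).
assert all(abs(sum(m(o) for o in OUTCOMES) - 1) < 1e-9 for m in (bold, timid))

def posterior_bold(observations, prior_bold=0.5):
    """Posterior probability of the bold model, assuming independent observations."""
    log_odds = math.log(prior_bold / (1 - prior_bold))
    for x in observations:
        log_odds += math.log(bold(x) / timid(x))  # each observation multiplies the odds
    # Convert log-odds back to a probability without overflowing math.exp.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    return math.exp(log_odds) / (1 + math.exp(log_odds))

print(posterior_bold([3, 3, 3]))  # the bold model's predictions come true
print(posterior_bold([7]))        # the data hits elsewhere
```

  Three hits in a row move the bold model from even odds to better than 99%; a single miss drops it to about 9%. Assigning exactly zero to the other outcomes would make one miss fatal (and `math.log(0)` would raise an error), which is falsifiability in the strictest sense.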

  Is this satisfactory? No. Hear the impressive and weighty sentences, resounding with the dull thud of expertise. See the hapless students, writing those sentences on a sheet of paper. Even after the listeners hear the ritual words, they can perform no calculations. You know the math, so the words are meaningful. You can perform the calculations after hearing the impressive words, just as you could have done before. But what of one who did not see any calculations performed? What new skills have they gained from that “technical” lecture, save the ability to recite fascinating words?

  “Bayesian” sure is a fascinating word, isn’t it? Let’s get it out of our systems: Bayes Bayes Bayes Bayes Bayes Bayes Bayes Bayes Bayes . . .

  The sacred syllable is meaningless, except insofar as it tells someone to apply math. Therefore the one who hears must already know the math.

  Conversely, if you know the math, you can be as silly as you like, and still technical.

  We thus dispose of yet another stereotype of rationality, that rationality consists of sere formality and humorless solemnity. What has that to do with the problem of distinguishing truth from falsehood? What has that to do with attaining the map that reflects the territory? A scientist worthy of a lab coat should be able to make original discoveries while wearing a clown suit, or give a lecture in a high squeaky voice from inhaling helium. It is written nowhere in the math of probability theory that one may have no fun. The blade that cuts through to the correct answer has no dignity or silliness of itself, though it may fit the hand of a silly wielder.

  * * *

  A useful model isn’t just something you know, as you know that an airplane is made of atoms. A useful model is knowledge you can compute in reasonable time to predict real-world events you know how to observe. Maybe someone will find that, using a model that violates Conservation of Momentum just a little, you can compute the aerodynamics of the 747 much more cheaply than if you insist that momentum is exactly conserved. So if you’ve got two computers competing to produce the best prediction, it might be that the best prediction comes from the model that violates Conservation of Momentum. This doesn’t mean that the 747 violates Conservation of Momentum in real life. Neither model uses individual atoms, but that doesn’t imply the 747 is not made of atoms. Physicists use different models to predict airplanes and particle collisions because it would be too expensive to compute the airplane particle by particle.

  You would prove the 747 is made of atoms with experimental data that the aerodynamic models couldn’t handle; for example, you would train a scanning tunneling microscope on a section of wing and look at the atoms. Similarly, you could use a finer measuring instrument to discriminate between a 747 that really disobeyed Conservation of Momentum like the cheap approximation predicted, versus a 747 that obeyed Conservation of Momentum like the underlying physics predicted. The winning theory is the one that best predicts all the experimental results together. Our Bayesian scoring rule gives us a way to combine the results of all our experiments, even experiments that use different methods.
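  A minimal sketch of that combination, assuming the experiments are independent so that log probabilities simply add: each entry is the probability that a model assigned in advance to what the experiment actually showed. The experiment names and all the numbers are hypothetical; the point is only that evidence gathered by different methods lands on one common scale.

```python
import math

# Hypothetical probabilities each model assigned, in advance, to the result each
# experiment actually produced. "exact" conserves momentum; "cheap" is the
# approximation that violates it slightly. Independence between experiments is assumed.
experiments = {
    "wind-tunnel drag":          {"exact": 0.30, "cheap": 0.30},
    "flight-test lift":          {"exact": 0.25, "cheap": 0.26},
    "fine momentum measurement": {"exact": 0.90, "cheap": 0.05},
}

def total_log_score(model):
    """Combined score in bits: the sum of log2 probabilities across all experiments."""
    return sum(math.log2(probs[model]) for probs in experiments.values())

for model in ("exact", "cheap"):
    print(f"{model}: {total_log_score(model):.2f} bits")
```

  Both models score almost the same on the aerodynamic measurements; the finer instrument supplies nearly all of the difference.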

  Furthermore, the atomic theory allows, embraces, and in some sense mandates the aerodynamic model. By thinking abstractly about the assumptions of atomic theory, we realize that the aerodynamic model ought to be a good (and much cheaper) approximation of the atomic theory, and so the atomic theory supports the aerodynamic model, rather than competing with it. A successful theory can embrace many models for different domains, so long as the models are acknowledged as approximations, and in each case the model is compatible with (or ideally mandated by) the underlying theory.

  Our fundamental physics—quantum mechanics, the standard family of particles, and relativity—is a theory that embraces an enormous family of models for macroscopic physical phenomena. There is the physics of liquids, and solids, and gases; yet this does not mean that there are fundamental things in the world that have the intrinsic property of liquidity.

  Apparently there is colour, apparently sweetness, apparently bitterness, actually there are only atoms and the void.

  —Democritus, 420 BCE, from Robinson and Groves [10]

  * * *

  In arguing that a “technical” theory should be defined as a theory that sharply concentrates probability into specific advance predictions, I am setting an extremely high standard of strictness. We have seen that a vague theory can be better than nothing. A vague theory can win out over the hypothesis of ignorance, if there are no precise theories to compete against it.

  There is an enormous family of models belonging to the central underlying theory of life and biology, the underlying theory that is sometimes called neo-Darwinism, natural selection, or evolution. Some models in evolutionary theory are quantitative. The way in which DNA encodes proteins is redundant; two different DNA sequences can code for exactly the same protein. There are four DNA bases {A,T,C,G} and 64 possible combinations of three DNA bases. But those 64 possible codons describe only 20 amino acids plus a stop code. Genetic drift ought therefore to produce non-functional changes in species genomes, through mutations which by chance become fixed in the gene pool. The accumulation rate of non-functional differences between the genomes of two species with a common ancestor depends on such parameters as the number of generations elapsed and the intensity of selection at that genetic locus. That’s an example of a member of the family of evolutionary models that produces quantitative predictions. There are also disequilibrium allele frequencies under selection, stable equilibria for game-theoretical strategies, sex ratios, etc.
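  The counting argument can be checked directly. This Python sketch builds the standard genetic code (NCBI translation table 1, written in DNA bases and listed in TCAG order) and confirms that 64 codons collapse onto 20 amino acids plus a stop signal, so two different DNA sequences can specify the same protein fragment. The two example sequences are arbitrary picks for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): one letter per codon, codons in
# TCAG order with the third base varying fastest; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last base fastest, matching the order of AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA coding sequence (length a multiple of 3) codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

synonyms = Counter(CODON_TABLE.values())  # how many codons map to each symbol
print(len(CODON_TABLE), "codons ->", len(synonyms) - 1, "amino acids + stop")
print("codons for leucine (L):", synonyms["L"])
print(translate("TTATCT"), translate("CTGAGC"))  # different DNA, same peptide
```

  Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one. A mutation that swaps one synonymous codon for another leaves the protein unchanged, which is why such changes can be nearly neutral and free to drift.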

  This all comes under the heading of “fascinating words.” Unfortunately, there are certain religious factions that spread gross disinformation about evolutionary theory. So I emphasize that many models within evolutionary theory make quantitative predictions that are experimentally confirmed, and that such models are far more than sufficient to demonstrate that, e.g., humans and chimpanzees are related by a common ancestor. If you’ve been victimized by creationist disinformation—that is, if you’ve heard any suggestion that evolutionary theory is controversial or untestable or “just a theory” or non-rigorous or non-technical or in any way not confirmed by an unimaginably huge mound of experimental evidence—I recommend reading the TalkOrigins FAQ [11] and studying evolutionary biology with math.

  But imagine going back in time to the nineteenth century, when the theory of natural selection had only just been discovered by Charles Darwin and Alfred Russel Wallace. Imagine evolutionism just after its birth, when the theory had nothing remotely like the modern-day body of quantitative models and great heaping mountains of experimental evidence. There was no way of knowing that humans and chimpanzees would be discovered to have 95% shared genetic material. No one knew that DNA existed. Yet even so, scientists flocked to the new theory of natural selection. And later it turned out that there was a precisely copied genetic material with the potential to mutate, that humans and chimps were provably related, etc.

  So the very strict, very high standard that I proposed for a “technical” theory is too strict. Historically, it has been possible to successfully discriminate true theories from false theories, based on predictions of the sort I called “vague.” Vague predictions of, say, 80% confidence, can build up a huge advantage over alternate hypotheses, given enough experiments. Perhaps a theory of this kind, producing predictions that are not precisely detailed but are nonetheless correct, could be called “semitechnical”?
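  To see how 80% confidence accumulates, here is a small Python calculation, assuming independent yes-or-no questions and a theory that turns out right on exactly 80% of them. It compares the theory against the hypothesis of ignorance, which gives 50% to every answer, and reports the total in bits (log base 2 of the likelihood ratio).

```python
import math

def log2_likelihood_ratio(n_questions, n_correct, confidence=0.8):
    """Bits of evidence for a theory that answers each yes/no question with the given
    confidence, versus the hypothesis of ignorance (50% on every answer)."""
    n_wrong = n_questions - n_correct
    return (n_correct * math.log2(confidence / 0.5)
            + n_wrong * math.log2((1 - confidence) / 0.5))

for n in (10, 100, 1000):
    bits = log2_likelihood_ratio(n, n_correct=round(0.8 * n))
    print(f"{n:5d} questions, 80% right: {bits:6.1f} bits")
```

  Each question contributes about 0.28 bits on average, so a hundred questions give odds of roughly 2^28 to one, even though the theory is wrong a fifth of the time. A theory that was right only half the time at 80% confidence would instead fall behind ignorance.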

  But surely technical theories are more reliable than semitechnical theories? Surely technical theories should take precedence, command greater respect? Surely physics, which produces exceedingly exact predictions, is in some sense better confirmed than evolutionary theory? Not implying that evolutionary theory is wrong, of course; but however vast the mountains of evidence favoring evolution, does not physics go one better through vast mountains of precise experimental confirmation? Observations of neutron stars confirm the predictions of General Relativity to within one part in a hundred trillion (10¹⁴). What does evolutionary theory have to match that?

  Daniel Dennett once said that measured by the simplicity of the theory and the amount of complexity it explained, Darwin had the single greatest idea in the history of time. [12]

  Once there was a conflict between nineteenth century physics and nineteenth century evolutionism. According to the best physical models then in use, the Sun could not have been burning very long. Three thousand years on chemical energy, or 40 million years on gravitational energy. There was no energy source known to nineteenth century physics that would permit longer burning. Nineteenth century physics was not quite as powerful as modern physics—it did not have predictions accurate to within one part in 10¹⁴. But nineteenth century physics still had the mathematical character of modern physics, a discipline whose models produced detailed, precise, quantitative predictions. Nineteenth century evolutionary theory was wholly semitechnical, without a scrap of quantitative modeling. Not even Mendel’s experiments with peas were then known. And yet it did seem likely that evolution would require longer than a paltry 40 million years in which to operate—hundreds of millions, even billions of years. The antiquity of the Earth was a vague and semitechnical prediction, of a vague and semitechnical theory. In contrast, the nineteenth century physicists had a precise and quantitative model, which through formal calculation produced the precise and quantitative dictum that the Sun simply could not have burned that long.
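  A back-of-the-envelope check on those figures, as a Python sketch: the gravitational estimate uses the Kelvin-Helmholtz timescale G·M²/(R·L) with approximate present-day solar values, and the chemical estimate assumes, purely for illustration, that the whole Sun released energy like coal at about 3×10⁷ J/kg. These are modern order-of-magnitude reconstructions, not Kelvin's own calculation.

```python
# Approximate present-day values in SI units.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m
L_SUN = 3.828e26   # solar luminosity, W
YEAR = 3.156e7     # seconds per year

def gravitational_lifetime_years():
    """Kelvin-Helmholtz timescale G*M^2/(R*L): how long contraction alone could
    supply the Sun's current output (order of magnitude only)."""
    return G * M_SUN**2 / (R_SUN * L_SUN) / YEAR

def chemical_lifetime_years(joules_per_kg=3e7):
    """Lifetime if the whole Sun released coal-like chemical energy
    (assumed ~3e7 J/kg) at its current luminosity."""
    return joules_per_kg * M_SUN / L_SUN / YEAR

print(f"gravitational: {gravitational_lifetime_years():.2e} years")
print(f"chemical:      {chemical_lifetime_years():.2e} years")
```

  The results come out at roughly thirty million years and about five thousand years: the same orders of magnitude as the figures above, and far short of the hundreds of millions of years that evolution seemed to need.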

  The limitations of geological periods, imposed by physical science, cannot, of course, disprove the hypothesis of transmutation of species; but it does seem sufficient to disprove the doctrine that transmutation has taken place through “descent with modification by natural selection.”

  —Lord Kelvin, from Lyle Zapato [13]

  History records who won.

  The moral? If you can give 80% confident advance predictions on yes-or-no questions, it may be a “vague” theory; it may be wrong one time out of five; but you can still build up a heck of a huge scoring lead over the hypothesis of ignorance. Enough to confirm a theory, if there are no better competitors. Reality is consistent; every correct theory about the universe is compatible with every other correct theory. Imperfect maps can conflict, but there is only one territory. Nineteenth century evolutionism might have been a semitechnical discipline, but it was still correct (as we now know) and by far the best explanation (even in that day). Any conflict between evolutionism and another well-confirmed theory had to reflect some kind of anomaly, a mistake in the assertion that the two theories were incompatible. Nineteenth century physics couldn’t model the dynamics of the Sun—they didn’t know about nuclear reactions. They could not show that their understanding of the Sun was correct in technical detail, nor calculate from a confirmed model of the Sun to determine how long the Sun had existed. So in retrospect, we can say something like: “There was room for the possibility that nineteenth century physics just didn’t understand the Sun.”

  But that is hindsight. The real lesson is that, even though nineteenth century physics was both precise and quantitative, it didn’t automatically dominate the semitechnical theory of nineteenth century evolutionism. The theories were both well-supported. They were both correct in the domains over which they were generalized. The apparent conflict between them was an anomaly, and the anomaly turned out to stem from the incompleteness and incorrect application of nineteenth century physics, not the incompleteness and incorrect application of nineteenth century evolutionism. But it would be futile to compare the mountain of evidence supporting the one theory, versus the mountain of evidence supporting the other. Even in that day, both mountains were too large to suppose that either theory was simply mistaken. Mountains of evidence that large cannot be set to compete, as if one falsifies the other. You must be applying one theory incorrectly, or applying a model outside the domain it predicts well.

  So you shouldn’t necessarily sneer at a theory just because it’s semitechnical. Semitechnical theories can build up high enough scores, compared to every available alternative, that you know the theory is at least approximately correct. Someday the semitechnical theory may be replaced or even falsified by a more precise competitor, but that’s true even of technical theories. Think of how Einstein’s General Relativity devoured Newton’s theory of gravitation.

  But the correctness of a semitechnical theory—a theory that currently has no precise, computationally tractable models testable by feasible experiments—can be a lot less cut-and-dried than the correctness of a technical theory. It takes skill, patience, and examination to distinguish good semitechnical theories from theories that are just plain confused. This is not something that humans do well by instinct, which is why we have Science.

  People eagerly jump the gun and seize on any available reason to reject a disliked theory. That is why I gave the example of nineteenth century evolutionism, to show why one should not be too quick to reject a “non-technical” theory out of hand. By the moral customs of science, nineteenth century evolutionism was guilty of more than one sin. Nineteenth century evolutionism made no quantitative predictions. It was not readily subject to falsification. It was largely an explanation of what had already been seen. It lacked an underlying mechanism, as no one then knew about DNA. It even contradicted the nineteenth century laws of physics. Yet natural selection was such an amazingly good post facto explanation that people flocked to it, and they turned out to be right. Science, as a human endeavor, requires advance prediction. Probability theory, as math, does not distinguish between post facto and advance prediction, because probability theory assumes that probability distributions are fixed properties of a hypothesis.

 
