Theory and Reality


by Peter Godfrey-Smith


  It can seem a weakness that Bayesianism cannot criticize even very strange initial assignments of probability. And, one might think, where you end up after updating your probabilities must depend on where you start.

  But this is only true in a sense. Bayesians argue that although prior probabilities are freely chosen and might be weird initially, the starting point gets "washed out" by incoming evidence, so long as updating is done rationally. The starting point matters less and less as more evidence is taken into account.

  This idea is usually expressed as a kind of convergence. Consider two people with very different prior probabilities for h, but the same likelihoods for all possible pieces of evidence (e1, e2, e3, . . .). And suppose the two people see all the same actual evidence. Then these two people's probabilities for h will get closer and closer. It can be proved that for any amount of initial disagreement about h, there will be some amount of evidence that will get the two people to any specified degree of closeness in their final probabilities for h. That is, if having final probabilities within (say) 0.001 of each other counts as being in close agreement, then no matter how far apart people start out, there is some amount of evidence that will get them within 0.001 of each other by the end. So initial disagreement is eventually washed out by the weight of evidence.
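  To make the convergence claim concrete, here is a minimal sketch in Python. The setup is my own invention, not the book's: h says a coin is biased 0.8 toward heads, its rival says the coin is fair, and the two agents share those likelihoods while starting from priors of 0.99 and 0.01.

```python
import random

random.seed(0)

# Likelihoods shared by both agents: P(heads | h) and P(heads | not-h).
P_HEADS = {"h": 0.8, "not_h": 0.5}

def update(prior_h, outcome):
    """One step of Bayes' theorem: P(h|e) = P(e|h)P(h) / P(e)."""
    like_h = P_HEADS["h"] if outcome == "H" else 1 - P_HEADS["h"]
    like_not = P_HEADS["not_h"] if outcome == "H" else 1 - P_HEADS["not_h"]
    evidence = like_h * prior_h + like_not * (1 - prior_h)
    return like_h * prior_h / evidence

agent_a, agent_b = 0.99, 0.01        # wildly different priors for h
for _ in range(200):                 # both see the same 200 coin flips
    outcome = "H" if random.random() < 0.8 else "T"  # a world where h holds
    agent_a = update(agent_a, outcome)
    agent_b = update(agent_b, outcome)

print(round(abs(agent_a - agent_b), 6))  # the gap shrinks toward 0
```

  With enough flips, the two posteriors can be brought within any chosen distance of each other, which is just the convergence result stated above.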

  This convergence could, however, take a very long time. These "in the limit" proofs may not help much. As Henry Kyburg likes to put it, we must also accept that for any amount of evidence, and any measure of agreement, there is some initial set of priors such that this evidence will not get the two people to agree by the end. So some Bayesians have tried to work out a way of "constraining" the initial assignments of probability that Bayesianism allows.

  I think there is a more basic problem with the arguments about convergence or the "washing out" of prior probabilities. The convergence proofs assume that when two people start with very different priors, they nonetheless agree about all their likelihoods (probabilities of the form P(e1|h), etc.). That is needed for disagreement about the priors to "wash out." But why should we expect this agreement about likelihoods? Why should two people who disagree massively on many things have the same likelihoods for all possible evidence? Why don't their disagreements affect their views on the relevance of possible observations? This agreement might be present, but there is no general reason why it should be. (This is another aspect of the problem of holism.) Presentations of Bayesianism often use simple examples involving gambling games or sampling processes, in which it seems that there will be agreement about likelihoods even when people have different priors. But those cases are not typical.
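  The role of the shared-likelihood assumption can be seen by rerunning the earlier sketch with that assumption dropped. In this companion sketch (again my own toy numbers), the two agents start from the same prior but disagree about what the evidence would indicate, and the same stream of flips drives them apart rather than together.

```python
import random

random.seed(0)

def update(prior_h, outcome, p_heads_h, p_heads_not):
    """Bayes' theorem with agent-specific likelihoods."""
    like_h = p_heads_h if outcome == "H" else 1 - p_heads_h
    like_not = p_heads_not if outcome == "H" else 1 - p_heads_not
    return like_h * prior_h / (like_h * prior_h + like_not * (1 - prior_h))

agent_a, agent_b = 0.5, 0.5          # identical priors this time
for _ in range(200):                 # same evidence stream for both
    outcome = "H" if random.random() < 0.8 else "T"
    agent_a = update(agent_a, outcome, 0.8, 0.5)  # A: h predicts more heads
    agent_b = update(agent_b, outcome, 0.5, 0.8)  # B: h predicts fewer heads

print(round(agent_a, 4), round(agent_b, 4))  # pushed to opposite verdicts
```

  Same priors, same evidence, opposite conclusions: the convergence machinery gets no grip once the likelihoods diverge.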

  This argument suggests that convergence results do not help much with problems about theory choice in science. It's not clear whether this is a big problem for Bayesianism, however, as there is controversy about how important the "washing out of priors" really is to Bayesianism.

  Prior probabilities are also the key to the standard Bayesian answer to Goodman's new riddle of induction. The riddle was introduced back in section 3.4. Suppose we are presented with two inductive arguments made from the same set of observations of green emeralds. One induction concludes that all emeralds are green, and the other concludes that all emeralds are grue. Why is one induction good and the other bad?

  The standard Bayesian reply is that both inductive arguments are OK. Both hypotheses are confirmed by the observations of green emeralds. However, the "All emeralds are green" hypothesis will, for most people, have a much higher prior probability than the "All emeralds are grue" hypothesis. Then although both hypotheses are confirmed by the observations, the green-emerald hypothesis ends up with a much higher posterior probability than the grue-emerald hypothesis. That is the difference between the two inductions.
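  A small numerical sketch may help here. The numbers are mine, not the book's; the one structural fact built in is that an emerald observed to be green before the critical time t fits both hypotheses perfectly, so each observation has likelihood 1 under both.

```python
# Toy priors: most people rate "all green" far above "all grue".
priors = {"all_green": 0.50, "all_grue": 0.01, "other": 0.49}

# P(next emerald looks green | h). Both candidate generalizations predict
# green-looking emeralds before t; "other" (not all green) gets a toy 0.9.
likelihoods = {"all_green": 1.0, "all_grue": 1.0, "other": 0.9}

def update(probs):
    evidence = sum(likelihoods[h] * probs[h] for h in probs)
    return {h: likelihoods[h] * probs[h] / evidence for h in probs}

post = priors
for _ in range(100):               # 100 green-emerald observations
    post = update(post)
print({h: round(p, 4) for h, p in post.items()})
# Both hypotheses gain (0.50 -> ~0.98, 0.01 -> ~0.02), but the ordering
# fixed by the priors is preserved.
```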

  This does establish a difference, but why does the grue-emerald hypothesis have a low prior probability? Is anything stopping a person from having things the other way around, with a higher prior for the grue-emerald hypothesis? Nothing is preventing this. A person's prior probability for the grue-emerald hypothesis will usually be the result of much past experience with colors, minerals, and so on. But this need not be the case. Suppose you have never had a single thought about emeralds in your life, and you arbitrarily decide to set a higher prior for the grue-emerald hypothesis. Bayesianism offers no criticism of this decision, so long as your probabilities are internally coherent and you update properly. Is this a bad result, or a good one?

  14.5 Scientific Realism and Theories of Evidence

  Perhaps Bayesianism will win in the end. But in the rest of this chapter I will discuss some other ideas. I should note that it is not clear which of these ideas are really in competition with Bayesianism, as opposed to complementing it. My aim will be to tie the problem of evidence to the discussions of realism and naturalism in earlier chapters.

  Let us look again at what a theory of evidence should try to analyze. Much twentieth-century empiricism based its discussions of evidence upon a simple picture of what scientific testing is supposed to achieve: the aim of testing is to confirm and disconfirm generalizations by means of observation. That is what is fundamental to science. In the case of disconfirmation, deductive logic was thought to suffice. Confirmation was to be analyzed using a special inductive logic.

  This view failed. It failed to connect with actual science, and it failed even in its own terms; it could not make much sense of confirmation within the simple picture being used.

  So here is a better picture of what scientific testing aims to do. Testing in science is typically an attempt to choose between rival hypotheses about the hidden structure of the world. These hypotheses will sometimes be expressed using mathematical models, sometimes using linguistic descriptions, and sometimes in other ways. Sometimes the "hidden structures" postulated will be causal mechanisms, sometimes they will be mathematical relationships that may be hard to interpret in terms of cause and effect, and sometimes they will have some other form. Sometimes the aim is to work out a whole new kind of explanation, and sometimes it is just to work out the details (like the value of a key parameter). Sometimes the aim is to understand general patterns, and sometimes it is to reconstruct particular events in the past; I mean to include attempts to answer questions like "Where did HIV come from?" as well as questions like "Why do things fall when dropped?"

  Back during the Scientific Revolution, it was common to think of the problem of evidence using the analogy of a clock. A scientist is like someone observing the motions of a clock from outside and trying to make inferences about the clock's inner workings (Shapin 1996). This analogy is too restricted as a picture of how science works, but it is closer to the truth than the picture found in much twentieth-century empiricist philosophy.

  Approaching the problem of evidence with this realist orientation is a good idea, but we should be careful not to overgeneralize. I said that testing is typically an attempt to choose between hypotheses about hidden structure. Typically does not mean always. My discussion of realism in chapter 12 allowed that not all science is like this. Kuhn and Laudan have both emphasized, correctly, that different theories and paradigms can bring with them somewhat different accounts of what good scientific theories should do. These differences are likely to show up when we try to give a philosophical account of testing and evidence. For this reason and others, we may need to get used to the idea of a mixed or pluralist theory of evidence in science.

  Once we move toward scientific realism, though, it becomes clear that understanding explanatory inference will be crucially important in any future account of testing. Explanatory inference, as defined in chapter 3, is inference from a set of data to a hypothesis about a structure or process that would explain the data. This is far more important in actual science than the philosopher's traditional conception of induction. Indeed, it can be argued that good inferences about what to expect, what patterns will continue in our experience, and so on, typically are developed via inferences about what the world is like and what processes are operating.

  Philosophers have not made a lot of progress on understanding explanatory inference yet. But one idea that was long neglected, and recently revived, will surely turn out to be part of the answer. This is inference via the elimination of alternatives, supporting one option by ruling out others. I will call this "eliminative inference." (It is sometimes called "eliminative induction," but that is another overly broad use of the term "induction.")

  Eliminative inference is, of course, the kind of reasoning associated with the famous fictional detective Sherlock Holmes. If we can rule out all the suspects except one, we know who committed the crime. This approach to evidence and testing has an odd history in twentieth-century philosophy. It was often neglected, partly because philosophers tended to assume that in science there is always an infinite number of possible alternatives to any given theory. If a theory has an infinite number of rivals, then ruling out any finite number of alternatives does not reduce the number of possibilities remaining. This argument might not be very good, however. Maybe there are ways of constraining the relevant alternatives to a theory being considered, in which case we might be able to rule out most or all of the relevant alternatives.

  A chemist, John Platt, once wrote a paper in which he argued that good science is generally based on eliminative inference (1964). His view looked like a modified version of Popperian testing. The paper was mostly ignored by philosophers but taken seriously by quite a few scientists. In recent years, philosophers have begun to resurrect the idea of eliminative inference. For example, John Earman has done this within a Bayesian framework (1992), and Philip Kitcher has done so without linking the idea to Bayesianism (1993).

  Eliminative inference can have a deductive or a nondeductive form. The simplest cases are those where we are able to decisively rule out all options except one. If we can do that, then our inference can be presented as a deductive argument. (As always, such an argument is only as good as its premises.) This is what Sherlock was trying to do. There are two ways in which a nondeductive element can be introduced. First, there may be a less decisive ruling out of alternatives; maybe we can only hope to show that all alternatives except one are very unlikely. Second, we need to consider the case where we are able to rule out most, but not all, of the alternatives to a hypothesis. Maybe as we rule out more and more of the alternatives to a given hypothesis, that hypothesis acquires a kind of partial support, even though some doubt remains. Perhaps we can say that the theory becomes more and more likely to be true. Clearly, in the nondeductive cases it might make sense to embed eliminative inference within a Bayesian framework, which enables us to handle the idea of probability in a precise way. This is indeed a compatible relationship, since Bayesianism explicitly handles evidence in a comparative manner (for one hypothesis to gain credibility, another hypothesis has to lose).
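  Here is a toy sketch of that compatibility. The hypotheses, the numbers, and the idea of modeling "ruling out" as driving a probability down to a small residue are all my own illustration, not the book's.

```python
# Four rivals start with equal credibility.
priors = {"h1": 0.25, "h2": 0.25, "h3": 0.25, "h4": 0.25}

def eliminate(probs, ruled_out, residue=0.01):
    """Nondeductive elimination: crush one rival to a small residue of
    its probability, then renormalize so the total is 1 again."""
    probs = dict(probs)
    probs[ruled_out] *= residue
    total = sum(probs.values())
    return {h: p / total for h, p in probs.items()}

probs = priors
for rival in ("h2", "h3", "h4"):   # rule out the alternatives one by one
    probs = eliminate(probs, rival)
print({h: round(p, 3) for h, p in probs.items()})
# h1 climbs toward 1 without ever being supported "directly"; it gains
# exactly what its rivals lose, as the comparative picture requires.
```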

  One important feature of eliminative inference is this: it is clear that scientists give arguments of this kind all the time; there is no possibility that this is just a philosophical fiction. Elimination is also clearly important in understanding the hardest cases for a theory of explanatory inference: those in which whole new kinds of explanations, models, or theories were established in science. For these often involve what were seen at the time as head-to-head competitions: Darwin versus nineteenth-century creationism, Galileo versus Aristotelian physics, Skinner's behaviorist theory of language versus the "cognitivist" approach of Chomsky. In fact, looking at the history of science reminds us of the chief difficulty with arguments of the eliminative form: how do we know that we have considered all the relevant alternatives? It can be argued that scientists constantly tend toward overconfidence on this point (Stanford 2001). Scientists often think they have ruled out (or rendered very unlikely) all the feasible alternatives to their preferred theory, but in hindsight we can see that in many cases they did not do so, because we now believe a theory they did not even consider. So a focus on eliminative inference has the potential to illuminate both the successes and the failures found in scientific reasoning.

  An emphasis on eliminative inference will probably be part of any good future philosophical theory of testing and evidence in science. It should not be made too central, though. The idea that supporting one option always works by ruling out others is too narrow; there can also be more direct support of one option. An example is discussed in the next section.

  I will mention one other crucial form of reasoning that we see in actual science. This one, however, is much more philosophically perplexing than eliminative inference. Scientists often support hypotheses via an appeal to simplicity, or "parsimony." This was discussed briefly in chapter 3. Given two possible explanations for the data, scientists often prefer the simpler one. Despite various elaborate attempts, I do not think we have made much progress on understanding the operation of, or justification for, this preference.

  14.6 Procedural Naturalism (Optional Section)

  In this section I will outline some of my own ideas on the topic of evidence and testing. These ideas are intended to provide an alternative general picture to Bayesianism. But some of the ideas can be (and have sometimes been) combined with Bayesianism. The general viewpoint described is also supposed to be compatible with the discussion of eliminative inference just above.

  The main idea I will defend is that we should analyze evidence, confirmation, and testing by focusing on procedures. If an observation provides support for a theory, that will be by virtue of the procedure that the observation was embedded within. Not all procedures must be explicit, planned tests or experiments; some can be more informal.

  This procedure-oriented view goes back a fair way. One important source is Hans Reichenbach, who did not follow standard logical positivist thinking about confirmation. Reichenbach was also influenced by some statistical methods used in science. My version of this idea will be linked to naturalism. It also uses the idea of reliability; a good procedure is one that has the capacity to reliably answer the questions we put to it. In order to have a simple label, I will call the view procedural naturalism.

  I will illustrate this view by looking at a particular type of procedure employed often in science: using random samples to make inferences about the characteristics of a larger population. This is the kind of procedure involved when we use a survey to find out (for example) how many teenagers smoke cigarettes. In a way, this is the closest thing to a scientific home for the traditional philosophical picture of inductive inference. But it turns out that if we approach some of the standard philosophical problems from a procedure-based point of view, it makes a big difference. So let us look again at the two famous puzzle cases discussed in chapter 3, the ravens problem and Goodman's "grue" problem.

  The ravens problem was described in section 3.3. If generalizations are confirmed by their instances, and if any observation that confirms H also confirms anything logically equivalent to H, then it seems that a white shoe confirms the hypothesis that all ravens are black. After all, the white shoe is a nonblack nonraven. So it is an instance of the generalization "all nonblack things are nonravens," which is logically equivalent to "all ravens are black."

  In most philosophical discussions of this problem, there is little attention paid to how the observations of ravens (and shoes) are being collected. Of course, the whole example is very unrealistic. A biologist would not try to learn about bird color simply by generalizing from observed cases. But let us imagine that a biologist is doing something like this. Rather than relying on casual observation, though, the biologist uses a statistical method.

  Let us now distinguish two questions about ravens.

  The general raven question: What is the proportion of blackness among ravens?

  The specific raven question: Is it the case that 100 percent of ravens are black?

  Questions of this kind can be reliably answered using samples from a larger population, if we have a sample that is random and of a reasonable size. Statistical theory will tell us exactly how large a sample we need in order to get an answer with a desired degree of reliability (here "size" of sample means absolute size, not size in relation to the size of the overall population). So how might we collect an appropriate sample?
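  The textbook formula behind that claim is worth seeing. This sketch uses the standard normal-approximation rule for estimating a proportion; the formula is stock statistics rather than anything from the book, and the 95 percent confidence level is my choice.

```python
import math

def sample_size(margin, z=1.96, p=0.5):
    """Conservative sample size for estimating a proportion to within
    +/- margin at the confidence level fixed by z (1.96 ~ 95%)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size(0.05))   # ~385 ravens for a 5-point margin of error
print(sample_size(0.01))   # ~9604 for a 1-point margin
```

  Notice that the population's size appears nowhere in the formula, which is the point about absolute rather than relative sample size.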

  The most obvious approach is to collect a random sample of ravens and record the birds' colors. A sample of this kind can be used to answer both the specific and the general raven questions, using ordinary statistical methods. So far, so good.

  But now consider a more unusual approach. Suppose we could collect a random sample of nonblack things and record whether or not they are ravens. This method will be useless for answering the general raven question. Interestingly, though, it can be used to reliably answer the specific raven question. If there are nonblack ravens, we can learn this, in principle, by randomly sampling the nonblack things.

  Now that we are imagining unusual sampling methods, there are two others to consider: collecting a sample of black things, and collecting a sample of nonravens. Neither of these can be used, without further assumptions, to answer either of the raven questions. Knowing what proportion of the black things are ravens does not tell us what proportion of ravens are black, and a sample of nonravens is of no use either.

  So far we have distinguished between some procedures that can, and some procedures that cannot, answer our questions about ravens. Now we can look at the role of particular observations. Consider a particular observation of a white shoe. Does it tell us anything about raven color? It depends on what procedure the observation was part of. If the white shoe was encountered as part of a random sample of nonblack things, then it is evidence. It is just one data point, but it is a nonblack thing that turned out not to be a raven. It is part of a sample that we can use to answer the specific question (though not the general question), and work out whether there are nonblack ravens. But if the very same white shoe is encountered in a sample of nonravens, it tells us nothing. The observation is now part of a procedure that cannot answer either question.
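  A final sketch, entirely my own construction, makes the procedural point concrete: the same white-shoe observation is informative inside one sampling procedure and inert inside another.

```python
import random

random.seed(1)

# A toy world of (kind, color) objects, including a few nonblack ravens.
world = ([("raven", "black")] * 990 + [("raven", "white")] * 10
         + [("shoe", "white")] * 5000 + [("leaf", "green")] * 4000)

def sample_nonblack(n):
    """Random sample of nonblack things: can answer the specific raven
    question, since a nonblack raven is able to turn up in it."""
    pool = [o for o in world if o[1] != "black"]
    return random.sample(pool, n)

def sample_nonravens(n):
    """Random sample of nonravens: by construction no raven can appear,
    so it bears on neither raven question."""
    pool = [o for o in world if o[0] != "raven"]
    return random.sample(pool, n)

print([o for o in sample_nonblack(3000) if o[0] == "raven"])  # white ravens can show up here
print(any(o[0] == "raven" for o in sample_nonravens(3000)))   # always False
```

  A white shoe drawn by the first procedure is one data point against the existence of nonblack ravens; the very same shoe drawn by the second procedure tells us nothing.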

 
