Statistical Inference as Severe Testing


by Deborah G Mayo

A Bayesian text by Gelman et al. (2014), to its credit, doesn’t blithely assume that because probability works to express uncertainty about events in games of chance, we may assume it is relevant in making inferences about parameters. They aim to show the overall usefulness of the approach. What about the meanings of priors?

We consider two basic interpretations that can be given to prior distributions. In the population interpretation, the prior distribution represents a population of possible parameter values, from which the θ of current interest has been drawn. In the more subjective state of knowledge interpretation, the guiding principle is that we must express our knowledge (and uncertainty) about θ as if its value could be thought of as a random realization from the prior distribution.

(p. 34)

An example from Ghosh et al. (2010) lends itself to the “population interpretation,” which to me sounds like a frequentist prior.


Exhibit (i): Blood Pressure and Historical Aside. “Let $X_1, X_2, \ldots, X_n$ be IID $N(\mu, \sigma^2)$ and assume for simplicity $\sigma^2$ is known. … $\mu$ may be the expected reduction of blood pressure due to a new drug. You want to test $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$, where $\mu_0$ corresponds with a standard drug already in the market” (Ghosh et al. 2010, p. 34; their Example 2.4).

Here, $\mu$ can be viewed as a random variable that takes on values with different probabilities. The drug of interest may be regarded as a random selection from a population of drugs, each with its expected reductions in blood pressure, i.e., various values of $\mu$. Neyman and Pearson would not have objected; here’s a historical aside:

“I began as a quasi-Bayesian”: Neyman

I began as a quasi-Bayesian. My assumption was that the estimated parameter (just one!) is a particular value of a random variable having an unknown prior distribution.

(Neyman 1977, p. 128)

Finding such priors so rarely available, he sought interval estimators whose probability of covering the true value is independent of the prior distribution.

[My student Churchill Eisenhart in the 1930s] attended my lectures at the University College, London, and witnessed my introducing a prior distribution … and then making efforts to produce an interval estimator, the properties of which would be independent of the prior. Once, Eisenhart’s comment was that the whole theory would look nicer if it were built from the start without any reference to Bayesianism and priors. That remark proved inspiring.

(ibid.)

Even the famous 1933 paper considered the Bayesian possibility. E. Pearson had been fully convinced by Fisher’s non-Bayesian stance before Neyman, never mind the clash with his (Bayesian-leaning) father. It’s one thing to forgo marriage with the woman you love because dad disapproves (as K. Pearson did); it’s quite another to follow his view of probability (Section 3.2). Neyman was still exploring. He thought it “important to show that even if a statistician started from the point of view of inverse probabilities he would be led to the same” tests as those he and Pearson recommended (C. Reid 1998, p. 83). Neyman begged Pearson to sign on to a paper that included inverse probability solely for this purpose, but he would not. Pearson worried “they would find themselves involved in a disagreement with Fisher, who had come out decisively against [inverse probability]” (ibid., p. 84) and he never signed on. For more on this episode see C. Reid.

The kinds of frequentist priors Neyman allowed were those in genetics. One might consider the probability a person is born with a trait as the result of environmental and genetic factors that combine to produce the trait. In an example very like Exhibit (i), Neyman worries that we only know of a finite number of drugs, and we at best have estimates of their average pressure-lowering ability. However, Neyman (1977, p. 115) welcomes the “brilliant idea … due to Herbert Robbins (1956)” launching “a novel chapter of frequentist mathematical statistics”: Empirical Bayes Theory. There may be a sufficient stockpile of information on drugs (or, for that matter, black holes or pulsars) deemed similar to the one in question to arrive at frequentist priors, important for prediction. Some develop “enthusiastic priors” to be contrasted with “skeptical ones” in recommending policy (Spiegelhalter et al. 1994). The severe tester questions whether even a fully warranted frequentist posterior gives a report of well-testedness, in and of itself. In any event, most cases aren’t like this.
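The empirical Bayes idea of estimating a frequentist prior from a stockpile of similar drugs can be sketched in a few lines. Robbins’s (1956) original method was nonparametric; the sketch below swaps in the simpler parametric Normal-Normal version, estimating the prior’s mean and variance by the method of moments. The figures (reductions in mmHg, a known sampling variance `s2 = 1` for each estimate, a standard-drug value of 5) are made up for illustration, and the function name `empirical_bayes_posterior` is hypothetical.

```python
import math
import statistics


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_bayes_posterior(past_estimates, s2, y_new):
    """Normal-Normal empirical Bayes with a method-of-moments prior (illustrative sketch).

    Assumed model: true effect mu_i ~ N(m, A); estimate y_i ~ N(mu_i, s2), s2 known and > 0.
    Marginally y_i ~ N(m, A + s2), so m and A can be estimated from the earlier drugs.
    """
    m_hat = statistics.mean(past_estimates)
    # The sample variance estimates A + s2; truncate A at zero if it falls below s2.
    a_hat = max(0.0, statistics.variance(past_estimates) - s2)
    shrink = a_hat / (a_hat + s2)            # weight on the new drug's own estimate
    post_mean = m_hat + shrink * (y_new - m_hat)
    post_var = shrink * s2                   # equals a_hat * s2 / (a_hat + s2)
    return m_hat, a_hat, post_mean, post_var


# Hypothetical stockpile of estimated blood-pressure reductions (mmHg) for earlier drugs.
past = [4.0, 6.0, 5.0, 7.0, 3.0]
m_hat, a_hat, post_mean, post_var = empirical_bayes_posterior(past, s2=1.0, y_new=8.0)
mu0 = 5.0                                    # assumed value for the standard drug
# Requires post_var > 0, which holds for these data.
prob_better = 1.0 - normal_cdf((mu0 - post_mean) / math.sqrt(post_var))
print(f"estimated prior N({m_hat:.1f}, {a_hat:.1f}); posterior mean {post_mean:.2f}, var {post_var:.2f}")
print(f"Pr(mu > mu0 | data) = {prob_better:.3f}")
```

With these invented numbers the estimated prior is N(5, 1.5), the new drug’s estimate of 8 is shrunk toward the stockpile mean to 6.8 with posterior variance 0.6, and the posterior probability that it beats the standard drug is about 0.99.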

We sometimes hear: But the claim $\{\theta = \theta'\}$ and a claim $\{X = x\}$ are both statements, as if to say, if you can apply probability in one case, why not the other. There’s a huge epistemic difference in assessing the probabilities of these different statements. There needs to be a well-defined model assigning probabilities to event statements – just what’s missing when we are loath to assign probabilities to parameters. On the other hand, if we are limited to the legitimate frequentist priors of Neyman, there’s no difference between what the Bayesian and frequentist can do, if they wanted to. Donald Fraser (2011) says only these frequentist priors should be called “objective” (p. 313) but, like Fisher, denies this is “Bayesian” inference, because it’s just a deductive application of conditional probability, and “whether to include the prior becomes a modeling issue” (ibid., p. 302). But then it’s not clear how much of the current Bayesian revolution is obviously Bayesian. Lindley himself famously said that there’s “no one less Bayesian than an Empirical Bayesian … because he has to consider a sequence of similar problems” (1969, p. 421). Non-frequentist Bayesians switch the role of probability (compared to the frequentist) in a dramatic enough way to be a gestalt change of perspective.

A Dramatic Switch: Flipping the Role of Probability

“A Bayesian takes the view that all unknown quantities, namely the unknown parameter and data before observation, have a probability distribution” (Ghosh et al. 2010, p. 30). By contrast, frequentists don’t assign probability to parameters (excepting the special cases noted), and data retain probabilities even after they are observed. This assertion, or close rewordings of it, while legion in Bayesian texts, is jarring to the frequentist ear because it flips the role of probability. Statisticians David Draper and David Madigan put it clearly:

When we reason in a frequentist way, … we view the data as random and the unknown as fixed. When we are thinking Bayesianly, we hold constant things we know, including the data values, … – the data are fixed and the unknowns are random.

(1997, p. 18)

That’s why, when the Higgs researchers spoke of the probability the results are mere statistical flukes, they appeared to be assigning probability to a hypothesis. There was nothing left as random, given the data – at least to a Bayesian. If known data x are given probability 1, we are led to the “old (or known) evidence” problem (Section 1.5) where no Bayes boost is forthcoming. Some further consequences will arise in this tour.
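Why no Bayes boost is forthcoming can be seen in two lines of the probability calculus. A sketch, writing $x$ for the known data and $H$ for any hypothesis:

$$
\begin{aligned}
\Pr(x) = 1 \;&\Rightarrow\; \Pr(H \wedge \neg x) \le \Pr(\neg x) = 0 \;\Rightarrow\; \Pr(H \wedge x) = \Pr(H),\\
\Pr(H \mid x) &= \frac{\Pr(H \wedge x)}{\Pr(x)} = \frac{\Pr(H)}{1} = \Pr(H).
\end{aligned}
$$

So the Bayes boost $\Pr(H \mid x) - \Pr(H)$ is zero for every $H$ once the data are assigned probability 1.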

Even where parameters are regarded as fixed, we may assign them probabilities to express uncertainty in them. Where do I get a probability on $\theta$ if it is fixed but unknown? The classic subjective way, we saw, is to find an event with known probability, and build a subjective prior by considering $\{\theta < \theta'\}$ for different values of the parameter $\theta$, now regarded as a random variable. If you locate an event E, with known frequentist probability $k$, such that you’re indifferent to bets on $\{\theta < \theta'\}$ and E, then the former gets probability $k$. A non-subjective/default approach can avoid this, and in some cases arrive at the same test as the frequentist in Exhibit (i) by setting a mathematically convenient conjugate prior, or an uninformative prior, say by viewing $\theta$ itself as Normally distributed $N(\eta, \tau^2)$. Instead of reporting the significance level of 0.05, this allows reporting that the posterior probability of $H_0$ is 0.05.
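For the one-sided test of Exhibit (i), this agreement can be checked numerically. The sketch below assumes, for illustration only, $\sigma = 1$, $n = 25$, $\mu_0 = 0$, and an observed mean chosen so that $z \approx 1.645$; the function names are hypothetical. With a flat prior on $\mu$, the posterior probability of $H_0: \mu \le \mu_0$ equals the one-sided p-value; a conjugate $N(\eta, \tau^2)$ prior approaches that answer as $\tau$ grows and departs from it when $\tau$ is small.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_p_value(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Pr(Xbar >= xbar; mu = mu0) for IID N(mu, sigma^2) data with sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - normal_cdf(z)


def posterior_null_flat(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Pr(mu <= mu0 | xbar) under a flat prior, where mu | xbar ~ N(xbar, sigma^2 / n)."""
    return normal_cdf((mu0 - xbar) / (sigma / math.sqrt(n)))


def posterior_null_conjugate(xbar: float, mu0: float, sigma: float, n: int,
                             eta: float, tau: float) -> float:
    """Pr(mu <= mu0 | xbar) under a conjugate N(eta, tau^2) prior on mu."""
    precision = 1.0 / tau**2 + n / sigma**2                 # posterior precision
    post_mean = (eta / tau**2 + n * xbar / sigma**2) / precision
    post_sd = math.sqrt(1.0 / precision)
    return normal_cdf((mu0 - post_mean) / post_sd)


# Illustrative (assumed) numbers: sigma / sqrt(n) = 0.2, so xbar = 0.329 gives z = 1.645.
mu0, sigma, n = 0.0, 1.0, 25
xbar = 0.329
p = one_sided_p_value(xbar, mu0, sigma, n)
flat = posterior_null_flat(xbar, mu0, sigma, n)
vague = posterior_null_conjugate(xbar, mu0, sigma, n, eta=0.0, tau=1e6)
informative = posterior_null_conjugate(xbar, mu0, sigma, n, eta=0.0, tau=0.5)
print(f"p-value {p:.4f}; flat {flat:.4f}; vague conjugate {vague:.4f}; tau = 0.5: {informative:.4f}")
```

With these numbers the p-value and the flat-prior posterior of $H_0$ both come out at about 0.05, which is the sense in which a default Bayesian can report “the posterior probability of $H_0$ is 0.05.” The informative prior centred on $\mu_0$ gives a somewhat larger value (roughly 0.06).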

$\Pr(\theta = \theta_0) = 0.95$ is meaningless unless $\theta$ is a random variable. … this expression signifies that we are ready to bet that $\theta$ is equal to $\theta_0$ with a 95/5 odds ratio, or, in other words, that the uncertainty about the value of $\theta$ is reduced to a 5% zone.

(Robert 2007, p. 25; Pr for P)

Would we want to equate error probabilities with readiness to bet? As always it’s most useful to look at cases of poor or weak evidence. Suppose you arrive at a statistical significance level of 0.2. We would be entitled to say we’re as ready to bet on $\theta > \theta'$ as on the occurrence of an event with probability 0.8. I don’t think we’d want to be so entitled. The default Bayesian replies, this just means the default prior doesn’t reflect my beliefs. OK, but recall the question at the outset of this tour: why assume we want a posterior probability on statistical hypotheses, in any of the ways now available? The default Bayesian was to supply the (ideally) unique prior to use, not send us back to subjective priors.

The Bayesian treats the blood-pressure example very differently if the null is a point such as $\theta = 0$, whereas there’s no difference for a frequentist. The spike and smear priors surveyed in Excursion 4 are common. Greenland and Poole (2013) suggest:

[A] null spike represents an assertion that, with prior probability $q$, we have background data that prove $\theta_t = 0$ with absolute certainty; $q = 1/2$ thus represents a 50–50 bet that there is decisive information literally proving the null. [Otherwise a] spike at the null is an example of ‘spinning knowledge out of ignorance.’

(p. 66)

This is an interesting construal. Instead of how strongly you believe the null, it’s how strongly you believe in a proof of it. That decisive information exists (their second clause) is weaker than actually having it (their first clause), but both are stronger than presuming they arise from a noncommittal “equipoise.” Of course, the severe tester wants to know how strong the existing demonstration of H is, not how strong your belief in such a demonstration is.
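How differently a point null is treated can be made concrete with a small calculation. The setup is an assumed illustration, not from the text: prior mass $q = 1/2$ on $\theta = 0$, the remaining mass spread as $N(0, \tau^2)$ with $\tau = 1$, $\sigma = 1$, $n = 100$, and an observed mean giving $z = 1.96$ (two-sided p-value about 0.05). The sketch compares the spike-and-smear posterior probability of the point null with the flat-prior posterior probability of the one-sided null $\theta \le 0$ from the same data; the function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spike_and_smear_posterior(xbar: float, sigma: float, n: int,
                              q: float = 0.5, tau: float = 1.0) -> float:
    """Pr(theta = 0 | xbar) with prior mass q at theta = 0 and the rest spread as N(0, tau^2).

    Xbar ~ N(theta, sigma^2 / n). Integrating theta out of each prior component gives
    the marginal density of xbar under the spike and under the smear.
    """
    se2 = sigma**2 / n
    m_spike = normal_pdf(xbar, 0.0, se2)
    m_smear = normal_pdf(xbar, 0.0, se2 + tau**2)
    return q * m_spike / (q * m_spike + (1.0 - q) * m_smear)


def flat_posterior_one_sided(xbar: float, sigma: float, n: int) -> float:
    """Pr(theta <= 0 | xbar) under a flat prior on theta."""
    return normal_cdf(-xbar / (sigma / math.sqrt(n)))


# Assumed illustration: sigma = 1, n = 100, observed mean giving z = 1.96.
sigma, n = 1.0, 100
xbar = 1.96 * sigma / math.sqrt(n)
point_null = spike_and_smear_posterior(xbar, sigma, n)
one_sided = flat_posterior_one_sided(xbar, sigma, n)
print(f"spike-and-smear Pr(theta = 0 | x) = {point_null:.3f}")
print(f"flat-prior Pr(theta <= 0 | x)     = {one_sided:.3f}")
```

Under these assumptions the spike-and-smear posterior on $\theta = 0$ is about 0.6, while the flat-prior posterior on $\theta \le 0$ is about 0.025: the same $z = 1.96$ that the frequentist reads as statistically significant leaves the point null more probable than not. This is the kind of disagreement usually discussed under the heading of the Jeffreys-Lindley paradox, and for fixed $z$ it grows with $n$.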

Some subjective Bayesians would chafe at the idea of betting on scientific hypotheses or theoretical quantities. For one thing, it’s hard to imagine people would be indifferent between a bet they know will be settled and one that is unlikely to be – as in the case of most scientific hypotheses. No one’s going to put their money down now (unless they get interest). Still, cashing out Bayesian uncertainty with betting seems the most promising way to “operationalize it.” Other types of scoring functions may be used, but still, there’s a nagging feeling they leave us in the dark about what’s really meant.

For both subjectivist and objectivist [default] Bayesians, probability models including both parameter priors and sampling models do not model the data-generating process, but rather represent plausibility or belief from a certain point of view.

(Gelman and Hennig 2017, pp. 990–1)

Yet Gelman et al. (above) suggested expressing uncertainty as if a parameter’s “value could be thought of as a random realization from the prior distribution” (2014, p. 34). If this is bending your brain, then you’re getting it. Claims like it’s “the knowledge [of fixed but unknown parameters] that Bayesians model as random” (Gelman and Robert 2013, p. 4) feel as if they ought to make perfect sense, but the more you think about them, the more they’re liable to slip from grasp. For our purposes, let’s understand claims that unknown quantities have probability distributions in terms of a person or persons who are doing the having – by assigning different degrees of belief (or other weights) to different parameter values.

The Probabilities of Events

Many Bayesian texts open with a focus on probabilities of simple events, or statements of events, like “heads” on the toss of a coin. Because these texts focus on probabilities of events that even frequentists condone, the reader may wonder what all the fuss is about. The problem is, the central point of contention between Bayesians and frequentists is whether to place probabilities on parameters in a statistical model. It isn’t that frequentists don’t assign probabilities to events and to statistics based on them. It is that they recognize the need to infer a statistical model, and hypotheses about its parameters, in order to get those probabilities. How does that inference occur? It rests on probabilistic properties of a test method, which is very different from the deductive assignment of probability to hypotheses.

The severe tester uses probabilities assigned to events like {test $T$ yields $d(X) > d(x)$} to detach statistical inferences. She might argue: If $\Pr\{d(X) > d(x); H'\}$ is not very small, infer there’s a poor indication of a discrepancy $H'$. Computing $\Pr\{d(X) > d(x); \theta\}$ for varying $\theta$ tells me the capability of the test to detect various discrepancies from a reference value of interest. This does not give a posterior probability to the hypothesis, but it allows making statistical inferences which are qualified by how well or poorly tested claims are.
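Computing $\Pr\{d(X) > d(x); \mu\}$ over a range of $\mu$ is straightforward in the Normal setting of Exhibit (i). A minimal sketch, assuming $d(X) = \sqrt{n}(\bar{X} - \mu_0)/\sigma$ with $\sigma$ known and the illustrative values $\sigma = 1$, $n = 25$, $\mu_0 = 0$, $\bar{x} = 0.4$ (so $d(x) = 2$); the function name `prob_larger_distance` is hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_larger_distance(d_obs: float, mu1: float, mu0: float, sigma: float, n: int) -> float:
    """Pr{d(X) > d_obs; mu = mu1}, where d(X) = sqrt(n) * (Xbar - mu0) / sigma.

    Under mu = mu1, d(X) is Normal with mean sqrt(n) * (mu1 - mu0) / sigma and variance 1.
    """
    shift = math.sqrt(n) * (mu1 - mu0) / sigma
    return 1.0 - normal_cdf(d_obs - shift)


# Illustrative (assumed) values: sigma = 1, n = 25, mu0 = 0, observed mean 0.4.
mu0, sigma, n = 0.0, 1.0, 25
xbar = 0.4
d_obs = math.sqrt(n) * (xbar - mu0) / sigma      # observed distance d(x) = 2.0
probs = {mu1: prob_larger_distance(d_obs, mu1, mu0, sigma, n) for mu1 in (0.0, 0.2, 0.4, 0.6)}
for mu1, pr in probs.items():
    print(f"mu = {mu1:.1f}: Pr(d(X) > d(x)) = {pr:.3f}")
```

The four probabilities come out near 0.02, 0.16, 0.50 and 0.84. By the reasoning just given, the result offers poor indication of discrepancies as large as 0.4 or 0.6, since a larger $d(X)$ would be common were $\mu$ that large, while the value at $\mu = 0$ is small.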

True, when the frequentist assigns a probability to an event, it is seen as a general type, whereas a Bayesian can assign subjective probability to a unique event on November 8, 2016! Or so it is averred. But when they appeal to bets by reference to events with known probabilities, aren’t they viewing it as a type? (“That’s the kind of thing I’d bet 0.9 on.”)

Cox points out that even subjectivists must think their probabilities have a frequentist interpretation. Consider n events/hypotheses:

… all judged by You to have the same probability p and not to be strongly dependent … It follows from the Weak Law of Large Numbers obeyed by personalistic probability that Your belief that about a proportion p of the events are true has probability close to 1.

(Cox 2006a, p. 79)
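Cox’s appeal to the Weak Law of Large Numbers can be illustrated with exact binomial arithmetic. As a simplifying assumption (Cox requires only that the events not be strongly dependent), the sketch treats the n events as fully independent, each with probability p = 0.8, and computes the probability that the proportion true lies within 0.05 of p. The function name `prob_proportion_near_p` is hypothetical.

```python
import math


def prob_proportion_near_p(n: int, p: float, eps: float) -> float:
    """Exact Pr(|K/n - p| <= eps) for K ~ Binomial(n, p), with 0 < p < 1.

    K counts how many of n independent events, each with probability p, occur.
    The pmf is computed in log space (via lgamma) to avoid overflow for large n.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= eps:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1.0 - p))
            total += math.exp(log_pmf)
    return total


# Assumed illustration: events each judged to have probability 0.8, tolerance 0.05.
results = {n: prob_proportion_near_p(n, 0.8, 0.05) for n in (10, 100, 1000)}
for n, pr in results.items():
    print(f"n = {n:5d}: Pr(proportion within 0.05 of 0.8) = {pr:.4f}")
```

The probability rises from about 0.30 at n = 10 to nearly 1 at n = 1000, which is the sense in which Your belief that roughly a proportion p of the events are true has probability close to 1.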

This suggests, Cox continues, that to elicit Your probability for H you try to find events or hypotheses that you judge for good reason to have the same probability as H , and then find out what proportion of this set is true. This proportion would yield Your subjective probability for H . Echoes of the screening model of tests (Section 5.6). Here the (hypothetical or actual) urn contains hypotheses that you thus far judge to be as probable as the H of interest. If the proportion of hypotheses in this urn that turned out true was, say, 80%, then H would get probability 0.8. It would be rare to know the truth rates of the hypotheses in this urn – would it be the proportion now assigned probability 1 by the subjectivist? Perhaps the proportion not yet falsified could be used.

Still, this would be a crazy way to actually go about evaluating evidence and hypotheses! But what if you considered H as if it were randomly selected from an urn of hypotheses that had passed severe tests, perhaps made up of claims in the same field? You check the relative frequency of those that are true or have held up so far. You’d still need to show why you’re putting H in the high-severity urn. In other words, you would have circled right back to the initial assignment of severity. All you’d be doing is reporting how often severely corroborated claims are true, or continue to solve their empirical problems (predicting or explaining). There would be nothing added by the imaginary urn.

A different attempt to assign a frequentist probability to a hypothesis H might try to consider how probable it is that the universe is such that H is true, considering fundamental laws, other worlds, multiverses, or what have you. One might consider the rarity of possible worlds that would have such a law. Even if we could somehow compute this, how would it be relevant to assessing hypotheses about this world? Here’s C. S. Peirce:

[The present account] does not propose to look through all the possible universes, and say in what proportion of them a certain uniformity occurs; such a proceeding, were it possible, would be quite idle. The theory here presented only says how frequently, in this universe, the special form of induction or hypothesis would lead us right. The probability given by this theory is in every way different – in meaning, numerical value, and form – from that of those who would apply to ampliative inference the doctrine of inverse chances.

(Peirce 2.748)

This objection, I take it, is different from trying to determine, on theoretical principles, how “fine tuned” this world would have to be for various parameters to be as we estimate them. Those pursuits, whose validity I’m in no position to judge, are aimed at deciding whether we should fiddle with theoretical assumptions so that this universe is not so “unnatural.”

6.3 Unification or Schizophrenia: Bayesian Family Feuds

COX: There’s a lot of talk about what used to be called inverse probability and is now called Bayesian theory. That represents at least two extremely different approaches. How do you see the two? Do you see them as part of a single whole? Or as very different?

MAYO: It’s hard to give a single answer, because of a degree of schizophrenia among many Bayesians. On paper at least, the subjective Bayesian and the so-called default Bayesians … are wildly different. For the former the prior represents your beliefs apart from the data, … Default Bayesians, by contrast, look up ‘reference’ priors that do not represent beliefs and might not even be probabilities, … Yet in reality default Bayesians seem to want it both ways.

(Cox and Mayo 2011 , p. 104)

If you want to tell what’s true about today’s Bayesian debates, you should consider what they say in talking amongst themselves. I began to sense a shifting of sands in the foundations of statistics landscape with an invitation to comment on Jim Berger (2003). The trickle of discontent from family feuds issuing from Bayesian forums pulls back the curtain on how Bayesian–frequentist debates have metamorphosed. To show you what I mean, let’s watch the proceedings of a conference at Carnegie Mellon, published in Bayesian Analysis (vol. 1, no. 3, 2006), in the museum library. Unlike J. Berger’s (2003) attempted amalgam of Jeffreys, Neyman, and Fisher (Section 3.6), here it’s Berger smoking the peace pipe, making “The Case for Objective Bayesianism” to his subjective compatriots (Section 4.1). The forum gives us a look at the inner sanctum, with Berger presenting a tough love approach: If we insist on subjectivity, we’re out. “[T]hey come to statistics in large part because they wish it to provide objective validation of their science” (J. Berger 2006, p. 388).
