Statistical Inference as Severe Testing


by Deborah G Mayo


  mixture of, 171

  standard normal, 143 , 241 , 326 , 348 , 357 , 378

  one-sided test of mean of, 142 – 145 , 323 , 348 , 357

  two-sided tests of mean of, 42 , 248 , 257 , 430

  normative epistemology, 54 , 422

  normative statistical requirements, 437 , 441

  Nosek, B., 106

  novelty requirement, 90 – 92 and severity, 92

  and eclipse tests, 119n1

  temporal, 90 – 91

  theoretical, 90 – 91 , 96 , 119n1

  use, 90 – 91

  Novick, M., 288 – 289

  nuisance parameters, 385 , 392 , 411 , 433 replaced by sufficient statistic, 385

  null hypothesis significance testing (NHST), 94 illicit animal, 179 , 438

  O’Hagan, T., 202 , 213 – 214 , 412

  O’Neil, C., 229

  objectivity, 4 three requirements, 223

  and Bayesian priors, 232 – 234

  as embracing subjectivity, 232

  in epistemology, 235 – 236

  and equipoise priors, 231

  idols of, 224

  and journalism, 232

  and model checking, 298

  and observation, 225

  repertoire of mistakes, 235

  and sampling distribution, 231

  in statistics, 221

  and transparency, 236 – 237

  and triangulation, 237 , 423

  washout theorems, 231 – 232 ; see also default/non-subjective priors

  Oishi, S., 101

  optional stopping, see stopping rules

  outcome-switching, 40 , 439

  Overbye, D., 210 – 211

  P -curves, 285

  P -value distribution, 151 , 325 as measuring sensitivity (D. Cox), 151

  P -values, 4 actual (computed) vs. reported (nominal), 17 , 43 , 179 , 274 , 303

  invalidated by selection effects, 17 , 285

  and N-P tests, 138 , 175 ; see also N-P tests

  Bayesian P -values, 305 , 433

  can’t be trusted except when used to show can’t be trusted, 284 – 285

  and error probabilities, 173 – 176 , 440

  exaggerate evidence, 246 – 253 , 260 – 264 , 332 , 411 , 440 – 441

  police, 204

  precise vs imprecise, 333 – 334 , 366 ; see also significance tests

  paradox of replication, 270 – 271 , 441

  Parameterized Post Newtonian (PPN) framework, 160 – 161

  Pascal’s wager, 381

  Pearson, E., 8 , 37 , 47 , 50 , 55 , 59 , 64 – 65 , 83 , 86 , 88 , 93 , 95 , 121 , 132 – 133 , 135 – 137 , 144 , 146 – 147 , 151 , 164 – 165 , 172 – 174 , 182 , 189 , 239 , 269 , 285 , 341 , 371 , 379 – 381 , 384 – 388 , 392 , 403 – 404 , 421 1933 paper with Neyman, 371 – 378

  answering Fisher criticism, 140 , 177 , 388

  armour-piercing, 180 – 181

  on Bayesian priors, 226 , 404

  and inferential construal, 180 – 181 , 380 – 381 , 391 – 392

  in love with woman his cousin was to marry, 137

  on power (post data), 324

  rejects behaviorism, 127 , 180 , 381 – 382 , 391

  and tail areas, 169

  and three steps in test construction, 131 , 178 , 386 ; see also Neyman and Pearson

  Pearson, K., 120 – 121 , 131 – 132 , 140 , 146 , 189 , 386 , 404

  Peirce, C. S., 18 , 86 on faulty analogy of induction and deduction, 64 – 66 , 380

  on inverse chance, 408 – 409

  on justifying induction, 113 – 114 , 267 , 307

  randomization and predesignation, 89 , 267 , 288n9

  and testing assumptions, 307

  Perez, B., 6 , 18

  performance construal of N-P, 174 – 178 vs. severity, 139 – 140

  and Fisher’s fiducial probability, 382 , 390 ; see also N-P tests

  pest control, 299 – 300

  phenomenon vs. data, 121

  philosophy in statistical methodology, xii , 4 – 5 , 8 – 14 , 28 , 49 , 73 , 114 , 432 and cheating, 46 , 270 , 332

  in identifying good science, 77

  severe testing philosophy, 23 , 195 , 437 , 444

  Bayesian philosophy, 24 , 26 , 396

  Pickrite method, 19 , 30 , 51 , 276

  piecemeal testing, 162 , 308 , 380 , 400 , 443 division of labor, 85 , 392 , 423

  Pigliucci, M., 78

  Playfair, L., 17 – 18

  Pleiades, 373

  Poole, C., 26 , 256 – 257 , 264 , 406

  Popper, K., 8 – 9 , 27 , 40 , 59 , 66 – 68 , 72 – 73 , 75 – 80 , 82 – 93 , 95 – 96 , 114 , 119 , 125 – 126 , 159 , 195 , 209 , 227 , 229 , 237 , 259 , 294 , 390 , 433 ; see also falsification

  positive predictive value (PPV), see diagnostic screening

  posterior predictive distribution, 433 – 434 Duhem’s Problem in, 435 ; see also M-S tests

  Potti, A., 6 , 13 , 18 , 97 , 230

  power, 135 – 139 3 roles accorded by N-P, 324

  and the clinically relevant/irrelevant difference, 326 – 327

  Cohen’s snafu, 324

  detailed discussion, 323 – 341

  fallacious transposition in, 331

  and Fisher, 325

  how to increase, 325

  incomplete concept, 353

  low power and violated assumptions, 361

  predata and postdata, 323 – 325

  retrospective (post hoc), 353 – 359 , 359 – 361

  and severity, 323 – 332

  and Type II error probability, 138 ; see also power attained

  power analysis, 323 and CIs, 356 – 358

  fallacy of, 353 – 356

  Jacob Cohen on, 324 , 338

  and Neyman, 339

  ordinary, 340

  ordinary vs. shpower, 355

  vs. severity, 338 , 343 , 350

  and significance test reasoning, 339

  power attained (att power), and attained sensitivity Π ( γ ), 151 , 164 , 196 , 324 , 342 – 343 , 355n2 , 358

  Power Peninsula, 323 , 353 – 354 , 382

  Pratt, J., 44 , 175 , 240 , 248 , 252 , 254 , 339

  precautionary principle, 341

  prespecification/predesignation, 40 , 106 , 269 , 373 – 377 , 438 and error probabilities, 286 , 320 ; see also novelty requirement

  principle of indifference, 386 , 391 , 400

  prions, 81 – 82 , 85 , 88 , 109 – 110 , 238 prion protein (PrP), 109

  protein folding (pN), 109

  protein misfolding (pD), 110

  probabilistic instantiation fallacy, 367

  probabilistic reduction (Spanos), 312 dememorized data, 316

  detrended data, 307 , 316

  lags, 314 , 316

  menu of assumptions, 312

  reparameterizations, 320n5

  respecification, 313 ; see also M-S tests

  probability, roles of probabilism, performance, and probativeness, 13 – 14 , 24 – 27 , 33 , 77 , 436 – 437

  avoiding the need for different, 54 – 55 , 429

  events vs. hypotheses, 407

  formal vs. informal meanings, 10 , 194 , 214 , 427

  performance vs. severity, 15 , 26 – 27 , 50 , 54 , 162

  probabilism and performance, 428

  probativism vs. probabilism, 127 , 226

  variability vs. belief, xi , 54 , 80 , 428 ; see also methodological probability

  probable errors, 124

  probare , 10 , 226 , 423

  protein misfolding cyclical amplification (PMCA), 110

  Prusiner, S., 81 – 82 , 109 – 110 , 238 , 369

  pseudoscience, see Demarcation Problem

  questionable research practices (QRPs), 20 , 78 , 98 , 267 , 271 , 292 , 439

  quicksand, 183 , 187 – 188 , 367 , 402

  radical skepticism, 229

  Raftery, A., 305

  Raiffa, H., 44

  randomization, 286 – 289 possible Bayesian home for, 288 – 289

  and cloud seeding, 126


  and deliberation, 292 , 294

  in GWAS, 293

  and C. S. Peirce, 18 , 267 , 288n9

  and the philosophers, 289 – 290

  Poverty Action Lab (MIT), 290 – 291

  randomized controlled trials (RCTs), 98

  RCT4D, 290 – 292

  rational reconstruction, 8 , 73 , 85 , 162

  Ratliff, K., 101

  real random experiments, 111 , 298 – 299

  realism vs. antirealism, 79 severe tester agnostic on, 297

  theoretical mistakes, 297

  Reich, E., 210

  Reid, C., 120 – 121 , 137 , 139 , 141 , 146 , 189 – 190 , 372 , 387 – 388 , 404

  Reid, N., 54 , 186 , 392 , 396 , 429

  rejection ratio, 337 – 338

  repertoire of errors, 89 , 234 , 308 , 400 , 414 , 442 in selection effects, 279

  replicability/reproducibility, 6 , 20 , 28 ASA definition, 97

  and diagnostic testing, 368

  equivocation in, 246

  and predesignation, 270 , 320

  and Popper, 82 – 83

  replication (crisis), 59 , 89 , 156 , 221 , 361 in GWAS, 295

  in psychology, 78 , 97 – 107 ; see also paradox of replication

  residuals, 298 , 303 , 310 – 311 , 317 small residuals vs. adequacy, 318

  rigged hypothesis, 108

  Robbins, H., 390 , 404

  Robert, C., 401 , 402 , 406 , 407 , 413 , 428

  Romano, J., 172 , 175 , 191

  Rosenkrantz, R., 40 , 69 , 269 , 319 – 320 , 419n9

  Rosenthal, R., 239

  Rothman, K., 264 , 276 , 272n3

  Royall, R., 33 – 39 , 41 , 44 , 50 , 52 , 68 , 70n5 , 82 , 212 , 225 , 243 , 283 , 319 , 332 , 421

  rubbing-off construal, 65 , 194 , 244 , 391 , 429

  Rubin, D., 47 , 433

  Salmon, W., 64 , 95 , 114 , 310

  Samanta, T., 51 , 124 , 305n2 , 402 – 403 , 405 , 431 , 434

  sampling distribution, 32 , 142 and error probabilities, 130 , 173 , 428 , 438

  and bootstrapping, 306

  and frequentist objectivity, 231

  relevant, 199

  as testable meeting ground, 178

  sampling plan, freedom from, see Likelihood Principle; stopping rules

  sampling theory/philosophy, 55 , 172

  Sanna, L., 284

  Savage Forum 1959, 46 – 48 , 420 – 421 , 430

  Savage, L., 8 , 41 – 43 , 44 – 50 , 173 , 214 , 228 , 230n3 , 248 , 252 , 256 , 260 , 269 , 287 , 302 , 397 , 401 , 417 , 420 , 424 , 430 , 432

  Schachtman, N., 272n3

  schizophrenia and split personality, 409 , 414 , 424 , 436 , 434

  Schlaifer, R., 44

  Schnall, S., 100

  Schweder, T., 195 , 392n10

  SCOTUS, 272n3

  Sebastiani, P., 293

  Seidenfeld, T., 411n6

  selection effects, biasing, 3 , 19 , 21n2 , 40 – 41 , 78 defn., 92 , 285 , 437 ; see also stopping rules

  adjusting for, 154 , 268 , 275 – 277 , 364 – 365 , 418

  and auditing, 234 , 267 , 269

  NHST, 95

  preregistration, 106 , 266 , 275 , 286 ; see also Bonferroni correction

  self-correcting, 20 , 162 induction as, 114 , 307 ; see also arguments from error and coincidence

  self-sealing fallacy, 103

  Sellke, T., 175 – 176 , 184 , 248 – 252 , 258 – 260 , 338

  Selvin, H., 274 , 279

  semantic entailment: severity version, 65

  Senn, S., 151 , 162n11 , 247 , 251 – 253 , 259n8 , 264 – 265 , 266 , 287 – 288 , 290 , 293n12 , 326 – 327 , 336 , 345 – 346 , 365n9 , 366 , 413 – 414 , 417 – 419

  sensitivity, achieved or attained, 151 function, Π ( γ ), 151 – 152

  and severity, 152 ; see also power attained

  sequential trials, 47 ; see also stopping rules

  severe tester, tribal features, 9 , 27 , 114 , 437 on comparativism, 79 , 421 , 441

  on the demarcation of science, 88 – 89

  and Duhem’s problem, 85 – 86

  on improving confidence intervals, 194 , 244 – 245 , 358 , 429 , 442

  interpretation of probable flukes, 217

  on large-scale theories, 129

  on Likelihoodism and the LP, 39 – 41 , 48 – 50 , 72

  new name, 55

  vs. the N-P behavioristic prison, 140

  vs. Popperian severity, 83

  on the revolution in psychology, 100 , 103 – 104 , 107 , 370

  solving the problem of induction, 107 – 114

  on statistical falsification, 235

  on statistical objectivity, 320

  and translation guide, 52

  severity, 5 applied, water plant accident, 143 – 145

  attained power, 342 – 343

  and confidence levels, 193

  and difference between two means, 345 – 346

  disobeys the probability axioms, 423

  and explanatory content/informativeness, 79 – 80 , 237

  and Fisherian tribes, 146

  function (SEV), 143

  and large-scale theories, 128 , 162

  in meta-methodology, 9 , 32

  and Popperian corroboration, 72 , 75 , 87

  vs. power analysis, 343

  and replicability, 370

  and sensitivity, 152

  when not calculable, 200

  severity curves, 348 – 349 , 360n4

  severity interpretation of negative results (SIN), 143 – 145 , 152 , 212 , 343 , 346 – 347 , 351 ; see also severity

  severity interpretation of rejection (SIR), 143 , 265 – 266 , 351

  severity requirement/principle, 5 , 92 , 125 , 258 and biasing selection effects, 92

  and error control, 269

  and failed replication, 158 , 266

  to block fallacies of rejection, 144 , 357

  as heuristic tool, 12 , 264

  informal, 109

  from low P -value, 209

  as minimal principle of evidence, 5 , 396

  for non-significance (Higgs), 212

  in terms of solving a problem, 300

  vs. fit measures, 72

  weak and strong, 22 , 108

  strong, 14

  Sewell, W., 280

  sexy science: severe testing in large-scale theories, 121 , 163 , 300

  Shaffer, J., 275

  Shalizi, C., 27 , 432 , 434

  Sharpe, G., 137

  shpower (retrospective power analysis), 354 – 356 howlers of, 355 – 356

  vs. severity, 356 ; see also power analysis.

  significance levels as predesignated, 137

  attained vs. predesignated, 173 – 175 , 177 ; see also P -values

  significance tests vs. comparativism, 35

  Cox definition of, 93

  criticisms of, 93 – 95 , 438 ; see also chestnuts and howlers of tests

  fallacies of rejection/non-rejection

  falsifying alternatives in, 159

  Fisherian (pure/simple), 132 , 150

  in Higgs, 202

  roles for model testing and discovery, 298 – 304 ; see also M-S tests

  simple or point hypotheses, 33

  test T+, 144 ; see also Neyman and Pearson (N-P) Tests

  Silberstein, L., 127

  Silver, N., 232 – 233

  similar tests, 385 , 386n6

  Simmons, J., 43 , 237 , 270

  Simonsohn, U., 43 , 237 , 270 , 284 , 285

  Singh, K., 391

  skin off your nose, 273

  Skyrms, B., 62 , 73

  Slovic, P., 422

  Smeesters, D., 284

  Smith, C., 47

  Smith, H., 339n4

  Sober, E., 35 – 36 , 47 , 92n3 , 242 , 317 – 318 , 380 – 381

  Spanos, A., 120 , 133 – 134 , 139 , 146n5 , 200 , 254 – 255 , 305 , 308 , 312 – 313 , 317 – 319 , 331 , 352n10 , 355 , 367 , 387 , 426

  Spiegelhalter, D., 204 – 205 , 401 , 404

  spike and smear priors, 239 , 248 , 250 – 251 , 259 , 336 Bayesian justification for, 251

  coffee shop, 257

  criticisms of, 252n4 , 256 , 259 , 406 , 440

  cult of the holy, 252

  severe tester on, 257 – 258

  spongiform diseases, 81 , see also kuru

  Sprenger, J., 307

  Sprott, D., 180 , 399 , 421

  spurious associations, 3 batch effects, 293

  in longevity study, 293 , 362

  population and number of shoes, 308 , 317

  sea level and price of bread, 317

  Staley, K., 203 , 235 – 236

  standard model (SM) physics, 203 , 206 , 214 , 215

  Stapel, D., 78 , 97 , 100 , 276

  statistical battles, current state of play, 11 – 12 , 23 – 28 , 395 – 397 , 400 – 402 , 444 proxy, 437 ; see also getting beyond statistics wars

  statistical fluctuations/flukes and Higgs, 202 – 205 , 210 – 212 interpreting, 214 – 215

  statistical inference, 7 , 20 , 42 , 65 – 66 , 174 and scientific theory appraisal, 119 , 202 ; see also inductive inference

  Statistical Methods for Research Workers (SMRW) , 387 Fisher backtracks, 182

  Statistical Power Analysis for the Behavioral Sciences (Cohen), 324

  statistical tests, elements of, 129 – 130 test hypothesis, 31 , 109 , 130 , 133 , 341

  test rule, 130

  test statistic, 34 , 94 , 129 , 132 , 167

  test statistic, pivotal, 378

  statistical tests, properties of consistent, 136

  monotonicity, 134

  powerful, 135 – 136

  unbiased, 136 , 141

  Steegen, S., 105

  Stern, H., 433

  Stevens, S., 100

  Stigler, S., 288n9

  Stone, M., 254 , 413

  stopping rules/optional stopping, 42 – 53 , 170 , 270 and Bayesian intervals, 430

  principle, 43 , 54 , 431

  proper, 42

  statisticians on: Armitage, 47

  G. Barnard, 48

  J. Berger /Wolpert, 49 , 187 , 430 – 431

  G. Box, 303

  D. Cox and Hinkley, 45

  Savage (E, L, & S), 43 , 46 ; see also intentions

 
