To conclude, in public policy processes concerning wicked problems and large-scale risks such as climate change, trust in integrated policy assessment is central for decision-making under uncertainty. As an amendment to, and specification of, Oreskes’s compelling arguments concerning trust in scientific expertise, we argue that even these assessments of policies can be trustworthy and legitimate despite the controversial value judgments involved. An interdisciplinary and multi-stakeholder exploration of alternative future policy pathways and their various practical implications is required to facilitate legitimate learning processes about the pros and cons of specific pathways. In the end, this could lead to revisions of initially fixed values, policy goals, and means, and to the identification of areas of practical overlap between divergent sets of values. By contrast, merely insisting on scientific “facts” or criticizing right-wing policy beliefs and values on an abstract level leads to fruitless ideological controversies. What we need instead are collaborative and inclusive learning processes about alternative futures that acknowledge and critically explore a range of values; such processes are a promising response to the renaissance of populism, which is itself rooted in divergent sets of values.
Chapter 6
COMMENTS ON THE PRESENT AND FUTURE OF SCIENCE, INSPIRED BY NAOMI ORESKES
Jon A. Krosnick
Inspired by Dr. Oreskes, my comments in this essay come not from the perspective of a historian or an expert on the philosophy of science but from that of a practitioner of science, observing the present and future of our enterprise.
I believe in the scientific quest for truth, and I believe in the scientific method. I’m glad that contemporary societies value scientific investigation, fund our work, and give us prominence in the news media. I want more young people to choose careers in science. I want scientific disciplines and professional associations to thrive. I want funding of science to increase. And I am looking forward to seeing how scientific discoveries unfold in the coming decades in constructive ways.
To help science to thrive, I cofounded, with Professor Lee Jussim from Rutgers University, the Group on Best Practices in Science (BPS) at the Center for Advanced Study at Stanford University. There are countless stories of scientific successes over many years, and in numerous instances science has gotten off track for a little while before eventually righting itself. So one can look glowingly at the long-term history of science and smile. But very recent history tells a more distressing and alarming story. And the problem now is not a particular finding that is wrong. During the last decade, we have discovered numerous inefficiencies in science across many disciplines, and dramatic reform is needed, as I will outline in this essay.
My story begins in the field of my PhD, social psychology, with Diederik Stapel, who was the focus of a story in the New York Times because he had fabricated the data in more than one hundred publications in top journals in psychology.1 After this was discovered, numerous papers were retracted, and young coauthors suffered significantly in the process.
Daryl Bem, a very well-known social psychologist at Cornell University, published a paper in the top journal of social psychology claiming to show that extrasensory perception, ESP, was real.2 It set off a firestorm, because the results seemed implausible from the start and could not be reproduced.
John Bargh, a professor at Yale University, produced a huge amount of beloved work in social psychology. When a group of young scholars sought to reproduce one of his findings and failed, widespread concern arose about the replicability of other findings as well.3 Daniel Kahneman, who won the Nobel Prize in economics, urged Dr. Bargh to engage with the critics of his work to pull the field toward an understanding of which empirical findings are real. But no such reconciliation has yet occurred.
At one time the most downloaded article in the history of the New Yorker magazine was an essay written by Jonah Lehrer about the so-called decline effect.4 In the piece, psychologist Jonathan Schooler explained how he discovered an important phenomenon, called verbal overshadowing, but the more he studied it, the weaker the effect became, until it disappeared entirely.
Another landmark paper described what were called “voodoo correlations.”5 In studies of neuroscience, gigantic correlations were being published between brain functioning and other indicators of psychological experience, and those correlations later appeared to have been the fabricated results of manipulative research practices.6
Consider Phil Zimbardo’s prison experiment, in which a group of participants were randomly assigned to be guards or prisoners in the basement of the psychology building at Stanford.7 The BBC, a few years ago, attempted to conduct the same study and failed to observe the same results.8
One of the most famous studies in social psychology, by Leon Festinger on cognitive dissonance, documented the difference between how people reacted to a task when they were paid one dollar versus when they were paid twenty dollars.9 This study has been cited numerous times over the years but has never been successfully replicated, as far as I know. Most importantly, Festinger himself is famous for having said that he had to run the study numerous times, tweaking the methodology, before he got it to “work”—that is, to produce the result he wanted.
Yet another example: a paper on what’s called biased assimilation and attitude polarization.10 The authors concluded that if a person reads a balanced set of evidence, with about half supporting a particular conclusion and the other half refuting that conclusion, the person evaluates the evidence so as to protect his or her predilections. As a result, reading a balanced set of evidence was said to have made people’s opinions on the matter even more extreme than were their original views. But A. G. Miller and coauthors showed that the original paper was incorrect because it had used an improper measurement approach.11 The original paper has been cited more than 3,000 times, and Dr. Miller’s paper has been cited just 136 times. This is not an example of science correcting itself successfully.
These are not isolated incidents, cherry-picked to tell a pessimistic story. Consider a story in the New York Times under the online headline “Many Psychology Findings Not as Strong as Claimed, Study Says.”12 This study, conducted in 2015 by Brian Nosek and colleagues, attempted to replicate the findings of many highly prestigious, randomly selected publications in psychology. When the story appeared in print, the headline was “Most Psychology Findings Cannot Be Replicated.” And that is, in fact, the conclusion of the paper. A random selection of studies, an aggressive effort to try to reproduce the findings, and majority failure.
This problem is not confined to my discipline of psychology alone. Consider political science. A paper entitled “When Contact Changes Minds” was published in Science exploring the idea that conversations on doorsteps could change attitudes about gay marriage.13 The paper was also written about in the New York Times following the discovery that although the principal author claimed to have collected data, he had not.14 The whole study had been fabricated. In economics, a series of papers shows that in attempts to replicate the findings of empirical studies, the majority could not be reproduced.15 And surveys of voters failed spectacularly in recent efforts to predict the outcomes of elections in the United States, Britain, and Israel.16
This problem has been illustrated in the natural sciences as well, especially vividly by Amgen Pharmaceuticals, whose scientists attempted to replicate fifty-three landmark findings, published in the most prestigious journals, Science, Nature, and Cell.17 Amgen had tried to develop new drugs based on such findings and had failed repeatedly. So Amgen stepped back to the fundamentals—to determine whether they could trust what they read in these journals. A team of one hundred scientists found that 89 percent of the findings they tried to reproduce could not be reproduced.18 When the Amgen scientists told one study’s author that Amgen had tried numerous times to replicate his or her finding and couldn’t, the original author said that he or she had failed a good number of times before finally producing the desired findings.
When Amgen went public, Bayer Pharmaceuticals reported having had the same experience.19 They tried to replicate sixty-seven published findings, and 79 percent of them could not be reproduced. And problems have become vivid in chemistry as well. In recent years, an increasing number of publications included doctored graphs that made findings look better than they were.20 And in ecology, genetics, and evolutionary biology, findings were published and then disappeared, never to be reproduced.21 In line with all of these trends, the website called Retraction Watch has documented a skyrocketing number of retractions of published articles.
When the BPS group at Stanford spoke to engineers about their experiences in this regard, we were stunned by a shocking comment. When asked whether there are problems with reproducibility and integrity in engineering, they replied, “Truthfully, we don’t believe the findings of any other labs.” So we asked, “Do you believe the findings from your lab?” and they said, “Sometimes.”
Why? In engineering, it’s not uncommon for authors to intentionally leave out a key ingredient of the recipe, so that competing labs can’t get ahead of them in the race to build, for example, a battery that lasts longer.
What about in a field where lives are at stake, medicine? John Ioannidis has been meta-analyzing research findings in health research for decades. One of his papers gauged the reproducibility of preclinical medical research and found that more than 50 percent of the studies could not be replicated even once; this lack of replication wastes $28 billion each year on research that leads to nothing.22 Another of his publications, “Why Most Published Research Findings Are False,” was at one time the most downloaded paper from the journal PLOS Medicine.23
Why is all this happening? How can it be that science is wonderful when all of these examples pile up to suggest we’re in trouble? One answer is that contemporary scientists engage in a variety of counterproductive, destructive practices.
One is called p-hacking: manipulating and massaging data in order to get desired results. Another problem is reliance on small sample sizes. If a study with a small sample doesn’t work out, an investigator can discard it and do another small study until desired results are obtained by chance alone, because the cost of doing each study is low. Another problem is improper calculation of statistics, leading to overconfidence in the replicability of a finding. Some disciplines have been in the habit of computing statistical tests in ways that are knowingly biased toward getting significant findings. If computed properly, the statistics would suggest more caution. And in some physical science fields, experiments never involve random assignment to conditions and do not involve statistical significance testing, which leaves investigators vulnerable to being misled.
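To make the small-sample point above concrete, here is a minimal simulation sketch of my own, not drawn from the essay itself; the function names and parameters (fifteen participants per group, ten attempts, a conventional .05 threshold) are illustrative assumptions. It shows how running many small studies of a nonexistent effect and keeping only the one that reaches statistical significance produces a “finding” by chance alone.

```python
# Illustrative simulation (an editorial sketch, not the author's analysis):
# many small studies of a true null effect, reporting only the attempt
# that happens to reach p < .05. All parameter choices are assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_small_study(n_per_group=15):
    """Simulate one two-group study in which the true effect is zero."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(0.0, 1.0, n_per_group)  # no real difference
    _, p_value = stats.ttest_ind(treatment, control)
    return p_value

def chance_of_publishable_result(n_attempts=10, n_simulations=5_000):
    """Probability that at least one of n_attempts null studies hits p < .05."""
    hits = 0
    for _ in range(n_simulations):
        if any(run_small_study() < 0.05 for _ in range(n_attempts)):
            hits += 1
    return hits / n_simulations

if __name__ == "__main__":
    print(f"~{chance_of_publishable_result():.0%} chance of a spurious 'discovery'")
```

Under these assumptions, roughly 40 percent of such ten-attempt sequences yield at least one spurious “significant” result, even though each individual study keeps the nominal 5 percent false-positive rate.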
Of course, accidental mistakes in statistical analysis will occur, so proofing is essential. But when a scientist has obtained desired results, it’s tempting to celebrate with optimism. And if undesired results are obtained, perhaps scientists are more motivated to check their work and more likely to detect errors. Thus perhaps errors go uncaught when results are desired.
How prevalent are suboptimal practices by scientists? In one survey, a large number of psychologists acknowledged having engaged in many such suboptimal practices.24 So perhaps the prevalence of irreproducible results should be no surprise.
All of us in science must confront this reality and marshal a desire to remedy the problems. To do so, we need to know what causes suboptimal behavior on the part of scientists. And, unfortunately, the causes are horrifying. There are individual-level motivators of scientists themselves: many scientists want to be famous, to get research grants, to be employed with tenure, to get outside job offers to increase their salaries, to get promoted, to be well paid, to found mega-profitable start-ups, to be respected by their peers, to be respected by non-scientists, and more.
If you put a scientist in a quiet dark room by himself or herself, he or she will most likely acknowledge that, almost always, we operate in an environment (in academia or outside) where these motivations are very clearly powerful for everyone. So we don’t need to know whether a project’s funding comes from ExxonMobil or the National Science Foundation. We’re all operating in an environment in which we have these motivations.
Everything’s fine if these motivations are coupled with a desire always to be right, always to publish the truth. But unfortunately, system-level causes carry us off that path. Systems value productivity, reward faculty who publish a lot, and don’t reward faculty who publish less. And most disciplines value innovation and unexpected findings. At Stanford, graduate students in psychology have been taught not to waste their time publishing something that people already thought was probably true. The goal should be to produce findings that would surprise people. And if that’s the goal, shouldn’t we always be asking ourselves whether a really surprising finding is surprising precisely because it contradicts existing theory and evidence, and is therefore unlikely to be real?
Systems value statistically significant findings, so journals have been biased against publishing null findings, which slows down the process of discovering that previously published findings are false.
Researchers want to publish a lot: they want to publish novel, counterintuitive findings; they want to tell a good story; they want to defend their own previous publications and reputations; they don’t want to admit that they didn’t predict their findings and that they were surprised. They want to disseminate findings as quickly as possible, and they want to attract the attention of the news media.
Institutions encourage all this, because universities increasingly use metrics counting publications and citation counts in tenure and promotion decisions. Journals are biased against messy results, where study findings are inconsistent from one to the next, and they disseminate innovative findings more quickly. News reporters sometimes cause problems as well, by asking for a simple, general conclusion when a qualified claim would be more justified. Journals have page limits, despite the fact that we no longer need paper to disseminate our work, and page limits restrict the degree to which we can be transparent about our methodology. In some cases, journals also favor findings that support particular political agendas. And, of course, research assistants often want to make their PIs happy, which can create motivations to produce certain findings instead of others.
My observations above are mostly speculations. Everything I have said could be wrong. But I might be right as well. As far as I know, no one is testing a theory like this to explain the behavior of scientists. And we need that sort of testing, because we now know that scientific literatures and popular media are filled with findings that cannot be reproduced. We don’t know as much as we say we know about the matters that we study. And if that’s true, we need to embrace the problem, and we need to get to work on solving it by implementing reforms that empirical evidence shows will work.
So this is a problem of social and behavioral science. It’s a problem of human psychology. It’s not a problem of chemistry. It’s not a problem of physics. And it is not a problem to be solved by intuition; it requires empirical research, informed by theory and using rigorous methods of investigation.
What about solutions? Many have been proposed, but in my opinion they are mostly band-aids. They make people feel good in the short term. But we don’t yet know whether they actually work to increase the efficiency of scientific investigation.
What conclusions can we reach? First of all, I don’t think that the source of funding is a primary problem for science. In fact, I would guess that’s among the least of our problems. A huge amount of research has been funded by federal agencies and private foundations that have no real agendas other than supporting scientists’ making discoveries as quickly as possible.
Rather than funding sources, the fundamental problem is with the incentives inherent in the world in which science operates today. Blanket discounting or acceptance of findings based on who paid for them is probably missing the point. The problem is that new technology has sped up the process of science. We hoped that technology would make science more efficient. But instead, science is either operating incredibly inefficiently or publishing a vast majority of findings that are false.
What is the path forward? First, we must acknowledge the problem. It does a patient no good for the doctor to withhold the information that he or she has cancer. Second, we have to identify the real causes of problematic behaviors, instead of speculating. Third, we need to develop solutions to undermine the counterproductive motivations that drive science in the wrong directions. Last, we should be scientists, and we should test the effectiveness of those solutions.
I hope that these thoughts, inspired by Dr. Oreskes’s lectures and essays, complement her essays with a focus on more contemporary history and the present of science. In offering these thoughts, I hope to encourage all scientists to consider this to be a good moment to stop and reflect, to try to learn from the past of science, and to redirect the present and future of science in ways that considerably enhance the efficiency of the enterprise in achieving its goals.
Chapter 7
REPLY
Susan Lindee has given us a brilliant exposition of the ways in which late twentieth-century scientists built a narrative stressing the distinction between science and technology and the reasons why they did so. During the Cold War, scientists’ ambivalence toward the project of building a massive nuclear arsenal expressed itself (in part) through their insistence that science and technology were separate domains. Nonoverlapping magisteria, we might say, to borrow Stephen Jay Gould’s famous formulation of the relation between science and religion, except that whereas Gould urged respect for both domains, scientists, Lindee argues, were constructing a thesis of separate and unequal: science was separate from, and superior to, technology because of its moral purity. To retain that purity, it needed to stay separate.