by Lee McIntyre
The experimenters reported that subjects found this task to be incredibly difficult. Indeed, just 10 percent got the right answer, which is that only the E and the 7 needed to be turned over. The E needs to be turned over because it might have an odd number on the other side, which would disprove the rule. The 7 also needs to be checked, because if there were a vowel on its other side, the rule would also be falsified. Subjects tended to be flummoxed by the fact that the 4 did not need to be turned over. For those who have studied logic, it might be obvious that the 4 is irrelevant to the truth of the rule, because it does not matter what is on the other side of the card. The rule only says that an even number would appear if there is a vowel on the other side of the card; it does not say that an even number can appear only if there is a vowel on the other side. Even if the other side of the card had a consonant, it would not falsify the rule for a 4 to appear on its obverse. Likewise, one need not check the card with the K. Again, the rule says what must follow if there is a vowel on one side of the card; it says nothing about what must be the case if there is not.
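For readers who prefer to see the logic spelled out, here is a minimal sketch in Python (the card faces are simply the ones from the example) that applies the rule "if a card has a vowel on one side, it has an even number on the other" and reports which visible faces could possibly falsify it:

```python
# A minimal sketch of the Wason selection task logic (card faces from the example).
# Rule: "if a card has a vowel on one side, then it has an even number on the other."
# Only a card that could hide the pair (vowel, odd number) can falsify the rule,
# so only visible vowels and visible odd numbers need to be turned over.

VOWELS = set("AEIOU")

def must_turn_over(visible_face: str) -> bool:
    """Return True if the hidden side of this card could falsify the rule."""
    if visible_face.isalpha():
        # A visible vowel might conceal an odd number (rule violated);
        # a visible consonant cannot violate the rule no matter what it hides.
        return visible_face.upper() in VOWELS
    # A visible even number cannot violate the rule;
    # a visible odd number might conceal a vowel, which would violate it.
    return int(visible_face) % 2 != 0

for face in ["E", "K", "4", "7"]:
    print(face, "-> turn over" if must_turn_over(face) else "-> leave alone")
# Prints that only E and 7 need to be turned over.
```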
The really exciting part of the experiment, however, is what happens when one asks subjects to solve this problem in a group setting. Here 80 percent get the right answer. Note that this is not because everyone else in the group merely defers to the “smartest person.” The experimenters checked for this and found that even in groups where none of the individual subjects could solve the task, the group often could. What this suggests is that when we are placed in a group, our reasoning skills improve. We scrutinize one another’s hypotheses and criticize one another’s logic. By mixing it up we weed out the wrong answers. In a group we are open to persuasion, but also to persuading others. As individuals, we may lack the motivation to criticize our own arguments. Once we have arrived at an answer that “seems right,” why would we bother to rethink it? In groups, however, there is much more scrutiny and, as a result, we are much more likely to arrive at the right answer. This is not because someone at the back of the room knows it, but because human reasoning skills seem to improve when we are placed within a group.14
In his book Infotopia: How Many Minds Produce Knowledge,15 Cass Sunstein explores the idea—based on empirical evidence such as that cited above—that there are myriad benefits to collective reasoning. This is sometimes referred to as the “wisdom of crowds” effect, but this populist moniker does not do justice to the case of science, where we correctly value expert opinion. As we shall see, scientific practice is nonetheless well described by Sunstein, who acknowledges the power of experts. And there are other effects as well. Sunstein explores three principal findings, each backed up by experimental evidence. On the whole, in human reasoning:
(1) groups do better than individuals,
(2) interactive groups do better than aggregative groups, and
(3) experts do better than laypeople.16
As we saw with the Wason selection task, the findings in numbers (1) and (2) are on full display. Groups do better than individuals not only because someone in the room might have the right answer (which would presumably be recognized once it was arrived at) but because even in those situations in which no one knows the right answer, the group can interact and ask critical questions, so that they may discover something that none of them individually knew. In this case, the collective outcome can be greater than what would have been reached by any of its individual members (or a mere aggregation of group opinion).
There is of course some nuance to this argument, and one must state the limiting conditions, for there are some circumstances that undermine the general result. For instance, where there is heavy pressure to defer to authority, or a strict hierarchy, groups can succumb to “groupthink.” In such cases, the benefits of group reasoning can be lost to the “cascade effect” of whoever speaks first, or to an “authority effect” whereby one subordinates one’s opinion to that of the highest-ranking member of the group. Indeed, in such cases the aggregative effect of groups can even be a drawback: everyone expresses the same opinion—whether they believe it or not—and group reasoning converges on falsehood rather than truth. To remedy this, Sunstein proposes a set of principles to keep groups at their most productive. Among them:
(1) groups should regard dissent as an obligation,
(2) critical thinking should be prized, and
(3) devil’s advocates should be encouraged.17
This sounds a good deal like science. Especially in a public setting, scientists are notably competitive with one another. Everyone wants to be right. There is little deference to authority. In fact, the highest praise is often reserved not for reaching consensus but for finding a flaw in someone else’s reasoning. In such a supercharged interactive environment, Sunstein’s three earlier findings may reasonably be combined into one: we would expect groups of experts who interact with one another to be the most likely to find the right answer to a factual question. Again, this sounds a lot like science.
I think it is important, however, not to put the emphasis on the group aspect of science so much as on the fact that science is an open and public process. While it is true that scientists often work in teams and participate in large conferences and meetings, even the lone scientist critiquing another’s paper in the privacy of her office is participating in a community process. And even though these days it is more common for scientists to collaborate on big projects like the Large Hadron Collider, this could hardly be the distinguishing feature of what makes science special, for if it were, what would we say about the value of those scientific theories put forth in the past by sole practitioners like Newton and Einstein? Working in groups is one way of exposing scientific ideas to public scrutiny. But there are others. Even when working alone, one knows that before a scientific theory can be accepted it must be vetted by the larger scientific community. This is what makes science distinctive: the fact that its values are embraced by a wider community, not necessarily that the practice of science is done in groups.
It is the hallmark of science that individual ideas—even those put forth by experts—are subjected to the highest level of scrutiny by other experts, in order to discover and correct any error or bias, whether intentional or not. If there is a mistake in one’s procedure or reasoning, or variance with the evidence, one can expect that other scientists will be motivated to find it. The scientific attitude is embraced by the community as embodied in a set of practices. It is of course heartening to learn from Sunstein and others that experimental work in psychology has vindicated a method of inquiry that is so obviously consonant with the values and practices of science. But the importance of these ideas has long been recognized by philosophers of science as well.
In a brilliant paper entitled “The Rationality of Science versus the Rationality of Magic,”18 Tom Settle examines what it means to say that scientific belief is rational whereas belief in magic is not. The conclusion he comes to is that one need not disparage the individual rationality of members of those communities that believe in magic any more than one should attempt to explain the distinctiveness of science through the rationality of individual scientists. Instead, he finds the difference between science and magic in the “corporate critical-mindedness” of science, which is to say that there is a tradition of group criticism of individual ideas that is lacking in magic.19 As Hansson explains in his article “Science and Pseudo-Science,”
[According to Settle] it is the rationality and critical attitude built into institutions, rather than the personal intellectual traits of individuals, that distinguishes science from non-scientific practices such as magic. The individual practitioner of magic in a pre-literate society is not necessarily less rational than the individual scientists in modern Western society. What she lacks is an intellectual environment of collective rationality and mutual criticism.20
As Settle explains further:
I want to stress the institutional role of criticism in the scientific tradition. It is too strong a requirement that every individual within the scientific community should be a first rate critic, especially of his own ideas. In science, criticism may be predominantly a communal affair.21
Noretta Koertge too finds much to praise in the “critical communities” that are a part of science. In her article “Belief Buddies versus Critical Communities,” she writes:
I have argued that one characteristic differentiating typical science from typical pseudoscience is the presence of critical communities, institutions that foster communication and criticism through conferences, journals, and peer review. … We have a romantic image of the lone scientist working in isolation and, after many years, producing a system that overturns previous misconceptions. We forget that even the most reclusive of scientists these days is surrounded by peer-reviewed journals; and if our would-be genius does make a seemingly brilliant discovery, it is not enough to call a news conference or promote it on the web. Rather, it must survive the scrutiny and proposed amendments of the relevant critical scientific community.22
Finally, in the work of Helen Longino, we see respect for the idea that, even if one views science as an irreducibly social enterprise—where the values of its individual practitioners cannot help but color their scientific work—it is the collective nature of scientific practice as a whole that helps to support its objectivity. In her pathbreaking book Science as Social Knowledge,23 Longino embraces a perspective that one may initially think of as hostile to the claim that scientific reasoning is privileged. Her overall theme seems consonant with the social constructivist response to Kuhn: that scientific inquiry, like all human endeavors, is value laden, and therefore cannot strictly speaking be objective; it is a myth that we choose our beliefs and theories based only on the evidence.
Indeed, in the preface to her book, Longino writes that her initial plan had been to write “a philosophical critique of the idea of value-free science.” She goes on, however, to explain that her account grew to be one that “reconciles the objectivity of science with its social and cultural construction.” One understands that her defense will not draw on the standard empiricist distinction between “facts and values.” Nor is Longino willing to situate the defense of science in the logic of its method. Instead, she argues for two important shifts in our thinking about science: (1) that we should embrace the idea of seeing science as a practice, and (2) that we should recognize science as practiced not primarily by individuals but by social groups.24 But once we have made these shifts, a remarkable insight results, for she is able to conclude from this that “the objectivity of scientific inquiry is a consequence of this inquiry’s being a social, and not an individual, enterprise.”
How does this occur? Primarily through recognition that the diversity of interests held by each of the individual scientific practitioners may grow into a system of checks and balances whereby peer review before publication, and other scrutiny of individual ideas, acts as a tonic to individual bias. Thus scientific knowledge becomes a “public possession.”
What is called scientific knowledge, then, is produced by a community (ultimately the community of all scientific practitioners) and transcends the contributions of any individual or even of any subcommunity within the larger community. Once propositions, theses, and hypotheses are developed, what will become scientific knowledge is produced collectively through the clashing and meshing of a variety of points of view. …
Objectivity, then, is a characteristic of a community’s practice of science rather than of an individual’s, and the practice of science is understood in a much broader sense than most discussions of the logic of scientific method suggest.25
She concludes, “values are not incompatible with objectivity, but objectivity is analyzed as a function of community practices rather than as an attitude of individual researchers toward their material.”26 One can imagine no better framing of the idea that the scientific attitude must be embraced not just by individual researchers but by the entire scientific community.
It should be clear by now how the conclusions of Settle, Koertge, and Longino tie in neatly with Sunstein’s work. It is not just the honesty or “good faith” of the individual scientist, but fidelity to the scientific attitude as community practice that makes science special as an institution. No matter the biases, beliefs, or petty agendas that may be put forward by individual scientists, science is more objective than the sum of its individual practitioners. As philosopher Kevin deLaplante states, “Science as a social institution is distinctive in its commitment to reducing biases that lead to error.”27 But how precisely does science do this? It is time now to take a closer look at some of the institutional techniques that scientists have developed to keep one another honest.
Methods of Implementing the Scientific Attitude to Mitigate Error
It would be easy to get hung up on the question of the sources of scientific error. When one finds a flaw in scientific work—whether it was intentional or otherwise—it is natural to want to assign blame and examine motives. Why are some studies irreproducible? Surely not all of them can be linked to deception and fraud. As we have seen, there also exists the possibility of unconscious cognitive bias that is a danger to science. It is wrong to think of every case in which a study is flawed as one of corruption.28 Still, the problem has to be fixed; the error needs to be caught. The important point at present is not where the error comes from but that science has a way to deal with it.
We have already seen that groups are better than individuals at finding scientific errors. Even when motivated to do so, an individual cannot normally compete with the “wisdom of crowds” effect among experts. Fortunately, science is set up to bring precisely this sort of group scrutiny to scientific hypotheses. Science has institutionalized a plethora of techniques to do this. Three of them are quantitative methods, peer review, and data sharing and replication.
Quantitative Methods
There are entire books on good scientific technique. It is important to point out that these exist not just for quantitative research, but for qualitative investigation as well.29 There are some time-honored tropes in statistical reasoning that one trusts any scientist would know. Some of these—such as that there is a difference between causation and correlation—are drilled into students in Statistics 101. Others—for instance, that one should use a different data set for creating a hypothesis than for testing it—are a little more subtle. With all of the rigor involved in this kind of reasoning, there is little excuse for scientists to make a quantitative error. And yet they do. As we have seen, the tendency to withhold data sets is associated with mathematical errors in published work. Given that everyone is searching for the holy grail of a 95 percent confidence level that indicates statistical significance, there is also some fudging.30 But publicity over any malfeasance—whether it is due to fraud or mere sloppiness—is one of the ways that the scientific attitude is demonstrated. Scientists check one another’s numbers. They do not wait to find an error; they go out and look for one. If it is missed at peer review, some errors are found within hours of publication. I suppose one could take it as a black mark against science that any quantitative or analytical errors ever occur. But it is better, I think, to count it as a virtue of science that it has a culture of finding such errors—where one does not just trust in authority and assume that everything is as stated. Yet there are some methodological errors that are so endemic—and so insidious—that they are just beginning to come to light.31
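Before turning to those, it may help to see the most basic of the tropes mentioned above, the gap between correlation and causation, in action. The following is a minimal sketch in Python, using purely synthetic data and an invented common cause rather than any real study:

```python
# A minimal sketch of "correlation is not causation" (purely synthetic data).
# A hidden common cause Z drives both X and Y, so X and Y end up strongly
# correlated even though neither has any effect on the other.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
z = rng.normal(size=n)              # the unobserved confounder
x = 2.0 * z + rng.normal(size=n)    # X is driven by Z
y = -1.5 * z + rng.normal(size=n)   # Y is driven by Z, not by X

print("raw corr(X, Y):", round(np.corrcoef(x, y)[0, 1], 3))   # strong, negative

# Regressing Z out of both variables removes the apparent relationship,
# showing that the correlation belonged entirely to the confounder.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
print("corr(X, Y) after controlling for Z:",
      round(np.corrcoef(x_resid, y_resid)[0, 1], 3))           # roughly zero
```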
Statistics provides many different tests one can do for evidential relationships. Of course, none of these will ever amount to causation, but correlation is the coin of the realm in statistics, for where we find correlation we can sometimes infer causation. One of the most popular calculations is to determine the p-value of a hypothesis, which is the probability of finding a particular result if the null hypothesis were true (which is to say, if there were no real-world correlation between the variables). The null hypothesis is the working assumption against which statistical significance is measured. It is the “devil’s advocate” hypothesis that no actual causal relationship exists between two variables, which means that any correlation one does find would be the result of random chance. To find statistical significance, one must therefore find strong enough statistical evidence to reject the null hypothesis—one must show that the correlation found is greater than what one would expect from chance alone. By convention, scientists have chosen 0.05 as the threshold of statistical significance: a result clears it only if data at least that extreme would be expected less than 5 percent of the time from chance alone. When we have reached this threshold, it is taken as evidence of a real-world correlation.32 The p-value is therefore the probability that you would get your given data if the null hypothesis were true. A small p-value indicates that the observed correlation would be very unlikely to arise by chance alone, and so counts as strong evidence against the null hypothesis. This is what scientists are looking for. A large p-value is just the opposite, indicating weak evidence against the null hypothesis; the data are consistent with mere chance. A p-value under 0.05 is therefore highly sought after, as it is customarily the threshold for scientific publication.
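To make these definitions concrete, here is a minimal sketch in Python (using simulated data, not any particular study) that estimates a p-value directly: it builds many data sets in which the null hypothesis is true by construction and asks how often a correlation at least as strong as the observed one turns up by chance.

```python
# A rough illustration of what a p-value is (simulated data, permutation estimate).
# The p-value is the probability of data at least this extreme arising
# if the null hypothesis (no real relationship) were true.
import numpy as np

rng = np.random.default_rng(1)
n = 25
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)                 # here a real relationship exists
observed_r = abs(np.corrcoef(x, y)[0, 1])

# Make the null hypothesis true by construction: shuffling y breaks any real
# link to x, so whatever correlation remains is due to chance alone.
null_rs = np.array([
    abs(np.corrcoef(x, rng.permutation(y))[0, 1])
    for _ in range(10_000)
])
p_value = np.mean(null_rs >= observed_r)

print("observed |r| =", round(observed_r, 3), " estimated p-value =", p_value)
# A p-value below the conventional 0.05 threshold is read as evidence against
# the null hypothesis -- not as proof that the correlation reflects causation.
```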
P-hacking (also known as data dredging) is when researchers gather a large amount of data, then sift through it looking for anything that might be positively correlated.33 Since there is always a nonzero probability that two things will be correlated by chance, if one’s data set is large enough, and one has enough computing power, one will almost certainly be able to find some positive correlation that meets the 0.05 threshold, whether it reflects a real-world connection or not. This problem is exacerbated by the degrees of freedom we talked about earlier, whereby researchers make decisions about when to stop collecting data, which data to exclude, whether to hold a study open to seek more data, and so on. If one looks at the results midway through a study to decide whether to continue collecting data, that is p-hacking.34 Self-serving exploitation of such degrees of freedom can be used to manipulate almost any data set into some sort of positive correlation. And these days it is much easier than it used to be.35 Now one merely has to run a program and the significant results pop out. This in and of itself raises the likelihood that some discovered results will be spurious. We don’t even need to have a prior hypothesis that two things are related anymore. All we need are raw data and a fast computer. An additional problem is created when some researchers choose to selectively report their results, excluding all of the experiments that fail to reach statistical significance. If one treats such decisions about what to report and what to leave out as one’s final degree of freedom, trouble can arise. As Simmons et al. put it in their original paper: “it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis.”36
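A small simulation makes the point vividly. The sketch below (Python, with purely synthetic noise data; the sample size and variable count are arbitrary) tests a single random “outcome” against hundreds of equally random “predictors”; roughly 5 percent of them clear the 0.05 bar even though no real relationships exist at all.

```python
# A minimal sketch of why data dredging always "finds" something
# (purely synthetic noise; the sample and variable counts are arbitrary).
import numpy as np
from scipy.stats import pearsonr   # Pearson correlation with its p-value

rng = np.random.default_rng(7)
n_subjects, n_variables = 50, 200

outcome = rng.normal(size=n_subjects)               # a random "outcome"
data = rng.normal(size=(n_subjects, n_variables))   # 200 unrelated "predictors"

p_values = [pearsonr(data[:, j], outcome)[1] for j in range(n_variables)]
hits = [j for j, p in enumerate(p_values) if p < 0.05]

print(f"{len(hits)} of {n_variables} pure-noise variables are 'significant' at p < 0.05")
# Roughly 5 percent clear the threshold by chance alone; reporting only those
# hits, and not the failures, is p-hacking.
```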