Since the Enlightenment, independent replication of findings has been a core tenet of the scientific method. But are we fooling ourselves? Can science ever be free of our very human biases? David Mellor thinks it can…
It is one thing to proclaim that a discovery has been made, but for a claim to be credible, a higher level of evidence is required. Back when the basic features of the scientific method were being developed by Francis Bacon and his contemporaries in the seventeenth century, gentlemen (and in a society where women were excluded from public discourse, they were almost exclusively men) would tour each other's estates and demonstrate interesting new findings. This culture of verification was codified in the motto of the Royal Society, Nullius in verba: take nobody's word for it.
As the practice of scientific discovery widened, it became impractical to provide live demonstrations of new methods. The primary means of disseminating new information gradually evolved into the peer-reviewed article, which summarised the methods and results of each new finding. Peer review ensured that these findings were vetted by independent experts, who could check that the basic methodology and interpretations were sound. Unfortunately, even this expert review could only assess the author's summary of the study, not the essential underlying data, which could not be easily transported. More limiting still, this mode of dissemination cannot assess anything left out of the summary article. This has produced one of the two necessary conditions for a replication crisis: secrecy.
The other part of the problem is bias. Bias creeps into the scientific workflow just as it creeps into every part of human life. We value evidence that confirms what we already believe more than evidence that contradicts our beliefs (confirmation bias), and we convince ourselves that events were obvious and predictable after they have occurred, despite not having come to that conclusion earlier (hindsight bias). These biases in how the human mind works manifest themselves in a body of scientific literature that is itself biased toward statistically significant, novel findings, even though there is no rational reason that negative evidence should be any less accurate or rigorous.
Statistical significance, novelty of results, or the possible impact of the results are unscientific reasons to share evidence
We are much more likely to tell a story that states: “I found this unexpected difference between a treatment and a control group” than one that states “I tried this new treatment, and nothing happened.” Critics of that latter story will quickly pounce and assume that there was no good reason that treatment should have worked, or perhaps that the study was done wrong and should not be shared, despite the fact that the rationale and methods were perfectly justifiable. This results in a large “file drawer” of research that is never shared outside of a researcher’s lab.
Some results are more equal than others
Even though we’ve known about publication bias for a long time (since at least 1979), it’s been tough to figure out if the work that stays in the file drawer is any good. Perhaps the work was done in a sloppy manner and deserves to stay there. Fortunately, that question has recently been answered for a large body of research.
Time-sharing Experiments for the Social Sciences (TESS) is a service that researchers use to reach a large, representative sample of people for taking surveys. Researchers propose a survey study to TESS and, if accepted, TESS then gathers the data. Researchers use the data to answer whatever questions they are working on. Importantly, all of the research is conducted in the same way, and all of the questions given to TESS are deemed important and of high quality before the survey is sent out. However, only some of the findings are ever reported in peer reviewed articles. Researchers from Stanford University (Annie Franco, Neil Malhotra, and Gabor Simonovits) were able to compare the results that were reported in peer reviewed articles to those that were not reported.
A majority of the findings that were reported (63%) were “statistically significant”, whereas only 23% of the findings that never reached publication were significant. If decisions about what to publish were based only on the merit of the research, there should be no difference between the reported and unreported findings, yet there is a stark one.
The final way that bias creeps into the scientific workflow is during data analysis. A scientific investigation often starts out with a seemingly straightforward question. For example: do referees give more penalties to players with darker skin tones than to players with lighter skin tones? Yet even for a question this simple, there are many ways to go about answering it. Different statistical methods, different ways to quantify skin colour, and different variables that may affect the results but are not part of the original question (for example, perhaps players with different skin tones tend to play in different positions, and position has a larger impact on penalty cards than skin tone itself) can lead to different conclusions even when using the same dataset.
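This analytic flexibility can be sketched with a toy example. All of the numbers below are invented for illustration (they are not from the referee study): the point is simply that the same dataset can support opposite conclusions depending on whether the analyst controls for position.

```python
# Toy illustration of analytic flexibility: one invented dataset,
# two defensible analyses, two different conclusions.
from collections import defaultdict

# (skin_tone, position, penalty_cards) -- hypothetical records in which
# position, not skin tone, drives the number of cards.
players = (
    [("dark", "defender", 2.0)] * 8 + [("dark", "forward", 0.5)] * 2 +
    [("light", "defender", 2.0)] * 2 + [("light", "forward", 0.5)] * 8
)

def mean(xs):
    return sum(xs) / len(xs)

# Analysis 1: compare raw means by skin tone, ignoring position.
by_tone = defaultdict(list)
for tone, pos, cards in players:
    by_tone[tone].append(cards)
print(mean(by_tone["dark"]), mean(by_tone["light"]))  # 1.7 vs 0.8 -- looks like a large gap

# Analysis 2: compare within each position, controlling for position.
by_tone_pos = defaultdict(list)
for tone, pos, cards in players:
    by_tone_pos[(tone, pos)].append(cards)
for pos in ("defender", "forward"):
    # identical means within each position -- the "gap" disappears
    print(pos, mean(by_tone_pos[("dark", pos)]), mean(by_tone_pos[("light", pos)]))
```

Neither analysis is dishonest; they answer subtly different questions. Without a plan fixed in advance, it is tempting to report whichever one tells the better story.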
To recap: there is a strong bias toward reporting significant findings; much research is done in relative secrecy, with few details shared; and any single dataset can support different interpretations, with positive results being more valued and more likely to be shared. The combination of these factors has led to what is now being called the reproducibility crisis: most published research findings cannot be replicated by others who try to do the same experiments. This is true in psychology, cancer biology, and economics, and is likely true for any field where the above conditions exist.
We are much more likely to tell a story that states: “I found this unexpected difference between a treatment and a control group” than one that states “I tried this new treatment, and nothing happened.”
One fear we hear from fellow scientists is that this talk of a “reproducibility crisis” will provide fuel to climate skeptics who assert that the overwhelming evidence for human-caused climate change is unsound. However, the climate research community has recognised this risk and has taken steps to ensure that these conditions do not exist in its own work.
Fortunately, there are concrete and realistic solutions to these reproducibility problems. At the Center for Open Science, we work to promote tools and solutions that address each of the above issues. Our mission is to increase the reproducibility of scientific research through increased transparency, and we do that by advocating for policies that reward ideal scientific practices and by building tools that help researchers organise their work and make it ready for sharing and archiving.
Two for one
Two of the most important solutions reduce bias in data analysis and in publication. The first, preregistration, requires that researchers specify in advance how data will be collected and analysed. The practice is well established in clinical research, where patients are given experimental treatments, but is relevant to any research where hidden biases can affect the outcomes. Ultimately, preregistration makes it clear when a researcher is conducting a strong test of their hypothesis.
When researchers preregister their work, they lay out which tests will be done. Then, after collecting and analysing the data according to their plan, they report everything that was planned ahead of time, so that they do not unintentionally cherry pick only the few tests that actually return the most tantalising results. It’s possible those other tests are more important, but it’s tough to be sure unless all of the results are reported together.
The second major solution builds upon preregistration and also ensures that all high-quality evidence gets shared. This solution, Registered Reports, starts with a preregistration, but the researcher submits it to a journal for scrutiny by expert peer reviewers, who help improve the research plan. If the question is important enough to share regardless of outcome, and if the methods are appropriate to answer it, the journal promises the authors publication regardless of outcome. Once the study is complete, the results are peer reviewed one more time before publication to ensure that the study was conducted as planned. Statistical significance, novelty of results, or the possible impact of the results are unscientific reasons to share evidence, and these factors play no role in deciding whether a finding submitted as a Registered Report is published.
Our goal is to increase the use of these practices, along with others such as sharing data and posting earlier versions of research reports as preprints. Doing that takes several steps: convincing policy makers at journals and funding agencies to implement them, and building tools so that researchers can record a preregistration in a persistent, online repository. Ultimately, we want both to educate the research community about these solutions and to enable anyone who wants to adopt them.
Reducing bias is an underlying obligation of every scientific investigation. As Richard Feynman famously stated: “The first principle is that you must not fool yourself – and you are the easiest person to fool.” Our work at the Center for Open Science is designed to help those researchers who are willing to take the steps necessary to prevent themselves from being fooled.
David Mellor is Project Manager for Journal and Funder Initiatives at the Center for Open Science (COS). He received his PhD in Ecology and Evolution from Rutgers University. His research interests have covered behavioural ecology in cichlid fish, citizen science, and scientific reproducibility.