Sabine Hossenfelder: Backreaction: The Reproducibility Crisis: An Interview with Prof. Dorothy Bishop

Saturday, February 15, 2020

The Reproducibility Crisis: An Interview with Prof. Dorothy Bishop

On my recent visit to Great Britain (the first one post-Brexit) I had the pleasure of talking to Dorothy Bishop. Bishop is Professor of Psychology at the University of Oxford and has been a leading force in combating the reproducibility crisis in her own and other disciplines. You can find her on Twitter under the handle @deevybee. The comment for Nature magazine which I mention in the video is here.

25 comments:

Rick Lubbock 1:11 PM, February 15, 2020
Here is something from personal experience: about a decade ago I decided to get involved in biochemistry. First project was to work with a researcher who had gotten great results from a wunderkind grad student. We were simply going to replicate his experiment before doing a grant proposal based on his results. His work resulted in a published paper, by the way. And we couldn't do it. We found that his lab notes were not adequate and we even spoke with the student, who had moved on to another lab, and he didn't quite recall how he had done it. We spent six months in the attempt to replicate those results with no success. Ultimately the decision was made to assume the results were valid and to write that grant proposal anyway. Which we did. And the grant was awarded. I had a bad feeling about it and didn't want to be involved beyond that point. I found something else to do. I really don't know what came of this, what was ultimately published. I learned years later that the director of this particular project had retired. I do not know whether anyone ever managed to replicate the original study. I was somewhat disillusioned.
Bourbaki 1:55 PM, February 15, 2020
Thank you, Sabine, for this interview.
Reproducibility, with its four horsemen, is indeed a growing problem in all areas of science.
If I may share my perspective as a mathematician: it is getting increasingly worse. It's bad, really bad.
Despite these reproducibility problems, the mathematical areas that have been exploding (in terms of publication numbers) are the ones deriving from statistics. Long gone are the days when being an applied mathematician meant studying differential equations, numerical analysis or probability. Statisticians are taking over university mathematics departments. Most of the time, all they do is data analysis, crunching numbers in statistics software to publish their results (I have colleagues who write articles in ONE day). And they do publish a lot of results, most of the time very similar results in different journals.
People like me, pure mathematicians, who can take 2-3 years (not to say a lifetime) to find something interesting to say, cannot keep up with this level of production. This is killing fundamental research.

Journals are indeed not interested in null results. If you invest a year (or more) researching an idea and develop a great deal of knowledge about a problem, but end up finding nothing, no journal will publish your study. No one will know that the path or angle you chose is fruitless. This is also killing fundamental research.

Lastly, and on this I would like to have your opinions: before this age of communication and easy access to information, research used to be "vertical": we built our knowledge on the work of our predecessors. It was a slow process, but it is what made the breakthroughs of the 20th century possible.
Nowadays, research seems "horizontal": huge numbers of articles are being published and no one is able to keep track of them all. There is a lot of noise, rubbish, and a frenzy to publish as fast as possible. Our peer-review system cannot keep up with this. It does not work anymore.
And the worst part is that, as researchers, we are not trying to solve difficult problems; we are biasing our work towards easier problems. In this context, anyone can be a researcher in any area, and most of these areas are applied. That is a big problem for me, because we are losing, little by little, all the know-how that made the discoveries of the 20th century possible.
Philosopher Eric 8:06 PM, February 15, 2020
Though my gut reaction is that fixing psychology this way will be about like making socialism work, I am happy that this problem is at least now being acknowledged and worked on rather than not. I had to smile at Professor Bishop's suggestion that certain standard textbook cases would be removed. Hopefully so, but apparently that has not yet happened with the Stanford Prison Experiment. Julia Galef did a great podcast on a French book by Thibault Le Texier that suggests psychologists are still in denial. http://rationallyspeakingpodcast.org/show/rs-241-thibault-le-texier-on-debunking-the-stanford-prison-e.html

Beyond efforts from within, I believe that science will need better structural principles from which to work so that our soft sciences can finally harden up. Or now that evidence has become scarce in physics, so that even physics may be defended from getting “Lost in Math”. I propose one for metaphysics, two for epistemology, and one for axiology.
t marvell 9:25 PM, February 15, 2020
As a social scientist, I run across these problems constantly, and I agree with everything except the importance of a pre-registered research plan and hypothesis. These in effect mean that the researchers have the conceit to play god, pretending that they know enough to form a plan and hypothesis, and they rule out serendipitous results that fall outside the researchers' initial mindsets. Such results are likely to be more important than a pre-set hypothesis. Pursuing them usually involves some p-hacking, but the problem is with the "p". Researchers need to use a much lower "p" than the standard .05. Physicists, when hunting for new particles, use something like .000001. In many situations researchers can use more stringent significance criteria, such as the Bonferroni correction, to handle multiple testing problems.
Even with a pre-registered plan and hypothesis, the .05 criterion is misleading if many researchers are working on the same issue, because by chance some 5% will find significant relationships even if there is no relationship. And, as stressed, such results are far more likely to be published.
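
The two statistical points in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions and not part of the original comment: every simulated study compares two groups drawn from the same normal distribution (so there is no true effect), the standard deviation is treated as known so that a simple z-test applies, and the function names and sample sizes are made up for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_study_p(rng, n=30):
    """p-value of one simulated study in which there is NO true difference.

    Assumption: both groups come from N(0, 1) and the standard deviation
    is treated as known, so the difference of means has variance 2/n.
    """
    mean_a = sum(rng.gauss(0, 1) for _ in range(n)) / n
    mean_b = sum(rng.gauss(0, 1) for _ in range(n)) / n
    z = (mean_a - mean_b) / math.sqrt(2 / n)
    return two_sided_p(z)


def false_positive_rate(n_studies=5000, alpha=0.05, seed=1):
    """Fraction of independent null studies that reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(null_study_p(rng) < alpha for _ in range(n_studies))
    return hits / n_studies


def familywise_error_rate(n_tests=20, alpha=0.05, bonferroni=False,
                          n_families=1000, seed=2):
    """Fraction of families of n_tests null tests with at least one 'hit'.

    With bonferroni=True each test is compared against alpha / n_tests.
    """
    rng = random.Random(seed)
    threshold = alpha / n_tests if bonferroni else alpha
    hits = 0
    for _ in range(n_families):
        if any(null_study_p(rng) < threshold for _ in range(n_tests)):
            hits += 1
    return hits / n_families


if __name__ == "__main__":
    print("share of null studies with p < .05:", false_positive_rate())
    print("20 tests, uncorrected, at least one hit:", familywise_error_rate())
    print("20 tests, Bonferroni, at least one hit:",
          familywise_error_rate(bonferroni=True))
```

Under these assumptions, roughly 5% of the null studies should come out "significant", and 20 uncorrected tests should produce at least one spurious hit in about 1 - 0.95^20, or roughly 64%, of cases, while the Bonferroni threshold should bring that back to around 5%.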
Ashley 8:46 AM, February 16, 2020
I have observed, as a cognitive behavioural therapist, that research papers have either a high P value with a large standard deviation, or a small standard deviation with a low P value. This was already a feature when I started training in the late 1990s.

As for the reproducibility of some of the classic experiments, I observe that people's personalities are a confounding variable when coming to a conclusion of what the results meant.

So my inclination is to examine behaviours or symptoms, which can be measured, not the cognitions or feelings that are difficult to quantify objectively.

Like physics, there are those in psychology who subscribe to beliefs that are not scientific.
MarkinDallas 10:23 AM, February 16, 2020
Great episode. I appreciate the concern about weaponization, but science to me is about shining light, so transparency with the public is important.

I understand not using data to support a hypothesis that the experiment did not originally contemplate, but I assume it is okay to look at the data, acknowledge that it did not support the original hypothesis, and formulate a new hypothesis from it, provided that the new hypothesis is then tested independently of the first attempt.
Cal Billings 1:55 PM, February 16, 2020
This replication crisis has been a disaster for social psychology, and medical research, but a boon for the researchers. P-hacking, contrary to the claims made in the interview, is a conscious act and will only be eliminated by constant attention and brutal consequences for the offenders - not a highly probable outcome, at least not less than .05.
rom 4:22 PM, February 16, 2020
When I started my PhD in solvent extraction forty-odd years ago, my supervisor got me to do an undergraduate experiment extracting copper. I could not get the accepted answer of 2.0 for the slope. I kept getting 1.8. I tried for three months … different spectrometers etc. Eventually I gave up and got on with the thesis. A couple of years later, when I was supervising the undergraduate class, a student pointed out that he was getting 1.8 (none of the preceding or subsequent students complained). A bit later a fellow PhD student mentioned that he could not get a slope of 2 either.

Eventually I went through the math and showed it should not be 2.0. That was about the only original thing in my thesis and not the main point of it.
Prof. Sholl 9:19 AM, February 17, 2020
For a recent quantitative analysis of how often experimental reports of the synthesis of a new material are repeated in the scientific literature, see this recent PNAS paper: https://www.pnas.org/content/117/2/877.short

The process of writing this paper involved some hard-working students in my group analyzing thousands of individual papers.
RK 3:12 PM, February 17, 2020
Hi Sabine,

Recently, the presentations at Metascience 2019 have been made available.

Glad you interviewed Professor Bishop. Her presentation, entitled "The role of cognitive biases in sustaining bad science", is available here:

https://www.metascience2019.org/presentations/dorothy-bishop/

Cheers
Kevin S. Van Horn 3:22 PM, February 17, 2020
Andrew Gelman has also written extensively on this topic on his blog. One additional "horseman" I think he would add is the binary thinking promoted by null-hypothesis significance testing (NHST): results are either "significant" and the hypothesis can be considered verified, or "not significant" and the hypothesis can be considered disproven. (Confidence intervals and Bayesian methods are two alternatives that avoid this problem.)

Gelman furthermore notes,

"The problem with null hypothesis significance testing is that rejection of straw-man hypothesis B is used as evidence in favor of preferred alternative A. This is a disaster. See here."
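
To make the confidence-interval alternative mentioned in this comment concrete, here is a minimal sketch; it is an illustration, not something taken from Gelman's blog. It assumes two independent samples and uses a normal approximation for the interval (reasonable only for larger samples); the function name mean_diff_ci and the sample values are hypothetical.

```python
import math
import statistics


def mean_diff_ci(a, b, level=0.95):
    """Estimate the difference in means of two samples with an interval.

    Assumptions: independent samples, normal approximation for the
    interval (for small samples a t-based interval would be wider).
    Returns (difference, lower bound, upper bound).
    """
    diff = statistics.fmean(a) - statistics.fmean(b)
    # Standard error of the difference, using the sample variances.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    # Normal quantile for the chosen level, about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical measurements, made up for illustration only.
    treated = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.2]
    control = [4.9, 4.6, 5.2, 5.0, 5.4, 4.5, 5.1, 5.3]
    diff, lo, hi = mean_diff_ci(treated, control)
    print(f"difference = {diff:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
```

Reporting the estimate together with its interval shows how large the effect might plausibly be and how uncertain it is, rather than reducing the result to a yes/no verdict.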
Lockley 3:36 PM, February 18, 2020
Thank you for the link to Gelman's article.

I found it useful; it helps delineate some of the issues discussed above.
Michael John Sarnowski 4:48 PM, February 18, 2020
The worst thing about scientific studies is that they can only control for so many variables. Let's say that we publish that sunlight causes melanoma. Everybody puts sunscreen on, and 40 years later we find out that the chemicals in the sunscreen cause 10 other kinds of cancer and that not getting enough sun causes 10 other kinds of cancer. Studies can rarely be good enough to base health decisions on.
Lance 6:15 PM, February 19, 2020
How does someone get a PhD in whatever branch of knowledge without knowing which research approaches, methodologies and methods are pertinent to that subject, and without having been properly trained in the applicable research methods? It appears to me that the standard for awarding PhDs needs to be raised so that every PhD student shows mastery of the applicable research methodologies. Yes, the incentives may be wrong, but is no one ethical anymore? How can you knowingly produce faulty research? How can journal reviewers knowingly accept faulty research? I do not buy the idea that they don't know; rather, they appear simply not to care, as the driving force is article publication, no matter what the validity and reliability of the research.
stor 8:49 AM, February 20, 2020
Bourbaki,

You don't need money. Most math PhDs are supported by the school.
stor 7:02 AM, February 21, 2020
Bourbaki,

You said that nowadays anyone with money can get a PhD. All I was saying in my comment is that you don't even need to have money. Anyone can get a PhD, money or not. I didn't have any money, but I did get my PhD in pure math at a US university.