Saturday, December 07, 2013

Are irreproducible scientific results okay and just business as usual?

That I even have to write about this tells you how bad it is.

In recent years a lot of attention has been drawn to the prevalence of irreproducible results in science. That published research findings tend to weaken or vanish over time is a pressing problem, in particular in some areas of the life sciences, psychology and neuroscience. On the face of it, the issue is that scientists work with samples that are too small and frequently cherry-pick their data. Next to involuntarily poor statistics, the blame has primarily been put on the publish-or-perish culture of modern academia.

While I blame that culture for many ills, I think here the finger is pointed at the wrong target.

Scientists aren’t interested in publishing findings that they suspect to be spurious. That they do it anyway is because a) funding agencies don’t hand out sufficient money for decent studies with large samples, b) funding agencies don’t like reproduction studies because, eh, it’s been done before, and c) journals don’t like to publish negative findings. The latter in particular leads scientists to actively search for effects, which creates a clear bias. It also skews meta-studies against null results.

That’s bad, of course.
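To make the bias from unpublished null results concrete, here is a minimal sketch of it in Python. Everything in it is an assumption chosen for illustration, not something taken from a real study: the true effect is exactly zero, each study has 20 measurements with unit scatter, and a study only gets "published" if it finds a positive effect beyond roughly two standard errors. The function names `run_study` and `meta_analysis` are made up for this example.

```python
import random
import statistics


def run_study(rng, n, true_effect=0.0):
    """Simulate one small study: n noisy measurements with unit scatter.

    Returns the estimated effect and whether it looks 'significant'.
    """
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    # Crude cut (normal approximation): positive effect beyond ~2 standard errors.
    significant = mean / std_err > 1.96
    return mean, significant


def meta_analysis(n_studies=2000, n=20, true_effect=0.0, seed=7):
    """Average the effect over all studies and over the 'published' ones only."""
    rng = random.Random(seed)
    results = [run_study(rng, n, true_effect) for _ in range(n_studies)]
    all_effects = [m for m, _ in results]
    published = [m for m, sig in results if sig]
    published_mean = statistics.fmean(published) if published else float("nan")
    return statistics.fmean(all_effects), published_mean, len(published)


all_mean, published_mean, n_published = meta_analysis()
print(f"mean effect over all studies:       {all_mean:+.3f}")
print(f"mean effect over published studies: {published_mean:+.3f}")
print(f"studies published: {n_published} of 2000")
```

In this toy setup the average over all studies sits close to zero, as it should, while a meta-study that only sees the published ones would report a clearly positive effect that isn't there.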

I will not pretend that physics is immune to this problem, though in physics the issue is, forgive my language, significantly less severe.

A case in point though is the application of many different analysis methods to the same data set. Collaborations have their procedures sorted out to avoid this pitfall, but once the data is public it can be analyzed by everybody and their methods, and sooner or later somebody will find something just by chance. That’s why, every once in a while, we hear of a supposedly interesting peculiarity in the cosmic microwave background, you know, evidence for a bubble collision, parallel universes, a cyclic universe, a lopsided universe, an alien message, and so on. One cannot even blame them for not accounting for other researchers who are trying creative analysis methods on the same data, because those are unknown unknowns. And theoretical papers can be irreproducible in the sense of just being wrong, but the vast majority of these just get ignored (and if not, the error is often of interest in itself).
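How quickly "something just by chance" turns up can be sketched with a few lines of Python. The sketch assumes, for simplicity, that the analyses are independent of each other and that each one has a 5% chance of flagging pure noise as a signal; real analyses of the same data set are correlated, so the numbers are illustrative only. The function names are made up for this example.

```python
import random


def chance_of_false_alarm(n_analyses, alpha=0.05):
    """Probability that at least one of n independent analyses of pure noise
    crosses a per-analysis false-positive threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_analyses


def simulate(n_analyses, alpha=0.05, trials=20000, seed=1):
    """Monte Carlo check of the formula above."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis a p-value is uniformly distributed on [0, 1].
        if any(rng.random() < alpha for _ in range(n_analyses)):
            hits += 1
    return hits / trials


for n in (1, 10, 50):
    print(n, round(chance_of_false_alarm(n), 3), round(simulate(n), 3))
```

Under these assumptions the chance that at least one analysis reports an "effect" grows from 5% for a single analysis to a near certainty once a few dozen groups have had a go at the data.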

So even while the fish at my doorstep isn’t the most rotten one, I think irreproducible results are highly problematic, and I welcome measures that have been taken, e.g. by Nature magazine, to improve the situation.

And then there’s Jared Horvath, over at SciAm blogs, who thinks irreproducibility is okay, because it’s been done before. He lists some famous historical examples where scientists have cherry-picked their data because they had a hunch that their hypothesis is correct even if the data didn’t support it. Jared concludes:
“There is a larger lesson to be gleaned from this brief history. If replication were the gold standard of scientific progress, we would still be banging our heads against our benches trying to arrive at the precise values that Galileo reported.”
You might forgive Jared, who is a PhD candidate in cognitive neuroscience, for cherry picking his historical data, because he’s been trained in today’s publish-and-perish culture. Unfortunately, he’s not the only one who believes that something is okay because a few people in the past succeeded with it. Michael Brooks has written a whole book about it. In “Free Radicals: The Secret Anarchy of Science”, you can read for example:
“It is the intuitive understanding, the gut feeling about what the answer should be, that marks the greatest scientists. Whether they fudge their data or not is actually immaterial.”
Possibly the book gets better after this, but I haven’t progressed beyond this page because every time I see that paragraph I want to cry.

The “gut feeling about what the answer should be” does mark great scientists, yes. It also marks pseudoscientists and crackpots, just that you don’t find these in the history books. The argument that fudging data is okay because great scientists did it and time proved them right is like browsing bibliographies and concluding that in the past everybody was famous.

I’m not a historian and I cannot set that record straight, but I can tell you that the conclusion that irreproducibility is a necessary ingredient to scientific progress is unwarranted.

But I have one piece of data to make my case, a transcript of a talk given by Irving Langmuir in the 1950s, published in Physics Today in 1989. It carries the brilliant title “Pathological Science” and describes Langmuir’s first-hand encounters with scientists who had a gut feeling about what the answer should be. I really recommend you read the whole thing (pdf here), but just for the flavor here’s an excerpt:
“Mitogenic rays.
About 1923 there was a whole series of papers by Gurwitsch and others. There were hundreds of them published on mitogenic rays. There are still a few of them being published [in 1953]. I don’t know how many of you have ever heard of mitogenic rays. They are given off by growing plants, living things, and were proved, according to Gurwitsch, to be something that would go through glass but not through quartz. They seemed to be some sort of ultraviolet light… If you looked over these photographic plates that showed this ultraviolet light you found that the amount of light was not so much bigger than the natural particles of the photographic plate, so that people could have different opinions as to whether or not it showed this effect. The result was that less than half of the people who tried to repeat these experiments got any confirmation of it…”
Langmuir relates several stories of this type, all about scientists who discarded some of their data or read output to their favor. None of these scientists has left a mark in the history books. They have however done one thing. They’ve wasted their and other scientists’ time by not properly accounting for their methods.

There were hundreds of papers published on a spurious result – in 1953. Since then the scientific community has considerably grown, technology has become much more sophisticated (not to mention expensive), and scientists have become increasingly specialized. For most research findings, there are very few scientists who are able to conduct a reproduction study, even leaving aside the problems with funding and publishing. In 2013, scientists have to rely on their colleagues much more than was the case 60 years ago, and certainly much more than in the days of Millikan and Galileo. The harm being caused by cherry-picked data and non-reported ‘post-selection’ (a euphemism for cherry-picking), in terms of wasted time, has increased with the size of the community. Heck, there were dozens of researchers who wasted time (and thus their employers’ money...) on ‘superluminal neutrinos’ even though everybody knew these results to be irreproducible (in the sense that they hadn’t been found by any previous measurements).

Worse, this fallacious argument signals a basic misunderstanding about how science works.

The argument is based on the premise that if a scientific finding is correct, it doesn’t matter where it came from or how it was found. That is then taken to justify ignoring any scientific method (and frequently attributed to Feyerabend). It is correct in that in the end it doesn’t matter exactly how a truth about nature was revealed. But we do not speak of a scientific method to say that there is only one way to make progress. The scientific method is used to increase the chances of progress. It’s the difference between letting the proverbial monkey hammer away and hiring a professional science writer for your magazine’s blog. Yes, the monkey can produce a decent blogpost, and if that is so then that is so. But chances are eternal inflation will end before you get to see a good result. That’s why scientists have quality control and publishing ethics, why we have peer review and letters of recommendation, why we speak about statistical significance and double-blind studies and reproducible results: Not because in the absence of methods nothing good can happen, but because these methods have proven useful to prevent us from fooling ourselves and thereby make success considerably more likely.

Having said that, expert intuition can be extremely useful and there is nothing wrong with voicing a “gut feeling” as long as it is marked as such. It is unfortunate indeed that the present academic system does not give much space for scientists to express their intuition, or maybe they are shying away from it. But that’s a different story and shall be told another time.

So the answer to the question posed in the title is a clear no. The question is not whether science has progressed despite the dishonest methods that have been employed in the past, but how much better it would have progressed if that had not been so.


I stole that awesome gif from over here. I don't know its original source.


  1. This is simultaneously brilliant and depressing. The world of manufactured reality has even invaded science. Heck, I think it's even invaded mathematics which is just about the last straw.

  2. Are the irreproducible results concentrated in areas which are regulated? E.g., pharma drug trials, pesticide environmental safety tests, etc.?

  3. Not all irreproducible results are bad. Some of them are a necessary part of science. Any real experimental system is complicated. If an investigator works diligently and honestly, makes a complicated measurement, uses all available means to try to rule out sources of error, and reports the result with the appropriate caveats and with enough methodological detail for others to attempt to replicate the work, the experimentalist has done honest science. If the report spurs further experimental work that helps the community learn something, the experimentalist has done beneficial science. Hopefully what we'll learn is that the measurement was right, but even if it was wrong, if it was done honestly and spurred an intellectually important line of inquiry, the experimentalist has done something useful.

    People should not be afraid of irreproducible results. They should be afraid of shoddy work. If they refrain from doing shoddy work, most of the work that they do report will be reproducible, and the rest will at least be a healthy contribution to the line of inquiry. If they do shoddy work, irreproducibility will be the inevitable consequence.

  4. Just a correction. It's Irving Langmuir (not Irwin), who coined the term "plasma" for ionized gas, and who worked with Katharine Blodgett to develop reflectionless coatings on glass using thin films.

  5. The Sci. Am. article is warm and squishy, saying diversity (proven inability to perform) is OK. Sci. Am. tightens nuts with pliers, valuing process over product. "Mitogenic rays," single photon counting, the bane of (solar) axion detection. Often wonderful Luboš Motl spun theory of axion dark matter. He had an empirical deficit factor exceeding 10^18. The sin is Luboš not admitting revealed error.
    Is the bottom band morally reprehensible for denying the top band "deserved" boons?

  6. There's a big difference between a gut feeling that is just a gut feeling and a gut feeling that has a serious basis to it. For example, one might have a gut feeling that a theory is correct, even if an observation appears to contradict it, because it has qualities such as simplicity and explanatory value. So if someone tries to defend themselves by appealing to their gut, one should ask, "And why does your gut feel the way it does?"

  7. Jared Horvath wrote:

    "At the turn of the 17th century, Galileo rolled a brass ball down a wooden board and concluded that the acceleration he observed confirmed his theory of the law of the motion of falling bodies. Several years later, Marin Mersenne attempted the same experiment and failed to achieve similar precision, causing him to suspect that Galileo fabricated his experiment."

    The issue seems to have been something different.


    "This article analyzes the evolution of Mersenne's views concerning the validity of Galileo's theory of acceleration. After publishing, in 1634, a treatise designed to present empirical evidence in favor of Galileo's odd-number law, Mersenne developed over the years the feeling that only the elaboration of a physical proof could provide sufficient confirmation of its validity. In the present article, I try to show that at the center of Mersenne's worries stood Galileo's assumption that a falling body had to pass in its acceleration through infinite degrees of speed. His extensive discussions with, or his reading of, Descartes, Gassendi, Baliani, Fabri, Cazre, Deschamps, Le Tenneur, Huygens, and Torricelli led Mersenne to believe that the hypothesis of a passage through infinite degrees of speed was incompatible with any mechanistic explanation of free fall."

    Infinite Degrees of Speed Marin Mersenne and the Debate Over Galileo's Law of Free Fall

    Author: Carla Rita Palmerino

  8. Here is a modern attempt to replicate Galileo's experiment in the original. Unfortunately I do not find the page where they tell us what they found.

  9. Horvath wrote:

    "Early in the 19th century, after mixing oxygen with nitrogen, John Dalton concluded that the combinatorial ratio of the elements proved his theory of the law of multiple proportions. Over a century later, J. R. Parington tried to replicate the test and concluded that “…it is almost impossible to get these simple ratios in mixing nitric oxide and air over water.”"

    1. It is J.R. Partington, I think.

    2. I'm sure J.R. Partington has written more in his 4 volume work on the history of chemistry, which is inaccessible to me. You may want to read this:

    3. Dalton's results were not solely from nitric oxide, but also ethylene (olefiant gas) and methane (carburetted hydrogen).

  10. The most interesting, recent and relevant one of Horvath's mentions is that of Millikan.

    In that regard, please read this:

    Mansoor Niaz, "An Appraisal of the Controversial Nature of the Oil Drop Experiment: Is Closure Possible?", Brit. J. Phil. Sci. 56 (2005), 681–702

  11. Oh, I should have mentioned, in case you didn't follow the links, Dalton appears to have arrived at his theory and then confirmed it, rather than the "he made observations and came up with a theory".

  12. This comment has been removed by the author.

  13. Aha! Galileo apparently had the wrong value for the acceleration due to gravity, but not the wrong law of accelerated motion.

    An Experiment in Measurement
    Alexandre Koyré
    Proceedings of the American Philosophical Society, Vol. 97, No. 2 (Apr. 30, 1953)
    Published by: American Philosophical Society

  14. This video has a nice modern rendering of one of Galileo's inclined plane experiments.

  15. Let us all look back at the front page hype of the discovery of the element Bicentinuum (126) and the terse retractions in the back pages that it was an illusion of echoes in mica or a hoax. I said on the radio this could not be despite the consensus going against deep faith in the scientists and journals. Sometimes, with honesty, gut and deeper intuition wins, while I can still ask you where are the elements beyond 120? Seaborg's extension of chemistry is still a useful science fiction outside the reach of current experiments. Peer review can replicate the biased cherry picked of peers replicated to collapse and control research to generate the then evolved funding. This has delayed discovery more than who in rare creative times are above the struggle of who are the crackpots with no clear paradigm to demonstrate and replicate that. But to work together like luminescent bacteria with clear replication codes a critical mass has to be reached or there is no deeper light found save by chance as physics with a sound method. If in consensus independent thinkers reach the same picture that is evidence of sorts for what could be sound in our intuitions as science. But let us not underestimate science in just its serious survival value, it also touches everyone as entertainment by immortal roles or individual selves in the scripts of living actors.

  16. On reconstructions of Galileo's experimental methods, please see:


    Reenacting Galileo’s Experiments: Rediscovering the Techniques of Seventeenth-Century Science
    Palmieri, Paolo


    supporting multimedia materials

  17. Arun, you gave an informative list of examples. I wonder if in a sense we replicate our conceptual mistakes if they are mistakes. Mersenne with his infinite velocity against mechanism seems, like Einstein, to have raised both the paradoxes and solutions for a more general continuous or discrete unified theory. Thus we have p-adics via his number theory musings as part of discrete theory that might explain limitations to a continuous gravitational field. It may not matter if such fields are seen as phase shifts crystalline quantized or not. Could we not imagine a generational possibility of discrete complexity shifts in matter as phases natural in design patterns: electron shells, a nucleus of pairs in shells, and in the question of a black hole singularity some mechanism like a biological cell nucleolus for a third generation and so on?
    If a lowly planarian under electric potential may replicate its structure and coding that it becomes two heads or two tails what is the probability half of people you know are speaking out of the wrong end and how does science distinguish which? The idea of gut feelings or third eyes goes back to when the "brontosaur" with its large mass and walnut brain needed a smaller one to respond rapidly enough for its tail. Can the WMAP not be seen this way, a spherical apatosaur to which fossil reconstructors put the wrong head? Or we can continue to fine tune it all and blame it on our gut bacteria environs as some now begin to do.

  18. Axel,

    I agree and I note that the title of my blogpost is somewhat misleading. It's not so much about irreproducible results per se, as about what causes the prevalence of these irreproducible results: samples that are too small, post-selection, trying to find results where there really are none, construing patterns in noise. You are right of course that sometimes results may be irreproducible, e.g. because an effect hovers at the limits of current measurement precision or there are unknown noise sources etc, issues that in principle can be addressed properly. Best,


  19. Arun,

    I don't see a connection with regulation. I think that areas that are prone to this kind of problem are those in which small statistical significance is the norm and scientists try to draw 'big' conclusions from 'small' findings. Best,


  20. There are lots of prominent scientists who use their gut feelings to ignore experimental results that they don't agree with.

    Indeed, scientists who only popularize findings that suit them can do very well, in most cases better than level-headed colleagues.

    “Perhaps we should encourage our colleagues in the climate research community to no longer submit to, or cite papers in, this journal.” – Michael Mann (referring to the journal Climate Research)

  21. Dear Bee,

    What I really can't figure out is how Allison managed to succeed in a blind test of the Allison effect (as mentioned in Langmuir's talk). I suppose there are things we will never understand. History is, unfortunately, a non-reproducible experiment, and so our knowledge will necessarily be constrained.

    What all the other reading that I indiscriminately linked in my various comments led me to yesterday is a renewed appreciation of how difficult the cutting edge of science is.

    Regarding the fudging of data: as a simple example, regarding the modern publications of collections of Galileo's works, here is a quote from something I haven't linked to (a talk by William Shea, 2007)

    "I'm not focusing on Favaro because he is a singularity, but because he illustrates how a conscientious historian can ride slipshod over evidence because of a philosophical commitment that he is only vaguely aware of, in this case, naïve positivism. So what did Favaro to leave out? The answer is large chunks of three collections of manuscript notes in Galileo's own hand that are bound in some of the 347 volumes of the Galilean Manuscripts in the National Library in Florence. The first of these collections deals with logical treatises and related essays on Aristotelian philosophy, the second with astrological computations, and the third with laboratory notes on experiments with inclined planes and the pendulum. Favaro rejected the first collection because they were "pre-Galilean" and hence could only have been trite scholastic exercises that "poor" young Galileo had to undergo in high school. The second, astrological collection, he set aside because it was, epistemologically speaking, equally "pre-Galilean", and the third, experimental set of notes, he only published in part because he had trouble making sense of them."

    Galileo remained a practicing astrologer throughout his life. In doing astrology, per our modern ideas, Galileo could have been doing nothing but fudging data. We remember Galileo because of what he did get right and its revolutionary nature. And today, historians inadvertently fudge the data on Galileo!

    Here your remark is most apt : The argument that fudging data is okay because great scientists did it and time proved them right is like browsing bibliographies and concluding that in the past everybody was famous.

    Best wishes,

  22. Arun, I too have doubts as to what advancements in technology QM bits can bring. Where is the chaos in QM theory? But I have no angst but comfort that science has not reached some end but a great new beginning with lots of work to do. The old guys made assumptions to shore up calculus as limit theory but after all their inductive method (as probability) now seems sound. So I agree with Sabine's carefully weighed replies for considerations.
    Newton, between his being our last great astrologer and investigation of Bible codes gave us great physics. So you report Galileo practiced astrology too. That leaves me with a very wide angst in that for a world where sound inquiry is possible science should not have to bully or beg for our enterprise.

  23. /*Are irreproducible scientific results okay and just business as usual?*/

    If not, what would remain from quantum gravity after then?

  24. /* the “gut feeling about what the answer should be” does mark great scientists, yes. It also marks pseudoscientists and crackpots, just that you don’t find these the history books... */

    Scientists can be distinguished from crackpots easily by the fact that they get things right. BTW I never used gut feeling for my predictions, only a database of experiments and findings. The problem is that mainstream physicists have the memory of a tropical fish and they really don't know what other people have found. Why? They're not motivated to, as long as their money keeps coming. They have their grants, conference travels, an easy and safe life without actual testable results, not to speak of progress. Why the heck should they hurry with it?

  25. I have the impression that fundamental physics is at a peculiar energy and conceptual threshold beyond which theoretical physics will overwhelmingly dominate over experimental physics. Fundamental physics and theoretical physics would become synonyms and experimental physics would not even be applicable as a concept.

    HEP theoretical physics is already decoupling from experiments in the sense that it doesn't need the constant provision of data to make progress and evolve. It is becoming more and more complex and spread out, encompassing the vast space of mathematically consistent theories, exploring potential patterns of physical behavior without the need to verify anything via experiments in the strict sense. I guess this tendency will be strengthened in the future and fundamental physics will maintain only a loose connection with experiments via dubious phenomenological models.

    Let me say it another way: the Milner (or Milner-like) prizes will replace Nobel prizes for fundamental physics.

    Or in other words, we run out of relevant or marginal couplings and there is nothing left to explore than the vast space of irrelevant ones:-)

    Overall I expect that in the next 100 to 200 years people will understand Nature at such a deep level that it would render the current trial and error method pretty much useless.

    So I think similar discussions will become essentially obsolete for fundamental physics.

  26. /* such deep level that would render the current trial and error method pretty much useless */

    We already understand it at such a level. But for physicists it's advantageous to ignore this understanding, as it enables them to ask for grants longer. It's visible in the situation that every idea or finding that is just a bit more insightful becomes encapsulated by pluralistic ignorance like a calcified deposit of tuberculosis. Cold fusion is a typical example, as it would terminate the research in many other areas. Mainstream research is an industry which just wants to keep running. But when the solution is reached, then the research will end. What are the scientists supposed to do after that? To ignore such a solution as a single man.

    BTW I neither invented nor named this approach. I just excerpted it from the scientific literature in the standard way.

  27. When the results are irreproducible there is a finite chance that fraud, ala Jan Schoen is the cause. The percentage of what is called sloppy that is actually believed false but still published is unknown I suppose. From what I can see it is almost always motivated by money and ego. So I think that when one can see those motivations being a big factor in a result one needs to be careful in scrutiny.

  28. Arun, Allison's "success" is after all a question of history as causal or casual when we imagine a structure as now that involves orthogonal finite properties. The binary information in a branching tree is the same as the description describing the canopy.

  29. And in rooted centered shell views the valences as discrete are orthogonal like in QM. But triple valences is in 4 space where elements around 81 should be found this way. Kepler's music of the spheres was the first falsifiable science system as we debate if its higher dimensional analogs can be falsified by thought or experiment. At 8 we in 3D find chaos by which a single instance (Big Bang?) may have wide effects. We need a wider arithmetic of singularities.

  30. Hi Sabine,

    I stumbled across this after reading your take on Steinhardt et al’s problems with inflationary cosmology after the Planck results. But since I’m here, I thought I’d take the opportunity to defend myself!

    First, I’m really sorry to make you cry :) Nowhere in Free Radicals do I say that fudging data is ok. I say that it happens, and that for many scientists, it’s not as big a crime as you might think. This correspondence to Nature by Frederick Grinnell makes a good point, I think: “In basic research, intuition... is an important, and perhaps in the end a researcher's best, guide to distinguishing between data and noise.”

    But that doesn’t at all imply that we have to take irreproducible results at face value. Science is “organized skepticism”, and only a scientific fool would take a single result at face value. I’ve written elsewhere about the problems with poor (or outright dishonest) statistical analysis. The point I make in the book is that sometimes data comes out clean, and sometimes it doesn’t. If you’ve fudged it (as Galileo and Newton did, and as Einstein did with the gyromagnetic ratio), you’ll get found out. But when you’re found out, no one will be terribly upset - forgiveness is easy to get in science, and most people are distracted from your dishonesty by their interest in the new, better results.

    The thing that distinguishes great scientists from pseudoscientists and crackpots in this regard is the fact that the great scientists move forward - they are eventually vindicated by better experiments, or they accept the results of others that disprove their original claims, and collaborate to investigate further. Crackpots and pseudoscientists in my experience are immovable, whatever the data say.

    I’m really not saying that “something is okay because a few people in the past succeeded with it”. I’m saying that these things happen, they’re part of the human process of science, that it’s not a linear, robotic process, but is as interesting and flawed and creative as any artistic endeavour - and that we would do well to acknowledge this. It doesn’t hurt the reputation of the arts to have the human side of artists, writers and musicians exposed for public scrutiny (it makes them more interesting), and it won’t hurt science for people to see scientists as they really are.

    So please, do read on!

    Best wishes,


  31. Michael Brooks wrote:

    "This correspondence to Nature by Frederick Grinnell makes a good point, I think: “In basic research, intuition... is an important, and perhaps in the end a researcher's best, guide to distinguishing between data and noise.”"

    Intuition, maybe. Conclusions made inductively from data -- never. Unless there is demonstrable correspondence between a theory and the physical results it predicts, we may as well throw scientific method out the window and resume doing science according to Aristotle.

    " ... it won’t hurt science for people to see scientists as they really are."

    Unlike art, music and literature, science isn't even in part about scientists and their idiosyncrasies. It's about the objective correspondence between abstract theory and physical experiment.


  32. Michael,

    I agree with you of course that science is a human enterprise and mistakes are unavoidable. They have happened and will continue to happen, and I also agree that these are interesting case studies from which we can learn something (which is why I bought the book). I just think, basically, that at least as far as I've read the book you haven't given much consideration to the damage that thoughtlessness (and, let's be honest, inflated egos) do to science, and to the measures we have taken, and should be taking, to avoid mistakes.

    Yes, I'll probably read on - if I'm done with all these referee reports... Best,


  33. Giotis,

    What you lay out is a possible development, but I find it unlikely. Just look around, there is already now a strong opposition to this development. I think that not only will this opposition to 'physics turning philosophy' become even stronger, it will eventually break because I am convinced that sooner or later we'll find evidence for 'something', or maybe we already have, just that we haven't connected the dots. Best,


  34. For example, the quantum gravitists ignore the Tesla scalar waves, Podkletnov/Tajmar experiments, etc., even though these findings are all applications of quantum gravity at its best. On the other hand, we are flooded with messages like these ones.

  35. This comment has been removed by the author.

  36. Zephir,

    Please refrain from posting random news items in my comment section. I assure you my feeds are working properly. Best,


  37. I just finished teaching a course in the Responsible Conduct of Research given to young graduate students in physics and engineering. We discussed Cold Fusion, Element 118 and the Schön Affair as examples of pathological science and the damage it does to legitimate, evidence-based research.

    This excellent blog post gives another example and highlights a key issue. I wish it had been available during the course! I will certainly make use of some of the material you mention and that appears in some of the comments above.

    It would be wonderful to have students debate the issue of great scientists and their bad technique but great intuition, or the contrast between honest experimental work and bending the rules to get promoted. Maybe such debates can turn the tide?

  38. I have a relativistic transform that preserves Chirality! Anyone interested?

    Ron Butte

