When Replication Goes Bad

By Neuroskeptic | October 20, 2012 7:45 am

How to ensure that results in psychology (and other fields) are replicated has become a popular topic of discussion recently. There’s no doubt that many results fail to replicate, and also that people don’t try to replicate findings as often as they should.

Yet psychologist Gregory Francis warns that replication per se is not always a good thing: Publication bias and the failure of replication in experimental psychology

Among experimental psychologists, successful replication enhances belief in a finding, while a failure to replicate is often interpreted to mean that one of the experiments is flawed. This view is wrong.

Because experimental psychology uses statistics, empirical findings should appear with predictable probabilities. In a misguided effort to demonstrate successful replication of empirical findings and avoid failures to replicate, experimental psychologists sometimes report too many positive results.

Rather than strengthen confidence in an effect, too much successful replication actually indicates publication bias, which invalidates entire sets of experimental findings…

Even populations with strong effects should have some experiments that do not reject the null hypothesis. Such null findings should not be interpreted as failures to replicate, because if the experiments are run properly and reported fully, such nonsignificant findings are an expected outcome of random sampling… If there are not enough null findings in a set of moderately powered experiments, the experiments were either not run properly or not fully reported. If experiments are not run properly or not reported fully, there is no reason to believe the reported effect is real.

Say you took a pack of playing cards and removed half the red cards. Your pack would now be 2/3rds black, so if you took a random sample of cards, say a poker hand of 5 cards, then you’d expect more blacks than reds (a significant ‘effect’ of color). But you’d still expect some reds, and some random hands would in fact be entirely red, just by chance. If someone claimed to have drawn 10 random hands and they’d all been mainly black, that would be implausible – “too good”.
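The arithmetic behind the card analogy can be sketched directly. This is a hypothetical deck (a standard pack with half the red cards removed, leaving 26 black and 13 red, i.e. two-thirds black), not anything from Francis’s paper:

```python
from math import comb  # exact binomial coefficients (Python 3.8+)

# Hypothetical deck: a standard pack with half the red cards removed,
# leaving 26 black and 13 red (two-thirds black).
BLACK, RED, HAND = 26, 13, 5
TOTAL = BLACK + RED

def p_hand(n_black):
    """Hypergeometric probability of exactly n_black black cards in a 5-card hand."""
    return comb(BLACK, n_black) * comb(RED, HAND - n_black) / comb(TOTAL, HAND)

p_all_red = p_hand(0)                                        # 5 reds from a black-heavy deck
p_mostly_black = sum(p_hand(k) for k in range(3, HAND + 1))  # 3, 4 or 5 black cards

print(f"P(hand entirely red):  {p_all_red:.4f}")           # ~0.0022 - rare, but it happens
print(f"P(hand mostly black):  {p_mostly_black:.3f}")      # ~0.804
print(f"P(10/10 mostly black): {p_mostly_black**10:.3f}")  # ~0.113
```

Note that even with this strong bias, ten mostly-black hands in a row happen about 11% of the time; it is longer runs, or the total absence of red-heavy hands across many reports, that become “too good”.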

Francis’s approach is a bit like Uri Simonsohn’s method for detecting fraudulent data – they both work on the principle that “If it’s too good to be true, it’s probably false” – but they differ in their specifics, and I believe that we should not conflate fraud with publication bias… so let’s not get carried away with the parallels.

Earlier this year, Francis wrote a critical letter about a paper published in PNAS purporting to show that wealthier Americans are less ethical. He argued that the paper’s results were “unbelievable” – it reported on the results of seven separate experiments, all of which showed a small, but significant, effect in favour of the hypothesis.

Even if rich people really were meaner, Francis said, the chance of 7/7 experiments being positive is very low: just by chance, you’d expect some of them to show no difference (given that the size of the difference in those seven was low, with a lot of overlap between the groups). Francis suggested that the authors may have run more than seven experiments, and only published the positive ones; the authors denied this in their Letter.
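To see why 7/7 is suspicious, multiply the per-experiment probabilities of reaching significance (the statistical power). The power values below are purely illustrative, not the values Francis actually estimated for the PNAS paper:

```python
# Illustrative only: these per-experiment powers are made up for the sketch,
# not the values Francis actually estimated for the PNAS paper.
powers = [0.5, 0.6, 0.5, 0.7, 0.6, 0.5, 0.6]

p_all_seven = 1.0
for power in powers:
    p_all_seven *= power  # independent experiments: probabilities multiply

print(f"P(all 7 experiments significant) = {p_all_seven:.4f}")  # ~0.019
```

Even if every experiment had a healthy power of 0.8, the chance of all seven reaching significance would be 0.8^7 ≈ 0.21; with the small effect sizes reported, per-experiment power would typically be well below that.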

Anyway, in the new paper, Francis expands on this approach in much more detail, drawing from this 2007 paper, and suggests a Bayesian approach that might help mitigate the problem.

Francis G (2012). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review. PMID: 23055145

CATEGORIZED UNDER: FixingScience, methods, papers, statistics
  • http://generallythinking.com Warren

    Not really a problem with replication though, just good old fashioned publication bias again.

  • Nitpicker

    I totally agree with him and have been saying this for years. It doesn't matter if there are one or two failed replications. You have to replicate many times to get a reliable estimate of the replication distribution. This is what the vocal proponents of replication who have recently made the news don't seem to understand.

    In an ideal world you could have multiple straight replications using identical methods but it is probably fine to accumulate evidence from both straight and conceptual replications. And of course all findings must be published, not only the successful replications.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Nitpicker: I think there is a genuine problem with people not doing enough replications, but like you & Francis I'm not convinced that this is the big problem. Actually I am going to write another post on this because Francis mentions (and rejects) my favored idea of preregistration of all studies; I'll argue why I disagree.

  • Anonymous

    @NS I'd also like to know your opinion on Simonsohn's disagreement with GF's method (mainly about false positives and problems with post-hoc power in conceptual replications); see the OSF mailing list for specifics.

    Also, @Nitpicker, I believe the most vocal proponents of replication do understand this problem. I don't know who you are referring to, but the OSF mailing list discussed GF's papers. And of course assessing publication bias is an important goal of the Reproducibility Project, so it's not as if they're blind in that eye and just going for more replications.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    I haven't read Simonsohn's comments yet, I will do though.

    Re: OSF/Reproducibility Project, I'm sure we're all aware of all of the issues broadly speaking. There are differences of opinion on which are the most urgent. But that's fine, everyone has their own particular interests.

  • http://www1.psych.purdue.edu/~gfrancis/pubs.htm Greg Francis

    A pre-print of Simonsohn's comments can be found at his web site

    He had previously published a slightly longer document as a “working paper”. This document is no longer available, but the pre-print has essentially the same arguments. I wrote a rebuttal of the working paper at


    which pretty much also applies to the pre-print. As you might imagine, I think Simonsohn's arguments are entirely without merit.

  • http://egtheory.wordpress.com/ Artem Kaznatcheev

    I feel like a lot of these issues could be resolved if the culture could be pushed from doing cute isolated experiments, which are only interesting if their results come out a particular way, towards doing measurements: experiments that are well-designed, well-founded, and where the outcome doesn't matter because the goal is just to be as accurate as possible in the measurement. This is the case in a large part of physics: we expect some result to be more probable, but the experiment is set up such that either result is publishable, just due to the good nature of the design. In this setting replication makes sense, because you can do NOVEL science while replicating. I.e. your goal is to measure the same thing as somebody else did, but to a degree of accuracy that they could not achieve. As opposed to psychology, where you replicate cute experiments either for the sake of mere verification, or in hopes of producing a different result.

  • omg

    Psychology is nasty, not cute. Economists use govt data to support or refute policies; psychology research keeps these beasts accountable.

    I don't see how this is a replication failure – normalizing data, yeah right. It's not on a normal curve; it's a hypothesis, yes or no. If it isn't foolproof to replicate then there's something wrong with the experiment.

  • Nitpicker

    @Anon: I was not referring to any project or paper but to informal comments various people (staunch supporters of the recent replication wave) have made to me. I won't name anyone in particular because that's entirely unnecessary. Often there appears to be this implicit assumption that “an effect doesn't replicate” and that this alone is a strong reason to disregard it. This is nonsense.

    But as far as the reproducibility project and similar endeavors are concerned I completely agree that they are important and necessary. We need better ways to accumulate evidence about replication.

  • http://www.blogger.com/profile/17686665037607780553 RAJ

    Your comment on publication bias is spot on, as I recently discovered. I submitted two papers on environmental risk factors in autism to the Journal of Autism and Developmental Disorders. Both papers were sent to peer review. When I discovered that I would have to pay $3,500 to retain copyright ownership, I withdrew both papers from further consideration while they were in peer review. I then sent both papers to the UK journal ‘Autism’ because, although under UK copyright laws I would still have to pay $3,500 to make the papers open access, I would retain copyright ownership and would be permitted to publish the full text after a 12-month waiting period. Neither paper was sent to peer review by the ‘Autism’ editor because they were not consistent with the editorial board’s exclusive focus on genetic risk factors in autism. The ‘Autism’ editorial board consists almost entirely of a who’s who in Behavioral Genetics. I’m not complaining about the editor’s decision – that’s their right – but I found it interesting that they didn’t consider the papers interesting enough to send to peer review while the editors of the JADD did.

  • Ivana Fulli MD


    Without any way to know how good the papers you submitted to the two journals are, there is no way to know which policy is to blame for their rejection.

    Reviewing submitted papers is very time-consuming, and not only for the reviewers and the editors – even for the editor-in-chief on occasion.

    On the other hand, even JK Rowling and Marcel Proust (a man who remembered adult life as well as JKR remembered adolescence and children's lives) have been rejected by editors, and what we need most from autism autism are new ideas.

    Have you seen the Archives' paper on SSRI prescription in pregnant women and autism?

  • Ivana Fulli MD


    Please read, in my previous comment to you:

    “What we need most from autism researchers is new ideas” in place of “what we need most from autism autism…”

    Sorry about that.

    And many thanks for telling many about the different policies against paywalls.

  • http://www.blogger.com/profile/17412168482569793996 Eric Charles

    Giving the PNAS authors the benefit of the doubt, too many positive results could just be publication bias. They and their friends might really have only done 7 studies, all of which were positive. But that doesn't mean there are not 50 papers on the same subject without significant results sitting in people's file drawers.

    The best paper on this is still Meehl's (1990) paper, “Why summaries of research on psychological theories are often uninterpretable”. The paper covers a ton of ground. In the end, he lays out some pretty generous numbers regarding the publication process and concludes that with standard power levels 84% of published papers should support a theory if the theory is true. That seems pretty good, until he reruns the numbers and finds that 74% of published papers should support a theory that is not true. I'll try to summarize that paper; it is super relevant to all the current worries.

  • omg

    That's publication bias. Science, on the other hand, has a method section so it can be replicated. How things get published is a political matter. I blame sophisticated analyses. You can prove anything by playing with numbers these days, and it's certainly abused in economic papers.

    Science has a published method section, so the same shouldn't be happening. The trend in why it's happening could be a socio-cultural issue: too many people are busy paying off mortgages, redoing the analyses many times; a random cohort is so yesterday; I ain't no Einstein, I need to survive.

    How do you have a marker to assess academic credibility if not for journals? That's the multi-billion dollar question govts are probably asking. It would be lovely if they had more academic think tanks tackling these problems and contributing to policies.

  • Anonymous

    What if a result is replicated? What does that mean to you? I am curious.

    Replication is important, but do not be misled by its presence or absence.

  • http://www.blogger.com/profile/06832177812057826894 pj

    Without bothering to read the criticism in detail, surely this is just naive Bayesianism (multiplying together a load of low probabilities and then making a big fuss about it)?

    Because surely we'd expect at least one paper showing seven small positive effects to be sitting in the literature, just by chance and the vast number of studies conducted.

    It is very unlikely that any one person would win the lottery, yet they do.
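pj's base-rate point can be put in numbers. If each multi-study paper has some small chance q of coming out all-significant even with full reporting, then across a large enough literature at least one such paper is almost certain to exist. All the numbers below are hypothetical:

```python
# Hypothetical numbers for pj's base-rate point.
q = 0.02        # chance any single 7-study paper comes out all-significant
n_papers = 500  # number of such multi-study papers in the literature

expected = n_papers * q                   # expected count of all-significant papers
p_at_least_one = 1 - (1 - q) ** n_papers  # chance at least one exists by chance

print(f"Expected all-significant papers: {expected:.0f}")        # 10
print(f"P(at least one exists):          {p_at_least_one:.4f}")  # ~1.0
```

Whether that base rate undermines a bias test applied to one particular paper is essentially what the Francis–Simonsohn exchange mentioned in the earlier comments is about.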

  • http://www.blogger.com/profile/00782852952891496899 neuroaholic

    I worry that we as a group tend to over-analyse and repeatedly run into an epistemological problem. Like a fox running after its own tail, there is no end to this back-and-forth argument about which findings are credible. Findings replicated by different scientists, in different countries, with different tools are certainly important. I also appreciate the necessity of reporting negative findings, but in practice, scientific findings that are already complex can become difficult to read or interpret when interspersed with negative findings.

    Overall, I feel there is a gap in the scientific episteme, and the overzealous statistics that have been good at delineating these problems should be just as zealous in giving practical solutions for filling this gap.

  • omg

    I just don't think it's in the domain of statistics. I'm of the belief that inherent flaws in truth discursions should be resolved practically.

    I.e. how science can contribute practically to people-related issues. Sentient beings evolve; we're not numbers; you can't observe the galaxies or make spaceships using people. People and societies are malleable – we change, we adapt, we adjust, we evolve – so what rocks the boat scientifically in these fields should benefit masses of people practically.

    Having said that, a return to the sadistic era of MK Ultra, Nazi medical experiments and such should be avoided through technological applications. Evolving algorithms, AI models – maybe we have a few decades to go before this happens.

  • Ivana Fulli MD


    //there is a genuine problem with people not doing enough replications, (…)I'm not convinced that this is the big problem//

    I agree with you and think – if a lay person might venture this – that part of the big problem might also be the lack of great experiments that get a lot of people in the field excited, wanting to work on them and replicate them.

    Are those experiments very rare, or are they drowned in a huge mass of poor-quality published experiments?

    A French hematologist, for example, got a Nobel prize – and before that an academic career – mostly for a paper published in French (on HLA: the Human Leucocyte Antigen system). Still, fellow researchers were quick to spot and check the truth of a great discovery by a man who hadn't achieved much when he worked in a US lab and was not known as a bright young thing to read about with passion.


    It is the same in psychology:

    Great ideas have been known to be replicated soon enough – a few decades ago.

    NB: Mirror replications are a way to feel and be innovative – sort of – when you work on somebody else's discovery, and I read that there are not enough of them.

    Is not mirror replication easier to conceive in psychology research than in medical research?




About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.



