Why Scientists Shouldn’t Replicate Their Own Work

By Neuroskeptic | February 25, 2017 3:15 pm

Last week, I wrote about a social psychology paper which was retracted after the data turned out to be fraudulent. The sole author on that paper, William Hart, blamed an unnamed graduate student for the misconduct.

Now, more details have emerged about the case. On Tuesday, psychologist Rolf Zwaan blogged about how he was the one who first discovered a problem with Hart’s data, in relation to a different paper. Back in 2015, Zwaan had co-authored a paper reporting a failure to replicate a 2011 study by Hart & Albarracín. During the peer review process, Hart and his colleagues were asked to write a commentary that would appear alongside the paper.

Zwaan reports that Hart’s team submitted a commentary which presented their own successful replication of the finding in question. However, Zwaan was suspicious of this convenient “replication” and decided to take a look at the raw data. He noticed anomalies and, after some discussion, Hart’s “replication” was removed from the commentary. When the commentary was eventually published, it contained no reference to the problematic replication.

Meanwhile, following an investigation, Hart’s nameless student confessed to manipulating the data in the “replication” and also in other previous studies – Hart’s retracted paper being one of them.

There are a number of lessons we can take from this story but to me, it serves as a reminder that scientists should not be replicating their own work. Replication is a crucial part of science, but “auto-replications” put researchers under great pressure to find a certain result.

For a career-minded scientist, to fail to replicate your own work is worse than never doing the replication at all. First, because replications are less sexy than original studies and usually end up in low-ranking journals. But it gets worse – if you publish an effect and then later fail to replicate it, an observer (e.g. someone deciding whether to award you a grant, fellowship, or job) might conclude that you don’t know what you’re doing.

In order to succeed, researchers today are expected to craft and project a “career narrative” in which all of their experiments and papers constitute a beautiful upward arc of progress. It’s very difficult to fit a negative auto-replication into such a tidy and optimistic story. This is why “failed” studies, especially replications, tend to end up unpublished. Or, as in the Hart case, worse happens.

Here’s another way of looking at it: a replication attempt has much in common with peer review, in that they’re both an evaluation of the validity of a scientific claim. Who would want scientists to peer review their own work?

So I wonder if we should “discount” apparently successful auto-replications: perhaps when performing a meta-analysis, we should include the largest study from each research group and ignore the others. I think we certainly shouldn’t expect scientists to replicate their own work before they can publish it. Rather, we should encourage scientists to perform more independent replications of other people’s studies.
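
As a rough illustration of that “discounting” rule, here is a minimal sketch in Python with made-up lab names and numbers (not data from any real meta-analysis): group the studies by research group and keep only the largest one from each before pooling.

    # Hypothetical study records: (lab, sample size, effect size d)
    studies = [
        ("Lab A", 40, 0.62), ("Lab A", 180, 0.55), ("Lab A", 35, 0.70),
        ("Lab B", 90, 0.12),
        ("Lab C", 60, 0.30), ("Lab C", 220, 0.18),
    ]

    # "Discounted" set: one study per lab, the largest
    largest_per_lab = {}
    for lab, n, d in studies:
        if lab not in largest_per_lab or n > largest_per_lab[lab][0]:
            largest_per_lab[lab] = (n, d)

    for lab, (n, d) in sorted(largest_per_lab.items()):
        print(f"{lab}: keep n={n}, d={d}")   # only these would enter the meta-analysis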

  • Shane O’Mara

    Could you elaborate this sentence a bit? I really am struggling with it: “There are a number of lessons we can take from this story but to me, it serves as a reminder that scientists should not be replicating their own work. Replication is a crucial part of science, but “auto-replications” put researchers under great pressure to find a certain result.”
    It seems to imply that Tim Bliss shouldn’t have tried to replicate LTP until someone else did it, or O’Keefe should have paused after discovering place cells until someone else went off and did it again, or in my own case, we shouldn’t try to find head direction cells again in nucleus reuniens until someone else does it first. Or think of Baddeley’s articulatory loop, optogenetics or replication samples in a neurogenetics paper, or whatever it happens to be.
    I’m sure this isn’t what you mean, but it seems to me a matter of good lab practice that new lab members should be easily able to reliably replicate previous lab findings. If they can’t, then you’re in real trouble.
    Am I misunderstanding something here? Or are you being domain-dependent? Is this good advice for social psychology, but not physics?

    • practiCalfMRI

      I second Shane’s concern. In my view you may have it exactly backwards, NS. For fMRI studies I would like to see sufficiently large samples that two independent analyses can be performed by the initial investigators. These can be done in tandem or sequentially. But I am getting really tired of someone who finds something in one study and then leaves the huge time & expense for another lab to do the first replication. In physics it’s a basic requirement to do separate observations (note the two independent teams working on dark matter at the LHC, for example) if/when repeated observations on the same samples aren’t feasible. Ideally, of course, you measure separately *and* repeatedly. Then publish.

      • practiCalfMRI

        (Should have said Higgs boson rather than dark matter. Got CMB on the brain. Reading about cerebral microbleeds but keep on interpreting CMB as cosmic microwave background. Damn you, Smoot!)

      • Ricardo Segurado

        I don’t agree at all. If a researcher has enough money to collect a sample large enough to split into two analyses… where to start?
        1) There is no statistical justification for splitting the sample: analysing one subset and trying to replicate in the other will always have worse power than analysing the full sample at once (see the sketch after this list). Ergo it’s unscientific (and IMO unethical) to do so.
        2) How does anyone get funding for a sample twice the size needed for adequate power? I can see that happening over a long-term project. We collect the first sample (N1), analyse and publish… then, while analysing, collect a second sample (N2), and at the end analyse N2 and also do a meta-/mega-analysis of N1 and N2.
        3) If the plan is as above, that’s statistically problematic – it’s effectively an interim analysis, and the type I error should be shared between the stages.
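
        A rough simulation of point 1 (Python, with made-up numbers): at the same total N, demanding p < .05 in a “discovery” half and again in a “replication” half has noticeably less power than a single analysis of the full sample.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n_total, effect, n_sims = 128, 0.4, 10_000   # hypothetical total N, Cohen's d, no. of simulations

        full_hits = split_hits = 0
        for _ in range(n_sims):
            x = rng.normal(effect, 1.0, n_total)      # one-sample design, for simplicity
            # one test on the full sample at alpha = .05
            if stats.ttest_1samp(x, 0).pvalue < 0.05:
                full_hits += 1
            # split-half: require p < .05 in the discovery half AND in the replication half
            a, b = x[:n_total // 2], x[n_total // 2:]
            if (stats.ttest_1samp(a, 0).pvalue < 0.05
                    and stats.ttest_1samp(b, 0).pvalue < 0.05):
                split_hits += 1

        print("power, full sample:", full_hits / n_sims)   # roughly 0.99 with these numbers
        print("power, split-half: ", split_hits / n_sims)  # roughly 0.78, i.e. much lower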

        All that said, I actually agree that replication by an independent group is not always feasible (Higgs) or necessary (exploratory / observational work, or experimental work with very large effects). And this gets at the problem: the former get around it by being extremely stringent on statistical/probabilistic criteria, and the latter by seeing what is obvious to the eye – i.e. the effect sizes are so massive that statistical tests are unnecessary.

        But where it is feasible, it is essential for believing a finding. At least that’s what I learned in my science undergrad.

        As a statistician, I see the problem as arising out of null hypothesis significance testing, and it’s our own success in convincing scientists and paper/grant reviewers that power calcs and p-values were the be-all and end-all that’s to blame.

        • Ricardo Segurado

          I should add this point, in favour of independent replication. I come from a position of having advised hundreds of researchers (many of them clinical) on their study design and stats.

          People make the same mistakes over and over. (This is what the independent research group hopefully avoids.) Even when a flaw is pointed out to people, they will go on collecting data or analysing data in the same wrong way, mostly with the excuse “I didn’t know how to do it any other way”. Science as rote – physicians are particularly bad on this.

          • practiCalfMRI

            Yes, this is true, and it’s a systematic flaw rather than a sampling flaw. In fMRI in particular it is very easy to introduce a systematic flaw that may well replicate. It’s a bad experimental design, e.g. from bad parameter selection or from poor setup (such as head restraint). My point about self-replication assumes people know enough to do a good experiment. Doing one’s own experiment twice in physics is a good way to check for oversights, mis-set parameters, dust in the sample chamber, etc. I guess it’s different in psychology & fMRI.

        • practiCalfMRI

        Shane’s point regarding domain specificity seems to be borne out. fMRI has a major replication problem if/when it follows the way psychology experiments are performed rather than the way other branches of science work. I wish the fMRI crowd good luck! Maybe if the amount of p-hacking etc. mentioned in your final paragraph can be reduced, then the reproducibility issues will be reduced too. Let’s hope so.

        • practiCalfMRI

          Quick note about cost & split samples. Isn’t this the crux of replication? We’re talking about me paying 2x for it versus me paying 1x and you paying 1x to try to reproduce what I did. Surely the overall cost is similar. And I’m not a statistician but I’ve heard that with very big samples one can tease out relatively weak effects, versus looking for stronger effects in two independent samples. I would have thought we should be at the latter stage today, given the history in fMRI. (Again, domain specificity.)

    • PsyoSkeptic

      The post does seem overly pessimistic but I agree there should be some discounting of any auto-replication until an independent replication comes in. That doesn’t mean auto replications have no value. But their value is greatly enhanced by independent replication.

      Further, I see the article more as a damning of the process. Article reviewers should be more tolerant of imperfection since a supporting replication can have a non-significant result. Grant reviewers should be looking for imperfect applications that recognize the reality of the process. Hiring and promotion committees should see an open and honest publication history including failures to replicate as virtuous and valuable. All of the faults listed are products of a system that seems like it doesn’t understand science at all.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Good comment. What I’m worried about is researchers conducting and publishing replications of their own claims.

      Carrying out multiple checks to verify that something is really valid before publishing it is not what I mean by “auto-replication”.

      But if you later published a direct replication of your effect, that would be.

      • practiCalfMRI

        Ah, okay, so we’re talking about a temporal separation: the original experiment (with, perhaps, separate sample groups etc.) and a later, fully independent experiment that is performed by the same group. This is quite different to what I read into the post. So now we’re talking about systematic differences. I can get faster-than-light neutrinos in the machine in my lab, why can’t you? (Loser! 😉) This is getting quite far away from my expertise and entering the dark statistical realm where dragons live, so I think I’ll shut up now.

      • Shane O’Mara

        So let me try and think this through:
        A. If I publish paper A claiming we have found head direction cells in nucleus reuniens (we have published this, btw),
        B. and we then publish paper B on HD cells in NRe, showing x, y, z about these cells (e.g. that inactivation of some other region causes these cells to decohere), then have we/I auto-replicated?
        Is this because we made the original claim (HD cells in NRe) in paper A, and then we replicated the claim in paper B, and showed something about the neural mechanisms or circuits supporting HD cells in NRe?
        I’m still struggling here, to be honest.
        It seems wholly legitimate to me for us to do A followed by B (and subsequently C, D, etc).
        I might still be missing something. Personally, I think it would have been a good thing if some of the social priming stuff had been robustly replicated within the labs that had made the original claims. Take the rapid progress made in understanding LTP as a contrast: most of the early work on the NMDA receptor was done by a single lab (Collingridge and others), but it was robust, and replicable and reliable (I’ve even done the AP5 experiments myself and found they work!). Progress was rapid because there was a good sense of the causal mechanisms, whereas in the social priming work that has caused so much trouble there was never any real attempt to figure out the underlying causal mechanisms. Here, it might be useful to think about a continuum – from observation to correlation to causal relationships to underlying mechanisms. In some domains, we are very close to the underlying mechanisms, in others, not so much.

    • http://nonsignificance.blogspot.com non_sig

      “but it seems to me a matter of good lab practice that new lab members should be easily able to reliably replicate previous lab findings.”

      I think the problem is that under the current circumstances (with a lot of pressure to publish novel findings), it may be asking a lot of the person who published the previous studies to accept and publish a replication that fails to confirm those findings.

      Of course they should be willing to publish it anyway (if everything was done correctly), but one has to assume that a lot of such replications are not published. If confirming replications out of the same lab are published, however, the possibility that they have been endlessly p-hacked, manipulated, or are just 1 out of 10 or 20 attempts may be even greater than usual.

  • OWilson

    Once you lift the lid on fake, suddenly everybody recognizes the scams that have been perpetrated on an obtuse public, by elements of an elitist society who wish to preserve power and influence. They cheat a little, then a little more, and then they get arrogantly careless.

    Fake news is bad, fake science is worse.

    Keep up the good work and stay vigilante!

    • Mike Richardson

      “Vigilante?” Freudian slip, perhaps? 😋

      • OWilson

        I’m vigilant against trolls, Mikey.

        The ones, like you, who have nothing on topic to add to a thread. :)

        • http://cosmic.lifeform.org/ Thomas Lee Elifritz

          Here you replicate the evidence of your fakery.

          • OWilson

            From a friend of Mikey, that’s a compliment! :)

          • http://cosmic.lifeform.org/ Thomas Lee Elifritz

            Just as soon as you can point me to some actual science that needs replicating, do let me know.

            Here is one I am particularly interested in replicating.

            https://arxiv.org/abs/1702.04794

            That should be current enough for you.

          • OWilson

            The late Professor Irwin Corey had a very interesting take on that very subject! :)

          • http://cosmic.lifeform.org/ Thomas Lee Elifritz

            I was more interested in YOUR take on the subject, since I have you on the line right now, Dr. Wilson.

            Absent your opinion, I’m afraid I will have to default to Drs. Kruger and Dunning’s take on you.

          • OWilson

            I leave that sort of thing up to the readers.

            (It’s the only form of “peer review” we commenters have)

            Have a nice day! :)

          • http://cosmic.lifeform.org/ Thomas Lee Elifritz

            Actually, no, Dr. Wilson, there is no ‘we’, there is just you, and you speak only for yourself here.

            But I do agree that you are not my peer.

          • OWilson

            Disqus, in their wisdom, have a “peer review” mechanism built in!

            Maybe you missed that? :)

          • http://cosmic.lifeform.org/ Thomas Lee Elifritz

            Science isn’t valued by voting, maybe you missed that too. I ignore the disqus voting.

            Since you claimed to be an arbiter of fake science, I wanted your esteemed opinion. Since I have determined that you are a fake scientist, your opinion is no longer valued. I was looking for your opinion for its entertainment value only.

          • OWilson

            Sorry to disappoint you!

            Maybe you should try Lady Ga Ga? :)

        • Mike Richardson

          Better be vigilant in front of a mirror, then. 😋

    • http://cosmic.lifeform.org/ Thomas Lee Elifritz

      Through the use of scientific methods, fake news, fake science and fake scientists are easy to spot. You’re a fake. You’re easy to spot.

      • OWilson

        Have a nice day!

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    I do not criticize a nascent psychologist for so quickly embracing the functional essence of his vocation. I condemn him for being so easily caught.

    Jan 18, 2017 – “Ponzi scheme fraudster Bernie Madoff is still honing his business skills in prison, where sources say he has started a hot chocolate empire.” Genius never sleeps, diversity always weeps.

  • Nick

    It seems ironic that the whole point of the four-study article format of JPSP and other “prestigious” journals is (if I have understood correctly) to show that an effect is replicable — indeed, replicable under several circumstances and conditions — by the original authors.

    Yet, apparently, such multiple-study articles haven’t helped much in producing reliable effects. It’s almost as if multiple observations of similar experiments with the same investigators cannot be treated as if they were independent samples drawn from the population of all possible experiments. Who would have thought it?

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Indeed, and what room is there in that format for a negative result? It would ruin the narrative. So instead we see four positive results with 0.01 < p < 0.05.
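
      (A back-of-the-envelope simulation, with made-up numbers rather than anything from the papers in question: even when an effect is real and studied with reasonable power, a p-value only lands in that narrow window a minority of the time, so four in a row is a suspicious pattern.)

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n, d, n_sims = 30, 0.5, 20_000     # hypothetical per-study sample size and effect size

      pvals = np.array([stats.ttest_1samp(rng.normal(d, 1.0, n), 0).pvalue
                        for _ in range(n_sims)])
      in_window = float(np.mean((pvals > 0.01) & (pvals < 0.05)))
      print("P(0.01 < p < 0.05) per study:", round(in_window, 2))      # around 0.2-0.3 here
      print("chance of four such results in a row:", round(in_window ** 4, 4))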

      • http://nonsignificance.blogspot.com non_sig

        What I find surprising is: A lot of people tell you (well, me, of course I don’t know about other places) how important the narrative is. But they don’t seem to see anything wrong with it?

  • http://nonsignificance.blogspot.com non_sig

    “For a career-minded scientist, to fail to replicate your own work is worse than never doing the replication at all.”

    I agree!! I hope this isn’t a bother, but this is my experience with “self”-replications:

    When I started my PhD my supervisor wanted me to (essentially) replicate two (different, but related) effects (in different populations). I did preliminary studies and found neither effect.

    The first effect to be replicated wasn’t my supervisor’s, but the second is.

    My supervisor’s behavior towards these two effects has been very different.

    Because the project I was working on was a (very loose) cooperation with other labs, we met (once a year) to discuss progress etc. So I had to report the failure to replicate (or to find the correct paradigm for replicating) to them. But I was only allowed to tell them about the first effect, the one which isn’t my supervisor’s. Of course, at that point I didn’t know that it was possibly the effect itself that made it difficult to replicate, and I thought that I had made mistakes, so at first it was difficult to tell them (I was a new PhD student at that time and not on Twitter…). However, it turned out that they (the other collaborators) had tried (either currently or in the past) to replicate that effect as well and had been either completely unsuccessful or had found only very small effects. My supervisor accepted that then and didn’t focus on it very much anymore. I still did all the experiments (after the preliminary studies, with rare patient groups) but in the end it’s a file-drawer study. My supervisor didn’t want to publish it as is, but didn’t insist that there is/has to be an effect either.

    With the second effect things were very different. It is my supervisor’s favorite effect, and s/he wanted me to do everything to find it (whether it’s there or not). I wasn’t allowed to discuss it with the other collaborators (though I know that they did not study that effect) or report anything “negative” about it anywhere. After (a lot of!) preliminary studies had been unsuccessful, I had to do the same stuff in the scanner (and with patients) anyway (because of time pressure). My supervisor thought that there might be brain correlates even though there was no behavioral effect. (Well, everyone told me that there ARE ALWAYS brain correlates. While it is of course true that different experiences (i.e. different conditions) have to be represented differently in the brain (because otherwise it would be exactly the same experience), I don’t think that you’ll ALWAYS find some real brain correlates.)

    Anyway, of course there was no behavioral effect in the MRI study either. However, I had to keep searching for the mystical brain correlates forever (until I recently left, because I had already tried everything (p-hacking) and there’s nothing, but my supervisor wants me to keep searching).

    I know that at least some of my supervisor’s other studies of that effect contain some (to put it carefully) mis-reporting (and lots and lots of p-hacking, of course) and at least one statistical inconsistency. However, my supervisor doesn’t care about that, just about the fact that I couldn’t deliver the effect s/he asked for (and has shown so many times before…. :/ ).

    Interestingly, the person who first reported the other effect did replications of his effect as well. I was at a conference in 2015 where he said (in a talk) that it is unclear under which circumstances the effect shows up and under which it doesn’t. Furthermore, he said it is possible that the effect is “there” in one moment in a particular person and not in the next. To me this means the effect itself is pretty uncertain, but he basically said as much, and I think that’s great. My non-replication of that effect is of course not published, because my supervisor doesn’t publish unsuccessful stuff…

    Sorry for writing so much; it’s just my experience, but I agree that it is “difficult” to replicate one’s own stuff (or that of one’s supervisor). I would not take a PhD position in such a project again. Unfortunately I didn’t know that then.

  • Brenton Wiernik

    Regarding meta-analyses, I am always loath to discard data based on “study quality” or similar classifications. The best estimate of the true effect + lab effects is the meta-analysis of the studies by the original lab. It is better to control for lab effects by examining original lab vs. independent replication as a moderator variable, rather than by discarding all but one study from the original lab (thereby increasing the sampling error).
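
    To make the moderator approach concrete, here is a minimal sketch (Python, with entirely hypothetical effect sizes and variances) that pools original-lab and independent studies separately using simple inverse-variance weights, rather than discarding studies:

    import numpy as np

    # Hypothetical studies: (effect size d, sampling variance, from the original lab?)
    studies = [
        (0.55, 0.04, True), (0.48, 0.05, True), (0.60, 0.03, True),
        (0.15, 0.04, False), (0.22, 0.06, False),
    ]

    def pooled(rows):
        """Fixed-effect inverse-variance pooled estimate and its standard error."""
        w = np.array([1.0 / v for _, v, _ in rows])
        d = np.array([e for e, _, _ in rows])
        return (w * d).sum() / w.sum(), np.sqrt(1.0 / w.sum())

    for label, flag in [("original lab", True), ("independent", False)]:
        est, se = pooled([s for s in studies if s[2] == flag])
        print(f"{label}: d = {est:.2f} (SE = {se:.2f})")
    # A large gap between the two pooled estimates is itself evidence of lab effects.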

  • Brad Wyble

    I disagree that you should take an example of fraud as an argument against self-replications. On balance, self-replication is enormously beneficial in allowing a researcher to ensure that something discovered through a post-hoc analysis is in fact a real thing. I agree that self-replications don’t carry the same weight as independent ones, but that doesn’t make them worthless.

    Moreover, if you are willing to argue that fraudulent researchers make self-replication worthless, then you must make the same argument against pre-registration as well. It is easy to run a study, find results, “pre-reg” the analysis plan, then claim that the effect was discovered after the pre-reg was created.

    None of these methods are fraud-proof. If someone is willing to fabricate data, adjust time stamps, etc., there is no simple technique that prevents this. Rather, we do what we have been doing, which is to keep an eye out for things that look dodgy and investigate.

  • RMacCoun

    You need to persuade yourself before persuading a reviewer. So with new tools for finding p-hacking, why discourage self-replication? If we don’t trust your 2nd study, why would we believe your 1st study?

  • http://multiplecomparisons.blogspot.com/ Chris Filo Gorgolewski

    Of course independent replications are more valuable than auto-replications, but so few replications are currently being published that we cannot completely discard all auto-replications. They are yet another piece of evidence, but possibly with less weight than an independent replication.

    It’s also worth acknowledging that “independence” of a replication is a spectrum. Researchers attempting to replicate an effect can be influenced or inspired by the original scientists. There could be conflicts of interest. It’s all quite complex, but we should take all pieces of evidence available and acknowledge biases.
