Replication Alone Is Not Enough

By Neuroskeptic | August 25, 2012 4:16 pm

Psychology has lately been hit by high-profile fraud scandals, and broader concerns over questionable research practices. Now the Society for Personality and Social Psychology (SPSP) has released a statement on “Responsible Conduct”, and a task force has produced a report.

This is a start, and the SPSP is to be commended for facing up to these problems (which affect many other fields) relatively early. However, neither of their documents contains much meat, in my view.

Point One of the task force report is that “Replication is the key to building our science”, and they suggest a “web site for depositing replications and failures to replicate” – but they don’t mention that various enterprising researchers have already made one. Nor do they tip their hats to the Open Science Initiative addressing just this issue. This makes me worried that they’re planning to reinvent the wheel.

More fundamentally, I disagree that replication is the key to psychology or any other field. Our goal should be replicability. Failure to replicate findings is a symptom of problems with the original findings, rather than a problem in and of itself. Good results replicate; we want better results to be published.

In other words, we should strike at the root cause of invalid research, namely the perverse incentives to publish as many eye-catching positive results with p-values below 0.05 as possible, by any means necessary. P-value fishing, selective reporting, post-hoc “prior hypotheses” and other questionable practices are a large part of what makes results unreplicable.
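To get a feel for how much damage that does, here's a rough back-of-the-envelope simulation in Python (a sketch with made-up numbers, purely illustrative – not anyone's actual data or analysis). If a study measures ten independent outcomes where there is no real effect, and reports whichever one "works", the nominal 5% false-positive rate balloons to roughly 40%:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group, n_outcomes = 5000, 30, 10

single_hits = fished_hits = 0
for _ in range(n_studies):
    # One simulated null "study": two groups, ten outcome measures, no true effect.
    a = rng.normal(size=(n_outcomes, n_per_group))
    b = rng.normal(size=(n_outcomes, n_per_group))
    pvals = [stats.ttest_ind(a[i], b[i]).pvalue for i in range(n_outcomes)]
    single_hits += pvals[0] < 0.05    # honest: test only the pre-specified outcome
    fished_hits += min(pvals) < 0.05  # fished: report whichever outcome "worked"

print("Pre-specified outcome, false positive rate:", single_hits / n_studies)  # ~0.05
print("Best of ten outcomes, false positive rate:", fished_hits / n_studies)   # ~0.40

Pre-registration is aimed at exactly this loophole: if the outcome and analysis are nailed down in advance, the "best of ten" option simply isn't on the table.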

We should encourage replication, but it’s no panacea.

An overemphasis on replication, without addressing the incentives, could actually harm science. It could lead to scientists spending all their time worrying about the political drama of who’s replicating whom and why, and which questionable practices they can use to replicate their friends’ data – rather than actually doing science.

This is why we shouldn’t be satisfied with any reform effort that puts replication before replicability. If you can fudge a result, you can fudge a replication. How to fight questionable practices is another question, but I’ve proposed reforms that I think would work, namely pre-registration of hypotheses, methods, and statistical analyses. Others have their own ideas.

There’s a lesson from clinical medicine here. Clinical trials of new drugs adopted pre-registration, but only after replication had been tried and found wanting. Pharmaceutical regulators have long required multiple demonstrations of drug efficacy – one trial was not enough. Sounds good – but the problem was that drug companies just ran lots of trials and analyses, picked the positive ones, and used those.

So in summary: replication is important, and we don’t do enough of it, but replication alone is not enough to fix psychology.

CATEGORIZED UNDER: FixingScience, methods, science, statistics
  • http://www.blogger.com/profile/07811309183398223358 Zen Faulkes

    Excellent stuff. Not much more to add at the moment.

  • http://www.blogger.com/profile/03391083965108348139 laurenrmeyer

    An often overlooked but important distinction (between replication and replicability)!

  • http://www.blogger.com/profile/13932181553932615830 Wintz

    Good stuff. I'm surprised I didn't come across your pre-registration idea sooner (considering I regularly read this blog). Especially as I proposed something similar here

  • Anonymous

    Completely agree with your post. Replication is not the problem – it's the crazy emphasis academia places on publishing for the sake of publishing, whether what we are doing says anything useful/applicable or not.

  • http://www.blogger.com/profile/17412168482569793996 Eric Charles

    This is all such a bizarre conversation. Do you know why chemists don't worry about things like this? Because the most important results, published in the bestest, best, best chemistry journals, are instantly replicated in hundreds of labs. The results are important exactly because people want to do that thing they read about in the article. There are obvious exceptions for big-big science that takes huge teams to accomplish… but in general other sciences think results are important under conditions in which people will almost instantly notice if it does not work. Under those conditions a distinction between “replicable” and “replicated” is silly.

  • http://www.blogger.com/profile/03391083965108348139 laurenrmeyer

    If only psychology was as easy/ cheap/ fast to do as chemistry apparently is…

  • http://www.nhsilbert.net Noah Motion

    I think I get the distinction between replication and replicability, but I'm not sure I agree that the latter is more important. How do we know something is replicable (i.e., has the property of replicability) if people aren't replicating it (or trying and failing)?

There is a big cultural difference between behavioral science (psych, experimental linguistics, etc.) and “hard” science. As Eric Charles notes, this discussion sounds bizarre from the perspective of chemistry (and physics, I assume), where replication is required prior to general acceptance of any results of import.

    On the other hand, as Lauren Meyer notes, straight replication in behavioral science isn't particularly easy or fast. And as is widely recognized, there really isn't any incentive to replicate in behavioral fields. And, as noted in the post, it's easy to imagine replication-publication just feeding into the same screwy system we already have.

I don't know what the answer is, but the pre-registration system seems like it could be gamed fairly easily. How is adherence to pre-registered methods enforced, for example? It's all too easy to imagine people proposing one set of methods, finding nothing of interest, and fiddling around (in much the same way it seems people do already) to get “good” results. While guaranteed publication of pre-approved hypotheses and methods helps, there would still be an incentive to publish “real” results rather than null results, I would think.

    To be clear, I don't know what the answer is. And I appreciate that you're thinking and writing about these issues. They're worth discussing openly, even if they're difficult (or impossible) to solve.

  • http://www.blogger.com/profile/16203083806436919715 Bernard Carroll

    Would you please clarify the difference between replication and replicability? Operationally, how would we demonstrate the latter without the former?

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Eric: I don't think it's bizarre. Because the fundamentals of chemistry (i.e. physics) are so well understood, most chemists nowadays are essentially engineers, their discoveries are more like inventions than observations. They invent a new way of synthesizing X or adding group Y. So as you say, everyone can then try to use that invention and if it doesn't work they'll say so. But psychology isn't like that (neither is, say, evolutionary biology).

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Re: Replicability, that's the overall reliability of findings in a particular field. It's not very useful to talk about the replicability of a particular paper: that would just mean, is it true or not? But at the level of fields, some fields seem to suffer many more failures to replicate than others.

    The only way to measure replicability is through replication attempts, but once you've done that, you need to try and improve replicability and that will take more than just replications.

  • https://sites.google.com/site/mbarnettcowan/ Michael Barnett-Cowan

I'd like to point out that, in addition to attempting to replicate a large sample of psychology studies (through pre-registration), the Reproducibility Project is also investigating factors such as replication power, study design, and the original study’s sample and effect sizes as predictors of reproducibility. Each of these has a distinct implication for interventions to improve reproducibility.

  • http://www.blogger.com/profile/16939466411432555103 octern

    I also want to stump for the Reproducibility Project. There's good reason to be concerned that psychology is publishing unreproducible findings, but we haven't demonstrated it in a systematic, empirical fashion. We should find out the actual extent of the problem (and, if possible, which subfields and study types are most affected) before we start tossing around solutions.

    The Reproducibility Project also avoids many problems with questionable research practices by 1) doing pre-trial registration and 2) requiring that contributors replicate the analysis used in the original study and not just the methods. The original author could have inflated their alpha by fiddling with multiple analyses, but the replication is required to process and analyze the data only once, using the analytic plan that was actually published.

    Regardless, I strongly agree that we also need to reform the “p<.05 = publication" problem. Even if it turns out that there aren't pervasive replicability problems, we know for certain that a lot of valuable findings are getting file-drawered.

  • http://www.blogger.com/profile/16939466411432555103 octern

    @Eric Psychology findings generally take a lot more time and money to replicate than chemistry ones (Human subjects protections mean that even the simplest findings require hours of work and weeks of waiting — at best — before we can even start a replication). More importantly, most findings in the social sciences aren't directly useful. The purpose is to demonstrate a theory on which other applications can be built.

    That kind of replication does happen, but when it fails, it's impossible to tell whether it reflects on the original finding or on the way the new researchers tried to extend it. I believe the root of many of our problems is the default assumption that the new authors must have messed up, meaning that their negative finding is of no interest.

  • http://www.blogger.com/profile/10454997825101692494 Heather Bunting

    I find this a bit odd – primarily due to the diverse nature of psychology as a field. If what we are studying is human behaviour, surely replicability and/or replication is neither here nor there.

For example, if one is a social psychologist looking at how individuals understand a particular phenomenon, replicability of the research is largely redundant. In the case of research investigating human experience, no two individuals are going to have had, or perceive themselves to have had, exactly the same experience as each other. Experiences are contextually, historically and culturally bound, and coloured by individual experience of the social world.

    Maybe the replication/replicability of research makes major paradigmatic assumptions about the nature of research, which tends to be generalised to psychology as a discipline, rather than treating it as a field made up of numerous inter-relating sub-disciplines.

    I think the problem lies with the underlying implication that the 'aim' of psychology is essentially to define, measure and ultimately predict aspects of human behaviour.

  • http://www.blogger.com/profile/02537151821869153861 Andrew Oh-Willeke

To echo Eric Charles a bit, I would have to agree that meaningful, consensus operational definitions with some basis in an underlying fundamental psychological reality – which many other academic disciplines have – remain an elusive goal in psychology.

An important reason this is the case is that a lot of research is done with lazy methods – convenience samples of “WEIRD” college students, the assumption that survey responses reflect objective reality, and artificial laboratory methods that are incapable of handling levels of social complexity that elementary school kids can grok easily.

Replication and replicability aren't so terribly important when the whole discipline is lost in the wilderness, using methods with far too low a resolution to capture what is going on in a complex world.

Psychology needs more researchers who already really, deeply understand people, and who invest a great deal of time in richer analysis and a better conceptual framework – grounded in observations of an ordinary mix of people, in real-life contexts, using tools more objective than paper-and-pencil (or internet) surveys.

For example, rather than striving for a reductionist, minimal-number-of-diagnostic-symptoms approach to DSM diagnosis, more research should be devoted to identifying non-diagnostic symptoms that are part of the syndrome, might help us get to the bottom of the fundamental etiology of a DSM condition, and would improve the level of consensus between professionals in the field about a diagnosis. Clinical psychologists, who all too often try to make a diagnosis based on conversations with one or two people in a single office visit, without really seeing the condition in action in the patient, are just as guilty of lazy methods as the academic researchers.

The pressure to produce replicable results can be harmful too, even when those results come from measurement tools so poor that they are like four- or sixteen-pixel photographs.
