The Replication Crisis: Response to Lieberman

By Neuroskeptic | August 31, 2014 3:57 pm

In a long and interesting article over at Edge, social neuroscientist Matthew Lieberman discusses (amongst other things) the ‘replication crisis’ in his field. Much of what he says will be of interest to regular readers of this blog.

Lieberman notes that there has been a lot of controversy over ‘embodied cognition’ and social priming research. For instance,

There are studies suggesting that washing your hands can affect your sense of being moral or immoral, and so on. These studies are very interesting. They’re very counter-intuitive, which I think leads lots of people to wonder whether or not they’re legitimate.

There was recently a particular, well-publicized case of a non-replication of one of these counter-intuitive effects, and Lieberman discusses this, but I think the issue is a general one. Here’s what Lieberman says (emphasis mine) about the effort to replicate these findings:

I do have some issues with the process of selecting who’s going to do the replications — what their qualifications are for doing those things, have they done successful work in that area previously — because if they haven’t shown that they can successfully get other priming effects, or other embodied cognition effects, how do I know that they can do this? I wouldn’t go and try to do chemistry. I don’t know anything about doing chemistry. There are issues like that.

This argument – which Lieberman is by no means alone in making – might be called the Harry Potter Theory of social psychology. On this model, some effects are real but are difficult to get to work in an experiment (‘spells’). Some people (‘wizards’) have the knack of getting spells to work. Other researchers (‘muggles’) just can’t do it. So if a muggle fails to cast a spell, that’s not evidence against the spell working. What else would you expect? They’re a muggle!

Only if a wizard fails to replicate a spell should we be worried about the reliability of that particular piece of magic. Accordingly, muggles should not even be trying to test whether any spells work. Wizards can safely ignore muggles.

Lieberman would probably object at this point that he’s not saying that some researchers should be banned from the replication process. Rather, he might say, he is only emphasizing the fact that some scientists are more qualified than others for particular tasks.

If so, fair enough, but all I’m saying is that there’s something odd about the idea that one’s qualifications should include a track record of finding positive results in the field in question. That seems to be putting the cart before the horse. I agree that replicators should have the necessary technical skills, but I question whether generating positive (as opposed to negative) results can be used as a proxy for being skilled.

That would make sense if we assume that our basic psychological theory (e.g. of social priming) is valid, and therefore that at least some of our effects are real and replicable. If we grant that, then yes, we could assume that people who fail to find effects must be doing it wrong. (If magic exists, then non-wizards are muggles.)

But can we assume that? Isn’t that, in fact, the issue under debate in many cases?

  • FannySampson54

    Now wait, this can’t be the same Matt Lieberman whose absurdly high .88 correlation between brain activity and a psychological trait was immortalized in web art like http://thestutteringbrain.blogspot.com/2009/01/voodoo-correlation-in-social-sciences.html

    and even inspired designer mugs:

    http://www.zazzle.com/vinny_voodoo_mug-168577234785439037

    can it be?

    The guy who never got around to using cross-validation to find out what the true values of his big old whopper correlations should have been? (For a toy illustration of what cross-validation buys you there, see the sketch after this thread.)

    And surely it can’t be the same Lieberman who repeatedly bolloxed up his fMRI significance calculations by confusing simulated Type I error rates derived from 2D slices versus 3D volumes, thus declaring many nonsignificant results to be significant (finally acknowledged, more or less, in Lieberman, M. D., & Cunningham, W. A. (2009). Type I and Type II error concerns in fMRI research: re-balancing the scale. Social Cognitive and Affective Neuroscience)?

    I am sure _that_ guy can’t possibly be lecturing the scientific community on the competence requirements for engaging in research…

    • matus

      True magicians don’t need methods and statistics. And anyone who says otherwise is just a jealous muggle.
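
To make the cross-validation point from the thread above concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the subject count, the voxel count, the 0.3 “true” correlation); it is not based on any real study. The point is simply that picking the voxel most correlated with a behavioural measure and then estimating that correlation in the same subjects inflates the estimate, whereas split-half cross-validation (selecting on one half, estimating on the other) does not.

```python
# Toy illustration of circular ("voodoo") correlation inflation vs. cross-validation.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 40, 5000
true_r = 0.3  # modest correlation built into a single "signal" voxel

behaviour = rng.standard_normal(n_subjects)
voxels = rng.standard_normal((n_subjects, n_voxels))  # pure noise voxels
voxels[:, 0] = true_r * behaviour + np.sqrt(1 - true_r**2) * rng.standard_normal(n_subjects)

def best_voxel(X, y):
    """Index of the voxel most positively correlated with y (the selection step)."""
    X_c = X - X.mean(axis=0)
    y_c = y - y.mean()
    r = (X_c * y_c[:, None]).sum(axis=0) / (
        np.sqrt((X_c**2).sum(axis=0)) * np.sqrt((y_c**2).sum())
    )
    return int(np.argmax(r))

# Circular estimate: select the voxel and estimate its correlation in the same subjects.
idx = best_voxel(voxels, behaviour)
circular_r = np.corrcoef(voxels[:, idx], behaviour)[0, 1]

# Cross-validated estimate: select on one half of the subjects, estimate on the other.
half = n_subjects // 2
idx_cv = best_voxel(voxels[:half], behaviour[:half])
cv_r = np.corrcoef(voxels[half:, idx_cv], behaviour[half:])[0, 1]

print(f"circular r = {circular_r:.2f}, cross-validated r = {cv_r:.2f}, true r = {true_r}")
# With thousands of candidate voxels, the circular estimate typically lands well above
# the true value; the cross-validated estimate carries no such selection bias.
```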

  • Susan Wright

    Replication is made difficult when the original experiment is not described (or even conducted) with scientific precision.

  • Anna O

    Brilliant analogy with the world of Harry Potter.

    It is worrisome that any neuroscientist, let alone someone who is often in the spotlight like Lieberman, would publicize such an opinion and not be lambasted for it immediately. This kind of attitude maintains the severely skewed status quo that is the reality of neuroscience research, where the power, influence and money are centered on a paltry number of institutions the world over.

    And what exactly is he suggesting? That only qualified psychology and neuroscience labs/researchers should ever attempt to replicate studies in the field? Why would anyone else bother? It is usually an arduous and thankless endeavor. Getting funding for replication studies is not exactly an easy task. And the minimum prerequisite that grant reviewers typically have to gauge is whether the applicant is qualified to do the study in question.

    Lieberman should specify what he means by “qualifications” here, because he is obviously referring to something apart from educational qualifications. To suggest that one can feel free to ignore any result that goes against one’s findings as long as it hails from a lab that appears less qualified (or less illustrious?) than one’s own is not only against the spirit of scientific exchange, it is also profoundly elitist.

  • Nick

    The Harry Potter analogy seems particularly appropriate (although perhaps David Blaine or Derren Brown might be even better examples) because of the way in which researchers flip-flop between (a) claiming (usually in the discussion section of the article, the last sentence of the abstract, and perhaps somewhere in the funding proposal for the next round of studies) that they have discovered a large universal effect with huge implications for public policy, educational practice, therapy, etc., and (b) arguing that unfortunately only they and a few initiates can reproduce these effects, which are in fact highly elusive. I can’t help being reminded of when Uri Geller went on Johnny Carson’s show and talked about this “gift” that God had given him, which apparently didn’t extend to bending spoons provided by anyone apart from Geller himself.

  • John W Stelling

    So, by the same argument, TCM should only be tested by Chinese acupuncturists (who never get negative results), etc.
    There was me thinking that the strength of science was to try to disprove your findings and see how the evidence holds up.

  • Michael Kovari

    brilliant post

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks!

  • Morsel

    How about trying a chef’s metaphor instead of a wizard’s? Following a methods section is a lot like following a recipe. In my humble experience, even well-written recipes can be botched by the cook. Do I believe that Emeril Lagasse’s or Paula Deen’s recipe is wrong and that the dish never actually existed? Well, not likely. I think that there are several aspects of being a seasoned experimenter that are difficult to verbalize. And even when verbalized, they may not translate precisely into how another cook implements the recipe. This has nothing to do with Lieberman in particular; it’s true for the sciences in general. The quality of the cook matters in how the dish is prepared, and in how well a recipe can be followed. If taking intro chem lab was a sign of anything, it’s that smart undergrads often can’t seem to replicate really well-established procedures even when provided with detailed instructions. This doesn’t mean that these experiments are ‘false’, of course, just that it takes practice to hone some skills. This is not to say that replication is not important; obviously it is. But there ought to be principles about how such replication proceeds. Rather than arguing about whether ‘it replicates or replicates not’, it may perhaps be more useful to identify what these principles are, how many replication attempts should be required before deeming a finding improbable, and so on.

  • Aaron Goetz

    Related to Susan Wright’s comment, if Researcher B cannot replicate Researcher A’s effect or relationship, then either (i) the effect or relationship is tenuous, volatile, or nonexistent, or (ii) Researcher A’s methods were poorly described. Most of the time, it’s the former. And both reasons place more fault on Researcher A than on Researcher B. Undergrads and grad students, neither of whom has credentials or much experience, replicate effects all the time, particularly effects that are thoroughly described and real.

    • Thom Baguley

      While I agree partly with the sentiment, it isn’t a logical argument. There are other reasons for failure to replicate that might be B’s fault, or neither A’s nor B’s fault. For example, what if the participants at University A differ from those at University B, and that difference is correlated with performance on the task (or interacts with it)? This sort of thing has happened in the history of psychology (e.g., different strains of lab rats; different cultural interpretations of materials; translation issues).

      • Aaron Goetz

        Thanks for the comment, Thom. The issues you raise are all methodological issues and would thus be captured by my reason (ii): inadequate description of methods. If a particular effect is specific to a certain population, for example, then this should have been noted by Researcher A; at a minimum, it’s certainly not Researcher B’s incompetence, as Lieberman seems to argue.
        Also, even if these weren’t methodological issues, which I think they are, I fail to see how this would render my argument illogical. Maybe incomplete, at best.

        • Thom Baguley

          How could you know whether an effect is limited to a subpopulation if you have only tested it in that subpopulation? You cannot, in principle, generalize findings to untested populations unless your sample is a random subset of the superpopulation.

          There are reasonable things to expect in methods and unreasonable things to expect. A complete description of all possible sample characteristics that might influence the task is unreasonable (e.g., it might take several hundred weeks or years of testing).

          (In general, no sample is a random subset of all possible participants.)

          More generally there are always potential counterexamples. You assume A or B and not C, but C is logically possible (if not probable). For example, C could be the divine intervention of the flying spaghetti monster.

  • http://twitter.com/green_minds R. Gordon

    A lot of writers who’ve suggested this Wizarding Theory of experimentation seem to assume that we’re moving from a no-replication model of research to a one-replication model, when really something more like the Many Labs approach is needed.

    Lieberman, for all the flaws in his argument, actually doesn’t make this particular error, suggesting: “If we’re going to do it as a group, we should perhaps have a set of nominated studies every year that should be replicated. Those studies should be assigned to labs that say, ‘I’ll take whatever study you assign to me, and here are my qualifications,’ and we assign them to the qualified labs. We get them to give their predictions before they’re assigned anything so we know what their predictions are, we know what their expectancy effects might be, and then maybe we do it that way.”

    Any multi-lab replication method ought to overcome issues with a particular researcher’s competence or lack thereof.

    Then there’s Dienes’s suggestion of using Bayesian analysis to interpret non-significant results from hypothesis testing: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4114196/

    Something like that could also help distinguish meaningful non-replications from those that are non-informative for whatever reason (a toy sketch of the idea appears after this thread). No muggle-screening necessary.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks for the comment – I agree. And I have no issue with that part of Lieberman’s argument.
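
For anyone curious what the Bayesian approach mentioned above looks like in practice, here is a minimal sketch in Python. It is not Dienes’s own code, and the numbers are invented; the prior is a half-normal scaled by the originally reported effect size, in the spirit of the defaults Dienes discusses. The useful feature is that the resulting Bayes factor separates “evidence for no effect” (BF well below 1/3) from “the replication was simply uninformative” (BF near 1).

```python
# Toy sketch of a Dienes-style Bayes factor for interpreting a non-significant replication.
from scipy import stats, integrate

def bayes_factor(obs_mean, obs_se, predicted_effect):
    """BF10 for a replication result summarised by its sample mean and standard error.

    H0: the true effect is zero.
    H1: the true effect follows a half-normal prior with sd = predicted_effect,
        i.e. effects in the predicted direction, plausibly up to roughly the size
        originally reported.
    """
    likelihood_h0 = stats.norm.pdf(obs_mean, loc=0.0, scale=obs_se)

    def integrand(delta):
        prior = 2.0 * stats.norm.pdf(delta, loc=0.0, scale=predicted_effect)  # half-normal, delta >= 0
        return stats.norm.pdf(obs_mean, loc=delta, scale=obs_se) * prior

    # Average the likelihood over the H1 prior (10 prior sds is effectively infinity here).
    likelihood_h1, _ = integrate.quad(integrand, 0.0, 10.0 * predicted_effect)
    return likelihood_h1 / likelihood_h0

# Invented numbers: the original study reported an effect of about 0.5 units, and the
# replication finds a non-significant 0.02 with a standard error of 0.10.
bf10 = bayes_factor(obs_mean=0.02, obs_se=0.10, predicted_effect=0.5)
print(f"BF10 = {bf10:.2f}")
# With these numbers BF10 comes out below 1/3: evidence for the null, not merely a
# failure to reach significance. A BF10 near 1 would mean the data favour neither hypothesis.
```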

