The Misuse of Meta-Analysis?

By Neuroskeptic | March 24, 2017 5:33 pm

Over at Data Colada, Uri Simonsohn argues that “The Funnel Plot is Invalid.”

Funnel plots are a form of scatter diagram widely used to visualize possible publication bias in a meta-analysis. In a funnel plot, each data point represents one of the studies in the meta-analysis. The x-axis shows the effect size reported by the study, while the y-axis shows the standard error of the effect size, which generally shrinks as the sample size grows.


In theory, the points should form a triangular “funnel” pointing upwards, if there is no publication bias. If the funnel is asymmetric, this is taken as evidence of publication bias. Typically, we say that an asymmetric plot indicates that small studies that find a large effect are being published, while other small studies, that happen to find no effect, remain in the proverbial file drawer.
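To make the file-drawer logic concrete, here is a minimal simulation sketch in Python. It is not taken from Simonsohn's post. It assumes a single true effect shared by all studies, a large-sample approximation for the standard error of a standardized mean difference, and a crude publication filter that keeps only significant positive results. All names (`simulate_study`, `TRUE_D` and so on) are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_study(true_d, n_per_group, rng):
    """Return (observed effect size, standard error) for one two-group study."""
    # Assumption: SE of d is roughly sqrt(2 / n) for small effects.
    se = math.sqrt(2.0 / n_per_group)
    observed_d = rng.gauss(true_d, se)
    return observed_d, se


rng = random.Random(1)
TRUE_D = 0.2  # assumed: every study measures the same true effect

# Sample sizes are chosen at random, independently of the effect.
all_studies = [
    simulate_study(TRUE_D, rng.randint(10, 200), rng) for _ in range(5000)
]

# Assumed publication filter: only significant positive results appear.
published = [(d, se) for d, se in all_studies if d / se > 1.96]

r_all = pearson(*zip(*all_studies))
r_pub = pearson(*zip(*published))
print(f"all studies:      r(effect, SE) = {r_all:.2f}")
print(f"'published' only: r(effect, SE) = {r_pub:.2f}")
```

With every study included, effect size and standard error should be roughly uncorrelated (a symmetric funnel). After the filter, the small studies that survive are the ones that overestimated the effect, so the correlation turns positive (an asymmetric funnel).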

However, Simonsohn points out that we can only infer publication bias from a funnel plot if we assume that there is no “real” correlation between the effect size and the sample size of the included studies. This assumption, he says, is probably false, because researchers might choose larger samples to study effects that they predict to be smaller:

The assumption is false if researchers use larger samples to investigate effects that are harder to detect, for example, if they increase sample size when they switch from measuring an easier-to-influence attitude to a more difficult-to-influence behavior. It is also false if researchers simply adjust sample size of future studies based on how compelling the results were in past studies… Bottom line. Stop using funnel plots to diagnose publication bias.
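Simonsohn's scenario can be sketched the same way. The following is an illustration under my own assumptions, not his code. True effects differ across studies, researchers guess them correctly, and they plan the sample size for roughly 80% power using the standard normal-approximation formula. Every study is "published". The function and variable names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def planned_n(expected_d, z_alpha=1.96, z_power=0.84):
    """Per-group n for about 80% power (two groups, normal approximation)."""
    return math.ceil(2 * ((z_alpha + z_power) / expected_d) ** 2)


rng = random.Random(7)
effects, errors = [], []
for _ in range(2000):
    # Assumption: each study targets a different true effect.
    true_d = rng.uniform(0.2, 0.8)
    # Assumption: researchers predict the effect correctly and plan for it,
    # so smaller expected effects get larger samples.
    n = planned_n(true_d)
    se = math.sqrt(2.0 / n)  # large-sample approximation for the SE of d
    effects.append(rng.gauss(true_d, se))
    errors.append(se)

# No study is censored here, yet effect size and SE are correlated.
r_no_bias = pearson(effects, errors)
print(f"no publication bias: r(effect, SE) = {r_no_bias:.2f}")
```

Nothing sits in a file drawer in this simulation, but the funnel is still asymmetric, because the studies with large standard errors are exactly the ones that set out to measure large effects.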

In my view, Simonsohn is right about this “crazy assumption” behind funnel plots – but the problem goes deeper.

Simonsohn’s argument applies whenever the various different studies in a meta-analysis are studying different phenomena, or at least measuring the same phenomenon in different ways. It’s this variety of effects that could give rise to a variety of predicted effect sizes. Simonsohn uses this meta-analysis about the “bilingual advantage” in cognition as an example, noting that it “includes quite different studies; some studied how well young adults play Simon, others at what age people got Alzheimer’s.”

Simonsohn’s conclusion is that we shouldn’t do a funnel plot with a meta-analysis like this, but I wonder if we should be doing a meta-analysis like this in the first place?

Is meta-analysis an appropriate tool for synthesizing evidence from methodologically diverse studies? Can we really compare apples and oranges and throw them all into the same statistical juicer?

The “comparing apples and oranges” debate around meta-analysis is an old one, but I think that researchers today often gloss over this issue.

For instance, in psychology and neuroscience there seems to be a culture of “thematic meta-analysis” – i.e. a meta-analysis is used to “sum up” all of the often diverse research addressing a particular theme. I’m not sure that meta-analysis is the best tool for this. In many cases, it would make more sense to just rely on a narrative review, that is, to just write about the various studies.

We also see the phenomenon of “meta-analytic warfare” – one side in a controversy will produce a meta-analysis of the evidence, and then their opponents will reply with a different one, and so on back and forth. These wars can go on for years, as the two sides accuse each other of wrongly including or excluding certain studies. My concern is that the question of which studies to include has no right answer in the case of a “theme” meta-analysis, because a theme is a vague concept not a clearly-defined grouping.

  • Ulrich Schimmack

    Dear Neuroskeptic,

    I am very disappointed with your non-skeptical comments about Simonsohn’s blog post.

    “In my view, Simonsohn is right about this “crazy assumption” behind funnel plots – but the problem goes deeper.”

    I have two comments.
    1. The funnel plot has been an extremely valuable tool for revealing publication bias in meta-analyses. Compare, for example, the uncritical meta-analysis of ego-depletion that produced an effect size estimate of d = .6 with one where a funnel plot shows publication bias and the corrected estimate is d = .2 +/- .2, meaning the ego-depletion effect could be zero; this was followed by a pre-registered replication report that showed no effect. Is the funnel plot a crazy assumption, or is it crazy to dismiss it?

    2. Uri also fails to mention that we now have other tools for assessing publication bias that are immune to the criticism of funnel plots. I published one of these tests and showed publication bias (or p-hacking) in Bem’s (2011) crazy ESP paper. The correlation between sample size and effect size in that paper is r = -.9 (see picture).
    You can see how close the observed effect sizes are to the regression line. What explains this relationship?

    A. Publication bias
    B. Bem has ESP and can predict the amount of random sampling error before he conducts a study?

    For more information about statistical tools to detect publication bias and p-hacking in sets of studies, please visit my replicability blog.

    • Neuroskeptic

      Re: 1) There are of course many examples where the funnel plot indicated publication bias, and publication bias does indeed appear likely (true positives), but Simonsohn’s point was about the possibility of false positives under certain conditions.

      Personally, on further reflection, I can see a continuing use for funnel plots even in a heterogeneous literature of the kind that Simonsohn warns about. If we enhance the funnel plot by showing the zone of statistical significance (as in this case), it may become apparent that there is not just asymmetry but in fact a glut of just-significant studies.

      In such a case it’s hard to see any explanation other than p-hacking or publication bias, unless we assume that authors are able to predict effect sizes almost perfectly.

      But I think Simonsohn is still right that asymmetry per se is a dubious metric.

      Re: 2), thanks for the link!

  • Uncle Al

    Published statistics must validate a signal. The fundamental goal of a publication is to publish. Grant Funding demands it. Observe Dissertation Abstracts: STEM (technological society) vs. Other Stuff (forest killers uneffecting the world).

    Bottled water measures SSRI consumption. SSRI’s are mostly inert short of causing dry mouth and sexual dysfunction. Double-blind studies are not blind. The hot group gets dry mouths. SSRIs are publishing machines forever seeks an SSRI (combination of SSRIs, combination of SSRIs plus adjuvant pharma) that lifts depression (“I feel lousy”). SSRI’s make you feel lousy. CA-CHING!

    Wet crawling things, ants, and murine test subjects leave explicit exudate trails. Vast acreages of maze publications are meaningless.

  • FFlint

    There are two things which confuse me here:
    1) That meta-analysts still use funnel plots and Fail-safe N as evidence for publication bias given that they’ve both long been debunked as valid methods (see Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22(13), 2113-2126; and Scargle, J. D. (2000). Publication bias: The “file-drawer” problem in scientific inference. Journal of Scientific Exploration, 14(1), 91-106.)
    2) That Simonsohn has only just come to this conclusion (given both publications are 10+ years old and pretty well known in the field).

    • matus

      Have you looked at other posts on Simonsohn’s blog? In other news, the interactions of logistic regression have been found to be non-linear 😀 However, they are writing for psychologists. I can imagine the blog posts read like revelations to some of their readers. In addition, a lack of discussion section provides an aura of authority :)

  • matus

    I wouldn’t be that harsh on the funnel plot. I think it’s perfectly fine as an exploratory tool or as a supplementary visualization.

    The problem with MA is really that it uses standardized effect sizes. People seem to believe that by dividing by the standard deviation they magically get unit-less quantities which are universally comparable. I would just abandon standardized effect sizes; there is no use for them. Instead, report the effect size in original units and do the meta-analysis in original units. In practice this doesn’t work because researchers use unvalidated measures and unvalidated manipulations, and as a consequence they don’t know what they are measuring and manipulating in their experiments. (And why bother, validation work just costs resources.) This also solves the question of which studies to include in the meta-analysis: those that measure the same thing and that use validated and valid tools to do so.

    • FFlint

      What is it ‘perfectly fine’ for? Terrin et al. show that the asymmetry is a natural consequence of different researchers undertaking different studies making different initial estimates of likely effect size. So what does drawing a funnel plot do for us?

      I do agree about the issue of using standardised effect sizes. Simpson (2017) has just published a paper taking effect size in educational mega-synthesis and meta-analysis apart. A somewhat older blog post by Jan Vanhove raises some of the same issues as you do.

      • matus

        Imagine you made a funnel plot and it’s asymmetric, but it’s the left side that is missing – that’s an interesting find and useful to know. Perhaps you did not encode ES correctly or there is something else going on. In general, looking at data from multiple perspectives is always helpful even if the funnel plot does not provide the best perspective.

        I recall reading Jan Vanhove’s blog some time in the distant past. It’s good to see published work give some scrutiny to standardized effect sizes. We need more of that in the psychological literature.

        • FFlint

          That’s an odd viewpoint: a histogram of effect sizes; a stem-and-leaf; a scatterplot of effect size against number of letters in the first author’s name; … each of these gives you a view on the data, some useful and some probably not so much. The sole reason given for a funnel plot is to look for publication bias, and Terrin et al. show that it simply doesn’t do that. Equally, a funnel plot won’t spot a typo or a problematic encoding – the only reason is to check for symmetry, which it turns out you shouldn’t expect!





About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

