“Cluster Failure”: fMRI False Positives Revisited

By Neuroskeptic | July 22, 2018 9:31 am

Two years ago, a paper by Swedish neuroscientist Anders Eklund and colleagues caused a media storm. The paper, Cluster Failure, reported that the most widely used methods for the analysis of fMRI data are flawed and produce a high rate of false positives.

As I said at the time, Cluster Failure wasn’t actually making especially new claims because Eklund et al. had been publishing quite similar results years earlier – but it wasn’t until Cluster Failure that they attracted widespread attention.


Perhaps one reason Cluster Failure went mainstream is that it was the first of Eklund et al.’s false positive papers to be published in a high-impact journal (PNAS). But another reason is that it contained an alarming statement, namely that “These results question the validity of some 40,000 fMRI studies.” This triggered many headlines implying that all of fMRI was suspect.

Tom Nichols, one of the Cluster Failure authors, later clarified that 40,000 referred to the total number of fMRI studies out there, and wasn’t meant to imply that all of those studies were invalid. He went on to estimate that about 10% of the 40,000 fMRI experiments were at high risk of false positives from the Cluster Failure problem, while another 33% suffer from a different problem (no multiple comparisons correction at all).

Now, Eklund et al. have released a bioRxiv preprint looking back on their much-discussed 2016 paper: Cluster Failure Revisited: Impact of First Level Design and Data Quality on Cluster False Positive Rates

In the new article, Eklund et al. consider various technical critiques of their previous work, but conclude that they are unfounded. In response to concerns that the high false positive rates in Cluster Failure were a result of “idiosyncratic attributes of our first level designs”, they show that elevated false positive rates are also seen with modified designs (first level models with two regressors, and models with two intersubject-randomized regressors).

Eklund et al. also further explore how best to analyze fMRI data to avoid false positives. After trying several different approaches, they conclude that a combination of two things is required: non-parametric statistical thresholding, and noise reduction (using ICA FIX).
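To give a flavour of what non-parametric thresholding means in practice, here is a toy sketch (not the authors’ actual pipeline, and all names and numbers are my own invention) of a sign-flipping permutation test on simulated group-level data. The maximum statistic across voxels is recorded for each permutation, which controls the family-wise error rate without assuming any particular noise distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group-level data: 20 subjects x 500 "voxels" of pure noise,
# so any detection here is a false positive by construction.
n_subjects, n_voxels = 20, 500
data = rng.standard_normal((n_subjects, n_voxels))

def one_sample_t(x):
    """One-sample t-statistic per voxel (mean against zero)."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

observed_t = one_sample_t(data)

# Under the null, each subject's map can have its sign flipped.
# Record the maximum t across voxels for each permutation.
n_perm = 1000
max_t = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
    max_t[i] = one_sample_t(data * signs).max()

# Voxel-wise FWE-corrected threshold at alpha = 0.05
threshold = np.quantile(max_t, 0.95)
n_significant = int((observed_t > threshold).sum())
print(f"threshold = {threshold:.2f}, significant voxels = {n_significant}")
```

Because the threshold is derived from the data’s own null distribution rather than from parametric assumptions about spatial smoothness, this kind of approach avoids the assumption failures that Cluster Failure identified in parametric cluster inference.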

Finally, Eklund et al. revisit the question of how many fMRI papers may be compromised by sub-optimal analysis. They reiterate that about 10% of fMRI papers used a multiple comparisons correction that Cluster Failure showed to be most problematic (p<0.01 cluster defining threshold), and say that any marginally significant results obtained by this method (p close to 0.05) should be “judged with great skepticism”. But studies that used no multiple comparisons correction at all are even more dubious, the authors say.
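Why no correction at all is the worst case can be seen in a few lines of simulation (a toy illustration of my own, not taken from the paper): with ten thousand null voxels tested at an uncorrected p < 0.05, virtually every experiment “finds” something, whereas a simple Bonferroni correction keeps the family-wise error rate near 5%:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# 1,000 simulated null experiments, each with 10,000 independent "voxels"
n_experiments, n_voxels = 1000, 10_000
z = rng.standard_normal((n_experiments, n_voxels))

z_uncorr = norm.isf(0.05)            # one-sided p < 0.05, no correction (~1.64)
z_bonf = norm.isf(0.05 / n_voxels)   # Bonferroni-corrected threshold (~4.42)

# Fraction of experiments reporting at least one false "activation"
fwe_uncorr = (z > z_uncorr).any(axis=1).mean()
fwe_bonf = (z > z_bonf).any(axis=1).mean()
print(f"uncorrected FWE: {fwe_uncorr:.3f}, Bonferroni FWE: {fwe_bonf:.3f}")
```

Real fMRI voxels are spatially correlated, so the uncorrected case is not quite this catastrophic in practice, but the qualitative point stands: without some correction, false positives are essentially guaranteed.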

In my view, for all of the hyperbole it created, Cluster Failure was a great paper and highlights an issue that shouldn’t be ignored. Yet I wonder whether new and different problems may be on the horizon.

New analysis tools for fMRI have emerged in the past few years, which – unlike earlier methods – aren’t based on mapping clusters of brain activity. Popular new methods include MVPA and network-based analyses. These approaches are exciting and impressive and they are not (as far as I know) subject to the problems identified in Cluster Failure. But then again, they are new enough that they might have undiscovered issues of their own. In 2026, will we be worrying about the implications of a bombshell paper called Network Failure…?

CATEGORIZED UNDER: fMRI, methods, papers, select, statistics, Top Posts
  • practiCalfMRI

    Use of ICA FIX, eh? NRP (not read paper), but it suggests physiological artifacts – most likely fluctuations in arterial CO2 – are the next biggest problem after (real) head motion. However, apparent head motion, from respiration, might also be driving the need for FIX.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Yes, that’s what Eklund et al. suggest. They go on to say that recording physiological regressors is better than relying on FIX:

      “We also recommend researchers to collect physiological data, such that signal related to breathing and pulse can be modeled (Glover et al., 2000; Lund et al., 2006; Birn et al., 2006; Chang & Glover, 2009; Bollmann et al., 2018). This is especially important for 7T fMRI data, for which the physiological noise is often stronger compared to the thermal noise (Triantafyllou et al., 2005; Hutton et al., 2011).”

  • OWilson

    I’m a huge fan of chaos and probability theory, and there are many excellent books which explain some of the common fallacies in correlation, coincidence, and the occurrence of clusters in random distributions: from stars in the sky, through cancer research and weather events, to the classic clustering of flying bombs over London in WW2.

    I am often surprised and disappointed in the lack of rigour and understanding of this basic statistical science, in many studies today!

    • hookdump

      Hi, would you mind sharing some of those book titles? Thank you!

  • Pingback: Additional reads in September 2018




About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.



@Neuro_Skeptic on Twitter

