Blobs and Pitfalls: Challenges for fMRI Research

By Neuroskeptic | June 22, 2016 2:40 am

Brain scanning is big at the moment. In particular, the technique of functional MRI (fMRI) has become hugely popular within neuroscience. But now a group of big-name neuroimaging researchers, led by Russ Poldrack, have taken a skeptical look at the field, in a new preprint (currently under peer review) called Scanning the Horizon: Future challenges for neuroimaging research.

Poldrack et al. do a great job of discussing the various problems, including limited statistical power, undisclosed analytic flexibility (which creates scope for p-hacking), and inflated false positive rates in the software tools used. They also cover proposed solutions, including my favorite, preregistration of study designs. Neuroskeptic readers will find much of this familiar, as I've covered a lot of these issues on this blog.

The authors also offer some interesting new illustrations of the problems. I was particularly struck by the observation that, out of a sample of 65 fMRI papers retrieved from PubMed, 9 used FSL or SPM software for most of the data analysis but then switched to the separate AFNI software package for the final inference step of multiple comparisons correction. There seems to be no good reason to do this: FSL and SPM provide their own multiple comparisons correction tools. Although it's impossible to be sure what's going on here, it looks like researchers may be 'shopping around' for statistical tools that happen to give them the results they want.
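To see why the choice of correction tool matters at all, it helps to recall the scale of the multiple comparisons problem: a whole-brain analysis tests tens of thousands of voxels at once. Here is a toy illustration of that point (my own sketch, not from the paper; the voxel count, sample size and thresholds are arbitrary), showing how pure noise produces thousands of 'significant' voxels without correction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 50_000   # arbitrary; real whole-brain analyses test a similar order of voxels

# Pure noise: no true activation anywhere in the "brain".
data = rng.normal(size=(n_subjects, n_voxels))

# One-sample t-test at every voxel against zero.
pvals = stats.ttest_1samp(data, popmean=0, axis=0).pvalue

print("'Significant' voxels at uncorrected p < .05:", (pvals < 0.05).sum())                 # roughly 2,500
print("'Significant' voxels with Bonferroni correction:", (pvals < 0.05 / n_voxels).sum())  # roughly 0
```

Any principled correction method will control this, but different methods (and different implementations) vary in how conservative they are, which is precisely why shopping between packages at the inference stage is worrying.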

Poldrack et al. also provide a neat graph of sample sizes in fMRI studies over the years; the lines show the estimated median sample size, which has increased steadily from about 10 in the 1990s to around 25 today. This is still, in absolute terms, rather small.
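To put those numbers in perspective, here is a rough back-of-the-envelope calculation (my own sketch, not the paper's analysis, which uses a whole-brain voxelwise familywise error threshold and therefore implies even larger required effects): the smallest standardized effect detectable with 80% power for a simple one-sample t-test at an uncorrected α = 0.05.

```python
# Minimum standardized effect size (Cohen's d) detectable with 80% power,
# for a one-sample t-test at uncorrected alpha = 0.05 (two-sided).
# Deliberately optimistic: whole-brain corrected thresholds are far stricter.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
for n in (10, 25):
    d = analysis.solve_power(effect_size=None, nobs=n, alpha=0.05, power=0.80)
    print(f"n = {n:2d}: minimum detectable d ≈ {d:.2f}")

# Roughly d ≈ 1.0 at n = 10 and d ≈ 0.6 at n = 25 -- "large" effects by
# conventional standards, before any multiple comparisons correction.
```

By behavioural-science conventions those are large effects, and the whole-brain corrected thresholds actually used in fMRI are stricter still.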

One issue that's not covered in Scanning the Horizon is problems in the interpretation of fMRI results. Even if researchers use the correct statistical techniques and software, it is easy to misinterpret or overinterpret the results. One very common problem is the so-called imager's fallacy, in which the existence of a statistically significant 'blob' in one area of the brain, and the absence of a blob in another area, is taken as evidence of a significant difference between those two areas. It isn't: the difference between 'significant' and 'not significant' need not itself be statistically significant.
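To make the fallacy concrete, here is a minimal simulation (my own illustration, with arbitrary parameters): two regions carry exactly the same true effect, yet with modest power one region will often cross the significance threshold while the other doesn't, even though a direct test of the difference between them is significant only at chance levels.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, true_d, n_experiments = 20, 0.5, 10_000

discordant = 0   # one region 'significant', the other not
direct_sig = 0   # direct region A vs region B test significant

for _ in range(n_experiments):
    # Both regions have the same true standardized effect (d = 0.5).
    a = rng.normal(true_d, 1.0, n_subjects)
    b = rng.normal(true_d, 1.0, n_subjects)
    p_a = stats.ttest_1samp(a, 0).pvalue
    p_b = stats.ttest_1samp(b, 0).pvalue
    discordant += (p_a < 0.05) != (p_b < 0.05)
    direct_sig += stats.ttest_ind(a, b).pvalue < 0.05

print(f"One blob but not the other: {discordant / n_experiments:.0%}")    # roughly half the time
print(f"A-vs-B difference significant: {direct_sig / n_experiments:.0%}") # about 5%, i.e. chance
```

In other words, 'blob here, no blob there' is not evidence that two regions differ; only a direct test of the difference can establish that.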

CATEGORIZED UNDER: FixingScience, fMRI, methods, select, Top Posts
  • Wouter

    “This [sample size] is still, in absolute terms, rather small.”
    Well, in absolute terms, sample size says absolutely nothing. Whether or not a sample size is large enough to obtain sufficient power is completely dictated by the effect size of the outcome.

    The authors of the article know this: in the paper, the figure shown here is accompanied by a graph of effect sizes over the years (showing a decreasing trend). The authors argue that this is still worrisome, since most studies address brain-behavior correlations at the group level, which supposedly have inherently low power. I personally feel that this is completely study-specific. In line with my opinion, the authors merely supply the reader with guidelines for avoiding QRPs rather than dictating an increase in sample size.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks. The right panel of the figure actually doesn't show empirical effect sizes over the years; rather, it shows the minimum effect sizes required to get 80% power, given the sample sizes:

      "Right panel: Using the sample sizes from the left panel, we estimated the standardized effect size required to detect an effect with 80% power for a whole-brain linear mixed effects analysis using a voxelwise 5% familywise error rate threshold from random field theory… This shows that even today, the average study is only sufficiently powered to detect very large effects; given that many of the studies will be assessing group differences or brain-behavior correlations (which will inherently have lower power), this represents an optimistic lower bound on the powered effect size"

      Overall I agree with you that sample size by itself is not important, but I know of no reason a priori to expect that interesting effects in fMRI are going to have larger effect sizes than interesting behavioural effects, yet behavioural sample sizes are typically bigger.

  • Anders Eklund

    One possible reason for using 3dClustSim in AFNI for the inference, even if the analysis was performed in SPM or FSL, is that it was easier to get a significant result, due to a bug:

    https://arxiv.org/abs/1511.01863

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Ah yes, Poldrack et al. mentioned this as well. What a mess!

    • disqus_Yg3epjmCXY

      It’s unclear from that paper how bad the bug is when using a CDT of 0.001 and the same smoothness as the other software packages. AFNI allows the user to decide what smoothness to use, including the smoothness of the residuals at the 2nd level if that’s what everyone now thinks is best. I would bet that when the same data are used with the same parameters, this bug increases false positives by only a tiny amount. Now I could be wrong, but I didn’t see that test in the paper…

      To infer from this that researchers are engaging in a massive conspiracy to squeeze out smaller clusters smacks of methodological elitism. “Thou shalt follow the one true pipeline!” It also effectively puts tools that allow you to jump between packages (nipype) out of business. There are plenty of steps that can impact the ability to detect a blob, from acquisition protocol to preprocessing to modeling (or mis-modeling) to failing to account for nuisance variables. This is just one of them.

      Winking and nudging one another that everyone else is jumping between packages as some sort of underground p-hacking conspiracy is one possible motive, if you really want to be that cynical. Other “potential reasons” are that people jump between packages because it’s what they were trained to do, or they prefer the simplicity of AFNI’s approach of calculating the necessary cluster size, or they prefer the assumptions of AFNI’s method over the more opaque SPM ones (topological FDR, anybody?), or they prefer a simple, clear output over the clicking GUI hell that is SPM’s 2nd-level inference. As long as they are not switching back and forth every other study, I doubt it’s done for the shady reasons that everyone is implying. Never attribute to malice that which can be adequately explained by laziness!
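For readers unfamiliar with what is being estimated here: cluster-extent correction asks how large a contiguous suprathreshold cluster must be before it would be unlikely to arise from smooth noise alone. The toy Monte Carlo sketch below (my own illustration, not AFNI's 3dClustSim algorithm; the grid size, smoothness and thresholds are arbitrary) shows why the answer depends heavily on the cluster-defining threshold (CDT) and on the assumed smoothness, which is exactly what the disagreement above turns on:

```python
import numpy as np
from scipy import stats, ndimage

rng = np.random.default_rng(1)
shape = (32, 32, 32)     # toy "brain" volume
fwhm_vox = 2.5           # assumed noise smoothness, in voxels
sigma = fwhm_vox / 2.355
n_sims = 500

def max_cluster_size(z_thresh):
    """Largest suprathreshold cluster in one smoothed pure-noise volume."""
    noise = ndimage.gaussian_filter(rng.normal(size=shape), sigma)
    z = (noise - noise.mean()) / noise.std()        # re-standardize after smoothing
    labels, n = ndimage.label(z > z_thresh)
    return np.bincount(labels.ravel())[1:].max() if n else 0

for p_cdt in (0.01, 0.001):                         # cluster-defining thresholds
    z_thresh = stats.norm.isf(p_cdt)                # one-sided z cutoff
    sizes = [max_cluster_size(z_thresh) for _ in range(n_sims)]
    k = int(np.percentile(sizes, 95))               # extent threshold for ~5% familywise error
    print(f"CDT p < {p_cdt}: clusters must exceed ~{k} voxels")
```

With a more liberal CDT the required clusters are much larger and the estimate is more sensitive to how the noise smoothness is modelled, which is why the bug matters more in some settings than in others.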

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    “Soft” sciences’ ethic is their own propagation, re Klimate Kaos and Anscombe’s quartet. The soft mind sciences are yet more egregious, crushing people as well as truth – Rosemary Kennedy; penology, DSM-5 plus Big Pharma, trial prosecution, education, and social policy. Will you believe Official Truth or your own lying eyes?

  • Gabriel Castellanos

    Another problem is the lack of a clinically meaningful effect size measure in studies with patients.

  • Денис Бурчаков

    It seems that, in the popular view, the seat once occupied by phrenology now belongs to fMRI, whether it wants it or not. Technical progress has made a couple of giant leaps forward, but human nature remains the same. Same fallacies, more sophisticated…

    I was surprised at how willingly people accept the marvels of fMRI when I read the book “Brainwashed” (http://preview.tinyurl.com/zw77dly). It deals mostly with legal issues and, God, there is so much potential to mess everything up…

  • Matt D

    I’m glad you mentioned problems with the interpretation of fMRI results as well. I am shocked that this discussion about good neuroscience practices has been so limited in scope. Why emphasize only the statistical side of things?

    How are we not talking about improving fMRI task design? 9/10 studies I read contain fatal flaws in terms of extraneous confounding variables such as task difficulty. How are we not talking about improving anatomical precision? 9/10 studies I read mis-label regions or use archaic terms like dorsal ACC that don’t make sense in light of new research on cytoarchitecture, connectivity patterns, and receptor types. And how are we not talking about slowing the onslaught of empirical studies in favour of slowly building theoretical models that integrate findings from divergent areas, and then designing empirical studies that will actually tell us something useful?

    As a young cognitive neuroscientist I am flabbergasted at the state of the field. Stats is the least pressing issue. The conversation needs to expand greatly if we truly want to reform neuroscience research and conduct studies that will contribute something meaningful to society. How about a year-long ban on publishing so we can all step back and gain some perspective :)

  • Pingback: The traps of neuroimaging research | What is behavioral?

  • Pingback: Spike activity 24-06-2016 - Liveindex News People Places

  • sometimes_science
    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks! I believe (a preprint of) this paper was discussed in the Poldrack et al. piece.

  • Mark Eccles

    You know this is all about making money, right? What can I sell?
    Can we see the plaques in an Alzheimer’s patient’s brain with fMRI? Yes. Good. Now find the reasons for the defect and fix it.

  • Pingback: Spike activity 24-06-2016 - Mindfulnessless
