Can We Rely on fMRI?

By Neuroskeptic | March 10, 2010 10:10 pm

Craig Bennett and Michael Miller, of dead fish brain scan fame, have a new paper out: How reliable are the results from functional magnetic resonance imaging?

Tal over at the [citation needed] blog has an excellent in-depth discussion of the paper, and Mind Hacks has a good summary, but here’s my take on what it all means in practical terms.

Suppose you scan someone’s brain while they’re looking at a picture of a cat. You find that certain parts of their brain are activated to a certain degree by looking at the cat, compared to when they’re just lying there with no picture. You happily publish your results as showing The Neural Correlates of Cat Perception.

If you then scanned that person again while they were looking at the same cat, you’d presumably hope that exact same parts of the brain would light up to the same degree as they did the first time. After all, you claim to have found The Neural Correlates of Cat Perception, not just any old random junk.

If you did find a perfect overlap in the area and the degree of activation, that would be an example of 100% test-retest reliability. In their paper, Bennett and Miller review the evidence on the test-retest reliability of fMRI studies. They found 63 of them. On average, they found that the reliability of fMRI falls quite far short of perfection: the areas activated (clusters) had a mean Dice overlap of 0.476, while the strength of activation was correlated with a mean ICC of 0.50.

But those numbers, taken out of context, don't mean very much. Indeed, what is a Dice overlap? You'll have to read the whole paper to find out, but even then, the numbers still don't tell you much on their own. I suspect this is why Bennett and Miller don't mention them in the Abstract of the paper, and in fact they spend no more than a few lines discussing them at all.
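For the curious, here is a minimal sketch of what these two metrics actually compute, using made-up toy data (ten "voxels", entirely illustrative, not from the paper): the Dice coefficient measures how much two sets of activated voxels overlap, and the ICC measures how consistent the activation strengths are across two sessions.

```python
import numpy as np

def dice_overlap(mask_a, mask_b):
    """Dice coefficient between two binary activation masks:
    2|A & B| / (|A| + |B|). 1.0 means identical clusters, 0.0 means no overlap."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def icc_1_1(x, y):
    """One-way random-effects ICC(1,1) for two measurements per voxel.
    (Several ICC variants exist; fMRI papers differ in which one they report.)"""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    row_means = data.mean(axis=1)
    ms_between = k * np.sum((row_means - data.mean()) ** 2) / (n - 1)
    ms_within = np.sum((data - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy data: 10 voxels, each "scanned" twice
scan1_mask = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]   # activated voxels, session 1
scan2_mask = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]   # activated voxels, session 2
print(dice_overlap(scan1_mask, scan2_mask))    # 0.8

scan1_beta = [2.1, 1.8, 0.3, 0.2, 1.5, 0.1, 0.9, 1.2, 0.4, 0.7]
scan2_beta = [1.9, 1.5, 0.5, 0.4, 1.1, 0.3, 1.3, 0.8, 0.6, 0.5]
print(round(icc_1_1(scan1_beta, scan2_beta), 2))  # 0.9 in this toy example
```

So a Dice overlap of 0.476 means that, on retest, less than half of the "activated" territory is the same territory as before.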

A Dice overlap of 0.476 and an ICC of 0.50 are what you get if you average over all of the studies that anyone's done looking at the test-retest reliability of any particular fMRI experiment. But different fMRI experiments have different reliabilities. Saying that the average reliability of fMRI is 0.5 is rather like saying that the mean velocity of a human being is 0.3 km per hour. That's probably about right, averaging over everyone in the world, including those who are asleep in bed and those who are flying on airplanes – but it's not very useful. Some people are moving faster than others, and some scans are more reliable than others.

Most of this paper is not concerned with “how reliable fMRI is”, but rather, with how to make any given scanning experiment more reliable. And this is an important thing to write about, because even the most optimistic cognitive neuroscientist would agree that many fMRI results are not especially reliable, and as Bennett and Miller say, reliability matters for lots of reasons:

Scientific truth. While it is a simple statement that can be taken straight out of an undergraduate research methods course, an important point must be made about reliability in research studies: it is the foundation on which scientific knowledge is based. Without reliable, reproducible results no study can effectively contribute to scientific knowledge…. if a researcher obtains a different set of results today than they did yesterday, what has really been discovered?
Clinical and Diagnostic Applications. The longitudinal assessment of changes in regional brain activity is becoming increasingly important for the diagnosis and treatment of clinical disorders…
Evidentiary Applications. The results from functional imaging are increasingly being submitted as evidence into the United States legal system…
Scientific Collaboration. A final pragmatic dimension of fMRI reliability is the ability to share data between researchers…

So what determines the reliability of any given fMRI study? Lots of things. Some of them are inherent to the nature of the brain, and are not really things we can change: activation in response to basic perceptual and motor tasks is probably always going to be more reliable than activation related to “higher” functions like emotions.

But there are lots of things we can change. Although it’s rarely obvious from the final results, researchers make dozens of choices when designing and analyzing an fMRI experiment, many of which can at least potentially have a big impact on the reliability of their findings. Bennett and Miller cover lots of them:

voxel size… repetition time (TR), echo time (TE), bandwidth, slice gap, and k-space trajectory… spatial realignment of the EPI data can have a dramatic effect on lowering movement-related variance … Recent algorithms can also help remove remaining signal variability due to magnetic susceptibility induced by movement… simply increasing the number of fMRI runs improved the reliability of their results from ICC = 0.26 to ICC = 0.58. That is quite a large jump for an additional ten or fifteen minutes of scanning…

The details get extremely technical, but then, when you do an fMRI scan you’re using a superconducting magnet to image human neural activity by measuring the quantum spin properties of protons. It doesn’t get much more technical.

Perhaps the central problem with modern neuroimaging research is that it’s all too easy for researchers to write off the important experimental design issues as “merely” technicalities, and just put some people in a scanner using the default scan sequence and see what happens. This is something few fMRI users are entirely innocent of, and I’m certainly not, but it is a serious problem. As Bennett and Miller point out, the devil is in the technical details.

The generation of highly reliable results requires that sources of error be minimized across a wide array of factors. An issue within any single factor can significantly reduce reliability. Problems with the scanner, a poorly designed task, or an improper analysis method could each be extremely detrimental. Conversely, elimination of all such issues is necessary for high reliability. A well maintained scanner, well designed tasks, and effective analysis techniques are all prerequisites for reliable results.

Bennett CM, Miller MB (2010). How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences

  • ramesam

    The last para of the article says: “The generation of highly reliable results requires that sources of error be minimized across a wide array of factors. …..”

    What is new here?

    Is that not true with respect to any research — not merely fMRI or Neuroscience?

    Experimental errors and interpretation biases may be more difficult to detect in studies of natural systems, but these aspects are basic precautions in any scientific inquiry.

    Please see the two slides titled “Observation hazards” and “Pitfalls in Interpretation” in the PowerPoint presentation at:

  • Neuroskeptic

    It's true that there's nothing new in saying that good science depends on good methods. However this is something that many people think fMRI (still a very young neuroscience method) often doesn't pay enough attention to…

  • Yigal Agam

    To me, the devil is in the statistics. Many authors run the whole brain through a t-test to find a “correlate” of something, with weak or no correction for multiple comparisons (this is getting better with time, though). These problems exist in every empirical discipline, but the large volumes of data in fMRI make it more susceptible. In addition to providing all the boring technical details about scan parameters, authors should go into excruciating detail when writing up their statistical analyses.
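The multiple-comparisons point above can be made concrete with a toy simulation (entirely illustrative; the numbers are invented, not from the paper): run an uncorrected one-sample t-test at every voxel of a pure-noise "brain" and thousands of voxels come out "significant".

```python
import numpy as np

rng = np.random.default_rng(42)
n_voxels, n_subjects = 50_000, 16

# Pure noise: there is no real effect at any voxel
data = rng.standard_normal((n_voxels, n_subjects))

# One-sample t statistic at every voxel
t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n_subjects))

# Uncorrected two-tailed p < 0.05 at df = 15 means |t| > 2.131
uncorrected = np.sum(np.abs(t) > 2.131)

# Bonferroni correction: alpha = 0.05 / 50,000 per voxel,
# which corresponds to roughly |t| > 6.5 at df = 15
bonferroni = np.sum(np.abs(t) > 6.5)

print(uncorrected)  # on the order of 2,500 false positives (~5% of 50,000)
print(bonferroni)   # almost certainly 0
```

Bonferroni is very conservative; in practice fMRI analyses usually use cluster-based thresholds or false discovery rate control as a middle ground, but the simulation shows why some correction is non-negotiable.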

  • Anonymous

    Absolutely nothing new here – but a nice summary of other people's thoughts and comments of the past. What the author forgets to mention completely – and most everyone forgets – is that the statistics used for the analysis of fMRI are really not reliable no matter what one does or how one prepares for errors in the experiment. The reason the stats are not reliable is that they require independent observations, which is not possible when we are looking at voxels (arbitrary brain slices that are connected) in which neuronal axons may be crossing and hanging into two voxels at the same time, or may have multiple functions, as many do. Thus ANY statistics of fMRI is weak, but we don't have anything better for the moment.

    So for those who keep on criticizing fMRI experimental analysis, just get your lips zipped and come up with a better method!

  • ML, MD

    fMRI might be more reliable for investigation of psychopathology vs. normal functioning but it's too early to tell if it will be useful for diagnosis.

  • Anonymous

    Thank you for your blog. I enjoy the discussions.
    On the topic of fMRI reliability… Yes, fMRI is a relatively new and complicated technique used by researchers who often do not have a good understanding of the underlying assumptions. Bennett and Miller discuss methods (e.g., more runs) for improving reliability that are likely to have a modest effect.
    But, let's be skeptical scientists here. If we were talking about any other measure, would a dependent variable with an ICC of approx 0.4 be acceptable? Of course not. Some might argue that sometimes the ICC is better than 0.4, and it is for simple sensory and motor experiments. But, Bennett and Miller nicely show that the best estimate (average) of fMRI test-retest reliability from a cognitive activation task is approx 0.4 (ICC) with the activated region overlapping by approx 33% (Jaccard index). This is an embarrassment. Consider further that test-retest reliability is likely to be lower still if the test-retest interval is longer than 1 hr or 1 wk, or if the retest data are collected with a different scanner.

    fMRI research, and fMRI researchers, have a devastating problem here that they must confront more openly. Poor reliability is likely related to poor SNR. In addition, and more importantly, poor reliability comes from the abundance of fMRI studies with very small samples of 8-20 subjects. Continuing the status quo will continue to litter the landscape with unreproducible results. Journals and journal editors should step up and develop publication guidelines that address these problems. For example, the better genetics journals now require that candidate gene association studies have a very large N, or include a replication sample in the same publication with the original sample. Where is the outrage?

  • Neuroskeptic

    Anonymous: Thanks for the comment. I'd agree with that, although it's easier said than done, unfortunately.


