Craig Bennett (of Prefrontal.org) and Michael Miller, of dead fish brain scan fame, have a new paper out: How reliable are the results from functional magnetic resonance imaging?
Suppose you scan someone’s brain while they’re looking at a picture of a cat. You find that certain parts of their brain are activated to a certain degree by looking at the cat, compared to when they’re just lying there with no picture. You happily publish your results as showing The Neural Correlates of Cat Perception.
If you then scanned that person again while they were looking at the same cat, you'd presumably hope that the exact same parts of the brain would light up, to the same degree as they did the first time. After all, you claim to have found The Neural Correlates of Cat Perception, not just any old random junk.
If you did find a perfect overlap in the area and the degree of activation, that would be an example of 100% test-retest reliability. In their paper, Bennett and Miller review the evidence on the test-retest reliability of fMRI studies. They found 63 of them. On average, the reliability of fMRI falls quite far short of perfection: the areas activated (clusters) overlapped between sessions with a mean Dice coefficient of 0.476, while the strength of activation agreed across sessions with a mean ICC of 0.50.
But those numbers, taken out of context, do not mean very much. Indeed, what is a Dice overlap? You’ll have to read the whole paper to find out, but even when you do, they still don’t mean that much. I suspect this is why Bennett and Miller don’t mention them in the Abstract of the paper, and in fact they don’t spend more than a few lines discussing them at all.
A Dice overlap of 0.476 and an ICC of 0.50 are what you get if you average over all of the studies that anyone's done looking at the test-retest reliability of any particular fMRI experiment. But different fMRI experiments have different reliabilities. Saying that the average reliability of fMRI is 0.5 is rather like saying that the mean velocity of a human being is 0.3 km per hour. That's probably about right, averaging over everyone in the world, including those who are asleep in bed and those who are flying on airplanes – but it's not very useful. Some people are moving faster than others, and some scans are more reliable than others.
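For the record, neither statistic is mysterious: Dice overlap is just twice the number of voxels active in both sessions, divided by the total active in each, and the ICC compares voxel-to-voxel variance with session-to-session noise. Here's a minimal sketch in Python with invented toy data (the "scans" and beta values below are made up for illustration, and the ICC(3,1) variant shown is just one common choice):

```python
import numpy as np

# Toy thresholded "activation maps" from two sessions of the same
# subject: True where a voxel passed threshold. Invented data.
scan1 = np.array([1, 1, 1, 0, 0, 1, 0, 0], dtype=bool)
scan2 = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)

def dice_overlap(a, b):
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def icc_3_1(data):
    """ICC(3,1): consistency of activation strength across sessions.
    `data` is an (n_voxels, n_sessions) array of e.g. beta values."""
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

print(dice_overlap(scan1, scan2))  # 0.75: 3 shared voxels, 4 in each map

# Continuous activation strengths for four voxels in two sessions:
betas = np.array([[1.2, 1.0], [0.8, 0.9], [0.1, 0.3], [1.5, 1.1]])
print(round(icc_3_1(betas), 2))  # 0.85 for these invented values
```

A Dice of 1 would mean the two sessions activated identical sets of voxels; the pooled figures above suggest the typical experiment gets about halfway there.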
Most of this paper is not concerned with “how reliable fMRI is”, but rather, with how to make any given scanning experiment more reliable. And this is an important thing to write about, because even the most optimistic cognitive neuroscientist would agree that many fMRI results are not especially reliable, and as Bennett and Miller say, reliability matters for lots of reasons:
Scientific truth. While it is a simple statement that can be taken straight out of an undergraduate research methods course, an important point must be made about reliability in research studies: it is the foundation on which scientific knowledge is based. Without reliable, reproducible results no study can effectively contribute to scientific knowledge…. if a researcher obtains a different set of results today than they did yesterday, what has really been discovered?
Clinical and Diagnostic Applications. The longitudinal assessment of changes in regional brain activity is becoming increasingly important for the diagnosis and treatment of clinical disorders…
Evidentiary Applications. The results from functional imaging are increasingly being submitted as evidence into the United States legal system…
Scientific Collaboration. A final pragmatic dimension of fMRI reliability is the ability to share data between researchers…
So what determines the reliability of any given fMRI study? Lots of things. Some of them are inherent to the nature of the brain, and are not really things we can change: activation in response to basic perceptual and motor tasks is probably always going to be more reliable than activation related to “higher” functions like emotions.
But there are lots of things we can change. Although it’s rarely obvious from the final results, researchers make dozens of choices when designing and analyzing an fMRI experiment, many of which can at least potentially have a big impact on the reliability of their findings. Bennett and Miller cover lots of them:
voxel size… repetition time (TR), echo time (TE), bandwidth, slice gap, and k-space trajectory… spatial realignment of the EPI data can have a dramatic effect on lowering movement-related variance … Recent algorithms can also help remove remaining signal variability due to magnetic susceptibility induced by movement… simply increasing the number of fMRI runs improved the reliability of their results from ICC = 0.26 to ICC = 0.58. That is quite a large jump for an additional ten or fifteen minutes of scanning…
The details get extremely technical, but then, when you do an fMRI scan you’re using a superconducting magnet to image human neural activity by measuring the quantum spin properties of protons. It doesn’t get much more technical.
Perhaps the central problem with modern neuroimaging research is that it’s all too easy for researchers to write off the important experimental design issues as “merely” technicalities, and just put some people in a scanner using the default scan sequence and see what happens. This is something few fMRI users are entirely innocent of, and I’m certainly not, but it is a serious problem. As Bennett and Miller point out, the devil is in the technical details.
The generation of highly reliable results requires that sources of error be minimized across a wide array of factors. An issue within any single factor can significantly reduce reliability. Problems with the scanner, a poorly designed task, or an improper analysis method could each be extremely detrimental. Conversely, elimination of all such issues is necessary for high reliability. A well maintained scanner, well designed tasks, and effective analysis techniques are all prerequisites for reliable results.