Back in June, I warned that the ever-increasing number of clever methods for analyzing brain imaging data could be a double-edged sword:
Recently, psychologists Joseph Simmons, Leif Nelson and Uri Simonsohn made waves when they published a provocative article called False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.
It explained how there are so many possible ways to gather and analyze the results of a simple psychology experiment that, even if there’s nothing interesting really happening, it’ll be possible to find some “significant” positive results purely by chance…
The problem’s not unique to psychology, however, and I’m concerned that it’s especially dangerous in modern neuroimaging research.
In a comment on that post, The Neurocritic pointed out that Michigan PhD student Joshua Carp had put forward the same argument in a conference presentation, several months previously.
Now Carp’s published a paper on the topic: On the plurality of (methodological) worlds: estimating the analytic flexibility of fMRI experiments. It’s free to access, so check it out.
Whereas I just talked the talk by listing lots of possible ways in which you could analyze a given set of data, Carp walked the walk and actually ran the analyses. He took a single dataset – the results of a simple experiment – and looked at it in almost 7,000 different ways. Each set of results was then thresholded to correct for multiple comparisons in 5 different ways, for a grand total of almost 35,000 outputs.
The variants he considered ranged from how much smoothing to apply, to how to correct for head motion, and many more.
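To see how a handful of choices like these multiplies into thousands of pipelines, here’s a minimal sketch. The option names and values below are purely illustrative – they are not Carp’s actual grid – but the combinatorics work the same way:

```python
from itertools import product

# Hypothetical analysis choices (names and values are illustrative,
# not the actual options Carp tested):
options = {
    "smoothing_fwhm_mm": [4, 6, 8, 12],
    "motion_correction": ["none", "6-param", "24-param"],
    "temporal_filter": ["none", "high-pass"],
    "hrf_model": ["canonical", "canonical+derivs", "fir"],
    "autocorrelation": ["none", "AR(1)"],
}

# Every combination of one value per choice is a distinct pipeline.
pipelines = list(product(*options.values()))
print(len(pipelines))  # 4 * 3 * 2 * 3 * 2 = 144 pipelines from just five choices
```

Five modest decisions already yield 144 pipelines; add a few more (slice-timing correction, registration target, threshold method…) and you’re quickly into the thousands, which is exactly the space Carp explored.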
What happened? In a nutshell, the different options made a difference – and the variability was largest in the parts of the brain that were most activated (the “blobs” that lit up). In other words, analytic flexibility makes the most difference in the most interesting places. See the picture at the top.
The location of the maximum peak activation also varied. This is not unexpected, and not, in itself, that worrying – the great majority of the peaks clustered in a few small areas. However, it underlines that different options really can make a difference.
Nearly every voxel in the brain showed significant activation under at least one analysis pipeline. In other words, a sufficiently persistent researcher determined to find significant activation in virtually any brain region is quite likely to succeed…
If investigators apply several analysis pipelines to an experiment, and only report the analyses that support their hypotheses, then the prevalence of false positive results in the literature may far exceed the nominal rate. However, analytic flexibility only translates into elevated false positive rates when combined with selective analysis reporting. If researchers reported the results of all analysis pipelines used in their studies, then it would not be problematic.
To the author’s knowledge, there is no evidence that fMRI researchers actually engage in selective analysis reporting. But researchers in other fields do appear to pursue this strategy.
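A back-of-the-envelope calculation shows why selective reporting is so corrosive. If a researcher tries k analysis pipelines and reports only a “significant” one, the chance of at least one false positive at a nominal α = 0.05 is 1 − (1 − α)^k – assuming, for simplicity, that the pipelines are independent (in reality they’re correlated, so this overstates the inflation):

```python
# Probability of at least one false positive when k independent
# analyses are each tested at nominal alpha, and only "hits" are
# reported. Independence is a simplifying assumption; correlated
# pipelines would inflate the rate less than this.
alpha = 0.05

def p_any_false_positive(k, alpha=alpha):
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    print(f"k={k:3d}: P(at least one false positive) = {p_any_false_positive(k):.3f}")
```

Even five quiet “attempts” pushes the effective false positive rate above 20%, which is why flexibility plus selective reporting, rather than flexibility alone, is the real danger.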
In my experience, fMRI researchers are actually fairly conservative about trying different analyses, and I certainly doubt anyone has ever run thousands of them just to get the result they want. I’d estimate that most published findings are the result of no more than a handful of ‘attempts’.
However, it’s a serious concern that it could happen – and importantly, it’s getting ever easier to do, as increasing computer power makes running an analysis quicker and cheaper than ever. As to what to do about it, Carp makes several suggestions, and here’s one I made earlier…
Joshua Carp (2012). On the plurality of (methodological) worlds: estimating the analytic flexibility of fMRI experiments. Front. Neurosci. DOI: 10.3389/fnins.2012.00149