Ed Vul et al recently created a splash with their paper, Puzzlingly high correlations in fMRI studies of emotion, personality and social cognition (better known by its previous title, Voodoo Correlations in Social Neuroscience). Vul et al accused a large proportion of the published studies in a certain field of neuroimaging of committing a statistical mistake. The problem, which they call the “non-independence error”, may well have made the results of these experiments seem much more impressive than they should have been. Although there was no suggestion that the error was anything other than an honest mistake, the accusations still sparked a heated and ongoing debate. I did my best to explain the issue in layman’s terms in a previous post.
Now, like the aftershock following an earthquake, a second paper has appeared, from a different set of authors, making essentially the same accusations. But this time, they’ve cast their net even more widely. Vul et al focused on only a small subset of experiments using fMRI to examine correlations between brain activity and personality traits. But they implied that the problem went far beyond this niche field. The new paper extends the argument to encompass papers from across much of modern neuroscience.
The article, Circular analysis in systems neuroscience: the dangers of double dipping, appears in the extremely prestigious Nature Neuroscience journal. The lead author, Dr. Nikolaus Kriegeskorte, is a postdoc in the Section on Functional Imaging Methods at the National Institutes of Health (NIH).
Kriegeskorte et al’s essential point is the same as Vul et al’s. They call the error in question “circular analysis” or “double-dipping”, but it is the same thing as Vul et al’s “non-independent analysis”. As they put it, the error could occur whenever
data are first analyzed to select a subset and then the subset is reanalyzed to obtain the results.
and it will be a problem whenever the selection criteria in the first step are not independent of the reanalysis criteria in the second step. If the two sets of criteria are independent, there is no problem.
Suppose that I have some eggs. I want to know whether any of the eggs are rotten. So I put all the eggs in some water, because I know that rotten eggs float. Some of the eggs do float, so I suspect that they’re rotten. But then I decide that I also want to know the average weight of my eggs. So I take a handful of eggs within easy reach – the ones that happen to be floating – and weigh them.
Obviously, I’ve made a mistake. I’ve selected the eggs that weigh the least (the rotten ones) and then weighed them. They’re not representative of all my eggs. Obviously, they will be lighter than the average. Obviously. But in the case of neuroscience data analysis, the same mistake may be much less obvious. And the worst thing about the error is that it makes data look better, i.e. more worth publishing:
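The egg analogy is easy to put in numbers. Here is a minimal sketch (the weights and counts are invented for illustration): the floating eggs float precisely because they are lighter, so weighing only them biases the average downwards.

```python
import random

random.seed(0)

# Hypothetical weights in grams. Rotten eggs have dried out,
# so they are lighter -- which is also why they float.
fresh = [random.gauss(60, 3) for _ in range(40)]
rotten = [random.gauss(50, 3) for _ in range(10)]
all_eggs = fresh + rotten

overall_mean = sum(all_eggs) / len(all_eggs)

# The "convenient" sample: only the floating (rotten) eggs.
floating_mean = sum(rotten) / len(rotten)

print(f"mean of all eggs:      {overall_mean:.1f} g")
print(f"mean of floating eggs: {floating_mean:.1f} g")  # biased low
```

The selection criterion (floating) is not independent of the quantity being estimated (weight), so the second analysis inherits a bias from the first.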
Distortions arising from selection tend to make results look more consistent with the selection criteria, which often reflect the hypothesis being tested. Circularity is therefore the error that beautifies results, rendering them more attractive to authors, reviewers and editors, and thus more competitive for publication. These implicit incentives may create a preference for circular practices so long as the community condones them.
To try to establish how prevalent the error is, Kriegeskorte et al reviewed all of the 134 fMRI papers published in the highly regarded journals Science, Nature, Nature Neuroscience, Neuron and the Journal of Neuroscience during 2008. Of these, they say, 42% contained at least one non-independent analysis, and another 14% may have done. That leaves 44% which were definitely “clean”. Unfortunately, unlike Vul et al who did a similar review, they don’t list the “good” and the “bad” papers.
They then go on to present the results of two simulated fMRI experiments in which seemingly exciting results emerge out of pure random noise, all because of the non-independence error. (One of these simulations concerns the use of pattern-classification algorithms to “read minds” from neural activity, a technique which I previously discussed). As they go on to point out, these are extreme cases – in real life situations, the error might only have a small impact. But the point, and it’s an extremely important one, is that the error can creep in without being detected if you’re not very careful. In both of their examples, the non-independence error is quite subtle and at first glance the methodology is fine. It’s only on closer examination that the problem becomes apparent. The price of freedom from the error is eternal vigilance.
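Their results-from-pure-noise demonstration is easy to reproduce in spirit. The sketch below (numbers of subjects and voxels are arbitrary, and this is my toy version, not the authors' actual simulation) generates two datasets of pure noise, selects the "voxel" most correlated with a behavioural score in dataset A, and then estimates that correlation both circularly (in A again) and independently (in a fresh dataset B).

```python
import random

random.seed(42)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n_subjects, n_voxels = 20, 5000
score = [random.gauss(0, 1) for _ in range(n_subjects)]

# Pure noise "activations": no voxel is truly related to the score.
data_a = [[random.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]
data_b = [[random.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]

# Step 1 (selection): the voxel most correlated with the score in dataset A.
best = max(range(n_voxels), key=lambda v: abs(pearson(data_a[v], score)))

# Step 2a (circular): re-measure that correlation in the SAME data.
r_circular = pearson(data_a[best], score)

# Step 2b (independent): measure it in the fresh dataset B.
r_independent = pearson(data_b[best], score)

print(f"circular estimate:    r = {r_circular:+.2f}")    # large but spurious
print(f"independent estimate: r = {r_independent:+.2f}") # close to zero
```

The circular estimate looks like an exciting finding, yet the data contain no signal at all; only the independent estimate tells the truth. This is exactly why the error can slip past a first-glance reading of a methods section.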
But it would be wrong to think that this is a problem with fMRI alone, or even neuroimaging alone. Any neuroscience experiment in which a large amount of data is collected and only some of it makes it into the final analysis is equally at risk. For example, many neuroscientists use electrodes to record the electrical activity in the brain. It’s increasingly common to use not just one electrode but a whole array of them to record activity from more than one brain cell at once. This is a very powerful technique, but it raises the risk of the non-independence error, because there is a temptation to only analyze the data from those electrodes where there is the “right signal”, as the authors point out:
In single-cell recording, for example, it is common to select neurons according to some criterion (for example, visual responsiveness or selectivity) before applying further analyses to the selected subset. If the selection is based on the same dataset as is used for selective analysis, biases will arise for any statistic not inherently independent of the selection criterion.
In fact, Kriegeskorte et al praise fMRI for being, in some ways, rather good at avoiding the problem:
To its great credit, neuroimaging has developed rigorous methods for statistical mapping from its beginning. Note that mapping the whole measurement volume avoids selection altogether; we can analyze and report results for all locations equally, while accounting for the multiple tests performed across locations.
With any luck, the publication of this paper and Vul’s so close together will force the neuroscience community to seriously confront this error and related statistical weaknesses in modern neuroscience data analysis. Neuroscience can only emerge stronger from the debate.
Kriegeskorte, N., Simmons, W., Bellgowan, P., & Baker, C. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. DOI: 10.1038/nn.2303