More Brain Voodoo, and This Time, It’s Not Just fMRI

By Neuroskeptic | April 27, 2009 12:28 pm

Ed Vul et al recently created a splash with their paper, Puzzlingly high correlations in fMRI studies of emotion, personality and social cognition (better known by its previous title, Voodoo Correlations in Social Neuroscience). Vul et al accused a large proportion of the published studies in a certain field of neuroimaging of committing a statistical mistake. The problem, which they call the “non-independence error”, may well have made the results of these experiments seem much more impressive than they should have been. Although there was no suggestion that the error was anything other than an honest mistake, the accusations still sparked a heated and ongoing debate. I did my best to explain the issue in layman’s terms in a previous post.

Now, like the aftershock following an earthquake, a second paper has appeared, from a different set of authors, making essentially the same accusations. But this time, they’ve cast their net even more widely. Vul et al focused on only a small sub-set of experiments using fMRI to examine correlations between brain activity and personality traits. But they implied that the problem went far beyond this niche field. The new paper extends the argument to encompass papers from across much of modern neuroscience.

The article, Circular analysis in systems neuroscience: the dangers of double dipping, appears in the extremely prestigious Nature Neuroscience journal. The lead author, Dr. Nikolaus Kriegeskorte, is a postdoc in the Section on Functional Imaging Methods at the National Institutes of Health (NIH).

Kriegeskorte et al’s essential point is the same as Vul et al’s. They call the error in question “circular analysis” or “double-dipping”, but it is the same thing as Vul et al’s “non-independent analysis”. As they put it, the error could occur whenever

data are first analyzed to select a subset and then the subset is reanalyzed to obtain the results.

and it will be a problem whenever the selection criteria in the first step are not independent of the reanalysis criteria in the second step. If the two sets of criteria are independent, there is no problem.

Suppose that I have some eggs. I want to know whether any of the eggs are rotten. So I put all the eggs in some water, because I know that rotten eggs float. Some of the eggs do float, so I suspect that they’re rotten. But then I decide that I also want to know the average weight of my eggs. So I take a handful of eggs within easy reach – the ones that happen to be floating – and weigh them.

Obviously, I’ve made a mistake. I’ve selected the eggs that weigh the least (the rotten ones) and then weighed them. They’re not representative of all my eggs. Obviously, they will be lighter than the average. Obviously. But in the case of neuroscience data analysis, the same mistake may be much less obvious. And the worst thing about the error is that it makes data look better, i.e. more worth publishing:

Distortions arising from selection tend to make results look more consistent with the selection criteria, which often reflect the hypothesis being tested. Circularity is therefore the error that beautifies results, rendering them more attractive to authors, reviewers and editors, and thus more competitive for publication. These implicit incentives may create a preference for circular practices so long as the community condones them.

To try to establish how prevalent the error is, Kriegeskorte et al reviewed all of the 134 fMRI papers published in the highly regarded journals Science, Nature, Nature Neuroscience, Neuron and the Journal of Neuroscience during 2008. Of these, they say, 42% contained at least one non-independent analysis, and another 14% may have done. That leaves 44% which were definitely “clean”. Unfortunately, unlike Vul et al who did a similar review, they don’t list the “good” and the “bad” papers.

They then go on to present the results of two simulated fMRI experiments in which seemingly exciting results emerge out of pure random noise, all because of the non-independence error. (One of these simulations concerns the use of pattern-classification algorithms to “read minds” from neural activity, a technique which I previously discussed). As they go on to point out, these are extreme cases – in real life situations, the error might only have a small impact. But the point, and it’s an extremely important one, is that the error can creep in without being detected if you’re not very careful. In both of their examples, the non-independence error is quite subtle and at first glance the methodology is fine. It’s only on closer examination that the problem becomes apparent. The price of freedom from the error is eternal vigilance.
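The flavor of those simulations can be captured in a few lines: generate pure noise for many “voxels”, select the ones that happen to correlate with a behavioral measure, and then summarize the correlation within that selected set. This is my own illustrative miniature, not code from the paper; the sample sizes and the 0.5 threshold are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 5000

# Pure noise: there is no real brain-behavior relationship here.
behavior = rng.standard_normal(n_subjects)
activity = rng.standard_normal((n_subjects, n_voxels))

# Pearson correlation of each voxel's activity with behavior.
b = (behavior - behavior.mean()) / behavior.std()
a = (activity - activity.mean(axis=0)) / activity.std(axis=0)
r = (a * b[:, None]).mean(axis=0)

# Circular step: select voxels *because* they correlate strongly,
# then report the average correlation within that selected subset.
selected = np.abs(r) > 0.5
print(f"voxels selected: {selected.sum()} of {n_voxels}")
print(f"mean |r| among them: {np.abs(r[selected]).mean():.2f}")
```

Because the same noise drove both the selection and the summary, the reported correlation is guaranteed to look strong even though no effect exists at all.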

But it would be wrong to think that this is a problem with fMRI alone, or even neuroimaging alone. Any neuroscience experiment in which a large amount of data is collected and only some of it makes it into the final analysis is equally at risk. For example, many neuroscientists use electrodes to record the electrical activity of the brain. It’s increasingly common to use not just one electrode but a whole array of them, recording activity from more than one brain cell at once. This is a very powerful technique, but it raises the risk of the non-independence error, because there is a temptation to only analyze the data from those electrodes showing the “right signal”, as the authors point out:

In single-cell recording, for example, it is common to select neurons according to some criterion (for example, visual responsiveness or selectivity) before applying further analyses to the selected subset. If the selection is based on the same dataset as is used for selective analysis, biases will arise for any statistic not inherently independent of the selection criterion.
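The standard remedy is to keep selection and estimation statistically independent, for example by selecting cells on one half of the trials and estimating the effect on the other half. Here is a minimal sketch of that split-half idea on simulated null data (the trial counts and the 0.15 threshold are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_units = 200, 300

# Null data: no unit is genuinely "responsive".
responses = rng.standard_normal((n_trials, n_units))

# Independent analysis: select units on the odd trials,
# then estimate the effect size on the even trials.
select_half = responses[0::2]
test_half = responses[1::2]

selected = select_half.mean(axis=0) > 0.15            # "responsive" units
biased = select_half.mean(axis=0)[selected].mean()    # circular estimate
unbiased = test_half.mean(axis=0)[selected].mean()    # independent estimate

print(f"units selected: {selected.sum()}")
print(f"circular estimate of response:    {biased:.3f}")
print(f"independent estimate of response: {unbiased:.3f}")
# The circular estimate looks like a solid effect; the independent
# estimate correctly hovers near zero.
```

The split costs half the data, but the estimate it yields is unbiased, because the noise that drove the selection has no influence on the held-out trials.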

In fact, Kriegeskorte et al praise fMRI for being, in some ways, rather good at avoiding the problem:

To its great credit, neuroimaging has developed rigorous methods for statistical mapping from its beginning. Note that mapping the whole measurement volume avoids selection altogether; we can analyze and report results for all locations equally, while accounting for the multiple tests performed across locations.
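To see why accounting for those multiple tests matters, consider a toy whole-volume analysis on pure noise: an uncorrected per-voxel threshold produces hundreds of spurious “activations”, while a much stricter, roughly Bonferroni-style threshold produces essentially none. The numbers below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_voxels = 16, 10000

# Null data: activity at every voxel in the volume is pure noise.
data = rng.standard_normal((n_subjects, n_voxels))

# One-sample t-statistic at every voxel.
t = data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_subjects))

# Uncorrected threshold: |t| > 2.13 (p < 0.05 two-tailed, df = 15).
uncorrected = np.abs(t) > 2.13
# Much stricter threshold, in the ballpark of a Bonferroni
# correction for 10,000 tests (approximate critical value).
corrected = np.abs(t) > 6.0

print(f"uncorrected 'activations': {uncorrected.sum()}")
print(f"corrected 'activations':   {corrected.sum()}")
```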

With any luck, the publication of this paper and Vul’s so close together will force the neuroscience community to seriously confront this error and related statistical weaknesses in modern neuroscience data analysis. Neuroscience can only emerge stronger from the debate.

Kriegeskorte, N., Simmons, W., Bellgowan, P., & Baker, C. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. DOI: 10.1038/nn.2303

  • Mitch

    So without doing any research, I assume that eggshells are actually porous. And that air (and bacteria?) enter the eggs and begin the process of decomposition. I can certainly see where these gasses would make the egg less dense and begin to float. But would the egg actually weigh less? Has some of the mass of proteins and fats in the egg left the shell?

    Also, I’m glad I never got into imaging.

  • Neuroskeptic

    Hey, I’m a neuroscientist, not an… eggspert. And I hate eggs, so I never cook them. I don’t actually know if bad eggs float or not; it might be an old wives’ tale.

    But if they do float, it must be because they weigh less, I think. Their volume presumably doesn’t change, so their weight must change for their density to drop. Right?

  • Anonymous

    Fascinating! Thanks for catching that!
    Do you have, or know how to get the list of the specific papers containing the flawed circular analyses? The authors must have it, but I could not find it in the article or in the supplementary materials. Two reasons:

    1) I am not a voyeur or in need of schadenfreude, but would like to know what papers to trust.

    2) I want to evaluate the 42% figure myself – to see if I agree with how they classified the papers. How can one do that without knowing which papers they’re talking about?

  • Charlotte

    Sort of O/T, but you’re right about eggs. Fresh ones sink quickly, then as they get older they stand on one end, then even older ones float. Even if one end is touching the surface of the water they’re still ok to hard boil or use in baking, you don’t need to chuck them unless they’re really buoyant. I’ve never thought about the chemistry of it before, but I’ll try weighing some of my next lot.

    Oh, and nice write-up :)

  • Neuroskeptic

    Anon: No, the authors didn’t provide the list. Presumably on purpose, since they must have it sitting there in front of them.

    I don’t know why, but quite possibly it’s because of the criticism Vul faced after he made his list public.

    On the other hand, Vul’s list was based on questions he mailed to the authors. In this case, the list was based on the information only in the published papers. So it should be possible to work out which papers are the bad ones, yourself…

  • Vincent

    On-topic: if their analysis was done properly, the same 42% should emerge from any other selection of papers, right? So you can repeat their test without knowing the identity of the papers, just as you can replicate experiments with different subjects than those initially used.

    Off-topic: I think the egg example is false; the eggs don’t float because of weight loss, but because of gas formation inside the shell. There’s probably a slight volume increase due to pressure effects, ergo: density drop.

    Anyway, good post. This kind of error seems so ubiquitous that 42% sounds like a low number. It’s good to be reminded of its fallaciousness every now and then :)

  • Neuroskeptic

    Vincent, on-topic: Hard to know. I’d imagine the 42% figure should be roughly right for other sets of papers, but maybe not: this was 2008 papers, whereas in 2002, say, fMRI experiments were generally less complex and probably less prone to this kind of problem.

    Vincent, off-topic: Possible, but doesn’t that assume that eggs are gas-impermeable? Which I don’t think they are…

  • Anonymous

    On Topic but minor. Just wanted to make an observation that the whole thing with a “secret list” of bad papers is getting kind of funny. Just see this Nature News report and the commentaries:

    Incidentally, how many bad papers appeared in Nature? Is this why Nature published Kriegeskorte’s paper? Is this why they don’t want to publish the list?

  • Neuroskeptic

    It could well be that they didn’t want to release the list because it makes Nature look worse than Science.

  • Anonymous

    I think if you look at less prestigious journals, you would find something closer to 90% of flawed papers. “We observed activations in regions x and y and z”.

  • Anonymous

    Since Nature Neuroscience seems so concerned with this… Why don’t they routinely publish “failure to replicate” articles? Especially for articles they originally published. Could it be that they are being hypocritical? Independent replication can settle all these issues directly. Statistics is only a poor surrogate for replication.
    So, Nature Neuroscience folks, put your money where your mouth is.

  • BrainGuy

    This is simply a failure of peer-review at the topmost “prestigious” journals, which do not specialize in neuroimaging or fMRI. Perhaps it might be a good idea to invite experts in the methodologies used to review the papers.

    On the other hand, I wonder if the situation is quite as bad as claimed. If the correlation coefficients of the selected areas are not being used for statistical inferences, and multiple-comparison correction is done correctly, then this is not a circular analysis. It’s only circular if the authors did hocus-pocus to arrive at a region-of-interest based on where the correlations were maximal, drew the ROI, did the correlation analysis, and then claimed significance.

    And, assuming a true effect exists at the ROI, we would like to know the magnitude of the effect. True, using the ROI from the previous analysis biases the estimate slightly, yet sometimes one must make do with biased estimators. And the bias is small when the ROI used is large, especially if spatial filtering was used to help define the ROI whereas the post-hoc correlation was done on the unfiltered dataset.

  • Neuroskeptic

    I think this:

    “failure of peer-review at the topmost “prestigious” journals which do not specialize in neuroimaging or fMRI” is very true. But a wider problem is that there are surprisingly few people who are specialists in fMRI methodology. There are thousands of people who use fMRI, but most of them scan first and ask questions later. Which is a big problem.

    I’ve got a lot more to say about that and I will at some point soon…

  • BrainGuy

    Sure, fMRI methodology is quite complicated, but you’d think each site should have at least one person who is an expert in it and can collaborate to make sure the experiments and analysis are done properly; or, failing that, the site should at least collaborate with an outside expert. Either way, it shouldn’t happen that a paper with fundamental design or analysis flaws gets published in what are ostensibly the top scientific journals.

  • Anonymous

    I may have missed it, but do you have thoughts about Yarkoni's paper in the issue that contains the Vul et al. paper?

  • John

    Thanks for the nice post. Just stumbled on your blog. Check out this PNAS paper from way back in 2002, which outlined this issue nicely in the context of cancer diagnosis from gene data.




About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

