Flawed Statistics Make Almost Everyone’s Brain “Abnormal”

By Neuroskeptic | January 3, 2013 7:41 pm

A popular method for detecting abnormalities in the shape and size of individual brains is seriously flawed, and is almost guaranteed to find ‘differences’ even in normal people.

So say Italian neuroscientists Scarpazza and colleagues in an important new report: Very high false positive rates in single case Voxel Based Morphometry.

Voxel Based Morphometry (VBM) is a way of analyzing brain scans to detect structural differences. It's most commonly used to compare groups of brains to find average differences, but some neuroscientists have started using VBM to check for abnormalities in a single brain. Scarpazza et al. list 34 studies that have taken this single-subject approach, including 13 published since 2010.

So it would suck if there were a problem with individual VBM… but there is. This pic tells the tale:

The authors took 200 normal brains and compared each one of them in turn to a control group of 16 normal brains. Because all of them were healthy, the comparisons ought to show no significant differences.

The technique was set up so that, in theory, only 5% of the brains should have been wrongly labelled as containing an abnormality. But in fact, a full 93.5% of the normal brains gave at least one false positive.

So 5% is more like the rate of not being wrong. Oops.
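The jump from 5% to 93.5% is the classic multiple-comparisons problem: the 5% threshold applies per test, but each brain is tested at many locations at once. Here's a minimal sketch of the arithmetic (my own illustration, not the paper's analysis — it assumes independent tests, whereas real VBM voxels are spatially correlated, so the true inflation per comparison is somewhat smaller but still severe):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05       # nominal per-test false positive rate
n_regions = 50     # hypothetical number of independent tests per brain
n_brains = 2000    # simulated "normal" brains, all truly unremarkable

# For each brain, does at least one test cross the threshold by chance alone?
p_values = rng.uniform(size=(n_brains, n_regions))
at_least_one_hit = (p_values < alpha).any(axis=1)

print(f"Per-test rate:          {alpha:.3f}")
print(f"Expected per-brain rate: {1 - (1 - alpha) ** n_regions:.3f}")
print(f"Simulated per-brain rate: {at_least_one_hit.mean():.3f}")
```

With 50 independent tests, the expected per-brain false positive rate is 1 − 0.95⁵⁰ ≈ 0.92 — already in the ballpark of the paper's 93.5%, even before any violated statistical assumptions make matters worse.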

The image shows that in some brain areas, almost 25% of the normal brains were branded as ‘abnormal’ just in that region alone – the hotter the colour, the higher the proportion of false ‘hits’. The top row is for false reports of brain volume increases, while the bottom row is decreases; false ‘increases’ were more common.

So what’s going wrong? It’s not entirely clear, and several factors are probably at play, but the authors say the main issue is that VBM relies on an assumption of statistical normality that doesn’t in fact hold.
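A toy simulation can show how a broken normality assumption produces exactly this pattern — including the excess of false ‘increases’ over ‘decreases’. (This is my own construction, not from the paper: it uses a Crawford–Howell-style single-case t statistic and deliberately right-skewed data, where the paper's actual pipeline is voxelwise VBM.) When values are skewed rather than Gaussian, comparing one case against 16 controls drawn from the same population inflates false positives in the upper tail:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_controls, n_sims, alpha = 16, 20000, 0.05
t_crit = stats.t.ppf(1 - alpha, df=n_controls - 1)  # one-tailed cutoff

up = down = 0
for _ in range(n_sims):
    controls = rng.exponential(size=n_controls)  # right-skewed, non-normal
    case = rng.exponential()                     # drawn from the SAME population
    # Crawford-Howell single-case t statistic: case vs. small control group
    t = (case - controls.mean()) / (
        controls.std(ddof=1) * np.sqrt(1 + 1 / n_controls)
    )
    up += t > t_crit      # false 'increase'
    down += t < -t_crit   # false 'decrease'

print(f"false 'increase' rate: {up / n_sims:.3f}")   # exceeds the nominal 5%
print(f"false 'decrease' rate: {down / n_sims:.3f}")  # far below it
```

The long right tail means individual cases occasionally land far above the control mean, while the bounded left tail almost never produces extreme low values — an asymmetry reminiscent of the paper's finding that false ‘increases’ outnumbered false ‘decreases’.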

Either way, it’s a serious problem, and Scarpazza et al point to one especially worrying implication: some people have proposed using single-subject VBM in a legal context, to reinforce insanity pleas by showing subtle ‘brain abnormalities’ not obvious to the naked eye. Yet if this paper’s right, such evidence could be entirely meaningless, almost guaranteed to give a positive result.

P.S. Last time I posted about this kind of analysis flaw, the internet went crazy because they didn’t understand it. So just to be clear, this is not a problem for clinical scans – the kind you’d get to check whether you have a brain tumour.

Scarpazza, C., Sartori, G., De Simone, M., & Mechelli, A. (2013). When the single matters more than the group: Very high false positive rates in single case Voxel Based Morphometry. NeuroImage. DOI: 10.1016/j.neuroimage.2012.12.045

  • http://www.blogger.com/profile/04585807162496448781 Grubblaren

    Great paper, the end of parametric statistics in neuroimaging is approaching.

  • http://www.blogger.com/profile/17417022296647288072 grumpy-xl

    again afraid of dead fish statistics?

  • E

    Is this technique used in gender/sex-based studies? I'm assuming it is, since comparing groups to find average differences is basically the raison d'etre of the field.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Group VBM is used in those studies but this paper is about single-subject VBM so it's not (directly) applicable.

  • Nick

    Potentially a similar problem exists even for groupwise VBM (or any type of groupwise voxel-based analysis), and it is very much related to the circularity issue discussed elsewhere (e.g. Vul 2009). My colleagues and I wrote about this in the context of FA-based analysis in a recent HBM article, and we have a conference paper where we discuss its applicability to optimized VBM (Good et al. 2001). It hasn't received much attention yet, but this seems like a good place where it might add to the discussion.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Hi Nick, thanks for dropping by! I've been meaning to blog about your HBM paper for a while… hope to find space for it soon.

    That conference paper sounds very interesting, but your link doesn't work…?

  • C

    Was the control group of 16 subjects scanned on the same machine as the 200 test subjects? If not, some of the variance may be equipment-related. An alternative would have been to draw one subject at a time and compare it to the remaining 199.

  • Nick

    I visit all the time as y'all do the hard work of sifting the interesting articles. :-)

    Sorry about the link. Hopefully, this link is better. One can search the program for “Statistical bias in optimized VBM”.

    I had a chance to read Scarpazza's manuscript, and although they speculate as to the possible cause(s) of the false positive elevation, I don't get the sense that the investigation was meant to flesh this out. I suspect that the spatial normalization problem we describe in our HBM paper is equally applicable to the single-subject VBM case.

    The core problem for all these voxel-based analysis methods is that underlying most spatial normalization strategies is some quantitative measure of anatomical correspondence which depends not on minimizing anatomical discrepancies but rather on minimizing some measure of intensity-based differences. Of course, it is assumed that the latter is a satisfactory surrogate for the former. We point out in our paper that this subtle difference, particularly in the case of sum of squared differences (used in earlier versions of SPM and in TBSS), explicitly minimizes the average voxelwise variance over the group. Since minimizing this variance increases the voxelwise t-statistic, what one is effectively doing during spatial normalization is finding the set of transformations which optimizes the statistical testing results (not necessarily the set producing maximal anatomical alignment).

    I would suspect that the situation is similar for SPM8 (i.e. DARTEL), used by Scarpazza et al., which uses the gray matter probability image directly in its multinomial similarity function to drive the spatial normalization. However, we didn't test this, although we looked at other commonly used similarity metrics. Note that this problem would persist even if the normality assumptions were met.
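The effect described in this comment can be seen in a toy 1-D sketch (entirely hypothetical data and a deliberately crude integer-shift registration — real VBM uses high-dimensional warps): aligning each “subject” to the group mean by minimizing sum of squared differences directly shrinks the voxelwise variance, which is the denominator of the voxelwise t-statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_vox = 20, 120

# Hypothetical 1-D "anatomy": a shared bump, randomly shifted per subject,
# plus a little measurement noise.
template = np.exp(-0.5 * ((np.arange(n_vox) - 60) / 6.0) ** 2)
subjects = np.array([
    np.roll(template, rng.integers(-8, 9)) + 0.05 * rng.standard_normal(n_vox)
    for _ in range(n_sub)
])

def align_to_mean(data, max_shift=10):
    """Shift each subject to minimize sum-of-squared differences (SSD)
    to the group mean -- a crude stand-in for intensity-driven registration."""
    target = data.mean(axis=0)
    out = np.empty_like(data)
    for i, row in enumerate(data):
        shifts = range(-max_shift, max_shift + 1)
        best = min(shifts, key=lambda s: ((np.roll(row, s) - target) ** 2).sum())
        out[i] = np.roll(row, best)
    return out

aligned = align_to_mean(subjects)
var_before = subjects.var(axis=0, ddof=1).mean()
var_after = aligned.var(axis=0, ddof=1).mean()
print(f"mean voxelwise variance before: {var_before:.4f}")
print(f"mean voxelwise variance after:  {var_after:.4f}")
```

The variance drop is exactly what an SSD-driven normalization is built to produce — and since the voxelwise t-statistic divides by that variance, the alignment step itself pushes the statistics toward significance, independently of any normality assumptions.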

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Thanks, that's a very clear overview of the problem!

    Like you say I don't think Scarpazza et al were focussed on finding the causes. IIRC they mention registration errors but I don't think they cover the issue you've highlighted.

    I suppose what we ideally want is a way of quantifying neuroanatomy that doesn't rely on normalization at all! i.e. to take a brain and just measure it in various ways… it would have to be a very clever system though to automatically judge where to make its measurements, despite variable anatomy (i.e. it would have to be able to recognize the OFC despite no two OFCs being alike.)

  • http://petrossa.me/ petrossa.me

    Not even meant as a joke, but when the climate craze is over there will be a gigantic superfluous computing power standing idle which could model any brain doing anything and could clear up the fMRI image to almost 100% accuracy.

    I'd say, start lobbying.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    You mean in about 50 years, when the climate has changed so much that there's no need for computer models? 😉

  • Nick

    That would certainly be ideal. One possible workaround (and one of the motivators for using FA data in our paper) is that you can still do VBM but use one set of image data for statistical analysis and another set of image data for spatial normalization. For our paper, our suggestion is to use T1 data to create the template and derive the deformable transformations. One can then reasonably assume a highly constrained mapping between intrasubject FA/T1 images and thus use a composition of transforms to go from individual FA to individual T1 to the normalized template space. You might still have subtle anatomical misalignments but at least the normalization process isn't explicitly formulated to produce false positives.

  • http://www.blogger.com/profile/04585807162496448781 Grubblaren

    The authors actually mention that the spatial normalization step may be improved by using non-intensity-based registration algorithms:

    “An alternative possibility is that the higher number of significant effects in the temporal and frontal regions is the result of less accurate spatial registration in these areas. Future studies could examine this by testing for a correlation between the number of detected significant differences across different regions and the spatially varying registration accuracy. A possible solution could be the use of more advanced registration algorithms which use local image structure, as opposed to intensity information, to detect complex image relationships (Mellor and Brady, 2005; Mellor and Brady, 2004).”

  • Anonymous

    Hmmmm. Could this be a statistical power issue? Using a control group with n=16 seems very low for a comparison like this. I haven't read the paper yet so forget this comment if they also looked at larger groups.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    It's small, but the authors' point is that most of the papers in this field use similarly small control samples.

    You'd probably see much better results with 1000 in the control group, but that's never used in practice.

  • Anonymous

    Someone should do a meta-analysis of these kinds of papers. My understanding is that twice that sample size would be more appropriate and the standard in the field. Conversely, a control group of 1000 probably suffers from being overpowered.




About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

