Spurious Positive Mapping of the Brain?

By Neuroskeptic | May 2, 2012 5:32 pm

Many fMRI studies could be giving false-positive results according to an important new paper from Anders Eklund and colleagues: Does parametric fMRI analysis with SPM yield valid results?—An empirical study of 1484 rest datasets.

The authors examined the SPM8 software package, probably the most popular tool for analyzing neuroimaging data.

Their approach was beautifully simple. They wanted to check how often conventional analysis of fMRI would “find” a signal when there wasn’t really anything happening. So they took data from nearly 1,500 people who were scanned when they were just resting, and saw what would happen if you looked for “task related” activations in those scans, even though there was in fact no task. It’s a very clever use of the resting state data.
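
To make that logic concrete, here is a minimal sketch of the same idea on synthetic data. It is not Eklund et al's pipeline and does not use SPM8: the "scans" are simulated noise, the block regressor is a plain on/off boxcar with no haemodynamic model, and the AR(1) noise with rho = 0.5 is purely an assumption chosen for illustration. All function names are hypothetical. It fits a "task" to data that contains no task and counts how often an ordinary white-noise t-test calls the result significant.

```python
import math
import random

def boxcar(n_scans, block_len):
    """On/off block regressor: block_len scans of 'rest', then block_len of 'task'."""
    return [float((t // block_len) % 2) for t in range(n_scans)]

def ar1_noise(n_scans, rho, rng):
    """Unit-variance AR(1) noise; rho = 0 gives white noise (assumed noise model)."""
    scale = math.sqrt(1.0 - rho * rho)
    value = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n_scans):
        series.append(value)
        value = rho * value + scale * rng.gauss(0.0, 1.0)
    return series

def ols_t(y, x):
    """t statistic for the slope of y ~ intercept + x, assuming white residuals."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    rss = sum((yi - my - beta * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return beta / math.sqrt(rss / (n - 2) / sxx)

def false_positive_rate(rho, n_sims=1000, n_scans=200, block_len=10, seed=0):
    """Fraction of task-free simulated datasets declared 'active' at a nominal 5%."""
    rng = random.Random(seed)
    x = boxcar(n_scans, block_len)
    t_crit = 1.97  # approximate two-sided 5% threshold for ~198 degrees of freedom
    hits = 0
    for _ in range(n_sims):
        hits += abs(ols_t(ar1_noise(n_scans, rho, rng), x)) > t_crit
    return hits / n_sims

white_rate = false_positive_rate(rho=0.0)
coloured_rate = false_positive_rate(rho=0.5)
print(f"white noise: {white_rate:.3f}  autocorrelated noise: {coloured_rate:.3f}")
```

In this toy setup the white-noise case should sit close to the nominal 5%, while the autocorrelated case should come out well above it. How far above depends entirely on the assumed rho and block length, so the numbers say nothing about any real scanner.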

Eklund et al ran the analysis many thousands of times, under various different conditions. This is the key finding:

This shows the proportion of analyses which produced significant “activations” associated with various different “tasks”. In theory, the false positive rate should be way down at the bottom at 5% in each case. That’s the error rate they told SPM8 to provide. As you can see, it was often much higher. Oh dear.

The error rate depended on two main things. Most important was the task design. Block designs were much worse than event-related designs (see the labels at the bottom: B1,2,3,4 are block, E1,2,3,4 are event.) The longer the blocks, the more errors. B4, the most error-ridden design of all, corresponds to 30 second blocks.

That’s bad news because that’s a very common design.

Secondly, the repetition time (TR) mattered, especially for block designs. The TR is how long it takes to scan the whole brain once. The data showed that the longer the TR, the better: 1 second TRs are really dodgy. Luckily, they are rarely used. 2 seconds is OK for most event-related designs, but block designs really suffer. 3 seconds is even better.

Because most fMRI studies today use 2-3 second TRs, this is somewhat reassuring, but for block design B4 the error rate was still up to 30% even with TR=3. Oh dear, oh dear.

So what went wrong? It’s complicated, and you should read the paper, but in a nutshell the problem is that fMRI data analysis assumes that there are only two sources of data: the real brain activation signal, and white noise. The key assumption is that it’s white noise, which essentially means that it is random at any moment in time: knowing about what the noise did in the past tells you nothing about what it will do in the future. “Random” noise that’s actually correlated with itself over time is not white noise.
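
A small sketch of that distinction, using simulated numbers only: the coefficient 0.7 and the series length are arbitrary assumptions, and `lag_autocorr` is a hypothetical helper, not part of any fMRI package. White noise should show essentially no correlation between one sample and the next, while noise that "remembers" its past does.

```python
import random

def lag_autocorr(x, lag=1):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

rng = random.Random(1)

# White noise: every sample is drawn independently of the ones before it.
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Coloured noise: each sample keeps 70% of the previous one (assumed value).
coloured = []
previous = 0.0
for _ in range(5000):
    previous = 0.7 * previous + rng.gauss(0.0, 1.0)
    coloured.append(previous)

r_white = lag_autocorr(white)
r_coloured = lag_autocorr(coloured)
print(f"lag-1 autocorrelation  white: {r_white:.2f}  coloured: {r_coloured:.2f}")
```

The first value should be near zero and the second near the 0.7 that was built in. A test that assumes the first kind of noise but is fed the second will be overconfident.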

Now noise in the brain is certainly not white, for various reasons, including the effects of breathing and heart rate (which of course are cyclical, not random.) All fMRI analysis packages try to correct for this – but Eklund et al have shown that SPM8’s approach doesn’t manage to do that, at least for many designs.

What about rival fMRI software like FSL or BrainVoyager? We don’t know. They use different approaches to noise modelling, which might mean they do better, but maybe not.

And the really big question: does this mean we can’t trust published SPM8 results? Does SPM stand for Spurious Positive Mapping? Well, that’s also not clear. All of Eklund et al’s analyses were based on single subject data. But most fMRI studies pool the results from more like 20 or 30 subjects. Averaging over many subjects might make the false positives cancel out, but we don’t yet know if that would solve the problem or only lessen it.
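
One way to make the group-level question concrete is another toy simulation. Everything here is an assumption: each simulated "subject" is pure AR(1) noise with rho = 0.5, independent of every other subject, and the group test is a plain one-sample t-test on the per-subject slopes. The function names are hypothetical. It shows what happens in this idealised case only and does not settle the question for real data or for real group pipelines.

```python
import math
import random

def subject_beta(n_scans, block_len, rho, rng):
    """Slope of an on/off block regressor fitted to one subject's task-free noise."""
    x = [float((t // block_len) % 2) for t in range(n_scans)]
    scale = math.sqrt(1.0 - rho * rho)
    y = []
    value = rng.gauss(0.0, 1.0)
    for _ in range(n_scans):
        y.append(value)
        value = rho * value + scale * rng.gauss(0.0, 1.0)
    mx = sum(x) / n_scans
    my = sum(y) / n_scans
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def group_t(betas):
    """One-sample t statistic testing whether the mean slope differs from zero."""
    n = len(betas)
    mean = sum(betas) / n
    var = sum((b - mean) ** 2 for b in betas) / (n - 1)
    return mean / math.sqrt(var / n)

def group_false_positive_rate(n_groups=400, n_subjects=20, rho=0.5, seed=0):
    """Fraction of simulated task-free 'studies' that reach group significance."""
    rng = random.Random(seed)
    t_crit = 2.093  # two-sided 5% threshold for 19 degrees of freedom
    hits = 0
    for _ in range(n_groups):
        betas = [subject_beta(120, 10, rho, rng) for _ in range(n_subjects)]
        hits += abs(group_t(betas)) > t_crit
    return hits / n_groups

group_rate = group_false_positive_rate()
print(f"group-level false positive rate: {group_rate:.3f}")
```

In this sketch the group test only uses the spread of slopes across subjects, so the mis-modelled noise within each subject does not by itself inflate the group error rate. Whether real data behave this way, where fluctuations might be shared across subjects or where other inference methods are used, is exactly what remains open.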

Eklund, A., Andersson, M., Josephson, C., Johannesson, M., and Knutsson, H. (2012). Does parametric fMRI analysis with SPM yield valid results?—An empirical study of 1484 rest datasets. NeuroImage. DOI: 10.1016/j.neuroimage.2012.03.093

  • http://www.blogger.com/profile/15958927750339912134 Synge

    The logic of this paper is totally wrong. The human brain exhibits periodic fluctuations during the resting state, and these fluctuations might be captured by the block design setting. This means that the activations SPM found are not false positives but meaningful results.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Synge: You're right that they might be meaningful in the sense that they have a basis in brain activity rather than some other source of noise, but they're not meaningfully associated with the “task”, because there was no task.

    Imagine that you ran a study with a block design showing 30 second blocks of task / 30 seconds of rest, but in fact your task caused no brain activation, so people were in fact 'at rest' the whole time.

    These results show that you'd get false positive “blobs” at a much higher than 5% rate.

    This does raise the question, though, of whether people would ever be 'at rest' in a task-related study. It could be (for example) that this problem only arises because of the default mode network (say) and would not arise if people were occupied 'doing something'.

  • Anonymous

    There's a Nobel Prize in front of your eyes and you can't see it.

  • toddt

    It's not clear to me that this is a problem. As Synge mentioned, we know that there are spontaneous fluctuations within resting state networks, and it's likely that a block-design will capture them. Active-mode networks go up, default-mode networks go down, and vice versa.

    This is almost certainly not a problem with the autocorrelation approaches in analysis packages, but rather is a legitimate neural signal that just happens to be unrelated to a task.

    However, it seems *unlikely* that these fluctuations would be synchronized between subjects, and when you average subject data together (which is done in fMRI analyses essentially… always), the problem goes away. The spontaneous fluctuations wash out because they're not synchronized, and the task-related activations remain, because they are.

    I'm surprised (and disappointed) that the authors didn't take this to the logical next step and look for false positives in their group analyses… if you show false positives there, then yes, there's something to be concerned about. If not, then the authors have found more data that there are spontaneous fluctuations, and that's sexy in its own way, but doesn't damn all fMRI research.

  • http://www.blogger.com/profile/04585807162496448781 Grubblaren

    Before we used the rest data, we used the random permutation test to analyze typical activity data collected during a block based design (20 s blocks). The result was the same as for the rest data, the random permutation test yielded a (much) higher significance threshold than the parametric approaches.

    In an interesting paper from 1999, the low frequency drift in the fMRI signal is investigated. The conclusion was

    “Time series data acquired using protocols sensitive to T2* changes associated with BOLD contrast showed that physiological noise and subject motion do not seem to be the main cause of the low frequency drift reported in fMRI time series data. The most likely cause of the drifting are slight changes in the local magnetic field due to scanner instabilities, thus causing a partial voluming effect in the reconstructed images which is more apparent in regions with large spatial intensity gradient changes.”

    The most interesting thing is that they ran 3 dead persons in the scanner and still found low frequency oscillations, suggesting that they originate from the scanner and not from the subject.

  • http://petrossa.wordpress.com/ petrossa

    I want to point out that I've stated the general gist of this post and the paper here on this blog on several occasions. You can call me Awesome anytime.

  • http://www.fmrilab.net Greig de Zubicaray

    Hmm. I wonder whether the non-white noise structure in at least some of these studies was influenced by the rest acquisition having taken place after a task acquisition? Anybody have time to check this detail using the fcon1000 database? We've known for some time that 'resting' signal fluctuations are influenced by the participant having performed a task beforehand, e.g., Waites et al. (2005). Effect of prior cognitive state on resting state networks measured with functional connectivity. Human Brain Mapping, 24, 59-68.

  • http://www.blogger.com/profile/03790581939306172084 Denise

    And there is R. Douglas Fields and those glial cells…. hmmm, could their “broadcast” be being detected?

  • http://www.blogger.com/profile/18436870929097932680 muswellbrook

    I have a question about some of the default settings they changed in SPM8. In particular, the authors raised the default value of the uncorrected threshold from 0.001 to 0.05. I think they are talking about the value of defaults.stats.fmri.ufp in spm_defaults.m. Wouldn't this result in a lot more datasets passing the threshold when they shouldn't – and thus inflate the type 1 error?

    From the paper:

    “The variable was changed from 0.001 to 0.05, to make sure that the rest datasets pass the first overall F-test that is applied in SPM. Without this modification, the error message “please check your data, there are no significant voxels” will arise for rather many of the analyses and then no further analysis will be performed. defaults.stats.fmri.ufp.”

  • http://www.blogger.com/profile/15290353366454873741 Jeff

    Are the authors of these programs not expected to test their programs against a neutral dataset when they publish the software? In molecular evolution it's standard operating procedure.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    Jeff: Yes, but SPM is like 15 years old, and back when it was first designed, a study like this wouldn't have been possible. There weren't 1,500 resting state datasets lying around – probably more like 50 in the whole world! And computing power would have meant that this study would have taken years to finish.

  • http://neuroconscience.com/ neuroconscience.com

    Glad you covered this paper as it's pretty interesting and has some implications for the field. I was pretty curious myself so I spent some time discussing it with resident fMRI noise expert Torben Lund. These results are not really surprising at all given what we know about serial-correlation modeling in SPM, i.e. the AR(1) model. Back when SPM was first being developed, several authors demonstrated that while the AR(1) model is extremely robust for removing first-order correlations, higher-order correlations are left largely intact. This isn't as much of a problem for a longer TR, even in the typical block design. However, with fast TRs the correlations are aliased into the signal in a way that greatly increases their impact. Lund et al show that an effective way to deal with these higher-order correlations is to include respiratory and end-tidal CO2 regressors in the design matrix, which do a much better job of handling higher-order correlation. In fact, using noise regressors performs as well as or better than the AR(1) model. As you know, low-frequency noise is particularly problematic in resting state, where long block times and fast TRs are common. The authors of the paper state as much: this is pretty damning for a lot of the resting state papers analyzed using the AR(1) serial correlation correction. For the average block or event-related design, it's not such a problem. Another problem area is high-field MRI (i.e. >4 tesla), where fast TRs are becoming the norm. Lots of work is being done on better techniques for modeling this noise.

    In short, this paper is a beautiful demonstration of a well-known phenomenon. One certainly doesn't need such a large N to demonstrate the issue, as it's been shown in 10 or 20 subjects quite consistently. I did like Torben's take on it: “at least something good came of all those resting states!”

    Here is the paper that covers many of these issues and the older papers showing the same AR(1) problem if you are interested.


  • http://petrossa.wordpress.com/ petrossa

    Papers by the dozen. But basically there is this general misconception as to how much a (any) mathematical model represents reality (such as it is).

    There is an inverse relation between the accuracy of reality in the model and reality itself as parameters increase.

    In the case of fMRI it's confounded by almost every factor that makes it up.

    The assumption as to how the brain works:

    More blood means more activity means more relation to the task at hand.

    Then the way one arrives at separating out the signal wanted from the signals not wanted.

    What is useful noise, what is background noise, and therefore what exactly is the signal?

    Then the software itself. How is it tweaked individually?

    Which baseline parameters are set, what is the baseline anyway? Are all baselines the same baseline across the population?

    Etc. etc. In the end what fMRI gives is a virtual representation which may or may not represent what goes on in the brain, but no way of knowing that what it shows is of any relevance.

    Anyone who has ever compared the results of climate models to reality can easily understand how this process works out.

  • Ivana Fulli MD


    Thanks for that post.

    I know that I am out of my depth on that post.

    I just want you to know that not commenting does not mean one is not very interested in this kind of post and its comments.

    By the way, it might not be feasible, but I wonder if you could make it easier to find your other posts in a particular field. I find your chronological archives not pleasant to work through (no offense intended).

    I was thinking about a little window on your blog where one could type a word and, by digital miracle, a list of NS former posts on related subjects will show.

    From a middle-aged clinician's point of view it seems simple enough.

    Unfortunately, younger and smarter people like my sons usually find my little digital wishes just ludicrous!

    (very clever both my sons are but they took from their father for their intelligence).

  • Ivana fulli MD


    Of course I was thinking of a little window in which to type more than one word, since I can click on your blue list but it gives too many posts that are off topic for one's search (not too many in general, but).

    I was thinking then of a little window where one could write several words not one.

  • Ivana Fulli MD


    Anyone for testing old Buddhist monks after some time spent getting used to meditating in the fMRI machine?

    PS: I was thinking yesterday of a little window where one could write two words, like autism and fMRI, instead of clicking on your blue list.

    Erratum then: I wrote word when I intended words.

    I already wrote that erratum yesterday; it appeared and then vanished.

  • Anonymous

    Errare Humanum Est

    Now, 20 years of science must be rewritten.

  • Anonymous

    ok so firstly…
    by running an analysis thousands of times, how likely is it by chance that you will get a significant result? likely
    hell, they even ran the analysis “under various different conditions” so what they were really doing according to you is seeing how long will it take and how many different analyses will it take to get a significant result? thousands apparently. When I have some time I think I'll read the paper and see what they actually did.
    it is interesting that you bring up the point that event related designs are “worse” than block related designs.
    See, the problem with investigating the brain in this manner is that you get either one or the other: spatial resolution (where things are in the brain), aka MRI and fMRI, or temporal resolution (in what specific order changes in the brain occur), aka ERP (event related potentials)/EEG (electroencephalography).
    the reason block designs are used in fMRI is that its strong point is NOT temporal resolution (it can't tie things down to a specific time point), so a block design is needed so that subjects repeat the same exercise and the fMRI can capture an image of the overall changes. If an fMRI study did not use a block design, which you suggest was one of the main issues, it would be unable to collect any meaningful data at all.
    event related designs are not better, they are different.
    I absolutely agree with you.
    It's complicated, and interested individuals should probably ask someone who can adequately summarize or understand both fMRI and ERP.

  • Anonymous

    As they say, those who can, do, those who cannot, criticize.

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    The only problem is when those who do, don't listen to those who criticize, and do it wrong.

  • Anonymous

    I suppose like in all those cases in which doing it “wrong” leads to new discoveries? Like with the origin of BOLD, for example?

  • http://www.blogger.com/profile/06647064768789308157 Neuroskeptic

    But for every BOLD there are many GFAJ-1's.



No brain. No gain.

About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

