“Voodoo Correlations” in fMRI – Whose voodoo?

By Neuroskeptic | February 4, 2009 3:40 pm

It’s the paper that needs little introduction – Ed Vul et al.’s “Voodoo Correlations in Social Neuroscience”. If you haven’t already heard about it, read the Neurocritic’s summary here or the summary at BPS Research Digest here. Ed Vul’s personal page has some interesting further information here. (Probably the most extensive discussion so far, with a very comprehensive collection of links, is here.)

Few neuroscience papers have been discussed so widely, so quickly, as this one. (Nature, New Scientist, Newsweek, and Scientific American have all covered it.) Sadly, both new and old media commentators seem to have been more willing to talk about the implications of the controversy than to explain exactly what is going on. This post is a modest attempt to, first and foremost, explain the issues, and then to evaluate some of the strengths and limitations of Vul et al.’s paper.

[Full disclosure: I'm an academic neuroscientist who uses fMRI, but I've never performed the kind of correlational analyses discussed below. I have no association with Vul et al., nor – to my knowledge – with any of the authors of any of the papers in the firing line.]

1. Vul et al.’s central argument. Note that this is not their only argument.

The essence of the main argument is quite simple: if you take a set of numbers, then pick out some of the highest ones, and then take the average of the numbers you picked, the average will tend to be high. This should be no surprise, because you specifically picked out the high numbers. However, if for some reason you forgot or overlooked the fact that you had picked out the high numbers, you might think that your high average was an interesting discovery. This would be an error. We can call it the “non-independence error”, as Vul et al. do.
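Here’s a quick way to see this for yourself. Below is a toy simulation of my own (Python with numpy; all the numbers are made up) – it just picks out the high values from a batch of random numbers and then averages them:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 numbers drawn at random from a standard normal (true mean = 0).
numbers = rng.standard_normal(1000)

# Pick out only the high ones (those above 2.0)...
selected = numbers[numbers > 2.0]

# ...and average them. The average is high, but only because we
# selected high numbers, not because the underlying mean is high.
print(f"mean of all numbers:      {numbers.mean():+.3f}")   # close to 0
print(f"mean of selected numbers: {selected.mean():+.3f}")  # well above 2
```

Forgetting the selection step, and treating that second average as an estimate of the first, is the non-independence error in miniature.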

Vul et al. argue that roughly half of the published scientific papers in a certain field of neuroscience include results which fall prey to this error. The papers in question are those which attempt to correlate activity in certain parts of the brain (measured using fMRI) against behavioural or self-report measures of “social” traits – essentially, personality. Vul et al. call this “social neuroscience”, but it’s important to note that it’s only a small part of that field.

Suppose, for example, that the magnitude of the neural activation in the amygdala caused by seeing a frightening picture was positively correlated with the personality trait of neuroticism – tending to be anxious and worried about things. The more of a worrier a person is, the bigger their amygdala response to the scary image. (I made this example up, but it’s plausible.)

The correlation coefficient, r, is a measure of how strong the relationship is. A coefficient of 1.0 indicates a perfect linear correlation. A coefficient of 0.4 would mean that the link was a lot weaker, although still fairly strong. A coefficient of 0 indicates no correlation at all. This image from Wikipedia shows what linear correlations of different strengths “look like”.
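To get a feel for what these numbers mean, here’s another little toy sketch (mine, not Vul et al.’s): it draws pairs of variables with a chosen underlying correlation and reports the sample r.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_r(true_r, n=200):
    """Draw n (x, y) pairs from a bivariate normal whose underlying
    correlation is true_r, and return the sample Pearson r."""
    cov = [[1.0, true_r], [true_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return np.corrcoef(x, y)[0, 1]

for true_r in (0.0, 0.4, 0.7, 0.9):
    print(f"underlying r = {true_r:.1f}  ->  sample r = {sample_r(true_r):+.2f}")
```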

Vul et al.’s argument is that many of the correlation coefficients appearing in social neuroscience papers are higher than they ought to be, because they fall prey to the non-independence error discussed above. Many reported correlations were in the range of r=0.7–0.9, which the authors describe as implausibly high.

They say that the problem arises when researchers search across the whole brain for any parts where the correlation between activity and some personality measure is statistically significant – that is to say, where it is high – and then work out the average correlation coefficient in only those parts. The reported correlation coefficient will tend to be a high number, because they specifically picked out the high numbers (since only high numbers are likely to be statistically significantly different from zero.)

Suppose that you divided the amygdala into 100 small parts (voxels) and separately worked out the linear correlation between activity and neuroticism for each voxel. Suppose that you then selected those voxels in which the correlation was greater than (say) 0.8, and worked out the average: (say) 0.86. This does not mean that activity across the amygdala as a whole is correlated with neuroticism with r=0.86. The “full” amygdala-neuroticism correlation must be less than this. (Clarification 5.2.09: Since there is random noise in any set of data, it is likely that some of those correlations which reached statistical significance were those which were very high by chance. This does not mean that there weren’t any genuinely correlated voxels. However, it means that the average over the selected voxels is not a measure of the average over the genuinely correlated voxels. This is a case of regression to the mean.)
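A toy simulation of exactly this scenario (my own sketch; the “true” correlation of 0.5 and the subject/voxel counts are invented, not taken from any real paper) shows the inflation directly:

```python
import numpy as np

rng = np.random.default_rng(2)

n_subjects, n_voxels, true_r = 20, 100, 0.5

# One neuroticism score per (simulated) subject...
neuroticism = rng.standard_normal(n_subjects)

# ...and 100 voxels whose activity genuinely correlates ~0.5 with it.
noise = rng.standard_normal((n_subjects, n_voxels))
activity = true_r * neuroticism[:, None] + np.sqrt(1 - true_r**2) * noise

# Observed per-voxel correlations: noisy estimates scattered around 0.5.
r = np.array([np.corrcoef(activity[:, v], neuroticism)[0, 1]
              for v in range(n_voxels)])

# Non-independent analysis: keep only the high-r voxels, then average them.
selected = r[r > 0.7]
print(f"true underlying correlation: {true_r:.2f}")
print(f"mean r over ALL voxels:      {r.mean():.2f}")         # about 0.5
print(f"mean r over selected voxels: {selected.mean():.2f}")  # inflated
```

The selected average comes out well above the true 0.5, and the stricter the selection threshold (or the fewer voxels you keep), the worse the inflation.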

Vul et al. say that out of 52 social neuroscience fMRI papers they considered, 28 (54%) fell prey to this problem. They determined this by writing to the authors of the papers and asking them to answer some multiple-choice questions about their statistical methodology. This chart shows the reported correlation coefficients in the papers which seemed to suffer from the problem (in red) vs. those which didn’t (in green); unsurprisingly, the ones which did tended to give higher coefficients. (Each square is one paper.)
That’s it. It’s quite simple. But… there is a very important question remaining. We’ve said that non-independent analysis leads to “inflated” or “too high” correlations, but too high compared to what? Well, the “inflated” correlation value reported by a non-independent analysis is entirely accurate – in that it’s not just made up – but it only refers to a small and probably unrepresentative collection of voxels. It only becomes wrong if you think that this correlation is representative of the whole amygdala (say).

So you might decide that the “true” correlation is the mean correlation over all of the voxels in the amygdala. But that’s only one option. There are others. It would be equally valid to take the average correlation over the whole amygdalo-hippocampal complex (a larger region). Or the whole temporal cortex. That would be silly, but not an error – so long as you make it clear what your correlation refers to, any correlation figure is valid. If you say “The voxel in the amygdala with the greatest correlation with neuroticism in this data-set had an r=0.99”, that would be fine, because readers will realize that this r=0.99 figure was probably an outlier. However, if you say, or imply, that “The amygdala was correlated with neuroticism at r=0.99” based on the same data, you’re making an error.

My diagram (if you can call it that…) to the left illustrates this point. The ovals represent the brain. The colour of each point in the brain represents the degree of linear correlation between some particular fMRI signal in that spot, and some measure of personality.

Oval 1 represents a brain in which no area is really correlated with personality. So most of the brain is gray, meaning very low correlation. But a few spots are moderately correlated just by chance, so they show up as yellow.

Oval 2 represents a brain in which a large blob of the brain (let’s call it the “amygdala”) really is correlated quite well, i.e. yellow. However, some points within this blob are, just by chance, even more correlated, shown in red.

Now, if you took the average correlation over the whole of the “amygdala”, it would be moderate (yellow) – i.e. picture 2a. However, suppose that instead, you picked out those parts of the brain where the correlation was so high that it could not have occurred by chance (statistically significant).

We’ve seen that yellow spots often occur by chance even without any real correlation, but red ones don’t – it’s just too unlikely. So you pick out the red spots. If you average those, the average is obviously going to be very high (red). i.e. picture 2b. But if you then noticed that all of the red spots were in the amygdala, and said that the correlation in the amygdala was extremely high, you’d be making (one form of) the non-independence error.

Some people have taken issue with Vul’s argument, saying that it’s perfectly valid to search for voxels significantly correlated with a behaviour, and then to report on the strength of that correlation. See for example this anonymous commentator:

many papers conducted a whole brain correlation of activation with some behavioral/personality measure. Then they simply reported the magnitude of the correlation or extracted the data for visualization in a scatterplot. That is clearly NOT a second inferential step, it is simply a descriptive step at that point to help visualize the correlation that was ALREADY determined to be significant.

The academic responses to Vul make the same point (but less snappily).

The truth is that while there is technically nothing wrong with doing this, it could easily be misleading in practice. Searching for voxels in the brain where activation is significantly correlated with something is perfectly valid, of course. But the magnitude of the correlation in these voxels will be high by definition. These voxels are not representative because they have been selected for high correlation. In particular, even if these voxels all happen to be located within, say, the amygdala, they are not representative of the average correlation in the amygdala.

A related question is whether this is a “one-step” or a “two-step” analysis. Some have objected that Vul implies it is a two-step analysis in which the second step is “wrong”, whereas in fact it’s just a one-step analysis. That’s a purely semantic issue. There is only one statistical inference step (searching for significantly correlated voxels). But to then calculate and report the average correlation in those voxels is a second, descriptive step. The second step is not strictly wrong, but it could be misleading – not because it introduces a new, flawed analysis, but because it invites a misinterpretation of the results of the first step.

2. Vul et al.’s secondary argument

The argument set out above is not the only one in the Vul et al. paper. There’s an entirely separate argument, introduced on page 18 (Section F).

The central argument is limited in scope. If valid, it means that some papers – those which used non-independent methods to compute correlations – reported inappropriately high correlation coefficients. But it does not even claim that the true correlation coefficients were zero, or that the correlated parts of the brain were in the wrong places. If one picks out those voxels in the brain which are significantly correlated with a certain measure, it may be wrong to then compute the average correlation, but the fact that the correlations are significantly greater than zero remains. Indeed, the whole argument rests upon the fact that they are!

But… this all assumes that the calculation of statistical significance was done correctly. Such calculations can get very complex when it comes to fMRI data; in particular, it can be difficult to correct for the multiple comparisons problem. Vul et al. point out that in some of the papers in question (they only cite one, but say that the same also applies to an unspecified number of others), the calculation of significance seems to have been done wrong. They trace the mistake to a table printed in a paper published in 1995 (Forman et al.). They accuse some people of having misunderstood this table, leading to completely wrong significance calculations.

The per-voxel false detection probabilities described by E. et al (and others) seem to come from Forman et al.’s Table 2C. Values in Forman et al’s table report the probability of false alarms that cluster within a single 2D slice (a single 128×128 voxel slice, smoothed with a FWHM of 0.6*voxel size). However, the statistics of clusters in 2D (a slice) are very different from those of a 3D volume: there are many more opportunities for spatially clustered false alarm voxels in the 3D case, as compared to the 2D case. Moreover, the smoothing parameter used in the papers in question was much larger than 0.6*voxel size assumed by Forman in Table 2C (in E. et al., this was >2*voxel size). The smoothing, too, increases the chances of false alarms appearing in larger spatial clusters.

If this is true, then it’s a knock-down point. Any results based upon such a flawed significance calculation would be junk, plain and simple. You’d need to read the papers concerned in detail to judge whether the accusation is, in fact, accurate. But this is a completely separate point to Vul et al.’s primary non-independence argument. The primary argument concerns a statistical phenomenon; this secondary argument accuses some people of simply failing to read a paper properly. The primary argument suggests that some reported correlation coefficients are too high, but only this second argument suggests that some correlation coefficients may in fact be zero. And Vul et al. do not say how many papers they think suffer from this serious flaw.
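The underlying multiple comparisons problem, at least, is easy to demonstrate. Here’s a deliberately crude sketch of my own (Python with numpy and scipy) – nothing like the cluster-based corrections at issue in the quote above, but it shows how pure noise yields “significant” voxels if you test enough of them without correcting:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n_subjects, n_voxels, alpha = 20, 10_000, 0.001

# Pure noise: NO voxel is genuinely correlated with the behavioural score.
score = rng.standard_normal(n_subjects)
activity = rng.standard_normal((n_subjects, n_voxels))

# Uncorrected per-voxel p-values for the activity-score correlation.
p = np.array([stats.pearsonr(activity[:, v], score)[1]
              for v in range(n_voxels)])

# Even with no real effect, we expect roughly alpha * n_voxels "hits".
print(f"voxels passing p < {alpha}: {(p < alpha).sum()} "
      f"(expected by chance: ~{alpha * n_voxels:.0f})")
```

A whole-brain analysis involves tens of thousands of such tests, which is why the correction has to be done properly – and why using the wrong row of a published look-up table can be so damaging.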

These two arguments seem to have gotten mixed up in the minds of many people. Responses to the Vul et al. paper have seized upon the secondary accusation that some correlations are completely spurious. The word “voodoo” in the title can’t have helped. But this misses the point of Vul et al.’s central argument, which is entirely separate, and seems almost indisputable so far as it goes.

3. Some Points to Note

  • Just to reiterate, there are two arguments about brain-behaviour correlations in Vul et al. The main one – the one everyone’s excited about – purports to show that the correlations reported in 54% of the surveyed papers are weaker than claimed, but it cannot be taken to mean that they are zero. The second one claims that some correlations are entirely spurious, because they were based on a very serious error stemming from misreading a paper. But at present only one paper has been named as a victim of this error.
  • The non-independence error argument is easy to understand and isn’t really about statistics at all. If you’ve read this far, you should understand it as well as I do. There are no “intricacies”. (The secondary argument, about multiple-comparison testing in fMRI, is a lot trickier however.)
  • How much the non-independence error inflates correlation sizes is difficult to determine, and it will vary in every different case. Amongst many other things, the degree of inflation will depend upon two factors: the strictness of the statistical threshold used to pick the voxels (a stricter threshold = higher correlations picked); and the number of voxels picked (if you pick 99% of the voxels in the amygdala, then that’s nearly as good as averaging over the whole thing; if you pick the one best voxel, then you could inflate the correlation enormously.) Note, however, that many of the papers that avoided the error still reported pretty strong correlations.
  • It’s easy to work out brain activity-behaviour correlations while avoiding the non-independence problem. Half of the papers Vul et al. considered in fact did this (the “green” papers). One simply needs to select the voxels in which to calculate the average correlation based on some criterion other than the correlation itself. One could, for example, use an anatomy textbook to select those voxels making up the amygdala. Or, one could select those voxels which are strongly activated by seeing a scary picture. (There’s a sketch of this independent approach just after this list.) Many of the “green” papers which did this still reported strong correlations (r=0.6 or above).
  • Vul et al.’s criticisms apply only to reports of linear correlations between regional fMRI activity and some behavioural or personality measure. Most fMRI studies do not try to do this. In fact, many do not include any behavioural or personality measures at all. At the moment, fMRI researchers are generally seeking to find areas of the brain which are activated during experience of a certain emotion, performance of a cognitive process, etc. Such papers escape entirely unscathed.
  • Conversely, although Vul et al. looked at papers from social neuroscience, any paper reporting on brain activity-behaviour linear correlations could suffer from the non-independence problem. The fact that the authors happened to have chosen to focus on social neuroscience is irrelevant.
  • Indeed, Vul & Kanwisher have also recently written an excellent book chapter discussing the non-independence problem in a more general sense. Read it and you’ll understand the “voodoo” better.
  • Therefore, “social neuroscience” is not under attack (in this paper). To anyone who’s read & understood the paper, this will be quite obvious.
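For concreteness, here’s what the independent approach mentioned above might look like (again a toy sketch of mine; the “anatomical” mask is invented). The region is defined without peeking at the correlations, so the average within it is not inflated:

```python
import numpy as np

rng = np.random.default_rng(4)

n_subjects, n_voxels, true_r = 20, 100, 0.5

# Same toy data as before: voxels genuinely correlated ~0.5 with the score.
neuroticism = rng.standard_normal(n_subjects)
noise = rng.standard_normal((n_subjects, n_voxels))
activity = true_r * neuroticism[:, None] + np.sqrt(1 - true_r**2) * noise

# Independent selection: a (hypothetical) anatomical mask, defined from
# an atlas WITHOUT looking at the correlations in this data-set.
amygdala_mask = np.arange(n_voxels) < 40

# Per-voxel correlations within the anatomically-defined region...
r_roi = np.array([np.corrcoef(activity[:, v], neuroticism)[0, 1]
                  for v in np.flatnonzero(amygdala_mask)])

# ...averaged. Because the voxels were not chosen for high correlation,
# this is an unbiased estimate of the regional correlation (~0.5).
print(f"mean r over the anatomical ROI: {r_roi.mean():.2f}")
```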

4. Remarks: On the Art of Voodoo Criticism

Vul et al. is a sound warning about a technical problem that can arise with a certain class of fMRI analyses. The central point, although simple, is not obvious – no-one had noticed it before, after all – and we should be very grateful to have it pointed out. I can see no sound defense against the central argument: the correlations reported in the “red list” papers are probably misleadingly high, although we do not know by how much. (The only valid defense would be to say that your paper did not, in fact, use a non-independent analysis.)

Some have criticized Vul et al. for their combative or sensationalist tone. It’s true that they could have written the paper very differently. They could have used a conservative academic style and called it “Activity-behaviour correlations in functional neuroimaging: a methodological note”. But no-one would have read it. Calling their paper “Voodoo Correlations” was a very smart move – although there is no real justification for it, it brilliantly served to attract attention. And attention is what papers like this deserve.

But this paper is not an attack on fMRI as a whole, or social neuroscience as a whole, or even the calculation of brain-behaviour correlations as a whole. Those who treat it as such are the real voodoo practitioners in the old-fashioned sense: they see Vul sticking pins into a small part of neuroscience, and believe that this will do harm to the whole of it. This means you, Sharon Begley of Newsweek: “The upcoming paper, which rips apart an entire field: the use of brain imaging in social neuroscience…”. This means you, anyone who read about this paper and thought “I knew it”. No, you didn’t; you may have thought that there was something wrong with all of these social neuroscience fMRI papers, but unless you are Ed Vul, you didn’t know what it was.

There’s certainly much wrong with contemporary cognitive neuroscience and fMRI. Conceptual, mathematical, and technical problems plague the field, just a few of which have been covered previously on Neuroskeptic and on other blogs as well as in a few papers (although surprisingly few). In all honesty, a few inflated correlations ranks low on the list of the problems with the field. Vul’s is a fine paper. But its scope is limited. As always, be skeptical of the skeptics.

Edward Vul, Christine Harris, Piotr Winkielman, & Harold Pashler (2008). Voodoo Correlations in Social Neuroscience. Perspectives on Psychological Science.

CATEGORIZED UNDER: bad neuroscience, fMRI, papers, voodoo
  • Elliot B
  • Neuroskeptic

    Thanks. It’s a robust response – everyone should read it.

  • Anonymous

    I’m a bit late to the party, but… it’s not correct to say that Vul’s central point is not obvious and that no one has noticed it before, because it is widely known in general and among imaging scientists in particular. It applies to all analyses where multiple comparisons are made – not just correlations and not just brain imaging studies. Many or most of the papers Vul condemned did not actually perform a “non-independent analysis” but simply performed a correct inference followed by a correct post-hoc report of the correlation values. It is incumbent on the author to state this clearly, but also on the reader to know how to interpret such an analysis. Obviously you can’t interpret it the same way you would a single correlation in the behavioral literature, for the reason Vul stated.

    Nor is it correct to say that Vul was performing a surgically precise “voodoo” operation on a sick part of neuroscience, since he accused careful scientists of unscrupulousness and called for the retraction of perfectly valid scientific reports. Members of the neuroimaging field have already published a number of much more productive and informative reports on the “sickness” of incomplete reporting and careless interpretation. Vul’s own “meta-analysis” was haphazard and did not meet his own stated standards of reproducibility and freedom from selection bias.

    The Vul paper reflects a certain ignorance regarding how neuroimaging studies are performed and interpreted, and a certain carelessness with accusations of malfeasance. It is frustrating that so many have jumped on the Ed Vul “voodoo” bandwagon without really understanding the field or checking in with the accused. Recommended reading would be the rebuttal already listed here, plus the Nichols/Poline rebuttal which has a statistical focus, plus most of the “accused” papers listed in the Vul article.

  • Neuroskeptic

    Better late than never, Anonymous! And this is certainly something which deserves to be discussed.

    I share your frustration with the Voodoo Bandwagon and with the way in which certain people with a grudge against cognitive neuroscience have embraced Vul's paper. However, I think – as often happens – that the dust from the bandwagon obscures the perfectly good points that Vul made.

    I do think that his central argument is non-obvious. You might disagree. Certainly in my experience with cognitive neuroscientists at a major British research university, it was “news to us” – not in the sense that the statistical argument was novel in itself (because, as you say, it's an old one), but rather that it hadn't “clicked” that it was a problem with this kind of fMRI analysis. If that's just because me & my colleagues are rather dimwitted then so be it, but I suspect we're not alone.

    You say that many or most of the papers Vul condemned “did not actually perform a ‘non-independent analysis’ but simply performed a correct inference followed by a correct post-hoc report of the correlation values”. That’s true, but it begs the question of why they reported the post-hoc correlation values at all, given that these are simply meaningless (they are very high, but they had to be in order to pass significance testing). I can’t see any good reason to do it – it either reflects a misunderstanding on the part of the author, or it will mislead the reader (or both).

  • Anonymous

    I really appreciate this blog. Please keep up the good work, as well as the informative and respectful debate!!

  • Anonymous

    “called for the retraction of perfectly valid scientific reports.”

    How are they “perfectly valid reports” given the observations by many thoughtful scholars that there’s something fishy going on?

    I think the rhetoric and force of the rebuttals tell us a lot about the veracity of the Vul et al. claims – that they’re onto something. And that something is embarrassing to the fMRI folks who’ve enjoyed a lot of time in the limelight recently.

  • Anonymous

    The “rhetoric and force of rebuttals” has no “correlation” to the veracity of the claims Vul is making. This is a sensationalistic paper, written in a manner to gain attention and start a new “trend” in the way fMRI data is analyzed (and subsequently to put himself at the forefront of this trend…), and in doing so it throws the baby out with the bathwater by totally negating the previously accepted ways of analyzing fMRI data.

  • Neuroskeptic

    Vul et al are certainly “on to something”, but as I’ve argued in this post and others, it is not a radical critique of current fMRI practice. Just a stern warning that if you’re going to do fMRI analysis in this way, you need to do it right. Remember that even according to Vul, half the papers examined were perfectly OK.

  • Anonymous

    What if I tell you that I can guess what most of the people reading this comment are thinking right now?! My guess: You all are thinking that I am out of my mind!

    Okay let’s analyze my success. About 20% of the readership probably thought I was crazy. Now, I will make the oh-so-self-serving inference that the 20% I did get right represents the entire readership.

    How is that for an explanation?

  • Matthew Lieberman

    For anyone interested, there was a public debate on Voodoo Correlations last fall at the Society of Experimental Social Psychologists between Piotr Winkielman (one of the authors on the Voodoo paper) and myself (Matt Lieberman). The debate has been posted online.

    http://www.scn.ucla.edu/Voodoo&TypeII.html

  • Neuroskeptic

    Thanks, I hadn't come across that, will take a look!

  • Elliott Sober

    I like Neuroskeptic's separation of the two problems that Vul et al raise. N's analysis of the first makes sense to me. In connection with the second, and its relationship to the problem of multiple comparisons, it is worth noting that Bayesians and frequentists disagree about how multiple comparisons should be analyzed (and even whether they pose a “problem” at all). I wonder if Neuroskeptic would like to comment on this larger issue.

  • Anonymous

    Dr. Matthew Lieberman and Dr. Naomi Eisenberger are guilty of horrible unethical research misconduct! They conduct environmentally manipulated unethical research experiments on unconsenting and unknowing individuals (in real time) while these individuals are living out their daily lives!

    They have people that will go out and deliberately manipulate a person’s (real-time) social environment in an extremely negative, unpleasant and intolerable way that causes that individual to be faced with problems that will cause them to suffer from long-term social and emotional distress!

    They purposely manipulate an individual’s daily social environment in a negative way, so that they will begin to suffer extreme amounts of emotional and visceral pain on a daily basis… all due to the problems that Dr. Matthew Lieberman and Dr. Naomi Eisenberger’s group has deliberately caused for them!

    Their victims begin to suffer long episodes of rejection, isolation, ostracism, loss and abandonment, and cannot find any social support. They are not told why or who has done this to them! Their lives are deliberately destroyed! They begin to suffer extreme amounts of pain, all so Dr. Matthew Lieberman and his wife, UCLA’s Dr. Naomi Eisenberger, can get a more original view of individuals suffering from social distress! The focus of their research experiments!

    Dr. Matthew Lieberman and Dr. Naomi Eisenberger’s research projects need to be investigated and shut down, and they need to be held accountable for the lives and health of the individuals that they have tortured and destroyed, all for their own financial greed, job security, and a way to get their research published in journals!
