Neuroscience Fails Stats 101?

By Neuroskeptic | September 11, 2011 5:36 pm

According to a new paper, a full half of neuroscience papers that try to do a (very simple) statistical comparison are getting it wrong: Erroneous analyses of interactions in neuroscience: a problem of significance.

Here’s the problem. Suppose you want to know whether a certain ‘treatment’ has an effect on a certain variable. The treatment could be a drug, an environmental change, a genetic variant, whatever. The target population could be animals, humans, brain cells, or anything else.

So you give the treatment to some targets and give a control treatment to others. You measure the outcome variable. You run a t-test to see whether the effect is large enough that it is unlikely to have arisen by chance. You find that it is significant.

That’s fine. Then you try a different treatment, and it doesn’t cause a significant effect against the control. Does that mean the first treatment was more powerful than the second?

No. It just doesn’t. The only way to find that out would be to compare the two treatments directly – and that would be very easy to do, because you have all the data to hand. If you just compare the two treatments to control you might end up with this scenario:

Both treatments are very similar, but one (B) is slightly better, so it’s significantly different from control while A isn’t. But they’re basically the same. It’s probably just a fluke that B did slightly better than A. If you compared A and B directly, you’d find they were not significantly different.
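The scenario is easy to simulate. The sketch below uses invented effect sizes (not numbers from the paper): two treatments with nearly identical true effects, compared to a control with t-tests. Over repeated runs you will regularly see one treatment clear the significance bar against control while the other narrowly misses it, even though the two treatments never differ significantly from each other.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 20  # subjects per group

# Illustrative effect sizes: A and B have almost the same true effect.
control = rng.normal(0.0, 1.0, n)
treat_a = rng.normal(0.55, 1.0, n)  # slightly weaker
treat_b = rng.normal(0.75, 1.0, n)  # slightly stronger

_, p_a = stats.ttest_ind(treat_a, control)
_, p_b = stats.ttest_ind(treat_b, control)
_, p_ab = stats.ttest_ind(treat_b, treat_a)  # the comparison that matters

print(f"A vs control: p = {p_a:.3f}")
print(f"B vs control: p = {p_b:.3f}")
print(f"B vs A:       p = {p_ab:.3f}")
```

Only the third test addresses “is B better than A?” — and that is exactly the test the surveyed papers tended to skip.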

An analogy: Passing a significance test is like winning a prize. You can only do it if you’re much better than the average. But that doesn’t mean you’re much better than everyone who didn’t win the prize, because some of them will have almost been good enough.

Usain Bolt is the fastest man in the world (when he’s not false-starting himself out of races). Much faster than me. But he’s not much faster than the second fastest man in the world.

Nieuwenhuis S, Forstmann BU, & Wagenmakers EJ (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14(9), 1105-1107. PMID: 21878926

  • http://www.blogger.com/profile/11092479050280203131 Strange Loop #641

    This also applies to comparisons between the control and just one treatment in a pre and post test design. Basically the difference between within treatment differences and between treatment differences.
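The pre/post version of the same fallacy can be sketched like this (invented numbers, a minimal illustration): testing each group’s pre-to-post change separately answers a different question from testing whether the changes differ between groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 15  # subjects per group

# Invented pre/post scores: both groups improve, treatment slightly more.
pre_t = rng.normal(10, 2, n)
post_t = pre_t + rng.normal(1.2, 2.0, n)
pre_c = rng.normal(10, 2, n)
post_c = pre_c + rng.normal(0.8, 2.0, n)

# Tempting but wrong: test each group's change on its own...
_, p_within_treat = stats.ttest_rel(post_t, pre_t)
_, p_within_ctrl = stats.ttest_rel(post_c, pre_c)

# ...when the question asked is whether the CHANGES differ between groups.
_, p_between = stats.ttest_ind(post_t - pre_t, post_c - pre_c)

print(f"within treatment: p = {p_within_treat:.3f}")
print(f"within control:   p = {p_within_ctrl:.3f}")
print(f"change vs change: p = {p_between:.3f}")
```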

  • Jake

    Damn, this is a little disturbing. I have occasionally seen this error in non-neuroscience papers, but I've never gotten the impression that it was a particularly pervasive problem (try saying that last bit five times fast). I admittedly don't read many neuroscience papers, but I find it surprising that this error would be so much more common in those literatures. Why might this be the case?

  • http://jayuhdinger.com jay uhdinger

    Wow, that's surprising. Didn't think it would be so widespread!


  • http://www.blogger.com/profile/15225859145004971487 Jon Brock

    Your analogy implies that you are non-significantly slower than the second fastest man in the world. Can you confirm?

  • Anonymous

    From: http://www.foxnews.com/health/2011/09/09/study-clouds-picture-on-omega-3s-and-heart-health/

    “Vedtofte said that men who ate more omega-3 fatty acid-rich foods also seemed to gain protection from heart disease, but that the statistical differences were small so the effect could be due to chance.”

    They mean effect size, right?

  • Pseudonymoniae

    I've only really just gotten into reading neuroscience papers consistently after entering grad school, but this doesn't surprise me in the slightest. I regularly comment to others in my lab about all the bad stats I run into, and not just in low quality journals. Nature journals are probably the worst. Often the authors just report p-values without naming the test; other times they report “t-test” without mentioning ever having conducted an ANOVA or using any sort of correction for multiple comparisons. And for some bizarre reason, there doesn't appear to be any standard for reporting stats at the end of papers. One paper will report ANOVAs and post hoc tests for all but the most basic comparisons, while some others literally don't report any stats at all. There was a Nature paper from a couple weeks ago that I read which did exactly this. I'll see if I can find it.

    Btw NS, that graph doesn't illustrate an interaction, because it only has one independent variable. An interaction occurs when the effect of one independent variable differs across the levels of another independent variable.

    So for example, we could look at an interaction in the following study: divide our animals into those who carry a hemoglobin mutation and those who do not (independent variable #1) and also into those which are fed an iron supplement and those which are fed no supplement (second independent variable) and then compare them on a dependent measure (oxygenation of tissue x during exercise). When we run our 2×2 ANOVA, a significant interaction could indicate something like “only those animals which carried the mutation and were not given an iron supplement had insufficient oxygenation of tissue x”. (Well, you would need post hoc tests to identify which group had the low level of oxygenation.)
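That 2×2 design can be sketched numerically. The code below uses invented data and effect sizes, and hand-rolls the balanced-design interaction F-test rather than calling a stats package, for the hypothetical hemoglobin-mutation × iron-supplement example above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 10  # animals per cell

# Made-up oxygenation scores: only mutant animals without a supplement
# are impaired, which is the pattern a significant interaction would flag.
cells = {
    ("mut", "no_iron"): rng.normal(80, 5, n),
    ("mut", "iron"):    rng.normal(95, 5, n),
    ("wt",  "no_iron"): rng.normal(95, 5, n),
    ("wt",  "iron"):    rng.normal(95, 5, n),
}

data = np.array(list(cells.values()))  # shape (4, n)
grand = data.mean()
cell_means = data.mean(axis=1)

# Marginal (main) effects for a balanced design
genotype_eff = np.repeat([cell_means[:2].mean(),
                          cell_means[2:].mean()], 2) - grand
iron_eff = np.tile([cell_means[[0, 2]].mean(),
                    cell_means[[1, 3]].mean()], 2) - grand

# Interaction SS: cell-mean deviations not explained by the main effects
ss_inter = n * np.sum((cell_means - grand - genotype_eff - iron_eff) ** 2)
ss_error = np.sum((data - cell_means[:, None]) ** 2)
df_inter, df_error = 1, 4 * (n - 1)

F = (ss_inter / df_inter) / (ss_error / df_error)
p = stats.f.sf(F, df_inter, df_error)
print(f"interaction: F(1, {df_error}) = {F:.2f}, p = {p:.4f}")
```

With these made-up numbers the interaction F should typically come out large, but as the comment notes, post hoc tests would still be needed to pin down which cell is driving it.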

  • Pseudonymoniae

    …forgot to say: that graph can only show a main effect, i.e. it requires only a one-way ANOVA.

  • http://www.blogger.com/profile/05660407099521700995 petrossa

    Statistics is a bit like quantum states. Every number exists in any relation it's just luck of the draw which calculation produces meaningful results.

    Since a comparison to another calculation isn't meaningful you are stuck with that statistic that either means something, a lot or nothing depending on the aspect of it you perceive.

    It's more a virtual result than a real result.

    fMRI being a great example how you can make it do whatever you want.

  • mathii

    The title of this article makes the point quite succinctly I think: “The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant”

    http://www.stat.columbia.edu/~gelman/research/published/signif4.pdf

  • Anonymous

    Statistics is Photoshop for data.

    DC



About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.
