Is It Time To “Redefine Statistical Significance”?

By Neuroskeptic | October 3, 2017 8:33 am


A new paper in Nature Human Behaviour has generated a lot of debate. In Redefine Statistical Significance, Daniel J. Benjamin and colleagues propose changing the convention that p-values below 0.05 are called ‘significant’: the cut-off, they argue, should instead be set at 0.005 – a stricter criterion.

Over at The Brains Blog, John Schwenkler organized a discussion of the Benjamin et al. proposal, featuring commentary from several statisticians and researchers.

One of the commentaries is mine. In it, I don’t directly address the merits of p<0.005, but I point out that the p<0.05 rule is a holdover from a very different time. p<0.05 was introduced in 1925, back when statistical tests were carried out by hand.

Today, with the help of stats software, we can perform thousands of tests, producing thousands of p-values, in the time it used to take to calculate just one. Given a dataset, we can carry out many different analyses and see which ones give the lowest p-values. This is the problem of p-hacking. At p<0.05, 1 in 20 p-values will be significant by chance alone.
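
To make this concrete, here is a minimal simulation sketch (my own illustration, not anything from the Benjamin et al. paper; it assumes Python with numpy and scipy). It runs thousands of t-tests on pure noise, where the null hypothesis is true by construction, and counts how often p falls below 0.05 and below 0.005:

```python
# A minimal sketch (illustration only): how often do tests on pure noise
# come out "significant" at p < 0.05 versus p < 0.005?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 10_000   # number of independent analyses, all on null data
n_per_group = 30   # sample size per group in each test

p_values = np.empty(n_tests)
for i in range(n_tests):
    # Two groups drawn from the same distribution, so the null is true.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    p_values[i] = stats.ttest_ind(a, b).pvalue

print(f"p < 0.05:  {np.mean(p_values < 0.05):.3f}")   # ~0.05
print(f"p < 0.005: {np.mean(p_values < 0.005):.4f}")  # ~0.005
```

Roughly 5% of these tests clear the 0.05 bar and roughly 0.5% clear the 0.005 bar: a stricter threshold buys some protection against chance findings, though not immunity to a determined p-hacker.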

I’m not sure that a p<0.005 threshold is the best or only solution to this problem. I have long advocated a different and more radical change to how science is carried out: study preregistration. Yet I would be happy to see p<0.005 (or even less) become the norm for non-preregistered studies. Preregistration, I think, would allow us to continue using p<0.05 in the spirit in which it was originally intended: to test significance in a carefully thought out, pre-planned analysis.

  • CL

    Is there any evidence that studies reporting effects at p<0.005 are more likely to be replicated than p<0.05 studies?

    • Jacob

      The only large-scale replication paper I know of is here: http://science.sciencemag.org/content/349/6251/aac4716

      Eye-balling figure 2 it doesn’t look like p < 0.005 is any more likely to replicate than p < 0.05, but that's just an eyeball. I would expect a higher replication rate for a smaller p-value, but that's assuming the p-value is meaningful in the first place.

    • http://www.eiko-fried.com Eiko Fried

      CL: the original paper that asked for the redefinition of significance to 0.005 reports that p<0.005 studies from the reproducibility paper in Science were more likely to replicate than p<0.05 studies … with a p-value of 0.015. Which is n.s. according to their new standard ;).

      • CL

        hehe

  • http://www.eiko-fried.com Eiko Fried

    In the papers and the comments on them that I’ve read so far, I still haven’t seen a thorough discussion of type I errors, which seem highly relevant in this context. I like the Lakens paper in the sense that it allows for some flexibility here, and it accounts for the fact that (unlike what Benjamin et al. suggest) you do not always want to run a high risk of making type I errors irrespective of the research question.

  • TLongmire

    The studies are where A.I. should be preeminent.

  • OWilson

    It’s not necessary to redefine the language, or to apply restrictive parameters.

    “Statistically significant” is just a relative and subjective term if used out of context.

    A 1% increase in fuel consumption may be “statistically insignificant”, but a 1% concentration of arsenic in a beverage would be fatal.

    Likewise, a 0.30 degree rise in the Earth’s average temperature over the 38-year NOAA Satellite Record is considered “statistically significant”, while in a single typical day a temperature rise of 50 degrees is considered “statistically insignificant”.

    What’s in YOUR water? :)

  • Денис Бурчаков

    And the age of p-hackers will give way to the age of p-extortionists.

  • hudasx

    I wonder how different science would have been if statisticians had agreed early on to report the actual p-value. The ease of looking up a table doomed us all.


  • Andrew Old

    “At p<0.05, 1 in 20 p-values will be significant by chance alone."

    Surely that's not what it means? It's a 1 in 20 chance that the results will be significant given that the null hypothesis is true.
