A new paper in Nature Human Behaviour has generated lots of debate. In Redefine Statistical Significance, authors Daniel J. Benjamin and colleagues suggest changing the convention that p-values below 0.05 are called ‘significant’. Instead, they suggest, the cut-off should be set at 0.005 – a stricter criterion.
One of the commentaries is mine. In it, I don’t directly address the merits of p<0.005, but I point out that the p<0.05 rule is a holdover from a very different time. p<0.05 was introduced in 1925, back when statistical tests were carried out by hand.
Today, with the help of stats software, we can perform thousands of tests, producing thousands of p-values, in the time it used to take to calculate just one of them. Given a dataset, we can carry out many analyses and see which gives the lowest p-value. This is the problem of p-hacking. Even when there is no real effect at all, about 1 in 20 tests will come out significant at p<0.05 by chance alone.
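To put rough numbers on this, here is a minimal sketch in Python. It assumes that no real effect exists and that the analyses are independent of one another, which is a simplification: different analyses of the same dataset are usually correlated, so real figures would differ. The function names are my own, chosen for illustration.

```python
import random


def chance_of_false_positive(alpha, n_tests):
    """Probability that at least one of n_tests independent tests
    gives p < alpha when there is no real effect."""
    return 1 - (1 - alpha) ** n_tests


def simulate(alpha, n_tests, n_studies=20000, seed=0):
    """Estimate the same probability by simulating null studies."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under the null hypothesis, each p-value is uniform on [0, 1].
        lowest_p = min(rng.random() for _ in range(n_tests))
        if lowest_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        exact = chance_of_false_positive(alpha, 20)
        approx = simulate(alpha, 20)
        print(f"alpha={alpha}: exact={exact:.3f}, simulated={approx:.3f}")
```

Under these assumptions, trying 20 analyses and keeping the best one gives roughly a 64% chance of at least one ‘significant’ result at p<0.05, compared with roughly 10% at p<0.005.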
I’m not sure that a p<0.005 threshold is the best or only solution to this problem. I have long advocated a different and more radical change to how science is carried out: study preregistration. Yet I would be happy to see p<0.005 (or an even stricter threshold) become the norm for non-preregistered studies. Preregistration, I think, would allow us to continue using p<0.05 in the spirit in which it was originally intended: to test significance in a carefully thought-out, pre-planned analysis.