A paper in Psychological Science was taking a beating on Twitter last month.
Party like it is 2011 at Psychological Science. P-values are either in the Goldilocks range (0.03-0.05) or p<.001 with an effect size of d = 0.9. Sitting in a dim vs light room has a bigger effect on thinking you'll get the flu than that emotion manipulations have on mood. Right. pic.twitter.com/neNlZGp5Vl
— Daniël Lakens (@lakens) June 5, 2018
The paper reports on five studies that all address the same general question. Of these, Study #3 was preregistered, and the authors write that it was performed after the other four had been completed. It was also larger than the others. The results of Study #3 closely matched those of the other studies. So far, so good.
However, according to Daniël Lakens on Twitter (I’m not sure how he knows this), Study #3 was conducted on the instruction of the editors (during peer review):
This is the preregistered study, with P < .001, that they were asked to do because the editor asked for it. It worked out absolutely perfectly – exactly the same effect size, highly significant. Good for them, right?
— Daniël Lakens (@lakens) June 6, 2018
Now, this is where alarm bells started ringing for me. If Psychological Science asked the authors of this paper to carry out Study #3, the reason, presumably, is that they weren’t fully convinced by the other studies. The journal wanted more evidence for the hypothesis that ‘participants in a dimly lit room or wearing sunglasses tended to estimate a lower risk of catching contagious diseases.’ That’s understandable, but what would the editors have done if the results of Study #3 had come back negative?
This is a crucial question because if Psychological Science (or any journal) requires authors to provide more positive results in order to get published, they are putting those authors in a very difficult position. A paper in Psychological Science can make someone’s whole career, and to tie such a publication to the result of a particular study incentivizes (at best) questionable research practices.
To be fair, maybe Psychological Science told the authors ‘you must do Study #3, but we commit to publishing your paper whatever the results are.’ That wouldn’t be as bad, but the question would then arise: would this journal have considered the paper at all if the original four studies had not been uniformly positive…?
The whole beauty of preregistration is that it would have allowed these authors to get their studies reviewed and accepted for publication before any of the results were in. There would then have been no pressure for Study #3 or any of the other results to be positive – no pressure from the journal, anyway. The hunger for positive results that underlies so much publication bias and p-hacking would never arise.
In other words, asking authors to conduct a preregistered replication to confirm their non-preregistered findings might be useful, or it might be harmful, but it certainly isn’t the best way to do preregistration.