A group of management researchers provide new evidence of a worrying bias in the scientific process – The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles (via Retraction Watch).
The issue they highlight – the ability of researchers to eventually squeeze support for a theory out of initially negative data – made me think, not of a chrysalis, but of the story of the ugly duckling who turned into a beautiful swan. Except in this case, the swan is the villain of the piece.
The authors, O’Boyle et al, searched a database of dissertations and theses (ProQuest) for management-related graduate theses submitted between 2000 and 2010. They then used Google Scholar to try to track down published papers that described the same research covered in each thesis. Out of 2000 theses they investigated, they found 142 papers that they were sure were linked. (The average time lag was 3.4 years, an interesting fact in itself.)
This is a clever method, and the results make for interesting reading. It turned out that a higher proportion of the mentioned hypotheses was supported in the papers (66%) than in the theses (45%).
Partly, this was because unsupported hypotheses from the theses tended to just not get included in the papers. This is problematic, because it amounts to suppressing null results, which are meaningful and deserve to be published.
However, it gets worse. Quite often, a negative finding in the thesis became a positive finding by publication:
Among the dissertation hypotheses not supported with statistical significance, 56 of 272 (20.6%) turned into statistically significant journal hypotheses as compared to 17 of 373 (4.6%) supported dissertation hypotheses becoming statistically nonsignificant journal hypotheses.
How? Sometimes, data points were added or subtracted (excluded) from the sample, but even more concerning were the cases where the sample size didn’t change:
Of 77 pairs where the sample size did not change, 25 (32.5%) showed changes in the means, standard deviations, or interrelations of the included variables… when published, 16 (34.0%) of the unsupported hypotheses became statistically significant and none (0.0%) of the 63 supported hypotheses became statistically nonsignificant.
This is consistent with data manipulation, actual fiddling of the results, which is outright fraud – although there are some more benign possibilities. Maybe extra data was collected and, coincidentally, the same number of outliers were removed. Or maybe a typo had been fixed.
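The percentages in the two quoted passages are easy to check by hand. Here is a minimal sketch of that check, using only the counts that appear in the quotes above. The names `counts` and `pct` are my own, chosen for illustration, and are not taken from the paper.

```python
# Counts as quoted from O'Boyle et al. (2014) in the passages above.
# Each entry is (number of hypotheses or pairs that changed, total considered).
counts = {
    "unsupported in thesis, significant in paper": (56, 272),
    "supported in thesis, nonsignificant in paper": (17, 373),
    "same sample size, descriptive statistics changed": (25, 77),
}


def pct(k, n):
    """Return k out of n as a percentage, rounded to one decimal place."""
    return round(100 * k / n, 1)


for label, (k, n) in counts.items():
    print(f"{label}: {k}/{n} = {pct(k, n):.1f}%")

# How lopsided are the switches? Compare the rate of negative-to-positive
# changes with the rate of positive-to-negative changes.
ratio = (56 / 272) / (17 / 373)
print(f"negative-to-positive switches are about {ratio:.1f} times as common")
```

The ratio is only a rough descriptive comparison, not a significance test, but it makes the asymmetry plain: results changed in the 'beautiful' direction far more often than in the other.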
When it comes to discussing ways to solve the problem, O’Boyle et al make some decent suggestions: journals should encourage replication studies, data sharing, and so on. These are good ideas. But they don’t discuss the one idea that would really change things: preregistration of methods and hypotheses.
Well, they do mention it, but they devote to it just two words: ‘research registries’, part of a list of “seismic shifts that may come about eventually” which I think is a polite way of saying “pipe dreams”.
But it’s no dream. All major medical journals require preregistration for clinical trials. The neuroscience journal Cortex pioneered preregistered reports for non-clinical studies, and other journals are now following suit. Many neuroscientists and psychologists are publicly preregistering replications online and this is starting to happen for original work too.
There is no reason management researchers can’t join in. And it’s ironic that O’Boyle et al think so little of preregistration, given that their study, in effect, exploited dissertations as a kind of preregistration database for published articles.
Still, at least they suggested some solutions. And they say, correctly, that
Attributing these behaviors to a few bad apples both understates the problem and misattributes the cause to the individual, when it is most likely systemic.
But O’Boyle et al then tried to fit the systemic problem into a sociological ‘model’, which I don’t think is necessary or helpful here:
We believe the sociological perspective of general strain theory is an appropriate framework for explaining how and why the Chrysalis Effect occurs… General strain theory views undesirable behavior from the perspective of negative social relationships that can be defined as “any relationship in which others are not treating the individual as he or she would like to be treated”…
A profound insight no doubt, but not a new one. This theory adds nothing to what we already know. Such monolithic systematizing actually leaves us understanding less than before, because it overlooks the fact that while questionable research practices (QRPs) are a sociological phenomenon, they are also a psychological one that can operate on individuals without any social input.
Most QRPs are extensions of ordinary human cognitive biases: especially our tendency toward wishful thinking, and our amazing capacity for post-hoc rationalization. A researcher ‘alone on a desert island’ would not be immune to QRPs. Although she would be under no social pressure to use them, she might end up fooling herself anyway.
O’Boyle, E., Banks, G., & Gonzalez-Mule, E. (2014). The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles. Journal of Management. DOI: 10.1177/0149206314527133