The Ugly Ducklings of Science

By Neuroskeptic | March 26, 2014 5:45 pm

A group of management researchers provide new evidence of a worrying bias in the scientific process – The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles (via Retraction Watch).

The issue they highlight – the ability of researchers to eventually squeeze support for a theory out of initially negative data – made me think, not of a chrysalis, but of the story of the ugly duckling who turned into a beautiful swan. Except in this case, the swan is the villain of the piece.


The authors, O’Boyle et al., searched a database of dissertations and theses (ProQuest) for management-related graduate theses submitted between 2000 and 2010. They then used Google Scholar to try to track down published papers that described the same research covered in each thesis. Out of 2000 theses they investigated, they found 142 papers that they were sure were linked. (The average time lag was 3.4 years – an interesting fact in itself.)

This is a clever method, and the results make for interesting reading. It turned out that a higher proportion of mentioned hypotheses were supported in papers (66%) compared to the theses (45%).

Partly, this was because unsupported hypotheses from the theses tended to just not get included in the papers. This is problematic, because it amounts to suppressing null results, which are meaningful and deserve to be published.

However, it gets worse. Quite often, a negative finding in the thesis became a positive finding by publication:

Among the dissertation hypotheses not supported with statistical significance, 56 of 272 (20.6%) turned into statistically significant journal hypotheses as compared to 17 of 373 (4.6%) supported dissertation hypotheses becoming statistically nonsignificant journal hypotheses.

How? Sometimes, data points were added to or excluded from the sample, but even more concerning were the cases where the sample size didn’t change:

Of 77 pairs where the sample size did not change, 25 (32.5%) showed changes in the means, standard deviations, or interrelations of the included variables… when published, 16 (34.0%) of the unsupported hypotheses became statistically significant and none (0.0%) of the 63 supported hypotheses became statistically nonsignificant.
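The proportions quoted above are easy to sanity-check. A minimal Python sketch, using only the counts given in the two quotes (the helper function `pct` is mine, not from the paper):

```python
# Arithmetic check of the proportions quoted from O'Boyle et al. (2014).
# The raw counts come from the quotes above; this only verifies the percentages.

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# Unsupported dissertation hypotheses that became significant journal hypotheses:
assert pct(56, 272) == 20.6
# Supported dissertation hypotheses that became nonsignificant in the journal:
assert pct(17, 373) == 4.6
# Pairs with unchanged sample size that nonetheless showed changed statistics:
assert pct(25, 77) == 32.5
```

All three quoted percentages check out against their stated numerators and denominators.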

This is consistent with data manipulation, actual fiddling of the results, which is outright fraud – although there are some more benign possibilities. Maybe extra data was collected and, coincidentally, the same number of outliers were removed. Or maybe a typo had been fixed.

When it comes to discussing ways to solve the problem, O’Boyle et al make some decent suggestions: journals should encourage replication studies, data sharing, and so on. These are good ideas. But they don’t discuss the one idea that would really change things: preregistration of methods and hypotheses.

Well, they do mention it, but they devote to it just two words: ‘research registries’, part of a list of “seismic shifts that may come about eventually” which I think is a polite way of saying “pipe dreams”.

But it’s no dream. All major medical journals require preregistration for clinical trials. The neuroscience journal Cortex pioneered preregistered reports for non-clinical studies, and other journals are now following suit. Many neuroscientists and psychologists are publicly preregistering replications online and this is starting to happen for original work too.

There is no reason management researchers can’t join in. And it’s ironic that O’Boyle et al think so little of preregistration, given that their study, in effect, exploited dissertations as a kind of preregistration database for published articles.

Still, at least they suggested some solutions. And they say, correctly, that

Attributing these behaviors to a few bad apples both understates the problem and misattributes the cause to the individual, when it is most likely systemic.

But O’Boyle et al then tried to fit the systemic problem into a sociological ‘model’, which I don’t think is necessary or helpful here:

We believe the sociological perspective of general strain theory is an appropriate framework for explaining how and why the Chrysalis Effect occurs… General strain theory views undesirable behavior from the perspective of negative social relationships that can be defined as “any relationship in which others are not treating the individual as he or she would like to be treated”…

A profound insight no doubt, but not a new one. This theory adds nothing to what we already know. Such monolithic systematizing actually leaves us understanding less than before, because it overlooks the fact that while questionable research practices (QRPs) are a sociological phenomenon, they are also a psychological one that can operate on individuals without any social input.

Most QRPs are extensions of ordinary human cognitive biases: especially our tendency to wishful thinking, and our amazing capacity for post-hoc rationalization. A researcher ‘alone on a desert island’ would not be immune to QRPs. Although she would be under no social pressure to use them, she might end up fooling herself anyway.

O’Boyle, E., Banks, G., & Gonzalez-Mule, E. (2014). The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles. Journal of Management. DOI: 10.1177/0149206314527133

  • Chris Chambers

    I always find it bemusing when authors talk of study pre-registration as though it is some kind of unreachable pot of gold. Why do we so timidly accept the status quo as though it is beyond our control?

    What makes this so puzzling is that it really isn’t very difficult to plant the seeds of change in “the system” – you just have to have a bit of resolve and get on with it. As scientists we embrace skepticism because it helps us do good research, which is great, but it is possible to be too skeptical, to be too shrinking and self-defeating about changing the things that only we have the power to change.

    So every time I see another paper lamenting the status quo, I’m not interested in the sociological perspective or any other form of intellectualisation or hand waving. I want to know what the authors are doing – right now – to fix the problem. That’s the interesting part.

    • Neuroskeptic

      Exactly. Lamentations, without solutions, can easily turn into vindications (“It’s bad, but woe, it was ever thus, so let’s get used to it…”)

      • Bill C

        Is it a case of being too skeptical about change, or is it more the case that most individuals correctly recognize that pushing for the change would work against their career interest?

  • Jan Moren

    I must be missing something here; perhaps theses are different in this particular subfield or country than what I’m used to.

    For us, your thesis _is_ your published papers. Usually literally so – the papers are put together one after the other, with a lengthy (50+ pages) introduction that summarizes and describes how these form a coherent body of research. The papers often even keep the exact same formatting as when they were published. People usually just paste together the final proof PDFs from the publisher, scaled down as needed to fit the print format.

    Some people like myself do write a coherent book*, but even then we spell out completely clearly in the introduction exactly what papers it’s based on, and what chapters are based on what papers. There just is no “tracking down” the original papers, or changing the data around.

    * “I’ll just edit the papers and move stuff around to make it read nicely. How hard can that be?” Famous last words.

    • Neuroskeptic

      Yes, but I think it depends on the discipline and on the individual student. Many theses are based on already-published papers but not in all cases. And often, even if a thesis includes some papers, it will also include additional material that hasn’t yet been written up as a paper – in some cases this is published later (an average of 3.4 years in the case of O’Boyle’s sample).

  • Wouter

    Preregistration deserves a chance. I think (neuro)science will benefit from it. That being said, I’d like to address a pragmatic issue, closely related to this post, that’s been bugging me.
    And the issue is this: it’s going to put a lot of pressure on (under)graduate students. A PhD student only has a limited amount of time to get certain studies published – around 2-4 years for a thesis containing 4/5 articles is a common standard. If students depend on anonymous reviewers and the reviewing process before they can actually start anything, they’ll be under even more pressure than they are right now. This in itself might lead to QRPs. Optionally, we might lower the standards for finishing one’s thesis (fewer articles), but that basically means shifting a student’s focus from research to administrative tasks. In the end, you want (under)graduate students to learn (doing science), and preregistration might interfere with that goal. Any thoughts on this?

    • Neuroskeptic

      This is an interesting point and one I haven’t seen raised before.

      However – at least in the UK – PhD students in psychology and human neuroscience generally spend at least the first few months (and sometimes more) doing planning and paperwork, because they need to get ethics committee permission for their work.

      These ethics applications typically include all of the information that would constitute a preregistration. But, they are not publicly accessible.

      So one simple solution would be to make these applications publicly searchable (from the moment of submission or, perhaps, from the moment of approval.)

      This would constitute preregistration.

      Then there would be an additional step if you wanted to get your protocol pre-peer reviewed and accepted at a journal. However, this would be optional: you could choose to do the work and submit it for publication when you’re done (as in the current model), citing your evidence of preregistration.

      It might be that PhD students (and beginning researchers in general) would favor that option, and then would move towards pre-peer review once their research ‘pipeline’ was flowing.

  • Lisa Alonzo

    Of course I am fooling myself. Do you want me to go insane and keep it all bottled up, then take it out on my family like so many parents did in 2008.
    Remember this though. FOOL F=6,O=1+5, 0=1+5, L=12 which is 6+6.
    All together 6,6,6,6,6….in my vocabulary. This the number of the Beast and it’s true meaning is: fix, Fix,FiX,FIx, FIX! -LisAMAlonzo


  • poniesinjudah

    Splendid and useful. First a quibble: QRPs? Who the frig made that up? Science is turning into A Clockwork Orange with all its made up slang. And this bit is really cal.
    I think we need to ask if these highly fraudable statistical techniques are giving us enough to justify keeping them. I know that’s like denying the Trinity but I don’t care. Statistics in biology is new – 20th century. And they drive research choices in ways that are bad: things that can’t be studied with, and especially can’t produce, statistics are not even considered. Statistical techniques are also a counter of accomplishment for young scientists, instead of erudition in their field. Banting, asked years later about the statistical methods used in isolating insulin, said they didn’t use any. If ALL the dogs they treated with their isolate hadn’t gotten normal blood sugar levels, they would have known they hadn’t got it right.

    • Neuroskeptic

      Re: QRPs, you’re right that the term is bit clumsy, but I think it’s useful nonetheless.

      QRPs are the grey area between best practice and malpractice. They are not fraud, but they are nonetheless not ideal science.

      We do need a term for that.

      And “QRPs” (I pronounce it “quirps”, rhymes with twerps) seems as good as any.

  • Timothy O’Leary

    Interesting article. One concern I have is that this takes a slightly naive position on the importance of statistics: science is often exploratory, and statistical ‘hypotheses’ are being elevated to a status they shouldn’t really have. For example,

    “Partly, this was because unsupported hypotheses from the theses tended to just not get included in the papers. This is problematic, because it amounts to suppressing null results, which are meaningful and deserve to be published.”

    Many, many ‘unsupported hypotheses’ are simply uninteresting, irrelevant results that reflect the exploratory nature of research. This is especially true for a doctoral or masters thesis. Journal papers would be unreadable if every failed Friday afternoon experiment HAD to be included. And, as for registering all such experiments, that sounds like an idea cooked up by someone who has relatively narrow and limited experience actually doing research.

    Finally, what really matters are scientific hypotheses, not statistical ones.



About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.
