Do We Need An Adoption Service for Orphan Data?

By Neuroskeptic | November 1, 2017 6:05 pm

Having recently left an academic post, I’ve been thinking about what will happen to the unpublished data I collected in my previous role. Will it, like so much data, end up stuck in the limbo of the proverbial ‘file drawer’?

The ‘file drawer problem’ is generally understood to mean “the bias introduced into the scientific literature by selective publication – chiefly by a tendency to publish positive results but not to publish negative or nonconfirmatory results.”

However, while selective publication based on results is a big problem, even positive data can end up unpublished. If no-one gets round to analyzing some data at all, no-one will know whether it’s positive or negative, and it will vanish into obscurity, not because of the ‘file drawer problem’ but just because researchers only have so much time.

I would estimate that over the course of my research career, at least 25% of the data I’ve collected was never published, and in most cases it was never seriously analyzed. What happened? Well, life happened. For instance, I conducted a study as part of my PhD work that I’ll call the Orphan project.


I designed and carried out the Orphan project during my PhD. It was intended to form the final part of my thesis, but in the end, my other studies provided sufficient results and Orphan was left out. So analyzing the Orphan data was never a priority during my PhD: I had to focus on the thesis studies.

I’d only managed to make a start on Orphan’s analysis by the time I’d got my PhD and moved away to start out as a postdoc in a different lab. Now that I had a new job to worry about, I had no time to finish Orphan, although I made a couple of abortive attempts. My old lab had their own data to deal with. So the data remained on my portable hard-drive, literally gathering dust, and now I can’t even remember where that hard-drive is.

This is a sad story. The volunteers who gave their time to participate did so in vain, and all my work was in vain too. Orphan wasn’t a great study, but it wasn’t terrible. It was no less deserving of publication than most of my published work.

I know that my lost project is far from unique. It’s hard to know just how much scientific data gets lost in limbo – by definition, it leaves no public traces – but it’s fair to say it’s not uncommon. Even when a study is published, some parts of the work often never make it into the papers. Researchers love adding extra measures to studies but don’t always have time to analyze all the data this generates.

So what can we do about this? Well, if the problem is that studies are being orphaned, why not create an adoption service?

There are lots of researchers who would love to get their hands on a particular kind of data, but who lack the resources to collect it. What if we could connect the people with too much data, and the people who need more?

I’m picturing a site where researchers could (perhaps anonymously at this stage) post a brief description of datasets that they have no time to analyze. Interested researchers could then contact those with data, and hopefully a collaboration would result, in which the data was shared, analyzed and published, saving it from oblivion.

Now, it could be said that we don’t need such a system; we just need open data. Why don’t researchers with surplus data just post it online so that anyone can access it and work with it? I agree that this is a great solution. But some researchers, for whatever reason, don’t want to make their data fully public. I think many of them would be open to handing their data to a named researcher in the framework of an agreed collaboration.

I’m not aware of any existing service that provides this kind of ‘adoption service for data’ (or ‘Tinder for data’ if you prefer). Perhaps the closest thing I know of is PsychFileDrawer, but it is mainly focussed on letting people share unpublished replication datasets.

  • Lee Rudolph

    Having recently left an academic post

    Does this mean that you’re now a minion of Big Brain?

  • Neurocritic

    I agree with this idea. I’ve often considered posting a list of my unpublished datasets on my blog with the subject heading of, ‘FIRE SALE! EVERYTHING MUST GO!’

    This would be much more feasible if I weren’t anonymous, or if I set up yet another blog in my own name. Given the time and money invested in collecting this sort of data, I would still like to be an author. I would consider the adopter to be a collaborator who could either use my existing analyses or conduct analyses of their own, AND be first author on the paper.

    As far as Open Science goes, because of institutional rules I wouldn’t be able to post completely unpublished datasets to allow a free-for-all where the data collection lab was not involved at all. Many Open Science proponents don’t seem to understand the privacy and security rules that can limit unfettered data sharing.

  • Franck Ramus

    I suppose every productive researcher has this sort of problem.
    It’s worth noting that this is in part due to the incentives to always get new grants and rush headlong into new projects, without ever taking the time to finish the work properly.

    Over the years this got me very frustrated, so I decided to do something about it (for data that was valuable enough to be worth it): I applied for grants to analyse previously acquired data. And it worked!
    In one case I got a grant that was mostly to run a new project (call it B), but that also included a postdoc to analyse data from the previous project A (on the same topic). A side-benefit was that, although project B had generated no publications by the end of the grant (only data collection was finished), the analysis of project A data had generated quite a few, so the reporting looked much better.
    In the second case I applied for a grant (C) exclusively dedicated to analysing the data from project B. It was mostly salary to hire people to do the work, since I didn’t have enough time. I argued carefully that the data were already acquired and of high quality, so the project was low-risk and high-gain. Apparently they were convinced!
    So this can be done. It does take some mental effort to pull the brakes and say: “no, this time, I will not acquire new data”…


  • Heidi Seibold

    R packages can have an ‘orphaned’ status. Maybe something similar could work for data if it were published in a repository. This is from the CRAN website (https://cran.r-project.org/):

    Orphaned packages have no active maintainer: they have
    Maintainer: ORPHANED
    in their DESCRIPTION file.

    Orphaned packages remain in the main CRAN packages section as long as
    they pass “R CMD check” for the current release version of R.

    Everybody is more than welcome to take over as maintainer of an orphaned
    package. Simply download the package sources, make changes if necessary
    (respecting original author and license!) and resubmit the package to
    CRAN with your name as maintainer in the DESCRIPTION file of the
    package.

    Possible reasons for orphanizing a package:

    1) The current maintainer actively wants to orphanize the package,
    e.g., due to no longer having time or interest to act as package
    maintainer.

    2) Emails sent to the current maintainer by the CRAN admins bounced, or
    were not answered for longer periods of time.

    The current orphanizing process:

    1) File PACKAGES.in in CRAN’s src/contrib directory (the repository
    package metadata file) adds a Maintainer: ORPHANED override and an
    X-CRAN-Comment entry providing information about the original
    maintainer, and the date and reason for orphanizing.

    2) Package sources and binaries are updated on CRAN without increasing
    the version number.

    3) If the package cleanly passes R CMD check for the current release
    version of R, it remains in CRAN’s src/contrib directory. Otherwise,
    it is moved to the Archive.

  • My Own Life

    I think this is a promising activity for those who have stopped trading their time for money (retired).
    For one example, after Mickey Nardo retired, he found the time and impetus to focus on the distortion of clinical trial reports found in psychiatric drug promotion. His blog at http://1boringoldman.com/ produced beneficial results in that area.

  • research matters

    Have you heard of ScienceMatters? They do exactly this: they publish single observations, including orphan data.
    http://www.sciencematters.io


Neuroskeptic

No brain. No gain.

About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.


@Neuro_Skeptic on Twitter
