The Perils of Sharing Brain Scans

By Neuroskeptic | November 22, 2012 8:09 pm

A fascinating paper by neuroscientists Van Horn and Gazzaniga chronicles their pioneering, but not entirely successful, attempt to get researchers to share their brain scans: Why share data? Lessons learned from the fMRIDC.

It all started in 1999 when, along with some colleagues, they decided that the time was right for data sharing in neuroimaging. They got some public funding, and tried to get various major neuroscience journals to require that anyone publishing an fMRI study should make their data available to the fMRI Data Center (fMRIDC).

By making it mandatory, they’d ensure that there was no selection bias. Requirements to post raw data were already common in other fields of science like genetics and crystallography. So, they thought, why can’t it happen here?

However, it didn’t go down very well:

Upon becoming aware of our efforts and goals, fMRI researchers angered by journal requirements to provide copies of the fMRI data from their published articles began a letter writing campaign seeking to muster opposition – an effort which was featured in the news and editorial sections of several influential journals.

Editorials and commentaries over fMRI data sharing were aired in the pages of Science, Nature Neuroscience etc. expressing concern over the data sharing requirement, over what possession of the data implied, human subject concerns, and, if databasing was to be conducted at all, how it should be conducted “properly”.

This was all before my time, sadly. It sounds like a grand old academic street-fight. No doubt those on the other side remember it differently from how it’s presented here, though.

The reactions of our colleagues caught us somewhat off guard. We were honestly surprised by the negative and hostile response when we had believed that creation of a data archive would be of benefit to the neuroimaging community. Perhaps, they had a point.

Maybe the field wasn’t ready for fMRI data sharing? Perhaps, it was too early to start such a project? We struggled with how best to move forward or whether to move forward at all.

Anyway, they decided they would continue, on a more modest scale. fMRIDC ended up with data from about 100 fMRI studies by the time the funders pulled the plug in 2006, only a fraction (and perhaps an unrepresentative one) of the papers published, but still, it’s something.

As I said, I missed out on this debate, but if I had been in the field 10 years ago, I suspect I'd have been on the extreme wing of the pro-sharing faction, the Montagnard to the founders' Girondin. My view is that no researcher owns their data. The only person who owns the results of a brain scan is the person whose brain it is.

If I could wave a magic wand, I'd put a chip in every MRI scanner that automatically uploaded all scans to a public database as soon as they appeared (with the subject's consent, and/or with personal information about the subject stripped out). I'd fix all the software such that every time someone ran an analysis, it was publicly logged. Total transparency is best for science, I believe, and it would also make scientists' lives easier, once they got over the initial shock.

Sadly that’s not possible… yet… but data sharing is a noble cause and, as Van Horn and Gazzaniga point out, even if fMRIDC is dead, the idea lives on with many new initiatives emerging, hydra-like, in its place.

Van Horn JD, and Gazzaniga MS (2012). Why share data? Lessons learned from the fMRIDC. NeuroImage. PMID: 23160115

  • The Neurocritic

    An interesting bit of history from Van Horn and Gazzaniga. Those authors (and current readers) might be curious to find that the original signed letter against the fMRIDC effort, and subsequent correspondence, is still available online.

    People were threatened by the fact that others might use their data/dollars to (1) gain publications of their own, which was bad for staying ahead in the competition for funding (and fame); and (2) prove them wrong, perhaps.

    On the other side, a letter from the Editor of the special issue of JOCN (2000) noted that 13 out of 15 labs gave an “enthusiastic response” when asked and agreed to participate. Several other labs wanted to take part but the page limits had already been reached.

  • conge

    Thank you for the link to fcon1000 and INDI.

  • SpongeBob63

    As someone who was initially positive about this scheme but who became rather negative, I feel qualified to comment. At the time there were two interrelated practical problems which helped sink it:

    (1) A large part of the MRI data that could be submitted was in the form of fMRI data. The content of this data depends very much on the particular experiment presented and the details of the stimuli used. Without this information the fMRI data is largely uninterpretable and not capable of being re-analysed by other researchers. But there is no standardised way of describing in sufficient detail the experiment being run. A loose verbal description, as given in most papers, won't do. Precise timings, all stimuli, ideally a copy of the software used to present the experiment, are all needed. In other words everything about the experiment needed to run it again. By asking for the output data (fMRI data) there was an implicit need to surrender all the input to the experiment and all the intellectual work that went into this. Researchers baulked at doing this and the data aggregators only made token efforts to capture this info.

    (2) Instead, initially at least, the data aggregators went to great lengths to capture in considerable detail the surface characteristics of the MRI data being given (protocol descriptions of each sequence). In practice, this made it a pain to upload data without providing a lot of information that most experimenters (not being MR physicists) did not have ready access to. It was not made easy to submit 'raw' data, even without worrying about the experiment defining it. Given the choice between submitting to a journal that required all of this extra effort and one that did not, many researchers voted with their time and took the easy option.

    The rest, as they say, is history.

  • Nitpicker

    “If I could wave a magic wand, I'd put a chip in every MRI scanner that automatically uploaded all scans to a public database as soon as they appeared (with the subject's consent, and/or with personal information about the subject stripped out). “

    For the T1 image, it's a picture of someone's head in 3D. You can render the image and recognise the person's face. Kinda tough to strip that info out…

  • Nitpicker

    “I'd fix all the software such that every time someone ran an analysis, it was publicly logged. Total transparency is best for science, I believe, and it would also make scientists lives easier, once they got over the initial shock.”

    Really? I assume that is a troll. Otherwise it seems pretty dumb and silly to me. Most neuroimaging analysis software is run pretty much like people run SPSS. Should we log all usage of SPSS publicly too? Surely once we get over our “initial shock” we will all be happy with that?

    Logging software usage will flag up those researchers who analyze their data many times. What does that really tell us? They could be inexperienced, make errors and fix them. Or they could be paranoid and constantly tweak their analyses with insignificant changes (not to fish for results but in a search for the “perfect” analysis). Or the experiment could be fiendishly complicated and genuinely require multiple analyses to get to grips with it. I've seen all the above. I am sure there are also people doing data dredging and fishing for positive results too. But my point is that mere logging of software usage won't flag that up. You will need an intelligent stats robot to interpret the usage.

    Oh wait, if I had that I could use it to analyze the data in the first place…

  • Juliano Assanjo

    Nitpicker: Strictly science, nothing is personal.

    We believe that all fMRI data should be shared, and funding agencies should require scientists to share their data as part of the grant.

    Please multiply everything you've said by -1, and look into the mirror.
    1- “fishing for positive results”. The opposite is true, sharing will prevent fishing for positive results, Y? Cause many ppl out there will investigate it over and over. (That's contrary to your opinion.)
    2- “Troll”! Who is trolling when you say “Most neuroimaging analysis software is run pretty much like people run SPSS”… How about the many parameters that one might think of? (Here's something to refresh your memory: Carp, J. (2012). On the Plurality of (Methodological) Worlds: Estimating the Analytic Flexibility of fMRI Experiments, Front. Neuroscience, 6, DOI: 10.3389/fnins.2012.00149). So, what makes the set of parameters that you chose better than X?

    3- “They could be inexperienced, make errors and fix them.”… If so, it is not your problem, their results will be cross-checked by other researchers, so what is the problem? But, honestly, I did not get the meaning of “make errors and fix them”. They will play with the data and get something, same as any other researcher. “Make errors and fix them”? Isn't that how we learn, and how we do research?

    4- “For the T1 image, it's a picture of someone's head in 3D. You can render the image and recognise the person's face. Kinda tough to strip that info out..”

    If the volunteer accepts that, so what the heck? Plus, there are several algorithms that can be used to scramble the person's face. Haven't you heard about them? They are very easy to run, plug and play.

    5- “Logging software usage will flag up those researchers who analyze their data many times”
    So what's the harm of analyzing many times, to double check the result by another researcher, or new findings?
    Here's something to refresh your memory:
    J. D. Van Horn and A. Ishai, “Mapping the human brain: New insights from fMRI data sharing,” Neuroinformatics, vol. 5, pp. 146-153, 2007. (DOI: 10.1007/s12021-007-0011-6)

    We are working on fMRI-leaks and all fMRI data will be publicly available, soon.
    One final thought, are you theee …Isabel Gauthier? ND, Why were you sooo afraid of sharing data? Now that's trolling.

    I really hate seeing a long comment here, but, it had to be done this time.


  • Neuroskeptic

    I'm going to leave the comment by Juliano Assanjo up as it contains useful information, but there's no need for that tone. Any more comments like that will be binned.

  • Neuroskeptic

    Nitpicker: For sharing scans, you could just extract the brain and upload that (e.g. FSL's BET tool).

    Some skull would be useful for some purposes though e.g. normalization so you could strip out the anterior parts and leave the rest.

    Regarding the software – actually I do think SPSS should be shared in the same way.

    You're right that most repeat analysis is entirely bona fide. That would become obvious, if it were all public, from the fact that everyone did it.

    But it would let you see whether someone had done 100 analyses and then published the only good one. If they'd done 100 and the others were 99% as good then no-one would care.

    As well as keeping people honest it would also be a huge timesaver. People must run duplicate analyses all the time. Imagine if you had, say, a web version of SPSS where if you tried to run the same analysis as someone else (on identical or similar data) it gave you a link to everything they subsequently did?
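The brain-extraction suggestion earlier in this comment can be sketched as a simple masking step. This is a hypothetical toy illustration only: in practice the binary brain mask would come from a tool such as FSL's BET, whereas here both the "scan" and the mask are synthetic arrays, and axis 1 is assumed to run posterior to anterior.

```python
import numpy as np

def anonymize_scan(volume, brain_mask, keep_posterior=True):
    """Strip identifying anatomy from a structural scan.

    volume, brain_mask: same-shape 3D arrays; brain_mask is 1 inside
    the brain, 0 outside (in practice it would come from a tool such
    as FSL's BET; here it is just an array).  With keep_posterior=True,
    non-brain voxels in the posterior half are kept, so some skull
    survives for normalization, while the anterior (face) half is zeroed.
    """
    out = volume.copy()
    outside = brain_mask == 0
    if keep_posterior:
        # Assume axis 1 runs posterior -> anterior; zero only the front half.
        front = np.zeros_like(outside)
        front[:, outside.shape[1] // 2:, :] = True
        outside = outside & front
    out[outside] = 0
    return out

# Tiny synthetic demo: a 4x4x4 "scan" of ones with a central 2x2x2 "brain".
vol = np.ones((4, 4, 4))
mask = np.zeros((4, 4, 4), dtype=int)
mask[1:3, 1:3, 1:3] = 1
anon = anonymize_scan(vol, mask, keep_posterior=False)
print(int(anon.sum()))  # only the 8 brain voxels survive
```

With keep_posterior=True the posterior non-brain voxels are retained, matching the suggestion above of stripping the anterior parts and leaving the rest.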

  • Neuroskeptic

    P.S. The magic wand comment is about 90% my real views and 10% trolling to get useful comments, which it seems to have done :)

  • Mark_Oxford

    I wonder if part of the problem here, and the failure of the process 10 years ago, was the failure of the data managers to recognise and understand the significant cultural shift that was being asked for within clinical and neuroscience research groups.

    The old “private” model assumes integrity on the part of the researchers. They do their best, process data as they see fit and publish results as best they can. Mistakes get made and even published but they are “honest” mistakes, corrected by replication elsewhere. It is not efficient but people are protected behind the privacy.

    What was being asked for in the “open” model, with complete disclosure of data and methods, is culturally radically different. Effectively researchers would no longer be trusted to work in private. Sharing of data would enable others (potentially with an antagonistic agenda) to highlight alleged analysis sins. Inevitably this could be used to allege either incompetence or overt data manipulation. Suddenly researchers' private mistakes become public or their actions investigated.

    I am not saying the latter model of working is wrong. But it was hardly surprising that there would be kickback against it. Most people are not going to become nudists overnight and let it all hang out.

  • Neuroskeptic

    I'm sure that's true. That's why I said it would take a magic wand 😉 But I think with time, people will be persuaded.

  • Anonymous

    I do think the privacy issue is a lot more important and poses a bigger problem than acknowledged. Skull stripping will not guarantee that a ppt can't be identified. For one thing it will allow anyone having access to a labeled scan to identify past public scans of that subject (using a simple search & match algorithm sifting through a database). This is something possible *today* – and who knows what on earth people will be able to do with brains posted on the web in 10 years?

    I personally think it would even be unethical to ask for permission for essentially posting people's brains on facebook. But one might take the stance it would be ok as long as you have informed consent for doing exactly that (which most people don't have, nor blessings from ethics committees or data protection officers).

    Informed consent would entail telling people we want to post their brains on the web, don't know who on earth will access it or what they will do with it. Further, we cannot be 100% sure they won't be able to uncover their identity as well as pathologies, risk factors, sexual preferences (think functional data) etc.

    I'd be interested to see how many 'true owners' of the data would consent. Data sharing is an important value in science, but not the only one nor the most important (by a considerable margin).
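The "search & match" step described in this comment can be sketched in a few lines. This is a toy illustration only, under strong assumptions: every scan is already skull-stripped and registered to a common template, so the arrays are directly comparable; a realistic attempt would need registration and more robust similarity measures.

```python
import numpy as np

def match_brain(query, database):
    """Return (index, score) of the database scan most similar to `query`.

    Toy illustration: every scan is assumed skull-stripped and registered
    to the same template, stored as equal-shape arrays.  Similarity is
    plain Pearson correlation over voxels.
    """
    q = query.ravel().astype(float)
    q = (q - q.mean()) / q.std()
    best_idx, best_score = -1, -np.inf
    for i, scan in enumerate(database):
        s = scan.ravel().astype(float)
        s = (s - s.mean()) / s.std()
        r = float(np.dot(q, s)) / q.size  # Pearson r between the two scans
        if r > best_score:
            best_idx, best_score = i, r
    return best_idx, best_score

# Synthetic demo: the "query" is a noisy re-scan of database entry 2.
rng = np.random.default_rng(0)
database = [rng.normal(size=(8, 8, 8)) for _ in range(5)]
query = database[2] + 0.1 * rng.normal(size=(8, 8, 8))
idx, score = match_brain(query, database)
print(idx)  # entry 2 is recovered
```

Even this naive correlation match recovers the right entry when the query is just a noisy re-scan, which is exactly the scenario of matching a labeled scan against past public scans of the same subject.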

  • Neuroskeptic

    Anonymous: That's a fair point but would it really be possible to do that?

    I can see how, if you had 100 brains, you could match a known brain to one of them.

    But what if it were 10,000 brains in the database? Could you find your brain out of all those similar ones? Is brain structure that unique?

    Maybe it is, but has anyone checked?

  • Juliano Assanjo

    Anonymous: Nice idea, indeed.
    Neuroskeptic: I also thought, after reading Anonymous's comment, that it is a fair point. Sorry if the tone of my previous comment was a bit high, and thx for keeping it up, as I am a strong believer in data/tools/science sharing. The good thing is that I do think that “Automated publicly logging” is a smart troll !?

    Nonetheless, I will give it a shot to rebut Anonymous's comment, although it is not an easy one. I will assume that it is possible to retrieve some query (in possession) brain from a (public) dataset that contains 10,000 brains (or even more), and that brain structure is unique. I will further assume that this is possible with or without skull stripping; thus, I will not talk about skull stripping, or facial features scrambling (in case we wanted to keep the original brain structure attached to the original skull). This leaves us with an important issue, which is privacy, and what policy to use to maintain it.

    Anonymous is assuming that there are few dishonest scientists/researchers/users who will do brain (structural/functional) retrieval, then, they might uncover the identity of the retrieved brain, and further, they will have private info related to pathologies, risk factors, sexual preferences (think functional data) etc. Thus, it is unethical to share brain scans. It is unethical just because someone out there can use these scans in an unjustified manner, and mostly for evil purposes, right?

    Now, the issue here is that, following the same logic, it is possible for scientists who belong to the group that collected the brain scans (or is currently hosting them) to do that evil as well. And this implies that (any) collection of brain scans is unethical, since there is a probability that these brain scans will be used in some evil way (by the owners/collectors). The most serious issue here is that those possessing the brain scans do not have to bother searching a database of brain scans to identify anything; they can just do it, and they already have all the identities/information, and that is even more critical with respect to privacy. Dishonesty is dishonesty (probable) even if we write in the consent letter/contract that brain scans will not be used outside the designated study. But if this is acceptable and adequate to address the privacy concern, which I don't think it is, maybe we can ask those who want to use the shared scans to sign a letter/contract admitting that they will use the scans for an honest purpose.

  • The Original Nitpicker

    I am sure it matters little in the big scheme of things, but the Nitpicker who posts in this thread is not me – unless I'm suffering from some severe case of amnesia. I am the nitpicker who has been posting comments on this blog over the course of the past year or so.

    This is obviously one of the dangers of posting anonymously without an associated account. This other poster has as much a right to post under this name as I do. I just wanted to clear up any confusion this may cause.

  • The Original Nitpicker

    Now with the disclaimer about Nitpicker's multiple identities out of the way let me say something on topic:

    While I generally agree with the notions of transparency and that sharing data amongst researchers would theoretically be a good thing, I think several of the comments here are highly misguided. It's just the same as calling for Stasi-like surveillance of the population, fitting every citizen with a tracer chip, recording every book or article you read etc. Most sensible people here would fiercely object to such ideas but somehow because it is related to our favorite topic of scientific research this approach is suddenly desirable? Obviously, I know that the comment was tongue-in-cheek because this is not even remotely realistic – but that's not the point. The point is that this attitude is extremely dangerous and should be opposed vigorously so that it can't take root. It is also the equivalent of burning down the house to deal with your vermin problems.

    It is true that there are wide-spread problems with mishandling of data and unaccounted analyses. The current climate in science publishing encourages bad practices. There are even high-profile cases of outright fraud.

    None of that is good but the way to change that isn't this Big Brother vision of total transparency. What we as a community must do is change the culture of how science is done, what practices are respected and valued by the community. We must change the way publishing works and start to move the monopoly from private publishers getting rich on our backs to focus on what is conducive to doing good science.

    There is no magical solution, and there is not even a simple solution. This will take time, there will be a lot of opposition from numerous sides, but in time things can be changed. And the fact of the matter is that things are already changing. Open access is wide-spread now. The first journals are playing with the pre-registration of scientific protocols that NS is frequently advocating here. And many journals and even more scientists are starting to value good science, rather than good results. Instead of constantly asking for a revolution that is both unrealistic and misguided, we must encourage this evolution that is already in full swing to progress further.

    I'd have some things to say about why I feel sharing raw MRI data is also misguided but this comment has already gotten too long so I will stop here. Let me just summarize this by saying that individuals have a right to their own brain data. Unless they explicitly consent to it being shared with the wider public this must not happen. Skull-stripping isn't an answer. We don't yet know what sort of information can be extracted from these data. It's highly unethical and irresponsible to expect that this should simply be public knowledge. For one thing, data protection laws in many jurisdictions in fact require that personal data should be destroyed after a certain period. And more generally speaking, I bet few of the advocates of data sharing would be okay with genetic samples being made available to everyone. This is not really much different.

  • Neuroskeptic

    But everyday people don't need to be tracked because their lives are private. By the same token I don't think all clinical scans need to be public.

    But research is about finding out truths about the world, which is the same world for all of us i.e. it is a public activity.

    For the same reason legal & political hearings are held in public because they concern the public good & are accountable to the public's opinions. A secret court is inherently dangerous.

  • The Original Nitpicker

    I knew you'd say that 😉

    I see your point but I still think it's misguided. It's the attitude that 'You don't need to be afraid of surveillance if you have nothing to hide.' It's simply not true.

    Most researchers are adults who should be given some sliver of responsibility to act ethically. We must encourage this more but we don't have to turn science into this Orwellian nightmare scenario. And it isn't as if there's no local oversight already.

    Another point is of course that such a degree of data sharing would quickly become largely unmanageable. It may be technically feasible to store all MRI data in public repositories but who is going to look into that? This is in fact the practical downside of surveillance societies: you would have seen that in East Germany under the Stasi and you have similar issues with the wide-spread CCTV coverage in the UK today. There is simply too much data to be mined. The problem might be somewhat alleviated by crowd-sourcing (which the other examples don't/didn't have) but that is still not sufficient.

    I still think the theoretical dangers of surveillance are greater, but the practical problems are clearly what makes these notions totally unrealistic. Instead of wasting serious effort contemplating such ideas I think it would be time better spent thinking about ways to realistically encourage better practices and a more transparent climate in research.

  • Anonymous

    I second the Original Nitpicker on this crucial point. The idea to keep individuals from wrongdoing by total surveillance reflects two things: A deep mistrust towards the individual and a high level of trust towards the all-powerful, well-meaning observer. I don't think the latter is deserved. History has taught us a few lessons on this.

    @ Neuroskeptic: I think it's highly probable brain recognition would be possible. Why shouldn't it be? Brains don't seem any less distinctive to me than faces. And we have pretty good face recognition algorithms even today (working on a *lot* more than just a couple of hundred faces). And even if we don't know whether it works (or will work), that's still an ethical concern, right?

    @Juliano: If I ask a ppt to consent to take part in a study, she will have to trust me to stick to what the consent form says I will do with the data – the contract between the two of us. You're absolutely right in that regard.

    Still in my opinion it's a truly different story If I ask her to trust that an anonymous, ill-defined pool of people that will do ill-defined things with the data, will restrict themselves to those ill-defined things that are kinda ok.

    And there is another problem: legal regulations regarding data protection are very different in different countries. Post it on the web, let everyone have access, and the consequences of using data in a prohibited way will be very different depending on where the person obtaining the data lives. It is not the same story either way.

    All this is a bit abstract. An example not associated with data sharing, but demonstrating the possible harm associated with brains going viral is this one:

    You might remember the fun we had with it (and the rest of the web). Yes, it's anonymous (kind of, we know his age, gender, job type, likely nationality, IQ, medical history, number of children and hospital…). But at least *he* will know these are his pictures. I doubt the online fuzz was much fun for him…

  • Nitpicker

    I agree with everything the Original Nitpicker said.

    And sorry to have randomly picked the same handle… great minds think alike?

  • The Original Nitpicker

    No offense taken. I should probably register a name! Perhaps Neuropicker? 😛

    In any case I am glad to see that I'm not alone in my opinion. It is easy to call for idealistic improvements that are really terrible ideas in disguise.

    “The road to hell is paved with good intentions”

  • Anonymous

    Quick thought expt for those advocating sharing all brain data. One of the undergrads you scan today could be running for president in 25 years time. If that person's brain scan is online, someone will analyse it & say “his/her amygdala is smaller than average, he/she can't possibly be president” or similar. There are big privacy implications to too much data sharing.

    And I agree with the comment that open everything is too much like the Stasi. Yes, make the final published data open but not every analysis along the way.

  • bsci

    I want to second parts of SpongeBob63's comment. The original opposition to the fMRIDC might have been from data ownership concerns, but it didn't survive because it was just too hard to use. In the article they blame the researchers for not having good data curation, but, even with perfect data curation at the time, it took days of work to meet their data submission specifications. If I knew others just sent them hard drives & said figure it out yourself (as they mention in the article), the group I was in could have saved some time.

    This is not to blame the fMRIDC effort because this is a hard problem that still hasn't been solved. The current great examples of fMRI data sharing are mostly resting data, or task data from studies designed from the outset to be shared publicly. There still isn't a great way to share fMRI task data.

  • Johan Carlin

    I think the debate is getting confused because we are simultaneously dealing with the (rather silly) idea that every time you run a GLM on fMRI data it should be posted on the internet and the (rather less silly) idea that raw fMRI data and the information needed to analyse it should be made available to other researchers on publication.

    As for the ethics issues of the latter, I guess I don't see them. At my institute volunteers consent that their (anonymised) data can be shared with researchers elsewhere. Maybe some sort of registration procedure would be necessary to stop the data from being accessible by non-researchers.

    There is a long tradition in medical research of anonymising data to a reasonable extent and then sharing it – see e.g. countless case studies in medical journals. So concerns about 'brain identification' seem far fetched to me given that brains alone arguably contain less identifying information than your average case study. If brain extraction or Freesurfer's skull anonymisation was added to prevent running face recognition algorithms on the T1 I really don't see an issue.

    Remember that if all data is anonymous, the mapping from name to brain remains obscure even if I can consistently ID the same brain across different sessions.

  • OctoSpider

    @Neuroskeptic 24 November 2012 08:11

    “But what if it were 10,000 brains in the database? Could you find your brain out of all those similar ones? Is brain structure that unique?”

    The answer will depend on the type of MR images (and resolution) used for the comparison, the homogeneity of the 10,000 sample, how atypical your brain is, and the degree of confidence required for the final identification. Hard to give a simple answer for this.

    Best case with sets of high resolution T1 and T2 images, from the same scanner, from a diverse (general public) population, then yes to a high degree of certainty.

  • BioEthicist

    @Johan Carlin. “Remember that if all data is anonymous, the mapping from name to brain remains obscure even if I can consistently ID the same brain across different sessions.”

    The anonymity and ethics issues are potentially significant.

    This is not about recognizing the face from the T1 image, the point is that any MRI image is a unique 3D biometric identifier of the person, much as a fingerprint is. The difference is that, like DNA, it also has potential predictive and diagnostic power.

    Conceptually I do not think this is too different from the idea of keeping all our medical records online *in the open* for all to see provided we use an ID number to identify the record, not an actual name. The benefits for medical researchers in such a system are clear. So are the problems.

  • Juliano Assanjo

    May I have your attention please….Will the real NitPicker please stand up, I repeat…. We're gonna have a problem here….

    @Anonymous…I hope you are still following this?
    “Still in my opinion it's a truly different story If I ask her to trust that an anonymous, ill-defined pool of people that will do ill-defined things with the data, will restrict themselves to those ill-defined things that are kinda ok.”

    Well, as a volunteer, I would trust more the anonymous ill-defined pool of people that will/might do ill-defined things, why? Because I am anonymous to those anonymous people. But I am identified to the group I volunteered for. So, if we do simple Bayes risk analysis, we'll conclude that the original group who collected my brain data is more dangerous than the anonymous people/researchers out there.

    Now, if we will be able in the future to decode lots of information from sMRI and fMRI (although the noise and the temporal resolution are drawbacks of current fMRI), I would suggest that someone (especially government agencies) set up an fMRI bank (now we have revolutionary memory storage devices) so that in the future humans can data-mine the brains of those who passed away (at least the famous people), and see what these brains were made of. Why is Einstein's brain in a museum? Imagine having Einstein's brain responses, or Mozart's; how does it feel to have such genius brains, and to dig for information about them. (The biggest question is: what stimuli to use? Visual, verbal, memory tracing, resting state? etc.) So, keep these data until that day comes.

  • The Original Nitpicker

    @BioEthicist: That's precisely my point. Skull-stripping is irrelevant as it can only deal with what we currently know is possible with structural MRI scans. But what about the potentially problematic information in these scans? What if someone were to discover a reliable biomarker for criminal behavior, the risk of serious illness, or predictors of political tendencies etc? Admittedly, some of these potential risks may be far-fetched but in my mind the focus should be on protecting the individual's personal data rather than on ensuring that every single data set ever collected is open access. The vast majority of these data will never even be looked at by anyone. Plus as methodology is constantly advancing some ideas may not be as far-fetched as it may seem.

    The idea that the greater danger stems from the original researchers, who can identify the scans, is also a flawed argument. That is why we have ethics committees. While there may be a more immediate danger from the original researchers, they must seek approval to use your brain data, they can only use it in the specified way, and usually they are required to destroy the information after a certain number of years. None of these provisions would apply to publicly available data.

    Naturally, nobody stops you from obtaining informed consent from your subjects that their data will be shared and may be reused. There are databases of face images out there – and obviously those contain more identifying information than brains. But the fact is that such repositories require ethical approval and the subjects' consent. Even if in future we require all MRI researchers to upload all their data, this cannot retroactively apply to most existing data.

  • bsci

    @O Nitpicker, I highly doubt there will be single-person diagnostic-level accuracy with the MRI and fMRI data currently being openly posted. That said, the real issue is the epidemiological or disease studies that might post genetic information, diagnoses, and family histories that are directly paired with MRI images. The MRI data is the most likely point for deanonymizing data.
    I don't think this has been demonstrated, but if a volunteer is given some images of her/his brain and posts them in a public setting, like Facebook, it is well within the realm of possibility to match that image to a brain in public databases. Such a match gives access to all other data that is linked to the brain scans.

  • The Original Nitpicker

    @bsci: I agree it's not very probable with most current fMRI data. More likely with structural data. But that's not the point. You can't just safely assume that it's anonymous and that the probability that someone might find something in those data that might raise ethical concerns is close to zero. Data transparency is a virtuous goal but ethics must come first. Moreover, the data we are currently collecting are constantly improving, e.g. you can clearly make out anatomical detail in high-res EPI scans.

    In my mind this is a real ethical concern, not a small technicality. It's like publishing someone's fingerprint against their explicit consent. Whenever I present actual brains I either ensure that they are processed in such a way that they are standardized or I take scans from subjects from whom I had explicit approval that their brains could be used in this capacity (e.g. my own scans). Anything else is irresponsible in my book.

    Here's what it all boils down to:

    1. Brain images are unique, personal data containing potentially diagnostic information.

    2. We do not typically have approval from subjects to share their data as widely as these proposals would expect us to. Subjects certainly do not typically give informed consent that such data will be made openly available. So, before any such repositories take off, such broader ethical approvals will necessarily need to be enforced on a large scale.

    3. There is nothing wrong with sharing processed data which has truly been stripped of personal info. Anything that says subject no. X showed response Y in region Z (plus basic demographic detail like age, gender, handedness) is presumably fine to be publicly available as long as nobody outside the researchers has access to the original data.

    4. Processed data are also what most people would actually want to have access to. There may be some who will want to preprocess the data with different parameters; however, there are options to do that now even without open data access: you could ask the authors for a collaboration. It's also more ethically justified to share standard data with specific researchers rather than making it all public. But importantly, if you don't obtain access to these data, you should perhaps bite the bullet and replicate the original experiment.

    5. The previous point leads to a pragmatic issue: the vast majority of these data sets will not be looked at by anyone. I'd be really interested in seeing how this system would fare in action, but my guess is only a small proportion of raw fMRI data will be reused in any way. Putting all the ethical issues aside, this is just a major waste of resources that could be applied much better elsewhere.

    6. I don't see the point. What good will it do if you upload all those data? Is it to stop outright fraud and researcher degrees of freedom? These are noble goals, but will that really prevent these problems? Neuroskeptic's Orwellian vision of MRI scanners automatically uploading every scan to a public repository notwithstanding, there is absolutely nothing stopping any particular researcher from cherry-picking their data before making them public. This merely shifts the problem, it doesn't fix it. There may be techniques to spot such untoward practices, but I wouldn't count on them working.
    If on the other hand the idea is simply to make large data sets available for further analyses, that is a different story. Such data sets are already available now and they can be set up with exactly that goal in mind and the appropriate data protection and ethics procedures.

    Sorry, this comment became very long. I hope to have clarified my position sufficiently now and so will refrain from commenting in this thread further.

  • Juliano Assanjo

    @The Original Nitpicker …
    If brain images are unique, personal data containing potentially diagnostic data, then maybe people should stop volunteering for any experiment involving structural and/or functional brain scans.

    Now, here's something unethical about the private consent (i.e. given to the data collector). This is (like the one above) also a hypothetical scenario: someday a volunteer might go to court accused of some crime, and the judge will/might ask for his brain scans (if somehow he knew the defendant had been scanned before; say, a divorce case with his wife knowing about the brain scans) to search for some evidence via the assumed (above) diagnostics.

    Will volunteers accept this as part of their consent? Did they?

    The best option, in this case and the case that “The Original Nitpicker” is raising, would be to look for volunteers who will accept that their scans go public, or that they may be exposed in a court of justice someday, in which case the private owner himself delivers the scans.

    Would it be feasible to survey people and ask them about these two hypothetical ethical issues?

  • bsci

    @O Nitpicker,
    The raw data is already proving very valuable. Particularly for resting-state data, there is a boom of very-large-sample-size publications using these data sets. In addition, different groups have been able to test out multiple data processing methods on the same large data sets, which means we can better understand the subtle effects of processing. For all the talk of “big science”, making the raw data openly available lets small groups of mathematicians, statisticians, and engineers access and examine data that they don't have the expertise or resources to collect themselves.

    The issues regarding consent & anonymization are also being considered. Here's a quote from another article that, I suspect, will be paired with the original one discussed in this blog post:
    Mennes, M., Biswal, B., Castellanos, F.X., Milham, M.P., 2012. Making data sharing work: The FCP/INDI experience. NeuroImage.
    Realizing the need to minimize the potential for breach of privacy as the sine qua non of open access data sharing, the FCP steering committee agreed to full anonymization of all datasets in accordance with the U. S. Health Insurance Portability and Accountability Act (HIPAA).
    Specifically, the 18 types of protected health information (PHI) identified by HIPAA (Gunn et al., 2004) are removed from all datasets prior to upload to the FCP site for distribution. The general consensus is that once fully de-identified in compliance with HIPAA, a dataset is no longer considered to be subject to the same rules governing human research (Freymann et al., 2012). With that said, a few local ethics boards have required investigators to re-consent participants to obtain explicit agreement that their data may be released in any form, even if the data are fully de-identified and anonymized (the coding algorithm is destroyed so no links can be traced between the released data and personal identifiers). This inconsistency reflects a need for more explicit guidance by oversight and funding agencies including the US National Institutes of Health (NIH), the National Science Foundation (NSF), and their international counterparts. Additionally, it highlights the need for researchers around the world to adjust their consent process immediately to inform participants that their brain imaging and phenotypic data may be shared, whether in the short run, or one or more years after study completion and publication of initial findings.

  • The Origipicker

    @Juliano: Hypothetical scenarios aren't really the point for me. It's simple really: if you want to make all brain data public, you must first obtain informed consent from the subjects that they are okay with their data being used in that manner. Informed consent in this context means clarifying the potential risks there may be. It need not be some highly hypothetical example like the ones you mention. Anyone with a scan of your brain at hand would be able to match it in the database. They may not even need a scan, as things like age, race or gender can probably be determined with fairly high confidence. Finally, demographic info included in a typical “anonymized” data set can help you identify a subject by process of elimination. There are countless reasons why you might not want your data exposed to the whole world, beyond what can be determined from your brain scan. Say you participated in an experiment on psychopathic tendencies or sexual preferences or various measures predicting health conditions. Surely that sort of information shouldn't be marked with ID tags for anyone to use, regardless of whether they are governments, insurance companies, or just individuals with nefarious motives.

    My main point is really this: transparency and scrutiny in science are a good thing, but they are far outweighed by the need to protect the individual's data. We can overcome the issues with science without that, especially because it is complete overkill: in truth, few of these data will actually be looked at by anyone.
    And this is really the last word I will have in this thread ;->

  • Robert P. O'Shea

    What if it became possible to identify from a scan whether someone holds unpopular political opinions? It is then possible that the researchers could be forced by some oppressive regime to divulge the identity of the person to the secret police.

  • Neuroskeptic

    I wouldn't call my proposals Orwellian. I prefer to think of them as a Panopticon.

  • The Origipicker

    @NS: Also a nice metaphor. Turn science into a prison. Would you really want to do research under those conditions?

  • Neuroskeptic

    Yes, I'd love to, so long as everyone else was, so it wasn't a disadvantage.

  • Juliano Assanjo

    It is obvious that NS's proposal is (the) non-Orwellian, but the contrary is (the) Orwellian. Hmm…Panopticon does not imply prison, unless the earth is our big prison.

  • Origamipicker

    @Juliano, the Panopticon idea was mainly focused on how to design a prison, although admittedly (as the Wikipedia article NS linked states) it could apply to other institutions: hospitals, asylums, etc. All wonderful images for how science should work?

    In truth, this is not a great metaphor after all. NS doesn't propose a panopticon where one person watches all others but one where everyone watches everyone else. That's slightly better and more consistent with the way our modern world of social networks and smart phones is heading.

    All the same, this doesn't strike me as a desirable future for research. On the one hand, everybody watching each other's steps is not a pleasant thought, and on the other, this idea will generate such an overload of data. Some of it (say, high-impact papers on social cognition) will be scrutinized over and over by obsessive know-it-alls and the whole field will stagnate. But the majority of data in those archives will just be ignored and collect digital dust. Instead, it would be far more sensible to just independently attempt replications of work you think should be tested.

  • Neuroskeptic

    Why is everyone watching everyone an unpleasant thought? It's already the norm in, say, experimental particle physics, where the sheer scale of experiments means everyone knows about them long before they come online. Did the guys working on the Large Hadron Collider think, “goddamn it, wish we could do this in private”?

  • Juliano Assanjo

    1- “On the one hand, everybody watching each others steps is not a pleasant thought”
    JA: Science (in general) will be more challenging, more productive, and much more honest this way.

    2- “and on the other this idea will generate such an overload of data.”
    JA: Memory and data management is very cheap (and small) nowadays.

    3- “Some (say, high impact papers on social cognition) will be scrutinized over and over by obsessive know-it-alls and the whole field will stagnate.”
    JA: It doesn't matter as long as we get to the truth; this will help us understand every effect better and develop faster, isn't that our aim? The field will not stagnate because there will always be fresh blood. We still appreciate Copernicus for publishing his heliocentric (sun-centered) model in 1543, even though it was not entirely correct.

    4- “But the majority of data in those archives will just be ignored and collect digital dust.”
    JA: I can't see any problem with this one, although I believe this “digital dust” would be very, very important in the future: scientists will have more computational-diagnostic power and will (probably) use these data to reach conclusions well beyond our current abilities, and then they will be able to make marvelous findings on several issues. I can't name any, but that is something I can trust in, and do wish for. Who would have thought that humans (pioneered by C. Darwin) would make use of fossils to investigate evolution, or do scientific research on Egyptian mummies? (I guess someone at that time said: we are crazy to mummify these damned corpses.)

    5- “Instead it would be far more sensible to just independently attempt replications of work you think should be tested.”
    JA: First, it is a waste of (tax) money; second, I will (probably) always be able to rebut your (different) results by saying that you couldn't correctly replicate my experiment. I have many (rational) excuses: different scanner, scanner calibration is not correct, the sample is different, the subjects were not trained adequately, or were over-trained, etc. Plus, by sharing data we will all move forward faster.

  • Origamipicker


    1 – “Science (in general) will be more challenging, more productive, and much honest this way.”

    I see that argument. As I've said earlier, it is also the same argument as “You have nothing to fear if you have nothing to hide”. I don't subscribe to this notion. Making your data available to everyone, including your competitors and your worst enemies, will set you up for an onslaught of petty vendettas and countless other politically motivated analyses of your data. In the data sharing argument there is always this implicit assumption that allowing others to look at your data will make it more honest. Well, the ones looking at your data are people, too, and they can also have dishonest motives. And my hunch is that those are the types of people who will want to look at other people's data the most.

    2 – “Memory and data management is very cheap (and small) nowadays.”

    That's a flawed argument. Demands on memory and data management are growing along with the supply. fMRI data sets of today are already substantially larger than those of only a few years ago. With higher resolution and additional methods becoming available now, the requirements will multiply even more in the next decade.

    Furthermore, I wasn't talking about the overload on technology. I don't doubt that it's technically feasible to archive all MRI data securely. But managing such an archive so that it can be readily inspected by people all over the world is a different story. I'm repeating myself, but there will be far too much data for anyone to really look at. And 90% of the data won't be looked at by anyone because nobody will care!

    3- “It doesn't matter as long as we need to get to the truth, this will help us understand better every effect, and develop faster, isn't that our aim?”

    This should be our aim but my point is that this wouldn't be facilitated by having full data access. For every paper of impact you would have an onslaught of people trying to reanalyse the data and squeezing all sorts of conclusions out of it. Many of those will be complete red herrings or outright erroneous analyses. The original authors and the whole community at large would get bogged down in lengthy reanalyses instead of doing what they should be doing to drive science further: validating results through replication and follow-up experiments and the framing of new testable hypotheses.

    4-“I can't see any problem with this one,…”

    It's not a problem in and of itself, other than that it is a massive waste of resources. fMRI time series from some arbitrary experiments aren't ancient Egyptian mummies. The most interesting information you could retrieve from them in a thousand years would be: “Look, brains looked pretty much the same then as they do now. But damn, their brain imaging equipment was bad!”
    The kind of data that is worth preserving is our experimental designs and theories. They can be validated through replication.

    5-“First, it is a waste of (tax) money, second, I will (probably) always rebuttal your (different) results by saying that you couldn't correctly replicate my experiment”

    No, this is the essence of scientific validation. One replication isn't enough, there need to be many. Data sharing isn't really helping here at all because at the end of the day you will still only use the same data set the original authors uploaded. If the effect they reported is specious, but robust to all the data massaging you can apply to it, that still doesn't make it any truer.
    As for wasting money, I never said people should waste their days just replicating other people's work. Any good scientific design incorporates replication as the backbone on which to make new discoveries. What must happen is that unsuccessful replications are published. These are already done every day.

  • Neuroskeptic

    “Making your data available to everyone, including your competitors and your worst enemies, will set you up for an onslaught of petty vendettas and countless other politically motivated analyses of your data.”

    It would also protect you from exactly such nonsense which currently occurs in e.g. peer review.

    And I really think you have a very negative view of the scientific community & process… I admit that sounds weird coming from me of all people, but I believe that it's the current organizational systems that cause most of the problems.

    Once scientists are liberated from these restrictions, they will be free to focus on the data, which is what we all want to do, & most people will do so harmoniously.

  • Origamipicker

    @NS: “And I really think you have a very negative view of the scientific community & process…”

    I wouldn't call it negative but realistic. On the large scale, science is self-correcting and the pursuit of truth. But individual researchers on their own are flawed human beings like everyone else. I know that this is part of what your proposal is meant to safeguard against. My point is that greater transparency and giving more power to greater masses also comes with dangers. Idealistic proponents of the notion that everything should be open and free to everyone tend to neglect that.

  • Juliano Assanjo

    @ Origamipicker

    1- “For every paper of impact you would have an onslaught of people trying to reanalyse the data and squeezing all sorts of conclusions out of it.”
    JA: That'll be great; I am with it all the way. The same thing happens to the theories we propose: it happened before to Einstein, Darwin, Fourier, and less famous scientists.

    2- “Memory and data management is very cheap (and small) nowadays…. That's a flawed argument…the requirements will multiply even more in the next decade.”

    JA: [Only if you mean they are very cheap.] Well, let me give you an example: memory prices in 2012 are 500 times lower than in 1999 (when the fMRIDC started). Although brain imaging resolution will multiply, memory prices will continue to decrease much, much faster because memory is widely/commercially used all over the globe. The same goes for data management: the competition is fierce, and tools and servers are becoming way, way cheaper. We are living in the information boom era: cloud computing, optical communications/storage, quantum computers, etc., so why not make use of it?

    3- “It's not a problem in and of itself other than that it is a massive waste of resources.”
    JA: The price of 2 GB (which can store an fMRI data set) in 2012 is less than a US dollar. I think people can tell which one is a massive waste of resources: spending one US dollar (or slightly more) to store the data, or throwing data that may be worth around 100,000 (one hundred thousand) US dollars in the garbage?

    4- “Making your data available to everyone, including your competitors and your worst enemies, will set you up for an onslaught of petty vendettas and countless other politically motivated analyses of your data. “

    That's what it's really all about. This statement rebuts itself; I hope more people are watching this.

    I could continue to rebut the other points, but I have chosen only the important ones.
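    [Editor's note: the storage-vs-collection comparison in point 3 above is easy to sanity-check with back-of-envelope arithmetic. In the sketch below, only the ~2 GB data set size and the claimed $100,000 collection cost come from the comment; the per-gigabyte disk price is an assumed illustrative figure, not a quoted one.]

    ```python
    # Back-of-envelope check: archiving one fMRI data set vs. re-collecting it.
    # Figures marked "assumed" are illustrative, not from the comment thread.
    DATASET_GB = 2.0               # data set size quoted in the comment
    STORAGE_COST_PER_GB = 0.05     # rough 2012 hard-disk price, USD/GB (assumed)
    COLLECTION_COST_USD = 100_000  # claimed cost of acquiring the data set

    archive_cost = DATASET_GB * STORAGE_COST_PER_GB  # one-time storage cost
    ratio = COLLECTION_COST_USD / archive_cost       # re-collection vs. storage

    print(f"Archiving: ${archive_cost:.2f}")                   # → Archiving: $0.10
    print(f"Re-collecting: ${COLLECTION_COST_USD:,}")          # → Re-collecting: $100,000
    print(f"Re-collection is ~{ratio:,.0f}x the storage cost") # → ~1,000,000x
    ```

    Even if the assumed disk price were off by an order of magnitude, the conclusion of point 3 would be unchanged.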

  • Dung

    I think this stuff must remain private.




About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.



@Neuro_Skeptic on Twitter

