The Ethics of Research on Leaked Data: Ashley Madison

By Neuroskeptic | July 14, 2018 9:07 am

A paper just published reports that Republicans are more likely to have used the adultery website Ashley Madison than Democrats, while Libertarians were even more likely to do so.

That’s a claim that could ruffle some feathers, but the way in which the researchers conducted this study might be even more controversial. That’s because this paper is based on the 2015 Ashley Madison data leak, which exposed the personal data, including names and credit-card details, of millions of registered users.

For this study, the authors, Kodi B. Arfer and Jason J. Jones, took the leaked data and matched it up against voter registration records for five U.S. states. They considered a voter to be an active Ashley Madison user if they had ever paid money to the website. About 1 in 500 voters met these criteria.
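
The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the post does not say which fields were used to link the two data sets, so matching on a normalized name plus ZIP code is an assumption, as are the names `Voter`, `Account`, `normalize` and `active_user_rates`. All records are invented placeholders and the party labels are deliberately generic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    name: str
    zip_code: str
    party: str


@dataclass(frozen=True)
class Account:
    name: str
    zip_code: str
    amount_paid: float


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def active_user_rates(voters, accounts):
    """Return, per party, the share of voters matched to a paying account.

    A voter counts as an "active user" if any account with the same
    (normalized name, ZIP code) key has ever paid money. The choice of
    key is an assumption made for this sketch.
    """
    paying_keys = {
        (normalize(a.name), a.zip_code) for a in accounts if a.amount_paid > 0
    }
    totals, active = {}, {}
    for v in voters:
        totals[v.party] = totals.get(v.party, 0) + 1
        if (normalize(v.name), v.zip_code) in paying_keys:
            active[v.party] = active.get(v.party, 0) + 1
    return {party: active.get(party, 0) / n for party, n in totals.items()}


# Invented toy records; none of this is real data.
voters = [
    Voter("Alex Example", "00001", "Party A"),
    Voter("Blake Sample", "00002", "Party A"),
    Voter("Casey Placeholder", "00003", "Party B"),
    Voter("Drew Madeup", "00004", "Party B"),
]
accounts = [
    Account("alex  example", "00001", 19.0),     # paid, matches a voter
    Account("Casey Placeholder", "00003", 0.0),  # registered but never paid
    Account("Nobody Here", "99999", 5.0),        # paid, matches no voter
]

rates = active_user_rates(voters, accounts)
print(rates)
```

A real linkage would also have to deal with duplicate names and false matches, which this sketch ignores.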

Those voters registered as Libertarians were most likely to be active users, even controlling for age, gender and state. Registered Republicans came next and Democrats were least likely.

Arfer and Jones conclude, provocatively, that:

Our results are perhaps the strongest evidence yet that people with more sexually conservative values, although they claim to act accordingly, are more sexually deviant in practice than their more sexually liberal peers.

Personally I wouldn’t put too much stock in these results, partly because there are innumerable confounding variables that weren’t considered in these models, but mainly because, as the authors themselves point out, Ashley Madison usage is rare but adultery is far more common (e.g. 21% of husbands in the General Social Survey, by self report). It’s not clear that Ashley Madison use is a good proxy for actual adultery, in other words.

But what certainly is interesting about this study is the ethics. In the paper, Arfer and Jones do not mention that any ethics committee/IRB approved their study [edit: but see the comments below: “We did get IRB approval, from a UCLA IRB. They certified the study as exempt from review (because it didn’t involve new data collection or interacting with subjects), although my understanding is that there was actually a full committee review”]. I wouldn’t think they needed approval, though, because they didn’t collect any new data from human participants.

The authors do discuss the ethics of their work, briefly. Acknowledging that the original data leak was an unethical and illegal breach of privacy, Arfer and Jones go on to say that this doesn’t make using the data wrong:

We believe that using data that were originally collected unethically is itself ethically permissible. To forbid such use would be closing the stable door after the horse has bolted. In the case of [Ashley Madison] in particular, not only have the data already been publicly available since 2015, it has been widely discussed in the news (e.g., Biddle, 2015; Lamont, 2016; Victor, 2015), with some reports even describing how to obtain and use the data (e.g., Paton, 2015; Prince, 2015).

We cannot undo the past, but we can make the most of the present by getting what social and scientific value we can out of undesirable events, whether those events are natural disasters, disease epidemics, or human wrongdoing.

Hmm. It’s a tricky ethical question, but I’m not sure I’m happy with this ‘horse has bolted’ answer. Publishing this study of the data might be seen as rubbing salt into the wound of those who were exposed in 2015, for instance.

CATEGORIZED UNDER: ethics, papers, science, select, Top Posts
  • http://www.mazepath.com/uncleal/EquivPrinFail.pdf Uncle Al

    “more sexually deviant” The only remaining sexual deviance is Mormon heterosexual plural marriage. More’s the pity.

  • Kodiologist

    Hi, I’m the first author. To clarify, yes, we did get IRB approval, from a UCLA IRB. They certified the study as exempt from review (because it didn’t involve new data collection or interacting with subjects), although my understanding is that there was actually a full committee review (and a lot of back-and-forth with a lawyer) due to the sensitive nature of the data. Let me know if you have any questions.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks! I will update the post.

    • RexDev

      “Our results are perhaps the strongest evidence yet that people with more sexually conservative values, although they claim to act accordingly, are more sexually deviant in practice..”

      Wow. If this is the strongest evidence, it’s pretty weak.

      From your “analysis” pointing towards less than 1% of the populations, you point to a conclusion that is frankly ridiculous.

      You are not a scientist; you don’t even resemble one. Get another job.

      You are making the world a worse place; some ignoramus will quote your “evidence” as fact.

    • DaveW

      If I understand the above correctly, you added to the leaked data by determining voter registration. That seems a further invasion of privacy to me and new data collection. Also, it is quite the leap of faith to go from party registration to degree of conservative sexual values. None of the Libertarians that I have met or read are at all conservative in their sexuality: most seem rather dissolute. Republicans and Democrats seem to overlap rather broadly, although Democrats do seem to be becoming very blue-nosed. What kind of error estimate is there for your assessments, how were the five states picked, and why wasn’t the number of times the site was used a factor in the analysis (surely that would be a stronger indicator)?

      • Kodiologist

        > If I understand the above correctly, you added to the leaked data by determining voter registration. That seems a further invasion of privacy to me and new data collection.

        It isn’t “data collection” in the sense that IRBs use to define human-subjects research because it doesn’t involve interacting with or measuring people; rather, it’s the use of existing measurements.

        > Also, it is quite the leap of faith to go from party registration to degree of conservative sexual values.

        You might have gotten the idea that we have more faith in the interpretation of the data for Libertarians than we actually do. The interpretation is clearest for Republicans and Democrats, for whom there is preexisting research on sexual attitudes. For Libertarians and Greens, it’s basically a guess, as acknowledged in the paper.

        > What kind of error estimate is there for your assessments

        See Table 2.

        > how were the five states picked

        We aimed to get a variety of states (in terms of political leanings, size, presence of major cities, etc.) and exploit the voting data my coauthor happened to already have access to.

        > why wasn’t the number of times the site was used a factor in the analysis (surely that would be a stronger indicator)?

        The short answer is that a binary approach allows for a much more straightforward method and interpretation. One certainly could try to make a richer measure of usage, and such an analysis would probably be worth doing, but there are a lot of choices to make, and issues arise such as: if you measured money spent as your measure of usage, how much would differences in expenditure reflect price differences rather than usage differences? And does greater site usage really have much to do with greater (intention to) cheat? It’s definitely not obvious to me that any given amount-of-usage indicator would be stronger in some sense than the binary approach.

    • DaveW

      Hi Kodiologist – thanks for replying, although your reply has only shown up on email and I can’t find it here. In response, in your paper:

      “American political conservatism emphasizes maintenance of the social order, individual interests (as opposed to collective interests), limited government, aggressive foreign policy, and traditional or religious values.” (1st sentence of introduction)

      Possibly in theory, but in practice every President since FDR (maybe Carter and Eisenhower excepted) has had a foreign policy aggressive enough to kill a lot of foreigners. Democratic-controlled Congresses have gone along with these wars at least to start – they were for the wars before they were against them – and elements of the Republican Party have always been against foreign adventures – they were against the wars before they were for them. Also, since FDR the Federal Government has been growing by leaps and bounds no matter who was president. Most politicians of both Parties, until very recently, claimed to hold strong Christian religious values (Trump would be an exception).

      Party affiliation per se is not a good predictor of the values in your definition for D&Rs and the house of cards logic that follows your 1st sentence did not convince me that it is.

      If you could measure actual conservative values in your AM sample, then maybe you could find some convincing relationship. Or gun registration – that might be interesting.

      “Libertarians are harder to rate in terms of overall conservatism, but on average are more conservative than Republicans” (1st paragraph of Results)

      This would be true only for limited government/individual rights in your definition. Libertarians are not for aggressive foreign policy, excessive reliance on religion, or the current social order, and apparently, they don’t live in Oklahoma (you really should have tossed OK – it is also an outlier compared to your other states).

      Table 2 seems to be a comparison of the results cranked through 6 models and doesn’t tell me if, for example, the difference between Republicans and Democrats has a high probability of being true outside the model universe.

      Anyway, thanks for the excuse to procrastinate (I have a pile of papers to edit and I really don’t want to do them) and for the courtesy of replying.

      • http://arfer.net Kodiologist

        Happy to oblige! My previous comment was spamfiltered. You should be able to see it on the site now.

        As a liberal, I’d be the first to agree that the Democratic Party isn’t as liberal as it’s sometimes accused of being or hoped to be. And as a pacifist, I’m particularly displeased at how hawkish the party is. My impressions about the relative sexual attitudes of Republicans and Democrats seem to be upheld by the previous research I cited. To be clear, I’d rather have had self-report sexual-attitude data of all the voters rather than their political parties. That would’ve helped more with the goal of the study while also making it less controversial. But then we wouldn’t have had a sample size of 50 million, etc.

        > you really should have tossed OK – it is also an outlier compared to your other states

        I don’t feel safe assuming that the variation is all of a scientifically uninformative sort, which is what one does by discarding an outlier. What’s more, the model with interaction terms can account for how the party effects differ between Oklahoma and other states.

        > Table 2 seems to be a comparison of the results cranked through 6 models and doesn’t tell me if, for example, the difference between Republicans and Democrats has a high probability of being true outside the model universe.

        Answering that question per se requires a full Bayesian analysis, which in turn requires an explicit statistical model. The current analysis is actually *less* dependent on the models being true. One of the virtues of predictive analysis is that if a model is too wrong to be accurate, its observed accuracy will be penalized accordingly. I have a bunch of previous papers on this topic if you’re interested.

  • http://russwilson.coffeecup.com/ RustyRiley

    you can’t have it both ways — questioning the ethics of the study after saying “I don’t think they needed approval though …” so what would approval be necessary for? They didn’t disclose, I presume, the individual’s use, or non-use, of Ashley Madison data/voter details. They knew these details themselves but these are now public records; so “rubbing salt” — discomfort, would I be comfortable with it, no, but I think I’d have to wear it — a whole lot less uncomfortable than a lot of what passes for “journalism, in the public interest”, if you’ve ever been a victim of that. The interpretation though …

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Institutional Review Boards have a specific remit which, as far as I understand it, is to protect participants in research studies and experiments, by reviewing the studies before the data is collected.

      If the data is already collected, the IRB does not get to review re-use of the data. You need an IRB approval to collect data, not to analyze data.

      So I’m not trying to have it both ways.

      Now in fact it emerges that (although this is not mentioned in the paper) the IRB did discuss the study, and certified that the study was indeed exempt because it didn’t collect new data.

      I don’t think this precludes ongoing ethical discussion.

      • http://russwilson.coffeecup.com/ RustyRiley

        But you didn’t give grounds for ethical discussion — “rubbing salt” isn’t an ethical concern imho

        • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

          Well, look at it this way.

          Suppose I post revenge porn of you all over Facebook where our mutual friends can see it.

          Clearly, I am the one to blame for this outrage.

          However, isn’t there also a sense that our friends should not look at the images? That, as soon as they realize what has happened, they should avert their eyes and close the window, because it would be shameful to look at someone’s privacy exposed in this way, even though someone else exposed it?

          In a similar way, one could argue that we should not download or use leaked data because to do so would be taking advantage of someone’s breached privacy.

          • http://russwilson.coffeecup.com/ RustyRiley

            I don’t accept this – for many reasons, too many to go into here. Instead, I ask you to consider why an experienced, professional IRB DID accept the study’s being done, which was a correct decision imho.

          • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

            So, as I understand Kodiologist’s comment, the IRB did not approve the study per se. They, after a full review, certified that the study didn’t need their approval because it is not collecting data.

          • http://russwilson.coffeecup.com/ RustyRiley

            re-read his FULL comment, not stopping after the first 2 1/2 sentences; especially the last sentence.

          • Jason Max

            This is not a good analogy. Wouldn’t you see a difference between friends pruriently viewing your pictures (and further violating your privacy, because now you have to see your friends and know they have seen your pics…) and researchers using your pictures? Say skin cancer researchers were using the images to determine the frequency of moles in the population, would that really seem as violating to your privacy as your friend seeing you naked?

      • DaveW

        Please correct me if I am wrong, but my understanding is that the political party registration was newly collected data. Presumably there now exists a database with party affiliation attached to the other information. How is this not a further invasion of privacy?

        • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

          The political party data came from voter registration records which are collected by the government. The researchers here only accessed those records, which isn’t considered “collection”.

          However I agree with you that there is now a database which is even more of a breach of privacy than the original AM leak database because it now contains voter registration data as well.

          The ethics of creating such a database – for scientific purposes – is what I’m pondering.

  • OWilson

    I have no sympathy for people who share their most personal data with online commercial interests.

    I agree with the author of this article that the scant controls in place for the study are way too loose!

    We could also draw the conclusion that Republicans are (a) more pathetically naive and trusting when it comes to online security, (b) a more libidinous lot, or (c) have more disposable income to patronise these high-end matchmakers!

    • Eric Johnson

      Alternatively, it may show Republicans are more likely to use such a service. Maybe Democrats have more ready access to willing adultery partners?

      • OWilson

        Maybe! :)

  • Bobaloo

    And I’m sure the authors support the use of the German data on medical research during WW2, since after all no NEW people were killed to obtain it.

    Nothing like bad science being used to support personal beliefs.

    • Kodiologist

      Yes, we specifically mention the Nazi hypothermia experiments in the paragraph that Neuroskeptic quotes (although he didn’t quote that part). There’s no question in my mind that those experiments were monstrous, but refraining from using the data doesn’t unkill any of the victims.

    • 7eggert

      I think by throwing away the data gained you make killing those people even worse.

  • Facebook User

    How does voting Republican equal “sexually conservative”? There have been plenty of studies that have shown that conservative voters are more sexually active and even more adventurous in their sexual practices. By the way, what use is labeling groups by statistics? Some races commit murder at 12 times the rate of other races. Should we judge the high murder race? I don’t think so.

    • Kodiologist

      In surveys of attitudes, Republicans, compared to Democrats, endorse more conservative opinions about sex (see the introduction of the paper). But what attitudes people endorse is a different matter from what they actually do, and this study is just one example in a long history of research in social psychology examining how well attitudes agree with behavior.

      The use of this study is not just to continue this sort of attitude-vs-behavior research, but to show how issues with self-report in sex research are serious but can be circumvented with alternative measures.

  • Scott Wilson

    So they cherry picked five states to get the results they wanted.

    • Kodiologist

      We report on every state we analyzed. We also made sure to include both left- and right-leaning states, and it turns out that the main points of the results are consistent across all the states we analyzed.

  • Darby42164

    “We believe that using data that were originally collected unethically is itself ethically permissible. To forbid such use would be closing the stable door after the horse has bolted.”

    Yes, but contributing to the spread of unethically collected data does not strike me as ethical. Granted, the people who use this site are not sympathetic victims, but this study draws more interest to the data set than there would have been had the study not been done. Hypothetically, this additional attention may make more people examine the data, potentially ending more marriages, and possibly damaging the victims’ mental health in the process. Here we are at the Discover Magazine site talking about a data set that honestly I had forgotten about.

    In medicine there is “Primum non nocere” (first, to do no harm). From Wikipedia:
    “Non-maleficence, which is derived from the maxim, is one of the principal precepts of bioethics that all medical students are taught in school and is a fundamental principle throughout the world. Another way to state it is that, ‘given an existing problem, it may be better not to do something, or even to do nothing, than to risk causing more harm than good.’”

    Can you say that the renewed attention you bring to this is doing no harm? If several more marriages were broken up, adversely affecting the mental health of the cheater, the one cheated on, and any children they might have? Is that ethical? Doesn’t sound like it.

    • http://arfer.net Kodiologist

      It’s a difficult question. I see your point.

      A lame rejoinder is that we hit on the idea for this study while the leak was still news, in the latter half of 2015. The chief reason it took so long to publish wasn’t the study itself, which we could conduct quickly since we already had all the data, but the IRB process, which was especially slow.

      My more serious reply is that ethical responsibility isn’t the same thing as mere causation, and a good thing too, or everybody who so much as acknowledges the existence of this study would be committing an evil act by increasing awareness of the leak and thus perhaps leading to another marriage self-destructing. (In general, my opinion is that treating everybody as ethically responsible for *all* the consequences of their actions is a philosophy that sounds good on paper but is impossible to fulfill in practice. It is consequentialism taken to an absurd extreme. But that goes deeper into meta-ethics than is probably appropriate here.)

      Besides, it’s not clear to me that marriages breaking up from one of the spouses learning that the other cheated is always a bad thing. I’m sure the cheater would like his or her cheating to be private, but the spouse might legitimately want to know.

  • Michael V.

    Kodiologist, Is it possible to get a full copy of your study? I would like to present it to a class of mine at a university.

    • http://arfer.net Kodiologist

      Sure thing: http://arfer.net/projects/cheat/paper . If you want the official PDF, I don’t believe Springer will let me post it on the Internet yet, but I could email it to you.

  • Jonathan O’Donnell

    Troy Hunt is a security specialist who has thought about these issues, since he runs a site called “Have I Been Pwned”. The site allows you to check if your personal data has been compromised in data breaches.

    Generally, you can type in your email address and the site will confirm whether your personal data is included in the breach.

    He didn’t do that for the Ashley Madison breach, as this system allows you to determine if anyone’s data is in the breach. For Ashley Madison (and similar breaches, such as Adult Friend Finder), he invoked the concept of ‘sensitive’ data. Privacy legislation in many jurisdictions recognises that some data is more sensitive than other data, and so the law requires you to be more careful with that data. In Australia (where I am from), the legal definition of sensitive data includes membership of a political association and sexual preferences or practices. YMMV.

    Troy talks about how he handled the Ashley Madison breach. You can see him refining the process as he goes.
    Here’s how I’m going to handle the Ashley Madison data, by Troy Hunt, on Troy Hunt’s blog, 29 July 2015
    https://www.troyhunt.com/heres-how-im-going-to-handle-ashley/

    He has also provided an insight into how this breach affected the people in it, through some of the questions and comments that people had sent him. In it, you can see that people are scared, and that they are fearful of the ongoing ramifications that this will have on their lives.
    Here’s what Ashley Madison members have told me, by Troy Hunt, on Troy Hunt’s blog, 24 August 2015
    https://www.troyhunt.com/heres-what-ashley-madison-members-have/

    I understand that his service is different from the way that you conduct your research. However, I think that it is useful to compare the two, as they help to tease out some of the issues.

    His environment is different as well – he doesn’t work at a university. You can see this when he says:
    “I’ve been asked a few times now what the process for flagging a breach as sensitive is and the answer is simply this: I make a personal judgement call.”
    The Ethics of Running a Data Breach Search Service, by Troy Hunt, on Troy Hunt’s blog, 25 September 2015
    https://www.troyhunt.com/the-ethics-of-running-a-data-breach-search-service/

    He doesn’t need to go through an ethics committee or institutional review board. We do. We work within a structure that provides rules and guidelines. You followed those guidelines, and your research was approved.

    So I think that this gives us two different issues to talk about:
    1. How do we run the system: What is the role of ethics committees and institutional review boards in an age of breached data? That is, do we need to tighten up the framework?
    2. How do we police ourselves: As a researcher, has your attitude changed based on the discussion that has taken place since this publication?

    Here are a couple of questions that might help to shape your thinking around question two:
    + Would you do a followup study (eg with the Adult Friend Finder data)?
    + Would you pay for access to breach data for research purposes?
    + Would you use breach data for research if it was provided to you anonymously (ie not yet public)?
    + Would you use breach data for research if it was provided to you by the hackers (ie not yet public, and from an illegal source)?
    + Would you use breach data for research if it was provided to you by the organisation that was breached (ie not yet public, and from the data owner)?
    + Would you use breach data for research if it was provided to you by a trusted third party (ie not yet public, and from another researcher, for example)?
    + Would you collect the data yourself, for research purposes, if you found that it was ‘leaking’ (eg publicly accessible, but only in a manner that the site owner did not anticipate, such as URI-manipulation)?
    + Would you only use breach data that had already been released to the public by someone else (ie anonymous or known source, no payment involved)?
    + Would you link multiple sets of breached data for research purposes?

    The reason I ask is that there is a growing swag of breached data out there.
    Pwned websites, by Troy Hunt, on Have I Been Pwned, undated – accessed 6 August 2018
    https://haveibeenpwned.com/PwnedWebsites

    All the data in this list has been verified by Troy. He doesn’t pay for data, but people (both trusted and untrusted) do send it to him. He often works with the organisations involved, but not always.

    If you want to, you could build a research niche in this area. I wouldn’t, but YMMV.

    Jonathan O’Donnell
    Research Whisperer

Neuroskeptic

No brain. No gain.

About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.
