Predicting Suicide: A Statistical Scandal

By Neuroskeptic | May 2, 2014 4:57 pm

A shocking piece of statistics has been uncovered in a paper published in a respectable psychiatry journal.

The offending article, Electrodermal hyporeactivity as a trait marker for suicidal propensity in uni- and bipolar depression, appeared in 2013 in the Journal of Psychiatric Research. It examined whether an ‘electrodermal hyporeactivity’ test – based on measuring the electrical conductivity of the skin – could predict suicide attempts in depressed people.

According to the authors, Lars Thorell and colleagues in Sweden, the test worked well. Their abstract said:

RESULTS: The high sensitivity and raw specificity of electrodermal hyporeactivity for suicide were confirmed… The findings support the hypothesis that electrodermal hyporeactivity is a trait marker for suicidal propensity in depression.

Sensitivity and specificity are two key yardsticks by which any diagnostic or predictive test can be judged. Broadly speaking, they refer respectively to the test’s ability to avoid false negatives and to avoid false positives. A high sensitivity and a high specificity mean that a test is an accurate one. Which is exactly what Thorell et al. found… right?

Er… no.

They reported sensitivity, but not specificity. Instead they reported something they call ‘raw specificity’. What is this? Well… it doesn’t exist. Thorell et al. just made it up. The term is unknown in statistics: it does not appear in any other paper on Google Scholar (there are a few ‘hits’, but on closer inspection they all refer to the old-fashioned specificity of some ‘raw’ variable).


It turns out that by ‘raw specificity’, Thorell et al. were referring to the metric known to everyone else in the world as negative predictive value (NPV). NPV is an important metric in its own right, but it’s in no way a substitute for specificity. It makes no sense to evaluate a test by looking at sensitivity and NPV. A first-year undergraduate would get a failing grade if they did that in an exam.
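Specificity and NPV answer different questions: specificity asks what fraction of the truly negative cases the test clears, while NPV asks what fraction of the test’s negative calls turn out to be correct. A minimal sketch, using a hypothetical 2×2 confusion matrix (the counts below are made up for illustration, not taken from the paper), shows how a test can have a high NPV while its specificity is dismal:

```python
# Hypothetical confusion matrix for a trait-marker test (illustrative only).
tp, fn = 74, 26    # truly positive patients: detected / missed
fp, tn = 500, 183  # truly negative patients: falsely flagged / correctly cleared

sensitivity = tp / (tp + fn)  # fraction of true positives detected
specificity = tn / (tn + fp)  # fraction of true negatives correctly cleared
npv = tn / (tn + fn)          # fraction of negative calls that are correct

print(f"sensitivity = {sensitivity:.2f}")  # 0.74
print(f"specificity = {specificity:.2f}")  # 0.27
print(f"NPV         = {npv:.2f}")          # 0.88
```

Here the test clears barely a quarter of the truly negative patients, yet its NPV looks reassuring, because most of the few patients it does clear really are negative.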

I’m stunned that Thorell et al. passed peer review, but as so often, it fell to post-publication peer review to save the day. The Journal of Psychiatric Research has just published two letters (1, 2) from outraged readers, pointing out that ‘raw specificity’ is a nonsensical concept. One of the letters is by a student who’s currently enrolled in an Honors Program and is due to graduate in 2016. I wasn’t kidding when I said that this is the kind of error that would shame an undergraduate.

So did the test work? Well, the actual specificity (maybe Thorell et al. call this the ‘cooked’ specificity?) of the electrodermal test was 33% over all patients. The sensitivity was 74%. The sum of sensitivity and specificity was 107%. To put this in context, an entirely random ‘test’ will get you a sum of sensitivity and specificity equal to 100%, while a perfectly accurate test would get a sum of 200%. So the electrodermal test’s true performance is just 7% better than flipping a coin.
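That 100% baseline is easy to verify: a test that flags patients at random with any probability p has expected sensitivity p and expected specificity 1 − p. A quick simulation (illustrative numbers only: a 15% base rate and a coin-flip ‘test’; neither figure comes from the paper) confirms it:

```python
import random

random.seed(0)
n = 100_000
truth = [random.random() < 0.15 for _ in range(n)]  # 15% truly positive
flag = [random.random() < 0.5 for _ in range(n)]    # coin-flip "test"

tp = sum(t and f for t, f in zip(truth, flag))
fn = sum(t and not f for t, f in zip(truth, flag))
tn = sum((not t) and (not f) for t, f in zip(truth, flag))
fp = sum((not t) and f for t, f in zip(truth, flag))

sens = tp / (tp + fn)
spec = tn / (tn + fp)
print(f"sensitivity + specificity = {sens + spec:.1%}")  # close to 100%
```

Changing the flagging probability moves sensitivity up and specificity down (or vice versa), but the expected sum stays pinned at 100% for any test that ignores the patient.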

In a rebuttal letter, Thorell et al. don’t dispute any of the facts above, but rather they argue that various special considerations inherent in testing for suicide mean that specificity is a poor metric, and ‘raw specificity’ is a better one. Their arguments sound vaguely plausible but however you try to rationalize it, the fact is that even a purely random test could have an extremely high sensitivity + ‘raw specificity’.

I will now proceed to design a suicide prediction technique that outperforms Thorell et al.’s electrodermal test. Watch in amazement! My proposed test is simple: the patient picks a card at random from a standard deck. If it is any card except the Ace of Spades, I declare them a suicide risk. If they pick the Ace of Spades then I say they’re not. In other words, I randomly assign a suicide risk to 51/52 or about 98% of people.

In Thorell et al. there were 783 patients, of whom 120 turned out to be suicidal, while 663 were not. In this sample, my Ace of Spades test has a sensitivity for detecting suicide of 98%, and a ‘raw specificity’ of 85%, total 183%! My pack of cards is much better, in other words, than Thorell et al.’s test, which had a sensitivity of 74% and a ‘raw specificity’ of 88%, totalling a mere 162%.
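Those figures follow directly from the sample counts. Here is a sketch of the expected arithmetic, assuming the card draw is independent of outcome (so 51/52 of each group gets flagged):

```python
# Expected performance of the Ace of Spades "test" on the reported sample:
# 783 patients, 120 suicidal, 663 not.
p_flag = 51 / 52  # probability of drawing anything but the Ace of Spades

suicidal, non_suicidal = 120, 663

tp = suicidal * p_flag            # suicidal patients flagged
fn = suicidal * (1 - p_flag)      # suicidal patients cleared
tn = non_suicidal * (1 - p_flag)  # non-suicidal patients cleared

sensitivity = tp / (tp + fn)  # = 51/52
npv = tn / (tn + fn)          # 'raw specificity' = 663/783

print(f"sensitivity       = {sensitivity:.0%}")        # 98%
print(f"'raw specificity' = {npv:.0%}")                # 85%
print(f"sum               = {sensitivity + npv:.0%}")  # 183%
```

Note that the ‘raw specificity’ of any such random test is simply the fraction of non-suicidal patients in the sample (663/783 ≈ 85%), which is why the metric rewards flagging nearly everyone.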

It’s clear that there is no substitute for the old-fashioned sensitivity and specificity, which Thorell et al. should have used in the first place.

Hat Tip: Bernard Carroll.

Culver, A. (2014). Letter to the Editor: Specificity of electrodermal reactivity testing for suicidal propensity in Thorell et al. Journal of Psychiatric Research. DOI: 10.1016/j.jpsychires.2014.03.013

Mushquash, C., Weaver, B., & Mazmanian, D. (2014). Reporting sensitivity and specificity for suicide risk instruments: A comment on… Journal of Psychiatric Research. DOI: 10.1016/j.jpsychires.2014.03.014

  • calling all toasters

    You are, of course, absolutely right about the bungling by the researchers. Or it’s possibly motivated cognition, as Culver points out. The use of specificity and sensitivity is weird anyway. Shouldn’t they only be used when you have the population parameters? Otherwise you can raise one and lower the other by changing the n in one of the groups, assuming the proportions of positives stay reasonably constant. Maybe their data is a good estimator of the proportion of people with mood disorders that are hyporeactive, but I wouldn’t bet on it.

    The data may have some value despite it all, if not as a clinical predictor of suicide, at least in finding a real effect. The OR for completed suicide was 2.34, and the OR for attempts was 1.82.

    Oh, except for one little thing… the test-retest reliability for reactivity is shockingly bad. I calculate it as r = 0.011. So who knows what we’re talking about?

  • Uncle Al

    The simple solution is to mandatory teach suicide in the primary grades. Student inspiration and curiosity would be killed. Anybody attempting suicide thereafter would be non-lethally unskilled.

    Suicide is a self-cleaning toilet. It should be encouraged. What part of the Beltway would not be improved if 10% of its work farce pulled its own plugs over a weekend? Declare 04 May to be National Suicide Day. “May the fourth be with you.” Subsidize workshops. Award priority to diversity applicants to be fair.


  • Gerald Te Rito

    I’m no intellect but have done research on electromagnetic hypersensitivity that’s been documented at the UN. Most cases involve young children maturing into adulthood whose vital organs are sensitive to electromagnetic fields. In adulthood, anybody subjected to deliberate harassment via electromagnetic harassment technology over a period of time will be led to suicide. 98 percent of the population is probably true.

  • Scott S.

    Data mining at its finest.




  • Retired Nurse

    If you want to see some utterly crap statistics, take a peek at those being fed into palliative care journals about the liverpool care pathway, and the ‘dementia timebomb’ – all used to support G8 cost savings….

  • littlegreyrabbit

    I would just say that the tone of this post is a bit over the top. While I would accept the authors may not have used the most transparent and readily understandable statistical measure, a transparent and understandable measure such as the odds ratio suggests this is quite a powerful factor. It is surprising to me to see evidence (or rather confirmation of a previous finding) that a single gross anatomical/physiological parameter could have such a predictive effect on suicide at all. The idea that you would see a stronger effect than 107% for combined sensitivity/specificity would be extraordinary. That might confirm your contention that it was a poor metric to use, but the casual reader might have come away with the impression that the authors had been trying some manipulative statistical trick or that their finding was not significant – when neither is true.
    I put a similar comment on Retraction Watch, where this post was singled out for mention, but in the increasingly capricious moderation of Oransky and Marcus it didn’t make it through. From a blog which served a moderately useful function of collating retractions in a single place, it is beginning (beginning?) to suffer delusions of its own importance.

    • Neuroskeptic

      I’m not saying anything about the odds ratio, that is not the point of the post. If the odds ratio was so impressive they should have used that as their headline claim e.g. in the Abstract.

      Instead they did this: they invented a novel name for a metric (NPV) that already has a well-established name. Their new name is misleading because it leads the casual reader or skim reader to assume that NPV is related to ‘specificity’, which it isn’t. Whether this is ‘manipulative’ I will leave for you to judge.







About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

