Using Science to Sniff out Science That’s Too Good to Be True

By Neuroskeptic | September 19, 2012 9:15 am

Neuroskeptic is a neuroscientist who takes a skeptical look at his own field and beyond at the Neuroskeptic blog.

Fraud is one of the most serious concerns in science today. Every case of fraud undermines confidence amongst researchers and the public, threatens the careers of collaborators and students of the fraudster (who are usually entirely innocent), and can represent millions of dollars in wasted funds. And although it remains rare, there is concern that the problem may be getting worse.

But now some scientists are fighting back against fraud—using the methods of science itself. The basic idea is very simple. Real data collected by scientists in experiments and observations is noisy; there’s always random variation and measurement error, whether what’s being measured is the response of a cell to a particular gene, or the death rate in cancer patients on a new drug.

When fraudsters decide to make up data, or to modify real data in a fraudulent way, they often create data which is just “too good”—with less variation than would be seen in reality. Using statistical methods, a number of researchers have successfully caught data fabrication by detecting data which is less random than real results.

Most recently, Uri Simonsohn applied this approach to his own field, social psychology. He has two “hits” to his name, and more may be on the way.

Simonsohn used a number of statistical methods but in essence they were all based on spotting too-good-to-be-true data. In the case of the Belgian marketing psychologist Dirk Smeesters, Simonsohn noticed that the results of one experiment conducted by Smeesters were suspiciously “good”: They matched with his predictions almost perfectly.

Using a technique called Monte Carlo simulation—widely used in economics, neuroscience and many other fields—he showed that the chance of this really happening was extremely low. Even if Smeesters’ theory were correct, the data should have contained some noise, meaning that the results would only be approximately, not exactly, as predicted. What’s more, Smeesters’ data were much neater than similar results published by other researchers using the same methods.
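The logic of such a Monte Carlo check can be sketched in a few lines of code. Every number below (the predicted group means, the sample size, the noise level, and the tolerance that counts as an "almost perfect" match) is invented for illustration and not taken from the Smeesters case; the point is only that honest, noisy data rarely lands exactly on a theory's predictions:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

PREDICTED = [2.0, 3.0, 4.0]   # assumed theoretical group means
SD = 1.0                      # assumed within-group standard deviation
N_PER_GROUP = 15              # assumed participants per group
TOLERANCE = 0.05              # how close counts as "almost perfect"
N_SIMS = 20_000

hits = 0
for _ in range(N_SIMS):
    # Simulate one honest experiment: draw noisy scores for each group
    means = [
        statistics.fmean(random.gauss(mu, SD) for _ in range(N_PER_GROUP))
        for mu in PREDICTED
    ]
    # Did every observed mean land within TOLERANCE of its prediction?
    if all(abs(m - mu) <= TOLERANCE for m, mu in zip(means, PREDICTED)):
        hits += 1

rate = hits / N_SIMS
print(f"Chance of an 'almost perfect' match under honest noise: {rate:.4f}")
```

Under these made-up assumptions the simulation finds a match rate well under one percent; a result set that hits the predictions dead-on is, in effect, claiming to have won this lottery.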

After Simonsohn confronted him with this evidence, Smeesters at first claimed that he’d made an honest mistake, but he was eventually found guilty of fraud by a university committee, and several of his papers have since been retracted—stricken from the scientific record. Using similar methods, Simonsohn uncovered evidence of fraud in a second researcher, Lawrence Sanna, who worked in the same field of social psychology, but whose fraud was entirely unrelated to that of Smeesters. The full details of Simonsohn’s investigations are available here.

These cases have attracted a great deal of attention. They made Simonsohn famous as the “data detective” or the “scientific sleuth,” and they also created the perception of a crisis in social psychology. Researchers are now debating just how big the problem of fraud is and how best to fight it.

Simonsohn wasn’t the first to use statistics to spot too-good-to-be-true data. Several months previously, in the field of anaesthesiology, the massive fraud of Dr Yoshitaka Fujii of Toho University in Tokyo, Japan, was uncovered by similar methods. Over the course of his career, Fujii had published hundreds of papers, many of them about the drug granisetron, used to prevent nausea in patients after surgery.

British researcher John Carlisle took 168 of Fujii’s clinical trials and observed that several key variables, such as numbers of side effects, were exactly the same in many of these trials. Assuming the data were real, you’d expect variation in these numbers, just by chance. Carlisle found extremely strong evidence that Fujii’s results were too consistent to be real.

And Carlisle’s was not the first statistical report alleging that Fujii’s data were too good to be true. Amazingly, exactly the same charges had been made twelve years previously, back in 2000, when a group of researchers wrote in a public Letter to the Editor of a scientific journal that the data from 47 of Fujii’s granisetron trials were “incredibly nice.” These researchers noted that the number of patients reporting the side-effect of headache was exactly the same in over a dozen of his papers. They calculated the chances of this happening, assuming the data were real, as less than 1 in 100 million. We now know that, indeed, they weren’t real: Fujii had made them up. But for various reasons, the 2000 allegations didn’t lead to any serious investigation into Fujii’s work, leaving him free to fake dozens more trials.
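The arithmetic behind a calculation like this is straightforward. As a hedged sketch (the patient count and side-effect rate below are assumptions chosen for illustration, not Fujii's actual numbers), one can model each trial's headache count as a binomial draw and ask how likely it is that a dozen independent trials all report the identical count:

```python
from math import comb

N = 30          # assumed patients per trial
P = 0.2         # assumed true headache rate
N_TRIALS = 12   # "over a dozen" papers with identical counts

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(all trials share one count) = sum over k of P(count = k) ** N_TRIALS
p_all_same = sum(binom_pmf(k, N, P) ** N_TRIALS for k in range(N + 1))
print(f"Chance all {N_TRIALS} trials report an identical count: {p_all_same:.2e}")
```

Even under these generous toy assumptions the probability is vanishingly small, far below one in ten million, which is why identical counts across many papers are such a strong red flag.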

As useful as these methods are, it’s important to remember that they can only suggest fraud, not prove it. For example, too-good-to-be-true data might be the result of an honest error, rather than intentional manipulation.

In the cases described here, further inquiries revealed other hard evidence of foul play and, eventually, admissions in most cases. However, if the too-good-to-be-true approach comes into wider use, it’s likely that eventually someone will deny the allegations and stand by their results. What will happen then remains to be seen, and it might lead to an ugly controversy.

On the other hand, the case of Fujii shows that these methods have a vital role to play in keeping science clean. If the first warnings about Fujii’s “nice” data had been taken more seriously back in 2000, huge amounts of time, money and effort would not have been wasted.

Fraud can happen in any field of science, in any institution, in any country. We need all the tools we can get in fighting back against it.


  • Brian Too

    A useful tool, not the answer.

    Of course the more organized, crafty and devious fraudsters will eventually use Monte Carlo simulations to generate fake data. Assuming they have not done so already!

  • Marty

    Something that I saw repeatedly in my career was poor quality data being presented as hard numbers. It isn’t quite fraud but it is close.

    In one case I remember data on hundreds of explosives being timed by hand with a stopwatch and then being presented to an accuracy of a hundredth of a second. Now I suspect that you might be able to time the start of hundreds of explosions outdoors by hand to an accuracy of a hundredth of a second if you had come to earth as a child in a rocket from the planet Krypton. But I doubt a mere mortal could do it repeatedly. Bad quality data. But they got away with it because almost no one questioned how the data was timed and because they argued that the data might have problems but gee whiz they had so many data points.

    I suspect the land based temperature measurements being used by our good friends in the well respected field of climatology are mostly garbage data. I’m not an expert in the area. But if you look at the pictures of some of the temperature measurement sites, they don’t look too kosher. And the secrecy surrounding the computer codes that manipulate the data is a red flag.

    I suspect we only seeing a tiny bit of the fraus that happens in science.

  • Marty

    The last sentence of my comment has two careless typos. It should read: I suspect we are only seeing a tiny bit of the fraud that happens in science.

    One of these days I’ll learn to type.

  • blindboy

    Does anyone know if it is true that Mendel’s results fail this kind of analysis?

  • Jay

    The statistically careful conclusion is: if these data were generated/collected again many times, using the exact same methods, then we expect p% of the results to have this or a more extreme result, IF and only IF the underlying assumptions of ‘randomness’ that are carried in the Monte Carlo model are true in the actual data. When the consequences of a conclusion involve ending careers & public dismissal of a researcher (plus marginally related people), the statistic should not be used as the _only_ criterion. Hence, I fully approve of the way it was used in these detailed cases. What we need to watch out for is the folks who do the ‘easier’ work of running the numbers, then pillory the researcher.

  • Micki, MS

    Back when I was doing research, an MD who I will call “Smith” came to me and was just sick that he had discovered his partner who I will call “Jones” was faking data. Jones had 10-20 rats in the animal facility and was supposed to come in every day to record body weights and other data. The data sheets were kept right on the cages where anyone could see them. Saturday there was no data for Saturday, Sunday there was no data for either Saturday or Sunday, but Monday data miraculously appeared. Next weekend, same thing. Also the animals’ recorded weights were all obviously higher than the observed weights. I had been doing rat research for about 10 years and could look at a rat and judge weights within 20 grams. Smith and I weighed a few to confirm our suspicions. Then Smith went to the office he shared with Jones and looked on Jones’s desk. One data sheet was filled out with data for dates about 3 days into the future. There was no rat with that ID number in the animal facility, so we suspected it had died. We told the two heads of departments overseeing the research.

    I suspect Jones was quietly asked to leave. MD heads of departments don’t discuss that sort of thing with mere MS employees. Within months he was working at another prestigious research hospital, but the rumor mill (which I had nothing to do with) was passing the fraud information around.

    I don’t know if his fraud would have showed up in a single statistical analysis, if he was consistently adding, say 40 grams, to each animal’s weight. The data distribution would still be the same. One would need to compare his results to others in the field to see that he was getting much better results. Now the faked longer survival rates might raise a red flag. But the data could just mean he was a much better surgeon than the other researchers. Had Dr. Smith not by chance gotten curious and looked at his partner’s rats, a paper might have been published that would have had Dr. Jones looking like a genius.

    One observation directly about the blog essay: They are using methods other than just testing randomness of data. Confronting the researcher before you make the information public is an ethical requirement. As another reader pointed out, on rare occasions, the data will show little variation and still be legitimate data.

    Another problem in research is that MDs and PhDs sometimes think that they can just go to a computer and do a statistical test. If it comes out with a good p value, they think they have a clinically significant finding and publish it. You at least need to plot individual data points to see the pattern and consult a statistician to evaluate your data. One PhD I worked with thought a t test was all he needed and got a grant to continue his research. The people giving grants need to check raw data, not just p values, before giving out money. Upon in-depth analysis of his data, the main contribution to the statistical difference was a small number of outliers, with a very tiny trend for the two groups to differ. The paper was never published, but could have been published as a “no the theory didn’t pan out” paper. Researchers don’t like to publish papers that show a failed hypothesis and many journals don’t either.

    In my 15 years of research, those are the only two cases I knew about at my institutions. One was fraud and the other probably ignorance maybe combined with ego. I say “ego” as some researchers may think they knew enough that they didn’t need a statistician.

    There is a lot of pressure to publish or perish. Sometimes the bad data are fraudulent and sometimes the researcher just doesn’t know statistics well enough. So when you read a paper and no statistician is listed as author, or thanked in the acknowledgements, have a healthy skepticism about it. If you are on a committee awarding grants, insist on seeing the report of a PhD statistician.

    Presenting preliminary data at meetings may not always help reveal mistakes, although an MD I once worked for did point out a major flaw in someone’s research at a meeting and the researcher admitted he didn’t know that his method was seriously flawed. My boss said he could hear groans all over the room, so the researcher was easily convinced.

    I saw a graph by a well-respected scientist at an international meeting. Since part of my job was to consult with our institution’s PhD statistician about our department’s data, I could see that there was one data point that pushed this researcher’s data to statistical significance. He had 5 data points to calculate an R value. Four formed an almost perfect square and the fifth was far out, but in line with two points, so that the R looked great. To those of you who don’t have a background in correlations and R values, data that forms a square means you have no correlation at all.

    One grad student had the courage to politely point that out. He was shot down. This was a prominent researcher presenting his pet theory and no one else in a very large auditorium had the courage to support the grad student. No one at the meeting knew me, so I figured if I spoke, it would be a waste of time.

    Even respected scientists can want so much for their theory to be true that they won’t see the data doesn’t support it. He was doing research way out of his field, but it was in my field. To be fair, I expected that if he got enough data he would eventually support his theory, as similar research had already done. I think he was on the right track, but he claimed he had made it all the way to Central Station, while he was actually still in the boondocks.

    So there are many ways to get bad data, but in my opinion, the biggest “sin” is to be human and want so badly to have good data, that you don’t evaluate your own work sufficiently.


  • Neuroskeptic

    Micki, MS: Thanks for the great comment. You’re right of course that a statistical approach won’t be able to catch all fraud and sometimes it takes an observant person finding ‘the smoking gun’ to do it.

    However I think we need all the tools we can get, to fight fraud. Stats, whistle-blowers, data sleuths and just plain nosy people. All have a role to play.

    Then there’s the issue of misleading science that’s not fraud. There is a lot of that and I think it’s more harmful than fraud in the aggregate. But it’s a separate issue.

  • Frances deTriusce

    As the proprietor behind the site, I’d like to highlight it and encourage anyone who is doubtful about the endemic nature of fraud in the life sciences to take a look. Shameless plug, yes, but the site is 100% ad free, paid for out of my own pocket as a service to the scientific community.

