In This Is Spinal Tap, British heavy metal god Nigel Tufnel says, in reference to one of his band’s less successful creations:
It’s such a fine line between stupid and…uh, clever.
This is all too true when it comes to science. You can design a breathtakingly clever experiment, using state-of-the-art methods to address a really interesting and important question. And then at the end you can realize that you forgot to type one word when writing the 1,000 lines of code that run the whole thing, and as a result, it’s all a bust.
It happens all too often. It has happened to me three times in my scientific career to date, and I know of several colleagues who have had similar problems. Right now I’m struggling to deal with the consequences of someone else’s little mistake.
Here’s one cautionary tale. I once ran an experiment involving giving people a drug or a placebo. When I crunched the numbers I found, or thought I’d found, a really interesting effect which was consistent with a lot of previous work giving this drug to animals. How cool is that?
So I set about writing it up and told my supervisor and all my colleagues. Awesome.
About two or three months later, I decided for some reason to reopen the original data file, which was in Microsoft Excel. I happened to notice something rather odd – one of the experimental subjects, who I remembered by name, was listed with a date-of-birth which seemed wrong: they weren’t nearly that old.
Slightly confused – but not worried yet – I looked at all the other names and dates of birth and, oh dear, they were all wrong. But why?
Then it dawned on me and now I was worried: the dates were all correct, but they were lined up with the wrong names. In an instant I saw the horrible possibility: mixed up names would be harmless in themselves, but what if the group assignments (1 = drug, 0 = placebo) were wrongly paired with the results? That would render the whole analysis invalid… and oh dear. They were.
As the temperature of my blood plummeted I got up and lurched over to my filing cabinet where the raw data was stored on paper. It was deceptively easy to correct the mix-up and put the data back together. I re-ran the analysis.
No drug effect.
I checked it over and over. Everything was completely watertight – now. I went home. I didn’t eat and I didn’t sleep much. The next morning I broke the news to my supervisor. Writing that e-mail was one of the hardest things I’ve ever done.
What had happened? As mentioned, I had been doing all the analysis in Excel. Excel is not a bad stats package and it’s very easy to use, but the problem is that it’s too easy: it just does whatever you tell it to do, even if this is stupid.
In my data, as in most people’s, each row was one sample (i.e. a participant) and each column was a variable. What had happened was that at some point I’d tried to take all the data, which was in no particular order, and reorder (sort) the rows alphabetically by subject name to make it easier to read.
How could I screw that up? Well, by trying to select “all the data” but actually only selecting some of the columns. I must have reordered them, but not the others, so all the rows became mixed up. And the crucial column, drug=1 placebo=0, was one of the ones I reordered.
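To make the failure concrete, here is a minimal sketch in Python rather than Excel. The toy data, the column layout and the function names are all made up for illustration. It shows the difference between sorting whole rows and sorting only some of the columns, plus a cheap check that would have caught the mix-up.

```python
# Hypothetical toy data: each tuple is one participant's complete row,
# laid out as (name, group, score) with group 1 = drug, 0 = placebo.
rows = [
    ("Smith", 1, 7.2),
    ("Adams", 0, 3.1),
    ("Jones", 1, 6.8),
    ("Baker", 0, 2.9),
]

def sort_whole_rows(rows):
    """Sort by name, moving every value in a row together."""
    return sorted(rows, key=lambda row: row[0])

def sort_some_columns(rows):
    """Reproduce the mistake: sort name and group, leave score where it was."""
    names_groups = sorted((name, group) for name, group, _ in rows)
    scores = [score for _, _, score in rows]
    return [(n, g, s) for (n, g), s in zip(names_groups, scores)]

def fingerprint(rows):
    """Order-independent view of the rows; a correct sort leaves it unchanged."""
    return sorted(rows)

good = sort_whole_rows(rows)
bad = sort_some_columns(rows)

# The safe sort keeps every participant's values together...
assert fingerprint(good) == fingerprint(rows)
# ...while the partial sort pairs groups with the wrong scores.
assert fingerprint(bad) != fingerprint(rows)
```

Comparing a fingerprint of the table before and after any reordering is a small habit, but it turns a silent scramble into a loud failure.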
The immediate lesson I learned from this was: don’t use Excel, use SPSS, which does not allow you to reorder only some columns. Actually, I still use Excel for making some figures but every time I use it, I think back to that terrible day.
The broader lesson though is that if you’re doing something which involves 100 steps, it only takes 1 mistake to render the other 99 irrelevant. This is true in all fields, but I think it’s especially bad in science, because mistakes can so easily go unnoticed due to the complexity of the data, and the consequences are severe because of the long time-scale of scientific projects.
Here’s what I’ve learned: Look at your data, every step of the way, and look at your methods, every time you use them. If you’re doing a neuroimaging study, the first thing you do after you collect the brain scans is to open them up and just look at them. Do they look sensible?
Analyze your data as you go along. Every time some new results come in, put them into your data table and just inspect them. Make a graph which just shows absolutely every number all on one massive, meaningless line, from Age to Cigarettes Smoked Per Week to EEG Alpha Frequency At Time 58. For every subject. Get to know the data. That way if something weird happens to it, you’ll know. Don’t wait until the end of the study to do the analysis. And don’t rely on just your own judgement – show your data to other experts.
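Eyeballing a graph is the real point, but a crude automated check can back it up. The sketch below is a text-only stand-in for the graph, not a replacement for looking. The variable names and the "plausible" ranges are assumptions invented for the example, and you would pick your own.

```python
# Hypothetical data table: one dict per participant, one key per variable.
table = [
    {"subject": "S01", "age": 24, "cigarettes_per_week": 0},
    {"subject": "S02", "age": 31, "cigarettes_per_week": 40},
    {"subject": "S03", "age": 230, "cigarettes_per_week": 5},  # a typo, perhaps
]

# Assumed plausible ranges (low, high); choose these for your own study.
PLAUSIBLE = {"age": (18, 90), "cigarettes_per_week": (0, 500)}

def column_ranges(table, variables):
    """Return the (min, max) actually observed for each variable."""
    return {
        v: (min(row[v] for row in table), max(row[v] for row in table))
        for v in variables
    }

def implausible_values(table, plausible):
    """Return (subject, variable, value) for every value outside its range."""
    problems = []
    for row in table:
        for variable, (low, high) in plausible.items():
            value = row[variable]
            if not low <= value <= high:
                problems.append((row["subject"], variable, value))
    return problems

ranges = column_ranges(table, PLAUSIBLE)
problems = implausible_values(table, PLAUSIBLE)
print(ranges)
print(problems)
```

Run something like this every time new results go into the table, not once at the end. A check of this kind only catches values that are impossible, though. It would not have caught my scrambled rows, where every individual number was perfectly plausible.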
Check and recheck your methods as you go along. If you’re running, say, a psychological experiment involving showing people pictures and getting them to push buttons, put yourself in the hot seat and try it on yourself. Not just once, but over and over. Some of the most insidious problems with these kinds of studies will go unnoticed if you only look at the task once – such as the old “randomized”-stimuli-that-aren’t-random issue – which has also happened to me, although it wasn’t my fault in that instance.
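I won’t go into how it happened in my case, but one common way for “random” stimuli to end up not random is re-seeding the random number generator with the same value at the start of every session, so every participant sees the identical order. The sketch below illustrates that assumed scenario in Python. The stimulus names, seeds and function names are hypothetical.

```python
import random

STIMULI = ["face", "house", "tool", "animal"]  # hypothetical stimulus set

def make_order(rng):
    """Return a shuffled copy of the stimulus list using the given generator."""
    order = STIMULI[:]
    rng.shuffle(order)
    return order

def distinct_orders(orders):
    """Count how many different presentation orders were actually generated."""
    return len({tuple(order) for order in orders})

# The bug: a fresh generator with the same seed for every session,
# so all 20 sessions get exactly the same "random" order.
buggy = [make_order(random.Random(0)) for _ in range(20)]

# The intent: one generator kept across sessions, so the orders vary.
shared_rng = random.Random(12345)
varied = [make_order(shared_rng) for _ in range(20)]

print(distinct_orders(buggy))   # exactly one order, repeated 20 times
print(distinct_orders(varied))  # more than one order
```

Sitting through the task once would never reveal this. Logging the order for every session and counting the distinct ones would.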
Trust no-one. This sounds bad, but it’s not. Don’t rely on their work, in experimental design or data analysis, until you’ve checked it yourself. This doesn’t mean you’re assuming they’re stupid, because everyone makes these mistakes. It just means you’re assuming they’re human like you.
Finally, if the worst happens and you discover a stupid mistake in your own work: admit it. It feels like the end of the world when this happens, but it’s not. However, if you don’t admit it, or even worse, start fiddling other results to cover it up – that’s misconduct, and if you get caught doing that, then it is the end of the world, or of your career, at any rate.