In a previous post, I said that I’d write about how to improve the quality of scientific research by ending the scrabbling for “positive results” at the cost of accuracy. So here we go. This is a long post, so if you’d prefer the short version, the answer is that we ought to get scientists in many fields to pre-register their research – to go on record and declare what they are looking for before they start looking for anything.
This is not my idea. Clinical trial registration is finally becoming a reality. Several organizations now offer registration services – such as Current Controlled Trials. Their site is well worth a click, if only to see the future of medical science unfolding before your eyes in the form of a list of recently registered protocols. Each of these protocols, remember, will eventually become a published scientific paper. If it doesn’t, everyone will know that either the trial was never finished, or worse, it was finished and the results were never published. Without registration, a trial could be run and never published without anyone knowing what had happened – making it very easy for “inconvenient” data to never see the light of day. This is publication bias. We know it happens. Trial registration makes it all but impossible. It’s important.
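To see why unpublished negatives matter so much, it helps to run the numbers. Here is a hypothetical sketch (the simulation, its function names, and its parameters are mine, not anything from a real registry): simulate many trials of a treatment that truly does nothing, and imagine that only the trials which come out positive and statistically significant ever get published.

```python
import random
import statistics

def run_trial(n=50, true_effect=0.0, seed=None):
    """One two-arm trial of a treatment with no real effect.

    Returns the estimated effect and whether it reached p < 0.05
    (two-sided z-test, normal approximation).
    """
    rng = random.Random(seed)
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return diff, abs(diff / se) > 1.96

results = [run_trial(seed=i) for i in range(2000)]
# Publication bias: only "positive" significant trials see the light of day.
published = [d for d, sig in results if sig and d > 0]

print(len(published) / len(results))  # only a few percent of trials get published
print(statistics.mean(published))     # but they show a healthy-looking "effect"
```

The treatment does nothing, yet the published literature reports a respectable effect size, because the ninety-odd percent of trials that found nothing simply vanished. With registration, those missing trials would at least be visible as registered-but-unpublished protocols, so a reviewer could spot the selection at work.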
In fact, if someone were designing the system of clinical trials from scratch, they would, almost certainly, make registration an integral step right from the start. Unfortunately, no-one intelligently designed clinical trials. They evolved, and they’re still evolving. We’re not there yet. Trial registration is still a “good idea” rather than a routine part of clinical research, and while many first-class medical journals now require pre-registration and refuse to publish unregistered trials, plenty of other respectable publications have yet to catch up.
What I want to point out is that it’s not just clinical trials that would benefit from registration. Registration is a way to defeat publication bias, wherever it occurs, and any field in which there are “negative results” is vulnerable to the risk that they won’t be reported. In some parts of science there are no negative results – in much of physics, chemistry, and molecular biology, you either get a result or you’ve failed. If you try to work out the structure of a protein, say, then you’ll either come up with a structure or give up. Of course, you might come out with the wrong structure if you mess up, but you could never “find nothing”. All proteins have a structure, so there must be one to find.
But in many other areas of research there is often genuinely nothing to find. A gene might not be linked to any diseases. A treatment might have no effect. A pollutant might not cause any harm. Basically, if you’re looking for a correlation between two things, or an effect of one thing upon another, you might get a negative result. Just off the top of my head, this covers almost all genetic association and linkage studies, almost all neuroimaging, most experimental psychology, much of climate science, epidemiology, sociology, criminology, and probably others I don’t know about. Oh, and clinical trials, but we already knew that. People don’t tend to publish negative results, for various reasons. Wherever this is a problem, trial registration would be useful.
Publication bias is known to be a problem in behavioural genetics (finding genes associated with psychological traits). For example, Munafò et al. (2007) found pretty strong evidence of publication bias in research on whether a certain allele (DRD2 Taq1A) predisposes to alcoholism. They concluded by saying that
Publication of nonsignificant results in the psychiatric genetics literature is important to protect against the existence of a biased corpus of data in the public domain.
Which is true, but saying it won’t change anything, because everyone already knew this. No-one likes publication bias, but it happens anyway – so we need a system to prevent it. Curiously, however, registration is rarely mentioned as an option. Salanti et al. (2005) wrote at length about the pitfalls of genetic association studies, but did not mention it. Colhoun et al. (2003), in a widely cited paper in the Lancet, explained how publication bias was a major problem but then flat-out dismissed registration, saying that
an effective mechanism for establishment of prospective registers of proposed analyses is not feasible.
They didn’t say why it wasn’t feasible, and if registration works for clinical trials, I can see very little reason why it shouldn’t work for other research. Indeed another similar paper in the same journal raised the idea of “prestudy registration of intent”. Clearly it deserves serious thought.
Registration would also help combat “outcome reporting bias”, or as it’s known in the trade, data dredging. Any set of results can be looked at in a number of ways, and some of these ways will lead to different conclusions from others. Let’s say that you want to find out whether a certain gene is associated with obesity. You might start by taking a thousand men and seeing whether the gene correlates with body weight. Let’s say it doesn’t, which is really annoying, because you were hoping that you could spend the next five years getting paid to find out more about this gene. Well, you still could! You could check whether the gene is associated with Body Mass Index (weight in proportion to height). If that doesn’t work, try percentage of body fat. Still nothing? Try eating habits. Eureka! Just by chance, you’ve found a correlation. Now you report that, and don’t mention all the other things you tried first. You get a paper, “Gene XYZ123 influences eating behaviour in males”, and a new grant to follow up on it. Sorted. Lynn McTaggart would be proud.
This kind of thing happens all the time, although that’s an extreme example. The motives are not always selfish – most scientists genuinely want to find positive results about their “pet” genes, or drugs, or whatever. It is all too easy to dredge data without being aware of it. Registration would put an end to most of this nonsense, because when you register your research – before the results are in – you would have to publicly outline what statistical tests you are planning to do. Essentially, you would need to write the Methods section of your paper before you collected any results.
If you were feeling particularly puritan, you could make people register the Introduction in advance too. Nominally, this is a statement of why you did the research, how it fits into the existing literature, what hypothesis you were testing and what you expected to find. In fact, it’s generally a retrospective justification for getting the results you did, along with a confident “prediction” that you were going to find … exactly what you found. This is not as serious a problem as publication bias, because everyone knows that it happens and so no-one (except undergraduates) takes Introductions seriously. But writing Introductions that no-one can read with a straight face (“Oh sure, they really predicted that ahead of time”; “Ha, sure they didn’t just decide to do that post-hoc and then PubMed a reference to justify it”) is silly. Registration would be a way of getting everyone to put their toys away and get serious.