Are Underpowered Studies Ever Justified?

By Neuroskeptic | July 29, 2017 2:03 pm

Is a small scientific study better than none at all? A provocative piece in Frontiers in Psychology raises the question of whether we should ever do underpowered studies. The authors are Dutch researchers Rik Crutzen and Gjalt-Jorn Y. Peters.

Crutzen and Peters begin by questioning the idea that even a little evidence is always valuable. Taking the example of a study that only manages to recruit a handful of patients because it’s studying a rare disease, the authors say that:

Underpowered studies are often unable to contribute to, in fact answer, research questions… sometimes, the more virtuous decision is to decide that current means do not allow studying the research question at hand.

What about preliminary or pilot studies, which are often small? Crutzen and Peters say that small pilot studies are useful for checking that a research method works and is feasible, but that we shouldn’t rely on the data they produce, not even as a guide to how many participants we need in a future follow-up study:

An early-phase trial is not appropriate to get an accurate estimate of the effect size. This lack of accuracy affects future sample size calculations. For example, if researchers find an effect size of (Cohen’s) d = 0.50 in an early-phase trial with N = 100, then the 95% confidence interval ranges from 0.12 to 0.91.
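
To see where numbers like that come from, here is a rough sketch of the calculation (my own illustration, not the paper’s code), assuming that N = 100 means two groups of 50 and using the common normal approximation for the standard error of Cohen’s d; the paper’s interval of 0.12 to 0.91 presumably comes from a noncentral-t method, but the approximation lands in much the same place:

```python
# Approximate 95% CI for Cohen's d from a small two-group trial.
# Assumption (not stated in the quote): N = 100 means two groups of 50.
import math

d = 0.50          # observed effect size in the early-phase trial
n1 = n2 = 50      # assumed group sizes

# Normal-approximation standard error for Cohen's d
se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

lo, hi = d - 1.96 * se, d + 1.96 * se
print(f"d = {d}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")   # ~ [0.10, 0.90]
```

An interval that wide is compatible with anything from a near-trivial effect to a very large one, which is exactly why the authors warn against feeding early-phase effect sizes into later sample size calculations.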

So what are researchers to do, then? Should we only ever consider data from well-powered studies? This seems to be what Crutzen and Peters are implying. They argue that it is rarely impossible to conduct a well-powered study, although it might take more resources:

when researchers claim to study a rare population, they actually mean that the resources that they have at their disposal at that moment only allows collection of a limited sample (within a certain time frame or region). More resources often allow, for example, international coordination to collect data or collecting data over a longer time period.

They conclude that

In almost all conceivable scenarios, and certainly those where researchers aim to answer a research question, sufficient power is required (or, more accurately, sufficient data points).

Crutzen and Peters go on to say that psychology textbooks and courses promote underpowered research, thus creating a culture in which small studies are accepted. The same could be said of neuroscience. But in this post I’m going to focus on the idea that research should be ‘never underpowered.’

First off, I don’t think Crutzen and Peters really answer the question of whether a small study is better than nothing at all. They say that small studies may be unable to answer research questions, but should we expect them to? Surely, a small amount of data is still data, and might provide a partial answer. Problems certainly arise if we overinterpret small amounts of data, but we can’t blame the data for this. Rather, we should temper our interpretations, although this may be easier said than done when it comes to issues of public interest.

It’s true that in most cases (though not all), it would be possible to make studies bigger. However (unless we increased total science funding), this would mean doing fewer studies. All else being equal, we’d do (say) one study with n=1000 instead of ten different studies with n=100.
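
To make the statistical side of that trade-off concrete, here is a rough power calculation (my own illustration, not from the paper), assuming a true effect of d = 0.3, a two-sided test at alpha = 0.05, and each study splitting its n into two equal groups:

```python
# Rough power of a two-sample test for a true effect of d = 0.3,
# comparing a single n = 100 study with one n = 1000 study.
# Normal approximation, purely for illustration.
import math
from scipy.stats import norm

def approx_power(d, n_total, alpha=0.05):
    n_per_group = n_total / 2
    ncp = d * math.sqrt(n_per_group / 2)   # noncentrality of the z statistic
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

for n in (100, 1000):
    print(f"n = {n}: power is about {approx_power(0.3, n):.2f}")
# n = 100:  about 0.32  (each small study usually misses the effect)
# n = 1000: about 1.00  (the single large study almost never does)
```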

From a statistical perspective, one n=1000 study might indeed be better, but I worry about the implications for scientific innovation and progress. How would we know what to study, in our n=1000 project, if we didn’t have smaller studies to guide us?

Larger studies would also lead to the centralization of power in science. Instead of most researchers having their own grants and designing their own experiments, funding and decision-making would be concentrated in a smaller number of hands. To some extent, this is already happening, e.g. with the rise of ‘mega-projects’ in neuroscience. Many people are unhappy with this, although part of the problem may be the traditional top-down structure of leadership in science. If it were possible to organize large-scale projects more ‘democratically’, they might be more palatable, especially to junior researchers.

  • Rik Crutzen

    Thank you for discussing our piece. A question related to the implications: How do the smaller studies guide ‘what’ to study?

    • OWilson

      Despite attempts to quantify the targeting of research, “what” to study will always be a function of societal, political and cultural priorities.

      A society under threat may choose weapons research; a complacent society may choose the environment.

      The tension between the relative “merit” of these choices persists throughout society.

      While we all agree that a democratic, bottom-up approach is more productive than a top-down policy, the reality is that all available resources are often directed to the latest politically perceived threat, be it HIV, nuclear war, or “climate change”.

      • Marcel van Assen

        I strongly recommend research with sufficiently high statistical power (say > .8). And, given current publication practices, I also prefer no research over research with weak power (say ≤ .5).

        However, in an ideal world, where all research gets published and there is no p-hacking (e.g. when research is pre-registered and checked at a detailed level), underpowered research is perfectly fine. Others may then combine these data with their own data, for meta-analysis on that topic.

        Currently, underpowered studies may only be useful for checking that the experiment works as intended (do participants understand the instructions, do the programs run, etc.), but not so much for testing and estimating effects.

        • OWilson

          In the broader picture, which is my area of interest, “studies” are a notoriously nebulous hybrid animal lying somewhere between science, surveys, and traffic counts.

          “Studies”, being ill-defined, are often confused, and intentionally misused by the Mainstream Press and ever-opportunistic politicians, as actual “experiments” or even “Settled Science”. :)

  • https://suboptimum.wordpress.com/ rxs

    This is very interesting. But I don’t quite understand:

    “From a statistical perspective, one n=1000 study might indeed be better”

    Wouldn’t 10 independent N=100 studies plus a meta-analysis actually be the best? (Independence is key: it makes the result more robust to systematic error than a single study would be.)

    Should we in fact run exclusively underpowered studies?

    (likely I just don’t understand something… :))

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Thanks for the comment. You’re right that 10 n=100 studies might be better than one n=1000 study if they could be meta-analysed, because they would provide independent sources of data and hence more generalizability.

      However we can’t assume that those 10 studies would all examine the same question. If they were 10 different studies looking at different things, we couldn’t meta-analyse them.

      If we have 10 small studies looking at 10 questions, vs. one large study looking at one of those questions, it is possible that the large study is better.
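
      As a minimal sketch of what that pooling would look like (my own illustration, with made-up numbers): if each of the ten n=100 studies reported a Cohen’s d and its standard error, a fixed-effect meta-analysis would combine them by inverse-variance weighting, and the pooled precision would come out close to that of a single n=1000 study.

      ```python
      # Fixed-effect (inverse-variance) pooling of ten hypothetical n = 100 studies,
      # each reporting a Cohen's d and its standard error. Illustration only.
      import math

      # Hypothetical per-study estimates (d, se); se near 0.20 is typical for n = 100.
      studies = [(0.42, 0.20), (0.55, 0.20), (0.31, 0.21), (0.60, 0.20), (0.48, 0.20),
                 (0.25, 0.21), (0.52, 0.20), (0.39, 0.20), (0.58, 0.20), (0.45, 0.20)]

      weights = [1 / se**2 for _, se in studies]
      pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
      pooled_se = math.sqrt(1 / sum(weights))

      print(f"pooled d = {pooled_d:.2f} +/- {1.96 * pooled_se:.2f}")
      # The pooled SE (about 0.06) is close to what one n = 1000 study would give,
      # which is why the comparison hinges on all ten studies asking the same question.
      ```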

      • https://www.youtube.com/playlist?list=UUwbGJwCdp96FKSLuWpMybxQ Lee Rudolph

        However we can’t assume that those 10 studies would all examine the same question.

        Nor can we assume that all 10 of those studies will be published (or even just that their data will be made public). Hard to do a meta-analysis of a body of work half or more of which is buried in file drawers (adjust metaphor for modern times).

        A (very) partially analogous situation arises in mathematics: there is no formal mechanism for alerting the wider mathematical community to an unsuccessful attempt to prove something. Depending on personalities and various other kinds of extra-mathematical externalities, news of some failures will percolate informally into (variously small or large) sub-communities, whose members are then (if they choose) able to adjust their own research to take into account the details of the failure of the particular attempt. (Of course merely knowing that an attempt has failed, without knowing how, shouldn’t keep one from trying to succeed; but also of course, even then one may be discouraged from keeping on trying, if one’s assessment of the person who failed is sufficiently favorable … .)

  • Martin Hebart

    I’m not sure if limited research funding is an argument against larger n. I see the point that with larger n fewer studies might be carried out. But we should bear in mind that we are in many cases (not always) spending only a small fraction of our time acquiring data, but much more time designing experiments, analyzing data, writing up results, presenting results, reviewing papers, going to meetings, etc. Funding a PhD student or postdoc is in many cases much more expensive than running experiments. This means that only slight adaptations to research grants could leverage much larger n. Of course not a tenfold increase, but maybe around 30 participants instead of 16-18?

  • Shane O’Mara

    Where would that leave patient HM? Good neuropsychology has been built on single case designs. Interestingly most of applied behaviour analysis has been too.

    • Gjalt-Jorn Peters

      Well, HM wasn’t quantitative research. This was qualitative research. In qualitative research, ‘power’ is not a concept – you don’t have quantification, and therefore, cannot use inferential statistics (i.e. you can’t estimate sampling and measurement error), and therefore, ‘power’ is a nonsensical concept. You should make sure you gather sufficient data to draw conclusions, e.g. until you achieve saturation. From this perspective, given the extremely rich data gathered from HM, combined with the fact that ‘his’ dataset was considered largely ‘anecdotal’ and generated a lot of hypotheses which were subsequently tested in quantitative research, I think HM would do quite well :-)

      The problem occurs when people do small quantitative studies, but draw conclusions nonetheless, simply adding a disclaimer to the discussion (which they don’t put in the abstract, or the press release) . . . I don’t think the university where HM was originally studied published a press release, so even if it *would* have been an underpowered study, I think it would be much less damaging than many underpowered studies conducted today :-)

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    Psychology is a number-decorated scam. US bursting prisons – $80 billion/yr. Ineffectually monstrous US schools – $68 billion/year for the Department of Education alone. Humanity embraces SSRIs – $17 billion/year for pills. Looser standards, more papers! There’s a pony in there somewhere, for the world endlessly mucks psychology’s fuming stables.

    After Donald Trump, how can political science be anything at all? After oil-bursting Venezuela aflame with poverty, what is macroeconomics? Let’s psychologize North Korea. More studies are needed.

  • C’est la même

    Science cannot progress without pilot studies.

    The key is in the interpretation – we should never consider underpowered studies to be conclusive evidence.

    • Gjalt-Jorn Peters

      Well, perhaps we have different definitions of pilot studies :-)

      Many people currently use what they call ‘pilot studies’ (studies with e.g. 50 participants) to inform their power analyses. Such pilot studies are very bad practice.

      If with a pilot study, you mean a study to generate hypotheses to study in subsequent adequately powered research, then yes, I think those are necessary – but I think they can often be qualitative, rather than quantitative. An extreme example would be e.g. simply sitting on a public square and observing people. We don’t need quantification in all stages of studying something – which you need depends on your goals, I think.

      The problem with underpowered studies is that all estimates can vary erratically between samples. Combined with the desire of many researchers (and universities’ press offices) to find sensational patterns, this means that evidence from underpowered studies is ‘asymmetrically’ likely to be considered more conclusive. As in, something that seems really cool will probably be considered more conclusive than something that’s disappointing . . . Highly powered studies don’t afford this flexibility.
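
      A minimal simulation of that erratic behaviour (my own illustration, with an assumed true effect of d = 0.3): draw several small ‘pilot’ samples from a population where the true effect is fixed, and both the observed d and the follow-up sample size it implies swing wildly from pilot to pilot.

      ```python
      # Simulate the 'pilot informs the power analysis' practice.
      # Assumed true effect d = 0.3; each pilot has 25 participants per group.
      import numpy as np

      rng = np.random.default_rng(0)
      true_d, n_pilot = 0.3, 25

      for _ in range(5):
          a = rng.normal(0.0, 1.0, n_pilot)
          b = rng.normal(true_d, 1.0, n_pilot)
          d_hat = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
          # n per group for 80% power at alpha = .05, based on the pilot's d_hat
          n_needed = 2 * ((1.96 + 0.84) / d_hat) ** 2
          print(f"pilot d = {d_hat:+.2f} -> 'required' n per group = {n_needed:.0f}")
      # Each pilot implies a very different follow-up plan (sometimes even the wrong
      # direction), although the true effect never changes.
      ```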

      • C’est la même

        Thanks for your reply. It is obvious that you’re speaking based on your experience in the psychological sciences, something I have little direct experience with, whereas I was speaking more broadly, particularly about medical science, where the outcomes can be more objective (or controlled in double-blinded trials).

        A pilot study does provide some indicators towards potential effect sizes, though you are quite right that there are inherent statistical problems with this, along with an increased risk of bias due to other factors. But how common is it that power analyses are determined solely on the basis of a single pilot study, rather than considering outcomes of similar classes of studies? Has anyone reviewed this systematically?

        It’d be great if everyone could have enough funding to do highly powered studies – without compromising on breadth and depth of scientific investigation, job security of scientists etc. Politicians keep demanding ‘economic efficiency’. In science this often means low powered studies.

        • C’est la même

          In terms of decision-making from pilot studies, consider the following thought experiment:

          The true effect size is small, an underpowered pilot study fails to detect an effect, so no one ever follows up on it.

          The true effect size is small, but an underpowered pilot study reports a substantial effect size. The follow-up study is subsequently underpowered and thus fails to find an effect, so no one ever follows up on it.

          The true effect size is small, an underpowered pilot study reports a large effect size, and everyone thinks the effect size is large until several groups try to reproduce the effect with correctly powered studies and discover that the true effect is small.

          It seems to me that the primary problem is not making assumptions about effect size from pilot studies, but lack of replication and follow-ups. The greatest risk of underpowered studies is still type 2 errors. But the question is, do we really care about small effect sizes that need very large samples to replicate? It depends on the effect at hand.

          Some studies just aren’t going to replicate, I’m content with that fact. Instead of aiming for some fantasy 100% success rate, we need to realise that it is sufficient replication that is important before we start to trust an observed effect.

          • Gjalt-Jorn Peters

            Hehe – space for replies gets more and more narrow :-)

            Yes, you’re right – my experiences are largely constrained to psychology, and so my extrapolations beyond psychology become increasingly shaky (i.e. larger SE’s :-)).

            The problem, I think, is publication bias (not so much as a phenomenon in the extant literature, but as a manifestation of current journal policy). Because of this, your last thought experiment example won’t happen – the meta-analysis will conclude that there’s a medium or even large effect size, because studies finding negative or no effects won’t get published.

            Without publication bias, the problem of underpowered studies would be less severe. But one problem remains, and that touches again upon the point we make in our paper: we currently teach our students (implicitly) that you can learn how humans work (again psy, sorry!) by doing an experiment with 2×20 people.

            There’s no explanation that you need (e.g.) 8 such studies to actually have some credibility. The ‘classic studies’ (e.g. bystander effect) are just presented as evidence.

            So we don’t teach our students that they should refrain from conclusions on the basis of underpowered research. We teach them that one (underpowered) study does the trick . . .

            (note that our paper was about this – about our curricula, not about the value of underpowered research, that was just the introduction :-))

          • C’est la même

            You make an excellent point.

            Why does teaching students “how to conduct research” currently seem to exclude teaching about the methodological failings such as publication bias and lack of statistical power, issues of blinding, randomisation etc?
            (rhetorical question)

          • Gjalt-Jorn Peters

            Excellent rhetorical question. A rhetorical answer may be that, at least in psychology, due diligence re: methodology and statistics would eliminate most of what we learned as basics of psychology :-)

  • Bernard Carroll

    This discussion is a sign of our times. It reflects the fact that so many contemporary studies deal with surrogate variables that aim to identify a minor percentage of the variance in multi-determined outcomes. Compare these with classical physiology. When Nobelist Otto Loewi in 1921 recognized the action of Vagusstoff in slowing the heart rate after vagus nerve stimulation, he didn’t need statistics or power analyses. Likewise, when Eccles, Fatt and Koketsu identified the action of ACh on Renshaw cells in 1954, they didn’t need statistics. There is no P value in the entire paper. See http://tinyurl.com/y8njmqr2. They simply described the phenomena, and this paper clinched the Nobel prize for Eccles for confirming the chemical theory of neurotransmission. The take-home message is that a great deal of contemporary research is indirect and derivative, rather than close to the essential action. Nowhere is this more evident than in clinical trials.

  • Gjalt-Jorn Peters

    Dear Neuroskeptic,

    Indeed, as Rik indicated, thank you for discussing this opinion piece! I share Rik’s curiosity as to how small studies may inform research agendas. It is my understanding that for those small studies, all sample estimates are drawn from very wide sampling distributions, and therefore can have wildly different values in the next small sample. This means that basing decisions on small studies seems unwise – after all, your decision is mostly based on measurement/sampling error, and barely on patterns in the population. Unless, of course, you meta-analyse many such small studies, in which case you’re usually better off with one large study.

    In addition, I’m really interested to know your point of view regarding the main point of the opinion piece: that it would be good to stop instilling the norm that psychology is done with studies with a few dozen datapoints each.

    In any case, I vehemently agree that science suffers from a number of dysfunctional infrastructures – e.g. the top-down organisation found in many places, the largely dysfunctional grant systems that waste unjustifiably high amounts of public funds, and of course the publication system. But all over the world, people seem to be starting to realise this, so like winter, change seems to be coming :-)

  • polistra24

    I’d turn the requirements around.

    If a hypothesis needs large numbers of data points to firm up the statistics, it’s not much of a hypothesis. It’s always going to be a dubious and approximate answer that doesn’t solve anything except the researcher’s need for a large budget.

    When a problem can be solved, a series of PURPOSEFUL experiments or trials will solve it. Push in several directions, see which direction leads toward solution. Action, not stats.

    • john

      Are you going to do that in clinical studies of drugs intended to help the sufferers of rare medical conditions? Some diseases being looked at by drug companies can have as few as a hundred or so cases alive at any one time.

  • john

    Most of the issues with small sample size (SSS) studies can be alleviated when enough of them have been completed to allow meta-studies to amalgamate the data and provide much higher confidence. However, there is a high probability that small differences between the studies will mean that multiple independent variables are introduced.

    The problems with psychological research are deep and broad. For example almost every psychological research effort conducted in the US draws most of its participants from the population of undergraduate psychology students.

    Statistics offers a variety of tools, such as linear regression, to make up for some of the faults in SSS studies.

  • reasonsformoving

    “How would we know what to study, in our n=1000 project, if we didn’t have smaller studies to guide us?”

    What am I missing here? If a smaller study is underpowered then what confidence can you have that it reveals what to study in the first place?

  • ttaerum

    As fortune would have it, science is much more than a power test. One obvious example is Einstein’s thought experiments (and this article is also a thought experiment).
    The notion that we might do experiments having low power because we are dealing with patients with a very rare disease confounds itself with the question of whether there is sufficient power to determine that this minuscule group of patients actually has that rare disease. If you can show conclusively at the molecular level that all your patients have this rare disease, and you can show at the molecular level that your CRISPR insert worked for 80% of the total worldwide population of 5 patients with this disease, then power is, in every respect, moot. Fortunately, this is only a thought experiment and we all wonder where the funding came from.
