Flexible Measures Are A Problem For Science

By Neuroskeptic | April 5, 2016 7:31 am

A fascinating new site called FlexibleMeasures.com reveals the enormous variety of ways that psychologists have devised to analyse the data from the same experimental task.


The competitive reaction time task (CRTT) is widely used as a research tool to probe aggression. Participants in the task are given the chance to lash out at ‘opponents’ by subjecting them to annoying blasts of loud noise.

Using the noise is interpreted as aggressive behaviour – but how exactly should this be quantified? German psychologist Malte Elson, of Ruhr University Bochum, created FlexibleMeasures.com to explore the many ways this question has been answered.

Some researchers define aggression as the average volume of the noise inflicted – louder noise is more aggressive. Others look at the duration of the noise, and still others consider the volume multiplied by the duration. And there are many more specific methodological choices on top of these.
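
To make the difference concrete, here is a minimal Python sketch of the three scoring rules just described. The two participants, their trial values, the function names and the scales (volume on a 1-10 setting, duration in seconds) are invented for illustration and do not come from any CRTT study.

```python
from statistics import mean

# Hypothetical participants: each trial is a (volume, duration_in_seconds) pair.
loud_but_brief = [(9, 0.5), (7, 0.5), (8, 0.5)]
quiet_but_long = [(4, 2.0), (3, 2.5), (5, 1.5)]

def mean_volume(trials):
    """Aggression scored as the average volume chosen."""
    return mean(v for v, _ in trials)

def mean_duration(trials):
    """Aggression scored as the average duration chosen."""
    return mean(d for _, d in trials)

def mean_volume_x_duration(trials):
    """Aggression scored as the average of per-trial volume times duration."""
    return mean(v * d for v, d in trials)

for name, trials in [("loud but brief", loud_but_brief),
                     ("quiet but long", quiet_but_long)]:
    print(name,
          mean_volume(trials),
          mean_duration(trials),
          round(mean_volume_x_duration(trials), 2))
```

With these made-up numbers, the first participant looks more aggressive by volume, while the second looks more aggressive by duration and by volume times duration. Which person counts as "more aggressive" depends entirely on the scoring rule chosen.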

All told, FlexibleMeasures.com lists no fewer than 147 published strategies for analysing CRTT data. This is a lot, especially bearing in mind that there are only 120 published papers on the CRTT in the Flexible Measures database! There are more approaches than papers.

For instance, one strategy is called ‘Volume x Duration, multiplied averages of all trials (25)’. This approach has only been used in a single paper. However, another paper by the same authors used a different approach, called ‘Volume + Duration (sum), average of all trials (25), standardized’.

On FlexibleMeasures.com, all these papers, strategies and authors can be visualized, giving rise to graphics that show, for example, the diversity of measures in papers coming from one particular research group.


Why is this diversity a problem? Because it creates the scope for p-hacking, for trying different techniques on the same data until the results come out the way the researchers want. The sheer number of approaches raises the possibility that different researchers – or indeed the same researchers at different times – have resorted to creating new analytic approaches because they didn’t like the results the existing ones gave.
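
The inflation can be illustrated with a rough simulation. The Python sketch below is a toy model, not an analysis of any real CRTT dataset: the group size, the uniform 1-10 settings, the four scoring rules and the cutoff of |t| > 2.0 (an approximation to a two-sided 5% test) are all assumptions chosen for illustration. Both groups are drawn from the same distribution, so every "significant" difference is a false positive.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Candidate scoring rules: each maps a list of (volume, duration) trials to one score.
MEASURES = {
    "mean volume": lambda t: mean([v for v, _ in t]),
    "mean duration": lambda t: mean([d for _, d in t]),
    "mean volume x duration": lambda t: mean([v * d for v, d in t]),
    "mean volume + duration": lambda t: mean([v + d for v, d in t]),
}

def simulate_participant(rng, n_trials=25):
    # Same distribution for everyone, so there is no true group difference.
    return [(rng.uniform(1, 10), rng.uniform(1, 10)) for _ in range(n_trials)]

def false_positive_rates(n_sims=500, n_per_group=30, seed=0, crit=2.0):
    """Return (rate for one pre-chosen measure, rate when any measure may be reported)."""
    rng = random.Random(seed)
    hits_single = hits_any = 0
    for _ in range(n_sims):
        g1 = [simulate_participant(rng) for _ in range(n_per_group)]
        g2 = [simulate_participant(rng) for _ in range(n_per_group)]
        significant = [
            abs(welch_t([f(p) for p in g1], [f(p) for p in g2])) > crit
            for f in MEASURES.values()
        ]
        hits_single += significant[0]   # committed to "mean volume" in advance
        hits_any += any(significant)    # free to pick whichever measure "worked"
    return hits_single / n_sims, hits_any / n_sims

single_rate, any_rate = false_positive_rates()
print(f"pre-specified measure: {single_rate:.3f}")
print(f"best of {len(MEASURES)} measures: {any_rate:.3f}")
```

By construction, the second rate can never be lower than the first, because the flexible analyst counts every hit the pre-specified one does plus any others. How much higher it comes out depends on the assumptions above, and with 147 strategies to choose from instead of four, the room for inflation is correspondingly larger.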

This problem is not limited to this task. While the CRTT is currently the only task on FlexibleMeasures.com, the site and its database are set up to collate data on additional paradigms too. Malte writes that “hopefully the database will grow – collaborations are welcome!” So if you know of another worryingly flexible paradigm, get in touch with him.

I asked Malte what inspired him to create the site:

I did my PhD on methodological inadequacies in research on effects of violent media on aggression, an area where the CRTT is particularly popular.

Thus, I was already quite familiar with the flexible practices associated with this particular test in the aggression literature, and I thought a simple visualization of this flexibility might be helpful to aggression researchers, and also to reviewers and authors in related domains.

The site shows that flexibility appears to be the norm, and not the exception, in laboratory research on aggression that relies on this test. Aggression researchers need to change their ways if they want to provide credible answers to societal questions of great relevance…

My hope is that the example of the CRTT inspires other researchers to reach out to me and use FlexibleMeasures.com’s infrastructure to systemize issues related to methodological flexibility, flexibility in measurement, and diverse methods of computation in their respective area.

  • RogerSweeny

    Another fine and necessary post. If in ten years, science is better than it is now, some credit will belong to this site.

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    If you are not looking at anything quantifiable, you can honestly arrive at any conclusion your grant funder desires, hence the Federal Reserve.

  • Matthew Slyfield

    Step #1 in fixing the problem: Recognize that psychology, sociology and the like are not sciences.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      I disagree. Whether we call them science or something else, those fields of study will need some methods. So long as they have methods there will be multiple methods and the problem of ‘flexible measures’ will arise…

      • Matthew Slyfield

        That was just step one. The main purpose of that step is to keep ‘flexible measures’ from infecting the real sciences.

  • kamiya

    flexible measures is not scientific

  • matus

    Ok, so what’s the best/correct way to analyse CRTT data?

    • Malte Elson

      Nobody knows because none of the quantification strategies have been validated as a predictor of aggression.

      • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

        Now that would be an important study to run. Do a validation study testing the CRTT against a real world measure of aggression and see which analysis method is best. Someone should fund that!

        • Malte Elson

          Certainly. But real-world measures of aggressive behavior are hard to come by for ethical/legal reasons. Some studies report correlations of self-reported trait aggression scores with CRTT quantifications, and with no surprise, the range is quite big (but that again might be due to the CRTT quantification problem)

