Statistics: When Confounding Variables Are Out of Control

By Neuroskeptic | April 2, 2016 10:38 am

Does ice cream cause drownings? Let’s think about this statistically. Consider that, in any given city, daily sales of ice cream are, most likely, positively correlated with daily rates of drownings.


Now, no matter how strong this correlation is, it doesn’t really mean that ice cream is dangerous. Rather, the association exists because of a confounding variable, or ‘confound’. In this case it’s temperature: on sunny days, people tend to eat more ice cream, and they also tend to go swimming more often, thus risking drowning. The ice cream/drownings correlation would cease to exist once you take temperature into account. This means that ice cream has no ‘incremental validity’ over temperature: it doesn’t add anything to our ability to predict drowning, above what we can predict from temperature.

Controlling for confounds is a widely-used technique in science. However, according to researchers Jake Westfall and Tal Yarkoni, there’s a major pitfall associated with the method. In a new paper, they warn that Statistically Controlling for Confounding Constructs Is Harder than You Think.

Their argument is a simple one but the implications are rather serious: if you want to control for a certain variable, let’s call it C, your ability to successfully correct for it is limited by the reliability of your measure of C. Let’s call your experimental measure (or construct) Cm. If you find that a certain correlation holds true even controlling for Cm, this might be because C is not really a confound, but it might also be that Cm is a poor measure of C. “Controlling for Cm” is not the same as “Controlling for C”.

Staying with the ice cream/drowning example, suppose that we had a broken thermometer, so that our measure of temperature was noisy. Ice cream sales might well predict drownings even after controlling for our flawed ‘temperature’ variable. We might then conclude that ice cream and drownings have some deep connection beyond temperature.
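This failure mode is easy to demonstrate with simulated data. Here is a toy sketch (not the authors’ code; all the numbers are invented) in which temperature drives both ice cream sales and drownings, and we compare controlling for the true confound C against controlling for a noisy measure Cm:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# True confound C: daily temperature drives both variables.
temp = rng.normal(25, 5, n)
ice_cream = 2.0 * temp + rng.normal(0, 5, n)   # daily sales
drownings = 0.5 * temp + rng.normal(0, 5, n)   # daily incidents

# Cm: a noisy reading from the "broken thermometer".
temp_noisy = temp + rng.normal(0, 5, n)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(ice_cream, drownings)[0, 1])        # raw correlation, ~0.40
print(partial_corr(ice_cream, drownings, temp))        # ~0.00: controlling for C works
print(partial_corr(ice_cream, drownings, temp_noisy))  # ~0.27: controlling for Cm does not
```

Even though ice cream has no effect on drownings in this simulation, a sizeable correlation survives after “controlling for” the noisy thermometer, exactly the spurious incremental validity the paper warns about.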

Here’s Westfall and Yarkoni’s illustration of the problem. On the left we see the original ice cream/drowning correlation; on the right, the zero correlation after correcting for C, temperature. In the middle we see that the correlation remains (albeit smaller) with some hypothetical imperfect measure of temperature, Cm.

Based on various analyses of real and generated data, Westfall and Yarkoni conclude that many scientists have been controlling for Cm and wrongly interpreting this as ‘controlling for C’, thus wrongly concluding that incremental validity has been shown.

Literally hundreds of thousands of studies spanning numerous fields of science have historically relied on measurement-level incremental validity arguments to support strong conclusions about the relationships between theoretical constructs. The present findings inform and contribute to this literature – and to the general practice of “controlling for” potential confounds using multiple regression – in a number of ways.

First, we show that the traditional approach of using multiple regression to support incremental validity claims is associated with extremely high false positive rates under realistic parameter regimes. Researchers relying on such arguments will thus often conclude that one construct contributes incrementally to an outcome, or that two constructs are theoretically distinct, even when no such conclusion is warranted…

Taken as a whole, our results demonstrate that drawing construct-level inferences about incremental validity is considerably more difficult than most researchers recognize. We do not think it is alarmist to suggest that many, and perhaps most, incremental validity claims put forward in the social sciences to date have not been adequately supported by empirical evidence, and run a high risk of spuriousness.

The authors note that they were not the first to discuss the issue. In the past, this problem has been known as ‘residual confounding’, amongst other names.

So what’s the solution? Westfall and Yarkoni say that the answer is structural equation modelling (SEM), ideally drawing on multiple different measures (or indicators) to estimate the confounding variable better. However, even if only one confound measure is available, “researchers can use an SEM approach to estimate what level of reliability must be assumed in order to support the validity of one’s inferences.”
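The spirit of that sensitivity analysis can be sketched even without SEM software, using the classical attenuation correction: disattenuate the observed correlations with Cm by an assumed reliability, then recompute the partial correlation. This is a simplified illustration of the idea, not the authors’ procedure, and the input correlations are invented to match the ice cream example:

```python
import numpy as np

def partial_corr_latent(r_xy, r_xm, r_ym, reliability):
    """Partial correlation of x and y controlling for the latent confound C,
    given their correlations with the observed measure Cm and an assumed
    reliability of Cm (classical attenuation: r_xC = r_xCm / sqrt(rel))."""
    r_xc = r_xm / np.sqrt(reliability)
    r_yc = r_ym / np.sqrt(reliability)
    return (r_xy - r_xc * r_yc) / np.sqrt((1 - r_xc**2) * (1 - r_yc**2))

# Invented observed correlations: x = ice cream sales, y = drownings,
# m = the (noisy) temperature measurement.
r_xy, r_xm, r_ym = 0.40, 0.63, 0.32

for rel in (1.0, 0.8, 0.6, 0.5):
    print(f"assumed reliability {rel}: "
          f"partial r = {partial_corr_latent(r_xy, r_xm, r_ym, rel):+.3f}")
```

With these numbers, assuming a perfectly reliable thermometer leaves a partial correlation of about 0.27, but assuming a reliability of 0.5 drives it to essentially zero: the inference about ice cream hinges entirely on what reliability you are willing to assume for the confound measure.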

The point is that when a scientific argument rests on the failure of controlling for a confound to affect the results, a straightforward correlation analysis is not enough.

Westfall J, & Yarkoni T (2016). Statistically Controlling for Confounding Constructs Is Harder than You Think. PLoS ONE, 11(3). PMID: 27031707

  • daniele marinazzo

    It would be interesting to see if the scenario analyzed by Westfall and Yarkoni would apply to the case of suppressor variables as well, when the link between A and B is not disrupted, rather explained, by the presence of the suppressor C. I think it would, as is the case with dynamical models, where it’s well known that a pairwise, non conditioned approach leads to false positives (mediated influences), but also false negatives are possible (synergetic effects, i.e. suppressors).

    • Thom Baguley

If I understand correctly, yes, that should be possible. My understanding is that if the predictors in a regression model are measured with differing levels of unreliability, then this will bias the estimates, and the bias can occur in either direction: you can both under-control (if the covariate is more unreliable) and over-control (if it is more reliable). Suppressors are just regular covariates with a particular pattern of correlations with the other predictors and the response, and thus their degree of suppression would also be impacted by residual confounding.


  • Andrew Jebb

Not only measurement error, but range restriction in the control variable, any non-linearity between the control/predictor and criterion, inadequate operationalizations, overly coarse scales, etc. These other things are NOT mitigated by the latent variable approach the authors recommend. And these issues extend to mediation inferences as well, not just incremental validity. The good news is that this seems to be a non-issue for many controls we DO use that are measured pretty much without error (e.g., gender, age). The even better news is that people might be more cautious about causal inference from field data (skepticism, baby!).

  • Matthew Slyfield

    “Does ice cream cause drownings? Let’s think about this statistically.”

Let’s not. Correlation is not causation. Statistics is about correlation. Statistics cannot say anything about causality.

    • OWilson

      When the divergence between model prediction and actual outcomes is observed, most scientists look for other opinions, even critics of their methods to correct their obviously false assumptions.

      In “climate science”, they just go back in the room with their peers to hammer it into shape, no matter how ungainly it starts to look! :)

      • Matthew Slyfield

        Real scientists look for non statistical evidence of causal mechanisms.

        Only after finding evidence to support such a mechanism do they sit down and build a model using the possible causal mechanism they found.

        If you start with statistics and correlation, you aren’t doing science.

        • Nathan Merrill

          You can be. The problem is that you need to not just show the correlation, but actually show causation. If you start with correlation drawing your attention to something, you can’t use that correlation to imply causation.

    • Alfred

      you didn’t read the article

    • Darren K

      Nice atheism+ meme there.

      • Neuroskeptic

        A dank memer has visited my blog.

  • biobender

    Hi, would something like the Deming regression solve/ameliorate the problem?


  • SimonG03

    The point that “controlling for imperfect measures” can lead to false conclusions is somewhat trivial because multiple regression was never meant to work in such cases. It is very well known—and part of every basic statistics course—that, in order to interpret the results of multiple regression, independent variables must be assumed to be measured without error.

    • Neuroskeptic

      Many points in statistics are trivial in principle but getting people to actually use them in practice is anything but trivial!

    • Libbero

      “independent variables must be assumed to be measured without error.”
      You mean with mean error = 0, right?

  • kyle

There are way too many factors and problems with trying to find statistics for “Does ice cream cause drownings?” Statistics are based on correlation in some way. These are absurdly different things and most likely impossible to study for an accurate result.



