(False?) Positive Psychology Meets Genomics

By Neuroskeptic | August 27, 2014 6:35 pm

Academic bunfight ahoy! A new paper from Nick Brown – famed debunker of the “Positivity Ratio” – and his colleagues takes aim at another piece of research on feel-good emotions.


The target is a 2013 paper published in PNAS from positive psychology leader Barbara Fredrickson and colleagues: A functional genomic perspective on human well-being.

The disputed paper claimed to have found significant correlations between questionnaire measures of human happiness and the expression of a set of 53 stress-related genes in blood cells.

In their critical article, which is out now in PNAS, Brown et al. criticize many aspects of the Fredrickson paper, but their most serious charge is that the headline results are likely to be a false positive. The key statistical analysis, a method that they dub “RR53”, is flawed, say Brown et al.:

Even when fed entirely random psychometric data, the “RR53” regression procedure generates large numbers of results that appear, according to these authors’ interpretation, to establish a statistically significant relationship between well-being and gene expression. We believe that this procedure is, simply put, totally lacking in validity.

Harsh words. Cole and Fredrickson are defiant in their PNAS response, and say that “Brown et al.’s reanalysis itself contains major statistical and factual errors”. The reply is only 500 words, but a more detailed rebuttal is posted here. So who’s right?

I should note at this point that I am cited in the Acknowledgements of Brown et al. and was involved in the paper in an advisory capacity.

In my view, there is no doubt that RR53, and hence the Fredrickson et al. paper, is flawed. The rebuttal focuses on deflecting one aspect of Brown et al.’s critique, a so-called ‘bitmapping’ procedure. But I believe that RR53 can be shown to give high false-positive rates without using ‘bitmapping’ at all – and I’ll now demonstrate this with the help of some simulations.

*

First, I ran 10,000 RR53 simulations using the Fredrickson genetics data and parameters, but with randomly generated predictors in place of the two happiness questionnaire scores. I found a false positive rate of 55% per predictor – far higher than the nominal rate of 5%.

Why so high? I ran more simulations, this time with 53 randomly generated outcome variables (‘genes’) instead of the actual gene expression data. This revealed that the false positive rate is correct (5%) so long as the outcome variables are uncorrelated with each other. But if they are inter-correlated – and they are in Fredrickson et al.’s data – the procedure gives spurious positive ‘associations’.

Here’s a plot of the false positive rate as a function of outcome inter-correlation, with 53 outcome variables. There’s a clear relationship:

[Figure: false positive rate per predictor as a function of outcome inter-correlation (MCM), with 53 outcome variables]

Fredrickson et al.’s 53 genes have an inter-correlation “MCM” of 0.415 (see the end of the post for details on my “MCM” metric). The graph above shows that this corresponds to a false positive rate of approximately 55%. My conclusion is that RR53 produces false positives on Fredrickson et al.’s data because the gene data are inter-correlated.

Even more simulations suggest that neither the degree of correlation between the predictor variables, nor the number of predictors, has any effect on the rate of false positives (per predictor). On the other hand, the false positive rate increases with the number of outcome variables (‘genes’):

[Figure: false positive rate per predictor as a function of the number of outcome variables]

In summary, I believe that the RR53 procedure on which Fredrickson et al.’s PNAS paper is based is prone to false positives. I believe that, with the dataset Fredrickson et al. used, the chance of observing a false positive association between each happiness-score predictor and average gene expression was 55%.

So what went wrong? I think the answer is deceptively simple. RR53 is based on a t-test, and a key assumption of the t-test is that all of the observations in the sample are independent. If the outcome variables are correlated, this assumption is violated – it is essentially the problem of auto-correlation. I may expand on this in a future post.
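
To see why correlation breaks the t-test, here is a toy Python sketch (not Fredrickson et al.’s analysis, and nothing to do with genes – purely an illustration): 53 ‘observations’ that each have mean zero, but share a common random component, are fed to a one-sample t-test. The nominal false positive rate is 5%; the actual rate is far higher.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_sims, alpha = 53, 10_000, 0.05
false_pos = 0
for _ in range(n_sims):
    shared = rng.standard_normal()                  # component common to all 53 values
    obs = 0.6 * shared + 0.4 * rng.standard_normal(n_obs)
    false_pos += stats.ttest_1samp(obs, 0.0).pvalue < alpha
print(false_pos / n_sims)                           # far above the nominal 0.05
```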

In my opinion, whatever else may be right or wrong with Fredrickson et al.’s paper, their central analysis was flawed and their headline results are probably false positives. For what it’s worth I think the flaw is an insidious one, one that’s not obvious at first glance, and I’m not saying that Fredrickson et al. are to be blamed for making this mistake. To err is human. But I believe that a mistake was made.

*

Gory details: in my simulation (Matlab code on request), I follow the Fredrickson et al. procedure as explained by Brown et al. All random data are unit normally distributed. To generate two vectors of correlated random numbers, X and Y, I generate X, then generate a second random vector Z, and then set Y = wX + (1-w)Z, where w is a weight, from 0 to 1, that determines how correlated X and Y are.
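
In Python terms (my actual code is Matlab), that construction looks something like this; note that Y is not rescaled to unit variance, which doesn’t matter for the correlation:

```python
import numpy as np

def correlated_pair(n, w, rng):
    """Two random vectors whose correlation is controlled by w (0 = independent, 1 = identical)."""
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    y = w * x + (1 - w) * z
    return x, y

rng = np.random.default_rng(1)
x, y = correlated_pair(10_000, 0.5, rng)
print(np.corrcoef(x, y)[0, 1])   # about 0.71 for w = 0.5
```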

To simulate “RR53”, I first generate two sets of random but correlated predictors (the correlation is r=0.79 in the Fredrickson et al. questionnaire data). I then generate a set of (usually 53) random but variably correlated outcome variables (‘genes’). I then run Repeated Regressions, i.e. for each outcome variable in turn, I regress the outcome variable on both predictors to obtain two regression coefficients.

I then use a one-sample t-test on each of the two sets of coefficients, with the null hypothesis that the mean is zero. Fredrickson et al. used a bootstrap to estimate the standard error of the mean; I use simple parametric t-tests after verifying that the differences are negligible (bootstrapping is slow).
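
Putting those pieces together, here is a minimal Python sketch of the whole simulation (my actual code is Matlab; the sample size and weights below are illustrative placeholders rather than the exact Fredrickson et al. values):

```python
import numpy as np
from scipy import stats

def rr53_sim(n_subj=80, n_genes=53, gene_w=0.5, pred_w=0.56,
             n_sims=2000, alpha=0.05, seed=0):
    """Per-predictor false positive rate of the 'RR53' procedure on purely random data."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(2)                     # false positives for each of the two predictors
    for _ in range(n_sims):
        # Two correlated random predictors ('subscale totals'), via Y = wX + (1-w)Z;
        # pred_w = 0.56 gives roughly r = 0.79 between them
        p1 = rng.standard_normal(n_subj)
        p2 = pred_w * p1 + (1 - pred_w) * rng.standard_normal(n_subj)
        # n_genes inter-correlated random outcomes ('genes'), sharing a common component
        shared = rng.standard_normal((n_subj, 1))
        genes = gene_w * shared + (1 - gene_w) * rng.standard_normal((n_subj, n_genes))
        # Repeated regressions: intercept + both predictors, on each gene in turn
        X = np.column_stack([np.ones(n_subj), p1, p2])
        coefs = np.linalg.lstsq(X, genes, rcond=None)[0][1:, :]   # 2 x n_genes
        # One-sample t-test on each predictor's set of coefficients
        pvals = stats.ttest_1samp(coefs, 0.0, axis=1).pvalue
        hits += pvals < alpha
    return hits / n_sims

print(rr53_sim(gene_w=0.0))   # uncorrelated 'genes': close to the nominal 0.05
print(rr53_sim(gene_w=0.5))   # inter-correlated 'genes': far above 0.05
```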

I quantified the inter-correlation of the outcome variables by first calculating the mean of all of the outcome variables and then calculating the mean of the correlation coefficients between each variable and that mean. I call this quick-and-dirty metric the mean-correlation-with-mean, “MCM”.
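
In code, the MCM calculation amounts to something like this (again a Python sketch of what the Matlab does):

```python
import numpy as np

def mcm(outcomes):
    """Mean-correlation-with-mean: average correlation of each column with the column-wise mean."""
    grand_mean = outcomes.mean(axis=1)
    return np.mean([np.corrcoef(outcomes[:, j], grand_mean)[0, 1]
                    for j in range(outcomes.shape[1])])
```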

Brown, N., MacDonald, D., Samanta, M., Friedman, H., & Coyne, J. (2014). A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1407057111

  • Nick

    Thanks for this. I hope that some quantification of the “overexuberance” of the RR53 procedure will help convince people that our article is correct (although we still feel that Fredrickson et al.’s factor analysis is the most critical issue in demonstrating the lack of validity of their study).

    I just wanted to make something clear. Cole and Fredrickson spend quite a bit of their rebuttal letter and supporting document criticising our alleged “bitmapping” procedure, as if we made up some completely unvalidated statistical technique. We didn’t. In fact, the word “bitmapping” doesn’t appear anywhere in our article or SI. At one point we describe how the outer loop of our R program works to generate all of the mathematically possible combinations of the psychometric data into two factors, and we mention that the algorithm for this involves using a counter variable (which runs from 1 to 8191, the number of unique combinations) which is interpreted as a binary number (aka “bitmap”), with the 0s and 1s in any given position (bits 0 through 13, with a little-endian structure assumed) resulting in the corresponding item from the MHC-SF scale being assigned to one factor or the other. There are lots of other ways we could have programmed this. I wrote this up mainly to help people read our code; it’s completely incidental to our argument. (I mention this because you included the term “bitmapping” in your post, and I want it to be crystal clear that it is not some kind of resampling or other statistical procedure.)
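
    For the record, here is roughly what that counter-as-bitmap idea looks like in Python (our actual program is in R; this sketch is purely illustrative and is not our code):

```python
n_items = 14                            # MHC-SF items, numbered 1 to 14 below
for counter in range(1, 8192):          # 8191 unique non-trivial splits
    # Item i+1 goes to factor A if bit i of the counter is set, otherwise to factor B.
    factor_a = [i + 1 for i in range(n_items) if (counter >> i) & 1]
    factor_b = [i + 1 for i in range(n_items) if not (counter >> i) & 1]
    # Because counter < 8192, bit 13 is always 0, so item 14 always lands in factor_b;
    # that is what avoids counting each split twice (once per mirror image).
    # ...compute the two pseudo-subscale scores from factor_a / factor_b here...
```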

    • Elizabeth_Reed

      Nick, you are my hero. Thank you!

  • PsychStudent

    Any chance you (or someone else) could provide a simplified description of Fredrickson et al.’s argument? I don’t quite understand their detailed response, but it seems that they allege the way that the samples/partitions are done is incorrect. You say you didn’t use the “bitmapping” procedure – but was what Brown et al. did actually invalid? Fredrickson et al. seem to show that if one uses the “correct” sampling/partitioning procedure, the shockingly high significance rate disappears. It’s not clear that your simulations avoided the issue they bring up. What did you do that they didn’t, or vice versa? I don’t understand everything they are saying so I can’t honestly judge one way or another, but it doesn’t seem like you addressed the heart of their complaint. Any chance you could clarify?

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Sure :)

      The well-being questionnaire that Fredrickson et al. used has 13 items. Fredrickson et al. chose to calculate two sub-scale scores, “Eudaimonic” and “Hedonic”. (These two scores are correlated r=0.79 in their dataset.) Fredrickson et al. went on to observe differential genomic associations with each scale, and their conclusions are largely based on interpreting these differences.

      “Bitmapping” refers to the way that Brown et al. exhaustively repartitioned the 13 items to calculate other pairs of subscales from the same questionnaire data. Brown et al. show that most of these other, arbitrary pairs of pseudo-subscales are also correlated with gene expression, and actually give more “interesting” differential associations with the genes, suggesting that there’s nothing ‘special’ about Eudaimonic and Hedonic.

      For details on how this was implemented, see Nick’s comment.

      My simulations, however, approach the issue from a different angle. I don’t repartition the actual questionnaire data, but simply generate random ‘questionnaire data’ from scratch. I don’t even bother generating individual item scores; I just randomly generate two ‘subscale totals’. This is because my narrow objective in this post is not to criticize Fredrickson et al.’s use and interpretation of questionnaires, but to criticize their RR53 procedure.

      So ‘bitmapping’ and my simulations are two quite different things. Nothing in Fredrickson and Cole’s rebuttal challenges what I have said.

      For that matter, the rebuttal also doesn’t challenge much of what Brown et al. said; e.g. I don’t see any attempt to deny the observation that “the RR53 procedure appears to be exquisitely sensitive to even the smallest variations in the data”, which Brown et al. expand on in their SI.

      • Nick

        The questionnaire actually has 14 items. :) The reason that there are 8191 possible combinations (i.e., 2^13 – 1) and not (2^14 – 1) is that every pair of factors appears twice at some point in the binary representations of the numbers 1 through 16383. For example, 1234 decimal (00010011010010 binary) and 15149 decimal (11101100101101 binary) both correspond to the same split (questions 2, 5, 7, 8, and 11 in one “factor”, and 1, 3, 4, 6, 9, 10, 12, 13, and 14 in the other). And the “-1” is because the number zero (or 16383) gives you all zeroes, or all ones, and so one factor has all the questions and the other is empty!
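
        A couple of lines of Python confirm that worked example (illustration only):

```python
n_items = 14
def split(counter):            # questions whose bit is set, numbered 1 to 14
    return {i + 1 for i in range(n_items) if (counter >> i) & 1}

a, b = split(1234), split(15149)
print(sorted(a))                        # [2, 5, 7, 8, 11]
print(sorted(b))                        # [1, 3, 4, 6, 9, 10, 12, 13, 14]
print(b == set(range(1, 15)) - a)       # True: the two counters encode the same split
```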

        All of this, however, while fun and interesting for geeks to discuss, is almost peripheral to the main points in the article. The problems with Fredrickson et al.’s factor analysis (namely, that neither their data, nor theory, nor past published research support their Hed/Eud split), combined with the errors in their dataset, mean that their original results are totally unreliable. The demolition of the RR53 procedure is, however, useful for the analysis of other research, past or future, that might use or have used it. (I don’t know whether or not this has been done in the past, but I have seen slides presented by Fredrickson of some current work that seem to suggest that the same method is being used for that.)

        The fact that the procedure that generated Fredrickson et al.’s results is invalid is, arguably, hardly surprising; a valid method would not have shown their apparently-significant results. For example, instead of performing the 53 regressions of the individual genes on Hed/Eud and averaging the resulting coefficients (which are overwhelmingly non-significant), you could average the gene expression values (which are commensurate) and do a single regression on each of Hed and Eud. If you do that, you actually get the same effect size numbers as the original study, but there’s a big fat very non-significant p-value attached to each of them. This seems to be because taking the average of the individual coefficients strips the “cloud of uncertainty” (i.e., the wide confidence interval, due to the high p-values) away and leaves you with what looks like a clean data point, as if it had just come out of the gene analysis microarray machine.

        • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

          Exactly, if you average the genes before running the regression, the false positive rate is 5% as expected.

          The coefficient of that regression is equal to the mean of the 53 coefficients from the RR53, but the confidence interval around it is much larger (as it should be).

          If you don’t want to take the mean of 53 genes, you could do a factor analysis of them and extract the primary factor. Or you could take the mean of the z-scores of the gene values.
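
          Here’s a quick Python sketch of that comparison, with a single random predictor rather than two, purely for illustration: the slope from regressing the gene average equals the mean of the per-gene slopes, but it comes with an honest, ordinary p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subj, n_genes = 80, 53
pred = rng.standard_normal(n_subj)
shared = rng.standard_normal((n_subj, 1))
genes = 0.5 * shared + 0.5 * rng.standard_normal((n_subj, n_genes))   # correlated genes, unrelated to pred

per_gene = [stats.linregress(pred, genes[:, j]).slope for j in range(n_genes)]
combined = stats.linregress(pred, genes.mean(axis=1))

print(np.mean(per_gene))    # mean of the 53 per-gene slopes...
print(combined.slope)       # ...equals the slope for the averaged genes
print(combined.pvalue)      # which carries a perfectly ordinary (usually non-significant) p-value
```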

        • Thom Baguley

          BTW you seem to be describing the ecological fallacy.

      • zbreeze

        You are really good at making the technical descriptions make sense! Thank you.
