Random gene sets can predict breast cancer survival better than supposedly cancer-related ones

By Ed Yong | February 3, 2012 10:00 am

I’ve written a few guest posts for the Faculty of 1000’s Naturally Selected blog, covering some interesting papers from last year that I missed here. There’s one about how eggs greet sperm, and another on how sleeping alone affects newborn babies. But the third post is one that I particularly want to draw attention to – it’s about a cancer paper that didn’t get much notice last year, but seems to deserve it. Here’s the first bit:

Tumours are bundles of cells that grow and divide uncontrollably, and their genes are deployed in unusual ways. By analysing the genes from different tumour samples, scientists have tried to pin down the chaotic events that lead to cancer. They seem to be making headway. Dozens of papers have reported “gene expression signatures” that predict a patient’s odds of surviving cancer, and new ones come out every month.

These signatures purportedly hint at how healthy cells transform into tumours in the first place. If, for example, the genes in question are involved in wound healing, this tells you that the healing process is somehow involved in a tumour’s progression. The assumption is that these collections of genes reveal deeper truths about the disease they’re associated with.

This idea sounds reasonable, but David Venet from the Université Libre de Bruxelles has thrown a big spanner into the works. He has shown that completely random sets of genes can predict the odds of surviving breast cancer better than published signatures.

Venet found three signatures that are completely unconnected to cancer. Instead, these collections of genes were associated with laughing at jokes after lunch, with the experience of social defeat in mice, and with the positioning of skin cells. All of them were associated with breast cancer outcomes.

Head over to Naturally Selected for the rest, including how long it took to get this study published.

Image by Hakan Dahlstrom

CATEGORIZED UNDER: Cancer, Genetics, Medicine & health

Comments (4)

  1. Heather

    Link for How sleeping alone affects newborn babies: http://blog.f1000.com/2012/01/23/never-let-me-go/

  2. Heather

    As to the cancer gene signature paper, xkcd has a comic for every situation… http://xkcd.com/882/

    And the point about dogmas of academic publishing highlights the importance of the open science movement. Looking forward to reading more of your posts on F1000!

  3. Andrew

    To a large extent this had been noted previously, and was attributed to specific properties of the NKI dataset. Thus it is not generally true, but of course is important to bear in mind.


  4. Lee

    Firstly, this paper has an overly sensational title: the biggest result is actually the authors creating a new signature called metaPCNA. Their main result isn’t even the association of random gene signatures with prognosis.

    Secondly, if you’re familiar with the field of GWA studies, you’ll realize their p-values are ridiculously and suspiciously low (~10^-6 – when in biology do you ever see that?). The reason for this is that GWA studies are built around predictability. A typical p-value for prediction, not training, lies in the 10^-3 to 0.05 range. Ignoring this fact, the authors of the study base their idea that random gene signatures are associated with prognosis on the fact that they can train data sets to cluster without testing them. Those familiar with machine learning know that you can train anything out of any data set you want – the real challenge is finding significance in a separate data set. The paper, for all its sensationalism, simply demonstrates this fact. They don’t use any sort of validation or testing set. The p-values they present are training p-values, not testing p-values. All of their nice, pretty p-values will probably disappear if you apply their trained signatures to a second data set. This was really the main flaw of the paper.

    All in all, their main contribution is the idea that one should apply more rigorous hypothesis testing to gene signatures: one should show that one’s signature outperforms 95% of random signatures, but in separate testing sets, not in the training set.
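
The benchmark described in the last comment (a signature should beat 95% of random gene sets of the same size, measured on held-out data) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the method used in the paper: the function names, the 10-gene signature, and the use of a plain Pearson correlation with a continuous outcome (ignoring censoring, which a real survival analysis would have to handle) are all assumptions made for the example.

```python
import numpy as np

def association(expr, outcome, gene_idx):
    """Absolute Pearson correlation between a signature score and the outcome.

    The score is simply the mean expression of the chosen genes (an assumption;
    real signatures are often weighted).
    """
    score = expr[:, gene_idx].mean(axis=1)
    return abs(np.corrcoef(score, outcome)[0, 1])

def fraction_of_random_beaten(expr, outcome, signature, n_random=1000, seed=0):
    """Fraction of same-sized random gene sets that the signature outperforms."""
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    observed = association(expr, outcome, signature)
    random_stats = np.array([
        association(
            expr, outcome,
            rng.choice(n_genes, size=len(signature), replace=False),
        )
        for _ in range(n_random)
    ])
    return float(np.mean(observed > random_stats))

# Synthetic data (hypothetical): 200 samples, 500 genes.
# The outcome is driven by genes 0-9 plus noise.
rng = np.random.default_rng(42)
expr = rng.normal(size=(200, 500))
outcome = expr[:, :10].mean(axis=1) + rng.normal(scale=0.3, size=200)

# Assume the signature was selected on the first 100 samples (training set);
# the benchmark is then run only on the remaining 100 (testing set).
signature = np.arange(10)
test_expr, test_outcome = expr[100:], outcome[100:]

frac = fraction_of_random_beaten(test_expr, test_outcome, signature)
passes = frac >= 0.95
print(f"beats {frac:.1%} of random signatures on the test set; passes: {passes}")
```

In this toy data, random gene sets carry almost no signal, so a genuine signature clears the bar easily. The point of the paper is that in real breast cancer data many random sets do carry signal, which is what makes the comparison a meaningful hurdle.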



Not Exactly Rocket Science

Dive into the awe-inspiring, beautiful and quirky world of science news with award-winning writer Ed Yong. No previous experience required.
