It’s been a busy few days in the world of personal genomics. By coincidence I have a coauthored comment in Genome Biology out, Rumors of the death of consumer genomics are greatly exaggerated (it was written and submitted a while back). If you haven’t, please read the FDA’s letter, and 23andMe’s response, as much as there is one right now. Since Slate ran my piece on Monday a lot of people have offered smart, and better informed, takes. On the one hand you have someone like Alex Tabarrok, with “Our DNA, Our Selves”, which is close to a libertarian cri de coeur. Then you have cases like Christine Gorman, “FDA Was Right to Block 23andMe”. It will be no surprise that I am much closer to Tabarrok than I am to Gorman (she doesn’t even seem to be aware that 23andMe offers a genotyping, not sequencing, service, though fuzziness on the details doesn’t discourage strong opinions from her). An interesting aspect is that many who are not deeply in the technical weeds of the issue are exhibiting politicized responses. I’ve noticed this on Facebook, where some seem to think that 23andMe and the Tea Party have something to do with each other, and the Obama administration and the FDA are basically stand-ins. In other words, some liberals are seeing this dispute as one more attempt to evade government regulation, and regulation is something they support on prior grounds. Though Tabarrok is better informed than the average person (his wife is a biologist), there are others on the right who are taking 23andMe’s side on normative grounds as well. Ultimately I’m not interested in this argument, because it’s not going to have any significant lasting power. No one will remember it in 20 years. As I implied in my Slate piece, 23andMe the company now is less interesting than personal genomics the industry sector in the future. Over the long term I’m optimistic that it will evolve into a field which impacts our lives broadly. Nothing the United States government can do will change that.
Yet tunneling down to the level of 23andMe’s specific issues with the regulatory process, there is the reality that it has to deal with the US government and the FDA, no matter what the details of its science are. It’s a profit-making firm. Matt Herper has a judicious take on this, 23andStupid: Is 23andMe Self-Destructing? I don’t have any “inside” information, so I’m not going to offer the hypothesis that this is part of some grand master plan by Anne Wojcicki. I hope it is, but that’s because I want 23andMe to continue to subsidize genotyping services (I’ve heard that though 23andMe owns the machines, the typing is done by LabCorp. And last I checked the $99 upfront cost is a major loss leader; they’re paying you to get typed). I’m afraid that they goofed here, and miscalculated. As I said above, it won’t make a major difference in the long run, but I have many friends who were waiting until this Christmas to purchase kits from 23andMe.
First, download your 23andMe raw results now if you have them. If you don’t know what’s going on, the FDA has finally started to move aggressively against the firm. Unfortunately this is not surprising, as this was foreshadowed years ago. And 23andMe has been moving aggressively to emphasize its medical, as opposed to genealogical, services over the past year. But this isn’t the story of one firm. This is the story of government response to very important structural shifts occurring in the medical delivery system of the United States. The government could potentially bankrupt 23andMe, but taking a step back that would still be like the RIAA managing to take down Napster. The information is coming, and if there’s one thing that can overpower state planning it is consumer demand. Unless the US government wants to ban its citizens from receiving their own genetic data, it is just putting off the inevitable outsourcing of various interpretation services. Engagement would probably be the better long term bet, but I don’t see that happening.
The last week has seen a lot of chatter about the slapping down of the diagnostic patent by Sequenom, Judge Invalidates Patent for a Down Syndrome Test:
A federal judge has invalidated the central patent underlying a noninvasive method of detecting Down syndrome in fetuses without the risk of inducing a miscarriage.
The ruling is a blow to Sequenom, a California company that introduced the first such noninvasive test in 2011 and has been trying to lock out competitors in a fast-growing market by claiming they infringe on the patent.
Sequenom’s stock fell 23 percent on Thursday, to $1.92.
The judge, Susan Illston of the United States District Court in Northern California, issued a ruling on Wednesday that the patent was invalid because it covered a natural phenomenon — the presence of DNA from the fetus in the mother’s blood.
The justification for intellectual property is a utilitarian one. That is, these are institutions which are meant to further the cause of creativity and innovation. Is there going to be an abandonment in this domain of the push toward technological innovation? Coincidentally in the last week of October Sequenom put out a press release which heralded some advances in its panel:
Matter has a very long feature by my friend Virginia Hughes, Uprooted, on how personal genomics is changing, and sometimes disrupting, family relationships. I sat in on one session at the Consumer Genetics Conference last week, and an audience member expressed worry about how genetic results might cause family disruption. This individual was actually a faculty member who wanted to introduce personal genomics into the classroom as a way to educate, but was wary of these sorts of side effects. Even neglecting the reality that paternity uncertainty is likely far less pervasive among the sorts whose parents would be enrolling their offspring at universities in the Boston area, these worries always have to be prefaced by the fact that dodging this ethical gray zone in the specific case only delays the inevitable. Unless medical authorities ubiquitously and invariably shield this sort of information from the relevant parties, the widespread adoption of genetic analysis as a consumer product will result in exposure of this sort of information. Though it may seem crazy, preemptive testing of all offspring to ascertain biological relatedness of putative parents may simply be the best way to head off this issue, which will otherwise be like a ticking time bomb.
The website The Root often has a Q & A with various African Americans, famous and not so famous, about their genealogy in relation to personal genomics. In most cases these tests tell you what you already know, but for African Americans there is often actually value-add in terms of greater specificity and precision, which would otherwise be lacking for obvious historical reasons. Despite its objective scientific patina the processing and interpretation of the resultant information can be rather subjective, and illuminating. Recently they sat down with actress, and the first black winner of Miss America, Vanessa L. Williams, to discuss her results. There were two passages which I think were particularly interesting, so I’ll quote them below:
This is a public service announcement. If you are a user of direct-to-consumer personal genomics services, please do not pay any attention to your mtDNA and Y chromosomal haplogroups. Why? Because they hardly tell you anything about your individual ancestry. What do I mean by this? Your mtDNA comes down from your mother’s-mother’s-mother’s-mother… and similarly for your Y chromosomal lineage if you are a male. These few individuals are not any more likely to contribute to your ancestry than all those multitudes and multitudes who do not contribute to your mtDNA or Y lineages; also known as almost all your ancestors! What you should pay attention to are your autosomal results. Inferences made from most of your genome. These results may be more difficult to parse, but difficulty is no sin, and elegant ease is no virtue, in this case. That’s because you are interested in your ancestry, not a convenient interpretable story.
Of course I am not saying that mtDNA and Y chromosomal haplogroups are useless. They are useful for population scale phylogeography. But please don’t make inferences about yourself from one data point. At least in most cases.
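To put a number on how small a slice of your ancestry a single uniparental line is, here is a minimal sketch. It assumes an idealized pedigree with no pedigree collapse (every ancestor slot is a distinct person), which overstates the number of distinct ancestors in deep time but is fine for illustration. The function name is my own, made up for the example.

```python
def uniparental_fraction(generations: int) -> float:
    """Fraction of ancestor slots, a given number of generations back,
    that sit on one uniparental line (mtDNA or Y).

    Assumes an idealized pedigree: 2**generations distinct ancestor slots,
    exactly one of which lies on the direct maternal (or paternal) line.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1 / 2 ** generations


if __name__ == "__main__":
    for g in (1, 2, 5, 10):
        slots = 2 ** g
        print(
            f"{g:>2} generations back: 1 of {slots} ancestor slots "
            f"is on the mtDNA line ({uniparental_fraction(g):.4%})"
        )
```

Ten generations back the direct maternal line is one slot in 1,024, which is why the autosomal results are the place to look.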
Update: Feature was always there. Just hard to find.
23andMe did a site redesign. Most of it is user interface clean up, but there is one particularly cool function: if you have an individual’s pedigree up to grandparents you can see which allele they inherited. Just select “Family Traits” under “Family & Friends.”
I have put 1 million markers (from a combination of Illumina SNP-chips) of mine online. I’m also going to put my sequence online when I get it done. Why? What do I gain from this? Hopefully I don’t gain anything from it. By this, I mean that the only major information that is actionable in a life altering sense is likely to be disease related. Though I’ve been contacted about possible loss of function mutations through imputation, so far my genotype has not illuminated any more risk susceptibilities. Rather, I am trying to make it clear by my openness that your genetic information has more power when pooled together with that of others, and one small step in creating that vast pool of information is demystifying the sharing of it, and practicing what you (that is, me) preach. My soul is not in my genes, and certainly my genotype reflects me with far less obvious fidelity than a photograph would. By this, I mean that there are many traits that one could predict about me, but many one would be at a loss to predict.
Rebecca Skloot has an op-ed in The New York Times, The Immortal Life of Henrietta Lacks, the Sequel. I’ve read it a few times now and I’ll be honest and say I’m not totally clear on some of the points she’s trying to make, so I didn’t have a strong reaction to it. This is in contrast to Michael Eisen, who has a post up, The Immortal Consenting of Henrietta Lacks. He told me on Twitter that he had some exchanges with Skloot (on Twitter) which informed his response, so he probably has more context than I do. Eisen says:
I’ve gotten several emails about the Vice interview of Geoffrey Miller on BGI’s Cognitive Genetics Project. It’s a sexy piece, and no surprise given Miller’s fascination with the future of China and science (something I share to a moderate extent). But for the love of God please watch this Steve Hsu video first before reading that.
After my previous post my wife started doing research online. The autosomal dominant condition that I have is almost certainly localized to one particular chromosome (there is a large effect QTL there that is strongly associated with my condition). Additionally, I inherit this condition from my mother. My daughter has her whole pedigree genotyped, thanks to 23andMe. My wife went into the Family Inheritance feature, and compared the identity by descent blocks shared between my mother and my daughter. And, it turns out that on that chromosome the only segments inherited from me, her father, come from my father. Ergo, she can not have inherited the autosomal dominant condition from my mother, since she did not inherit those alleles from her!
We are very happy right now. This is one reason I don’t really care about what the F.D.A. thinks about direct-to-consumer personal genomics. We’re talking about commodity technology. And no one is going to stand between you and your health, if you are motivated.
Addendum: With hindsight I could have figured this out myself a year ago. It just hadn’t crossed my mind.
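The inheritance check described above boils down to an interval lookup: does the grandchild share any identical-by-descent segment with the affected grandparent that covers the locus of interest? Here is a minimal sketch of that logic. The segment coordinates, the chromosome, and the locus position are all hypothetical placeholders, not my family's data, and the sketch assumes the IBD segments have already been exported as simple (chromosome, start, end) records and that IBD detection in the region is complete.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """One identical-by-descent block shared between two relatives."""
    chrom: str
    start: int  # base-pair position, inclusive
    end: int    # base-pair position, inclusive


def shares_ibd_at(segments: Iterable[Segment], chrom: str, position: int) -> bool:
    """Return True if any shared segment covers the given locus."""
    return any(
        seg.chrom == chrom and seg.start <= position <= seg.end
        for seg in segments
    )


if __name__ == "__main__":
    # Hypothetical grandmother-granddaughter IBD blocks (made-up numbers).
    grandmother_vs_granddaughter = [
        Segment("1", 5_000_000, 80_000_000),
        Segment("7", 20_000_000, 95_000_000),
    ]
    # Hypothetical location of the large-effect QTL.
    qtl_chrom, qtl_pos = "4", 60_000_000

    if shares_ibd_at(grandmother_vs_granddaughter, qtl_chrom, qtl_pos):
        print("Locus may have been inherited from the grandmother.")
    else:
        print("No sharing at the locus: that allele came via the grandfather.")
```

The conclusion is only as good as the segment calls, so a region with sparse marker coverage would deserve a second look before anyone celebrates.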
A few weeks ago I put up a new data set into my repository. As is my usual practice now the populations can be found in the .fam file. But I’ve added more into this. I have to rewrite my ADMIXTURE tutorial soon, so I thought I would bring up an important issue when interpreting these data sets using clustering methods: one has to understand that conclusions can not rest on one single result. Rather, one must attempt to ascertain the statistical robustness of the results. If you arrive at an expected result this is obviously not as important a consideration, but if you arrive at a novel and surprising result, then you have to make sure that it isn’t simply a fluke.
To do this I have been running my PHYLOCORE data set with cross-validation (regular 5-fold). In theory you should be able to see where the cross-validation error is minimized, and that is your “best” K. But my personal experience with running ADMIXTURE and STRUCTURE is that the inferred plausibility of a given K derived from the statistic can itself be quite volatile. In other words, it is best to run replicates of a data set when attempting to assess robustness. I’m going to run PHYLOCORE 50 times, but I already have 10 runs.
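To make the replicate idea concrete, here is a minimal sketch of how one might pool cross-validation errors across runs and look at both the mean and the spread for each K. It assumes each run's standard output was saved as text and contains lines of the form `CV error (K=3): 0.52305`; check that pattern against the logs from your own ADMIXTURE version. The function names and the example numbers are invented for illustration and are not my PHYLOCORE results.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

# Assumed log format: "CV error (K=3): 0.52305"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): (\d+\.\d+)")


def collect_cv_errors(log_texts):
    """Gather CV errors per K from the text of many replicate logs."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return dict(errors)


def summarize(errors):
    """Map each K to (mean CV error, standard deviation across replicates)."""
    summary = {}
    for k, values in sorted(errors.items()):
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[k] = (mean(values), spread)
    return summary


def best_k(summary):
    """K with the lowest mean CV error across replicates."""
    return min(summary, key=lambda k: summary[k][0])


if __name__ == "__main__":
    # Made-up log fragments standing in for two replicate runs.
    logs = [
        "CV error (K=2): 0.60000\nCV error (K=3): 0.55000\n",
        "CV error (K=2): 0.62000\nCV error (K=3): 0.57000\n",
    ]
    table = summarize(collect_cv_errors(logs))
    for k, (avg, sd) in table.items():
        print(f"K={k}: mean CV error {avg:.5f}, sd {sd:.5f}")
    print("lowest mean CV error at K =", best_k(table))
```

If the spread at neighboring values of K is as large as the difference between their means, the “best” K is not really distinguishable from its neighbors, which is the volatility I mean.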
The results are plotted below
This is an example of the type of question I receive all the time:
Here is some genetic analysis of Somalis from yours truly. I don’t necessarily blame the public here, as the marketing of Y and mtDNA lineages has really gotten out of control recently. The problem is that the fine print that Y and mtDNA follow only one direct line of descent is usually there. But it is accompanied by rich visual and narrative media that tells a story about that marker, and it is this that is salient for most, not the fact that the story being told is only a very small part of the overall epic cycle that is your genealogy.
I have very little with which I can disagree in this Mark Thomas piece, To claim someone has ‘Viking ancestors’ is no better than astrology. His conclusion:
Exaggerated claims from the consumer ancestry industry can also undermine the results of serious research about human genetic history, which is cautiously and slowly building up a clearer picture of the human past for all of us.
Many of the commercial companies plant stories in the media that sound exciting and seem scientific. But very often they are trivial or wrong, are not published in peer-reviewed scientific journals, and just serve as disguised PR for the company.
The only caveat I would offer is that the sort of confusions and misrepresentations that occur with Y and mtDNA phylogeography are dampened when you are looking at a million markers throughout the whole genome. This does not mean that no confusions and misrepresentations remain (e.g., the reference populations matter a great deal when you present someone as a linear combination of X populations, and that summary is still not reality as such, but an informative model). One alarming aspect of the trade in Y and mtDNA is that I’ve met several people who somehow believe that only these lineages are ancestrally informative. That is probably a function of the ease with which you can say someone is “descended from Niall of the Nine Hostages.”
Addendum: I actually asked Jim Wilson on Twitter if I could get a look at the raw results (not even raw data) for the claims made. One major problem when scientists have a go-to-media-first strategy is that things get out of hand very quickly.
Last summer Neuroskeptic posted on The Coming Age of Fetal Genomics. It seems likely to me that this “age” won’t be ushered in with a bang, but we’ll be there before we know it. After all, most people aren’t thinking about having children at any given moment, and don’t track biomedical advances in genetic disease screening until they’re crossing that bridge. Over at Xconomy Luke Timmerman has a post up, Natera Joins Quest in Four-Way Battle for Prenatal Genetic Tests. Here are some important details:
Perhaps. The New York Times has a piece out reviewing the vogue for sequencing the genomes of children who have mysterious diseases. The numbers are what matters here I think:
A few years ago, this sort of test was so difficult and expensive that it was generally only available to participants in research projects like those sponsored by the National Institutes of Health. But the price has plunged in just a few years from tens of thousands of dollars to around $7,000 to $9,000 for a family. Baylor College of Medicine and a handful of companies are now offering it. Insurers usually pay.
Demand has soared — at Baylor, for example, scientists analyzed 5 to 10 DNA sequences a month when the program started in November 2011. Now they are doing more than 130 analyses a month. At the National Institutes of Health, which handles about 300 cases a year as part of its research program, demand is so great that the program is expected to ultimately take on 800 to 900 a year.
Experts caution that gene sequencing is no panacea. It finds a genetic aberration in only about 25 to 30 percent of cases. About 3 percent of patients end up with better management of their disorder. About 1 percent get a treatment and a major benefit.
These figures seem to be a floor on outcomes for these children, as some of them may receive better or more effective treatments in the future, because the specific nature of their disease is already known. Since most medical treatments today are marginal in effect these outcomes don’t surprise or depress me, and the price point is sure to come down. In the near future I imagine that everyone will have a whole genome sequence, and relevant information about your specific genetic profile in relation to the sea of biomedical literature constantly coming out may be sent to you in a drip-drip fashion by a phone or web app.
Yesterday a friend of mine who happens to be of doughty German and Scandinavian upper Midwest stock messaged me on Facebook and explained that her father’s results for 23andMe had come in…and he was 43 percent Sub-Saharan African! Her mother’s results came in a few hours later, and she was 35 percent Sub-Saharan African. I went to my account, and my parents were also in the same range. Oh my, overnight I became an underrepresented minority! Obviously this was a bug. The key clause is obviously. There are people who receive results suggesting that they are 5 percent Sub-Saharan African and such. Or someone like Dan MacArthur, who has likely South Asian ancestry, but in the 1-2 percent range.
In the near future one of my projects is revising and expanding the “PHYLO” pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100-200,000) with a wide population coverage. Additionally, I am going to do a few things like rename the family ids to populations, and also release it with scripts to help in running ADMIXTURE (for example, shell scripts which will automate replication and later analysis of replicates). Finally, I’m planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the “best” K’s. The reality is that most people probably are only interested in the “most informative” K, +/- 1, so there’s no need for everyone to run K = 2 to K = 20. The time saved should be used on running replicates, and then CLUMPP to merge the results.
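As a rough idea of what the replication plan above involves, here is a minimal sketch that only builds the list of ADMIXTURE invocations, one per replicate and K. It is not the script I will release. It assumes the `--cv=N` and `--seed=N` options behave as I remember them (verify against the manual for your installed version), and that each replicate runs in its own directory, since ADMIXTURE names its output after the input file and K and would otherwise overwrite earlier replicates. The file name `PHYLO.bed`, the directory layout, and the function names are placeholders.

```python
import subprocess
from pathlib import Path


def build_runs(bed_file, k_values, n_replicates, cv_folds=10, out_root="replicates"):
    """Return (working_directory, command) pairs, one per replicate and K.

    bed_file should be an absolute path, because each command is meant
    to run from inside its own replicate directory.
    """
    runs = []
    for rep in range(1, n_replicates + 1):
        workdir = Path(out_root) / f"rep{rep:02d}"
        for k in k_values:
            cmd = [
                "admixture",
                f"--cv={cv_folds}",   # assumed flag: N-fold cross-validation
                f"--seed={rep}",      # assumed flag: a different seed per replicate
                str(bed_file),
                str(k),
            ]
            runs.append((workdir, cmd))
    return runs


def run_all(runs):
    """Execute every command in its replicate directory, saving the log."""
    for workdir, cmd in runs:
        workdir.mkdir(parents=True, exist_ok=True)
        log_path = workdir / f"log.K{cmd[-1]}.out"
        with open(log_path, "w") as log:
            subprocess.run(cmd, cwd=workdir, stdout=log, check=True)


if __name__ == "__main__":
    planned = build_runs("/data/PHYLO.bed", range(2, 21), n_replicates=50)
    print(f"{len(planned)} runs planned")
    print("first command:", " ".join(planned[0][1]))
    # run_all(planned)  # uncomment once the paths and flags are confirmed
```

Fifty replicates of nineteen values of K is 950 runs, which is why trimming the K range to the informative values and spending the time on replicates is the better trade.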