
Guess what, we’re related! Credit: Wapondaponda
This is a public service announcement. If you are a user of direct-to-consumer personal genomics services, please do not pay any attention to your mtDNA and Y chromosomal haplogroups. Why? Because they hardly tell you anything about your individual ancestry. What do I mean by this? Your mtDNA comes down from your mother’s-mother’s-mother’s-mother… and similarly for your Y chromosomal lineage if you are a male. These few individuals are not any more likely to contribute to your ancestry than all those multitudes and multitudes who do not contribute to your mtDNA or Y lineages; also known as almost all your ancestors! What you should pay attention to are your autosomal results. Inferences made from most of your genome. These results may be more difficult to parse, but difficulty is no sin, and elegant ease is no virtue, in this case. That’s because you are interested in your ancestry, not a convenient interpretable story.
Of course I am not saying that mtDNA and Y chromosomal haplogroups are useless. They are useful for population scale phylogeography. But please don’t make inferences about yourself from one data point. At least in most cases.
Update: Feature was always there. Just hard to find.
23andMe did a site redesign. Most of it is user interface cleanup, but there is one particularly cool function: if you have an individual’s pedigree up to grandparents you can see which allele they inherited. Just select “Family Traits” under “Family & Friends.”
I have put 1 million markers (from a combination of Illumina SNP-chips) of mine online. I’m also going to put my sequence online when I get it done. Why? What do I gain from this? Hopefully I don’t gain anything from it. By this, I mean that the only major information that is actionable in a life altering sense is likely to be disease related. Though I’ve been contacted about possible loss of function mutations through imputation, so far my genotype has not illuminated any more risk susceptibilities. Rather, I am trying to make it clear by my openness that your genetic information has more power when pooled together with that of others, and one small step in creating that vast pool of information is demystifying the sharing of it, and practicing what you (that is, me) preach. My soul is not in my genes, and certainly my genotype reflects me with far less obvious fidelity than a photograph would. By this, I mean that there are many traits that one could predict about me, but many one would be at a loss to predict.
Rebecca Skloot has an op-ed in The New York Times, The Immortal Life of Henrietta Lacks, the Sequel. I’ve read it a few times now and I’ll be honest and say I’m not totally clear on some of the points she’s trying to make, so I didn’t have a strong reaction to it. This is in contrast to Michael Eisen, who has a post up, The Immortal Consenting of Henrietta Lacks. He told me on Twitter that he had some exchanges with Skloot (on Twitter) which informed his response, so he probably has more context than I do. Eisen says:
I’ve gotten several emails about the Vice interview of Geoffrey Miller on BGI’s Cognitive Genetics Project. It’s a sexy piece, and no surprise given Miller’s fascination with the future of China and science (something I share to a moderate extent). But for the love of God please watch this Steve Hsu video first before reading that.
After my previous post my wife started doing research online. The autosomal dominant condition that I have is almost certainly localized to one particular chromosome (there is a large effect QTL there that is strongly associated with my condition). Additionally, I inherited this condition from my mother. My daughter has her whole pedigree genotyped, thanks to 23andMe. My wife went into the Family Inheritance feature, and compared the identity by descent blocks shared between my mother and my daughter. And, it turns out that on that chromosome the only segments inherited from me, her father, come from my father. Ergo, she cannot have inherited the autosomal dominant condition from my mother, since she did not inherit those alleles from her!
We are very happy right now. This is one reason I don’t really care about what the F.D.A. thinks about direct-to-consumer personal genomics. We’re talking about commodity technology. And no one is going to stand between you and your health, if you are motivated.
Addendum: With hindsight I could have figured this out myself a year ago. It just hadn’t crossed my mind.
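The identity by descent check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` record, the `segments_covering` helper, and every coordinate below are made up for the example, and none of it reflects the actual format of 23andMe's Family Inheritance output. It also ignores genotyping error and uncertainty in segment boundaries.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One identity-by-descent block shared by two relatives (hypothetical format)."""
    chrom: str
    start: int  # base-pair position, inclusive
    end: int    # base-pair position, inclusive


def segments_covering(segments: List[Segment], chrom: str,
                      locus_start: int, locus_end: int) -> List[Segment]:
    """Return the shared segments that overlap the locus of interest."""
    return [
        s for s in segments
        if s.chrom == chrom and s.start <= locus_end and s.end >= locus_start
    ]


# Made-up locus and coordinates, for illustration only.
LOCUS = ("5", 50_000_000, 52_000_000)

# Blocks the granddaughter shares with each paternal grandparent.
shared_with_grandmother = [
    Segment("3", 1_000_000, 40_000_000),
    Segment("7", 5_000_000, 60_000_000),
]
shared_with_grandfather = [
    Segment("5", 1, 180_000_000),
]

from_grandmother = segments_covering(shared_with_grandmother, *LOCUS)
from_grandfather = segments_covering(shared_with_grandfather, *LOCUS)

if not from_grandmother and from_grandfather:
    print("Locus came through the paternal grandfather, not the grandmother.")
```

The logic is the same as in the post: at any given locus the copy a child receives from her father traces back to exactly one of his parents, so a locus covered only by blocks shared with the grandfather cannot carry the grandmother's allele.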
A few weeks ago I put up a new data set into my repository. As is my usual practice now the populations can be found in the .fam file. But I’ve added more into this. I have to rewrite my ADMIXTURE tutorial soon, so I thought I would bring up an important issue when interpreting these data sets using clustering methods: one has to understand that conclusions can not rest on one single result. Rather, one must attempt to ascertain the statistical robustness of the results. If you arrive at an expected result this is obviously not as important a consideration, but if you arrive at a novel and surprising result, then you have to make sure that it isn’t simply a fluke.
To do this I have been running my PHYLOCORE data set with cross-validation (regular 5-fold). In theory you should be able to see where the value is minimized, and that is your “best” K. But, my personal experience with running ADMIXTURE and STRUCTURE is that the inferred plausibility of a given K derived from the statistic can itself be quite volatile. In other words, it is best to run replicates of a data set when attempting to assess robustness. I’m going to run PHYLOCORE 50 times, but I already have 10 runs.
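To make the point about volatility concrete, here is a minimal Python sketch of pooling cross-validation errors across replicate runs. It assumes each run's log contains lines of the form `CV error (K=3): 0.52000`, which is how ADMIXTURE reports the statistic as far as I know; the helper names and all of the numbers are invented for illustration and are not PHYLOCORE results.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed log line format: "CV error (K=3): 0.52000"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def collect_cv_errors(logs):
    """Map each K to the list of CV errors seen across all replicate logs."""
    by_k = defaultdict(list)
    for text in logs:
        for k, err in CV_LINE.findall(text):
            by_k[int(k)].append(float(err))
    return dict(by_k)


def best_k(by_k):
    """Return the K with the lowest mean CV error across replicates."""
    return min(by_k, key=lambda k: mean(by_k[k]))


# Two made-up replicate logs; the values are not real results.
replicate_1 = (
    "CV error (K=2): 0.5500\n"
    "CV error (K=3): 0.5200\n"
    "CV error (K=4): 0.5300\n"
)
replicate_2 = (
    "CV error (K=2): 0.5600\n"
    "CV error (K=3): 0.5250\n"
    "CV error (K=4): 0.5100\n"
)

pooled = collect_cv_errors([replicate_1, replicate_2])
print("best K from replicate 1 alone:", best_k(collect_cv_errors([replicate_1])))
print("best K from pooled replicates:", best_k(pooled))
```

In this toy case a single run and the pooled runs disagree about the minimum, which is the reason to look at the spread over many replicates rather than trusting one curve.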
The results are plotted below
This is an example of the type of question I receive all the time:
Here is some genetic analysis of Somalis from yours truly. I don’t necessarily blame the public here, as the marketing of Y and mtDNA lineages has really gotten out of control recently. The problem is that the fine print that Y and mtDNA follow only one direct line of descent is usually there. But, it is accompanied by rich visual and narrative media that tells a story about that marker, and it is this that is salient for most. What gets lost is that the story being told is only a very small part of the overall epic cycle that is your genealogy.
(Also, in population genetics using the word “Caucasian” is really confusing. G2 can often be thought of as a Caucasian haplogroup, but I don’t think that that’s what my correspondent meant)
I have very little to disagree with in this Mark Thomas piece, To claim someone has ‘Viking ancestors’ is no better than astrology. His conclusion:
Exaggerated claims from the consumer ancestry industry can also undermine the results of serious research about human genetic history, which is cautiously and slowly building up a clearer picture of the human past for all of us.
Many of the commercial companies plant stories in the media that sound exciting and seem scientific. But very often they are trivial or wrong, are not published in peer-reviewed scientific journals, and just serve as disguised PR for the company.
The only caveat I would offer is that the sort of confusions and misrepresentations that occur with Y and mtDNA phylogeography are dampened when you are looking at a million markers throughout the whole genome. This does not mean that no confusions and misrepresentations remain (e.g., the reference populations matter a great deal when you present someone as a linear combination of X populations, and that summary is still not reality as such, but an informative model). One alarming aspect of the trade in Y and mtDNA is that I’ve met several people who somehow believe that only these lineages are ancestrally informative. That is probably a function of the ease with which you can say someone is “descended from Niall of the Nine Hostages.”
Addendum: I actually asked Jim Wilson on Twitter if I could get a look at the raw results (not even raw data) for the claims made. One major problem when scientists have a go-to-media-first strategy is that things get out of hand very quickly.
Last summer Neuroskeptic posted on The Coming Age of Fetal Genomics. It seems likely to me that this “age” won’t be ushered in with a bang, but we’ll be there before we know it. After all, most people aren’t thinking about having children at any given moment, and don’t track biomedical advances in genetic disease screening until they’re crossing that bridge. Over at Xconomy Luke Timmerman has a post up, Natera Joins Quest in Four-Way Battle for Prenatal Genetic Tests. Here are some important details:
Perhaps. The New York Times has a piece out reviewing the vogue for sequencing the genomes of children who have mysterious diseases. The numbers are what matters here I think:
A few years ago, this sort of test was so difficult and expensive that it was generally only available to participants in research projects like those sponsored by the National Institutes of Health. But the price has plunged in just a few years from tens of thousands of dollars to around $7,000 to $9,000 for a family. Baylor College of Medicine and a handful of companies are now offering it. Insurers usually pay.
Demand has soared — at Baylor, for example, scientists analyzed 5 to 10 DNA sequences a month when the program started in November 2011. Now they are doing more than 130 analyses a month. At the National Institutes of Health, which handles about 300 cases a year as part of its research program, demand is so great that the program is expected to ultimately take on 800 to 900 a year.
…
Experts caution that gene sequencing is no panacea. It finds a genetic aberration in only about 25 to 30 percent of cases. About 3 percent of patients end up with better management of their disorder. About 1 percent get a treatment and a major benefit.
It seems this is a floor in terms of outcomes for these children, as some of them may receive better or more effective treatments in the future, because the specific nature of their disease is already known. Since most medical treatments today are marginal in effect these outcomes don’t surprise or depress me, and the price point is sure to come down. In the near future I imagine that everyone will have a whole genome sequence, and relevant information about your specific genetic profile in relation to the sea of biomedical literature constantly coming out may be sent to you in a drip, drip fashion by a phone or web app.
Yesterday a friend of mine who happens to be of doughty German and Scandinavian upper Midwest stock messaged me on Facebook and explained that her father’s results for 23andMe had come in…and he was 43 percent Sub-Saharan African! Her mother’s results came in a few hours later, and she was 35 percent Sub-Saharan African. I went to my account, and my parents were also in the same range. Oh my, overnight I became an underrepresented minority! Obviously this was a bug. The key clause is obviously. There are people who receive results suggesting that they are 5 percent Sub-Saharan African and such. Or someone like Dan MacArthur, who has likely South Asian ancestry, but in the 1-2 percent range.
In the near future one of my projects is revising and expanding the “PHYLO” pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100-200,000) with a wide population coverage. Additionally, I am going to do a few things like rename the family ids to populations, and also release it with scripts to help in running Admixture (for example, shell scripts which will automate replication and later analysis of replicates). Finally, I’m planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the “best” K’s. The reality is that most people probably are only interested in the “most informative” K, +/- 1, so there’s no need for everyone to run K = 2 to K = 20. The time saved should be used on running replicates, and then CLUMPP to merge the results.
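The replication plan above can be laid out programmatically before anything is run. The following Python sketch only builds the list of jobs; it does not execute them. The file name `PHYLO.bed`, the directory naming scheme, and the seed scheme are assumptions for illustration, and the `--cv` and `--seed` options should be checked against the ADMIXTURE manual for your version. As far as I know ADMIXTURE writes its output files into the current working directory, so each replicate gets its own directory here to keep runs from overwriting one another.

```python
def admixture_jobs(bed_path, k_values, n_replicates, cv_folds=10, base_seed=1):
    """Build (working_directory, command) pairs, one per K and replicate.

    Nothing is executed; the caller decides how to run the commands.
    """
    jobs = []
    for k in k_values:
        for rep in range(1, n_replicates + 1):
            workdir = f"K{k}/rep{rep}"   # hypothetical layout, one dir per run
            seed = base_seed + rep       # a different random seed per replicate
            command = [
                "admixture",
                f"--cv={cv_folds}",
                f"--seed={seed}",
                bed_path,
                str(k),
            ]
            jobs.append((workdir, command))
    return jobs


# K = 2 to K = 20, 50 replicates each, 10-fold cross-validation.
jobs = admixture_jobs("PHYLO.bed", range(2, 21), n_replicates=50)

print(len(jobs), "runs planned")
print(jobs[0][0], " ".join(jobs[0][1]))
```

Each command could then be handed to `subprocess.run(command, cwd=workdir)` or written out to a shell script, with the per-replicate outputs merged afterwards with CLUMPP.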
Over at David Dobbs’ weblog Laura Hercher has a guest post up with the heading The Case for Selective Paternalism in Genetic Testing. Here are some relevant sections:
Which brings me back to this issue of paternalism. I agree that it makes no sense to put up obstacles for inquisitive and motivated individuals who wish to query their genome for information, however laced with uncertainty or peril. But forgive us if our first thoughts are often about how to help (yes, and to protect) the patients we see, in the medical setting. Science literacy is rare. The desire to use web-based tools to analyze their own DNA sequence is vanishingly rare. And a sentence like “Your risk of type II diabetes is decreased by the allele that you carry, in a gene that accounts for an estimated 1.5% of the heritability of the disease” is regularly interpreted as “You will not get type II diabetes.” So we worry about the effect that getting this information may have on the people who live where the sky is blue and the sun is yellow. Sue us.
…
So, yes – more information, not less, is the way of the future, for so many reasons. But I will throw in a plea for understanding that sometimes the opposition is not merely protecting an information fiefdom, but responding to their own previous experience. Sometimes, I get a little protective. I guess that’s paternalism. I plead guilty – guilty, with an explanation.
Over at Genomes Unzipped Vincent Plagnol has put up a post, Exaggerations and errors in the promotion of genetic ancestry testing, which to my mind is an understated and soft-touch old-fashioned “fisking” of the pronouncements of a spokesperson for an outfit termed Britain’s DNA. The whole post is worth reading, but this is a very grave aspect of the response of the company:
…The main reason is that listening to this radio interview prompted my UCL colleagues David Balding and Mark Thomas to ask questions to the Britain’s DNA scientific team; the questions have not been satisfactorily answered. Instead, a threat of legal action was issued by solicitors for Mr Moffat. Any type of legal threat is an ominous sign for an academic debate. This motivated me to point out some of the incorrect, or at the very least exaggerated, statements made in this interview. Importantly, while I received comments from several people for this post, the opinion presented here is entirely mine and does not involve any of my colleagues at Genomes Unzipped.
From what I can gather this firm is charging two to three times more than 23andMe for state-of-the-art scientific genealogy, circa 2002. So if you can’t be bothered to read the piece, it looks like Britain’s DNA is threatening litigation for researchers having the temerity to point out that the firm is providing substandard services at above-market costs. Plagnol’s critique lays out point-by-point refutation of assertions, but the interpretation services on offer seem to resemble nothing more than genetically rooted epic fantasy. A triumph of marketing over science.
Looks like 23andMe has a new $99 price point. If so, that’s 100 markers per cent! (here’s the press release)
1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.
2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.
Court to Decide if Human Genes Can Be Patented. So it seems a group of middle-aged to very aged lawyers will decide the decades-long Myriad Genetics saga. My position on this issue is simple: if you are going to award patents, they must be awarded to acts of engineering, not discoveries of science. See Genomics Law Report for more well informed commentary.
Many months ago I told some of my friends that I’d run analyses of their 23andMe data, and report it back to them. A year ago I made the same promise to some of my readers. But life got in the way, and I’ve been very busy. I’m working on scripts to make the whole process efficient for me (if you want to know, I’m trying to get the output to be easy to merge many runs with CLUMPP and then produce DISTRUCT type outputs; I’ve done this with other Admixture outputs, but for various reasons the labeling gets messed up with my ‘personal’ project). But I’ve decided to at least start pushing some of the results live. I won’t be putting it in this space; it will probably go on razib.com. But I thought I would get your attention first. I know a lot of ID’s are missing, but I’ll add them later when I can find anything. And yes, I need to get back to African Ancestry too (that site was infested with a backdoor, so I had to yank it). This is all rather basic stuff, but I just don’t have the time to do things in a manual fashion, and the scripts I have for population sets don’t transfer over when I want to give individual friend results as well as population results.
The results in tabular format are here. And all individual results are here. In terms of the tech details, ~140,000 SNPs, ~3000 total individuals in the data set, at K = 11. I will probably be reporting K = 12 to K = 25 from now on (I’m just going to get >10 replicates and merge them).
A week ago I posted on a rather scary case of medical doctors withholding information from a family because they felt that it was in the best interests of the family. I objected mostly because I don’t have a good feeling about this sort of paternalism. Laura Hercher has a follow-up. She’s not offering just her opinion, but she actually made some calls to people who were involved in the case. From what I can gather in her post the issue that triggered this outrage (in my opinion, it’s an outrage) is that for these particular tests informed consent was simply not mandatory. Since they didn’t have the consent a priori, the doctors had to go with their judgement.
Interesting story in The San Jose Mercury News, Open-source science helps San Carlos father’s genetic quest:
“We used materials that are public, freely available,” said Rienhoff, a physician and scientist, as Beatrice frolicked nearby. “And everything we’ve learned we’ve put back out there, in the public domain. It’s for the patient’s good, and the public good.”
Born with small, weak muscles, long feet and curled fingers, Beatrice confounded all the experts.
No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.
Rienhoff — a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle — launched a search.
He combed the publicly available medical literature, researching diseases, while jotting down each new clue or theory. Because her ailment is so rare, he knew no big labs or advocacy groups would be interested.