Update: Feature was always there. Just hard to find.
23andMe did a site redesign. Most of it is user interface cleanup, but there is one particularly cool function: if you have an individual’s pedigree up to grandparents you can see which allele they inherited. Just select “Family Traits” under “Family & Friends.”
I have put 1 million markers (from a combination of Illumina SNP-chips) of mine online. I’m also going to put my sequence online when I get it done. Why? What do I gain from this? Hopefully I don’t gain anything from it. By this, I mean that the only major information that is actionable in a life altering sense is likely to be disease related. Though I’ve been contacted about possible loss of function mutations through imputation, so far my genotype has not illuminated any more risk susceptibilities. Rather, I am trying to make it clear by my openness that your genetic information has more power when pooled together with that of others, and one small step in creating that vast pool of information is demystifying the sharing of it, and practicing what you (that is, me) preach. My soul is not in my genes, and certainly my genotype reflects me with far less obvious fidelity than a photograph would. By this, I mean that there are many traits that one could predict about me, but many one would be at a loss to predict.
Larry Moran has a post up, Who Owns Your Genome?, where he mentions me apropos of the HeLa genome disclosure:
In my opinion, there is no excuse for publishing this genome sequence without consent.
Razib Khan disagrees. He thinks that he can publish his genome sequence without obtaining consent from anyone else and I assume he feels the same way about the sequence of the HeLa genome [Henrietta Lacks’ genome, and familial consent].
In response to Larry, I don’t have a definitive opinion about the HeLa genome disclosure in terms of whether it was ethical to release it or not. “Both sides” have positions which I see the validity of. I think ultimately the root issues really date to the 1950s, not today, and they don’t have to do with personal genomics as such. Also, I’d recommend Joe Pickrell’s post, Henrietta Lacks’s genome sequence has been publicly available for years.
Larry also has a question in the comments:
Rebecca Skloot has an op-ed in The New York Times, The Immortal Life of Henrietta Lacks, the Sequel. I’ve read it a few times now and I’ll be honest and say I’m not totally clear on some of the points she’s trying to make, so I didn’t have a strong reaction to it. This is in contrast to Michael Eisen, who has a post up, The Immortal Consenting of Henrietta Lacks. He told me on Twitter that he had some exchanges with Skloot (on Twitter) which informed his response, so he probably has more context than I do. Eisen says:
I’ve gotten several emails about the Vice interview of Geoffrey Miller on BGI’s Cognitive Genetics Project. It’s a sexy piece, and no surprise given Miller’s fascination with the future of China and science (something I share to a moderate extent). But for the love of God please watch this Steve Hsu video first before reading that.
After my previous post my wife started doing research online. The autosomal dominant condition that I have is almost certainly localized to one particular chromosome (there is a large effect QTL there that is strongly associated with my condition). Additionally, I inherit this condition from my mother. My daughter has her whole pedigree genotyped, thanks to 23andMe. My wife went into the Family Inheritance feature, and compared the identity by descent blocks shared between my mother and my daughter. And, it turns out that on that chromosome the only segments inherited from me, her father, come from my father. Ergo, she can not have inherited the autosomal dominant condition from my mother, since she did not inherit those alleles from her!
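The reasoning above boils down to an interval-overlap check: if no identity-by-descent (IBD) segment shared between the carrier grandparent and the grandchild covers the locus, the grandchild cannot have inherited the variant along that line. Here is a minimal sketch of that check. The `Segment` type, the function names, and all coordinates are hypothetical and chosen only for illustration; they are not 23andMe's export format or its algorithm.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # base-pair position, inclusive
    end: int    # base-pair position, exclusive


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments are on the same chromosome and share any bases."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def may_carry_variant(ibd_with_carrier: list[Segment], locus: Segment) -> bool:
    """True if any IBD segment shared with the carrier covers part of the locus."""
    return any(overlaps(seg, locus) for seg in ibd_with_carrier)


# Hypothetical numbers: a risk locus on one chromosome, and IBD segments
# shared between grandmother and granddaughter that lie elsewhere.
locus = Segment("7", 50_000_000, 60_000_000)
grandmother_granddaughter_ibd = [
    Segment("3", 10_000_000, 80_000_000),
    Segment("12", 5_000_000, 40_000_000),
]
print(may_carry_variant(grandmother_granddaughter_ibd, locus))  # prints False
```

The check is only as good as the IBD calls near the locus, so it makes sense to look at the segment boundaries themselves rather than a chromosome-level summary.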
We are very happy right now. This is one reason I don’t really care about what the F.D.A. thinks about direct-to-consumer personal genomics. We’re talking about commodity technology. And no one is going to stand between you and your health, if you are motivated.
Addendum: With hindsight I could have figured this out myself a year ago. It just hadn’t crossed my mind.
A few weeks ago I put up a new data set into my repository. As is my usual practice now the populations can be found in the .fam file. But I’ve added more into this. I have to rewrite my ADMIXTURE tutorial soon, so I thought I would bring up an important issue when interpreting these data sets using clustering methods: one has to understand that conclusions can not rest on one single result. Rather, one must attempt to ascertain the statistical robustness of the results. If you arrive at an expected result this is obviously not as important a consideration, but if you arrive at a novel and surprising result, then you have to make sure that it isn’t simply a fluke.
To do this I have been running my PHYLOCORE data set with cross-validation (regular 5-fold). In theory you should be able to see where the value is minimized, and that is your “best” K. But my personal experience with running ADMIXTURE and STRUCTURE is that the inferred plausibility of a given K derived from the statistic can itself be quite volatile. In other words, it is best to run replicates of a data set when attempting to assess robustness. I’m going to run PHYLOCORE 50 times, but I already have 10 runs.
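One way to look at that volatility is to collect the cross-validation error from every replicate and summarize it per K. The sketch below assumes each run's log contains a line of the form `CV error (K=3): 0.51234`, which is how ADMIXTURE reports the statistic when run with `--cv`; if your version prints something different, the regular expression needs adjusting. The function names are made up for this example.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev

# Assumed log line format: "CV error (K=3): 0.51234"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def collect_cv_errors(log_texts):
    """Map each K to the list of CV errors found across all replicate logs."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return dict(errors)


def summarize(errors):
    """Return {K: (mean CV error, standard deviation across replicates)}."""
    return {k: (mean(v), pstdev(v)) for k, v in sorted(errors.items())}


def best_k(errors):
    """K with the lowest mean CV error across replicates."""
    return min(errors, key=lambda k: mean(errors[k]))


# Toy logs standing in for the contents of real replicate log files.
logs = [
    "CV error (K=2): 0.60\nCV error (K=3): 0.50\n",
    "CV error (K=2): 0.62\nCV error (K=3): 0.54\n",
]
errors = collect_cv_errors(logs)
for k, (avg, sd) in summarize(errors).items():
    print(f"K={k}: mean={avg:.3f} sd={sd:.3f}")
print("lowest mean CV error at K =", best_k(errors))
```

If the standard deviation across replicates is comparable to the gap between neighboring K values, the “best” K is not well determined by the statistic alone.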
The results are plotted below
This is an example of the type of question I receive all the time:
Here is some genetic analysis of Somalis from yours truly. I don’t necessarily blame the public here, as the marketing of Y and mtDNA lineages has really gotten out of control recently. The problem is that the fine print that Y and mtDNA follow only one direct line of descent is usually there. But it is accompanied by rich visual and narrative media that tells a story about that marker, and it is this that is salient for most. Not the fact that the story being told is only a very small part of the overall epic cycle that is your genealogy.
(Also, in population genetics using the word “Caucasian” is really confusing. G2 can often be thought of as a Caucasian haplogroup, but I don’t think that that’s what my correspondent meant)
I have very little to disagree with in this Mark Thomas piece, To claim someone has ‘Viking ancestors’ is no better than astrology. His conclusion:
Exaggerated claims from the consumer ancestry industry can also undermine the results of serious research about human genetic history, which is cautiously and slowly building up a clearer picture of the human past for all of us.
Many of the commercial companies plant stories in the media that sound exciting and seem scientific. But very often they are trivial or wrong, are not published in peer-reviewed scientific journals, and just serve as disguised PR for the company.
The only caveat I would offer is that the sort of confusions and misrepresentations that occur with Y and mtDNA phylogeography are dampened when you are looking at a million markers throughout the whole genome. This does not mean there are still no confusions and misrepresentations (e.g., the reference populations matter a great deal when you present someone as a linear combination of X populations, and that summary is still not reality as such, but an informative model). One alarming aspect of the trade in Y and mtDNA is that I’ve met several people who somehow believe that only these lineages are ancestrally informative. That is probably a function of the ease with which you can say someone is “descended from Niall of the Nine Hostages.”
Addendum: I actually asked Jim Wilson on Twitter if I could get a look at the raw results (not even raw data) for the claims made. One major problem when scientists have a go-to-media-first strategy is that things get out of hand very quickly.
Last summer Neuroskeptic posted on The Coming Age of Fetal Genomics. It seems likely to me that this “age” won’t be ushered in with a bang, but we’ll be there before we know it. After all, most people aren’t thinking about having children at any given moment, and don’t track biomedical advances in genetic disease screening until they’re crossing that bridge. Over at Xconomy Luke Timmerman has a post up, Natera Joins Quest in Four-Way Battle for Prenatal Genetic Tests. Here are some important details:
Perhaps. The New York Times has a piece out reviewing the vogue for sequencing the genomes of children who have mysterious diseases. The numbers are what matters here I think:
A few years ago, this sort of test was so difficult and expensive that it was generally only available to participants in research projects like those sponsored by the National Institutes of Health. But the price has plunged in just a few years from tens of thousands of dollars to around $7,000 to $9,000 for a family. Baylor College of Medicine and a handful of companies are now offering it. Insurers usually pay.
Demand has soared — at Baylor, for example, scientists analyzed 5 to 10 DNA sequences a month when the program started in November 2011. Now they are doing more than 130 analyses a month. At the National Institutes of Health, which handles about 300 cases a year as part of its research program, demand is so great that the program is expected to ultimately take on 800 to 900 a year.
…
Experts caution that gene sequencing is no panacea. It finds a genetic aberration in only about 25 to 30 percent of cases. About 3 percent of patients end up with better management of their disorder. About 1 percent get a treatment and a major benefit.
It seems this is a floor in terms of the outcomes for these children, as some of them may receive better or more effective treatments in the future, because the specific nature of their disease is already known. Since most medical treatments today are marginal in effect these outcomes don’t surprise or depress me, and the price point is sure to come down. In the near future I imagine that everyone will have a whole genome sequence, and relevant information about your specific genetic profile in relation to the sea of biomedical literature constantly coming out may be sent to you in a drip-drip fashion by a phone or web app.
Yesterday a friend of mine who happens to be of doughty German and Scandinavian upper Midwest stock messaged me on Facebook and explained that her father’s results for 23andMe had come in…and he was 43 percent Sub-Saharan African! Her mother’s results came in a few hours later, and she was 35 percent Sub-Saharan African. I went to my account, and my parents were also in the same range. Oh my, overnight I became an underrepresented minority! Obviously this was a bug. The key clause is obviously. There are people who receive results suggesting that they are 5 percent Sub-Saharan African and such. Or someone like Dan MacArthur, who has likely South Asian ancestry, but in the 1-2 percent range.
In the near future one of my projects is revising and expanding the “PHYLO” pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100-200,000) with a wide population coverage. Additionally, I am going to do a few things like rename the family ids to populations, and also release it with scripts to help in running Admixture (for example, shell scripts which will automate replication and later analysis of replicates). Finally, I’m planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the “best” K’s. The reality is that most people probably are only interested in the “most informative” K, +/- 1, so there’s no need for everyone to run K = 2 to K = 20. The time saved should be used on running replicates, and then CLUMPP to merge the results.
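To give a sense of the scale of that replication plan, here is a rough sketch that only builds the list of jobs (it does not run anything). It assumes ADMIXTURE accepts `--cv=N` for N-fold cross-validation and `--seed=X` for the random seed, and that it writes its output files into the current directory, which is why each replicate gets its own working directory. The file name `/data/phylo.bed`, the directory layout, and the function name are hypothetical; check `admixture --help` for the options in your version.

```python
from pathlib import Path


def replicate_jobs(bed_file, k_values, n_replicates, folds=10, first_seed=1):
    """Build (working directory, argv) pairs, one per K and replicate."""
    jobs = []
    for k in k_values:
        for rep in range(n_replicates):
            # A separate directory per run, so output files from different
            # replicates of the same K do not overwrite each other.
            workdir = Path(f"K{k}") / f"rep{rep + 1:02d}"
            seed = first_seed + rep  # a different seed for each replicate
            argv = ["admixture", f"--cv={folds}", f"--seed={seed}",
                    bed_file, str(k)]
            jobs.append((workdir, argv))
    return jobs


# K = 2 to K = 20 with 50 replicates each; the input path is hypothetical.
jobs = replicate_jobs("/data/phylo.bed", range(2, 21), 50)
print(len(jobs))  # prints 950
workdir, argv = jobs[0]
print(workdir, " ".join(argv))
```

Nineteen values of K times 50 replicates is 950 runs, which is the practical argument for restricting most analyses to the “most informative” K, +/- 1.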
Over at David Dobbs’ weblog Laura Hercher has a guest post up with the heading The Case for Selective Paternalism in Genetic Testing. Here are some relevant sections:
Which brings me back to this issue of paternalism. I agree that it makes no sense to put up obstacles for inquisitive and motivated individuals who wish to query their genome for information, however laced with uncertainty or peril. But forgive us if our first thoughts are often about how to help (yes, and to protect) the patients we see, in the medical setting. Science literacy is rare. The desire to use web-based tools to analyze their own DNA sequence is vanishingly rare. And a sentence like “Your risk of type II diabetes is decreased by the allele that you carry, in a gene that accounts for an estimated 1.5% of the heritability of the disease” is regularly interpreted as “You will not get type II diabetes.” So we worry about the effect that getting this information may have on the people who live where the sky is blue and the sun is yellow. Sue us.
…
So, yes – more information, not less, is the way of the future, for so many reasons. But I will throw in a plea for understanding that sometimes the opposition is not merely protecting an information fiefdom, but responding to their own previous experience. Sometimes, I get a little protective. I guess that’s paternalism. I plead guilty – guilty, with an explanation.
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale of the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, just over 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).
Over at Genomes Unzipped Vincent Plagnol has put up a post, Exaggerations and errors in the promotion of genetic ancestry testing, which to my mind is an understated and soft-touch old-fashioned “fisking” of the pronouncements of a spokesperson for an outfit termed Britain’s DNA. The whole post is worth reading, but this is a very grave aspect of the response of the company:
…The main reason is that listening to this radio interview prompted my UCL colleagues David Balding and Mark Thomas to ask questions to the Britain’s DNA scientific team; the questions have not been satisfactorily answered. Instead, a threat of legal action was issued by solicitors for Mr Moffat. Any type of legal threat is an ominous sign for an academic debate. This motivated me to point out some of the incorrect, or at the very least exaggerated, statements made in this interview. Importantly, while I received comments from several people for this post, the opinion presented here is entirely mine and does not involve any of my colleagues at Genomes Unzipped.
From what I can gather this firm is charging two to three times more than 23andMe for state-of-the-art scientific genealogy, circa 2002. So if you can’t be bothered to read the piece, it looks like Britain’s DNA is threatening litigation for researchers having the temerity to point out that the firm is providing substandard services at above-market costs. Plagnol’s critique lays out a point-by-point refutation of assertions, but the interpretation services on offer seem to resemble nothing more than genetically rooted epic fantasy. A triumph of marketing over science.
My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense in their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms to assign ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).
Here’s my daughter’s new chromosome painting:
One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!
But in regards to omission I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but is an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.
In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:
As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! On first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.
What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.
I did two things:
1) I ran ADMIXTURE at K = 3, unsupervised
2) I ran an MDS, which visualized the genetic variation in multiple dimensions
Before I go on, I will state what I found: these methods supported the inference from 23andMe. On chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.
ADMIXTURE results for chromosome 10
| | K 1 | K 2 | K 3 |
|---|---|---|---|
| CEU | 0.04 | 0.02 | 0.93 |
| GIH | 0.87 | 0.05 | 0.08 |
| CHD | 0.01 | 0.97 | 0.01 |
| Daniel MacArthur | 0.29 | 0.07 | 0.64 |
| Ilana Fisher | 0.01 | 0.06 | 0.94 |
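For readers who want to produce this kind of per-population summary themselves, here is a minimal sketch. It assumes the usual layout in which the ADMIXTURE `.Q` file has one whitespace-separated row of ancestry proportions per individual, in the same order as the PLINK `.fam` file, and that the population label sits in the first (family ID) column of the `.fam` file. The function name and the toy rows below are invented for illustration and are not the data behind the table.

```python
from collections import defaultdict
from statistics import median


def median_q_by_population(fam_lines, q_lines):
    """Median ancestry proportion per cluster for each population label.

    fam_lines: lines of a .fam file (population assumed in column 1).
    q_lines:   lines of the matching .Q file, in the same individual order.
    """
    rows_by_pop = defaultdict(list)
    for fam_line, q_line in zip(fam_lines, q_lines):
        population = fam_line.split()[0]
        rows_by_pop[population].append([float(x) for x in q_line.split()])
    # zip(*rows) turns the list of individuals into per-cluster columns.
    return {
        pop: [median(column) for column in zip(*rows)]
        for pop, rows in rows_by_pop.items()
    }


# Toy input: three CEU and two GIH individuals at K = 3 (made-up values).
fam = [
    "CEU id1 0 0 1 -9",
    "CEU id2 0 0 2 -9",
    "CEU id3 0 0 1 -9",
    "GIH id4 0 0 2 -9",
    "GIH id5 0 0 1 -9",
]
q = [
    "0.05 0.02 0.93",
    "0.03 0.01 0.96",
    "0.10 0.02 0.88",
    "0.90 0.04 0.06",
    "0.80 0.10 0.10",
]
for pop, medians in median_q_by_population(fam, q).items():
    print(pop, [round(m, 2) for m in medians])
```

Because each cluster's median is taken separately, the medians in a row need not sum to exactly 1, which is also the case for some rows in the table above.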
You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:
And here’s the MDS with the two largest dimensions:
Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, and so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.
Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of that white Britons do have South Asian ancestry which dates back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity??? Perhaps the issue is that everyone was looking at the genome wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution of all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.
Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
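The arithmetic behind that estimate is just repeated halving: a single ancestor g generations back contributes an expected 1/2^g of the genome. A small sketch, with function names made up for this example and the 25-year generation time taken from the text above:

```python
import math


def expected_fraction(generations_back: int) -> float:
    """Expected share of the genome from one ancestor g generations back."""
    return 0.5 ** generations_back


def generations_for_fraction(fraction: float) -> int:
    """Nearest whole number of generations implied by an ancestry fraction."""
    return round(-math.log2(fraction))


g = generations_for_fraction(0.015)   # ~1.5% South Asian
print(g)                              # prints 6
print(expected_fraction(g))           # prints 0.015625
print(g * 25)                         # prints 150 (years before the proband's birth)
```

Six generations back is a great-great-great-great grandparent (a parent is one generation, a grandparent two, and each “great” adds one). The 1/2^g figure is only an expectation; the realized share from any one ancestor that far back varies a good deal around it.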
Court to Decide if Human Genes Can Be Patented. So it seems a group of middle aged to very aged lawyers will decide the decades long Myriad Genetics saga. My position on this issue is simple: if you are going to award patents, they must be awarded to acts of engineering, not discoveries of science. See Genomics Law Report for more well informed commentary.
Many months ago I told some of my friends that I’d run analyses of their 23andMe data, and report it back to them. A year ago I made the same promise to some of my readers. But life got in the way, and I’ve been very busy. I’m working on scripts to make the whole process efficient for me (if you want to know, I’m trying to make it easy to merge the output of many runs with CLUMPP and then produce DISTRUCT type outputs; I’ve done this with other Admixture outputs, but for various reasons the labeling gets messed up with my ‘personal’ project). But I’ve decided to at least start pushing some of the results live. I won’t be putting it in this space, probably razib.com. But I thought I would get your attention first. I know a lot of ID’s are missing, but I’ll add them later when I can find anything. And yes, I need to get back to African Ancestry too (that site was infested with a backdoor, so I had to yank it). This is all rather basic stuff, but I just don’t have the time to do things in a manual fashion, and the scripts I have for population sets don’t transfer over when I want to give individual friend results as well as population results.
The results in tabular format are here. And all individual results are here. In terms of the tech details, ~140,000 SNPs, ~3000 total individuals in the data set, at K = 11. I will probably be reporting K = 12 to K = 25 from now on (I’m just going to get >10 replicates and merge them).
A week ago I posted on a rather scary case of medical doctors withholding information from a family because they felt that it was in the best interests of the family. I objected mostly because I don’t have a good feeling about this sort of paternalism. Laura Hercher has a follow up. She’s not offering just her opinion, but she actually made some calls to people who were involved in the case. From what I can gather in her post the issue that triggered this outrage (in my opinion, it’s an outrage) is that for these particular tests informed consent was simply not mandatory. Since they didn’t have the consent a priori, the doctors had to go with their judgement.