The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility of and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only a little over 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above, Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).
Over at Genomes Unzipped Vincent Plagnol has put up a post, Exaggerations and errors in the promotion of genetic ancestry testing, which to my mind is an understated and soft-touch old-fashioned “fisking” of the pronouncements of a spokesperson for an outfit termed Britain’s DNA. The whole post is worth reading, but this is a very grave aspect of the response of the company:
…The main reason is that listening to this radio interview prompted my UCL colleagues David Balding and Mark Thomas to ask questions to the Britain’s DNA scientific team; the questions have not been satisfactorily answered. Instead, a threat of legal action was issued by solicitors for Mr Moffat. Any type of legal threat is an ominous sign for an academic debate. This motivated me to point out some of the incorrect, or at the very least exaggerated, statements made in this interview. Importantly, while I received comments from several people for this post, the opinion presented here is entirely mine and does not involve any of my colleagues at Genomes Unzipped.
From what I can gather this firm is charging two to three times more than 23andMe for state-of-the-art scientific genealogy, circa 2002. So if you can’t be bothered to read the piece, it looks like Britain’s DNA is threatening litigation for researchers having the temerity to point out that the firm is providing substandard services at above-market costs. Plagnol’s critique lays out a point-by-point refutation of assertions, but the interpretation services on offer seem to resemble nothing more than genetically rooted epic fantasy. A triumph of marketing over science.
My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense in their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms to assign ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).
Here’s my daughter’s new chromosome painting:
One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!
But in regards to omission I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but is an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.
In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:
As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! At first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.
What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.
I did two things:
1) I ran ADMIXTURE at K = 3, unsupervised
2) I ran an MDS, which visualized the genetic variation in multiple dimensions
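For readers who want to see what the MDS step amounts to, here is a minimal sketch. It assumes classical (Torgerson) MDS on a simple allele-sharing distance; the post does not say which tool or distance was actually used, so treat the distance definition, the function names, and the simulated genotypes as illustrative assumptions, not a reconstruction of the real analysis.

```python
import numpy as np

def ibs_distance(genotypes):
    """Pairwise distance = mean absolute difference in allele counts, scaled to [0, 1].

    `genotypes` is an (individuals x SNPs) array of 0/1/2 allele counts.
    This particular distance is an assumption made for illustration.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        dist[i] = np.abs(g - g[i]).mean(axis=1) / 2.0
    return dist

def classical_mds(dist, k=2):
    """Classical MDS: double-center the squared distances, keep the top k eigenvectors."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Simulated stand-in data: two populations with unrelated allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = rng.uniform(0.1, 0.9, size=500)
pop_a = rng.binomial(2, freq_a, size=(10, 500))
pop_b = rng.binomial(2, freq_b, size=(10, 500))
genotypes = np.vstack([pop_a, pop_b])

coords = classical_mds(ibs_distance(genotypes), k=2)
```

With real data you would plot the two columns of `coords` against each other and look at where an individual falls relative to the reference clusters.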
Before I go on, I will state what I found: these methods supported the inference from 23andMe. On chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.
|ADMIXTURE results for chromosome 10|
|K 1||K 2||K 3|
You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:
And here’s the MDS with the two largest dimensions:
Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, and so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.
Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of for white Britons to have South Asian ancestry which dates back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity??? Perhaps the issue is that everyone was looking at the genome wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution of all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.
Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
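The arithmetic behind that estimate is just repeated halving, and can be sketched as follows. The function names are mine, and the calculation gives the expected contribution only; as noted above, the realized share from an ancestor that far back varies a lot.

```python
import math

def expected_fraction(generations_back):
    """Expected autosomal share from one ancestor g generations back (parents = 1)."""
    return 0.5 ** generations_back

def generations_for_fraction(fraction):
    """Nearest whole number of generations that yields roughly this share."""
    return round(math.log2(1.0 / fraction))

def ancestor_label(generations_back):
    """Turn a generation count into the usual kinship term."""
    if generations_back == 1:
        return "parents"
    return "great-" * (generations_back - 2) + "grandparents"

g = generations_for_fraction(0.015)   # ~1.5% South Asian
label = ancestor_label(g)
share = expected_fraction(g)
years_before_birth = g * 25           # assuming 25-year generations
```

Here `g` comes out at six generations, which is the great-great-great-great-grandparent level, with an expected share of 1/64, and `years_before_birth` is 150 years before the birth of the person tested.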
Court to Decide if Human Genes Can Be Patented. So it seems a group of middle aged to very aged lawyers will decide the decades long Myriad Genetics saga. My position on this issue is simple: if you are going to award patents, they must be awarded to acts of engineering, not discoveries of science. See Genomics Law Report for more well informed commentary.
Many months ago I told some of my friends that I’d run analyses of their 23andMe data, and report it back to them. A year ago I made the same promise to some of my readers. But life got in the way, and I’ve been very busy. I’m working on scripts to make the whole process efficient for me (if you want to know, I’m trying to get the output to be easy to merge many runs with CLUMPP and then produce DISTRUCT type outputs; I’ve done this with other Admixture outputs, but for various reasons the labeling gets messed up with my ‘personal’ project). But I’ve decided to at least start pushing some of the results live. I won’t be putting it in this space, probably razib.com. But I thought I would get your attention first. I know a lot of ID’s are missing, but I’ll add them later when I can find anything. And yes, I need to get back to African Ancestry too (that site was infested with a backdoor, so I had to yank it). This is all rather basic stuff, but I just don’t have the time to do things in a manual fashion, and the scripts I have for population sets don’t transfer over when I want to give individual friend results as well as population results.
The results in tabular format are here. And all individual results are here. In terms of the tech details, ~140,000 SNPs, ~3000 total individuals in the data set, at K = 11. I will probably be reporting K = 12 to K = 25 from now on (I’m just going to get >10 replicates and merge them).
A week ago I posted on a rather scary case of medical doctors withholding information from a family because they felt that it was in the best interests of the family. I objected mostly because I don’t have a good feeling about this sort of paternalism. Laura Hercher has a follow up. She’s not offering just her opinion, but she actually made some calls to people who were involved in the case. From what I can gather in her post the issue that triggered this outrage (in my opinion, it’s an outrage) is that for these particular tests informed consent was simply not mandatory. Since they didn’t have the consent a priori, the doctors had to go with their judgement.
Image credit: Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings
I really love the paper Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. I first read it about six years ago. The result is rather straightforward, but the problem is empirically a moderately deep one. Modern analytic genetics as the fusion between Mendelism and biometrics began with R. A. Fisher’s The Correlation between Relatives on the Supposition of Mendelian Inheritance in 1918. But note, that paper assumed particular relatedness between relatives. As highlighted in the above paper the expected values for most categories of relatedness always had a variance component which was unaccounted for, and so reduced the power of the methodology to ascertain the extent of heritability. The relatedness you can expect between any two siblings is ~0.50, and that is also the average across all siblings. But the reality is that in most cases two given siblings will not share half their genes identical by descent.
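To make the point about variation around 0.50 concrete, here is a rough simulation sketch. It is not the paper's method. The assumptions are all mine: 22 autosomes of equal length (160 cM each), one possible crossover per 1 cM bin with probability 0.01, no interference, and no sex difference in recombination.

```python
import numpy as np

def transmitted_haplotype(rng, n_chrom, bins_per_chrom, switch_prob):
    """Grandparental origin (0 or 1) of each bin passed on in one meiosis."""
    start = rng.integers(0, 2, size=(n_chrom, 1))            # random starting strand
    switches = rng.random((n_chrom, bins_per_chrom)) < switch_prob
    switches[:, 0] = False                                    # no crossover before the first bin
    return (start + np.cumsum(switches, axis=1)) % 2

def sibling_ibd_proportion(rng, n_chrom=22, bins_per_chrom=160, switch_prob=0.01):
    """Average fraction of the genome two full siblings share IBD, over both parents."""
    shared = []
    for _parent in range(2):
        sib1 = transmitted_haplotype(rng, n_chrom, bins_per_chrom, switch_prob)
        sib2 = transmitted_haplotype(rng, n_chrom, bins_per_chrom, switch_prob)
        shared.append((sib1 == sib2).mean())
    return float(np.mean(shared))

rng = np.random.default_rng(1)
props = np.array([sibling_ibd_proportion(rng) for _ in range(2000)])
mean_sharing = props.mean()
sd_sharing = props.std()
```

Under these simplified assumptions `mean_sharing` should sit very close to one half, while `sd_sharing` should come out at a few percentage points. That spread is the variance the paper exploits.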
Interesting story in The San Jose Mercury News, Open-source science helps San Carlos father’s genetic quest:
“We used materials that are public, freely available,” said Rienhoff, a physician and scientist, as Beatrice frolicked nearby. “And everything we’ve learned we’ve put back out there, in the public domain. It’s for the patient’s good, and the public good.”
Born with small, weak muscles, long feet and curled fingers, Beatrice confounded all the experts.
No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.
Rienhoff — a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle — launched a search.
He combed the publicly available medical literature, researching diseases, while jotting down each new clue or theory. Because her ailment is so rare, he knew no big labs or advocacy groups would be interested.
I noticed today that GEDmatch is trying to raise funds to cover the cost of their web services. What are those services? Basically if you get raw data back from direct-to-consumer genotyping firms GEDmatch allows you to run further analytics. You can do some of these yourself…but most people aren’t going to know how to convert their files into pedigree format and use PLINK. As genotype data becomes more and more common there’ll be more need for analytic services like GEDmatch, whether for profit or for fun.
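For the curious, the conversion step mentioned above looks roughly like this. It is a minimal sketch that assumes the usual tab-separated 23andMe raw-data layout (rsid, chromosome, position, genotype, with `#` comment lines) and PLINK's text PED/MAP layout. The sample rows, IDs, and function names are made up for illustration.

```python
def parse_23andme(lines):
    """Yield (rsid, chrom, pos, genotype) from 23andMe-style raw-data lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and header comments
        rsid, chrom, pos, genotype = line.split("\t")
        yield rsid, chrom, int(pos), genotype

def to_ped_map(lines, family_id="FAM1", individual_id="IND1", sex=0):
    """Return (ped_row, map_rows) for a single individual.

    PED columns: family ID, individual ID, father, mother, sex, phenotype, then allele pairs.
    MAP columns: chromosome, SNP ID, genetic distance, base-pair position.
    """
    map_rows, alleles = [], []
    for rsid, chrom, pos, genotype in parse_23andme(lines):
        if genotype in ("--", "-", ""):   # no-call becomes PLINK's missing code
            a1 = a2 = "0"
        elif len(genotype) == 1:          # haploid call (e.g. male X, Y, MT): duplicate it
            a1 = a2 = genotype
        else:
            a1, a2 = genotype[0], genotype[1]
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.extend([a1, a2])
    ped_fields = [family_id, individual_id, "0", "0", str(sex), "-9"] + alleles
    return "\t".join(ped_fields), map_rows

# Hypothetical rows, not real markers.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAG",
    "rs0000002\t1\t2000\t--",
    "rs0000003\tX\t3000\tC",
]
ped_row, map_rows = to_ped_map(sample)
```

Writing `ped_row` to a `.ped` file and `map_rows` to a matching `.map` file gives PLINK something it can read. Merging with reference populations, strand flips, and filtering are separate problems, and they are where most people get stuck.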
One can imagine a near future where much of the work can be offloaded to desktop applications (e.g., Promethease does this for traits and diseases). But the problem is that there are greater returns to the analysis when you aggregate the source data into huge agglomerations, and if people are doing their own analyses on their own systems that’s not going to happen (the GWAS information in Promethease uses aggregated information implicitly with the studies they rely on). This is why GEDmatch and openSNP are important.
In any case, if you have used GEDmatch and wish to give, this would be a good time.
AncestryDNA believes that our customers have the right to their own genetic data. It is your DNA, after all. So we’re working to provide access to your raw DNA data in early 2013, which includes related security enhancements to ensure its safety during every step of the process. Moving forward, we plan to add even more tools and improvements for our customers, and any new features will be available to all AncestryDNA members.
If the rights of the customers to own their own data were so important to them they should have front-loaded this feature. As it is, they didn’t, and as many bloggers noted the firm had stated they didn’t have plans to roll out this feature in the near future. What changed? I don’t know the details, but I suspect they realized that many of us who complained in the past were going to continue to complain constantly. Combine that with the contrast with its competitors, like 23andMe, and I assume they realized this just wasn’t going to solve itself if they ignored it. The key here is follow up. I’ll assume “early 2013” is no later than March 31st (the first quarter of the year). If AncestryDNA doesn’t have the feature out by then I’ll assume they’re not serious, and will begin trying to make sure that their deficits come up high on Google searches again.
Blogs and word of mouth matter a lot in this domain. I convinced James Miller, author of Singularity Rising, to get his parents genotyped this weekend. Also, after more than two years of harassment a friend who works at Google finally got typed, and will be sending me his data.
John Hawks points me to a critique of NPR coverage of personal genomics. In defense of NPR they seem like Physical Review Letters in comparison to other media, such as the BBC. But I do wonder what the causality here is. Does the media lead us to the proposition that “genetics is scary”? Or is it the public which demands these stories?
Meanwhile, as some are expressing worry, technology keeps pushing forward:
A faster DNA sequencing machine and streamlined analysis of the results can diagnose genetic disorders in days rather than weeks, as reported today in Science Translational Medicine.
Up to a third of the babies admitted to neonatal intensive care units have a genetic disease. Although symptoms may be severe, the genetic cause can be hard to pin down. Thousands of genetic diseases have been described, but relatively few tests are available, and even these may detect only the most common mutations.
I got a notification today from Ian Logan that he set up a page on my genotype using a method which detects rare homozygous SNPs in the ~1 million markers I put up from my 23andMe results. My raw data is online, so anyone can analyze it. Here is the summary of my results:
The program finds about 50 ‘rare/uncommon’ SNPs from the 900,000+ tested by 23andMe.
There are no ‘homozygous-recessive’ results (surprisingly, as 1-2 might be expected).
There is a list of other individuals, and sure enough most of them do have a rare recessive homozygous locus or two. I assume that ascertainment bias (the technology finding variation in Europeans better than non-Europeans in most cases) wouldn’t explain the result in my case, because I should have less variation, not more (less variation would presumably result in more homozygous recessives). So I am thinking it may simply be that because I’m from a population with greater genetic variation (South Asians) I am less likely to yield a homozygous recessive.
I re-emphasized to John the importance to the genetic genealogy community that AncestryDNA release our genetic data to us. I mentioned that my colleagues and I were happy to discover that Ken Chahine’s statements to the Presidential Commission for the Study of Bioethical Issues in Washington D.C. on August 1st were in line with our belief that our genetic data belongs to us (video and transcript). During the second session, Dr. Chahine stated that “the customer retains ownership of their DNA and data”. However, we feel that AncestryDNA’s policies do not currently reflect this. John reiterated what I have been told before, which is that they are genuinely considering the best way to deliver this data to us. In response to my persistence, John told me that they are aware that this is important to me, but that they have to take into consideration everyone’s feedback, not just mine. As a result, giving us access to our genetic data is not at the top of their list of priorities. He explained that they read lots of feedback and do a significant number of surveys and focus groups in order to determine what is most important to their customers and, by that process, their priorities are dictated….
Slate reposts a piece from New Scientist, Do You Really Want To Know Your Baby’s Genetics? It is arranged as a series of questions which might arise from the new information. For me my frustration with this sort of discussion is rooted in reviewing old articles about “test-tube babies” in major newspapers from the 1970s and early 1980s. Today in vitro fertilization is banal and commonplace, but many of the same concerns were voiced back then which you see cropping up now in regards to personal genomics. My issue is not concern as such, but its inchoate character. It is not uncommon for me to encounter people pursuing postgraduate work in science who express the opinion that “it’s scary,” the “it” being genetic information. When further queried the fear is generally layers upon layers of formless disquiet, some confusion about the specific details, as well as a default stance toward the “precautionary principle.”
Interesting story in The New York Times, Genes Now Tell Doctors Secrets They Can’t Utter:
One of the first cases came a decade ago, just as the new age of genetics was beginning. A young woman with a strong family history of breast and ovarian cancer enrolled in a study trying to find cancer genes that, when mutated, greatly increase the risk of breast cancer. But the woman, terrified by her family history, also intended to have her breasts removed prophylactically.
Her consent form said she would not be contacted by the researchers. Consent forms are typically written this way because the purpose of such studies is not to provide medical care but to gain new insights. The researchers are not the patients’ doctors.
But in this case, the researchers happened to know about the woman’s plan, and they also knew that their study indicated that she did not have her family’s breast cancer gene. They were horrified.
“We couldn’t sit back and let this woman have her healthy breasts cut off,” said Barbara B. Biesecker, the director of the genetic counseling program at the National Human Genome Research Institute, part of the National Institutes of Health. After consulting the university’s lawyer and ethics committee, the researchers decided they had to breach the consent stipulations and offer the results to the young woman and anyone else in her family who wanted to know if they were likely to have the gene mutation discovered in the study. The entire family — about a dozen people — wanted to know. One by one, they went into a room to be told their result.
“It was a heavy and intense experience,” Dr. Biesecker recalled.
Around the same time, Dr. Gail Jarvik, now a professor of medicine and genome science at the University of Washington, had a similar experience. But her story had a very different ending.
She was an investigator in a study of genes unrelated to breast cancer when the study researchers noticed that members of one family had a breast cancer gene. But because the consent form, which was not from the University of Washington, said no results would be returned, the investigators never told them, arguing that their hands were tied. The researchers said an ethics board — not they — made the rules.
Dr. Jarvik argued that they should have tried to persuade the ethics board. But, she said, “I did not hold sway.”
By now you have probably read in The New York Times, or on the blogs, about the new paper in Nature which reports on the empirical trend toward the children of older fathers carrying more de novo mutations. Really all you need is this figure:
It’s easy to see genomic data regulation in romantic narrative terms — The plucky little guys who want to be free! The big, bad institutions who want to control them! — and it’s also a trap. Interpreting genomic information in a medically useful way is very, very complicated. It’s easy to do badly — and people may make life-altering decisions based on bad information.
Gene-testing companies already have a track record of offering tests unsupported by clinical evidence, such as CYP450 testing to determine antidepressant dosage. A let-the-market-regulate-itself, buyer-beware approach isn’t any more desirable than it would be for new drugs.
We’ve discussed this before. The shorter perspective from me is that on principle I don’t object to regulation, but when viewed across the constellation of things which our government regulates, I don’t see the case for direct-to-consumer genomic services being monitored closely. A result from 23andMe will not kill you, though it may lead to a sequence of actions which may kill you. But this is unfortunately a problem with the whole diet industry, which is often based on unsupported fads and fashions, and has a much larger social impact. Nutrition is very complicated with incredible real life consequences, and yet regulating it would frankly be a fool’s errand. You may destroy the American diet publishing industry, but you can’t prevent internet message boards. Similarly, the SNP-chip results themselves are commodities, and with client and server analytic software proliferating in the next few years the reality is that the market will regulate itself! And unfortunately, the impact on peoples’ lives will be the same, for good or bad, as the diet industry.
Analysis of cell-free fetal DNA in maternal plasma holds promise for the development of noninvasive prenatal genetic diagnostics. Previous studies have been restricted to detection of fetal trisomies, to specific paternally inherited mutations, or to genotyping common polymorphisms using material obtained invasively, for example, through chorionic villus sampling. Here, we combine genome sequencing of two parents, genome-wide maternal haplotyping, and deep sequencing of maternal plasma DNA to noninvasively determine the genome sequence of a human fetus at 18.5 weeks of gestation. Inheritance was predicted at 2.8 × 10⁶ parental heterozygous sites with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit with limited specificity. Subsampling these data and analyzing a second family trio by the same approach indicate that parental haplotype blocks of ~300 kilo–base pairs combined with shallow sequencing of maternal plasma DNA is sufficient to substantially determine the inherited complement of a fetal genome. However, ultradeep sequencing of maternal plasma DNA is necessary for the practical detection of fetal de novo mutations genome-wide. Although technical and analytical challenges remain, we anticipate that noninvasive analysis of inherited variation and de novo mutations in fetal genomes will facilitate prenatal diagnosis of both recessive and dominant Mendelian disorders.
Here’s the last paragraph:
As a follow up to my post below on the thick coverage of European information in genealogical and genomic databases, here are the “Ancestry Finder” matches from 23andMe for my daughter using the default settings:
If I increase sensitivity India does come up, at 0.1%, second to last in a very long list of European nations. I’m pointing this peculiarity out because my daughter is 50 percent South Asian, but this element of her ancestry doesn’t find many matches because there aren’t many people out there in the database to match. In contrast, because she is 1/8th Norwegian (her great-great grandparents were immigrants from the Oslo area; thanks Ancestry.com!) this “block” jumps out, and lines up with many people in their database.
This isn’t just an exceptional case. Here’s the result for a friend who is 50 percent East Asian (Chinese) and 50 percent American white:
The old warning rears its ugly head: the tool is just a tool, and must be used with an understanding of what it can and can’t do. If you decrease sensitivity many South Asians actually match people from European nations before they do people from India. Why? Part of it is probably that many South Asian groups are highly endogamous, which dampens intra-South Asian segment sharing. And the other part is that the sample size of Europeans is so large that random matches with this population are just as, or more, likely than genuine matches with the smaller number of South Asians.
I follow CeCe Moore’s blog posts on scientific genealogy pretty closely. But it’s more because of my interest in personal genomics broadly, rather than scientific genealogy as such. My own knowledge of my family’s past beyond the level of grandparents is very sketchy. This despite the fact that I know I have two very well documented lines of ancestry which I could follow up on, my paternal lineage, and the paternal lineage of my mother’s maternal grandfather. I don’t have a great interest in this beyond the barest generalities, and my parents tend to have a rather disinterested stance as well. Why? I can’t help but wonder if part of the issue is that unlike many South Asians my family has a relatively diverse background, so it isn’t as if we are sustained by a coherent self-identity as members of a sub-ethnicity (Bengalis are not tribal, so lineage groups are more ad hoc and informal). Additionally, there is probably some self-selection in the type of personalities who would transplant themselves across continents and are willing to spend the majority of their lives in a nation not of their birth.