Image credit: Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings
I really love the paper Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. I first read it about six years ago. The result is rather straightforward, but the problem is empirically a moderately deep one. Modern analytic genetics as the fusion between Mendelism and biometrics began with R. A. Fisher’s The Correlation between Relatives on the Supposition of Mendelian Inheritance in 1918. But note, that paper assumed particular relatedness between relatives. As highlighted in the above paper the expected values for most categories of relatedness always had a variance component which was unaccounted for, and so reduced the power of the methodology to ascertain the extent of heritability. The relatedness you can expect between any two siblings is ~0.50, and that is also the average across all siblings. But the reality is that in most cases two given siblings will not share half their genes identical by descent.
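To make the spread around 0.50 concrete, here is a minimal simulation sketch. It is not the paper's method. It assumes crossovers fall as a Poisson process with no interference, 22 autosomes of an equal 1.6 Morgans each (a crude stand-in for a real genetic map), and identical maps in both parents. The names `CHROM_LENGTHS`, `shared_length`, and `sibling_ibd` are my own.

```python
import random
import statistics

# Assumption: 22 autosomes of equal genetic length (Morgans), not a real map.
CHROM_LENGTHS = [1.6] * 22


def shared_length(length, rng):
    """Length over which two gametes from ONE parent carry the same
    grandparental haplotype, on a chromosome `length` Morgans long."""

    def crossovers():
        # Crossover positions as a Poisson process, rate 1 per Morgan.
        points, pos = [], rng.expovariate(1.0)
        while pos < length:
            points.append(pos)
            pos += rng.expovariate(1.0)
        return points

    # Each gamete starts on a random grandparental haplotype, so the two
    # start on the same one half the time.
    same = rng.random() < 0.5
    breaks = sorted(crossovers() + crossovers())
    shared, prev = 0.0, 0.0
    for b in breaks + [length]:
        if same:
            shared += b - prev
        same = not same  # a crossover in either gamete flips the state
        prev = b
    return shared


def sibling_ibd(rng, chrom_lengths=CHROM_LENGTHS):
    """Realized genome-wide IBD proportion for one pair of full siblings,
    averaged over the paternal and maternal contributions."""
    total = sum(chrom_lengths)
    shared = sum(shared_length(c, rng) + shared_length(c, rng)
                 for c in chrom_lengths)
    return shared / (2 * total)


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [sibling_ibd(rng) for _ in range(2000)]
    print("mean:", round(statistics.mean(pairs), 3))
    print("sd:  ", round(statistics.stdev(pairs), 3))
```

The mean sits at the textbook 0.50, but individual pairs scatter around it, and that scatter is the variance the classical expected-value approach leaves unused.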
Interesting story in The San Jose Mercury News, Open-source science helps San Carlos father’s genetic quest:
“We used materials that are public, freely available,” said Rienhoff, a physician and scientist, as Beatrice frolicked nearby. “And everything we’ve learned we’ve put back out there, in the public domain. It’s for the patient’s good, and the public good.”
Born with small, weak muscles, long feet and curled fingers, Beatrice confounded all the experts.
No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.
Rienhoff — a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle — launched a search.
He combed the publicly available medical literature, researching diseases, while jotting down each new clue or theory. Because her ailment is so rare, he knew no big labs or advocacy groups would be interested.
I noticed today that GEDmatch is trying to raise funds to cover the cost of their web services. What are those services? Basically if you get raw data back from direct-to-consumer genotyping firms GEDmatch allows you to run further analytics. You can do some of these yourself…but most people aren’t going to know how to convert their files into pedigree format and use PLINK. As genotype data becomes more and more common there’ll be more need for analytic services like GEDmatch, whether for profit or for fun.
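For the curious, the conversion step looks roughly like this. This is a sketch, assuming the usual 23andMe raw layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines) and the standard PLINK text formats (.map: chromosome, ID, genetic distance, position; .ped: six ID columns, then allele pairs). The function name `convert_23andme` and the default IDs are hypothetical, and I treat indel codes as missing for simplicity.

```python
def convert_23andme(lines, family_id="FAM1", individual_id="IND1"):
    """Turn 23andMe raw-data lines into PLINK .map rows and one .ped row."""
    map_rows, alleles = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        rsid, chrom, pos, genotype = line.split("\t")
        if len(genotype) == 1:
            genotype *= 2  # haploid call (e.g. male X): write it twice
        if len(genotype) != 2 or any(a not in "ACGT" for a in genotype):
            genotype = "00"  # no-calls and indel codes become missing
        # Genetic distance is unknown here, so it is written as 0.
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.extend(genotype)
    # Parents and sex unknown (0), phenotype missing (-9).
    ped_row = " ".join([family_id, individual_id, "0", "0", "0", "-9", *alleles])
    return map_rows, ped_row


if __name__ == "__main__":
    sample = [
        "# rsid\tchromosome\tposition\tgenotype",
        "rs1\t1\t100\tAG",
        "rs2\tX\t200\tC",
        "rs3\t2\t300\t--",
    ]
    map_rows, ped_row = convert_23andme(sample)
    print("\n".join(map_rows))
    print(ped_row)
```

Newer PLINK builds (1.9) can also read this format directly via `--23file`, which is the less error-prone route if you have it.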
One can imagine a near future where much of the work can be offloaded to desktop applications (e.g., Promethease does this for traits and diseases). But the problem is that there are greater returns to the analysis when you aggregate the source data into huge agglomerations, and if people are doing their own analyses on their own systems that’s not going to happen (the GWAS information in Promethease uses aggregated information implicitly with the studies they rely on). This is why GEDmatch and openSNP are important.
In any case, if you have used GEDmatch and wish to give, this would be a good time.
AncestryDNA believes that our customers have the right to their own genetic data. It is your DNA, after all. So we’re working to provide access to your raw DNA data in early 2013, which includes related security enhancements to ensure its safety during every step of the process. Moving forward, we plan to add even more tools and improvements for our customers, and any new features will be available to all AncestryDNA members.
If the rights of the customers to own their own data were so important to them they should have front-loaded this feature. As it is, they didn’t, and as many bloggers noted the firm had stated they didn’t have plans to roll out this feature in the near future. What changed? I don’t know the details, but I suspect they realized that many of us who complained in the past were going to continue to complain constantly. Combine that with the contrast with competitors like 23andMe, and I assume they realized this just wasn’t going to solve itself if they ignored it. The key here is follow up. I’ll assume “early 2013” is no later than March 31st (the first quarter of the year). If AncestryDNA doesn’t have the feature out by then I’ll assume they’re not serious, and will begin trying to make sure that their deficits come up high on Google searches again.
Blogs and word of mouth matter a lot in this domain. I convinced James Miller, author of Singularity Rising, to get his parents genotyped this weekend. Also, after more than two years of harassment a friend who works at Google finally got typed, and will be sending me his data.
John Hawks points me to a critique of NPR coverage of personal genomics. In defense of NPR they seem like Physical Review Letters in comparison to other media, such as the BBC. But I do wonder what the causality here is. Does the media lead us to the proposition that “genetics is scary”? Or is it the public which demands these stories?
Meanwhile, as some are expressing worry, technology keeps pushing forward:
A faster DNA sequencing machine and streamlined analysis of the results can diagnose genetic disorders in days rather than weeks, as reported today in Science Translational Medicine.
Up to a third of the babies admitted to neonatal intensive care units have a genetic disease. Although symptoms may be severe, the genetic cause can be hard to pin down. Thousands of genetic diseases have been described, but relatively few tests are available, and even these may detect only the most common mutations.
I got a notification today from Ian Logan that he set up a page on my genotype using a method which detects rare homozygous SNPs in the ~1 million markers I put up from my 23andMe results. My raw data is online, so anyone can analyze it. Here is the summary of my results:
The program finds about 50 ‘rare/uncommon’ SNPs from the 900,000+ tested by 23andMe.
There are no ‘homozygous-recessive’ results (surprisingly, as 1-2 might be expected).
There is a list of other individuals, and sure enough most of them do have a rare recessive homozygous locus or two. I assume that ascertainment bias (the technology finding variation in Europeans better than non-Europeans in most cases) wouldn’t explain my case, because I should have less variation, not more (less variation would presumably result in more homozygous recessives). So I am thinking it may simply be that because I’m from a population with greater genetic variation (South Asians) I am less likely to yield a homozygous recessive.
I re-emphasized to John the importance to the genetic genealogy community that AncestryDNA release our genetic data to us. I mentioned that my colleagues and I were happy to discover that Ken Chahine’s statements to the Presidential Commission for the Study of Bioethical Issues in Washington D.C. on August 1st were in line with our belief that our genetic data belongs to us (video and transcript). During the second session, Dr. Chahine stated that “the customer retains ownership of their DNA and data”. However, we feel that AncestryDNA’s policies do not currently reflect this. John reiterated what I have been told before, which is that they are genuinely considering the best way to deliver this data to us. In response to my persistence, John told me that they are aware that this is important to me, but that they have to take into consideration everyone’s feedback, not just mine. As a result, giving us access to our genetic data is not at the top of their list of priorities. He explained that they read lots of feedback and do a significant number of surveys and focus groups in order to determine what is most important to their customers and, by that process, their priorities are dictated….
Slate reposts a piece from New Scientist, Do You Really Want To Know Your Baby’s Genetics? It is arranged as a series of questions which might arise from the new information. For me my frustration with this sort of discussion is rooted in reviewing old articles about “test-tube babies” in major newspapers from the 1970s and early 1980s. Today in vitro fertilization is banal and commonplace, but many of the same concerns were voiced back then which you see cropping up now in regards to personal genomics. My issue is not concern as such, but its inchoate character. It is not uncommon for me to encounter people pursuing postgraduate work in science who express the opinion that “it’s scary,” the “it” being genetic information. When further queried the fear is generally layers upon layers of formless disquiet, some confusion about the specific details, as well as a default stance toward the “precautionary principle.”
Interesting story in The New York Times, Genes Now Tell Doctors Secrets They Can’t Utter:
One of the first cases came a decade ago, just as the new age of genetics was beginning. A young woman with a strong family history of breast and ovarian cancer enrolled in a study trying to find cancer genes that, when mutated, greatly increase the risk of breast cancer. But the woman, terrified by her family history, also intended to have her breasts removed prophylactically.
Her consent form said she would not be contacted by the researchers. Consent forms are typically written this way because the purpose of such studies is not to provide medical care but to gain new insights. The researchers are not the patients’ doctors.
But in this case, the researchers happened to know about the woman’s plan, and they also knew that their study indicated that she did not have her family’s breast cancer gene. They were horrified.
“We couldn’t sit back and let this woman have her healthy breasts cut off,” said Barbara B. Biesecker, the director of the genetic counseling program at the National Human Genome Research Institute, part of the National Institutes of Health. After consulting the university’s lawyer and ethics committee, the researchers decided they had to breach the consent stipulations and offer the results to the young woman and anyone else in her family who wanted to know if they were likely to have the gene mutation discovered in the study. The entire family — about a dozen people — wanted to know. One by one, they went into a room to be told their result.
“It was a heavy and intense experience,” Dr. Biesecker recalled.
Around the same time, Dr. Gail Jarvik, now a professor of medicine and genome science at the University of Washington, had a similar experience. But her story had a very different ending.
She was an investigator in a study of genes unrelated to breast cancer when the study researchers noticed that members of one family had a breast cancer gene. But because the consent form, which was not from the University of Washington, said no results would be returned, the investigators never told them, arguing that their hands were tied. The researchers said an ethics board — not they — made the rules.
Dr. Jarvik argued that they should have tried to persuade the ethics board. But, she said, “I did not hold sway.”
By now you have probably read in The New York Times, or on the blogs, about the new paper in Nature which reports on the empirical trend toward the children of older fathers carrying more de novo mutations. Really all you need is this figure:
It’s easy to see genomic data regulation in romantic narrative terms — The plucky little guys who want to be free! The big, bad institutions who want to control them! — and it’s also a trap. Interpreting genomic information in a medically useful way is very, very complicated. It’s easy to do badly — and people may make life-altering decisions based on bad information.
Gene-testing companies already have a track record of offering tests unsupported by clinical evidence, such as CYP450 testing to determine antidepressant dosage. A let-the-market-regulate-itself, buyer-beware approach isn’t any more desirable than it would be for new drugs.
We’ve discussed this before. The shorter perspective from me is that on principle I don’t object to regulation, but when viewed across the constellation of things which our government regulates, I don’t see the case for direct-to-consumer genomic services being monitored closely. A result from 23andMe will not kill you, though it may lead to a sequence of actions which may kill you. But this is unfortunately a problem with the whole diet industry, which is often based on unsupported fads and fashions, and has a much larger social impact. Nutrition is very complicated with incredible real life consequences, and yet regulating it would frankly be a fool’s errand. You may destroy the American diet publishing industry, but you can’t prevent internet message boards. Similarly, the SNP-chip results themselves are commodities, and with client and server analytic software proliferating in the next few years the reality is that the market will regulate itself! And unfortunately, the impact on peoples’ lives will be the same, for good or bad, as the diet industry.
Analysis of cell-free fetal DNA in maternal plasma holds promise for the development of noninvasive prenatal genetic diagnostics. Previous studies have been restricted to detection of fetal trisomies, to specific paternally inherited mutations, or to genotyping common polymorphisms using material obtained invasively, for example, through chorionic villus sampling. Here, we combine genome sequencing of two parents, genome-wide maternal haplotyping, and deep sequencing of maternal plasma DNA to noninvasively determine the genome sequence of a human fetus at 18.5 weeks of gestation. Inheritance was predicted at 2.8 × 10^6 parental heterozygous sites with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit with limited specificity. Subsampling these data and analyzing a second family trio by the same approach indicate that parental haplotype blocks of ~300 kilo–base pairs combined with shallow sequencing of maternal plasma DNA is sufficient to substantially determine the inherited complement of a fetal genome. However, ultradeep sequencing of maternal plasma DNA is necessary for the practical detection of fetal de novo mutations genome-wide. Although technical and analytical challenges remain, we anticipate that noninvasive analysis of inherited variation and de novo mutations in fetal genomes will facilitate prenatal diagnosis of both recessive and dominant Mendelian disorders.
Here’s the last paragraph:
As a follow up to my post below on the thick coverage of European information in genealogical and genomic databases, here are the “Ancestry Finder” matches from 23andMe for my daughter using the default settings:
If I increase sensitivity India does come up, at 0.1%, second to last in a very long list of European nations. I’m pointing this peculiarity out because my daughter is 50 percent South Asian, but this element of her ancestry doesn’t find many matches because there aren’t many people out there in the database to match. In contrast, because she is 1/8th Norwegian (her great-great grandparents were immigrants from the Oslo area; thanks Ancestry.com!) this “block” jumps out, and lines up with many people in their database.
This isn’t just an exceptional case. Here’s the result for a friend who is 50 percent East Asian (Chinese) and 50 percent American white:
The old warning rears its ugly head: the tool is just a tool, and must be used with an understanding of what it can and can’t do. If you decrease sensitivity many South Asians actually match people from European nations before they do people from India. Why? Part of it is probably that many South Asian groups are highly endogamous, which dampens intra-South Asian segment sharing. And the other part is that the sample size of Europeans is so large that random matches with this population are just as, or more, likely than genuine matches with the smaller number of South Asians.
I follow CeCe Moore’s blog posts on scientific genealogy pretty closely. But it’s more because of my interest in personal genomics broadly, rather than scientific genealogy as such. My own knowledge of my family’s past beyond the level of grandparents is very sketchy. This despite the fact that I know I have two very well documented lines of ancestry which I could follow up on, my paternal lineage, and the paternal lineage of my mother’s maternal grandfather. I don’t have a great interest in this beyond the barest generalities, and my parents tend to have a rather disinterested stance as well. Why? I can’t help but wonder if part of the issue is that unlike many South Asians my family has a relatively diverse background, so it isn’t as if we are sustained by a coherent self-identity as members of a sub-ethnicity (Bengalis are not tribal, so lineage groups are more ad hoc and informal). Additionally, there is probably some self-selection in the type of personalities who would transplant themselves across continents and are willing to spend the majority of their lives in a nation not of their birth.
To test this, I’ll track how genes affect attitudes during the 2012 US Presidential election by running several surveys of twins. Why twins? Well, there are two kinds of twins: identical twins (called monozygotic, or MZ) and fraternal twins (called dizygotic, or DZ). MZ twins share 100% of their DNA, but DZ twins share only about 50% of their DNA just like normal siblings. Every twin is born around the same time as his or her co-twin, so each pair of twins shares a common upbringing. If politics is mostly about upbringing (as traditional theories would have us believe), then fraternal (DZ) twins should be just as similar on average as identical (MZ) twins. But if genes do play a role in political attitudes alongside upbringing, then DZ twins should be less similar to each other than MZ twins, since MZ twins share more of their genes. So by tracking attitude changes during the election, if the attitudes of identical twins change together more than the attitudes of fraternal twins, this would suggest that genes play a role in political attitude change.
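The MZ versus DZ logic above is the classical twin design, and its back-of-the-envelope version is Falconer's formula. The sketch below assumes the standard simplifications (additive genetic effects, equal shared environments for both twin types, no assortative mating). It is not necessarily the analysis this study will run, and the correlations in the usage lines are made-up numbers for illustration only.

```python
def falconer(r_mz, r_dz):
    """Split trait variance using MZ and DZ twin correlations.

    Returns (h2, c2, e2):
      h2 = additive genetic share (heritability)
      c2 = shared (family) environment share
      e2 = non-shared environment share, including measurement error
    """
    h2 = 2 * (r_mz - r_dz)  # MZ share twice as much segregating DNA as DZ
    c2 = 2 * r_dz - r_mz    # similarity left over once genes are accounted for
    e2 = 1 - r_mz           # whatever makes identical twins differ
    return h2, c2, e2


if __name__ == "__main__":
    # Hypothetical correlations, not data from any study.
    h2, c2, e2 = falconer(r_mz=0.6, r_dz=0.4)
    print(f"h2={h2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

If upbringing were the whole story the two correlations would be equal and `h2` would come out at zero, which is exactly the contrast the survey design is built to detect.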
Second, Genomes Unzipped put up a very complimentary review of openSNP. I just went in and added a bunch of phenotypes for me. I’d say openSNP is one of those attempts to bridge the space between the type of people who find 23andMe a bit overwhelming, and those who are comfortable using plink and phasing their genotypes.
The Awl had a rather unoriginal piece up recently, Everything I Didn’t Learn From Taking A Personal Genome Test (this is part of a genre which will probably crest in the next few years, before widespread genotyping becomes common, demystifying the whole enterprise). Misha Angrist has a pretty levelheaded response. There are two things I would like to emphasize:
1) A non-trivial minority of people do receive actionable information from personal genomic results. By and large I am skeptical of individual risk prediction, and I communicate that skepticism to friends. But in one case a friend ended up with a large effect macular degeneration mutation. Before he had signed up for testing I told him to sleep through the risk prediction part. I don’t do that now. Chances are there won’t be any surprises. But some serious information will be received by 1 in 10 to 1 in 100.
2) The “recreational” part having to do with stuff like ancestry inference is actually pretty robust. You could, for example, market an analytic and visualization tool which shows how closely related you are to near relatives. This isn’t going to be earth-shattering, but I do think that there are a lot more fun angles out there for the taking. A more professional version of GEDmatch.
When Lo licensed his technology to Sequenom, he stipulated that it could not be used for sex selection. Rabinowitz says Natera won’t test for sex at this point, either. But how long such provisions will hold is unclear. Meanwhile, NIPD’s reach is expanding as the technology used to analyze cffDNA improves. In December 2010, Lo published a paper in Science Translational Medicine showing that in principle, at least, scientists can piece together the entire fetal genome from cffDNA. Lo says that exceeded even his own expectations: “If you asked me prior to 2008, I would have probably said that was science fiction.”
At the time his paper was published, the process cost $200,000. Now, with the cost of DNA sequencing dropping faster than that of computing power, he estimates the bill may come to one-tenth of that—still expensive, but no doubt tempting for some parents. Lo wagers complete fetal genome testing might be widely available in a clinical setting within a decade. What fetal genes might one day suggest about a baby’s eye color, appearance, and intellectual ability will be useful to parents, not insurers. But with costs coming down and insurers interested in other aspects of the fetal genome, a Gattaca-like two-tiered society, in which parents with good access to health care produce flawless, carefully selected offspring and the rest of us spawn naturals, seems increasingly plausible.
First, it’s rather crazy that as we live and breathe it is on the order of $20,000 to get a genome of your unborn children! I say on the order because no one knows, and I assume that they’re being optimistic here for media consumption. We plan to get screening for karyotype scale issues for our next child, so I keep track of this area with some interest.
All that being said, without pre-implantation genetic diagnosis it’s going to be very unlikely that you will get the “perfect child,” barring gene therapy. I may be unimaginative, but I can’t see the actionable use of a relatively dense genotype, let alone a full genome, at this stage once you eliminate the risks of very problematic diseases. I suppose at this point I can divulge that I tried to get my daughter’s genetic material from a c.v.s., so she could get typed while she was in utero, but that was mostly for the “wow!” factor (for what it’s worth, it’s really hard to get genetic material back from large biomedical firms).
Finally, I don’t find the beating-around-the-bush about “tricky ethical questions” that is par for the course of these sorts of pieces useful. The reality is that most of the public finds this aspect of personal genomics “scary.” You don’t need to genuflect to it, just accept it as a given. Rather, lay out the issues in explicit detail, and let the people make their own judgement.
As I have indicated before, my daughter has a family tree where everyone out to 0.25 coefficient of relatedness has been genotyped by 23andMe. This is convenient in many ways. Before, relatedness was a theory. Now relatedness can be ascertained on the genomic level. Sometimes this can lead to peculiar consequences. “On paper” my daughter is 1/8 Scandinavian. Or 12.5%. But truly the expected value is 13.5%! (weighting by contributions from each maternal grandparent). Still, this remains an expected value. I would need a large sample of Scandinavians from that locale to make a truly precise guess as to the genetic contribution. Similarly, though I come in at about 15 percent East Asian, my daughter looks to be a bit more East Asian than you’d expect based on that value (i.e., closer to 8-8.5 percent; I’ve run her genotype more than a dozen times now). This may be a bias in the methodology, or, more likely, it is simply the sampling error from my genome (I contributed more East Asian segments in the chromosomes passed down).
In any case, 23andMe has a “family inheritance” feature which is very convenient. It illustrates visually chromosome by chromosome the extent to which two individuals match genomic segments. Presumably this is useful for those who are distant cousins, who may match on a segment here and there. Instead of just focusing on one base pair, A/C/G/T, the method looks at the correlations of bases across a sequence of the chromosome. Below are the visualizations for matches of each individual with my daughter, in sequence: father, mother, paternal grandfather, paternal grandmother, maternal grandfather, maternal grandmother, paternal uncle, paternal uncle, paternal aunt, and maternal uncle. And no, I don’t know why it has an XY in the plots. For those of you without a biological background I hope that this can help in getting across how Mendelism manifests in a concrete manner. And if you do have a biological background, you can infer from these matches other interesting information about the meiotic process.
A week ago I reported that according to 23andMe I’m 40% Asian, and she is 8% Asian (in the future if I say “she” without explanation, you know of whom I speak). Obviously something is off here. The situation resolved itself when I tuned my parameters and increased my sampled populations in Interpretome. By now I’ve already done the estimates of recombination on the chromosomes which came together to produce her, and the realized value of 8 percent instead of 20 percent “Asian” simply can not be due to a particular set of unlikely crossing over events. From what I can gather it seems like ancestry painting should be viewed as a qualitative rather than a quantitative assessment. This sounds really strange when you are given percentages, but the results are strange, and obviously wrong too often in terms of the specific values.
Here’s an admixture plot which shows more realistically informative values:
Michelle tipped me off to 23andMe’s new initiative to get Parkinson’s disease sufferers genotyped. Basically, if you are a sufferer, you get the service for free. The goal is presumably to increase the sample size so as to pick up new possible associations. But a question: can you think of a downside for Parkinson’s disease sufferers? A lot of people have genetic privacy concerns, but if you manifest a disease like Parkinson’s I suspect that’s the least of your worries.