Note: please read the earlier post on this topic if you haven’t.
The above image is from 23andMe. It’s from a feature which seems to have been marginalized a bit with their ancestry composition. Basically it is projecting 23andMe customers on a visualization of genetic variation from the HGDP data set. This is actually a rather informative sort of representation of variation. But there has always been an issue with the 23andMe representation: you are projected onto their invariant data set. In other words, you can’t mix & match the populations so as to explore different relationships. The nature of the algorithm and representation produces strange results, so varying the population sets is often useful in smoking out the true shape of things.
With the MDS feature I wrote about yesterday you can now compute positions with different weights of populations and mixes. This post will focus on how to manipulate the overall data set. You should have PHYLO from the earlier post. Open up the .fam file. It should look like this:
1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.
2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.
At this point if you have spare cash why not shell out $300 for a raw copy of your genotype? (yes, I know 23andMe provides other services) I’m sure many readers spend $100 on nice meals now and then. That’s one day. Your genotype won’t ‘depreciate’ in a literal sense, and more practically until whole-genome sequencing gets affordable within the next decade (i.e., < 10 years) 1 million SNPs is a pretty good deal. And not to be morbid, but it is probably best to get older family members typed now (though if they have had hospital stays you can probably later retrieve genetic material, it will be a bureaucratic pain).
The reason I’m posting this now though is that I received a notification about a $50 discount code from 23andMe. Here it is: YHPRD7. It’s valid for the next few days. $50 isn’t trivial for most people, so perhaps it will prompt a few here to go and purchase.
23andMe has done some great things, and I highly recommend its service to friends. But I’m really glad that CeCe Moore is being consulted by them in regards to improving their ancestry feature set. Below are the “ancestry paintings” for myself & my daughter.
According to 23andMe I’m 40% Asian, and she is 8% Asian. Obviously something is off here. The situation easily resolved itself when I tuned my parameters and increased my sampled populations in Interpretome. But it just goes to show you the limits of this sort of thing without fine-grained control of the details of the analysis.
From 23andMe: “To show our appreciation and to encourage others to join in this research revolution we are giving you a $50 coupon that you can share with as many people as you like. This coupon expires in 7 days (August 9, 2011) so make sure you get the word out fast.” At current prices that works to 24% off for the yearly price ($9/month X 12 months + $99).
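For anyone who wants to check the 24% figure, here is a minimal sketch of the arithmetic. The variable names are my own; the dollar amounts are the ones quoted above.

```python
# Prices as quoted above: $99 up front plus $9/month for 12 months.
upfront = 99
monthly = 9
months = 12
coupon = 50

yearly_price = upfront + monthly * months      # total first-year cost
discount_pct = 100 * coupon / yearly_price     # coupon as a share of that cost

print(f"Yearly price: ${yearly_price}")
print(f"Discount: {discount_pct:.1f}%")
```

The total comes to $207, and $50 off that is a bit over 24%.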
(this is for “new customers only”)
Second, Heather Frawley. I’ve uploaded her text file as well as pedigree format at RapidShare as a zip file. Click “Free Download” at the bottom right of the page. It’ll take about 5 minutes to pull down the 10 MB file.
Remember, if you want to have your public genotype posting publicized or want me to upload and format it, email me at contactgnxp -at- gmail -dot- com.
According to Your Genetic Genealogist, it is:
1,000 African American
5,500 East Asian
3,400 South Asian
4,900 Southern European
6,200 Ashkenazi Jewish
56,000 Northern European
1,000 First generation from two continents
I’m kind of surprised that there are so few African Americans, since the marginal return on ancestry matching technologies for the black American community is going to be higher than for other groups. If these numbers are true then I have on the order of ~10% of the 23andMe genotypes for black Americans in the African Ancestry Project. Zack Ajmal referring to the over 3,000 South Asians quips: “Now if 10-20% of them would participate in Harappa Ancestry Project!” My main concern is that if HAP gets more well known Zack will have hundreds of Tamil Brahmins sending him pretty much duplicate genotypes.
Do you share your information with others? How has your personal genetic information influenced your lifestyle and the way you approach your health and medical decisions? Can genetic information create new communities and connections?
The Social Networking and Personal Genomics Study at the Center for Biomedical Ethics invites participants between the ages of 18 and 75 to spend approximately 2 hours with us in a focus group setting. Participants must have purchased direct-to-consumer personal genetic information from 23andMe, Inc., shared their information with others, and be willing to discuss their perspectives and experiences. Focus group members will receive a $50 gift card for their participation and childcare will be available on an as-needed basis at no cost. For additional information or to enroll, please contact Simone Vernez, Project Manager, by email at email@example.com or by telephone at (650) 723-9364. For more information on the study itself, including specific research aims and funding, please visit http://bioethics.stanford.edu/research/SocialNetworkingandPersonalGenomics.html. For general information about participant rights, contact 1-866-680-2906.
Dr. Daniel MacArthur at Genomes Unzipped:
23andMe announced yesterday that it will now be releasing information on Alzheimer’s disease risk markers in the APOE gene to customers who purchased their recently upgraded v3 test. The APOE markers are famously associated with a major increase in risk for late-onset Alzheimer’s, with individuals carrying two copies of the ε4 version of the gene being around 15 times more likely than average to develop the disease. Customers who have been tested on the v3 platform will be able to access their APOE status after “unlocking” it; customers on earlier versions of the test will need to upgrade to get access. You can see screenshots of the unlocking and results pages here.
For a limited time, you can order a 23andMe kit for $0 up front, plus a 12-month commitment to our Personal Genome Service® at $9/month. This is down from the regular price of $199 plus $9/month.
This promotional price will be available from 12:00AM PST until 11:59PM PST on Monday 4/11/11, or while supplies last!
Update: Sale is a go right now. 5 kits per person.
Dan MacArthur points me to this nice post over at Daily Kos, Our Genome Decoded: How Companies Like 23andMe Are Advancing the Field of Personal Genomics:
…However, in the past few years several private biotech companies have started offering a “personal genome service” that involves sequencing the most variable portions of our DNA. The goals are straightforward – to give individuals information about their ancestry and inherited traits. While there are definite limitations – both technically and bioethically – to the amount and type of information that can be obtained from personal genome sequencing, in my case the service answered a lingering question about something important to me, and thus was well worth it.
In this article, I’m going to tell the story about why I chose to purchase a personal genome service, briefly explain how it works, show my interesting results, and finally, provide some commentary on how these services will impact the fields of genomics and medicine.
One step at a time. I also appreciate that Michelle keeps posting on her ADMIXTURE results.
In the very near future you may be forced to go through a “professional” to get access to your genetic information. Professionals who will be well paid to “interpret” a complex morass of statistical data which they barely comprehend. Let’s be real here: someone who regularly reads this blog (or Dr. Daniel MacArthur or Misha’s blog) knows much more about genomics than 99% of medical doctors. And yet someone reading this blog does not have the guild certification in the eyes of the government to “appropriately” understand their own genetic information. Someone reading this blog will have to pay, either out of pocket, or through insurance, someone else for access to their own information. Let me repeat: the government and professional guilds which exist to defend the financial interests of their members are proposing that they arbitrate what you can know about your genome. A friend with a background in genomics emailed me today: “If they succeed in ramming this through, then you will not be able to access your own damn genome without a doctor standing over your shoulder.” That is my fear. Is it your fear? Do you care?
In the medium term this is all irrelevant. Sequencing will be so cheap that it will be impossible for the government and well-connected self-interested parties to prevent you from gaining access to your own genetic information. Until then, they will slow progress and the potential utility of this business. Additionally, this sector will flee the United States and go offshore, where regulatory regimes are not so strict. BGI should give glowing letters of thanks to Jeffrey Shuren and the A.M.A.! This is a power play where big organizations, the government, corporations, and professional guilds, are attempting to squelch the freedom of the consumer to further their own interests, and also strangle a nascent economic sector of start-ups as a side effect.
You are so much more than your genes. So much more than those 3 billion base pairs. But they are a start, a beginning, and how dare the government question your right to know the basic genetic building blocks of who you are. This is the same government which attempted to construct a database of genetic information on foreign leaders. We know very well then who they think should have access to this data. The Very Serious People with a great deal of Power. People with “clearance,” and “expertise,” have a right to know more about your own DNA sequence than you do.
What can you do? What can we do? Can we effect change? I don’t know, I can’t predict the future. But this is what I’m going to do.
Since I know plenty of friends are getting, or just got, their V3 results, I thought I’d pass this on, Open-ended submission opportunity for 23andMe data (#2):
Who is eligible
Everyone who is of European, Asian, or North African ancestry and all four of his/her grandparents are from the same European, Asian, or North African ethnic group or the same European, Asian, or North African country.
Also, Zack has more than 30 individuals in HAP. The “cow belt” is still way underrepresented. The only Bengalis in the data set are my parents.
The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.
Humans are diploid organisms. We have two copies of each gene, inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability. To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can assume reasonably then on the locus which controls blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” alleles. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that on one of those slots the individual has only the possibility of having a brown eye allele. Their probability of variation then is operative only on the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote.
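The 50/50 result can be checked by brute-force enumeration of the possible allele combinations. This is a minimal sketch of the toy example above, not a model of real eye color genetics; the allele labels ("B" for brown, "b" for blue) and the function name are my own.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Return {genotype: probability} for a cross at one biallelic locus.

    Each parent is a two-character string of alleles; each allele is
    transmitted with probability 1/2, so each pairing has weight 1/4.
    """
    dist = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist

# Illustrative labels: "B" = brown-eye allele, "b" = blue-eye allele.
# Swedish grandparent (bb) x Chinese grandparent (BB): the parent must be Bb.
mixed_parent = offspring_distribution("bb", "BB")
# Two Chinese grandparents (BB x BB): the other parent must be BB.
other_parent = offspring_distribution("BB", "BB")

# The grandchild is a cross between the heterozygous and homozygous parents.
grandchild = offspring_distribution("Bb", "BB")
print(mixed_parent, other_parent, grandchild)
```

The first two crosses are deterministic (every child is "Bb" and "BB" respectively). Only the last cross involves chance, and it splits evenly between "BB" and "Bb".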
The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation. Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expected value, as theory predicts, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.
Last week I reported that it turns out that one of my siblings carries a possible Neandertal haplotype on the dystrophin gene. To review, it seems likely that ~3% of the average non-African’s genome is derived from Neandertal populations. But by and large this ancestral quantum seems broadly dispersed through the genome of individuals, so that there isn’t a particular set of loci which are Neandertal, as such. As an analogy, ~20-25% of the genome of an average black American is derived from Europe because of white American ancestry. But you can’t usually predict from that on which locus the “white” alleles will be found. The main exception to this will be loci where you might suspect selection will be operative, such as those implicated in malaria defense (some of them have negative consequences).
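To get a feel for why sibling relatedness scatters around 0.50, here is a rough simulation sketch. It rests on a big simplifying assumption that is mine, not something measured: the genome is treated as a fixed number of independently inherited segments (80 is an arbitrary illustrative value), whereas real recombination is messier. At each segment a pair of siblings either received the same copy from a given parent or they didn't, each with probability 1/2.

```python
import random
import statistics

def sibling_sharing(n_segments, rng):
    """Fraction of the genome two full siblings share by descent (toy model)."""
    shared = 0.0
    for _ in range(n_segments):
        # Did both siblings get the same one of the father's two copies?
        paternal_match = rng.random() < 0.5
        # Same question for the mother's two copies.
        maternal_match = rng.random() < 0.5
        shared += (paternal_match + maternal_match) / 2
    return shared / n_segments

rng = random.Random(0)   # fixed seed so reruns give the same numbers
N_SEGMENTS = 80          # ASSUMPTION: illustrative count of independent segments

values = [sibling_sharing(N_SEGMENTS, rng) for _ in range(5000)]
mean_sharing = statistics.mean(values)
sd_sharing = statistics.stdev(values)

print(f"mean = {mean_sharing:.3f}, sd = {sd_sharing:.3f}")
print(f"range = {min(values):.3f} to {max(values):.3f}")
```

The mean sits near 0.50, as Mendelian theory says it should, but individual sibling pairs land noticeably above or below it. The fewer independent segments you assume, the wider the spread, which is why the physical organization of the genome matters here.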
The dystrophin haplotype though has higher frequencies in some populations than expectation. ~9% in non-Africans as a whole, and higher in some groups. So there was a reasonable expectation that people might find that they carried it snooping through their genomes. Now that my parents (RF and RM) have come through, as well as sibling #2 (RS2), I can show you this:
The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.
Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling. And finally, yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which is mostly outside of South Asia. In case you care, I asked my father her background, and like my patrilineage she was a “Khan,” though an unrelated one (“Khan” is just an honorific). I received these results before the total genome assessment, and so initially assumed this confirmed my hunch that my father had some unknown recent ancestry of “eastern” provenance. But it turns out my hunch is probably wrong. In fact, my parents have about the same “eastern” proportion, with my mother slightly more! My expectation was that perhaps my mother would be around 25-30% “Asian,” and my father above 50%. The reality turns out that my father is 38%, and my mother 40%.
Image credit: f_mafra
Below are the “Ancestry Paintings” generated by 23andMe for my family (so far). What you see are the 22 non-sex chromosomes, which have two copies each, and assignments to “Asian,” “European,” and “African,” ancestry groups. The reference populations to generate these assignments come from the HapMap, the northern European sample of white Americans from Utah, Chinese from Beijing, Japanese from Tokyo, and ethnic Yoruba from Nigeria. What the assignment to one of these classes denotes is that that region of the genome is closest to that category in identity. It does not imply that your recent ancestry is European or Asian (African is probably a different matter, but there are many complaints about the results for African Americans and East Africans in the 23andMe forums). This caveat is especially important for South Asians, because we generally find that we’re ~75% European and ~25% Asian. All that means is that though most of our genetic affinity is with Europeans, a smaller fraction seems to resemble Asians more. Via “gene sharing” on 23andMe I can see that the Asian fraction varies from ~35% in South India and Sri Lanka, to ~10% in Pakistan and Punjab. This is not because South Indians have more East Asian ancestry than Punjabis. Rather, to a great extent the South Asian genome can be decomposed into two ancestral elements, one with a distant, but closer, affinity to populations of eastern Eurasia, and one with a close affinity to populations of western Eurasia. What some have termed “Ancient South Indians” (ASI) and “Ancient North Indians” (ANI). ASI ancestry, which is probably just a touch under 50% in South Asians overall, seems to shake out then as somewhat more Asian than European.* The fraction of ASI increases as one moves south and east in South Asia (and as one moves down the caste status ladder).
There is pretty much a 100% probability that I carry Neandertal origin genes, since I’m Eurasian. That being said, I hadn’t looked too closely into the matter in regards to my own genome, because the whole “which SNPs are Neandertal” issue has been pretty dicey. But after the “Neandertal dystrophin” paper sniffing for whether you carry a specific Neandertal haplotype got a whole lot easier. The authors provided the markers and their associated haplotypes within the paper. So if the B006 haplotype is Neandertal, by looking at your markers in 23andMe through the browse raw data feature you can figure out what your lineage is, and see if you are indeed “Neandertal” on that locus. Since it’s on the X chromosome, males will carry only one copy of the gene. On the other hand, if you’re a woman you’ll have two copies, so ascertaining what specific combination of markers you have spanning a particular genomic segment can be more difficult (the results are not “phased,” so you don’t know if the allele is from the mother or father on any given genotype). But inferring the sequence of markers on a strand of DNA is much easier if you have relatives to compare with.
As you know the results for my first sibling came back earlier this week. I decided to look at which haplotypes we carried. Below the fold are the SNPs (the links will take you to 23andMe, so if you are logged into your account it will take you to where you need to go):
I have noted a few times that one thing you have to be careful about in two dimensional plots which show genetic variance is that the dimensions onto which the data are projected are often generated from the data itself. So adding more data can change the spatial relationships of previous data points. Additionally, in 23andMe’s global similarity advanced plot you are projected onto the dimensions generated from the HGDP data set. There are some practical reasons for this. First, it’s computationally intensive to recalculate components of variance every time someone is added to the data set. Second, it isn’t as if the ethnic identity of any given individual is validated. What would you do if an alien sent in a kit and spuriously put “French” as their ancestry?
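Here is a small sketch of the difference between projecting onto fixed dimensions and regenerating them. To be clear about what is assumed: the points are made-up two-dimensional toy values rather than genotypes, the function names are mine, and this uses the closed-form principal axis of a 2x2 covariance matrix, not whatever 23andMe actually runs.

```python
import math

def first_axis(points):
    """Return (center, unit vector) of the main axis of variation for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(point, center, axis):
    """Coordinate of a point along an axis passing through center."""
    return (point[0] - center[0]) * axis[0] + (point[1] - center[1]) * axis[1]

# Toy reference panel spread along x, and a newcomer far off along y.
reference = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
newcomer = (1.5, 4.0)

# Option 1: the axis is fixed by the reference panel; the newcomer is only projected.
center, axis = first_axis(reference)
fixed_coords = [project(p, center, axis) for p in reference]
newcomer_coord = project(newcomer, center, axis)

# Option 2: the axis is recomputed with the newcomer included.
center2, axis2 = first_axis(reference + [newcomer])
refit_coords = [project(p, center2, axis2) for p in reference]

print("fixed axis:", axis, "newcomer at", newcomer_coord)
print("refit axis:", axis2)
print("reference, fixed:", fixed_coords)
print("reference, refit:", refit_coords)
```

With the fixed axis the outlier lands near the middle of the reference panel, because the dimension along which it actually differs isn't one of the plotted ones. Recompute the axis with the outlier included and the main dimension swings toward it, which rearranges everyone else. That is the trade-off: fixed dimensions keep the plot stable, at the cost of hiding variation the reference set doesn't capture.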
So, in reply to this comment: “Let me rephrase: is there any difference when you switch to the world-wide plot? I imagine not, or you would’ve mentioned it.” Actually, there is a slight difference. Below on the right you have a “world view,” with my position being marked with green, and on the left a “zoom in” for Central/South Asia in the HGDP data set.