The post yesterday about the deletion which results in heart disease later in life had some interesting ancestry-related material. This makes sense: the genetic maps which I post on now and then ultimately have a medical rationale behind them. You eliminate population structure so that you don’t have spurious correlations confusing you when you try to get a fix on the genetic underpinnings of a disease. For example, consider a study with cases & controls in which individuals with the trait or disease have five times the likelihood of carrying a particular allele at a particular gene. But you look more closely, and you see that if you control for race this doesn’t hold; in fact you are just picking up the fact that one population has a greater propensity for the trait or disease, as well as the reality that populations differ in allele frequencies on many genes. The old chestnut about correlation not equaling causation applies here. But causation can be ascertained using correlation as a precondition. Correcting for cryptic population substructure with ancestry informative markers (AIMs) is the way you would do this, so that there’s nothing in the genetic background confounding the associations you pick up.
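To make the confounding concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the two populations "A" and "B", their carrier frequencies and their disease risks are assumed values, not numbers from any study. The allele has no effect on disease at all, yet pooling the two populations produces an odds ratio well above 1, while the odds ratio within each population stays near 1.

```python
import random

def simulate(n_per_pop=20000, seed=0):
    """Simulate two hypothetical populations in which the allele has NO effect on disease."""
    rng = random.Random(seed)
    # Assumed values for illustration only: (carrier frequency, disease risk)
    pops = {"A": (0.8, 0.20), "B": (0.2, 0.04)}
    rows = []
    for pop, (carrier_freq, risk) in pops.items():
        for _ in range(n_per_pop):
            carrier = rng.random() < carrier_freq
            case = rng.random() < risk  # drawn independently of carrier status
            rows.append((pop, carrier, case))
    return rows

def odds_ratio(rows):
    """Odds ratio of disease for carriers vs. non-carriers, from a 2x2 table."""
    carrier_case = sum(1 for _, c, d in rows if c and d)
    carrier_ctrl = sum(1 for _, c, d in rows if c and not d)
    noncarrier_case = sum(1 for _, c, d in rows if not c and d)
    noncarrier_ctrl = sum(1 for _, c, d in rows if not c and not d)
    return (carrier_case * noncarrier_ctrl) / (carrier_ctrl * noncarrier_case)

if __name__ == "__main__":
    rows = simulate()
    print("pooled odds ratio:", round(odds_ratio(rows), 2))
    for pop in ("A", "B"):
        subset = [r for r in rows if r[0] == pop]
        print("within population", pop, ":", round(odds_ratio(subset), 2))
```

Stratifying by population is the crudest possible correction. In practice the population labels are not known in advance, which is where the ancestry informative markers come in.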
In any case, the supplementary material has some graphs that I thought would be of interest.
As you can see, the North Indians, Central Indians and South Indians overlap, though there are broad differences. Those differences, not surprisingly, follow geography. The genetic data here also aligns with our intuition derived from visual cues; Punjabis and Bengalis, for example, differ in appearance, but not so much that a minority of Punjabis might not be confused for a minority of Bengalis and vice versa. In contrast, a set of ethnic Swedes and south Sudanese will look so different that one can assume perfect sorting just on visual cues alone. While Swedes and Sudanese are disjoint on a host of skin color genes (e.g., SLC24A5) just as they are in complexion, the different South Asian groups overlap. Bengalis are far darker skinned on average than Punjabis, but the two groups overlap physically and genetically on a trait like skin color.
But here’s the important figure:
CEU represents the northern European sample used in the HapMap. These are Mormons from Utah, of disproportionately British Isles, Scandinavian and German ancestry. Unlike the previous maps I’ve posted, which depended on hundreds of thousands of markers to construct a plot of genetic variation, this uses only 50 markers. As you can see, 50 markers are all you need to perfectly separate northern Europeans from South Asians. If you had one gene, like SLC24A5, you would have a harder time. Europeans are fixed for the derived variant that results in light skin, but among South Asian populations the frequencies vary from around 50% (in the south and east) to 85% (in the north and west) for the derived variant. In other words, looking at SLC24A5 alone you could only identify a minority of South Asians in a mixed population of South Asians and Europeans (on the order of 1/4). The rest of the population would be a mix of Europeans and South Asians. But add dozens more markers which vary between the two populations and they separate out nice and clean.
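Here is a rough sketch of why piling up markers works. It is not the method behind the figure; it is a naive likelihood-ratio classifier run on simulated genotypes. The populations "X" and "Y", the allele frequencies of 0.8 and 0.3 at every marker, and the assumptions of Hardy-Weinberg proportions and independent markers are all hypothetical choices of mine for illustration.

```python
import math
import random

def genotype(rng, freq):
    """Count of derived alleles (0, 1 or 2) at one marker, assuming Hardy-Weinberg."""
    return (rng.random() < freq) + (rng.random() < freq)

def log_lik(g, freq):
    """Log-likelihood of genotype g at allele frequency freq (constant term dropped)."""
    return g * math.log(freq) + (2 - g) * math.log(1 - freq)

def accuracy(n_markers, n_people=1000, seed=1):
    """Fraction of simulated individuals assigned to their true population."""
    rng = random.Random(seed)
    # Hypothetical derived-allele frequencies, identical at every marker
    fx, fy = 0.8, 0.3
    correct = 0
    for i in range(n_people):
        # Alternate individuals between the two populations
        true_freq, label = (fx, "X") if i % 2 == 0 else (fy, "Y")
        genos = [genotype(rng, true_freq) for _ in range(n_markers)]
        # Sum per-marker log-likelihood ratios; a positive total favors X
        score = sum(log_lik(g, fx) - log_lik(g, fy) for g in genos)
        guess = "X" if score > 0 else "Y"
        correct += guess == label
    return correct / n_people

if __name__ == "__main__":
    for k in (1, 5, 50):
        print(k, "markers:", accuracy(k))
```

With one marker a large share of individuals land on the wrong side, because the genotype distributions of the two populations overlap heavily. Each additional independent marker adds a little evidence, and by 50 the two score distributions barely touch.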
The rank order of genetic affinity matches our intuition: South Asians from the north are closer to Europeans than South Asians from the south. Unfortunately for the egos of many North Indians, they are closer to South Indians than they are to white people. It is sometimes difficult to explain to Punjabis and Kashmiris who pride themselves on their European appearance (especially Kashmiris, who admittedly can “pass” for the superior race quite often!), which is considered more attractive by South Asians, that though they may sometimes look more European than the typical South Asian, that’s just due to the variance on the small subset of genes which construct physical appearance. On the totality of their genes South Asians without recent European ancestry are more likely to be like other South Asians than Europeans.
Note: The supplementary information has the region and in some cases caste backgrounds for the samples. So of the Kashmiris, all are Pandits. Also, I do see that the South Asians with the lowest values on the horizontal dimension are closer to the Europeans with the highest values than they are to the South Asians with the highest values.
Update: Some are asking about how South Asians relate to Europeans in the context of East Asians. This is not a trivial question because the slight majority of South Asian mtDNA lineages cluster with those of East as opposed to West Eurasia. But in regard to total genome content I think Li et al. were close to the mark when they looked at the HGDP populations. The only really representative South Asian population is the Sindhis; the other groups are relatively marginal or atypical. But Sindhis are probably genetically interchangeable with the Punjabi sample which was included above, so I think I can fairly assume that the other populations wouldn’t be much further than where I approximately placed them.
Also, just a note, the outliers for Central and South Asians are the Hazaras and Makranis. These are two groups with known East Asian and African ancestry respectively.