Strange genetic variation in South Asia

By Razib Khan | August 6, 2010 1:11 am

Dienekes has a post up where he highlights the fact that the recent paper on South Asian metabolic diseases has a figure which elucidates population structure within the region. Accounting for structure is important in genome-wide association studies, since you might get spurious correlations if trait value or disease frequency is simply tracking cryptic population variation. Dienekes says:

The existence of two clusters is kind of obvious, while their interpretation is not as dots of the same color appear in both clusters: a placement of these individuals in a global context might have been useful here. Things are clearer at the top cluster which shows a clear gradient anchored by Punjabi Sikh and Hindu Tamils on either end.

Also of interest is the group of isolated Muslim/Christian individuals on the left which deviate strongly from the mainstream; these probably represent exogenous elements that don’t resemble the bulk of the Indian population.

The second issue is easily addressed. The Christian outliers both give English as their native language. That suggests to me that they’re Anglo-Indian, a community of mixed South Asian and European origin. South Asian Muslims are overwhelmingly of indigenous origin. But a minority of the Muslim elite are West Asian, or have substantial West Asian ancestry, as is evident from the fact that they look white. Benazir Bhutto’s mother was of Kurdish and Persian ethnic background (her family was from Esfahan in Iran). I’ve reedited the religious & linguistic PC plots to fit onto the screen.


So what’s going on with the cluster which extends along the second principal component? The first component is probably just a European/West Asian-South Asian axis of variation. But I don’t understand where the variation for the second is coming from. Observe that the one South Indian group, Tamil speakers, are not represented in the secondary cluster. The plot reminded me of something I saw last fall.

Below is figure S4 from the supplements of Reconstructing Indian population history. I added some labels. The Indian cluster is tight when the genetic variation includes non-Indian groups. But when you constrain the variation to Europeans and South Asians only, something strange happens:

The Gujarati sample is from Houston, and is from HapMap Phase 3. I have a suspicion that the secondary cluster among the Gujaratis here is of the same class of phenomenon as the secondary cluster in the first plot. The Anglo-Indians and West Asian Muslims serve as rough proxies for Europeans, and you have an expected European-South Asian axis. But you also have this strange orthogonal component. I had assumed that the plot from the Reich et al. paper was an anomaly, but I’m not so sure after seeing this second paper.
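
To make the point about constraining the variation concrete, here is a toy simulation. It is only a sketch under loud assumptions: the five groups, their drift values, the sample sizes, and every function name are invented for illustration, and none of it comes from the data in either paper. It simulates genotypes for two distant outgroups, a European proxy, a main South Asian cluster, and a South Asian sub-cluster with a bit of extra drift, then runs a PCA on everyone and a PCA restricted to the Europeans and South Asians, and compares how far apart the two South Asian clusters sit on the second component.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps, n_per_group = 4000, 40  # arbitrary toy sizes, not from either paper

def drift(freqs, fst):
    """Perturb allele frequencies to mimic genetic drift of size ~fst."""
    sd = np.sqrt(fst * freqs * (1.0 - freqs))
    return np.clip(freqs + rng.normal(0.0, sd), 0.01, 0.99)

def genotypes(freqs, n):
    """Draw n diploid genotypes (0/1/2 allele counts) per SNP."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

def pc_scores(geno, k=2):
    """Scores on the top k principal components of column-centered data."""
    centered = geno - geno.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def pc2_gap(scores, labels, a, b):
    """Gap between two groups' mean PC2 scores, in units of PC2's overall SD."""
    pc2 = scores[:, 1]
    return abs(pc2[labels == a].mean() - pc2[labels == b].mean()) / pc2.std()

# Hypothetical populations: every drift value below is an assumption.
ancestral = rng.uniform(0.1, 0.9, n_snps)
sas_main = drift(ancestral, 0.05)
freqs = {
    "AFR": drift(ancestral, 0.15),      # distant outgroup proxy
    "EAS": drift(ancestral, 0.10),      # second outgroup proxy
    "EUR": drift(ancestral, 0.05),      # European proxy
    "SAS_main": sas_main,               # main South Asian cluster
    "SAS_sub": drift(sas_main, 0.04),   # sub-cluster with extra local drift
}
labels = np.repeat(list(freqs), n_per_group)
geno = np.vstack([genotypes(f, n_per_group) for f in freqs.values()])

# PCA on all five groups versus PCA on Europeans and South Asians only.
global_scores = pc_scores(geno)
keep = np.isin(labels, ["EUR", "SAS_main", "SAS_sub"])
restricted_scores = pc_scores(geno[keep])

gap_global = pc2_gap(global_scores, labels, "SAS_main", "SAS_sub")
gap_restricted = pc2_gap(restricted_scores, labels[keep], "SAS_main", "SAS_sub")
print(f"PC2 gap, all groups:      {gap_global:.2f}")
print(f"PC2 gap, EUR + SAS only:  {gap_restricted:.2f}")
```

In a setup like this the leading components of the full analysis are mostly spent on the outgroups, so the two South Asian clusters sit close together relative to the overall spread, while in the restricted analysis the second component is free to pick up the local drift. That is consistent with, but obviously does not explain, what the two plots show; the toy sub-cluster is drifted by construction, whereas the real one is of unknown origin.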

CATEGORIZED UNDER: Genetics, Genomics
  • Thorfinn

    What about a Saka/Scythian origin? That would explain why they are concentrated in North/Western groups, and lie orthogonal to the main axis. Reich’s speculation–that they are Huns–also seems reasonable.

  • Razib Khan

    scythians were iranian speaking. they should be along the main axis, not orthogonal. the first figure has muslims and i think a parsi or two who are probably good substitutes for iranians. so perhaps huns with mongolian ancestry? but that doesn’t show up on structure, which infers ancestry components (in fact, northeast indians have that).

    two thoughts

    1) this only shows up when you constrain the variation which you’re using to extract dimensionalities. so it’s something weird specific to south asia i think, and interpreting the PCA in a global context may mislead

    2) the kalash are really strange in their positioning in south asia because they’re very inbred/genetically isolated. it could be the admixture of some group like this in the north/west of india, who arrived with the sakas, etc.

  • pconroy

    What if the mysterious group are admixed with Malays? I’ve always wondered where the settlers that arrived in Madagascar stopped off on the way.

    Another possibility is Siddi admixture – after all Gujarat state is the main population center??

  • Razib Khan

    Another possibility is Siddi admixture – after all Gujarat state is the main population center??

    no. the makranis have african admixture, and it is obvious in them (they’re more toward african than any other south asians). this doesn’t show up on the plots with african populations. in fact, it doesn’t show up with east asians either. i think it’s something local to south asia, perhaps due to extreme inbreeding.

