Resolutions in the Indian genetic layer cake

By Razib Khan | April 23, 2011 8:54 pm

Two years ago Reconstructing Indian Genetic History reframed how we should view South Asian historical genomics. In short, Indians can be viewed as a hybrid between a West Eurasian group, “Ancestral North Indians” (ANI) and a very different group, “Ancestral South Indians” (ASI), which had distant connections to West and East Eurasians. At least to a first approximation. Last fall I posted on a new paper which surveyed the Austro-Asiatic speaking peoples of India, and concluded that they were exogenous to the subcontinent. This is an interesting point. Prehistoric treatments of South Asia often use linguistic terms to denote putative ancient populations. One model is that first it was the Munda, the most ancient Austro-Asiatics. Then the Dravidians. And finally the Indo-Aryans. These genetic data imply that the Munda arrived after the initial ANI-ASI synthesis. The Munda people of India can be thought of as ANI-ASI, with an overlay of East Eurasian ancestry.

Zack Ajmal’s K = 11 ADMIXTURE run has highlighted some further issues. He has a set of Austro-Asiatic samples, as well as a host of Indo-Aryan and Dravidian speaking populations. I now believe we can further clarify and refine our model of the peopling of India. Here it is:

1) ASI, ~10,000 years BP

2) ANI enters the subcontinent from the northwest, synthesis with ASI

3) The ancestors of the Munda enter from the northeast, synthesis with ANI + ASI in their region

4) A subsequent group of West Eurasians, related to the ANI, so I will term them ANI2, enters from the northwest and overlays the ANI + ASI synthesis. In the northeast quadrant of the subcontinent this group marginalizes the Munda people, who are either assimilated or escape to more remote locations. I believe that ANI2 is likely the Indo-Europeans, but it may be Dravidians as well

5) A second group of Austro-Asiatic peoples enters from the northeast, and synthesizes with the ANI2 + ANI + ASI. In some regions they are absorbed (Assam), but in other regions they are culturally dominant (Meghalaya)

Below are two plots which illustrate where I’m coming from. The “S Asian” component from K = 11 above seems to overlap with, but is not identical to, ANI. The “Onge” component plays a similar role with ASI. The “SW Asian” and “European” elements are pretty straightforward. They’re very closely related to the “S Asian” one, but they do separate from it. Their relationship to distant non-Indian groups, as well as a gradient toward the northwest, suggests to me a more recent arrival of this element.

Two patterns. For the Indo-European and Dravidian South Asian groups you see a vertical distribution which corresponds to populations which are a combination of ANI/ASI. But notice the perpendicular distribution of the Austro-Asiatic groups. The East Eurasian element to their ancestry means that they are not fully modeled by the two-way admixture. I believe that the “Onge” fraction, which tracks ASI, is overestimating ASI in the Austro-Asiatics, because this proportion just seems way too high in many Southeast Asian and Dai groups to be plausible to me as a perfect proxy for ASI in them. But in any case, note that the Austro-Asiatic groups seem to be mostly a mix of ANI/ASI like other South Asians. There is clearly one outlier population. I’ll get to them.

Below is a plot which shows the ratio of ANI2 (the sum of the “SW Asian” and “European” components) to the stabilized ANI + ASI hybrid proportion.

We know from Reconstructing Indian Genetic History that South Indian tribals and Dalits have a fair amount of West Eurasian ANI. But, from the genome bloggers, and especially Zack’s further analyses, we can see that there is a further component of West Eurasian ancestry which is probably not ANI, but post-dates it. These components have affinities to Southwest Asia or Central Eurasia. They’re labeled “SW Asian” and “European” in Zack’s K = 11. Here’s the big thing you notice: this element increases southeast-northwest, and low caste to high caste. It’s almost absent among many Dravidian populations. It is very common in the northwest of the subcontinent.
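To make the caste gradient concrete, here is a minimal sketch using the nine complete rows from the table in this post. It assumes that ANI2 can be approximated by adding the “SW Asian” and “European” percentages, and that the stabilized hybrid can be approximated by “S Asian” plus “Onge”. Both proxies, and the names `ROWS`, `ani2_share`, `ani2_ratio` and `mean_ani2_by_status`, are illustrative assumptions rather than the method behind the plots.

```python
# Rows copied from the table in this post (percentages from the K = 11 run).
# Each tuple: (group, status, sw_asian, euro, s_asian, onge)
ROWS = [
    ("Satnami", "L Caste", 1, 3, 49, 36),
    ("Kamsali", "L Caste", 2, 0, 59, 35),
    ("Vysya", "Mid Caste", 2, 0, 62, 34),
    ("Naidu", "U Caste", 4, 2, 59, 32),
    ("Lodi", "L Caste", 2, 6, 58, 32),
    ("Velama", "U Caste", 7, 2, 60, 29),
    ("Srivastava", "U Caste", 4, 10, 56, 28),
    ("Vaish", "U Caste", 6, 15, 52, 24),
    ("Kashmiri pandit", "U Caste", 12, 15, 51, 18),
]


def ani2_share(sw_asian, euro):
    """Assumed ANI2 proxy: 'SW Asian' + 'European' percentages."""
    return sw_asian + euro


def ani2_ratio(sw_asian, euro, s_asian, onge):
    """Assumed ratio: ANI2 proxy over the 'S Asian' + 'Onge' hybrid proxy."""
    return ani2_share(sw_asian, euro) / (s_asian + onge)


def mean_ani2_by_status(rows):
    """Average the ANI2 proxy within each caste status label."""
    by_status = {}
    for _, status, sw, eu, _, _ in rows:
        by_status.setdefault(status, []).append(ani2_share(sw, eu))
    return {status: sum(vals) / len(vals) for status, vals in by_status.items()}


if __name__ == "__main__":
    # List groups from lowest to highest assumed ANI2 ratio.
    for name, status, sw, eu, sa, on in sorted(ROWS, key=lambda r: ani2_ratio(*r[2:])):
        print(f"{name:16s} {status:10s} ANI2={ani2_share(sw, eu):2d}%  "
              f"ratio={ani2_ratio(sw, eu, sa, on):.3f}")
    print(mean_ani2_by_status(ROWS))
```

In these nine rows the upper caste mean of the proxy comes out higher than the lower caste mean, and the Kashmiri pandits sit at the top of the ratio ranking. With a single mid caste row and no tribal samples, this is only a sanity check on the pattern described above.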

Again, except for that one outlier, the Austro-Asiatic groups almost totally lack ANI2, just like some Dravidian tribals. On the other hand, even the groups in South Asia with the most ANI2 clearly have some ASI and ANI. But having ASI and ANI does not guarantee ANI2. The East Eurasian component found in the Austro-Asiatics seems constrained to the northeast of the subcontinent by and large. Finally, we have the outlier Austro-Asiatic group.

These are the Khasi. They are not Munda, and seem to have closer relationships to other East Eurasian populations. They also have a small, but noticeable, ANI2 component. What’s going on? I believe that the Khasi arrived in northeast India after those who brought ANI2 had already marginalized the Munda. Some of the Khasi were probably assimilated into the post-Munda (Indo-European or Dravidian speaking) peasantry. But some of the Khasi maintained their identity in the highlands, where they also intermarried with the post-Munda population, which had ANI2. In contrast, the Munda who retained their cultural identity had withdrawn and disengaged.

Here’s a table for your perusal (remember that ASI is inferred):

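| Group | Language | Status | S Asian | Onge | E Asian | SW Asian | Euro | Siberian | ASI |
|---|---|---|---|---|---|---|---|---|---|
| North Kannadi | Dravidian | | | | | | | | |
| Satnami | Indo-European | L Caste | 49% | 36% | 8% | 1% | 3% | 0% | 56% |
| Kamsali | Dravidian | L Caste | 59% | 35% | 1% | 2% | 0% | 0% | 54% |
| Vysya | Dravidian | Mid Caste | 62% | 34% | 0% | 2% | 0% | 0% | 53% |
| Naidu | Dravidian | U Caste | 59% | 32% | 0% | 4% | 2% | 1% | 50% |
| Lodi | Indo-European | L Caste | 58% | 32% | 1% | 2% | 6% | 0% | 50% |
| Velama | Dravidian | U Caste | 60% | 29% | 0% | 7% | 2% | 0% | 46% |
| Srivastava | Indo-European | U Caste | 56% | 28% | 0% | 4% | 10% | 0% | 44% |
| Gujaratis a | Indo-European | | | | | | | | |
| Cochin Jews | Dravidian | | | | | | | | |
| Vaish | Indo-European | U Caste | 52% | 24% | 0% | 6% | 15% | 0% | 39% |
| Gujaratis b | Indo-European | | | | | | | | |
| Bene Israel Jews | Indo-European | | | | | | | | |
| Kashmiri pandit | Indo-European | U Caste | 51% | 18% | 0% | 12% | 15% | 2% | 31% |
| Singapore malay | | | | | | | | | |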
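The post does not say how the ASI column was inferred. Purely as a check on the nine complete rows of the table above, the sketch below fits an ordinary least-squares line to the (Onge, ASI) pairs. The names `PAIRS`, `least_squares_line` and `max_abs_residual` are hypothetical, and a good linear fit here says nothing definite about the actual inference procedure.

```python
# (Onge %, ASI %) pairs from the nine complete rows of the table above.
PAIRS = [
    (36, 56), (35, 54), (34, 53), (32, 50), (32, 50),
    (29, 46), (28, 44), (24, 39), (18, 31),
]


def least_squares_line(pairs):
    """Return (slope, intercept) of the ordinary least-squares fit y = a*x + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def max_abs_residual(pairs, slope, intercept):
    """Largest absolute gap between a tabulated ASI value and the fitted line."""
    return max(abs(y - (slope * x + intercept)) for x, y in pairs)


if __name__ == "__main__":
    slope, intercept = least_squares_line(PAIRS)
    print(f"ASI ~ {slope:.2f} * Onge + {intercept:.2f}")
    print(f"max residual: {max_abs_residual(PAIRS, slope, intercept):.2f} points")
```

Within these rows the tabulated ASI values stay within about a percentage point of the fitted line, which is roughly what rounding to whole percentages would allow. The rows without data cannot be checked this way.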


CATEGORIZED UNDER: Genetics, Genomics

Comments (16)

  1. Ayesha

    Could you please upload the geographical distributions of the different groups on a map?

  2. Ian

    What’s so cool here is that this is actual, blogger-driven research.

  3. #1, i thought of it. couldn’t find a map with all of them on it. also, the spelling of these groups is non-standard.

  4. “I believe that the “Onge” fraction, which tracks ASI, is overestimating ASI in the Austro-Asiatic because this proportion just seems way too high in many Southeast Asian and Dai groups to be plausible to me as a perfect proxy for ASI in them.”

    Razib, if I understand it correctly, the data could be read in the opposite way: The high concentration of the Onge component in Munda attests to the antiquity of Munda in India, possibly predating Dravidians and Indo-Europeans. Linguistically, Ongan-Jarawan has recently been shown to be related to Austronesian, which in turn, with Austroasiatic, is part of Narrow Austric. Narrow Austric is a decent superphylum with some solid cognates behind it. So, the high concentration of the Onge component in Munda simply means that both Munda and Onge are relics of an ancient, ultimately, East Asian population that expanded massively in SEAsia and Oceania but remained largely conserved in the Indian refugium. Munda then domesticated rice independently of the South China source.

  5. #4, i don’t know anything about linguistics. but why do the dravidian tribes lack the east asian component? if the munda are the originals, their own profile should be found in others.

  6. “why do the dravidian tribes lack the east asian component? if the munda are the originals, their own profile should be found in others.”

    I like your reasoning but consider this. If the East Asian component was the earliest one, then it may have been associated with a different set of demographic parameters than the later, presumably Dravidian, component. The smaller effective population size of the East Asian component, which is more appropriate for Pleistocene foraging population structure, would make any gene flow from smaller and more sparsely distributed Munda to larger, more expansive Dravidians negligible and perishable. Or, alternatively, the original East Asian component in Dravidians simply had enough time to drift out. Even in the Munda the frequencies of the EDAR gene are low, which suggests that the West Asian non-EDAR component is close to edging out the East Asian one from the Munda. Notably, the Harappa dental sample (assuming that Harappans were Dravidian-speakers) has moderate to elevated frequencies of shovel-shaped incisors (58%), which is one of the phenotypical features controlled by the EDAR gene common in East Asians. In West Eurasians, by contrast, shovel-shaped incisors get progressively rarer, being replaced by chiseled incisors instead. So, we may find evidence for the early East Asian component in Dravidians in some lucky places.

  7. The smaller effective population size of the East Asian component, which is more appropriate for Pleistocene foraging population structure, would make any gene flow from smaller and more sparsely distributed Munda to larger, more expansive Dravidians negligible and perishable

    this is the model i think would be most plausible if the munda represent the oldest stratum.

  8. Ayesha, i do have almost all of those groups depicted on my maps at my site.

  9. Balaji

    The E. Asian fraction in many non-Austro-Asiatic groups appears to be significant. Here are some: Chenchu 10%, Madiga 6%, Mala 7%, Malayan 13%, North Kannadi 7%, Paniya 14%, Satnami 13%. These are from Zack’s Reference 3 for K=5.

  10. #9, they don’t have EA edar. that’s probably a false positive in terms of measuring real recent admixture. also, don’t rely on one K run.

  11. Balaji

    Yes, like the Austro-Asiatics, they probably don’t have EA EDAR and therefore the E. Asian ancestry is not too recent. The Satnami have 13% E. Asian at K=5, 14% at K=6, 15% at K=8 and 9% S.E. Asian, 2% Siberian and 2% Papuan at K=14.

  12. the austro-asiatics do have the EA edar. other tribes do not. please check the links to my posts. and look, you need to look at the full range of K’s. you can get weird results at any given K. you jumped from 6 to 14 because the east asian element disappears in the middle, right? but above 12 or 14 or so the cross-validation error starts to get worse. zack should just do a supervised run to resolve it. but i believe that the austro-asiatic paper is open access now, you should check it out. if not, i can send it….

  13. Whatever the cause of the difference, simply identifying the Khasi as likely being an outlier from all of the other Austro-Asiatic language tribes is an accomplishment that has merit.

    The Khasi could quite plausibly be a tribe that is the result of the fusion of roughly equal numbers of people from a Munda language speaking tribe and a Tibeto-Burmese language speaking tribe. Tibeto-Burmese language populations generally appear to be the least admixed of any of South Asia’s linguistic groups and also to be the most recent arrival to the mix. And, given the general tendency of Austro-Asiatic tribes to be in the general vicinity where we would expect a Tibeto-Burmese population migrating into South Asia to land, the geography wouldn’t make this kind of fusion particularly unlikely.

    One would still have to account for the small Siberian and European components, but there are a variety of ways that these components could have introgressed into a pretty small founder population and then taken on an abnormally large percentage. For example, the assimilation of a literal handful of defecting Indo-Europeans (pre-admixture with South Asians) from a scouting or missionary group into a couple of reproductively successful families of either of the pre-fusion populations could do the trick.

  14. skeptic

    The genetic data is indeed fascinating. I’m not up to date with all the data/reasoning here, but can you clarify a few points?

    1. Where do the dates of the ANI/ASI come from? Are they just hypothetical or is there some independent reason to assign the dates?
    2. ASI is plausibly 10k BC because of the proximity to Andamanese tribes, right?
    3. ANI is presumably farmer genes, but is this a guess or is there some independent evidence for this?
    4. What is the evidence for ANI2? Why is this considered an overlay separate from ANI? Can this be correlated with Indo-Europeans (e.g. show that it is not part of European farmer DNA, but a later migration into Europe)?

    This is just for my own education. Thanks!

  15. Balaji

    EDAR prevalence among the Austro-Asiatics of the mainland (Munda) is only 5%. Here are the E. Asian percentages for the Satnami: K=5: 13%, K=6: 14%, K=7: 14%, K=8: 15%, K=9: 15%, K=10: 16%, K=11: 8%, K=12: 8%, K=13: 9%, K=14: 9%. The sudden drop that starts at K=11 is clearly related to the appearance of the Onge element!

  16. The sudden drop that starts at K=11 is clearly related to the appearance of the Onge element!

    yes, ASI automatically results in an “east asian” signal. the key is to find a K where that signal disappears and the ASI is cleanly separated from other east eurasian groups. and “only 5%” is a lot because 1) they aren’t that east asian, 2) EDAR variant in east asia is not fixed everywhere.
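The K-by-K comparison in this exchange can be sketched as a small check: given the Satnami E. Asian percentages quoted in comment 15, find the step between consecutive K values with the largest decrease. The names `SATNAMI_E_ASIAN` and `largest_drop` are hypothetical, and the sketch only locates the drop; it does not show what caused it.

```python
# Satnami "E. Asian" percentage at each K, as quoted in comment 15.
SATNAMI_E_ASIAN = {
    5: 13, 6: 14, 7: 14, 8: 15, 9: 15,
    10: 16, 11: 8, 12: 8, 13: 9, 14: 9,
}


def largest_drop(series):
    """Return (K, change) for the consecutive step with the biggest decrease.

    `change` is the percentage at K minus the percentage at the previous K,
    so a drop shows up as a negative number.
    """
    ks = sorted(series)
    steps = [(k_next, series[k_next] - series[k_prev])
             for k_prev, k_next in zip(ks, ks[1:])]
    return min(steps, key=lambda step: step[1])


if __name__ == "__main__":
    k, change = largest_drop(SATNAMI_E_ASIAN)
    print(f"largest drop arrives at K={k}: {change:+d} points")
```

Running the same check on other groups across the full range of K would show whether the drop lines up with one particular K everywhere, which is the kind of comparison the replies above ask for.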

