By Razib Khan | January 29, 2013

Most people in South Asia speak a language from one of two families, Indo-Aryan or Dravidian. These two are not particularly closely related. Indo-Aryan is a branch of Indo-European, as is evident in the plethora of obvious cognates with other Indo-European languages. I have a minimal fluency in Bengali, the easternmost of the Indo-European languages, and quite a bit more fluency in English, one of the westernmost, and the relationship was evident to me rather early on (e.g., grass vs. gash, man vs. manush, nose vs. nak). In contrast, to me Dravidian languages are peculiar because the accent and cadence are clearly South Asian, but they are utterly impenetrable (though there are many loan words into Indo-Aryan from Dravidian).

But in this post I’m going to explore the genetic relationships of the people who speak a subgroup of Austro-Asiatic languages indigenous to India, that of the Munda. The traditional question has always been whether the Austro-Asiatic languages are from India or from Southeast Asia. More precisely, did the Munda culture come to India, or is the Munda culture a relic of the original Austro-Asiatic domain in eastern India?

As background, I believe it is important that readers understand that the territory between Vietnam and that of the Munda was likely dominated by Austro-Asiatic dialects ~2,000 years ago. Both the Burmese and Thai arrived in the historic period from southern China, and overthrew Mon or Khmer cultures which flourished in lowland Southeast Asia. In both cases the newcomers imposed their language upon the indigenous population, but by and large adopted most elements of high culture from the natives (e.g., Theravada Buddhism). The monarchies of Thailand and Burma drew directly from the Indic-inflected polities of the Khmer and Mon.

The recent extensive distribution and variety of Austro-Asiatic languages in Southeast Asia is suggestive of the likelihood that they derive from this area, but it is not a definitive point in that model’s favor. But there are now other, genetic, lines of inquiry. A few years ago a paper came out which reported that the Y chromosomal lineages which connect the Munda people to Southeast Asia are much more diverse in Southeast Asia. This matters because population expansions and migrations tend to homogenize lineages through greater genetic drift, with the “source” population more likely to maintain diversity. Additionally, there was evidence of a genetic variant in EDAR which has the hallmark of a recent increase in frequency across eastern Asia. This seems to peg the Munda arrival to the Holocene, not the Pleistocene. Finally, there is the pattern of male lineages exhibiting some concordance with Southeast Asia, but female lineages being entirely indigenous. This is the classic expectation from a model of male-biased migration, where the mobile incoming groups lacked women and children.
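To make the diversity argument concrete, here is a minimal sketch in Python. It computes Nei's unbiased haplotype diversity for two samples. The haplogroup labels and counts are made up purely for illustration (they are not from the paper or from any real data set), and the function name `haplotype_diversity` is my own.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy samples of 20 Y chromosomes each (hypothetical labels, NOT real data).
# "source": four lineages at equal frequency.
# "derived": a drifted subset dominated by one lineage.
source = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5
derived = ["A"] * 16 + ["B"] * 4

print(round(haplotype_diversity(source), 3))   # 0.789
print(round(haplotype_diversity(derived), 3))  # 0.337
```

The drifted sample carries a subset of the source lineages at skewed frequencies, so its diversity is lower. That is the pattern one would look for when asking which region is the likely source.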

I decided to further explore the question using the Estonian Biocenter data sets, as well as the HGDP and HapMap. For those of you who are curious about the technical details, I LD pruned the Estonian Biocenter marker set from ~600,000 down to ~130,000. I also put the samples through --geno 0.01 and --mind 0.80 in PLINK to get high quality individuals and good coverage on markers. To be explicitly clear, I renamed and combined some of the populations in the original data set (e.g., Chamars = UP_Dalits). I ran a preliminary MDS to make sure that the data wasn’t strange, and it checked out.
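For readers who want to reproduce the filtering step, the following Python sketch assembles a PLINK command line with the two thresholds mentioned above. In PLINK, `--geno` drops markers whose missing call rate exceeds the threshold and `--mind` does the same for individuals. The file prefixes `merged_pruned` and `merged_filtered` and the helper name are hypothetical; the exact invocation used for this post is not given, so treat this as one plausible form.

```python
import shlex


def plink_filter_command(bfile, out, geno=0.01, mind=0.80):
    """Build a PLINK argv list applying missingness filters.

    bfile and out are file prefixes (hypothetical names in the example below).
    """
    return [
        "plink",
        "--bfile", bfile,      # input binary fileset prefix (.bed/.bim/.fam)
        "--geno", str(geno),   # max per-marker missing call rate
        "--mind", str(mind),   # max per-individual missing call rate
        "--make-bed",          # write a new filtered binary fileset
        "--out", out,
    ]


cmd = plink_filter_command("merged_pruned", "merged_filtered")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where `plink` is on the path.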

So to do the analysis I ran TreeMix. I used Chinese Americans as the root (outgroup) population, allowed 5 migration edges, and also tried to correct for any remaining LD by grouping SNPs into blocks of 1,000. You can view my first plot below.
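As a rough guide, the settings described above map onto TreeMix's command-line options like this: `-root` for the outgroup, `-m` for the number of migration edges, and `-k` for the SNP block size. The Python sketch below only builds the argument list. The input and output names and the population label `Chinese_Americans` are placeholders I made up, since the actual file names and labels are not given in the post.

```python
def treemix_command(infile, out, root, migrations=5, block_size=1000):
    """Build a TreeMix argv list (file names and labels are hypothetical)."""
    return [
        "treemix",
        "-i", infile,            # gzipped allele-count input file
        "-root", root,           # population used to root the tree
        "-m", str(migrations),   # number of migration edges to fit
        "-k", str(block_size),   # SNPs per block, to account for LD
        "-o", out,               # output file stem
    ]


tm_cmd = treemix_command("munda.treemix.gz", "munda_m5", "Chinese_Americans")
print(" ".join(tm_cmd))
```

Rerunning with different values of `migrations` (for example 0 through 5) is a simple way to see how stable a given migration edge is.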

The primary thing I would focus on is the gene flow from Cambodians to Munda. This is exactly what one might expect if the Munda were intrusive to South Asia. More interestingly, observe that there is no gene flow into Burmese from the South Asian groups, even though they are in much closer proximity to South Asia! This is probably picking up something deep in history then. The fact that the Munda diverge early from other South Asian groups is also in keeping with Admixture or Structure bar plot results: the South Asian ancestry of the Munda is relatively unadmixed.

Next I wanted to focus more on the eastern population flows. So I removed a lot of the western groups which overwhelmed my gene flow edges.

In this scenario there is again a gene flow parameter from the rough region of the Cambodian node. Perhaps more curiously, there is now a powerful gene flow parameter into the Burmese from the same locus. This is totally intelligible in light of the fact that the modern Burmese are genetically a hybrid population between Tibeto-Burman and Mon (Austro-Asiatic).

I’m certainly not ready to assert that the “case is closed.” But it seems that we need to shift our probabilities again toward the intrusive hypothesis.

    You probably already saw this, but just in case: there was a Center for Human Genetic Research 2013 retreat at which David Reich was a speaker, and a participant blogged his topics. I’m pasting topic 2 here. He reiterates the ANI-ASI admixture event; one thing I found interesting was that some disease traits, such as cardiac risk, are linked to the ASI component:
    Topic 2: admixture in Indian history. [Reich 2009, and a forthcoming paper by Priya Moorjani]. Indians are less studied in genomics because the Indian government makes it very hard to get genetic material out of the country. Only now are South Asians from Pakistan, Bangladesh and Indians in the U.S., U.K., etc finally making it into 1000 Genomes, etc.
    When you MDS plot Indians, you find “the Indian Cline” – Indians are on a spectrum between peoples west (Persians, Europeans) and east (Andamanese, East & Southeast Asian). Or if you MDS plot Europeans vs. Chinese you find PC1 separates them; if you then add Indians, they are spread along PC1. Whereas if you plot Indians vs. Chinese, then add Europeans, you don’t see any cline.
    Explanation consistent with history is that Indians are admixed from ancestral north Indians (‘ANI’) and ancestral south Indians (‘ASI’). Different groups in India vary from 20% ‘northern’ and 80% ‘southern’ to 70%/30%. Dravidian speakers and lower castes are associated with more ASI ancestry. Some disease traits – esp. cardiac risk – associated with Indians come from the ASI ancestry. Linkage disequilibrium size suggests that the ANI-ASI admixture occurred less than 3500 years ago with the rise of Vedic religion (Hinduism). This represents a major demographic transformation long after the advent of agriculture and is different from more recent Indian history where endogamy (marrying people of same region, caste etc) has been expected.
    Different Indian groups have different levels of endogamy which can be detected by linkage disequilibrium. In Vysya, huge LD size suggests a recent (~2000 years) founder event and less than 1 in 100 marriages to outsiders per generation. Medical implication: the close genetic relationships within whole regions / caste groups are what is responsible for most recessive genetic diseases in Indians, not due to recent consanguinity (marriage between second cousins, etc.)
    An ancient sample from Luxembourg is ‘more European than Europeans’, suggesting that Europeans are also an admixture of ancient Europeans with Middle Easterners.
    Q&A: Hybrid incompatibility issues that lead to speciation tend to arise first on the X chromosome. We do see plenty of Neanderthal SNPs on modern human X chromosomes [perhaps referring to Yotova 2011??]
    Q&A: The ANI admixture into India cannot be entirely, or maybe even at all, explained by Indo-European speakers’ migration into India as told by the Veda. There are at least two different admixture events among upper caste North Indians and we don’t know if either corresponds to the Indo-European migration.

    This is unsurprising but can you really trust TreeMix? It repeatedly throws up weird results whilst failing to detect historically demonstrated admixture.


