The Pith: In India 5,000 years ago there were the hunter-gatherers. Then came the Dravidian farmers. Finally came the Indo-Aryan cattle herders.
There is a new paper out of the Reich lab, Genetic Evidence for Recent Population Mixture in India, which follows up on their seminal 2009 work, Reconstructing Indian Population History. I don’t have time right now to do justice to it, but as noted this morning in the press, it is “carefully and cautiously crafted.” Since I am not associated with the study, I do not have to be cautious and careful, so I will be frank in terms of what I think these results imply (note that my confidence in many of the assertions below is modest). Though less crazy in a bald-faced sense than another recent result which came out of the Reich lab, this paper is arguably more explosive because of its historical and social valence in the Indian subcontinent. There has been a trend over the past few years of scholars in the humanities engaging in deconstruction and intellectual archaeology which overturns old historical orthodoxies and understandings, and leaves the historiography of a particular topic of study in a chaotic mess. From where I stand the Reich lab and its confederates are doing the same, but instead of attacking the past with cunning verbal sophistry (I’m looking at you, postcolonial “theorists”), they are taking a sledge-hammer of statistical genetics and ripping apart paradigms woven together by innumerable threads. I am not sure that they even understand the depths of the havoc they’re going to unleash, but all the argumentation in the world will not stand up to science in the end, we know that.
Since the paper is not open access, let me give you the abstract first:
Over at Econlog Bryan Caplan bets that India’s fertility will be sub-replacement within 20 years. My first inclination was to think that this was a totally easy call for Caplan to make. After all, much of southern India, and the northwest, is already sub-replacement. And then I realized that heterogeneity is a major issue. This is a big problem I see with political and social analysis. Large nations are social aggregations that are not always comparable to smaller nations (e.g., “Sweden has such incredible social metrics compared to the United States”; the appropriate analogy is the European Union as a whole).
Update: Please do not take the labels below (e.g., “Baloch”) as literal ancestral elements. The most informative way to read them is that they indicate populations where this element is common, and that the relationship between proportions can tell us something. The literal proportion does not usually tell us much.
I was browsing the Harappa results, and two new things jumped out at me. Zack now has enough St. Thomas Christian samples from Kerala that I think we need to accept as the likely model that this community does not derive from the Brahmins of Kerala, as some of them claim. Their genetic profile is rather like many non-Brahmin South Indians, except the Nair, who have a peculiar attested history with the Brahmins of their region.
But that’s not the really interesting finding. Below is a table I constructed from Zack’s data.
After posting on Basque mtDNA I wanted to make something more explicit that I alluded to below, that uniparental lineages are highly informative, but they may not be representative of total genome content. This is plainly true in the case of mestizos from Latin America, but we don’t need genetics to point us in the right direction on this score, we have plenty of textual evidence for asymmetry in sexes when it came to admixture events in the post-Columbian era. Rather, I want to note again the issue of South Asia. When it comes to mtDNA the good majority of South Asian lineages are closer to those of East Asia than Western Eurasia. By this, I do not mean to say that they’re particularly close to East Asian lineages, only that if you go back in the phylogeny the South Asian lineages (I’m thinking here of haplogroup M) tend to coalesce first with East Asian lineages before they do so with West Eurasian lineages.
Here is a quote from one of the definitive papers on this topic:
With the current economic malaise in the developed economies and the rise of the “B.R.I.C.s” you hear a lot about “China” and “India.” There is often a tacit acknowledgment that China and India are large diverse nations, but nevertheless in a few paragraphs they often get reduced to some very coarse generalizations. What’s worse is when you compare China and India to nations which simply aren’t on their scale. For example, over at Brown Pundits there is sometimes talk about India vs. Bangladesh/Pakistan/Nepal/Sri Lanka. The problem is that the appropriate comparisons are specific Indian states, not the whole nation. Uttar Pradesh, the largest Indian state in population, is actually in the same range as Bangladesh and Pakistan. Similarly, when comparing social metrics in Bangladesh vs. India, one should focus on culturally similar regions, such as the state of West Bengal, not the sum average of India as a nation.
Similarly, we look at frenetic Chinese growth and worry about how they are “leaving us behind” (from an American perspective). But do we take a step back to wonder how much the Chinese are leaving the Chinese behind?
Below are two charts which show the yawning chasm within these mega-nations on the scale of states (at a finer grain the variation is even greater). First a rank order of Chinese provinces by GDP PPP, with comparable nations interspersed within. PPP values shouldn’t be taken too literally, and the Chinese data seem to overestimate the values on a province level basis by 10-15%. But you get the general picture.
Unlike in some Asian societies dairy products are relatively well known in South Asia. Apparently at some point my paternal grandmother’s family operated a milk production business. This is notable because Bengal is not quite the land of pastoralists. In much of North India milk and milk-products loom larger, in particular ghee. People don’t tend to consume what makes them ill, and even accounting for some processing in the form of butter, most researchers have assumed a substantial number of South Asians must be lactase persistent. That is, they can extract nutritive value out of the lactose sugar present in milk (in addition to fat and protein). Additionally, many South Asians have the well known -13910 C>T variant common in Western Eurasia. How do I know this? Because I share my genetic information with lots of South Asians, and some of them, especially Punjabis, come up as “lactose tolerant” on that allele.
A new paper in Molecular Biology and Evolution confirms this with a larger data set, over 2000 samples from South Asia. The geographical pattern is exactly what you’d expect:
One of the things that happens if you read ethnographically thick books like Nicholas Dirks’ Castes of Mind: Colonialism and the Making of Modern India is that you start to wonder if most castes were simply created by the British and for the British. Granted, even Dirks would not deny the existence of Brahmins prior to the British period, but those who work within his general paradigm might argue that a group like Kayasthas were the product of very recent developments (e.g., the uplift of a non-Brahmin literate group willing to serve Muslim and British rulers). The emergence of genomics complicates this sort of narrative, because you can examine relationships and see how plausible they would be given a particular social model.
Zack Ajmal is now at 90 participants in the Harappa Ancestry Project. He’s still undersampling people from the Indo-Gangetic plain between Punjab and Bengal, but that’s not his fault. Hopefully that will change. He posted K = 4 recently for the last 10 participants, but I notice K = 12 in his spreadsheets. So this is what I did:
1) I aligned the ethnic identification information with the K = 12 results.
2) I removed relatives and those who were not 100% South Asian.
3) I added some reference populations in. These are all upper case below. All other rows are individuals (HRP numbers provided).
4) I removed five ancestral groups. The three Africans, Papuans, and Siberians.
Then I arranged the rows alphabetically by ethnic identification. Helpfully many people provided their caste information as well. I’ve uploaded a csv with the information. But skim the plots & table below. Those of you who are brown can probably make more sense of them than I can. But I think some of the patterns are pretty interesting already. For me the big thing that jumps out is how uniform some of these caste groups are. Remember that HRP22 and HRP23 are my parents. If the British made these groups up, they were very punctilious about their ancestral make up in constituting them!
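For anyone who wants to repeat the four steps and the sort on their own copy of the spreadsheet, here is a minimal sketch in plain Python. It is not the script I used. The function name `build_table`, the sample IDs (`S1` and so on), the component labels (`C1` to `C3`) and all of the numbers are hypothetical placeholders, and it assumes the admixture results and the ethnic labels are both keyed by the same sample ID. Note that it simply drops the unwanted components and does not rescale the remaining proportions.

```python
def build_table(admixture, ethnicity, references, excluded_ids, dropped_components):
    """Join admixture fractions to ethnic labels, filter, and sort by label."""
    rows = []
    for sample_id, components in admixture.items():
        # Step 2: skip relatives / not fully South Asian samples (listed by the caller)
        if sample_id in excluded_ids or sample_id not in ethnicity:
            continue
        # Step 4: drop the ancestral components we are not interested in
        kept = {k: v for k, v in components.items() if k not in dropped_components}
        # Step 1: align the ethnic identification with the admixture results
        rows.append({"id": sample_id, "group": ethnicity[sample_id], **kept})
    # Step 3: add reference populations, labeled in upper case
    for name, components in references.items():
        kept = {k: v for k, v in components.items() if k not in dropped_components}
        rows.append({"id": "", "group": name.upper(), **kept})
    # Final step: arrange rows alphabetically by ethnic identification
    rows.sort(key=lambda r: r["group"].lower())
    return rows


# Hypothetical toy input, NOT real project data
admixture = {
    "S1": {"C1": 0.60, "C2": 0.38, "C3": 0.02},
    "S2": {"C1": 0.55, "C2": 0.44, "C3": 0.01},
    "S3": {"C1": 0.70, "C2": 0.29, "C3": 0.01},
}
ethnicity = {"S1": "Tamil", "S2": "Bengali", "S3": "Bengali"}
references = {"gujarati": {"C1": 0.65, "C2": 0.34, "C3": 0.01}}

table = build_table(
    admixture,
    ethnicity,
    references,
    excluded_ids={"S3"},        # e.g. a relative of S2
    dropped_components={"C3"},  # e.g. a component you want to hide
)
for row in table:
    print(row)
```

Sorting on the lower-cased label keeps the upper-case reference populations interleaved with the individuals instead of pushing them all to the top.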
School girls in Hunza, Pakistan
A few days ago I observed that pseudonymous blogger Dienekes Pontikos seemed intent on throwing as much data and interpretation into the public domain via his Dodecad Ancestry Project as possible. What are the long term implications of this? I know that Dienekes has been cited in the academic literature, but it seems more plausible that this sort of project will simply distort the nature of academic investigation. Distort has negative connotations, but it need not be deleterious at all. Academic institutions have legal constraints on what data they can use and how they can use it (see why Genomes Unzipped started). Not so with Dienekes’ project. He began soliciting for data ~2 months ago, and Dodecad has already yielded a rich set of results (granted, it would not be possible without academically funded public domain software, such as ADMIXTURE). Even if researchers don’t cite his results (and no doubt some will), he’s reshaping the broader framework. In other words, he’s implicitly updating everyone’s priors. Sometimes it isn’t even a matter of new information, as much as putting a spotlight on information which was already there. Below is a slice of a bar plot from Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. It uses STRUCTURE with K = 7. To the right of the STRUCTURE slice are two plots of individual data on French and French Basque from the same HGDP data set using ADMIXTURE at K = 10 from Dodecad.
I mentioned a few days ago that a friend was trying to get together some data to analyze the genetic variation of South Asians. By a strange coincidence Dienekes just published a more detailed analysis of South Asians…and uncovered something very interesting, though not that surprising. Some technical preliminaries:
A note of caution: The reduced marker set (~30k) means that a lot of noise is added in the admixture estimates. In particular, many individuals are likely to get low-level admixture from population sources that can be attributed to noise. But, as we will see, the small marker set does not really affect either the power of the GALORE approach, or of ADMIXTURE to infer meaningful clusters.
In addition to the various online sources of public data Dienekes got about a dozen South Asians. I was one of those South Asians, DOD075. In many ways I’m a rather standard issue South Asian, similar to Gujaratis, except that I have a substantial ‘East Asian’ component. More concretely, between 1/6 and 1/7 of my ancestry seems to be of eastern origin, far higher than the norm among South Asians. The rest of my ancestry was mostly South Asian specific, with a minor, but significant ‘West Asian’ component common across northern India.
Rerunning with more data with different samples Dienekes came out with a different set of ancestral components. Of particular interest to me he broke down the East Asian between East Asian proper and Southeast Asian. Below is a selection of populations with ancestral components + me. I’ve also renamed a few components. North Kannadi = Dravidian and Irula = Indian tribal. Indian = Generic Indian. Looking at the Fst it seems that Indian endogamy and population bottlenecks have had an effect…look at the North Kannadi distance from everyone else.
Quick review. In the 19th century once the idea that humans were derived from non-human ancestral species was injected into the bloodstream of the intellectual classes there was an immediate debate as to the location of the proto-human homeland; the Urheimat of us all. Charles Darwin favored Africa, but in many ways this ran against the cultural grain. The theory of evolution was birthed before the highest tide of the age of white supremacy and European hegemony, and Darwin’s model had to swim against the conviction that Africans were the most primitive of the colored races. After the waning of the ideological edifice of white supremacy, and the shock it received during and after World War II, the debates as to the origin of humanity still remained contentious and followed the same outlines (though without the charged normative inferences). But as the decades wore on many more researchers began to believe that Darwin was correct, and that the origin of humanity lay in the African continent. First, the deep origin of the human lineage in Africa was accepted, but eventually a more recent expansion out of Africa was argued for by one school. The turning point in these academic disputes was the popularization of the “mitochondrial Eve” theory of the 1980s.
What some paleontologists had long argued, that anatomically modern humans have their locus of origin in Africa, was supported now by research from genetics which indicated that Africans were the most basal clade of humans on a continental scale, so that non-Africans could be conceived of as a subset of Africans. From this originates the chestnut of wisdom that Africans have more genetic diversity than all other human populations combined. By the year 2000 one could say that the “Out of Africa” triumphalism had proceeded to the point where an almost exterminationist model had taken hold when it came to the relationships of anatomically modern H. sapiens, and other groups which had evolved outside of Africa over the past million or so years, such as the Neandertals.
But the theoretical dichotomies were too coarse and absolute as it turns out. A division between multiregionalist phyletic gradualism, where H. sapiens evolved out of its hominin ancestors concurrently on a worldwide scale, and a model of rapid expansion of one tribe in Africa to replace all others in totality, may have been warranted in the age of classical genetics and morphometric analysis, but now we can look at the raw genomic material in a more fine-grained fashion. In fact, we can now look at the genomic patterns of variation among extinct hominins! Though there have long been hints that the expansion-and-replacement paradigm was too extreme from the genetic and morphological data, with the publication last spring in Science of a paper which made the claim for admixture between Neandertals and non-Africans in the range of 1-4% in all non-African groups based on a comparison of Neandertal and modern human genetic variation, one can dismiss absolutist expansion-and-replacement as self-evidently true orthodoxy. But one orthodoxy has not given way to another, and the shock to the old models presented by the data has not resulted in the coalescence of new robust paradigms. We live in a time of scientific troubles, so to speak.
In the comments below a strange conversation grew out of the politicized nature of Pakistani identity, and its relationship to India the nation-state, and India the civilization. I assume that a typical reader, or more accurately commenter, on this weblog would be sanguine if they found out they were 10% chimpanzee. After all, it’s what’s between your ears that really matters, not who your ancestors were. I do understand that some readers have strong genealogical-nationalist interests in human population genetics, and that’s fine so long as you don’t presume that the rest of us share such priorities (this is a problem for some commenters, so please be aware that I get annoyed when you project this way, though it’s obviously not a banning offense).
But readers who come via search engines are a different case, and that’s why I’ve started to get worried about over-reading of PCA and such. Nevertheless, I do think PCA can answer the question of whether there is any real genetic discontinuity between Pakistanis and Indians. The answer is no. Page 19 of Reich et al. supplement 1 includes the HGDP Pakistani populations in their plot of genetic variation of Indian groups. I’ve added some labels, but the top-line is rather clear. AP = Andhra Pradesh, UP = Uttar Pradesh, GUJ = Gujarat and RAJ = Rajasthan. I assume Ind. and Pak. abbreviations are self-evident.
The New York Times has a piece up, Defusing India’s Population Time Bomb, which reiterates what I was trying to get at yesterday, India’s demographic problems are localized to particular regions, not the nation as a whole. First, let’s review the world’s population growth & fertility rates:
Now let’s focus on a few nations:
In my post Pakistan ~10 years on I alluded to the fact that despite India’s robust economic growth of the past ~15 years or so in the aggregate there is a wide range of state-by-state variation. It is conventional in the media to point out the massive caste/class divisions in India, but because of the lack of familiarity with the geography of that nation there’s less reference to the regional gulfs. But if you look at the state-level data they’re rather large. The total fertility in the northern Gangetic states of Uttar Pradesh and Bihar is ~4, while that in the southern states of Kerala and Tamil Nadu is ~2. Uttar Pradesh and Bihar are not trivial states, rather, they’re the two most populous! Additionally, they’re a meaty portion of the South Asian “Cow Belt”, the cultural heart of the subcontinent. The first great historic polities of South Asia, that of the Maurya and Gupta, had their focus in what is today Bihar, while later on the Muslim dynasts famously operated from a base around the region of Delhi in Uttar Pradesh. In the Indian cultural geography these states are the heart of Āryāvarta.
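To make the aggregation point concrete, here is a rough sketch of how a single population-weighted fertility figure can hide a two-fold regional gap. The fertility values are the approximate ones quoted above (~4 and ~2). The relative population weights are hypothetical round numbers chosen only for illustration, not census figures, and `weighted_mean` is just a helper name I made up.

```python
def weighted_mean(values, weights):
    """Population-weighted average of per-state values."""
    total_weight = sum(weights[state] for state in values)
    return sum(values[state] * weights[state] for state in values) / total_weight


# Approximate total fertility rates as quoted in the text
tfr = {
    "Uttar Pradesh": 4.0,
    "Bihar": 4.0,
    "Kerala": 2.0,
    "Tamil Nadu": 2.0,
}

# HYPOTHETICAL relative population weights, for illustration only
population_weight = {
    "Uttar Pradesh": 4,
    "Bihar": 2,
    "Kerala": 1,
    "Tamil Nadu": 2,
}

aggregate = weighted_mean(tfr, population_weight)
spread = max(tfr.values()) - min(tfr.values())

print(f"aggregate TFR across the four states: {aggregate:.2f}")
print(f"gap between highest and lowest state: {spread:.1f}")
```

The aggregate lands between the two regimes and describes none of the four states, which is the whole problem with treating “India” as one data point.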
Wikipedia has a set of pages which rank the states of India by various metrics. The tables themselves are illuminating, but for non-Indian readers I thought a series of thematic maps would be better. Additionally, I added one scatterplot.