The Pith: Afro-Indians are mostly African, with a substantial Indian minority ancestry. The latter is disproportionately female-mediated. It also seems that this Indian ancestry is more northwest Indian, and that natural selection has been operating upon the Siddis outside of the African environment.
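One common way to check a claim of sex-biased admixture is to compare ancestry on the X chromosome with ancestry on the autosomes. The sketch below is a hypothetical illustration of that idea, not the method of either paper. Females carry two X chromosomes and males one, so under a simplifying model of steady contributions (an assumption), a source population's X-chromosome share is h_x = (2·s_f + s_m)/3 while its autosomal share is h_a = (s_f + s_m)/2, where s_f and s_m are its female and male contributions. The function name and inputs are illustrative.

```python
# Hypothetical sketch: recover female (s_f) and male (s_m) contributions of one
# source population from its autosomal (h_a) and X-chromosome (h_x) ancestry shares.
# Assumed model (a simplification): h_a = (s_f + s_m) / 2 and h_x = (2*s_f + s_m) / 3.


def sex_specific_contributions(h_a: float, h_x: float) -> tuple[float, float]:
    """Solve the two linear equations of the assumed model for (s_f, s_m)."""
    s_f = 3 * h_x - 2 * h_a
    s_m = 4 * h_a - 3 * h_x
    return s_f, s_m


# Illustrative inputs only: the X-chromosome share sits above the autosomal share.
s_f, s_m = sex_specific_contributions(h_a=0.3, h_x=0.35)
print(f"female contribution ~{s_f:.2f}, male contribution ~{s_m:.2f}")
```

Under this model s_f - s_m = 6·(h_x - h_a), so any excess of X-chromosome ancestry over autosomal ancestry points to a female-biased contribution.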
Along the western coast of South Asia, from Makran in southwest Pakistan down to the Konkan coast of southwest India, there are isolated communities of Afro-Indians. They are called Siddis or Habshi. Their African origin is clear in their physical appearance, as well as in aspects of their folk customs which tie them back to Sub-Saharan Africa. Nevertheless, they have assimilated to many Indian cultural traits. They generally speak the local language, and practice Islam, Hinduism, or Roman Catholic Christianity (in descending order of adherents).
How and why did the Siddis arrive in India? The earliest date for their arrival is almost certainly bounded by the period when Indo-Islamic polities rose to prominence in the early second millennium. The cosmopolitan melange of the armies of the Muslim warlords included diverse groups of Africans, some of whom took power and established their own self-conscious Afro-Indian dynasties, set apart from the Turkish, Afghan, Persian, and Arab inflected statelets. Were these the sources of the modern Siddi communities? The oral history of the Siddi of the western coast of South Asia suggests not. In fact the geographical concentration of these Afro-Indian tribes along the Arabian Sea fringe is indicative of different historical actors: the Portuguese. In much of Asia, out to China, the role of Africans was very different from that in the New World. They were objects purchased for elite consumption, not production. They served at court, guarded the harem, etc. Lowland Asia had no need for imported labor, as there was human stock aplenty. Whereas in much of the New World black African slaves were critical cogs in the capitalist system of production, in Asia, as in the Arab world outside of a few areas such as southern Iraq, they were signals of luxurious consumption by the high and mighty (this was in vogue at European courts for a period as well).
Two new papers published yesterday in the American Journal of Human Genetics examine the genetics of the Siddi of India with an eye toward elucidating the details of their historical ethnogenesis. Though the papers overlap to a great extent, there are subtle differences which make them complementary. Shah et al. use a far thicker set of markers, while Narang et al. look at many more populations, but because they removed SNPs not shared across all of their populations, their marker set is much thinner. Let’s review the papers in turn.
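The thinning described for Narang et al. can be pictured as a set intersection. When genotype panels from different studies are merged, only SNPs typed in every panel survive. A minimal sketch, with made-up panel names and rsIDs:

```python
# Hypothetical sketch: merging SNP panels keeps only markers typed in every panel.
# Panel names and rsIDs are made up for illustration.


def shared_markers(panels: dict[str, set[str]]) -> set[str]:
    """Return the SNP identifiers present in all panels (empty if no panels)."""
    marker_sets = list(panels.values())
    if not marker_sets:
        return set()
    return set.intersection(*marker_sets)


panels = {
    "panel_a": {"rs1", "rs2", "rs3", "rs4"},
    "panel_b": {"rs2", "rs3", "rs4", "rs5"},
    "panel_c": {"rs3", "rs4", "rs6"},
}
common = shared_markers(panels)
print(sorted(common))  # ['rs3', 'rs4']
```

Each panel added can only keep the shared set the same size or shrink it, which is the trade-off between breadth of populations and density of markers.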
As I’ve been harping on for the past few years, the patterns of contemporary genetic variation are probably only weakly tied to past patterns of genetic variation (though Henry Harpending warned me about this as far back as 2004). A major reason scholars long assumed the opposite is the axiom that most of the variation we see around us crystallized during the Last Glacial Maximum (~20 thousand years before the present).
This may be true in some cases, but I doubt it is true in most cases. I was pointed to a classic case of this problem just today. A reader alerted me to a short paper from this spring which attempts to ascertain the point of origin of M31a1, the dominant mtDNA haplogroup among the Onge tribe of the Andaman Islanders. This is an interesting issue because some researchers have plausibly proposed that these indigenous people of the Andaman Islands are the descendants of the first wave “Out of Africa,” who took the rapid “beachcomber” path. Understanding their genetics may then unlock the key to the “Out of Africa” event. Or so we thought. It looks like the human evolutionary past was a lot more complicated than we’d presumed.
The paper is in the Journal of Genetics and Genomics. Mitochondrial DNA evidence supports northeast Indian origin of the aboriginal Andamanese in the Late Paleolithic:
In view of the geographically closest location to Andaman archipelago, Myanmar was suggested to be the origin place of aboriginal Andamanese. However, for lacking any genetic information from this region, which has prevented to resolve the dispute on whether the aboriginal Andamanese were originated from mainland India or Myanmar. To solve this question and better understand the origin of the aboriginal Andamanese, we screened for haplogroups M31 (from which Andaman-specific lineage M31a1 branched off) and M32 among 846 mitochondrial DNAs (mtDNAs) sampled across Myanmar. As a result, two Myanmar individuals belonging to haplogroup M31 were identified, and completely sequencing the entire mtDNA genomes of both samples testified that the two M31 individuals observed in Myanmar were probably attributed to the recent gene flow from northeast India populations. Since no root lineages of haplogroup M31 or M32 were observed in Myanmar, it is unlikely that Myanmar may serve as the source place of the aboriginal Andamanese. To get further insight into the origin of this unique population, the detailed phylogenetic and phylogeographic analyses were performed by including additional 7 new entire mtDNA genomes and 113 M31 mtDNAs pinpointed from South Asian populations, and the results suggested that Andaman-specific M31a1 could in fact trace its origin to northeast India. Time estimation results further indicated that the Andaman archipelago was likely settled by modern humans from northeast India via the land-bridge which connected the Andaman archipelago and Myanmar around the Last Glacial Maximum (LGM), a scenario in well agreement with the evidence from linguistic and palaeoclimate studies.
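The abstract does not say how the dates were obtained. One widely used approach for dating an mtDNA clade is the ρ (rho) statistic: the mean number of mutations separating sampled sequences from the clade's root, multiplied by a calibrated time per mutation. The sketch below assumes each sequence is given as a set of mutation labels, ignores recurrent and back mutations, and uses a placeholder rate. The labels and the rate value are hypothetical.

```python
# Hypothetical sketch of the rho statistic for dating an mtDNA clade.
# Sequences are represented as sets of mutation labels; recurrent and back
# mutations are ignored, and the rate below is a placeholder assumption.

YEARS_PER_MUTATION = 3600.0  # placeholder, not a calibrated value


def rho(root: set[str], sequences: list[set[str]]) -> float:
    """Mean number of mutations each sequence carries beyond the root haplotype."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(len(seq - root) for seq in sequences) / len(sequences)


def clade_age(rho_value: float, years_per_mutation: float = YEARS_PER_MUTATION) -> float:
    """Convert rho into an age estimate in years."""
    return rho_value * years_per_mutation


root = {"m1", "m2"}
sampled = [
    root | {"a1", "a2"},
    root | {"b1", "b2", "b3"},
    root | {"b1", "c1", "c2"},
    root | {"d1", "d2", "d3", "d4"},
]
r = rho(root, sampled)
print(r, clade_age(r))  # 3.0 10800.0
```

The estimate is only as good as the calibration. With a different assumed rate the same ρ maps to a different age, which is one reason dates around the LGM carry wide uncertainty.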
Most of the time I point to and review papers on this weblog which excite me. But in the interests of “balance,” and of dampening the bias toward material I find interesting and salient, I thought it would be interesting to look at a paper which I thought wasn’t too interesting. It’s in the Journal of Human Genetics, part of the Nature Publishing Group empire. Also, it is open access, so you can read it yourself and make your own judgments.
India’s role in the dispersal of modern humans can be explored by investigating its oldest inhabitants: the tribal people. The Soliga people of the Biligiri Rangana Hills, a tribal community in Southern India, could be among the country’s first settlers. This forest-bound, Dravidian speaking group, lives isolated, practicing subsistence-level agriculture under primitive conditions. The aim of this study is to examine the phylogenetic relationships of the Soligas in relation to 29 worldwide, geographically targeted, reference populations. For this purpose, we employed a battery of 15 hypervariable autosomal short tandem repeat loci as markers. The Soliga tribe was found to be remarkably different from other Indian populations including other southern Dravidian-speaking tribes. In contrast, the Soliga people exhibited genetic affinity to two Australian aboriginal populations. This genetic similarity could be attributed to the ‘Out of Africa’ migratory wave(s) along the southern coast of India that eventually reached Australia. Alternatively, the observed genetic affinity may be explained by more recent migrations from the Indian subcontinent into Australia.
To be blunt about it, I think the researchers here just randomly stumbled onto a weird result which happened to align with some plausible preconceptions. This happens all the time, and is responsible for the unfortunate confirmation bias which plagues science. Researchers know very well what the expected results are, and may unconsciously or consciously sift through their data for a set of facts which align well with their theoretical preconceptions. In this case it isn’t quite so bald, as there is no orthodoxy, only a set of alternative hypotheses which go back a century or so.
Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he’s been driving his computer to crunch ADMIXTURE results up a ladder of K’s. Because it is the Harappa Ancestry Project, Zack’s populations are weighted a touch toward South Asians. He managed to get hold of the data set from Reconstructing Indian History. If you will recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, “Ancestral North Indian” (ANI) and “Ancestral South Indian” (ASI). ANI is a population which can be compared pretty easily to other West Eurasians. There are no “pure” groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago.
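To make the hybrid idea concrete, consider a population that is a mix of two sources. Its allele frequency at each SNP should sit on the line between the two source frequencies, p_mix ≈ α·p_ANI + (1 - α)·p_ASI, and α can be estimated by least squares. This is a toy sketch, not the method of Reconstructing Indian History. It assumes both source frequencies are known, which is exactly what is not true for ASI. No unadmixed ASI population exists, which is why that paper relied on other statistics. All frequencies below are made up.

```python
# Toy sketch: least-squares estimate of the mixing proportion alpha in
# p_mix = alpha * p_a + (1 - alpha) * p_b, given known source frequencies.
# Assumes both sources are observed; all frequencies are made up.


def estimate_alpha(p_mix: list[float], p_a: list[float], p_b: list[float]) -> float:
    """Minimise sum((p_mix - p_b - alpha * (p_a - p_b))**2) over alpha."""
    if not (len(p_mix) == len(p_a) == len(p_b)):
        raise ValueError("frequency lists must have equal length")
    num = sum((x - b) * (a - b) for x, a, b in zip(p_mix, p_a, p_b))
    den = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    if den == 0:
        raise ValueError("sources are identical at every locus; alpha is undefined")
    return num / den


p_ani = [0.8, 0.2, 0.6, 0.1]  # hypothetical "ANI-like" source
p_asi = [0.2, 0.6, 0.1, 0.5]  # hypothetical "ASI-like" source
true_alpha = 0.6
p_pop = [true_alpha * a + (1 - true_alpha) * b for a, b in zip(p_ani, p_asi)]
print(round(estimate_alpha(p_pop, p_ani, p_asi), 3))  # 0.6
```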
At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:
Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.
The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation in the latter. The % ASI is consistently higher than the % Onge. Why? The Andaman Islanders and the ASI diverged from their last common ancestor on the order of tens of thousands of years ago. Dienekes has observed that ADMIXTURE needs good reference populations, and the Onge have been diverged from the mainland ASI populations for so long that they are not a perfect proxy for this ancient group. But the underestimate seems to be systematic, always biased in the same direction, which explains the good fit between the two trends.
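A toy illustration of why a proxy that is biased in a consistent direction can still fit almost perfectly. In the made-up numbers below, % Onge always runs below % ASI, yet the r-squared is close to 1, and a fitted line converts one into the other. The values are hypothetical, not Zack's data, and the fitted line is not his equation.

```python
# Toy sketch: a proxy that is systematically biased can still fit almost perfectly.
# The percentages are made up; they are not Zack's data.


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus r-squared."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


pct_onge = [14.2, 20.8, 28.1, 34.9, 42.0]  # hypothetical
pct_asi = [20.0, 30.0, 40.0, 50.0, 60.0]   # hypothetical
slope, intercept, r2 = fit_line(pct_onge, pct_asi)
print(f"slope={slope:.3f} intercept={intercept:.3f} r^2={r2:.4f}")
```

The slope above 1 captures the systematic underestimate. The high r-squared only says the two quantities move together, not that one is an unbiased measure of the other.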
Zack naturally generated a pairwise matrix of Fst values between these inferred ancestral populations. Remember, Fst measures the proportion of the total genetic variance across two populations that lies between them rather than within them. So it’s a rough measure of genetic distance.
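One simple way to compute such pairwise values from the allele frequencies of inferred components is Nei's formulation, Fst = (H_T - H_S) / H_T, with heterozygosities summed across loci before taking the ratio. This is a sketch under that assumption. It is not necessarily the estimator Zack used, and the component names and frequencies are hypothetical.

```python
from itertools import combinations

# Sketch: pairwise Fst between inferred ancestral components from their allele
# frequencies, using Nei's (H_T - H_S) / H_T with heterozygosities summed over
# loci before the ratio. Component names and frequencies are hypothetical.


def fst_pair(p1: list[float], p2: list[float]) -> float:
    """Fst between two populations given per-locus allele frequencies."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)                  # pooled expected heterozygosity
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population value
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def fst_matrix(freqs: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Symmetric matrix of pairwise Fst with zeros on the diagonal."""
    names = list(freqs)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = fst_pair(freqs[a], freqs[b])
    return matrix


components = {
    "comp_1": [0.10, 0.50, 0.90, 0.30],
    "comp_2": [0.20, 0.45, 0.80, 0.35],
    "comp_3": [0.70, 0.10, 0.20, 0.90],
}
for name, row in fst_matrix(components).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```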
Here’s the matrix. I’ve renamed some populations:
Whenever Zack Ajmal posts a new update to the Harappa Ancestry Project he appends some data to his ethnic database. This sends me to Wikipedia, because how many people are supposed to know what a “Muslim Rawther” means? Well, if you are a Muslim Rawther, and perhaps from Southern India, you would. But South Asian ethno-linguistic categories and hierarchies are notoriously Byzantine, and I have difficulty making sense of them. This isn’t too surprising in my case, as my family’s background is relatively mixed in the very recent past (e.g., Hindus and Muslims, and people of various caste backgrounds), so we’re not the sort who can go at length about our pure ancestry and all that stuff. Unfortunately, Wikipedia isn’t always useful, because the people editing the entries on particular South Asian ethnic groups are often people from those ethnic groups, so you get a lot of extraneous information, and a particular slant on how awesome and high achieving the group is (also, sometimes there’s funny stuff about how notoriously good looking that particular caste is!). On occasion there are other sources which are informative. For example, Zack has several individuals from the Tamil Nadar caste. I know a little about this group because 1) I have a friend whose family is Nadar (he’s American, so calling him an American Nadar is pretty worthless), and 2) the New York Times profiled the group last fall.
When Zack noted that a group termed Tamil Vishwakarma had submitted entries, I went to Wikipedia. That was the first time I’d heard of the group. This is what I found:
School girls in Hunza, Pakistan
A few days ago I observed that pseudonymous blogger Dienekes Pontikos seemed intent on throwing as much data and interpretation into the public domain via his Dodecad Ancestry Project as possible. What are the long-term implications of this? I know that Dienekes has been cited in the academic literature, but it seems more plausible that this sort of project will simply distort the nature of academic investigation. “Distort” has negative connotations, but it need not be deleterious at all. Academic institutions have legal constraints on what data they can use and how they can use it (see why Genomes Unzipped started). Not so with Dienekes’ project. He began soliciting data ~2 months ago, and Dodecad has already yielded a rich set of results (granted, it would not be possible without academically funded public domain software, such as ADMIXTURE). Even if researchers don’t cite his results (and no doubt some will), he’s reshaping the broader framework. In other words, he’s implicitly updating everyone’s priors. Sometimes it isn’t even a matter of new information, as much as putting a spotlight on information which was already there. Below is a slice of a bar plot from Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. It uses STRUCTURE with K = 7. To the right of the STRUCTURE slice are two plots of individual data on French and French Basque from the same HGDP data set using ADMIXTURE at K = 10 from Dodecad.
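Roughly speaking, ADMIXTURE maximizes a likelihood built on the same admixture model that STRUCTURE samples from. Each individual i has ancestry fractions q[i][k] over K ancestral populations, and each population k has an allele frequency f[k][j] at SNP j. The individual's expected allele frequency is the ancestry-weighted mix of the population frequencies. The sketch below only evaluates the resulting binomial log-likelihood, with the constant binomial coefficient dropped. The real programs search over q and f to maximize it, which is not shown, and the toy inputs are hypothetical.

```python
import math

# Sketch of the admixture-model log-likelihood (binomial constant dropped).
# genotypes[i][j]: copies (0, 1 or 2) of the counted allele at SNP j in individual i
# q[i][k]: ancestry fraction of individual i from population k (rows sum to 1)
# f[k][j]: frequency of the counted allele at SNP j in population k
# Assumes every mixed frequency lies strictly between 0 and 1.


def admixture_loglik(
    genotypes: list[list[int]], q: list[list[float]], f: list[list[float]]
) -> float:
    ll = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Individual-specific allele frequency: ancestry-weighted mix.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


f = [[0.9, 0.9, 0.1],   # population 0
     [0.1, 0.1, 0.9]]   # population 1
genotypes = [[2, 2, 0]]  # one individual whose genotypes resemble population 0
ll_pure = admixture_loglik(genotypes, [[1.0, 0.0]], f)
ll_mixed = admixture_loglik(genotypes, [[0.5, 0.5]], f)
print(ll_pure > ll_mixed)  # True
```

Raising K adds more columns to q and more rows to f, which is why runs at different K (such as 7 and 10 above) can carve the same individuals into different components.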
I mentioned a few days ago that a friend was trying to get together some data to analyze the genetic variation of South Asians. By a strange coincidence Dienekes just published a more detailed analysis of South Asians…and uncovered something very interesting, though not that surprising. Some technical preliminaries:
A note of caution: The reduced marker set (~30k) means that a lot of noise is added in the admixture estimates. In particular, many individuals are likely to get low-level admixture from population sources that can be attributed to noise. But, as we will see, the small marker set does not really affect either the power of the GALORE approach, or of ADMIXTURE to infer meaningful clusters.
In addition to the various online sources of public data, Dienekes got data from about a dozen South Asians. I was one of those South Asians, DOD075. In many ways I’m a rather standard-issue South Asian, similar to Gujaratis, except that I have a substantial ‘East Asian’ component. More concretely, between 1/6 and 1/7 of my ancestry seems to be of eastern origin, far higher than the norm among South Asians. The rest of my ancestry was mostly South Asian specific, with a minor but significant ‘West Asian’ component common across northern India.
Rerunning the analysis with more data and different samples, Dienekes came up with a different set of ancestral components. Of particular interest to me, he split the East Asian component into East Asian proper and Southeast Asian. Below is a selection of populations with their ancestral components, plus me. I’ve also renamed a few components: North Kannadi = Dravidian, Irula = Indian tribal, and Indian = Generic Indian. Looking at the Fst, it seems that Indian endogamy and population bottlenecks have had an effect…look at the North Kannadi distance from everyone else.
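Why would endogamy and bottlenecks push one group away from everyone else? Take a simple Wright-Fisher model with constant effective size N_e and no migration. These are strong assumptions, since real endogamous groups are not fully isolated. Under that model, the expected Fst that drift builds up after t generations is 1 - (1 - 1/(2N_e))^t, so smaller populations diverge faster. The sizes and generation count below are illustrative only.

```python
# Toy Wright-Fisher sketch: expected Fst built up by drift alone among isolated
# groups of constant effective size ne after t generations (no migration,
# no mutation). Both assumptions are strong; the numbers are illustrative.


def drift_fst(ne: float, generations: int) -> float:
    """Expected Fst = 1 - (1 - 1/(2*ne))**t."""
    if ne < 1 or generations < 0:
        raise ValueError("ne must be at least 1 and generations non-negative")
    return 1 - (1 - 1 / (2 * ne)) ** generations


for ne in (500, 5_000, 50_000):
    print(ne, round(drift_fst(ne, generations=100), 4))
```

A tenfold smaller effective size yields roughly tenfold more drift-driven Fst over the same span, which is the qualitative pattern a bottlenecked, endogamous group would show.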
The past ten years have obviously been very active in the area of human genomics, but in the domain of South Asian genetic relationships in a worldwide context they have seen veritable revolutions and counter-revolutions. The final outlines are still to be determined. In the mid-1990s the conventional wisdom was that South Asians were a branch of a broader West Eurasian cluster of peoples, albeit more distant from the core Middle Eastern-North African-European-Caucasian clade. The older physical anthropological literature would have asserted that South Asians were predominantly Caucasoid, but with an Australoid element admixed in at varying proportions as a function of geography and caste. To put it more concretely, and I think accurately, a large degree of South Asian physical variety can be defined along the spectrum between A. R. Rahman and Nawaz Sharif. The regional and caste truisms are only correlations. Subrahmanyan Chandrasekhar was a Tamil Brahmin, but experienced anti-black racism in the United States. I think that perception of him is understandable in light of his appearance.
This rough-and-ready mainstream understanding, supported by classical genetic markers, was overturned in the early years of the 21st century. One line of thought argued that South Asians were much more distinctive from the broader Western Eurasian cluster of peoples. Representative of this body of work is a paper like The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. These researchers tended to start with the female lineages, mtDNA, and then supplement that with Y lineages, the paternal descent. A separate line of evidence, generally drawn from Y chromosomal results, indicated that there were deep connections between the people of India and those of Central Eurasia, in particular via the R1a haplogroup. Additionally, one very surprising aspect of the first set of results was that it placed South Asians closer to East, not West, Eurasians. But by the end of the aughts the uniparental studies had been supplemented by a range of results produced from SNP-chips, which looked at hundreds of thousands of genetic variants. These studies seemed to support the older view of South Asians being closer to West Eurasians than East Eurasians. Finally, last year a paper came out which posited that almost all South Asian populations were actually an ancient stabilized hybrid between two groups: a European-like population, “Ancestral North Indians” (ANI), and another group which is no longer present in unadmixed form, “Ancestral South Indians” (ASI), of whom the Andaman Islanders are distant relatives. Though there was a slight bias toward ANI as a whole, the fraction of ASI increased as one went southeast, and down the caste ladder. In other words, the distinctive “South Asian” ancestral group may actually be conceived of as a compound of these two elements: an admixture of the native substrate against a European-like genetic background.