The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.
Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling. And finally, yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which is mostly outside of South Asia. In case you care, I asked my father her background, and like my patrilineage she was a “Khan,” though an unrelated one (“Khan” is just an honorific). I received these results before the total genome assessment, and so initially assumed this confirmed my hunch that my father had some unknown recent ancestry of “eastern” provenance. But it turns out my hunch is probably wrong. In fact, my parents have about the same “eastern” proportion, with my mother slightly more! My expectation was that perhaps my mother would be around 25-30% “Asian,” and my father above 50%. The reality turns out that my father is 38%, and my mother 40%.
Image credit: f_mafra
Below are the “Ancestry Paintings” generated by 23andMe for my family (so far). What you see are the 22 non-sex chromosomes, which have two copies each, and assignments to “Asian,” “European,” and “African” ancestry groups. The reference populations used to generate these assignments come from the HapMap: the northern European sample of white Americans from Utah, Chinese from Beijing, Japanese from Tokyo, and ethnic Yoruba from Nigeria. What the assignment to one of these classes denotes is that that region of the genome is closest to that category in identity. It does not imply that your recent ancestry is European or Asian (African is probably a different matter, but there are many complaints about the results for African Americans and East Africans in the 23andMe forums). This caveat is especially important for South Asians, because we generally find that we’re ~75% European and ~25% Asian. All that means is that though most of our genetic affinity is with Europeans, a smaller fraction seems to resemble Asians more. Via “gene sharing” on 23andMe I can see that the Asian fraction varies from ~35% in South India and Sri Lanka, to ~10% in Pakistan and Punjab. This is not because South Indians have more East Asian ancestry than Punjabis. Rather, to a great extent the South Asian genome can be decomposed into two ancestral elements, one with a distant, but closer, affinity to populations of eastern Eurasia, and one with a close affinity to populations of western Eurasia. These are what some have termed “Ancient South Indians” (ASI) and “Ancient North Indians” (ANI), respectively. ASI ancestry, which is probably just a touch under 50% in South Asians overall, seems to shake out then as somewhat more Asian than European.* The fraction of ASI increases as one moves south and east in South Asia (and as one moves down the caste status ladder).
First, I want to note that I’ll be using abbreviations for my family members now and then (this applies to future posts). My father will be RF, my mother will be RM, and my siblings will be RS, with a number to denote which sibling. So currently we have RS1. As you can see in a gestalt sense we resemble each other a great deal as a family. We’re about 40% Asian, and 60% European. The extent of fragmentation indicates that we’re not that recent of an admixture; otherwise, the Asian and European fragments would cluster on one strand or the other. Some have suggested that my mother does exhibit less fragmentation. A hypothesis for why this may be is that her maternal grandfather was reputedly from a family of Middle Eastern origin who had resettled in South Asia, first in Delhi, and later in southeast Bengal (specifically, the district of Noakhali). Since he presumably would hardly have had any Asian ancestry according to 23andMe’s algorithm the homologs inherited from him would be overwhelmingly European, with only one generation of recombination intervening.
To assess the plausibility of the various hypotheses that might explain the pattern of the results, you need all the non-genomic information. Above is a map of British India. I’ve pointed to the region of Bengal from which my family comes. Of my great-grandparents 7 out of 8 were born in Comilla (which is actually a greater expanse to the southeast of Dhaka than the current Bangladeshi administrative division). 1 great-grandparent was born in Noakhali, which is just to the southeast of Comilla. 4 out of 8 great-grandparents were born within 5 miles of the town of Chandpur (RF’s grandparents). 3 of the remaining 4 great-grandparents were born within 5 miles of the village of Homna (RM’s grandparents). These two locations are about 30 miles from each other as the crow flies, though transport between them would have been by water in an earlier era (Homna is on the Meghna river, which is actually a more substantial body of water than the Ganges by the time the latter reaches Bangladesh). This region is bounded on the west by the Padma river, which narrows at Chandpur to about 2 miles in width (average depth ~1,000 feet). To the east is the Indian state of Tripura. This is a relatively porous border, defined on the map, not imposed by geography. You can see that in some regions the Bangladesh-India border here in the east actually bisects rice paddies.
Today Tripura state is majority ethnic Bengali due to mass migration of Hindus from what was East Bengal during the 20th century (and later East Pakistan, and now Bangladesh). But its indigenous people are the Tripuri, a tribe whose native language is clearly Tibeto-Burman, and whose physical type points to their connection with populations to the north and east. At the same time, ~90% of the Tripuri are Hindus, and during the period of Islamic rule in South Asia the rajahs of Tripura styled themselves defenders of Hindu civilization (just as the Tibeto-Burman Ahoms of Assam did). As such, linguistically and genetically the native people of Tripura exhibit a sharp contrast to the Indo-Aryan peoples of the Gangetic Plain, of whom the Bengalis are the easternmost representatives along with the Assamese. But they have also long been part of the South Asian cultural scene, and can no longer be viewed as purely intrusive (their oral history indicates that they arrived before the Muslims, for one).
Finally, in regards to the detailed backgrounds of my 8 great-grandparents, 2 were of the Khan class. 1 was from a family of Hindu Thakurs who were recently converted to Islam. Another was of the family name Sarkar. 1 was likely from a family of Middle Eastern transplants to South Asia, at least in part. The 4 remaining great-grandparents were Bengali Muslims, with no particular background information beyond that known by my parents.
I gave you all this because genetic variation is strongly conditioned upon geographical and cultural parameters. Water barriers seem to have been particularly efficacious in the pre-modern period at dividing people culturally and genetically (though ironically water was also a precondition for any bulk trade). Language is another major parameter of difference. And finally, there is religion. As for the family backgrounds in the last section, I would not be surprised if 300 years ago the majority of my ancestors in that generation were Hindus; there is some fluidity in this obviously. I provide the data on radius of place of birth because we know from European results that even villages exhibit genetic clustering. This is mitigated in my family because my father has a diverse background among his grandparents as far as community goes, while my mother has a grandparent who was from a different district, and to a great extent a different ethnic group in biological terms.
When I initially saw that I was ~40% Asian I was a little taken aback by the high proportion (remember, the average South Asian is about 25% Asian), but there were two parsimonious explanations: either (a) I had a lot of ASI, or (b) I had ancestry which did not seem South Asian as such, but was genuinely from East Asia. To ascertain whether it was the former I began proactively gene sharing with a wide range of South Asians on 23andMe. After dozens of individuals it became clear that I was outside of the normal interval of variation. I was more Asian than individuals from South India or Sri Lanka. Additionally, even these individuals tended to be genetically closest to Central/South Asians in the HGDP data set. I was closest to East Asians. Also, on the two dimensional PCA projected onto Central/South Asians I was definitely outside of the cluster of all the other South Asians. Eventually, I did find someone who broke the magic 35% barrier of Asian…and that individual was a Bangladeshi, at 38%. And, like me, he was closer to East Asians on the basic “Global Similarity” match. He also carried a Y chromosomal lineage which was rare in South Asia and common among the Hmong. Finally, when Dienekes started his Dodecad Ancestry Project it was clear that about 15% of my ancestry clustered with an element which was not South Asian, but East Asian. If one removes this fraction, I would be about 70% European and 30% Asian, absolutely within the normal range for someone with ancestry to the east or south of the subcontinent.
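The back-of-the-envelope arithmetic behind that last figure can be sketched as follows. This is only a rough illustration, and it rests on assumptions that are mine rather than anything 23andMe or Dodecad report: that my Ancestry Painting is 43% Asian and 57% European (ignoring any trace African assignment), that the ~15% East Asian element falls entirely inside the "Asian" share, and that proportions from the two methods can be combined at all. The function name `remove_component` is hypothetical.

```python
def remove_component(asian, european, removed):
    """Drop `removed` from the Asian share and renormalize what is left.

    All arguments are fractions of the whole genome (0.0 to 1.0).
    Assumes the removed component sits entirely inside the Asian share.
    """
    if not 0.0 <= removed <= asian:
        raise ValueError("removed component must fit inside the Asian share")
    remaining = asian + european - removed
    return (asian - removed) / remaining, european / remaining


# Assumed inputs: 43% Asian, 57% European, ~15% East Asian to be removed.
asian_share, european_share = remove_component(0.43, 0.57, 0.15)
print(round(asian_share, 2), round(european_share, 2))  # prints 0.33 0.67
```

Under these assumptions the remainder comes out at roughly one third Asian and two thirds European, which is in line with the "about 70% and 30%" I quote above.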
If you’ve read up to this point, you may be wondering how it is that my father is 38% Asian and my mother is 40% Asian, and I’m 43% Asian. After all, shouldn’t I be an average between the two? Actually, on the PCA scatter plot I am (along with my sibling) exactly between my parents (you can’t see the offspring because the flags are just too large). So why the difference? First, remember that the PCA is projecting you onto a two dimensional axis where the x and y represent the two biggest components of variance in the data set. In other words, it’s yanking out the subset of genetic variance which really stands out in terms of between population difference. This is how an individual who is a first generation Eurasian can be so far from their parents on this plot, but still exhibit a great deal of identity by state in terms of total genome; there’s a lot of variation that the two dimensional plot does not capture (e.g., private variants to family lineages). The Ancestry Painting estimates are different; they’re looking across the whole genome and making assessments for each region as to its genetic affinity between the three reference populations. So to repeat, you have over 50 reference populations vs. 3, and, you have a small proportion of the total genetic variation, vs. the whole genome. Both methods are reporting real and valid results, but they’re somewhat different.
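A toy sketch may make the PCA point concrete. Everything in it is hypothetical: six made-up markers, two made-up axes, and invented genotypes for a father and mother. It also skips the mean-centering a real PCA performs, which shifts every point by the same amount and so does not change the conclusion. It shows two things: projection is linear, so a child whose expected genotype is the average of the parents lands exactly between them, and markers with zero loading on both axes contribute nothing to the plot no matter how much individuals differ at them.

```python
def project(genotype, axes):
    """Project a genotype vector onto each axis with a dot product."""
    return tuple(sum(g * w for g, w in zip(genotype, axis)) for axis in axes)


# Hypothetical loadings for two axes over six markers.
# The last two markers carry zero weight: the plot cannot "see" them.
AXES = [
    (0.5, 0.5, 0.5, 0.5, 0.0, 0.0),
    (0.5, -0.5, 0.5, -0.5, 0.0, 0.0),
]

# Invented allele counts (0, 1 or 2 copies) for each parent.
father = (2, 2, 1, 0, 2, 0)
mother = (0, 1, 1, 2, 0, 2)

# Expected dosage in a child is the average of the two parents.
child_expected = tuple((f + m) / 2 for f, m in zip(father, mother))

father_xy = project(father, AXES)
mother_xy = project(mother, AXES)
child_xy = project(child_expected, AXES)
midpoint = tuple((a + b) / 2 for a, b in zip(father_xy, mother_xy))

print(father_xy, mother_xy, child_xy, midpoint)
```

Note that the parents here differ completely at the last two markers, yet that difference leaves no trace in their coordinates.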
So there are two very simple methodological explanations for the discrepancy above which I can think of. I’m on V2, while my parents and sibling are on V3. I know this has made a difference in other measurements. Additionally, there’s clearly some “noise” within this algorithm, resulting in people with trace African or Asian ancestry which isn’t real, even if you take into account the kludgey nature of the reference populations. But let’s take the results at face value. With the ancestry painting, recall how the European and Asian components were chunky across the genome? Both of my parents received half their genomes from each of their parents. My own chromosomes are a mosaic of those of my grandparents. Some of the original linkage between genomic regions, due to their physical location on the same strand, has been broken apart by recombination in the two generations downstream from my grandparents. Concretely, two instances of meiosis which produced sex cells. Therefore, some of the associations of alleles present in my grandparents have been transformed within me. But even without recombination, it is clear that one homologous chromosome could be more European or Asian than the total genome average. Because only one of these is passed to any given offspring, there is going to be variance from sibling to sibling. Genetics is not a pure blending process. That may be why I am 43% Asian while RS1 is 40% Asian. We’re both sampling from our parents’ genes, and there’s going to be variance in that process (on the chromosomal level you have 22 autosomal draws from each parent where each draw has two outcomes).
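To get a feel for how much sibling-to-sibling spread the "22 draws from each parent" picture can produce, here is a minimal simulation sketch. It is deliberately crude and its assumptions are mine: no recombination, all 22 autosomes treated as equal in length, and each parental homolog given an invented "Asian" fraction scattered around the parent's overall figure. The per-homolog scatter (`spread`) is a made-up parameter, not something estimated from our data, and the function names are hypothetical.

```python
import random
import statistics


def make_parent(mean_frac, spread, rng, n_chrom=22):
    """Build a parent as n_chrom pairs of homologs, each with an 'Asian' fraction.

    Fractions are drawn around mean_frac and clipped to [0, 1], so the
    parent's realized genome-wide fraction is only approximately mean_frac.
    """
    def draw():
        return min(1.0, max(0.0, rng.gauss(mean_frac, spread)))
    return [(draw(), draw()) for _ in range(n_chrom)]


def genome_fraction(parent):
    """Genome-wide 'Asian' fraction of a parent (equal-length chromosomes assumed)."""
    return sum(h1 + h2 for h1, h2 in parent) / (2 * len(parent))


def child_fraction(father, mother, rng):
    """One child: pick one homolog per pair from each parent, no recombination."""
    total = 0.0
    for pair_f, pair_m in zip(father, mother):
        total += rng.choice(pair_f) + rng.choice(pair_m)
    return total / (2 * len(father))


def simulate_siblings(father, mother, n_children, rng):
    """Genome-wide fractions for many hypothetical siblings of the same parents."""
    return [child_fraction(father, mother, rng) for _ in range(n_children)]


rng = random.Random(0)
father = make_parent(0.38, 0.15, rng)  # 0.15 spread is an invented value
mother = make_parent(0.40, 0.15, rng)
siblings = simulate_siblings(father, mother, 10000, rng)
print("mid-parent:", (genome_fraction(father) + genome_fraction(mother)) / 2)
print("sibling mean:", statistics.mean(siblings))
print("sibling st. dev.:", statistics.stdev(siblings))
```

The sibling mean sits on the mid-parent value, as blending intuition would suggest, but individual siblings scatter around it. Real recombination would break the homologs into more, smaller pieces and so shrink that scatter somewhat, which is why this should be read as an illustration rather than an estimate.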
An interesting implication of this is that the grandchildren of a multiracial couple will exhibit variance in their ancestral quanta from major racial groups. This is one reason why it is a fallacy to presume that intermarriage will result in the washing away of biological diversity. And processes such as assortative mating could even presumably extract out “pure” individuals from an originally admixed random-mating population.
With all that said, I now believe, with an N = 3 from eastern Bengal, that I am not an exception with recent Southeast Asian ancestry, but rather that eastern Bengal is part of the gene frequency cline between South Asia and Southeast Asia, and as such has a substantial fraction of eastern ancestry. Zack has my parents’ data, so once the results come back from the first runs of HAP I believe that he will see the same pattern of substantial non-South Asian ancestry in them that Dienekes found in me. The cline here is still sharp. The average Bangladeshi is probably interchangeable at just 10-20% with the average Burmese when it comes to proportions in inference of ancestral quanta algorithms (remember that the Burmese probably have a small South Asian component too). In contrast, the average Bangladeshi is probably interchangeable at 80-90% with a resident of Bihar (the closest match in total SNP comparison among those I’m sharing with on 23andMe is a Bihari, not the two other ethnic Bengalis). This is clearly a function of geography: the north-south ranges in Burma seal it off from South Asia. In contrast, there are open plains from northern Bangladesh to Bihar. In some ways Burma has more cultural affinity and connection with peninsular South Asia because of the ease of maritime travel. The prevalence of Theravada Buddhism in Burma is a testament to the association of the lower Irrawaddy region with Sri Lanka.
Back to Bangladesh. One aspect of the Indian subcontinent in terms of religious demography is that the heart of Indo-Islam, the Delhi area, never had a Muslim majority. Rather, Muslims were a majority along the northwestern and northeastern fringe (along with a few other districts, such as northern Kerala). The predominance of Islam on the northwest isn’t that surprising, as that region borders upon the Dar-al-Islam proper. But what about Bengal? In the late 19th century the British were apparently surprised that the united Bengal (which includes roughly the modern state of West Bengal in India, and Bangladesh) had a Muslim majority. Because of differential birth rates and conversion (the latter includes sections of my family, as I note above) about 2 out of 3 ethnic Bengalis alive today are Muslim, with the balance being Hindu. Bangladesh is estimated to be 90% Muslim, while West Bengal is 25% Muslim. Even today after generations of Hindu outmigration one pattern within Bangladesh is the relative concentration of Hindus to the west and north (also, Hindus in Bangladesh tend to be urban). The “buckle” of the “Koran belt” in Bangladesh is actually the district of Noakhali, on the southeast fringe of Bengal. My mother’s maternal grandfather, who came from a lineage of pirs who had originally settled in the Muslim heartland in Delhi, was from Noakhali. It is apparently said that in Noakhali even the Hindus know proper Islamic forms!**
An explanation for this pattern is that the religious influence and power of Hindu elites declined as a direct function of distance from the regions of West Bengal, which were closer to the core Aryavarta, and had traditionally been the locus of power of Hindu dynasties before the rise of Islam. Additionally, Bengal was the last region of the mainland subcontinent with a robust Buddhist society during the flowering of the Pala Empire around the year 1000. It is therefore suggested that many Bengali Muslims were converted directly from Buddhism, not Hinduism (there remains even today a small minority of ethnic Bengali Buddhists, who carry the surname “Barua.” This is in distinction to the descendants of Tibeto-Burman people who now speak Bengali, but retain a tribal identity and Theravada Buddhist religion). Also, it may be that eastern Bengal was populated mostly by animist tribes before the arrival of Muslims, and just as European colonial powers were more successful in Asia at spreading their religion among marginalized people (e.g., tribal peoples in northeast India and Southeast Asia are often Christian), so Islam found purchase among those outside of the Hindu caste system.
These models are broadly persuasive to me. But, I still am suspicious that there was such a strong disjunction in the depth of Hindu institutions in western vs. eastern Bengal; after all, the kings of Tripura to the east were Hindu when Islam was new in South Asia. If being tribal and marginal to the core Hindu civilization was one of the grounds for susceptibility to Islam it is peculiar that it is precisely many tribal people in modern Bangladesh who are not Muslim. Indeed, the Tibeto-Burman populations nearer to Indian groups in eastern South Asia are Hindu or Buddhist, not Muslim (those further in the hinterlands were not integrated into any South Asian religion, but converted to Christianity by Western missionaries within the last century).
Instead, I find the model espoused in The Rise of Islam and the Bengal Frontier, 1204-1760 broadly plausible as a complement, or even substitute, to the above hypotheses. Additionally, it has the utility of making sense of the genetic data which I have presented here so far. The author argues that eastern Bengal, most of Bangladesh, was very lightly populated before the conquest of Bengal by Muslims in the 13th century. During the modern era the western region of Bengal, in India, has tended to have issues with the moribund nature of many of the water courses. But one thousand years ago this region was more active in terms of sedimentation, while eastern Bengal was a wilderness. Over the centuries there has been a shift of large rivers to the east, opening up that area to cultivation because of improved transport. Additionally, the arrival of Muslims also resulted in the spread of new techniques of land clearing and settlement. The rough model is that eastern Bengal is in fact a relatively newly settled territory in terms of its current demographic density. As the clearance and settlement operations were performed by Muslim elites, many of the peasants who settled these lands were either Muslim, or more likely, adopted the religion of their landlords. Because of the virgin nature of the territory these original settlers entered into a phase of massive demographic expansion, to the point where eastern Bengal (Bangladesh) is now today twice as populous as western Bengal (West Bengal). The key here is that there need not be a massive conversion of the enormous masses of marginal animist, Hindu or Buddhist peasants. Rather, all one needs is a modest number of converted Bengali peasants to enter into exponential population growth until the land is “filled.” (interestingly, one sees similar patterns between descendant populations in both the USA and among Koreans. The religions in the “core” homelands are very different in constitution from the Diaspora)
I find this persuasive for two major reasons. First, Peter Bellwood’s First Farmers documents the difficulties of populations which have not been engaged in intensive farming to switch to that modality. At least back to the Mughal period Bengal was a densely settled land from which one could extract massive rents simply due to aggregate productivity. Today a united Bengal would have a population of 240 million, making it the fourth most populous nation in the world, below the USA, and just above Indonesia. In hindsight I find it less likely that the peasants of eastern Bengal descend from tribal peoples who had been practicing extensive agriculture, but were introduced to new techniques, than that western populations already habituated to the grinding expectations of intensive farming colonized the “empty” lands (in fact, Bengali peasants migrate to Assam in part because of the perception of land surplus there, even though Assam has 30 million inhabitants). But this initial phase of colonization would entail relatively few peasants, and probably exhibit some male bias. Therefore, this can explain a substantial fraction of the eastern ancestry among Bangladeshis, as in the first generations the Bengali peasants did assimilate the native tribal peoples of the region, whether they be the Munda Santhals or Tibeto-Burman relatives of the Tripuri. With the massive numbers of ethnic Bengalis in comparison to Tibeto-Burman groups it seems one would need a great deal of gene flow in any model which posited that exchange between these two groups over long periods of time explains the high fractions that one finds of non-South Asian ancestry. In all of India there are only 10 million speakers of Tibeto-Burman languages, vs. the 240 million speakers of Bengali alone in the Indian subcontinent.
Where does this leave us? From what I gather you’ll probably not make it into the first round of results for HAP, but if you have 23andMe results and haven’t sent them to Zack, and want to learn more about the historical genetics of the Indian subcontinent, you can still get involved! With my parents Zack now has an N = 2 of Bengalis. It would be nice to get more. We still need samples from North-Central India. The number of Punjabis is in the 5-10 range, and the number of Tamils is around 5. Enough to make inferences, but certainly not robust enough to bet the house on. In the near future I’ll get results from my other siblings, and I’ve decided to “upgrade” to the V3 chip. Once that comes in I’ll “phase” some of the results, and probably start comparing myself to my siblings, etc.
* Native Americans, descendants of pre-Columbian Americans, have the inverted results from South Asians, mostly Asian with a European minority. This is not just due to recent European admixture. Rather, though Amerindians have affinities to East Asians, the two groups have been distinct for at least 10,000 years, and probably considerably longer.
** Also, some have stated that the people of Noakhali are sly and cunning, adept at following the letter of the law, but not the spirit. I only know this because when I was young one of my father’s friends, also from Bangladesh, complained that a mutual acquaintance from Noakhali who made much of his piety (he put his wife in purdah when she arrived from Bangladesh) requested that someone else purchase a pornographic magazine for him. His reasoning was that he did not want to be seen purchasing the magazine. It was a sin to purchase such an item for a good Muslim. Later my father and his friend (who was from northern Bangladesh for what it’s worth) commiserated that such was the way of the people of Noakhali, amongst whom you have to have your wits about you lest they exploit some angle for their own self-interest. The pious-porn-non-purchaser was notorious for being a non or late payer of rent when he was a lodger with other Bangladeshis, always emphasizing his religious piety as surety of final payment of the debt. He also eventually finagled a loophole in the immigration law of the time, obtaining green card with relative ease and no necessity of sponsorship. The proper connotation of how people from Noakhali are is probably captured by the American English word slick.