“There were giants in the earth in those days…when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.” -Genesis 6:4
Seven years ago I wrote a short post, Why patriarchy?, which attempted to present a concise explanation for the ubiquity of what we might term patriarchy in complex societies (i.e., not “small-scale societies”). Broadly speaking my conjecture is that social and political dominance of small groups of males (proportionally) over the past several thousand years is an example of “evoked culture”. The higher population densities in agricultural societies produced a relative surfeit of accessible marginal surplus, which could be given over to supporting non-peasant classes who specialized in trade, religion, and war, all of which were connected. This new economic and cultural context served to trigger a reorganization of the typical distribution of power relations of human societies because of the responses of the basic cognitive architecture of our species inherited from Paleolithic humans. Agon, or intra-specific competition, has always been part of the game of human socialization. The scaling up and channeling of this instinct in bands of males totally transformed human societies (another dynamic is the elaboration of cooperative structures, though this often manifests as agonistic competition between coalitions of humans).
There has been a lot of attention to Erika Check Hayden’s piece Ethics: Taboo genetics, at least judging by people commenting on my Facebook feed. In some ways this is not an incredibly empirically grounded argument, because the biological basis of complex traits is going to be rather difficult to untangle on a gene-by-gene basis. In other words, this isn’t a clear and present “concern.” The heritability of many behavioral traits has long been known. This is not revolutionary, though for cultural reasons many well educated people are totally surprised when confronted with data that many traits, such as intelligence and personality, have robust heritabilities* (the proportion of trait variation explained by variation in genes across the population). The literature reviewed in The Nurture Assumption makes clear that a surprising proportion of the contribution any parents make to their offspring is through their genetic composition, and not their modeled example. You wouldn’t know this if you read someone like Brian Palmer of Slate, who seems to be getting paid to reaffirm the biases of the current age among the smart set (pretty much every single one of his pieces that touch upon genetics is larded with phrases which could have been written by a software program designed to soothe the concerns of the cultural Zeitgeist). But the new genomics is confirming the broad outlines of the findings from behavior genetics. There’s nothing really to see there. The bigger issue of any interest is normative; the values we hold dear as a culture.
It is well known that Alexander the Great invaded the Indus river valley. Coincidentally in the mountains shadowing this region are isolated groups of tribal populations whose physical appearance is at variance with South Asians. In particular, they are much lighter skinned, and often blonde or blue eyed. Naturally this led to 19th and early 20th century speculation that they were lost white races, perhaps descended from some of the Macedonian soldiers of Alexander. This was partly the basis of the Rudyard Kipling story The Man Who Would Be King. Naturally over time some of these people themselves have forwarded this idea. In the case of a group such as the Kalash of Pakistan this conjecture is supported by the exotic nature of their religion, which seems to be Indo-European, and similar to Vedic Hinduism, with minimal influence from Islam.
Last year a paper came out in AJHG which reported that Ethiopian populations seem to be a compound of West Eurasians and Sub-Saharan Africans. This result itself is not too surprising for a host of reasons. First, Ethiopians and other populations of the Horn of Africa are physically equidistant between West Eurasians and Sub-Saharan Africans. 20th century physical anthropologists sometimes placed them in the “Caucasoid” racial classification for this reason. Second, the languages of the Horn of Africa have Afro-Asiatic affinities. The Cushitic languages (e.g. Somali) have deep connections with more familiar tongues such as Arabic, but Semitic Ethiopian languages (e.g. Amharic) are much closer in historical distance. Third, there has been a fair amount of previous genetic analysis of these populations, and their synthetic character was obvious from those (e.g. mtDNA and Y results suggest a diverse array of haplogroups). What the AJHG paper reported was that the Eurasian ancestors of the Ethiopians admixed with the presumably Sub-Saharan indigenes ~3,000 years ago in a single pulse event, and that their closest modern relations in West Asia today are Levantines. To put a mild gloss on it the dating is controversial (using patterns of decayed genetic correlations of markers across the length of the genome). This is not just clinal variation.
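The dating approach mentioned above rests on the fact that admixture-induced correlations between markers decay with genetic distance, at a rate set by the number of generations since mixing. What follows is a minimal sketch of that idea, assuming a single pulse and the idealized exponential form `exp(-t * d)` with `d` in Morgans. The function names and the numbers are illustrative assumptions and are not taken from the AJHG paper.

```python
import math

def expected_ld(initial_ld, generations, distance_morgans):
    """Expected admixture LD after `generations` under a single-pulse model."""
    return initial_ld * math.exp(-generations * distance_morgans)

def generations_since_admixture(ld_near, ld_far, d_near, d_far):
    """Solve the exponential decay for t, given LD at two genetic distances."""
    return math.log(ld_near / ld_far) / (d_far - d_near)

# Hypothetical inputs: LD at 0.5 cM and 2 cM, generated with t = 100.
ld_a = expected_ld(1.0, 100, 0.005)
ld_b = expected_ld(1.0, 100, 0.02)
estimated_t = generations_since_admixture(ld_a, ld_b, 0.005, 0.02)
print(round(estimated_t, 6))
```

Real data are noisy and may reflect more than one pulse, which is part of why dates obtained this way attract argument.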
Right before I was to sleep a reader sent me an email which pointed to a Nick Wade piece in The New York Times, Gene Sleuths Find How Some Naturally Resist Cholera. It’s about new research in Science Translational Medicine, Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta. The authors use the “composite of multiple signals” (CMS) test to ascertain regions of the genome subject to natural selection (look for long haplotypes, high frequency derived alleles, and alleles with high cross population frequency differences). The results aren’t too surprising; I was born in Bangladesh, and I can attest to the fact that it’s a germaphobe’s nightmare. Rather, it is a secondary and very minor aspect of the paper which frankly draws my ire. First let’s quote Wade’s treatment:
As a necessary preliminary to testing for natural selection, the researchers looked at the racial composition of the Bengali population and found that they are an Indian population with a 9 percent admixture of East Asian genes, probably Chinese. The admixture occurred almost exactly 52 generations ago, according to statistical calculation, or around A.D. 500, assuming 29 years per generation. The Gupta empire in India was in decline at this time, but it is unclear whether the intermarriage with East Asians took place through trade or conquest. “We can now go back to the historians and see what happened then,” Dr. Karlsson said.
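The date arithmetic in the quoted passage is easy to check. This is a small sketch, where the reference year of 2013 is my assumption about when the count of generations starts, and the alternative generation time of 25 years is purely illustrative.

```python
def admixture_year(generations, years_per_generation, reference_year):
    """Convert a generation count into an approximate calendar year (A.D.)."""
    years_ago = generations * years_per_generation
    return reference_year - years_ago

REFERENCE_YEAR = 2013  # assumption: the year the estimate is counted back from

years_ago = 52 * 29
year_at_29 = admixture_year(52, 29, REFERENCE_YEAR)
year_at_25 = admixture_year(52, 25, REFERENCE_YEAR)  # hypothetical shorter generation time
print(years_ago, year_at_29, year_at_25)
```

The point of the second call is that the calendar date moves by a couple of centuries when the assumed generation time changes, so “almost exactly” applies to the generation count rather than the year.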
But sometimes science gets garbled in transmission. What do they say in the paper? Again, the relevant section:
Razib’s daughter’s ancestry composition
Genome-wide associations are rather simple in their methodological philosophy. You take cases (affected) and controls (unaffected) of the same genetic background (i.e. ethnically homogeneous) and look for alleles which diverge greatly between the two pooled populations. Visually the risk alleles, which exhibit higher odds ratios, are represented via Manhattan plots. But please note the clause: ethnically homogeneous study populations. In practice this means white Europeans, and to a lesser extent East Asians and African Americans (the last because the biomedical industrial complex in the United States performs many GWAS, and the USA is a diverse nation). Looking within ethnic groups eliminates many false positives one might obtain due to population stratification. Basically, alleles which differ between groups because of their history may produce associations when the groups themselves differ in the propensity of the trait of interest (e.g. hypertension in blacks vs. whites).
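To make the stratification problem concrete, here is a minimal sketch with made-up allele counts. Within each hypothetical group the allele has no association with the trait (odds ratio of 1), but the groups differ in both allele frequency and trait prevalence, so pooling them manufactures an association. The counts and names are assumptions chosen for illustration only.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying the allele, cases versus controls."""
    return (case_with * control_without) / (case_without * control_with)

# Hypothetical allele counts:
# (cases with allele, cases without, controls with allele, controls without)
group_a = (40, 160, 360, 1440)  # allele frequency 0.2, trait uncommon
group_b = (800, 200, 800, 200)  # allele frequency 0.8, trait common

or_a = odds_ratio(*group_a)
or_b = odds_ratio(*group_b)

# Pool the two groups as if they were one homogeneous sample.
pooled = tuple(a + b for a, b in zip(group_a, group_b))
or_pooled = odds_ratio(*pooled)
print(or_a, or_b, round(or_pooled, 2))
```

The pooled odds ratio comes out well above 1 even though neither group shows any effect, which is exactly the false positive that matching on genetic background is meant to avoid.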
A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy had to do with the fact that some individuals today claim to be Taino, and therefore, they were not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which reconstructs the recent demographic history of the Caribbean, both maritime and mainland, using genomics. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP) the focus is on the populations highlighted in the map above.
Every now and then Richard Dawkins stirs controversy by bringing up the topic of eugenics. This is not surprising in terms of Dawkins’ intellectual pedigree. The most influential British evolutionary biologist in the generation before Dawkins, R. A. Fisher, was a eugenicist. Arguably the most eminent evolutionist of Dawkins’ own generation, W. D. Hamilton, clearly had eugenical sympathies, though he was keenly aware how unfashionable that had become.* University College London’s Galton Laboratory still had the word eugenics in its title until 1965. More recently Dawkins has brought up the issue of consanguinity amongst the British Pakistani community, a practice which one might argue is non-eugenical due to the high rate of recessive diseases.
This is a follow up to my post from yesterday. In case you care about the technical details (after I clean this stuff up I will put it on GitHub) I’m using R’s adehabitat package to create a 95% distribution curve after smoothing with kernel density. The goal is to give you a better intuition about where the populations are dispersed across two dimensional visualizations of genetic variation.
Thinking about how to plot text, I came up with a quick hack, which just used the initial data and found the median x and y position. That explains why some of the labels are shifted so; in populations with a huge range the label position is going to be sensitive to not being smoothed (if you know how to pull the centroid out of the kver, tell!). I’ve given them colors and also used black. The latter actually seems to be clearer!
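For anyone who wants to see the label hack spelled out, here is a minimal sketch in Python rather than R, and independent of adehabitat. It computes the per-axis median used for the label and, for comparison, the plain mean centroid of the raw points. The coordinates are invented, and neither value is extracted from the smoothed kernel object.

```python
from statistics import mean, median

def median_label_position(points):
    """Label anchor: the median of x and the median of y, taken separately."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (median(xs), median(ys))

def centroid(points):
    """Plain mean centroid of the raw (unsmoothed) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (mean(xs), mean(ys))

# Hypothetical population with one far-flung individual.
population = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 0.0), (14.0, 8.5)]

label_xy = median_label_position(population)
center_xy = centroid(population)
print(label_xy, center_xy)
```

The median sits with the bulk of the individuals while the mean is dragged toward the outlier. Either can land away from the densest part of a smoothed contour when a population has a wide range.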
Note: This is not just for fun, as I plan to start rolling out results and methods from some of the data sets I have more regularly in the near future.
A reader points me to a talk given by David Reich at the Center for Human Genetic Research 2013 Retreat. One of the issues Reich brought up is old, but perhaps worth reemphasizing: due to endogamy many South Asians carry a higher load of recessive ailments. This is not due to recent inbreeding (which is barred by custom in many South Asian groups, which enforce kin-level exogamy), but long term genetic isolation. Over time even a moderate sized population can be affected by drift. This was one of the major points in the 2009 paper Reconstructing Indian History, but not one particularly emphasized in the press follow up. A major implication is that a relatively simple public health measure for South Asians would be to marry outside of their jati. The social or genetic distance need not be great. But one generation of outbreeding should “mask” many of the deleterious alleles. If this model is correct one should be able to track decreases in morbidity within the American South Asian population, where there are many inter-caste and inter-regional marriages (yes, this is between people of putative high status, but this doesn’t matter).
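The masking argument can be put in numbers. This is a minimal sketch, assuming a single recessive allele, random mating between the parents’ groups, and no close-kin inbreeding. The allele frequencies are hypothetical and only meant to show the shape of the effect.

```python
def affected_risk(freq_in_mother_group, freq_in_father_group):
    """Chance a child inherits two copies of one recessive allele.

    Assumes each parent is a random draw from their own group.
    """
    return freq_in_mother_group * freq_in_father_group

# Hypothetical: drift has pushed the allele to 5% in one jati,
# while it sits at 0.1% in a neighboring one.
within_group = affected_risk(0.05, 0.05)
between_groups = affected_risk(0.05, 0.001)
fold_reduction = within_group / between_groups
print(within_group, between_groups, round(fold_reduction, 1))
```

Because drift elevates different alleles in different groups, the same logic applies in both directions, so the child of a between-group marriage is protected against the alleles enriched in either parental group.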
The above is a graph which illustrates phylogenetic relationships using the TreeMix package. It is from the paper I alluded to yesterday. The paper, DNA analysis of an early modern human from Tianyuan Cave, China, is open access, so everyone should be able to read it. Its mtDNA analysis shows that the Tianyuan sample, from the region of Beijing and dating to ~40,000 years B.P., is a basal clade in haplogroup B, which is common in eastern Eurasia and the New World. This is a satisfying result insofar as the understanding in relation to this haplogroup is that it diversified ~50,000 years B.P. There is very strong support in these data for the proposition that Tianyuan forms a distinct clade with the populations you see above, as opposed to western Eurasians. This is important because this sample seems to date with relatively good precision to 40,000 years B.P., supporting the archaeological contention that modern humans were already diversifying into western and eastern lineages 40-50,000 years ago. In contrast statistical genomic inferences tend toward a lower date for divergence. We can be moderately confident at this point that some aspect of the west-east divergence predates subsequent later gene flow events, which might lead to confusing archaeology-blind methods.
While reading The Founders of Evolutionary Genetics I encountered a chapter where the late James F. Crow admitted that he had a new insight every time he reread R. A. Fisher’s The Genetical Theory of Natural Selection. This prompted me to put down The Founders of Evolutionary Genetics after finishing Crow’s chapter and pick up my copy of The Genetical Theory of Natural Selection. I’ve read it before, but this is as good a time as any to give it another crack.
Almost immediately Fisher aims at one of the major conundrums of 19th century theory of Darwinian evolution: how was variation maintained? The logic and conclusions strike you like a hammer. Charles Darwin and most of his contemporaries held to a blending model of inheritance, where offspring reflect a synthesis of their parental values. As it happens this aligns well with human intuition. Across their traits offspring are a synthesis of their parents. But blending presents a major problem for Darwin’s theory of adaptation via natural selection, because it erodes the variation which is the raw material upon which selection must act. It is a famously peculiar fact that the abstraction of the gene was formulated over 50 years before the concrete physical embodiment of the gene, DNA, was ascertained with any confidence. In the first chapter of The Genetical Theory R. A. Fisher suggests that the logical reality of persistent copious heritable variation all around us should have forced scholars to the inference that inheritance proceeded via particulate and discrete means, as these processes do not diminish variation indefinitely in the manner which is entailed by blending.
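Fisher’s contrast can be sketched in a few lines. Under blending with random mating each child is the average of its parents, so the heritable variance is halved every generation. Under particulate inheritance at one locus, random mating leaves the allele frequency, and hence the variance, unchanged. This is a toy model with no selection, mutation, or drift, and the parameter values are assumptions for illustration.

```python
def blending_variance(initial_variance, generations):
    """Trait variance when every child is the midpoint of two random parents."""
    variance = initial_variance
    for _ in range(generations):
        variance /= 2.0  # Var((X + Y) / 2) = Var(X) / 2 for independent parents
    return variance

def particulate_variance(allele_freq, effect, generations):
    """Additive variance at one biallelic locus under random mating."""
    p = allele_freq
    for _ in range(generations):
        freq_AA = p * p
        freq_Aa = 2 * p * (1 - p)
        p = freq_AA + 0.5 * freq_Aa  # allele frequency in the next generation
    return 2 * p * (1 - p) * effect ** 2

blended = blending_variance(1.0, 10)
particulate = particulate_variance(0.5, 1.0, 10)
print(blended, particulate)
```

After ten generations the blended variance is down to roughly a thousandth of where it started, while the particulate variance has not moved.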
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, a little over 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear by the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation on a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).
To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., fundamental particles turn out to be divisible into quarks), but it still has conceptual utility.
Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.
There’s an interesting piece in Slate, The Great Schism in the Environmental Movement, which seems to be a distillation of trends which have been bubbling within the modern environmentalist movement for a generation now (I’ve read earlier manifestos in a similar vein). I can’t assess the magnitude of the shift, but here’s the top-line:
But that is a false construct that scientists and scholars have been demolishing the past few decades. Besides, there’s a growing scientific consensus that the contemporary human footprint—our cities, suburban sprawl, dams, agriculture, greenhouse gases, etc.—has so massively transformed the planet as to usher in a new geological epoch. It’s called the Anthropocene.
Modernist greens don’t dispute the ecological tumult associated with the Anthropocene. But this is the world as it is, they say, so we might as well reconcile the needs of people with the needs of nature. To this end, Kareiva advises conservationists to craft “a new vision of a planet in which nature—forests, wetlands, diverse species, and other ancient ecosystems—exists amid a wide variety of modern, human landscapes.”
The New Republic has a piece up, How Older Parenthood Will Upend American Society, which won’t have surprising data for readers of this weblog. But it’s nice to see this sort of thing go “mainstream.” My daughter was born when her parents were in their mid-30s, so I know all the statistics. They aren’t good bed-time reading (she’s healthy and robust so far!). If I had to do it over again I definitely wouldn’t have waited this long. After becoming a father it brought home to me that waiting was one of the worst decisions of my life. Why postpone something this incredible for the far more prosaic pleasures of an extended adolescence? Granted, I’m not sure that I would have been the best father at 25, but I don’t think there’s much I can say in reply to the argument that I should have become a father by 30.
More concretely, we would have had sperm and egg “banked” if we had been smart about delaying parenthood. The article notes that storage of sperm costs $850 up front, and $300 to $500 per year after that, and that many balk at the cost. And how much do you spend on your cell phone every year? The issue here seems to be time preference.
Most people are aware that altitude imposes constraints on individual performance and function. Much of this is flexible; athletes who train at high altitudes may gain a performance edge. But over the long term there are costs, just as there are with computers which are ‘overclocked.’ This is the point where you make the transition from physiology to evolution. Residence at high altitude entails strong selective pressures on populations. Over the past few years there has been a great deal of exploration of the genetics of long resident high altitude groups, the Tibetans, Peruvians, and Ethiopians.
In many cases there are questions of a historical and ethnographic nature which are subject to controversy and debate. Scholarly arguments are laid out, and further dispute ensues. For decades progress seems fleeting, as one hypothesis is accepted, only to be subject to later revision. This sort of pattern gives succor to the most cynical and jaded of the ‘Post Modern’ set, especially when the ‘discourse’ in question is in the domain of science.
But thankfully these debates can come to an end in some cases. So it is with the origins of the European Romani, better known as ‘Gypsies’ (though the Roma are the most well known of the Romani, other groups within Europe have different ethnonyms). Obviously many of the basic elements have long been there, but I think the most recent genetic work now establishes a level of closure. Taking a step back, what do we know?
1) The Romani language seems to be Indo-Aryan, with a likely affinity with the northwest group of Indo-Aryan languages
2) The Romani presence in Europe only dates to the past ~1,000 years, with an entry point in the Byzantine Empire
3) They are an admixture between an ancestral Indian element, and local populations
4) Their history of endogamy has resulted in a strong genetic drift effect
The two papers which seem to nail the coffin shut on these questions use somewhat different methodologies. One relies on Y chromosomal STRs (hypervariable repeat regions) to generate a paternal phylogeny. Focusing just on the paternal phylogeny allows for one to make very robust genealogical inferences. Additionally, the authors had a very large data set across India. Their goal was to ascertain the exact region of origin of the Romani before they left India. As noted in bullet #1 there is already some evidence from their language that this must be in northwest India. The second paper uses a SNP-chip; hundreds of thousands of autosomal markers. This has been done to death for other populations, so the method isn’t new. Rather, it is that it is now being applied to the Romani.
First, the Y chromosomal paper. The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations:
Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
Two trees illustrate the results succinctly:
The bottom line:
– This particular Y chromosomal lineage which is highly diagnostic of South Asian origin in the Romani shows that the Romani seem to derive from the populations of northwest India
– Additionally, within these populations the Romani Y chromosomal lineages derive from the lower caste elements, the scheduled castes and scheduled tribes
But the above results don’t get directly at genome-wide admixture. The second paper does, using hundreds of thousands of markers to explore the Romani affinity to other populations. Reconstructing the Population History of European Romani from Genome-wide Data:
The Romani, the largest European minority group with approximately 11 million people…constitute a mosaic of languages, religions, and lifestyles while sharing a distinct social heritage. Linguistic…and genetic…studies have located the Romani origins in the Indian subcontinent. However, a genome-wide perspective on Romani origins and population substructure, as well as a detailed reconstruction of their demographic history, has yet to be provided. Our analyses based on genome-wide data from 13 Romani groups collected across Europe suggest that the Romani diaspora constitutes a single initial founder population that originated in north/northwestern India ∼1.5 thousand years ago (kya). Our results further indicate that after a rapid migration with moderate gene flow from the Near or Middle East, the European spread of the Romani people was via the Balkans starting ∼0.9 kya. The strong population substructure and high levels of homozygosity we found in the European Romani are in line with genetic isolation as well as differential gene flow in time and space with non-Romani Europeans. Overall, our genome-wide study sheds new light on the origins and demographic history of European Romani.
The plot to the left illustrates the relationship of the Romani to world-wide populations using multi-dimensional scaling, where genetic variation is decomposed into dimensions, and individuals are plotted on those dimensions. In short, the Romani exhibit a classic admixture cline pattern. That is, they are the products of a two-way admixture between populations which occupy distinct positions along a cline, and Romani individuals and populations are distributed along the cline in proportion to their admixture. One notable aspect is that the Romani are actually two clusters; one which manifests a strong ‘east’-‘west’ distribution, and another which seems located purely within the European cluster. The latter seems to be the Welsh Romani, who in the neighbor-joining tree (see the supplements) fall on the same branch as European populations, as opposed to the other Romani, who form their own clade.
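The phrase “in proportion to their admixture” can be read quite literally. As a rough sketch, assume an individual’s expected position on the plot is a weighted average of the two source populations’ centers, so that projecting the individual onto the line between them gives an approximate ancestry fraction. The coordinates here are invented, and this is not the estimation method used in the paper.

```python
def admixture_fraction(individual, source_a, source_b):
    """Approximate fraction of ancestry from source_a.

    Projects the individual onto the axis running from source_b to source_a.
    """
    axis = [a - b for a, b in zip(source_a, source_b)]
    offset = [x - b for x, b in zip(individual, source_b)]
    dot = sum(o * v for o, v in zip(offset, axis))
    length_squared = sum(v * v for v in axis)
    return dot / length_squared

# Hypothetical cluster centers and one individual on the cline.
european = (0.0, 0.0)
south_asian = (10.0, 2.0)
person = (4.0, 0.8)

fraction_south_asian = admixture_fraction(person, south_asian, european)
print(round(fraction_south_asian, 3))
```

Strong drift pulls real individuals off the straight line between the sources, which is one reason to treat such a projection as a first look and not a measurement.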
To drill down further you need to ascertain admixture with a model-based clustering algorithm. Ergo, ADMIXTURE. I’ve reedited the figure to illustrate the salient points. In particular, it is clear that the Roma populations except the Welsh have significant South Asian ancestry. The question is how much? To answer this question you need to know the source population in South Asia. A peculiar aspect of this plot is that the Romani have very little of the green ancestral component, which happens to be modal in the Middle East (not shown). This element happens to be highly enriched in many Pakistani populations, but not necessarily northwest Indian ones. Nevertheless, the issue that leaves me suspicious of this particular finding is that many of the European populations, in particular those groups (e.g., Balkans) which may have admixed with the Romani, have this element to an extent not evident in one of their presumed ‘daughter’ populations. I wonder if perhaps the peculiarities of Romani inbreeding have skewed the allele frequency distribution so much that you get strangeness like this. I am not showing higher K’s because those break out with a Romani-cluster. Just like the Kalash-cluster this is to a great extent a feature of the long term endogamy of these communities. With high levels of drift the allele frequency of these groups moves into a very peculiar space in relation to their parental populations, but one must not become confused and assume that the Romani or Kalash are themselves appropriate independent clusters in the same way that Europeans or East Asians are.
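How fast drift reshapes a small endogamous group can be gauged with the standard expectation that heterozygosity shrinks by a factor of `1 - 1/(2N)` per generation. The sketch below applies that formula under idealized assumptions: a constant effective size, no gene flow, and non-overlapping generations. The effective sizes and the 40 generations are hypothetical and are not estimates for the Romani or the Kalash.

```python
def heterozygosity_retained(effective_size, generations):
    """Expected fraction of initial heterozygosity left after drift alone."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations

# Hypothetical: roughly 40 generations of isolation at two effective sizes.
small_group = heterozygosity_retained(100, 40)
large_group = heterozygosity_retained(10000, 40)
print(round(small_group, 3), round(large_group, 3))
```

The small group loses a substantial share of its variation over the same span in which the large one barely changes, and that kind of divergence is what a clustering algorithm picks up as a population-specific component at higher K.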
Using various forms of admixture analysis the authors seem to conclude that the Balkan Romani are 30-50% South Asian. This seems in line with intuition. But that still leaves open the question of who those South Asians were. As I noted above the most thorough Y chromosomal data point to the lower caste elements of northwest India. What do the autosomes say?
I don’t want to get into the technical details of how they tested the models, but it seems that one of the likely parental populations to the Romani had a close relationship to the Meghwal, a scheduled caste from northwest India. In other words, the autosome results align very well with the Y chromosomal inferences. Additionally, the models tested imply that the Romani likely left South Asia ~1,000 years before the present, which aligns well with what is known from the historical record (though this is a case where I put much more stock in the historical record than inferences from population genetic models; look at the intervals).
Finally, there is the question of inbreeding. One aspect of the Romani genome that jumps out at you is that they have many long “runs-of-homozygosity” (ROH). This is totally expected, as decades of uniparental analyses suggested a great deal of population bottleneck events as the Romani spread throughout Europe. But the ROH patterns also unearth an interesting fact: some of the Balkan Romani clearly have recent European admixture, while the non-Balkan Romani had an initial period of admixture followed by endogamy. The latter scenario seems to resemble Ashkenazi Jews, while the former would suggest that the boundary between Romani and non-Romani in the Balkans is more fluid than is sometimes portrayed.
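For readers unfamiliar with ROH, the basic idea is just a scan for long stretches without heterozygous calls. Here is a deliberately simplified sketch: genotypes are coded 0, 1, 2 (with 1 as heterozygous), a run is measured in consecutive markers instead of physical length, and no allowance is made for genotyping error. Real ROH callers are more forgiving and more elaborate; the function and data here are hypothetical.

```python
def runs_of_homozygosity(genotypes, min_length):
    """Return (start, end) index pairs of homozygous runs of at least min_length.

    genotypes: sequence of 0, 1, or 2 along a chromosome; 1 is heterozygous.
    """
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        if call != 1:
            if start is None:
                start = i  # a new homozygous stretch begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None  # a heterozygous call breaks the run
    # Close out a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs

# Hypothetical genotype calls along one chromosome.
calls = [0, 0, 2, 2, 0, 1, 0, 2, 1, 2, 2, 2, 2]
found = runs_of_homozygosity(calls, 4)
print(found)
```

Many long runs point to recent shared ancestry between an individual’s parents, while shorter runs reflect older bottlenecks, which is how ROH length distributions can separate recent admixture from long-standing endogamy.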
So there we have it. The Romani derive from lower caste populations from the northwest Indian subcontinent who seem to have left ~1,000 years ago. Over time they admixed with local populations, and are now 50-70% non-South Asian, with some groups being ~90% European (e.g., Welsh Romani). And, they have a long history as an endogamous group, judging by their inbreeding.
Because the PGP is self-recruiting, we don’t have a very balanced set of participants. “Self-recruitment” means that all participants have enrolled in our project through word of mouth, finding our website and enrolling online. To put it bluntly, that means we mostly end up with young white men….
…Research within one or two racial/ethnic categories isn’t necessarily a virtue, biracial and multiracial heritage may be even more interesting to some researchers and can open more areas for future….
In particular, NIST is looking for “trios”: two parents and a child. Researchers like to use samples from trios because they know every piece of DNA in the child comes from one of the parents. This makes it easier to assess error rates — and that sort of quality control is what NIST expects the genome material to be used for. We think all such family groups are valuable, but current trios in the PGP haven’t been the most diverse….
Reader Paul brought this to my attention. I haven’t been too interested in the PGP for myself because it’s just so slow to “play” with whole genomes (~3 GB) as opposed to 1 million SNPs. But over Christmas I’ll look into signing up, and see if they are interested in my own “trio.” I also thought I’d pass this along to readers, though my readership actually looks almost exactly like current PGP participants, so I don’t know if I’ll be contributing to the problem.
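The reason trios are handy for quality control, as the quoted passage says, is that a child’s genotype must be explainable by one allele from each parent. Below is a minimal sketch of that check, ignoring de novo mutations and sex chromosomes. The genotype calls are invented, and this is not a description of how NIST or the PGP actually assess error rates.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can come one from each parent."""
    first, second = child
    return (first in mother and second in father) or (
        second in mother and first in father
    )

def mendelian_error_rate(trio_calls):
    """Fraction of sites where the child's genotype is impossible given the parents."""
    errors = sum(
        1
        for child, mother, father in trio_calls
        if not is_mendelian_consistent(child, mother, father)
    )
    return errors / len(trio_calls)

# Hypothetical calls at four sites: (child, mother, father).
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # inconsistent: mother carries no G
    (("C", "T"), ("C", "C"), ("C", "T")),  # consistent
]
rate = mendelian_error_rate(sites)
print(rate)
```

Since true de novo mutations are rare, most inconsistencies flagged this way are genotyping or sequencing errors, which is what makes the rate useful as a benchmark.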