A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy arose because some individuals today identify as Taino, and therefore argue that the Taino are not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which uses genomics to reconstruct the recent demographic history of the Caribbean, both maritime and mainland. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP), the focus is on the populations highlighted in the map above.
Every now and then Richard Dawkins stirs controversy by bringing up the topic of eugenics. This is not surprising given Dawkins’ intellectual pedigree. The most influential British evolutionary biologist in the generation before Dawkins, R. A. Fisher, was a eugenicist. Arguably the most eminent evolutionist of Dawkins’ own generation, W. D. Hamilton, clearly had eugenical sympathies, though he was keenly aware of how unfashionable they had become.* University College London’s Galton Laboratory still had the word eugenics in its title until 1965. More recently Dawkins has brought up the issue of consanguinity in the British Pakistani community, a practice which one might argue is dysgenic given the high rate of recessive disease it produces.
This is a follow-up to my post from yesterday. In case you care about the technical details (after I clean this stuff up I will put it on GitHub), I’m using R’s adehabitat package to draw a 95% contour after smoothing with a kernel density estimate. The goal is to give you a better intuition for how the populations are dispersed across two-dimensional visualizations of genetic variation.
Thinking about how to plot the text labels, I came up with a quick hack, which just uses the initial data and finds the median x and y position. That explains why some of the labels are shifted: in populations with a huge range, the label position is sensitive to the fact that the raw data are not smoothed (if you know how to pull the centroid out of the kver object, tell me!). I’ve tried the labels both in color and in black. The latter actually seems to be clearer!
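The label hack above can be sketched in Python (the post itself uses R and adehabitat, so this is only an illustration). The first function is the median-of-raw-points position described above; the second computes the area-weighted centroid of a single polygon with the shoelace formula, which is one way to place a label on the smoothed contour instead. Treating a population's 95% contour as a single list of (x, y) vertices is an assumption: a kver object can hold several polygons per population, and those would need their centroids weighted by area.

```python
from statistics import median


def median_label_position(points):
    """Label position from the raw (unsmoothed) data: per-axis median of the (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return median(xs), median(ys)


def polygon_centroid(vertices):
    """Area-weighted centroid of one simple (non-self-intersecting) polygon.

    Shoelace formula: with cross_i = x_i * y_{i+1} - x_{i+1} * y_i and
    2A = sum(cross_i), Cx = sum((x_i + x_{i+1}) * cross_i) / (6A), likewise for Cy.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical inputs: a skewed point cloud and a square contour.
print(median_label_position([(0, 0), (1, 1), (10, 10)]))   # (1, 1)
print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```

The centroid formula works for either vertex orientation, because the sign of the area cancels in the ratio.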
Note: This is not just for fun, as I plan to start rolling out results and methods from some of my data sets more regularly in the near future.
A reader points me to a talk given by David Reich at the Center for Human Genetic Research 2013 Retreat. One of the issues Reich brought up is old, but perhaps worth reemphasizing: due to endogamy many South Asians carry a higher load of recessive ailments. This is not due to recent inbreeding (which is barred by custom in many South Asian groups, which enforce kin-level exogamy), but to long-term genetic isolation. Over time even a moderate-sized population can be affected by drift. This was one of the major points in the 2009 paper Reconstructing Indian Population History, but not one particularly emphasized in the press follow-up. A major implication is that a relatively simple public health measure for South Asians would be to marry outside of their jati. The social or genetic distance need not be great. But one generation of outbreeding should “mask” many of the deleterious alleles. If this model is correct one should be able to track decreases in morbidity within the American South Asian population, where there are many inter-caste and inter-regional marriages (yes, these are mostly between people of putative high status, but that doesn’t matter).
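The masking argument is simple arithmetic. A minimal sketch, assuming Hardy-Weinberg proportions within each jati and a single recessive disease allele whose frequency has drifted to different values in two groups; the frequencies are hypothetical, chosen only for illustration.

```python
def affected_within_group(q):
    """Chance a child is homozygous for a recessive allele when both parents come
    from a group where the allele has frequency q (Hardy-Weinberg: q^2)."""
    return q * q


def affected_between_groups(q1, q2):
    """Chance a child is homozygous when one parent comes from each group:
    each parent transmits the allele independently, at its own group's frequency."""
    return q1 * q2


# Hypothetical: the allele drifted up to 2% in jati A but is rare (0.1%) in jati B.
q_a, q_b = 0.02, 0.001
within = affected_within_group(q_a)           # about 1 in 2,500
between = affected_between_groups(q_a, q_b)   # about 1 in 50,000
print(f"within A: 1 in {1 / within:,.0f}; A x B: 1 in {1 / between:,.0f}")
```

The gain depends on the two groups carrying different high-frequency recessive alleles, which is what drift in separately isolated endogamous groups tends to produce; if both groups had the same frequency q, q1 * q2 would simply equal q^2.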
The above is a graph which illustrates phylogenetic relationships inferred with the TreeMix package. It is from the paper I alluded to yesterday. The paper, DNA analysis of an early modern human from Tianyuan Cave, China, is open access, so everyone should be able to read it. Its mtDNA analysis shows that the Tianyuan sample, from the region of Beijing and dating to ~40,000 years B.P., belongs to a basal lineage of haplogroup B, which is common in eastern Eurasia and the New World. This is a satisfying result insofar as this haplogroup is thought to have diversified ~50,000 years B.P. There is very strong support in these data for the proposition that Tianyuan forms a distinct clade with the populations you see above, as opposed to western Eurasians. This is important because this sample seems to be dated with relatively good precision to 40,000 years B.P., supporting the archaeological contention that modern humans were already diversifying into western and eastern lineages 40-50,000 years ago. In contrast, statistical genomic inferences tend toward a more recent date for the divergence. We can be moderately confident at this point that some aspect of the west-east divergence predates later gene flow events, which might confuse methods that are blind to archaeology.
While reading The Founders of Evolutionary Genetics I encountered a chapter where the late James F. Crow admitted that he had a new insight every time he reread R. A. Fisher’s The Genetical Theory of Natural Selection. This prompted me to put down The Founders of Evolutionary Genetics after finishing Crow’s chapter and pick up my copy of The Genetical Theory of Natural Selection. I’ve read it before, but this is as good a time as any to give it another crack.
Almost immediately Fisher takes aim at one of the major conundrums of 19th-century Darwinian theory: how was variation maintained? The logic and conclusions strike you like a hammer. Charles Darwin and most of his contemporaries held to a blending model of inheritance, where offspring reflect a synthesis of their parental values. As it happens this aligns well with human intuition: across many traits, offspring do look like a synthesis of their parents. But blending presents a major problem for Darwin’s theory of adaptation via natural selection, because it erodes the variation which is the raw material upon which selection must act; under random mating, averaging two parents halves the variance of a trait every generation. It is a famously peculiar fact that the abstraction of the gene was formulated over 50 years before the concrete physical embodiment of the gene, DNA, was ascertained with any confidence. In the first chapter of The Genetical Theory R. A. Fisher suggests that the persistence of copious heritable variation all around us should have forced scholars to infer that inheritance proceeds via particulate and discrete means, as such processes do not erode variation in the manner entailed by blending.
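Fisher’s point can be seen in a toy simulation. This is an illustration, not Fisher’s own argument: it compares a trait under pure blending (each offspring is the midpoint of two random parents) with a single Mendelian locus whose trait value is the number of copies of one allele. Population size, number of generations, and the random seed are arbitrary choices.

```python
import random
from statistics import pvariance


def blending_variances(generations, n=10_000, seed=1):
    """Trait variance per generation when each offspring is the midpoint of two random parents."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    history = [pvariance(values)]
    for _ in range(generations):
        values = [(rng.choice(values) + rng.choice(values)) / 2 for _ in range(n)]
        history.append(pvariance(values))
    return history


def mendelian_variances(generations, n=10_000, p=0.5, seed=1):
    """Trait variance per generation at one biallelic locus; trait = count of allele 1 (0, 1 or 2).
    Each offspring receives one randomly chosen allele from each of two random parents."""
    rng = random.Random(seed)
    genotypes = [(int(rng.random() < p), int(rng.random() < p)) for _ in range(n)]
    history = [pvariance([a + b for a, b in genotypes])]
    for _ in range(generations):
        genotypes = [(rng.choice(rng.choice(genotypes)), rng.choice(rng.choice(genotypes)))
                     for _ in range(n)]
        history.append(pvariance([a + b for a, b in genotypes]))
    return history


print([round(v, 3) for v in blending_variances(5)])
print([round(v, 3) for v in mendelian_variances(5)])
```

Under blending the variance shrinks by roughly half each generation, so after five generations little is left; at the Mendelian locus it stays close to its starting value of 2p(1 - p), eroded only slowly by drift.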
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale behind the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers on this chip is modest, just over 100,000 on the autosomes, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above, Geno 2.0 is ascertained in many more populations than the other comparable chips (the Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all those extra markers may not give you much bang for the buck (not to mention the biases this may introduce into your population genetic and phylogenetic inferences).
To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible into subatomic particles, some of which are in turn composed of quarks), but it still has conceptual utility.
Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense, inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think about the relationships among populations within a species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model; it is an essential phenomenon which must be accommodated in the framework.
There’s an interesting piece in Slate, The Great Schism in the Environmental Movement, which seems to be a distillation of trends which have been bubbling within the modern environmentalist movement for a generation now (I’ve read earlier manifestos in a similar vein). I can’t assess the magnitude of the shift, but here’s the top-line:
But that is a false construct that scientists and scholars have been demolishing the past few decades. Besides, there’s a growing scientific consensus that the contemporary human footprint—our cities, suburban sprawl, dams, agriculture, greenhouse gases, etc.—has so massively transformed the planet as to usher in a new geological epoch. It’s called the Anthropocene.
Modernist greens don’t dispute the ecological tumult associated with the Anthropocene. But this is the world as it is, they say, so we might as well reconcile the needs of people with the needs of nature. To this end, Kareiva advises conservationists to craft “a new vision of a planet in which nature—forests, wetlands, diverse species, and other ancient ecosystems—exists amid a wide variety of modern, human landscapes.”
The New Republic has a piece up, How Older Parenthood Will Upend American Society, which won’t have surprising data for readers of this weblog. But it’s nice to see this sort of thing go “mainstream.” My daughter was born when her parents were in their mid-30s, so I know all the statistics. They aren’t good bedtime reading (she’s healthy and robust so far!). If I had to do it over again I definitely wouldn’t have waited this long. Becoming a father brought home to me that waiting was one of the worst decisions of my life. Why postpone something this incredible for the far more prosaic pleasures of an extended adolescence? Granted, I’m not sure that I would have been the best father at 25, but I don’t think there’s much I can say in reply to the argument that I should have become a father by 30.
More concretely, if we had been smart about delaying parenthood we would have had sperm and eggs “banked.” The article notes that storage of sperm costs $850 up front, and $300 to $500 per year after that, and that many balk at the cost. And how much do you spend on your cell phone every year? The issue here seems to be time preference.
Most people are aware that altitude imposes constraints on individual performance and function. Much of this is flexible; athletes who train at high altitudes may gain a performance edge. But over the long term there are costs, just as there are with computers which are ‘overclocked.’ This is the point where you make the transition from physiology to evolution. Residence at high altitude entails strong selective pressures on populations. Over the past few years there has been a great deal of exploration of the genetics of long-resident high altitude groups: the Tibetans, Peruvians, and Ethiopians.
In many cases there are questions of a historical and ethnographic nature which are subject to controversy and debate. Scholarly arguments are laid out, and further dispute ensues. For decades progress seems fleeting, as one hypothesis is accepted, only to be subject to later revision. This sort of pattern gives succor to the most cynical and jaded of the ‘Post Modern’ set, especially when the ‘discourse’ in question is in the domain of science.
But thankfully these debates can come to an end in some cases. So it is with the origins of the European Romani, better known as ‘Gypsies’ (though the Roma are the most well known of the Romani, other groups within Europe have different ethnonyms). Obviously many of the basic elements have long been known, but I think the most recent genetic work now establishes a level of closure. Taking a step back, what do we know?
1) The Romani language seems to be Indo-Aryan, with a likely affinity with the northwest group of Indo-Aryan languages
2) The Romani presence in Europe only dates to the past ~1,000 years, with an entry point in the Byzantine Empire
3) They are the product of admixture between an ancestral Indian element and local populations
4) Their history of endogamy has resulted in a strong genetic drift effect
The two papers which seem to nail the coffin shut on these questions use somewhat different methodologies. One relies on Y chromosomal STRs (short tandem repeats, hypervariable repeat regions) to generate a paternal phylogeny. Focusing just on the paternal phylogeny allows one to make very robust genealogical inferences. Additionally, the authors had a very large data set across India. Their goal was to ascertain the exact region of origin of the Romani before they left India. As noted in bullet #1, there is already some evidence from their language that this must be in northwest India. The second paper uses a SNP-chip: hundreds of thousands of autosomal markers. This has been done to death for other populations, so the method isn’t new. What is new is that it is now being applied to the Romani.
First, the Y chromosomal paper. The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations:
Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
Two trees illustrate the results succinctly:
The bottom line:
- This particular Y chromosomal lineage, which is highly diagnostic of South Asian origin in the Romani, shows that the Romani seem to derive from the populations of northwest India
- Additionally, within these populations the Romani Y chromosomal lineages derive from the lower caste elements, the scheduled castes and scheduled tribes
But the above results don’t get directly at genome-wide admixture. The second paper does, using hundreds of thousands of markers to explore the Romani affinity to other populations. Reconstructing the Population History of European Romani from Genome-wide Data:
The Romani, the largest European minority group with approximately 11 million people…constitute a mosaic of languages, religions, and lifestyles while sharing a distinct social heritage. Linguistic…and genetic…studies have located the Romani origins in the Indian subcontinent. However, a genome-wide perspective on Romani origins and population substructure, as well as a detailed reconstruction of their demographic history, has yet to be provided. Our analyses based on genome-wide data from 13 Romani groups collected across Europe suggest that the Romani diaspora constitutes a single initial founder population that originated in north/northwestern India ∼1.5 thousand years ago (kya). Our results further indicate that after a rapid migration with moderate gene flow from the Near or Middle East, the European spread of the Romani people was via the Balkans starting ∼0.9 kya. The strong population substructure and high levels of homozygosity we found in the European Romani are in line with genetic isolation as well as differential gene flow in time and space with non-Romani Europeans. Overall, our genome-wide study sheds new light on the origins and demographic history of European Romani.
The plot to the left illustrates the relationship of the Romani to world-wide populations using multi-dimensional scaling, where genetic variation is decomposed into dimensions, and individuals are plotted on those dimensions. In short, the Romani exhibit a classic admixture cline pattern. That is, they are the products of a two-way admixture between populations which occupy distinct positions along a cline, and Romani individuals and populations are distributed along the cline in proportion to their admixture. One notable aspect is that the Romani actually form two clusters: one which manifests a strong ‘east’-‘west’ distribution, and another which seems located purely within the European cluster. The latter seems to be the Welsh Romani, who in the neighbor-joining tree (see the supplements) fall on the same branch as European populations, as opposed to the other Romani, who form their own clade.
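The admixture cline can be reproduced with simulated data. A minimal sketch, assuming two hypothetical source populations with random allele frequencies and individuals whose genotypes are drawn from a mix of the two in proportion to their ancestry. It uses classical (Torgerson) MDS on Euclidean distances between genotype vectors, which is one standard form of MDS and not necessarily the exact procedure used in the paper. The first coordinate should line up with each individual’s admixture proportion (its sign is arbitrary), which is the cline described above.

```python
import numpy as np


def simulate_admixed(n_ind=60, n_snps=2000, seed=0):
    """Simulate 0/1/2 genotypes for individuals admixed between two hypothetical sources."""
    rng = np.random.default_rng(seed)
    p_src1 = rng.uniform(0.05, 0.95, n_snps)  # allele frequencies in hypothetical source 1
    p_src2 = rng.uniform(0.05, 0.95, n_snps)  # allele frequencies in hypothetical source 2
    alpha = rng.uniform(0.0, 1.0, n_ind)      # each individual's fraction of source-1 ancestry
    # Each individual's allele frequency is a linear mix of the two sources.
    freqs = np.outer(alpha, p_src1) + np.outer(1.0 - alpha, p_src2)
    genotypes = rng.binomial(2, freqs)        # diploid allele counts
    return genotypes, alpha


def classical_mds(x, k=2):
    """Torgerson MDS: double-centre squared Euclidean distances, keep the top-k eigenvectors."""
    x = np.asarray(x, dtype=float)
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared pairwise distances
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ d2 @ centre                    # Gram matrix of the centred data
    vals, vecs = np.linalg.eigh(b)                     # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


genotypes, alpha = simulate_admixed()
coords = classical_mds(genotypes)
r = np.corrcoef(coords[:, 0], alpha)[0, 1]
print(f"correlation of MDS axis 1 with admixture proportion: {r:+.3f}")
```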
To drill down further you need to estimate admixture with a model-based clustering algorithm. Ergo, ADMIXTURE. I’ve re-edited the figure to illustrate the salient points. In particular, it is clear that all the Roma populations except the Welsh have significant South Asian ancestry. The question is how much? To answer this question you need to know the source population in South Asia. A peculiar aspect of this plot is that the Romani have very little of the green ancestral component, which happens to be modal in the Middle East (not shown). This element happens to be highly enriched in many Pakistani populations, but not necessarily in northwest Indian ones. Nevertheless, what leaves me suspicious of this particular finding is that many of the European populations, in particular those groups (e.g., in the Balkans) which may have admixed with the Romani, carry this element to an extent not evident in one of their presumed ‘daughter’ populations. I wonder if perhaps the peculiarities of Romani inbreeding have skewed the allele frequency distribution so much that you get strangeness like this. I am not showing higher K’s because those break out a Romani-specific cluster. Just like the Kalash cluster, this is to a great extent a feature of the long-term endogamy of these communities. With high levels of drift the allele frequencies of these groups move into a very peculiar space in relation to their parental populations, but one must not become confused and assume that the Romani or Kalash are themselves appropriate independent clusters in the same way that Europeans or East Asians are.
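Why the source population matters can be seen from a simple two-way mixture model, p_admixed = alpha * p_source1 + (1 - alpha) * p_source2, applied SNP by SNP. The least-squares estimate of alpha below is a textbook illustration, not the method the authors used, and all frequencies are hypothetical. Substitute a poor proxy for one of the sources and the estimated alpha shifts accordingly.

```python
import numpy as np


def two_way_admixture_proportion(p_admixed, p_source1, p_source2):
    """Least-squares alpha for p_admixed ~= alpha * p_source1 + (1 - alpha) * p_source2.

    With h = p_admixed - p_source2 and d = p_source1 - p_source2, minimising
    sum((h - alpha * d)^2) gives alpha = sum(h * d) / sum(d * d).
    """
    h = np.asarray(p_admixed, dtype=float) - np.asarray(p_source2, dtype=float)
    d = np.asarray(p_source1, dtype=float) - np.asarray(p_source2, dtype=float)
    return float(np.dot(h, d) / np.dot(d, d))


# Hypothetical allele frequencies at four SNPs; the admixed group is built as 40% source 1.
p_south_asian = np.array([0.9, 0.2, 0.6, 0.1])
p_european = np.array([0.3, 0.5, 0.1, 0.8])
p_romani_like = 0.4 * p_south_asian + 0.6 * p_european
print(round(two_way_admixture_proportion(p_romani_like, p_south_asian, p_european), 6))  # 0.4
```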
Using various forms of admixture analysis the authors seem to conclude that the Balkan Romani are 30-50% South Asian. This seems in line with intuition. But that still leaves open the question of who those South Asians were. As I noted above the most thorough Y chromosomal data point to the lower caste elements of northwest India. What do the autosomes say?
I don’t want to get into the technical details of how they tested the models, but it seems that one of the likely parental populations of the Romani had a close relationship to the Meghwal, a scheduled caste from northwest India. In other words, the autosomal results align very well with the Y chromosomal inferences. Additionally, the models tested imply that the Romani likely left South Asia ~1,000 years before the present, which aligns well with what is known from the historical record (though this is a case where I put much more stock in the historical record than in inferences from population genetic models; look at the intervals).
Finally, there is the question of inbreeding. One aspect of the Romani genome that jumps out at you is that they have many long “runs of homozygosity” (ROH). This is totally expected, as decades of uniparental analyses suggested many population bottlenecks as the Romani spread throughout Europe. But the ROH patterns also unearth an interesting fact: some of the Balkan Romani clearly have recent European admixture, while the non-Balkan Romani had an initial period of admixture followed by endogamy. The latter scenario seems to resemble that of the Ashkenazi Jews, while the former would suggest that the boundary between Romani and non-Romani in the Balkans is more fluid than is sometimes portrayed.
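A run of homozygosity is just a long stretch of consecutive homozygous genotype calls. A minimal sketch of detecting them, assuming genotypes coded 0/1/2 (1 = heterozygous) at sorted base-pair positions along one chromosome. The thresholds are hypothetical defaults, and unlike production tools such as PLINK’s --homozyg options, this sketch tolerates no genotyping errors or missing calls inside a run. Long runs point to recent shared ancestry between a person’s parents, while old, diffuse endogamy tends to leave many shorter runs, which is why the length distribution of ROH carries historical information.

```python
def find_roh(positions, genotypes, min_snps=50, min_length_bp=1_000_000):
    """Return (start_bp, end_bp, n_snps) for each maximal run of homozygous calls.

    genotypes are 0/1/2 allele counts at sorted base-pair positions; 0 and 2 are
    homozygous. Any other value (1 = heterozygous, or None for a missing call) ends a run.
    """
    runs = []
    start = None
    # A sentinel heterozygous call closes a run that reaches the end of the chromosome.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            n_snps = end - start + 1
            if n_snps >= min_snps and positions[end] - positions[start] >= min_length_bp:
                runs.append((positions[start], positions[end], n_snps))
            start = None
    return runs


def f_roh(runs, genome_length_bp):
    """Fraction of the (hypothetical) genome length covered by ROH, often called F_ROH."""
    return sum(end - start for start, end, _ in runs) / genome_length_bp


# Hypothetical chromosome: 100 SNPs spaced 100 kb apart, a single heterozygote at SNP 30.
positions = [i * 100_000 for i in range(100)]
genotypes = [0] * 100
genotypes[30] = 1
runs = find_roh(positions, genotypes)
print(runs)                            # [(3100000, 9900000, 69)]
print(f_roh(runs, 10_000_000))
```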
So there we have it. The Romani derive from lower caste populations of the northwest Indian subcontinent who seem to have left ~1,000 years ago. Over time they admixed with local populations, and are now 50-70% non-South Asian, with some groups being ~90% European (e.g., the Welsh Romani). And they have a long history as an endogamous group, judging by their inbreeding.
Because the PGP is self-recruiting, we don’t have a very balanced set of participants. “Self-recruitment” means that all participants have enrolled in our project through word of mouth, finding our website and enrolling online. To put it bluntly, that means we mostly end up with young white men….
…Research within one or two racial/ethnic categories isn’t necessarily a virtue, biracial and multiracial heritage may be even more interesting to some researchers and can open more areas for future….
In particular, NIST is looking for “trios”: two parents and a child. Researchers like to use samples from trios because they know every piece of DNA in the child comes from one of the parents. This makes it easier to assess error rates — and that sort of quality control is what NIST expects the genome material to be used for. We think all such family groups are valuable, but current trios in the PGP haven’t been the most diverse….
Reader Paul brought this to my attention. I haven’t been too interested in the PGP for myself because it’s just so slow to “play” with whole genomes (~3 GB) as opposed to 1 million SNPs. But over Christmas I’ll look into signing up, and see if they are interested in my own “trio.” I also thought I’d pass this along to readers, though my readership actually looks almost exactly like current PGP participants, so I don’t know if I’ll be contributing to the problem.
As a follow-up to my post from yesterday, I decided to run TreeMix on a data set I happened to have on hand (see Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data for more on TreeMix). Basically I wanted to display a tree with, and without, gene flow.
The technical details are straightforward. I LD pruned ~550,000 SNPs down to ~150,000. I ran TreeMix both without and with migration edges, with the Bantu Kenya population as the root. When I did turn on migration, I set the number of migration events to 5. You can see the results below.
Most of the flows are pretty much expected. The West Eurasian flow from the Turks to the Uygurs makes sense, because there is a large West Asian component to Uygur ancestry (from East Iranians?). The Chuvash are a Turkic-speaking group with a minor, but significant, Turkic genetic component. The HGDP Russian sample does have some East Eurasian ancestry. And the Moroccans also have sub-Saharan African ancestry. But your guess is as good as mine about the flow into the Bantu. These are, I think, from Kenya, so TreeMix might be interpreting Nilotic admixture as generalized Eurasian ancestry.
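For readers curious what the LD-pruning step mentioned above does, here is a simplified greedy sketch: walk along the SNPs in physical order and keep a SNP only if its squared genotype correlation (r²) with each of the most recently kept SNPs is below a threshold. This is an illustration, not PLINK’s --indep-pairwise algorithm (which works over sliding windows of SNPs with a step size); the window size and r² threshold here are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # monomorphic SNP: correlation undefined; treated as unlinked here
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def greedy_ld_prune(genotypes, window=50, r2_threshold=0.5):
    """Return indices of SNPs kept by a simple greedy LD prune.

    genotypes: array of shape (n_snps, n_individuals), SNPs in physical order.
    A SNP is kept only if its r^2 with each of the last `window` kept SNPs is below r2_threshold.
    """
    kept = []
    for j in range(genotypes.shape[0]):
        recent = kept[-window:]
        if all(r_squared(genotypes[j], genotypes[k]) < r2_threshold for k in recent):
            kept.append(j)
    return kept


# Toy data: SNP 1 duplicates SNP 0 (perfect LD); SNP 2 is uncorrelated with SNP 0.
snp0 = [0, 1, 2, 0, 1, 2, 0, 1]
snp2 = [2, 2, 0, 0, 1, 1, 0, 2]
print(greedy_ld_prune(np.array([snp0, snp0, snp2])))  # [0, 2]
```

In practice monomorphic and very rare SNPs are usually filtered out by a minor allele frequency cutoff before pruning.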
A minor note: installing TreeMix and generating the appropriate files from pedigree format is not too difficult. But you might be confused about how to generate the input file from your pedigree data. You do it like so in PLINK:
./plink --noweb --bfile YourFile --freq --within YourGroupNamesFile --out YourOutPutFile
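It’s the output of that command (a per-population allele frequency file with the .frq.strat extension) that you want to feed into TreeMix’s Python conversion script. YourGroupNamesFile has three columns: the family ID and individual ID (the first two columns of the .fam file), followed by the population name for each individual.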
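What the conversion script does is conceptually simple. A sketch, assuming the whitespace-delimited columns PLINK 1.x writes to a .frq.strat file (CHR SNP CLST A1 A2 MAF MAC NCHROBS): for every SNP, each population becomes one "A1 count,A2 count" cell, with the A2 count taken as NCHROBS minus MAC. It is meant for understanding only; in practice use the script that ships with TreeMix, and check the column names against your own PLINK output. Filling a population that has no row for a SNP with 0,0 is an assumption of this sketch, and the SNP IDs and counts in the example are hypothetical.

```python
def frq_strat_to_treemix(lines):
    """Convert PLINK .frq.strat lines into TreeMix-style input lines.

    Assumes a header row containing the columns SNP, CLST, MAC and NCHROBS.
    Output: a header of population names, then one row per SNP of "a1,a2" count cells.
    """
    rows = iter(lines)
    header = next(rows).split()
    col = {name: i for i, name in enumerate(header)}
    counts = {}  # SNP id -> {population: "a1,a2"}; dicts keep insertion (file) order
    pops = []    # population order as first encountered
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp, pop = fields[col["SNP"]], fields[col["CLST"]]
        mac, nobs = int(fields[col["MAC"]]), int(fields[col["NCHROBS"]])
        if pop not in pops:
            pops.append(pop)
        counts.setdefault(snp, {})[pop] = f"{mac},{nobs - mac}"
    out = [" ".join(pops)]
    for per_pop in counts.values():
        # Assumption of this sketch: a population missing for a SNP gets "0,0".
        out.append(" ".join(per_pop.get(p, "0,0") for p in pops))
    return out


example = [
    " CHR  SNP    CLST  A1  A2   MAF  MAC  NCHROBS",
    "   1  rs1  Yoruba   A   G  0.25   10       40",
    "   1  rs1  French   A   G   0.5   30       60",
    "   1  rs2  Yoruba   C   T   0.1    4       40",
    "   1  rs2  French   C   T   0.2   12       60",
]
for row in frq_strat_to_treemix(example):
    print(row)
```

TreeMix reads this table gzipped, so the returned lines would need to be written to a .gz file (for example with Python’s gzip module) before being passed to it.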
I mentioned this in passing in my post on ASHG 2012, but it seems useful to make it explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. The forthcoming paper likely answers, to a great extent, the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture on the order of 2,000-3,000 years before the present.* If the admixture was so recent, shouldn’t we find many of the “first people”? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.
Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were clear enough that I could accept that these populations were not being well modeled by the variation which dominated the data set. Though Cushitic in language, the Sanye had far less of the West Eurasian element present among other Cushitic-speaking populations of the Horn of Africa. Nor were their African ancestral components quite like those of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).
Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe, this is a population which seems to have been hunter-gatherers until very recently, and which to some extent still engages in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full-featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.
* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.
A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:
Native Americans and Northern Europeans More Closely Related Than Previously Thought
Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America
Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS
The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or to the phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses many disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.
What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and on the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We each carry ~50% of each ancestral component. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
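The contrast between the two paintings can be mimicked with a toy model. A minimal sketch, assuming a single haploid chromosome of 1 Morgan represented by evenly spaced markers, a pulse of admixture at generation 0, random mating in a small population, and crossovers as a Poisson process; all parameter values are hypothetical, and this is not how 23andMe computes its paintings. At generation 0 every chromosome is a single ancestry tract, like each chromosome of a first-generation child such as my friend; after many generations recombination has chopped ancestry into many alternating tracts.

```python
import random


def count_tracts(painting):
    """Number of contiguous single-ancestry segments along one painted chromosome."""
    if not painting:
        return 0
    return 1 + sum(1 for a, b in zip(painting, painting[1:]) if a != b)


def gamete(parent1, parent2, rng, morgans=1.0):
    """One recombinant chromosome; crossovers form a Poisson process at rate 1 per Morgan."""
    n = len(parent1)
    breaks = []
    t = rng.expovariate(1.0)
    while t < morgans:
        breaks.append(int(t / morgans * n))  # convert genetic position to a marker index
        t += rng.expovariate(1.0)
    sources = (parent1, parent2)
    current = rng.randrange(2)               # which parent the gamete starts copying from
    child, start = [], 0
    for b in breaks + [n]:
        child.extend(sources[current][start:b])
        start = b
        current = 1 - current                # switch parent at each crossover
    return child


def mean_tracts_after(generations, m=0.5, pop_size=200, n_markers=1000, seed=7):
    """Pulse admixture: a fraction m of chromosomes start all 'A' ('Asian'), the rest all 'E'.
    Each generation, every chromosome is replaced by a gamete from two random parents."""
    rng = random.Random(seed)
    n_a = round(m * pop_size)
    pop = [["A"] * n_markers for _ in range(n_a)] + [["E"] * n_markers for _ in range(pop_size - n_a)]
    for _ in range(generations):
        pop = [gamete(rng.choice(pop), rng.choice(pop), rng) for _ in range(pop_size)]
    return sum(count_tracts(c) for c in pop) / pop_size


for g in (0, 1, 10, 50):
    print(g, mean_tracts_after(g))
```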
So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this in the future, so it is important that people are not misled as to the implications.
A new short communication in Scientific Reports suggests that most demographic expansion as ascertained using mtDNA occurred before the Neolithic. MtDNA analysis of global populations support that major population expansions began before Neolithic Time:
Agriculture resulted in extensive population growths and human activities. However, whether major human expansions started after Neolithic Time still remained controversial. With the benefit of 1000 Genome Project, we were able to analyze a total of 910 samples from 11 populations in Africa, Europe and Americas. From these random samples, we identified the expansion lineages and reconstructed the historical demographic variations. In all the three continents, we found that most major lineage expansions (11 out of 15 star lineages in Africa, all autochthonous lineages in Europe and America) coalesced before the first appearance of agriculture. Furthermore, major population expansions were estimated after Last Glacial Maximum but before Neolithic Time, also corresponding to the result of major lineage expansions. Considering results in current and previous study, global mtDNA evidence showed that rising temperature after Last Glacial Maximum offered amiable environments and might be the most important factor for prehistorical human expansions.
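The abstract does not say exactly how the lineage ages were computed. One common way to date a star-like mtDNA lineage is the ρ (rho) statistic: the mean number of mutations separating the sampled sequences from the founder haplotype, multiplied by a calibrated number of years per mutation. The sketch below assumes aligned toy sequences and a user-supplied rate; it illustrates the idea, is not necessarily the authors’ method, and omits the usual error estimate.

```python
from statistics import mean


def mutations_from_root(root, haplotype):
    """Count sites where a sampled haplotype differs from the founder (root)."""
    if len(root) != len(haplotype):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for r, h in zip(root, haplotype) if r != h)


def rho(root, haplotypes):
    """Mean number of mutations from the root to each sampled haplotype."""
    return mean(mutations_from_root(root, h) for h in haplotypes)


def rho_age_years(root, haplotypes, years_per_mutation):
    """Convert rho into an age estimate for the lineage's expansion."""
    return rho(root, haplotypes) * years_per_mutation


if __name__ == "__main__":
    root = "ACGTACGTAC"  # hypothetical founder haplotype
    sample = ["ACGTACGTAC", "ACGTACGTAT", "GCGTACGTAC", "ACGAACGTTC"]
    years_per_mutation = 3000  # hypothetical calibration, not a published rate
    print(rho(root, sample), rho_age_years(root, sample, years_per_mutation))
```

An estimated age older than the onset of farming in a region is the kind of result the authors report for most star lineages; the calibration rate is where most of the uncertainty lives.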
When it comes to the human genetics of the Khoe-San there’s something stale and unoriginal to me about the presentation. The elements are always arranged the same way. The Bushmen are the “most ancient” humans, who can tell us something about “our past,” about “our evolution.” Tried & tested banalities just bubble forth unbidden. I have no idea why. There’s a new paper in Science on the genetics of the Khoe-San, a set of groups which includes the Bushmen, and its press releases brought this issue to mind because of how outrageous they are.
The title of the paper itself is a testament to vanilla: Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. This is absolutely not surprising. Are you shocked that the Khoe-San have adaptations? Or that African history is complex? The wonder of it all! This paper actually revisits much of the same ground as Pickrell et al.’s originally titled The genetic prehistory of southern Africa. Before Dr. Pickrell throws down on me on Twitter, let me concede that I have no creative alternative title to offer. Rather, I have an idea: perhaps in the future scientists could explore the evolutionary genetic basis of steatopygia? The trait is not limited to the Khoe-San; my distant cousins the Andaman Islanders also exhibit it. Perhaps this is the ancestral state of the human lineage? This is a situation where the titles just write themselves!
The Pith: You’re Asian. Yes, you!
The conclusion of an important paper by Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:
In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:
1. The arrival of Neolithic farmers probably from the Middle East.
2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.
3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).
4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.
Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.
It’s in Genetics, Ancient Admixture in Human History. Reading through it I can see why it wasn’t published in Nature or Science: methods are of the essence. The authors review five population genetic statistics relevant to phylogenetics and evolutionary genetics before moving on to the novel results. These statistics, which test for the presence of admixture, estimate its extent, and date it, have often been presented in previous papers by the same group, but tucked away in supplements. On the one hand, this removes from view the engines which drive the science. On the other hand, I have always appreciated one benefit of this injustice to the methods which make insight possible: supplements are often freely available, so those without academic access can actually bite into the meat of the researchers’ mode of thought.
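For orientation, here is a naive Python sketch of three of the tree-based statistics, computed from population allele frequencies at a set of SNPs: the 3-population test (f3), the f4 statistic (closely related to the D-statistics), and the f4 ratio for estimating a mixture proportion. These are textbook definitions; the versions in the paper and in ADMIXTOOLS add corrections for finite sample size and block-jackknife standard errors, which this sketch omits. The function names and frequencies are my own, hypothetical choices.

```python
from statistics import mean


def f3(target, source1, source2):
    """Naive f3(C; A, B): mean over SNPs of (c - a)(c - b).

    A clearly negative value suggests the target is a mixture of
    populations related to the two sources.
    """
    return mean((c - a) * (c - b) for c, a, b in zip(target, source1, source2))


def f4(p1, p2, p3, p4):
    """Naive f4(P1, P2; P3, P4): mean over SNPs of (p1 - p2)(p3 - p4)."""
    return mean((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))


def f4_ratio(a, o, x, b, c):
    """Mixture proportion alpha = f4(A, O; X, C) / f4(A, O; B, C)."""
    return f4(a, o, x, c) / f4(a, o, b, c)


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs.
    a = [0.1, 0.5, 0.9, 0.3]
    o = [0.4, 0.4, 0.2, 0.6]
    b = [0.2, 0.8, 0.6, 0.1]
    c = [0.7, 0.1, 0.5, 0.9]
    x = [0.3 * bi + 0.7 * ci for bi, ci in zip(b, c)]  # built as 30% B, 70% C
    print("f3(X; B, C) =", f3(x, b, c))
    print("alpha =", f4_ratio(a, o, x, b, c))
```

Because X here is constructed as an exact blend, f3 comes out negative and the f4 ratio returns 0.3; real data add drift and sampling noise, which is why the published tests need standard errors.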
I did read through the methods. Twice. I’ve encountered all the statistics before, and I’ve read how they were generated, but I’ll be honest and admit that I haven’t internalized them. That has to end now, because the authors have finally released a software package which implements the statistics, ADMIXTOOLS. I plan to use it in the near future, and if you are working at the bleeding edge of analytics it is generally best to understand the mechanisms underlying your software. I will review the technical points in more detail in future posts, more for my own edification than yours. For the moment I’ll be more cursory. Four of the tests compare allele frequencies along explicit phylogenetic trees. That’s so general as to be uninformative as a description, but I think it’s accurate to the best of my knowledge. At bottom these tests check whether a given model fits the data (as opposed to TreeMix, which searches a range of models for the one that best fits the data). The last method, rolloff, infers the timing of an admixture event from the decay of linkage disequilibrium. In short, admixture between two very distinct populations produces striking correlations between markers across the genome. Over time these correlations dissipate due to recombination, and the extent of the dissipation allows one to gauge how long ago the original admixture occurred.
The map to the right shows genotype frequencies in HGDP populations for SLC45A2, a locus that has been implicated in skin color variation in humans. It’s for the SNP rs16891982, and I yanked the figure from IrisPlex: A sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Brown represents the genotype CC, green CG, and blue GG. Europeans who have olive skin often carry the minor allele, C. While SLC24A5 is really bad at distinguishing West Eurasians from one another, SLC45A2 is better. Though both are fixed in Northern Europe, only the former stays operationally fixed outside of Europe, in the Near East. As I stated earlier, the frequency of the ancestral SLC24A5 allele in the Middle Eastern populations of the HGDP seems easily explained by the Sub-Saharan admixture found in these groups.
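The pie charts show genotype proportions, while the discussion is framed in terms of alleles. Here is a small Python sketch of the conversion, plus the genotype proportions expected under Hardy-Weinberg equilibrium, assuming a biallelic C/G SNP; the counts used in the check are hypothetical, not read off the figure.

```python
def allele_frequency(n_cc, n_cg, n_gg):
    """Frequency of the C allele from genotype counts (two alleles per person)."""
    total_alleles = 2 * (n_cc + n_cg + n_gg)
    return (2 * n_cc + n_cg) / total_alleles


def hardy_weinberg(p):
    """Expected (CC, CG, GG) genotype frequencies for C-allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    p = allele_frequency(10, 20, 70)  # hypothetical sample of 100 people
    print("C allele frequency:", p)
    print("expected CC, CG, GG:", hardy_weinberg(p))
```

A population where the minor allele C sits at 20% would, under random mating, show only about 4% CC homozygotes, which is why a homozygote like me is informative.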
In contrast, the major SNPs in SLC45A2 come closer to being disjoint in frequency between Europeans and South Asians. For example, I’m a homozygote for the C allele. And yet even here we need to be careful. I want in particular to draw your attention to the frequencies in the Middle Eastern populations, the Sardinians, and the Kalash of Pakistan.
The Kalash, and their Nuristani cousins, have often been observed to have “European” physical features. These populations even trade in legends of descent from the Macedonians of Alexander. And the genetics here shows why. Though the Kalash are far more closely related to other Northwest South Asians than to Europeans, on the subset of genes implicated in pigmentation many of them could actually “pass” as Europeans. In fact, it is interesting to me that by these measures the Sardinians are no more European than groups like the Kalash and the Druze (in contrast to the total genome, where Sardinians may be the best reference for Western Europeans). They have a lower frequency of the allele strongly associated with blue eyes than either of these groups, for example.
In the above paper they also produced a chart which illustrated the relationships of HGDP populations as measured by only the six SNPs used in their prediction method. These are markers which efficiently distinguish blue and brown eye color in Europeans.
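To see how a handful of SNPs can regroup populations differently from the genome as a whole, here is a sketch that computes a simple distance (the mean squared allele-frequency difference) over a chosen subset of markers. I don’t know what measure the IrisPlex authors used for their chart; the populations, SNP names, and frequencies here are hypothetical.

```python
from itertools import combinations


def mean_sq_diff(freqs_x, freqs_y, snps):
    """Mean squared allele-frequency difference over the chosen SNPs."""
    return sum((freqs_x[s] - freqs_y[s]) ** 2 for s in snps) / len(snps)


def distance_table(populations, snps):
    """Pairwise distances between all populations, using only `snps`."""
    return {
        (a, b): mean_sq_diff(populations[a], populations[b], snps)
        for a, b in combinations(sorted(populations), 2)
    }


if __name__ == "__main__":
    # Hypothetical allele frequencies: s1 and s2 stand in for pigmentation
    # SNPs, n1 for a neutral marker.
    pops = {
        "X": {"s1": 0.9, "s2": 0.1, "n1": 0.5},
        "Y": {"s1": 0.9, "s2": 0.1, "n1": 0.1},
        "Z": {"s1": 0.2, "s2": 0.7, "n1": 0.5},
    }
    print(distance_table(pops, ["s1", "s2"]))
    print(distance_table(pops, ["n1"]))
```

With the toy data, X and Y are identical on the two “pigmentation” SNPs, while on the neutral marker X instead matches Z, the same kind of reshuffling that lets the Kalash look “European” on pigmentation loci alone.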