The Pith: You’re Asian. Yes, you!
The conclusion of an important paper by Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:
In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:
1. The arrival of Neolithic farmers probably from the Middle East.
2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.
3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).
4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.
Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.
The paper, Ancient Admixture in Human History, is in Genetics. Reading through it, I can see why it wasn’t published in Nature or Science: here the methods are of the essence. The authors review five population genetic statistics of phylogenetic and evolutionary genetic import before moving on to the novel results. These statistics, which test for the possibility of admixture, estimate its extent, and date it, have been presented in previous papers by the same group, but tucked away in the supplements. On the one hand this removes from view the engines which are driving the science. On the other hand I have always appreciated one benefit of this injustice to the methods which make insight possible: because supplements are often freely available, those without academic access can actually bite into the meat of the researchers’ mode of thought.
I did read through the methods. Twice. I’ve encountered all the statistics before, and I’ve read how they were generated, but I’ll be honest and admit that I haven’t internalized them. That has to end now, because the authors have finally released a software package which implements the statistics, ADMIXTOOLS. I plan to use it in the near future, and if you are working at the bleeding edge of analytics it is generally best to understand the underlying mechanisms of the software you use. I will review the technical points in more detail in future posts, more for my own edification than yours. But for the moment I’ll be a bit more cursory. Four of the tests compare allele frequencies along explicit phylogenetic trees. That’s so general as to be nearly uninformative as a description, but to the best of my knowledge it’s accurate. At base the tests check whether a given model fits the data (as opposed to TreeMix, which searches a range of models for the one that best fits the data). The last method, rolloff, infers the timing of an admixture event from the decay of linkage disequilibrium. In short, admixture between two very distinct populations produces striking correlations between nearby markers along the genome, because admixed individuals inherit long chromosomal segments from one source or the other. Over time recombination breaks these segments up and the correlations dissipate. The magnitude of that dissipation allows one to gauge how long ago the original admixture occurred.
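To make two of these ideas concrete for myself, here is a minimal Python sketch. It is not ADMIXTOOLS code, and the function names are hypothetical. The first function computes a three-population statistic, f3(C; A, B), as the average over SNPs of (c - a)(c - b), where a, b, and c are allele frequencies; a clearly negative value suggests C is a mixture of populations related to A and B. The second assumes, as the description of rolloff above implies, that admixture LD between markers d Morgans apart decays roughly as A * exp(-n * d), with n the number of generations since admixture, and recovers n with a simple log-linear fit. The real software applies bias corrections, weighting, and curve fitting that are not reproduced here, and all the input numbers are made up.

```python
import math


def f3(c, a, b):
    """Three-population statistic f3(C; A, B): mean of (c - a) * (c - b) over SNPs.

    c, a, b are allele frequencies at the same SNPs. This sketch omits the
    finite-sample bias correction a real implementation applies.
    """
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def admixture_generations(distances_morgans, ld_values):
    """Estimate generations since admixture from LD decay.

    Assumes LD at genetic distance d behaves like A * exp(-n * d), so
    log(LD) is linear in d with slope -n; n comes from a least-squares fit.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Hypothetical frequencies: C is an even mix of A and B at every SNP.
a = [0.1, 0.9, 0.2, 0.8]
b = [0.9, 0.1, 0.7, 0.3]
c = [0.5 * ai + 0.5 * bi for ai, bi in zip(a, b)]
print(f3(c, a, b))  # negative, consistent with C being admixed

# Hypothetical noise-free LD curve for an admixture 20 generations ago.
d = [0.005 * k for k in range(1, 41)]
ld = [0.05 * math.exp(-20 * x) for x in d]
print(round(admixture_generations(d, ld), 6))  # 20.0
```

With real data the LD curve is noisy and the fit matters a great deal; the noise-free curve here only shows which quantity the decay rate encodes.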
There’s a new ancient DNA paper out which examines the maternal lineage and the autosomal background of two individuals recovered from a Spanish site dated to 7,000 years before the present, that is, to the European Mesolithic. In other words, these are from the last wave of Iberian hunter-gatherers before agriculture. I have reproduced the PCA, with some informative labels, to illustrate the peculiarity of these samples. Here’s the abstract:
The genetic background of the European Mesolithic and the extent of population replacement during the Neolithic…is poorly understood, both due to the scarcity of human remains from that period…The mitochondria of both individuals are assigned to U5b2c1, a haplotype common among the small number of other previously studied Mesolithic individuals from Northern and Central Europe. This suggests a remarkable genetic uniformity and little phylogeographic structure over a large geographic area of the pre-Neolithic populations. Using Approximate Bayesian Computation, a model of genetic continuity from Mesolithic to Neolithic populations is poorly supported. Furthermore, analyses of 1.34% and 0.53% of their nuclear genomes, containing about 50,000 and 20,000 ancestry informative SNPs, respectively, show that these two Mesolithic individuals are not related to current populations from either the Iberian Peninsula or Southern Europe.
Here’s another PCA showing one of the individuals against a more fine-grained representation of European populations:
The figure to the left is from a new paper in Science, When the World’s Population Took Off: The Springboard of the Neolithic Demographic Transition. It reports findings from 133 cemeteries in the Northern Hemisphere on the proportion of individuals aged 5 to 19. When the data are calibrated to the period when agriculture was introduced into each region, there seems to be a clear, aligned demographic transition toward a “youth bulge.” Why? A standard model of land surplus surely explains part of it. When farmers settle “virgin land” there is often a rapid “catch-up” phase toward the Malthusian limit, the carrying capacity. Another possibility, though, is that sedentary populations did not need to space their offspring nearly as much as mobile hunter-gatherers did. Whatever the details, the fact remains that the data point to a shift in the age pyramid during this period. The author speculates about the possible cultural implications of this. There is an a priori assumption that a young vs. old age profile in a society constrains its choices and channels its energies (e.g., think of the “baby boom” generation in the USA). A final interesting point is that the author notes that today we are seeing the last gasp of this transition toward large numbers of children, as fertility drops toward replacement all across the world. That too may have some cultural consequences.
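The “catch-up” toward carrying capacity is easiest to see in the textbook discrete logistic model. This is my own illustration, not the model in the Science paper, and the starting size, growth rate r, and carrying capacity K below are hypothetical. Per-capita growth stays close to r while the population is far below K, and falls toward zero as it approaches K.

```python
def logistic_trajectory(n0, r, k, generations):
    """Discrete logistic growth: N_next = N + r * N * (1 - N / K)."""
    sizes = [float(n0)]
    for _ in range(generations):
        n = sizes[-1]
        sizes.append(n + r * n * (1.0 - n / k))
    return sizes


def per_capita_growth(sizes):
    """Proportional growth from each generation to the next."""
    return [(after - before) / before for before, after in zip(sizes, sizes[1:])]


# Hypothetical farmers settling "virgin land": 10 people, room for 1,000.
sizes = logistic_trajectory(n0=10, r=0.3, k=1000, generations=40)
rates = per_capita_growth(sizes)
print(f"early per-capita growth: {rates[0]:.3f}")   # close to r while N << K
print(f"late per-capita growth:  {rates[-1]:.5f}")  # near zero as N approaches K
```

Fast growth on empty land is the regime in which a population skews young, which is why the land-surplus explanation is a natural first guess.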
As many of you know, when two adjacent demes (breeding populations) were originally distinct, they often rapidly equilibrate in gene frequencies. There are plenty of good concrete examples of this. The Hui of China are Muslims who speak local Chinese dialects. The most probable root of this community is the enormous population of Central Asian Muslims brought in by the Mongol Yuan dynasty, which ruled China for roughly a century, from the late 1200s to the mid-1300s. The genetic studies of this group that I’ve seen indicate an upper-bound estimate of ~10% for West Eurasian ancestry. The other ~90% is interchangeable with the Han Chinese. So let’s assume that the Hui are ~10% West Eurasian. If you assume that in the year 1400 the Hui were “pure,” you have 24 generations (at 25 years per generation). The original population of “Central Asian Muslims” was heterogeneous, including Iranians and Turks. But let’s take it for granted that they were 50% East Eurasian and 50% West Eurasian in ancestry at the time of their arrival. What would the intermarriage rate per generation have to be for the Hui to be ~10% West Eurasian at t = 24 (24 generations after intermarriage began, starting from that 50/50 split)? It turns out all you need is a constant intermarriage rate of about 6.5% to 7% per generation (the Han Chinese population is so large relative to the Hui that you can model it as infinite in size).
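The arithmetic behind that figure is short enough to check. Under the simplifying assumptions above (a 50/50 founding population, an effectively infinite Han source with no West Eurasian ancestry, and a constant fraction m of each Hui generation’s ancestry coming from Han spouses), the West Eurasian fraction after t generations is 0.5 × (1 - m)^t. Solving 0.5 × (1 - m)^24 = 0.10 gives m ≈ 0.065. A minimal Python sketch, with hypothetical function names:

```python
def admixture_after(initial, m, generations):
    """Ancestry fraction left after `generations` rounds of one-way gene flow
    at rate m from a very large source population that lacks that ancestry."""
    return initial * (1.0 - m) ** generations


def rate_needed(initial, target, generations):
    """Solve initial * (1 - m) ** generations = target for the rate m."""
    return 1.0 - (target / initial) ** (1.0 / generations)


m = rate_needed(initial=0.5, target=0.10, generations=24)
print(round(m, 3))                            # 0.065
print(round(admixture_after(0.5, m, 24), 3))  # 0.1
```

Because the decay is geometric, even a modest constant rate erodes ancestry quickly over a few dozen generations.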
The situation gets even simpler when you have one population which divides into two. For example, imagine that the Serbs and Croats fissioned from a set of unstructured South Slavic tribes which filtered into ancient Illyria ~600 A.D. Soon enough a cultural division between the two in terms of religion (Western vs. Eastern Christian) threw up a population genetic barrier. If you assume that the two groups were genetically identical at t = 0, and you separated them perfectly, over time they would diverge due to drift in their allele frequencies. But in reality, barriers between geographically close groups do not prevent all intermarriage. Even groups which are extremely insular in a cultural sense, such as the Roma of Eastern Europe, are clearly heavily admixed with their surrounding populations; they seem to be no more than ~50% South Asian in total genome content. Going back to the South Slavs, who start out very similar in our putative scenario, how much intermarriage is necessary to keep them from diverging? The key quantity is not the rate of intermarriage but the absolute number of migrants: about one migrant per generation between the two demes is sufficient to keep their allele frequencies from drifting far apart. On the face of it this seems implausible, but recall that divergence is driven mostly by genetic drift, along with new variation (whether from other exogenous migratory sources or from mutation). Very small populations are subject to a lot of drift, and so diverge rapidly, but only a very few migrants are needed to bring them back into alignment, because those migrants are proportionally significant. In contrast, the allele frequencies of large populations are less buffeted by generation-to-generation sampling variance (e.g., 10 tosses of a coin will deviate more from 50/50 proportionally than 100 tosses), so they require less gene flow proportionally to maintain parity.
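The “one migrant per generation” rule is usually derived from Wright’s island model. Here is a minimal Python sketch under that model’s idealized assumptions (many demes of equal size N, neutral alleles, equilibrium between drift and migration at rate m), which stand in for the two-deme case above rather than describing it exactly. Equilibrium F_ST is approximately 1 / (1 + 4Nm), so it depends only on the number of migrants Nm, not on m by itself: a small and a large deme each receiving one migrant per generation end up equally diverged (F_ST ≈ 0.2, modest rather than zero), even though the small deme’s allele frequencies jump around ten times more per generation. The function names and deme sizes are hypothetical.

```python
import math


def island_model_fst(n, m):
    """Approximate equilibrium F_ST under Wright's island model: 1 / (1 + 4 N m)."""
    return 1.0 / (1.0 + 4.0 * n * m)


def drift_sd(p, n):
    """Standard deviation of the one-generation change in allele frequency p
    from binomial sampling of 2N gene copies in a diploid deme of size N."""
    return math.sqrt(p * (1.0 - p) / (2.0 * n))


# The same number of migrants (N * m = 1) into a small and a large deme.
small = island_model_fst(n=50, m=0.02)
large = island_model_fst(n=5000, m=0.0002)
print(small, large)  # both about 0.2

# Drift per generation is ten times stronger in the small deme.
print(drift_sd(0.5, 50), drift_sd(0.5, 5000))
```

The two effects cancel: the small deme drifts more, but each migrant is a larger share of it.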
Back when this sort of thing was cutting edge, mtDNA haplogroup J was a pretty big deal. This was the haplogroup often associated with the demic diffusion of Middle Eastern farmers into Europe; it was the “Jasmine” clade in The Seven Daughters of Eve. A new paper in PLoS ONE makes an audacious claim: that J is not a lineage which underwent recent demographic expansion, but rather one which has been subject to a specific set of evolutionary dynamics, and that interpretations of it have been skewed by a false “molecular clock” assumption. By this assumption, I mean that mtDNA, which is passed down in an unbroken chain from mother to daughter, is by and large neutral to forces like natural selection and subject to a constant mutation rate, so that the number of differences between two lineages can serve as a clock calibrated back to their last common ancestor. Additionally, mtDNA has a high mutation rate, so it accumulates lots of variation to sample, and it is present in many copies per cell, so it is easy to extract. What’s not to like?
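Here is the bare logic of that clock as a minimal Python sketch. Two lineages each accumulate mutations independently after they split, so the time to their most recent common ancestor is the per-site divergence divided by twice the per-site rate. The 16,569-base-pair length is the standard human mtDNA reference length, but the number of differences and the mutation rate are hypothetical. The second call shows why a wrong rate assumption matters: if the true rate in a lineage differs from the assumed one, the date is off in proportion.

```python
def tmrca_years(differences, sites, rate_per_site_per_year):
    """Strict molecular clock estimate of time to the most recent common ancestor.

    Both lineages accumulate mutations independently since the split,
    hence the factor of 2 in the denominator.
    """
    divergence = differences / sites
    return divergence / (2.0 * rate_per_site_per_year)


MTDNA_LENGTH = 16569  # bp in the human mtDNA reference sequence

# Hypothetical: 20 differences between two mtDNA sequences, and a
# hypothetical rate of 2e-8 substitutions per site per year.
base = tmrca_years(20, MTDNA_LENGTH, 2e-8)
# If the true rate in this lineage were twice as high, the same data
# would imply a common ancestor half as old.
faster = tmrca_years(20, MTDNA_LENGTH, 4e-8)
print(round(base), round(faster))
```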
The image above is adapted from the 2010 paper A Predominantly Neolithic Origin for European Paternal Lineages, and it shows the frequencies of Y chromosomal haplogroup R1b1b2 across Europe. As you can see, the frequency climbs toward ~100% as you approach the Atlantic. Interestingly, the fraction of R1b1b2 is highest among populations such as the Basques and the Welsh. In the late 1990s and early 2000s some researchers took this as evidence that the Welsh adopted a Celtic language, and that before this they spoke a language distantly related to Basque. Additionally, the assumption was that the Basques were the ur-Europeans: descendants of the Paleolithic populations of the continent both biologically and culturally, so that some attributed the peculiar aspects of the Basque language to its ancient Stone Age origins.
As the title indicates, the above paper overturned such assumptions, implying instead that the R1b1b2 haplogroup originated in the Near East and was associated with the expansion of Middle Eastern farmers from the eastern Mediterranean toward western Europe ~10,000 years ago. Instead of being a confident peg for the dominance of Paleolithic roots among contemporary Europeans, and for the spread of farming mostly through cultural diffusion, the high frequency of R1b1b2 had now become a linchpin for the case that Europe had seen one, and perhaps more than one, demographic revolution over the past 10,000 years.
This is made very evident by the results from ancient DNA, which are hard to superimpose upon a simplistic model of two-way admixture between a Paleolithic substrate and a Neolithic overlay. Rather, since the rise of agriculture there may have been multiple pulses from different starting points into a European cul-de-sac. We need to be careful about overly broad pronouncements at this point, because, as they say, this is a “developing” area. But I want to go back to the western European fringe for a moment.
Seriously, sometimes history matches fiction a lot more than we’d have expected, or wished. In the early 2000s the Oxford geneticist Bryan Sykes observed a pattern of discordance between the spatial distribution of male-mediated ancestry on the non-recombining portion of the Y chromosome (NRY) and female-mediated ancestry in the mitochondrial DNA (mtDNA). To explain this he offered the press a somewhat sensationalist narrative about possible repeated instances of male genocide against lineage groups who lost in conflicts.
Here is a portion of chapter 31 of the book of Numbers in the Bible:
15 – And Moses said unto them, Have ye saved all the women alive?
16 – Behold, these caused the children of Israel, through the counsel of Balaam, to commit trespass against the LORD in the matter of Peor, and there was a plague among the congregation of the LORD.
17 – Now therefore kill every male among the little ones, and kill every woman that hath known man by lying with him.
18 – But all the women children, that have not known a man by lying with him, keep alive for yourselves.
Then there is the rape of the Sabine women. The ethnogenesis of the mestizo and mulatto populations of the New World was in large part the result of unions between non-European women and European men. These are hard, brutal myths and hard, brutal facts. But do they reflect an essential aspect of the dynamics which have shaped our species’ past?
I’m not yet willing to put much confident weight on this possibility, but it seems to be at least part of the picture. You see a major disjunction between male and female lineages among South Asians, for example. A new paper in PNAS adds weight to this possibility, albeit only incrementally. Ancient DNA reveals male diffusion through the Neolithic Mediterranean route:
Thanks to northern Europe’s cool climate, which favors the preservation of DNA, and to archaeological research that is rather well developed in the region due to quirks of history, there are lots of findings from ancient DNA which are answering long-standing questions. Scandinavia in particular is of special interest with regard to the transition of Europeans from a hunter-gatherer lifestyle to an agricultural one. We know that hunting and gathering as dominant modes of economic production persisted relatively late in European history in this region, up to ~5,000 years before the present. From my cursory reading of the material on the spread of agriculture in northern Europe, one dynamic which seems clear is that the rate of expansion was not always constant, and that at the northern fringes in particular, social or ecological frontiers served to demarcate the limits to the expansion of farming groups, which often originated from the south and east. Additionally, on the maritime fringes of the North Sea and Baltic there seem to have been relatively dense agglomerations of hunter-gatherers which resisted or coexisted with farming populations for long periods of time (perhaps they would be more accurately termed fisher-gatherers!).
This is where Anna Linderholm’s research comes into the picture. I’ve blogged about some of her work before. Linderholm’s goal seems to be to synthesize results from a range of disparate fields to understand how two partially contemporaneous prehistoric Scandinavian cultures related to each other: the Pitted Ware Culture (PWC) and the Funnelbeaker Culture (TRB, which is an acronym for the German name for the culture). The former were hunter-gatherers who tended to rely upon marine resources, while the latter were agriculturalists who engaged in a great deal of animal husbandry.
You can find her contribution to the book Human Bioarchaeology of the Transition to Agriculture online. It’s pretty accessible for an ignorant lay person, and in the chapter she outlines some really interesting details about the relationship between the PWC, TRB, modern northern European populations, and the functional genetic characteristics of these ancient groups.
A new paper in Proceedings of the Royal Society dovetails with some posts I’ve put up on the peopling of Japan of late. The paper is Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages:
Languages, like genes, evolve by a process of descent with modification. This striking similarity between biological and linguistic evolution allows us to apply phylogenetic methods to explore how languages, as well as the people who speak them, are related to one another through evolutionary history. Language phylogenies constructed with lexical data have so far revealed population expansions of Austronesian, Indo-European and Bantu speakers. However, how robustly a phylogenetic approach can chart the history of language evolution and what language phylogenies reveal about human prehistory must be investigated more thoroughly on a global scale. Here we report a phylogeny of 59 Japonic languages and dialects. We used this phylogeny to estimate time depth of its root and compared it with the time suggested by an agricultural expansion scenario for Japanese origin. In agreement with the scenario, our results indicate that Japonic languages descended from a common ancestor approximately 2182 years ago. Together with archaeological and biological evidence, our results suggest that the first farmers of Japan had a profound impact on the origins of both people and languages. On a broader level, our results are consistent with a theory that agricultural expansion is the principal factor for shaping global linguistic diversity.
I don’t know enough about the technical details of linguistics to comment, but the alignment between the linguistic model and archaeology is pretty impressive to me. The 95% credible interval stretches back as far as ~4,000 years, so there’s some fudge factor too. The basic technique is borrowed from phylogenetics. This is pretty clear when you notice that one of the algorithms seems to be the same one used in the rice genomics paper. Nick Wade covers the paper in The New York Times, so no need for me to give a blow-by-blow in a domain where I don’t have much insight anyway.
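For a sense of what “borrowed from phylogenetics” means, here is a minimal Python sketch of the crudest version of the idea: code each language as presence/absence of a set of cognate classes, measure the fraction of codes on which two languages disagree, and cluster with UPGMA. This is a swapped-in distance method, not the Bayesian approach of the paper, and the four “languages” and their codes are hypothetical. UPGMA also assumes a strict clock, which Bayesian methods can relax.

```python
from itertools import combinations


def hamming(x, y):
    """Fraction of cognate-class codes on which two languages differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def upgma(data):
    """Cluster languages with UPGMA; returns a nested tuple of names."""
    # Each cluster: label -> (subtree, number of leaves it contains).
    clusters = {name: (name, 1) for name in data}
    dist = {frozenset(pair): hamming(data[pair[0]], data[pair[1]])
            for pair in combinations(data, 2)}
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # merge the nearest pair
        p, q = sorted(closest)
        tree_p, size_p = clusters.pop(p)
        tree_q, size_q = clusters.pop(q)
        del dist[closest]
        merged = f"({p},{q})"
        # Distance to the new cluster is the size-weighted average of the old ones.
        for r in clusters:
            d_pr = dist.pop(frozenset((p, r)))
            d_qr = dist.pop(frozenset((q, r)))
            dist[frozenset((merged, r))] = (d_pr * size_p + d_qr * size_q) / (size_p + size_q)
        clusters[merged] = ((tree_p, tree_q), size_p + size_q)
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical cognate codes (1 = has a word from that cognate class).
cognates = {
    "A": [1, 1, 0, 1, 0, 1, 1, 0],
    "B": [1, 1, 0, 1, 0, 1, 0, 0],
    "C": [0, 1, 1, 0, 1, 0, 1, 1],
    "D": [0, 1, 1, 0, 1, 1, 1, 1],
}
tree = upgma(cognates)
print(tree)
```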
In the age of 500,000-SNP studies of genetic variation across dozens of populations, we’re obviously a bit beyond lists of ABO blood group frequencies. No human can discern by eye the patterns of correlated allele frequency variation that point to between-population genetic differences at this marker density. So you rely on techniques which extract the general patterns from the data and present them in a human-comprehensible format. But there’s an unfortunate tendency for humans to imbue the products of technique with an authority which they should not always have.
The History and Geography of Human Genes is arguably the most important historical genetics work of the past generation. It has surely influenced many within the field of genetics, and because of its voluminous, elegant visual displays of genetic data it is also a primary source for those outside of genetics trying to make sense of phylogenetic relations between human populations. And yet one aspect of this great work which never caught on was the use of “synthetic maps” to visualize components of genetic variation between populations. This may have been fortunate: a few years ago a paper was published, Interpreting principal components analyses of spatial population genetic variation, which suggested that the gradients you see on the map above may be artifacts:
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.’s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
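The core of the artifact argument can be reproduced in miniature. The Python sketch below (using NumPy) assumes a row of populations along a line whose allele-frequency covariance simply decays with distance, exp(-|i - j| / L), with no migration events at all; the number of populations and the length scale L are arbitrary assumptions, not values from the paper, and the sketch skips the mean-centering a real genotype PCA would apply. The leading eigenvectors still come out as waves: the first keeps one sign, the second changes sign once, and the third twice.

```python
import numpy as np


def spatial_pcs(n_demes=20, corr_length=3.0, n_pcs=3):
    """Leading eigenvectors of a covariance that decays with distance along a line."""
    idx = np.arange(n_demes)
    cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / corr_length)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # largest first
    return eigvecs[:, order[:n_pcs]]


def sign_changes(v):
    """Number of sign changes between neighboring populations."""
    return int(np.sum(v[:-1] * v[1:] < 0))


pcs = spatial_pcs()
for k in range(pcs.shape[1]):
    print(f"PC{k + 1}: {sign_changes(pcs[:, k])} sign change(s)")
```

Nothing in the input encodes a migration wave; the oscillating patterns come from distance-decaying covariance alone.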
A paper earlier this year took the earlier work further and used a series of simulations to show how the nature of the gradients varied. In light of recent preoccupations the results are of interest. Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture:
After linking to Marnie Dunsmore’s blog on the Neolithic expansion, and reading Peter Bellwood’s First Farmers, I’ve been thinking a bit about how we might integrate some models of the rise and spread of agriculture with the new genomic findings. Bellwood’s thesis basically seems to be that the contemporary world pattern of expansive macro-language families (e.g., Indo-European, Sino-Tibetan, Afro-Asiatic, etc.) is a shadow of the rapid demographic expansions of farmers in prehistory. In particular, hoe-farmers rapidly pushing into virgin lands. First Farmers was published in 2005, and so it had access mostly to mtDNA and Y chromosomal studies. Today we have a richer data set, ranging from hundreds of thousands of markers per person to mtDNA and Y chromosomal results from ancient DNA. I would argue that the new findings tend to reinforce the plausibility of Bellwood’s thesis somewhat.
The primary datum I want to enter into the record in this post, which was news to me, is this: the island of Cyprus seems to have been first settled (at least in anything but trivial numbers) by Neolithic populations from mainland Southwest Asia.* In fact, the first farmers in Cyprus replicated the physical culture of the nearby mainland in toto. This implies that the genetic heritage of modern Cypriots is probably attributable, on the whole, to expansions of farmers from Southwest Asia. With this in mind let’s look at Dienekes’ Dodecad results at K = 10 for Eurasian populations (I’ve re-edited them a bit):
A new paper in the New Journal of Physics shows that a relatively simple mathematical model can explain the rate of expansion of agriculture across Europe, Anisotropic dispersion, space competition and the slowdown of the Neolithic transition:
The front speed of the Neolithic (farmer) spread in Europe decreased as it reached Northern latitudes, where the Mesolithic (hunter-gatherer) population density was higher. Here, we describe a reaction–diffusion model with (i) an anisotropic dispersion kernel depending on the Mesolithic population density gradient and (ii) a modified population growth equation. Both effects are related to the space available for the Neolithic population. The model is able to explain the slowdown of the Neolithic front as observed from archaeological data.
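The model in the paper modifies a classic starting point, the Fisher-KPP reaction-diffusion equation, ∂u/∂t = D ∂²u/∂x² + a·u(1 - u), in which u is the farmer population relative to carrying capacity, D a diffusion coefficient, and a the initial growth rate. Its front advances at about 2√(aD). Below is a minimal one-dimensional Python sketch of that isotropic baseline, not the authors’ anisotropic model; the units, grid, and parameter values are arbitrary assumptions, and the simulated speed sits slightly below the theoretical value because of discretization and slow convergence. It shows the basic lever their model works through: anything that reduces effective dispersal (here, a smaller D) slows the front.

```python
import numpy as np


def fisher_front_speed(a=1.0, d=1.0, length=400.0, dx=0.5, dt=0.02, t1=50.0, t2=100.0):
    """Simulate the 1-D Fisher-KPP equation and measure the front speed between t1 and t2."""
    n = int(length / dx)
    x = np.arange(n) * dx
    u = np.where(x < 10.0, 1.0, 0.0)  # farmers start at the left edge
    steps1, steps2 = int(round(t1 / dt)), int(round(t2 / dt))
    positions = {}
    for step in range(1, steps2 + 1):
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = (u[1] - u[0]) / dx**2     # zero-flux boundaries
        lap[-1] = (u[-2] - u[-1]) / dx**2
        u = u + dt * (d * lap + a * u * (1.0 - u))  # explicit Euler step
        if step in (steps1, steps2):
            positions[step] = x[np.argmax(u < 0.5)]  # where density drops below half
    return (positions[steps2] - positions[steps1]) / (t2 - t1)


speed = fisher_front_speed()
slower = fisher_front_speed(d=0.25)
print(f"simulated {speed:.2f} vs theory {2 * np.sqrt(1.0 * 1.0):.2f}")
print(f"with a quarter of the dispersal: {slower:.2f}")
```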
The paper is open access, so if you want more of the mathematical detail, just click through above. Here I am more curious about their nice visualization of the archaeological data: