A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:
Native Americans and Northern Europeans More Closely Related Than Previously Thought
Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America
Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS
The reality is ta Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.
What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We are both ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.
The Pith: You’re Asian. Yes, you!
A conclusion to an important paper, Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:
In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:
1. The arrival of Neolithic farmers probably from the Middle East.
2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.
3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).
4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.
Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.
It’s in Genetics, Ancient Admixture in Human History. Reading through it I can see why it wasn’t published in Nature or Science: methods are of the essence. The authors review five population genetic statistics of phylogenetic and evolutionary genetic import, before moving onto the novel results. These statistics, which measure the possibility of admixture, the extent of admixture, and the date of admixture, are often presented, but nested into supplements, in previous papers by the same group. On the one hand this removes from view the engines which are driving the science. On the other hand I have always appreciated that a benefit of this injustice to the methods which make insight possible is that those without academic access can actually bite into the meat of the researcher’s mode of thought.
I did read through the methods. Twice. I’ve encountered all the statistics before, and I’ve read how they were generated, but I’ll be honest and admit that I haven’t internalized them. That has to end now, because the authors have finally released a software package which implements the statistics, ADMIXTOOLS. I plan to use it in the near future, and it is generally best if you understand the underlying mechanisms of a software package if you are at the bleeding end of analytics. I will review the technical points in more detail in future posts, more for my own edification than yours. But for the moment I’ll be a bit more cursory. Four of the tests use comparisons of allele frequencies along explicit phylogenetic trees. That’s so general as to be uninformative as a description, but I think it’s accurate to the best of my knowledge. In the basics the tests are seeing if a model fits the data (as opposed to TreeMix, which finds the best model out of a range to fit the data). The last method, rolloff, infers the timing of an admixture event based upon the decay of linkage disequilibrium. In short, admixture between two very distinct populations has the concrete result of producing striking genomic correlations. Over time these correlations dissipate due to recombination. The magnitude of dissipation can allow one to gauge the time in the past when the original admixture occurred.
I cropped the image above from the paper Inference of Population Structure using Dense Haplotype Data. The main reason was emphasize the distinctiveness of the Sardinian cluster, on the bottom right. As you can see this population exhibits a lot of coancestry across individuals. This isn’t too surprising, Sardinia is an island, and islands are often genetically distinctive. Random genetic drift prevents populations from diverging through gene flow, but water is a major impediment to gradual isolation by distance dynamics. The original Sardinians are naturally going to diverge from mainlanders over time, and begin to share the same set of common ancestors in the recent past, because their space of reasonable mating possibilities is constrained. The other population which is similar in the heat map above are the residents of the Orkneys, off the north coast of Scotland (the Orkneys has a much smaller population than Sardinia, but, it is also much closer to the mainland).
This is on my mind because Dienekes has a long post where he explores the D-statistic results of various European populations, using Sardinians as one of the references. You don’t need to know the details, just that Northern European populations seem to exhibit an affinity to East Asians. Our favorite human genomicsts at Broad highlighted this tendency, and Dienekes went looking (if David Reich et al. were inclined toward mischief they should just posit some crazy scenario, and see if Dienekes assembles the requisite data set!). Which is all fine, and we’ll see more of this in the near future. It isn’t as if others aren’t using Sardinians.
A few years ago you started seeing the crest of studies which basically took several hundred individuals (or thousands) from a range of locations, and then extracted out the two largest components of genetic variation from the hundreds of thousands of variants. The clusters which fell out of the genetic data, with each point being an individual’s position, were transposed onto a geographical map. The figure to the left (from this paper) has been widely circulated. You don’t have to be a deep thinker to understand why things shake out this way; people are more closely related to those near than those far because gene flow ties populations together, and its power decreases as a function of distance.
Of course the world isn’t flat, and history perturbs regularities. Jews for example often don’t shake out where they “should” geographically, because of their historical mobility contingent upon random and often capricious geopolitical or social pressures. The Hazara of Afghanistan have their ethnogenesis in the melange of peoples who were thrown together after the Mongol conquest of Central Asia and Iran in the 13th century, and the subsequent collapse of the Ilkhan dynasty. Though the Hazara have mixed with their Persian, Tajik and Pashtun neighbors, they still retain a strong stamp of Mongolian ancestry which means that they are at some remove on the “genetic map” from their geographical neighbors.