The Pith: We are now moving from the human genome project to the human genomes project. As more and more full genomes of various populations come online, new methods will arise to take advantage of the surfeit of data. In this paper the authors crunch through the genomes of half a dozen individuals to make sweeping inferences about the history of the human species over the past few hundred thousand years.
Since the integration of evolution and genetics in the early years of the 20th century there have been several revolutions in our ability to perceive the underlying variation which is the raw material and result of evolutionary genetics. The understanding that DNA was the concrete substrate of Mendelian genetics, and the rise to prominence of molecular genetic techniques in understanding evolution in the 1970s and 1980s, was one key transition. No longer were geneticists simply tracking the coat colors of mice or the visible mutations of fruit flies. In the 1990s the uniparental loci, the maternal and paternal lineages as inferred from the mtDNA and Y chromosomes, came into their own. Finally, the 2000s saw the post-genomic era, and researchers routinely began analyzing data sets of hundreds of thousands of single nucleotide polymorphisms (SNPs), genetic variants, in hundreds of individuals.
In this decade some of the promise of the Human Genome Project will finally ripen, in that whole genomes are going to be used more and more in analyses. This is exciting, but there are some obvious issues. The human genome has ~3 billion base pairs, vs. the 1 million or fewer you might manipulate per individual in data sets focused on SNPs. There are some things for which a human genome is overkill. You don’t need a full genomic sequence to ascertain your identity as a member of a particular geographic race. Not only can visual inspection usually suffice to reassure you as to your background, but depending on the scale of granularity you want a random SNP set on the order of ~10,000 should suffice, or as few as 25 ancestrally informative markers! But, if you want to ascertain mutation rates within families with precision and confidence, you do need the full genome.
A new paper in Nature illustrates the possibilities of looking at the whole genome, instead of simply a variant subset. In it, the authors show the power of using only a few individuals’ whole genomes to derive insights about broader population histories. That’s because with a whole genome you obviously are maximizing the amount of data you’re getting in terms of raw sequence, and there’s no need for approximations.
The history of human population size is important for understanding human evolution. Various studies…have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH)…a Korean male (SJK)…three European individuals…and two Yoruba males…We infer that European and Chinese populations had very similar population-size histories before 10–20 kyr ago. Both populations experienced a severe bottleneck 10–60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure…We also infer that the differentiation of genetically modern humans may have started as early as 100–120 kyr ago…but considerable genetic exchanges may still have occurred until 20–40 kyr ago.
The results of the paper itself are not earth-shattering. It’s really just a test-run of a series of methods which will probably become widespread if they turn out to be more useful than the alternatives. We’ve long seen a pattern in the genetic data of a relatively larger long term African population, and a bottleneck with non-Africans.
In terms of method, first they seem to have focused on patterns of genetic variation on the intra-locus dimension. By this, I mean that they had diploid whole genomes, as every gene necessarily comes in two copies except on the sex chromosomes, and they analyzed the patterns of variation of heterozygosity (two different variants of the gene) or homozygosity (same variant of the gene). These patterns would be distributed across the genome, on the inter-locus dimension, differentiated by recombination events which chop apart the patterns across the genome by mixing and matching chromosomal segments. Recombination events occur steadily across time, so the nature of the patterns can allow one to infer recombination events, the magnitude of which can then lead one to the time of the last common ancestor of two segments.
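To make the intuition concrete, here is a minimal Python sketch of the kind of pattern described above. It is not the authors' code: the function names, the window size, and the toy sequences are all assumptions made for illustration. It marks each fixed-size window of a diploid sequence as heterozygous or homozygous, then lists the lengths of the homozygous stretches that sit between heterozygous windows.

```python
def bin_heterozygosity(hap_a, hap_b, window=100):
    """Return one character per window: '1' if the two copies differ
    anywhere in that window (heterozygous), '0' if they are identical."""
    if len(hap_a) != len(hap_b):
        raise ValueError("the two copies must have the same length")
    flags = []
    for start in range(0, len(hap_a), window):
        same = hap_a[start:start + window] == hap_b[start:start + window]
        flags.append("0" if same else "1")
    return "".join(flags)


def homozygous_runs(binned):
    """Lengths (in windows) of consecutive homozygous stretches."""
    return [len(run) for run in binned.split("1") if run]


# Toy example (hypothetical data): two 20-base copies differing at one site.
copy_a = "ACGTACGTACGTACGTACGT"
copy_b = copy_a[:9] + "A" + copy_a[10:]   # introduce a single difference

binned = bin_heterozygosity(copy_a, copy_b, window=5)
print(binned)
print(homozygous_runs(binned))
```

Loosely speaking, long homozygous runs suggest that the two copies share a relatively recent common ancestor in that stretch, while densely heterozygous stretches suggest an older one. The method in the paper fits a statistical model to this kind of pattern rather than reading it off directly.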
As I note, qualitatively they replicated what has long been known, but the authors claim that their model allows for more precise quantitative inferences with fewer parameters. The parameters free to vary in their model were the mutation rate, the recombination rate, and ancestral population sizes. With their assumptions in hand they generated the following figure panel which shows the effective population size inferred from genomes as a function of time:
Moving to the left you come closer to the present, while to the right you move further into the past. Because of the reliance on recombination rates the authors admit that their method lacks power for periods more recent than 20,000 years before the present, and older than 3 million years before the present. In the former case there are too few recombination events, and in the latter case I assume that the events saturate the genome (they also note that deep balancing selection could generate artifacts). The authors validated their method by simulating genomes, but the results are obviously correct to a first approximation from what we know in other disciplines. You have a major Eurasian bottleneck, and a less severe bottleneck for Africans. Then the bounce back after the Last Glacial Maximum.
The second chart is more complicated, but the takeaway is that it is from this that the authors inferred that there must have been admixture between the ancestors of West Africans and Eurasians ~20-40 thousand years ago. More intelligibly the authors noted that the X chromosomes of the Korean and African individuals did not diverge nearly as much as they should, in that regions with last common ancestry in the ~20-40,000 year interval were far more numerous in their data than simulated results would imply using a model of total separation 60,000 years ago.
Let’s jump straight to the discussion:
The time frame proposed above for continued genetic exchange between Africans and non-Africans is more recent than the archaeologically documented time of the out-of-Africa dispersal, because there are modern human fossils in both Europe and Australasia that date to >40 kyr ago…Further analysis of additional non-African genomes indicates that this genetic exchange occurred primarily before the separation of Europeans and East Asians…An important caveat to this conclusion is the uncertainty of the per-year mutation rate of 1.0 × 10⁻⁹ (2.5 × 10⁻⁸/25). Although this mutation rate agrees well with the rates estimated between primates averaged over millions of years…generation intervals as high as 29 years per generation over the last few thousand years, and present mutation rates lower than 2.5 × 10⁻⁸ per generation…are possible in principle. These factors could make our recent date estimates too recent, although it seems unlikely that such inaccuracies would be consistent with a date of final genetic exchange as far back as 60 kyr ago….
If I were a journalist I would probably put this into the “developing….” bin, as there may be revisions to the human mutation rate, as they acknowledge above. In fact I have to wonder if a reviewer prodded them to add that caveat to the paper, though I am also rather sure that many of the authors are quite aware of some of the discussions as to the exact value of this parameter.
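To see how sensitive the dates are to this parameter, here is a rough Python sketch of the arithmetic. The per-generation rate, the 25-year generation time, and the 29-year generation interval come from the quoted discussion; the halved per-generation rate is purely a hypothetical alternative, and the function names are made up for illustration. The sketch assumes only that an inferred date scales inversely with the per-year mutation rate, since the same amount of sequence divergence takes longer to accumulate at a slower rate.

```python
def per_year_rate(per_generation_rate, generation_years):
    """Convert a per-generation mutation rate into a per-year rate."""
    return per_generation_rate / generation_years


def rescale_date(date_years, assumed_rate, alternative_rate):
    """Rescale a date inferred under assumed_rate to alternative_rate.
    Assumption: time = divergence / rate, so dates scale as 1 / rate."""
    return date_years * assumed_rate / alternative_rate


# Rate used in the paper: 2.5e-8 per generation, 25 years per generation.
paper_rate = per_year_rate(2.5e-8, 25)

# Alternatives: a 29-year generation (mentioned in the paper's caveat),
# and a hypothetical per-generation rate half as large (illustration only).
longer_generation = per_year_rate(2.5e-8, 29)
halved_rate = per_year_rate(1.25e-8, 25)

for date in (20_000, 40_000):
    print(date,
          round(rescale_date(date, paper_rate, longer_generation)),
          round(rescale_date(date, paper_rate, halved_rate)))
```

Under these assumptions a longer generation interval alone moves the dates back only modestly, while a substantially lower per-generation rate moves them back in direct proportion.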
My own position as to the details of mutation rates and their implications for modern human origins is inchoate. But, let’s assume that we push back the last common ancestry estimates by a factor of 1.5-2. This may explain Eurasian-African admixture easily, if we presume that the ancestral proto-Eurasians were a liminal African population, which was in position to interbreed with both Africans proper and Neandertals because of their geographically equidistant position. Of course one thing that jumps out at me is that many of these arguments would be resolved if we sequenced a full-blooded Australian Aborigine. If this population is descended from the humans who arrived 40-50,000 years ago, then we can test whether the African admixture occurred 20-40,000 years before the present. If the Aborigine shows signs of admixture, then we have to be open to moving the time back to a period when the Aborigines were resident on the Eurasian mainland. Another possibility of course is that the Aborigines we deem indigenous today are actually late arrivals, on the order of ~20-40,000 years, replacing the original humans who arrived ~40-50,000 years ago.
I suspect many of these questions will be answered with larger data sets in the near future. The utility of methods such as the one above will increase once we fine tune some of the parameters. Interesting times.
Citation: Heng Li & Richard Durbin (2011). Inference of human population history from individual whole-genome sequences. Nature. DOI: 10.1038/nature10231