The latest edition of The American Journal of Human Genetics has two papers using “old fashioned” uniparental markers to trace human migration out of Africa and Siberia respectively. I say old fashioned because the peak novelty of these techniques was around 10 years ago, before dense autosomal SNP marker analyses, let alone whole genome sequencing. But mtDNA, passed down the maternal line, and Y chromosomes, passed from father to son, are still useful. Prosaically, they’re useful because after nearly 20 years of surveying populations the data sets for these markers are now very large. More technically, because these two regions of the genome do not recombine, they lend themselves to excellent representation as a tree phylogeny. Finally, mtDNA is particularly amenable to estimates via molecular clock methodologies (it has a region with a higher mutational rate, so you can sample a larger range of variation over a given number of base pairs; for Y chromosomes you can use STRs, which mutate rapidly, but there seems to be a lot of controversy in dating).
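To make the molecular clock logic concrete, here is a minimal sketch, not the method used in either paper. Two non-recombining lineages accumulate mutations independently along both branches, so the time to their most recent common ancestor is roughly T = d / (2μ), where d is the per-site divergence and μ the per-site, per-year mutation rate. The inputs below, including the rate, are hypothetical placeholders rather than published calibrations.

```python
def tmrca_years(differences: int, sites: int, rate_per_site_per_year: float) -> float:
    """Estimate years since two non-recombining lineages shared an ancestor.

    Mutations accumulate independently on both branches, so the expected
    per-site divergence d equals 2 * rate * T; solving gives T = d / (2 * rate).
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    divergence = differences / sites
    return divergence / (2 * rate_per_site_per_year)


# Hypothetical inputs: 10 differences across a 1,000 bp fast-evolving region,
# with an ASSUMED rate of 1e-7 substitutions per site per year.
estimate = tmrca_years(10, 1_000, 1e-7)
print(f"estimated TMRCA: {estimate:,.0f} years")
```

Doubling the assumed rate halves the date, which is one reason calibration is so contested. A faster-mutating region also yields more differences per base pair for the same time depth, hence finer resolution.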
The papers are The Arabian Cradle: Mitochondrial Relicts of the First Steps along the Southern Route out of Africa and Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians. Dienekes has already commented on the first paper. I am not going to take a detailed position on either, but I have to add that we need to be very careful about extrapolating from maternal or paternal lineages, and about assuming that population turnover is low enough that we can make phylogeographic inferences about the past from the present. For example, if you look at mtDNA, South Asians as a whole cluster strongly with East Asians and not Europeans, while if you look at Y chromosomes you see the reverse. The whole genome gives a more mixed picture. Additionally, ancient DNA analyses in Northern Eurasia are showing strong discontinuities between past and present populations. So when two lineages found in two different regions coalesce back to a last common ancestor, that may actually reflect diversity within a more recent common source population, one which underwent demographic expansion and replaced other groups.
If you need the papers, email me. Some of you know the alphabet soup of haplogroups better than I do. Below are two figures which I think give the top line results.
The BBC has a news report up gathering reactions to a new PLoS ONE paper, The Later Stone Age Calvaria from Iwo Eleru, Nigeria: Morphology and Chronology. The paper reports on remains found in Nigeria, dating to ~13,000 years B.P., which exhibit a very archaic morphology. In other words, they may not be anatomically modern humans. A few years ago this would have been laughed out of the room, but science moves. Here is Chris Stringer in the BBC piece:
“[The skull] has got a much more primitive appearance, even though it is only 13,000 years old,” said Chris Stringer, from London’s Natural History Museum, who was part of the team of researchers.
“This suggests that human evolution in Africa was more complex… the transition to modern humans was not a straight transition and then a cut off.”
Prof Stringer thinks that ancient humans did not die away once they had given rise to modern humans.
They may have continued to live alongside their descendants in Africa, perhaps exchanging genes with them, until more recently than had been thought.
The Pith: The human X chromosome is subject to more pressure from natural selection, resulting in less genetic diversity. But the differences in X chromosome diversity across human populations seem to be more a function of population history than of differences in the power of natural selection across those populations.
In the past few years researchers have found that the human X chromosome exhibits less genetic diversity than the non-sex regions of the genome, the autosome. Why? On the face of it this might seem inexplicable, but a few basic structural factors derived from the architecture of the human genome present themselves.
First, in males the X chromosome is hemizygous, rendering it more exposed to selection. This is rather straightforward once you move beyond the jargon. Human males have only one copy of the genes located on the X chromosome, because they have only one X chromosome. In contrast, females have two X chromosomes. This is why X-linked recessive traits in humans show up disproportionately in males. For genes on the X chromosome women can be carriers of many diseases, because they have two copies of a gene and one copy may be functional. In contrast, a male has either a functional or a nonfunctional version of the gene, because he has only one copy on his X chromosome. This is different from the case on the autosome, where both males and females have two copies of every gene.
This structural difference matters for the selective dynamics operating on the X chromosome vs. the autosome. On the autosome recessive traits pay far less of a fitness cost than they do on the X chromosome, because on the latter they’re much more often exposed to natural selection via males. In the rest of the genome recessive traits only pay the cost of their shortcomings when they’re present as two copies in an individual, that is, in homozygotes. A simple quasi-formal example illustrates the process.
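One way to make this concrete, offered as a sketch under simplifying assumptions rather than as the author’s own example: assume random mating, Hardy-Weinberg proportions, a 1:1 sex ratio, and the same allele frequency q in males and females, then count what share of the copies of a recessive allele sit in individuals who actually express it, and are therefore visible to selection.

```python
def exposed_fraction_autosomal(q: float) -> float:
    """Share of copies of a recessive allele (frequency q) sitting in homozygotes.

    Under Hardy-Weinberg proportions homozygotes make up q**2 of individuals and
    carry two copies each, out of 2q copies per individual overall: q**2 / q = q.
    """
    return (2 * q**2) / (2 * q)


def exposed_fraction_x_linked(q: float) -> float:
    """Same quantity for an X-linked recessive allele (1:1 sex ratio assumed).

    Males carry 1/3 of all X chromosomes and express every allele they carry;
    females carry the other 2/3 and express a copy only in homozygotes (share q).
    """
    return (1 / 3) + (2 / 3) * q


for q in (0.1, 0.01, 0.001):
    auto = exposed_fraction_autosomal(q)
    x_linked = exposed_fraction_x_linked(q)
    print(f"q={q}: autosomal {auto:.4f}, X-linked {x_linked:.4f}, ratio {x_linked / auto:.0f}x")
```

For an allele at 1% frequency, about 34% of X-linked copies are exposed (every copy carried by a male, plus the rare female homozygotes) versus 1% of autosomal copies. The rarer the allele, the wider the gap.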
The Pith: We are now moving from the human genome project to the human genomes project. As more and more full genomes of various populations come online, new methods will arise to take advantage of the surfeit of data. In this paper the authors crunch through the genomes of half a dozen individuals to make sweeping inferences about the history of the human species over the past few hundred thousand years.
Since the integration of evolution and genetics in the early years of the 20th century there have been several revolutions in our ability to perceive the underlying variation which is both the raw material and the result of evolutionary genetics. The understanding that DNA was the concrete substrate of Mendelian genetics, and the rise to prominence of molecular genetic techniques in understanding evolution in the 1970s and 1980s, was one key transition. No longer were geneticists simply tracking the coat colors of mice or the visible mutations of fruit flies. In the 1990s the uniparental loci, the maternal and paternal lineages as inferred from mtDNA and Y chromosomes, came into their own. Finally, the 2000s saw the post-genomic era, when researchers routinely began analyzing data sets of hundreds of thousands of single nucleotide polymorphisms (SNPs), genetic variants, in hundreds of individuals.
In this decade some of the promise of the Human Genome Project will finally ripen, in that whole genomes are going to be used more and more in analyses. This is exciting, but there are some obvious issues. The human genome has ~3 billion base pairs, vs. the 1 million or fewer SNPs you might manipulate per individual in SNP-based data sets. For some purposes a whole genome is overkill. You don’t need a full genomic sequence to ascertain your identity as a member of a particular geographic race. Not only can visual inspection usually reassure you as to your background, but depending on the granularity you want, a random set on the order of ~10,000 SNPs should suffice, or as few as 25 ancestrally informative markers! But if you want to ascertain mutation rates within families with precision and confidence, you do need the full genome.
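A back-of-the-envelope sketch of why family-based mutation rate estimates need whole genomes. The rate below is an assumption chosen only for its order of magnitude (roughly one new mutation per hundred million base pairs per generation), not a figure from the paper.

```python
# ASSUMED per-base-pair, per-generation mutation rate; order of magnitude only.
ASSUMED_RATE = 1.2e-8


def expected_de_novo(haploid_sites: float, rate: float = ASSUMED_RATE) -> float:
    """Expected new mutations per child across the given number of sites.

    Each child inherits one freshly copied genome from each parent, so the
    per-site rate applies twice.
    """
    return 2 * haploid_sites * rate


whole_genome = expected_de_novo(3e9)  # ~3 billion base pairs
snp_panel = expected_de_novo(1e6)     # ~1 million assayed positions
print(f"whole genome:  {whole_genome:.1f} expected de novo mutations per child")
print(f"1M-site panel: {snp_panel:.3f} expected de novo mutations per child")
```

At the assumed rate the whole genome yields dozens of new mutations per child, while even a randomly placed million-site panel yields a few hundredths of one. Real arrays fare worse still, since they assay positions already known to be polymorphic.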
A few months ago I exchanged some emails with Milford H. Wolpoff and Chris Stringer. These are the two figures who have loomed large in paleoanthropology and the debate over the origins of modern humans for a generation, and they were keen on making sure that their perspectives were represented accurately in the media. To that end they sent me some documents which lay out their perspectives in their own words, away from the public glare (as in, they’re academic publications).
Last summer I made a thoughtless and silly error in relation to a model of human population history when asked by a reader the question: “which population is most distantly related to Africans?” I contended that all non-African populations are equally distant. This is obviously wrong on the face of it if you look at any genetic distance measures. West Eurasians, even those without recent Sub-Saharan African admixture (e.g., North Europeans), are closer to Africans than East Eurasians are, and East Eurasians are often closer than Oceanians and Amerindians. One explanation I offered is that these latter groups were subject to greater genetic drift through a series of population bottlenecks. In this framework the number of generations back to the last common ancestor with Sub-Saharan Africans should be about the same for all groups outside of Africa, but due to evolutionary factors such as more extreme genetic drift or different selective pressures some non-African groups had diverged more from Africans than others in terms of their genetic state. In other words, the most genetically divergent groups in relation to Africans did not diverge any earlier, but simply diverged more rapidly.
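The serial bottleneck idea can be illustrated with the standard expectation for drift in an idealized (Wright-Fisher) population: each generation a fraction 1/(2Ne) of heterozygosity is lost, and drift accumulated along a lineage feeds into distance measures such as FST. In the sketch below two hypothetical lineages split from Africans at the same time, but one passes through bottlenecks; every number is an illustrative assumption, not an estimate for any real population.

```python
def accumulated_drift(ne_by_generation):
    """Wright's F accumulated along a lineage with a given Ne in each generation.

    Each generation retains a fraction (1 - 1/(2*Ne)) of the previous
    heterozygosity; F is the share lost overall.
    """
    retained = 1.0
    for ne in ne_by_generation:
        retained *= 1 - 1 / (2 * ne)
    return 1 - retained


GENERATIONS = 2_000  # hypothetical time since both lineages split from Africans

# Hypothetical lineage that stays large throughout.
steady = [10_000] * GENERATIONS
# Hypothetical lineage with three 50-generation bottlenecks of Ne = 500
# during serial founding events, otherwise as large as the steady lineage.
bottlenecked = ([500] * 50 + [10_000] * 450) * 3 + [10_000] * 500

assert len(bottlenecked) == GENERATIONS  # same split time for both lineages
print(f"steady lineage:       F = {accumulated_drift(steady):.3f}")
print(f"bottlenecked lineage: F = {accumulated_drift(bottlenecked):.3f}")
```

With identical split times, the bottlenecked lineage accumulates more than twice as much drift, which is the sense in which a group could be more divergent without having diverged any earlier. The sketch only shows how the hypothesis works, not whether it is right.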
Dienekes Pontikos disagreed with such a simple explanation. He argued that admixture or gene flow between Africans and non-African groups since the last common ancestor could explain the differences. I am now of the opinion that Dienekes may have been right. My own confidence in the “serial bottleneck” hypothesis as the primary explanation for the relationships on the phylogenetic tree of human populations is shaky at best. Why did I make these errors of inference?
There were two major issues at work in my misjudgments of the arc of the past and the topology of the present. In the latter instance I saw plenty of phylogenetic trees which clearly illustrated the variation in genetic distance from Africans for various non-African groups. Why didn’t I internalize those visual representations? It was, I think, the power of the “Out of Africa” (OoA) with replacement paradigm. Even by the summer of 2010 I had come to reject it in its strong form, due to the evidence of admixture with Neanderthals, and rumors of other events which were borne out with the publication of the Denisovan results. But to a first approximation the clean and simple OoA still loomed so large in my mind that I made the incorrect inference, whereby all non-Africans are viewed simply as a branch of Africans without any particular differentiation in relation to their ancestral population. Second, I was still influenced by the idea that most of the genetic variation you see in the world around us has its roots tens of thousands of years ago. By this, I mean that the phylogeographic patterns of 25,000 years ago would map well onto the phylogeographic patterns of the present. This assumption drove a lot of phylogeography in the early aughts, because it allowed the chain of causation to be reversed: inferences about the past were made from patterns of the present. My confidence in this model had already been perturbed when I made my errors, but I believe it still held some sort of implicit sway in my head. It is one thing to move on from old models explicitly, but another thing to remove the furniture from your cognitive basement and attic.
I have moved further from my preconceptions since then. It took a while to sink in, but I’m getting there. A cognitive “paradigm shift,” if you will. In particular I am more open to the idea of substantive back migration to Africa, as well as secondary migrations out of Africa. A new paper in Genome Research is out which adds some interesting details to this bigger discussion, and seems to weigh in further against my tentative hypothesis that serial bottlenecks and genetic drift can explain variation in distance to Africans of various non-African groups. Human population dispersal “Out of Africa” estimated from linkage disequilibrium and allele frequencies of SNPs:
The Pith: I review a recent paper which argues for a southern African origin of modern humanity. I argue that the statistical inference shouldn’t be trusted as the final word. This paper reinforces previously known facts, but does not add much that is both novel and robust.
I have now read the paper which I expressed a touch of skepticism toward yesterday. Do note, I did not dispute the validity of their results. They seem eminently plausible. I was simply skeptical that we could, with any level of robustness, claim that anatomically modern humans arose in southern vs. eastern, or western, Africa. If I had to bet, my rank order would be southern ~ eastern > western. But my confidence in my assessment is very low.
First things first. You should read the whole paper, since someone paid for it to be open access. Second, much props to whoever decided to put their original SNP data online. I’ve already pulled it down, and sent off emails to Zack, David, and Dienekes. There are some northern African populations which allow us to expand beyond the Mozabites, though unfortunately there are only 55,000 SNPs in that case (I haven’t merged the data, so I don’t know how much will remain after combining with the HapMap or HGDP data sets).
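Finding out how much survives a merge is mostly set arithmetic on marker identifiers, plus a check that both panels report the same alleles. Below is a minimal sketch with made-up marker names and alleles; dedicated tools such as PLINK handle many more cases than this.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands, so flips are undetectable."""
    first, second = alleles
    return COMPLEMENT[first] == second


def merge_panels(panel_a, panel_b):
    """Keep markers present in both panels whose alleles agree, allowing strand flips.

    Each panel maps a marker identifier to a pair of alleles.
    """
    merged = {}
    for marker in sorted(panel_a.keys() & panel_b.keys()):
        alleles_a, alleles_b = panel_a[marker], panel_b[marker]
        if strand_ambiguous(alleles_a) or strand_ambiguous(alleles_b):
            continue  # cannot tell a strand flip from a true match
        flipped_b = {COMPLEMENT[allele] for allele in alleles_b}
        if set(alleles_a) in (set(alleles_b), flipped_b):
            merged[marker] = alleles_a
    return merged


# Hypothetical panels: a small regional panel and a larger reference panel.
small_panel = {"snp1": ("A", "G"), "snp2": ("C", "T"), "snp3": ("A", "T"), "snp4": ("G", "T")}
reference = {"snp1": ("A", "G"), "snp2": ("G", "A"), "snp3": ("A", "T"), "snp5": ("C", "A")}

merged = merge_panels(small_panel, reference)
print(f"{len(merged)} of {len(small_panel)} markers survive:", sorted(merged))
```

Here snp2 survives as a strand flip, snp3 is dropped as strand-ambiguous, and snp4 is absent from the reference panel.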
The new picture most resembles so-called assimilation models, which got relatively little attention over the years. “This means so much,” says Fred Smith of Illinois State University in Normal, who proposed such a model. “I just thought ‘Hallelujah! No matter what anybody else says, I was as close to correct as anybody.’ ”
But the genomic data don’t prove the classic multiregionalism model correct either. They suggest only a small amount of interbreeding, presumably at the margins where invading moderns met archaic groups that were the worldwide descendants of H. erectus, the human ancestor that left Africa 1.8 million years ago. “I have lately taken to talking about the best model as replacement with hybridization, … [or] ‘leaky replacement,’ ” says paleogeneticist Svante Pääbo of the Max Planck Institute for Evolutionary Anthropology in Leipzig, lead author of the two nuclear genome studies.
Here’s the infographic that went along with it:
My post The paradigm is dead, long live the paradigm! expressed to some extent my befuddlement at the current state of human evolutionary genetics and paleoanthropology. After the review of the paper on possible elevated admixture with Neandertals at the dystrophin locus, a friend emailed, “Remember when we thought everything would be so simple once we could finally see this stuff?” Indeed I do remember. The fact that things aren’t simple is very exhilarating, but it is also a major blow to theoretical clarity. Science is, after all, not a collection of facts, though it is in part facts sieved through an analytic framework.
In hindsight, with the relative robustness of ancient DNA results, we can make some assessments about the role of human bias within particular heuristic frameworks over the past generation. From the mid-1980s up until 2000 it was victory after victory for the Out-of-Africa with total replacement model. The rise of mtDNA and Y chromosomal lineage studies seemed to buttress the idea of common descent from neo-Africans within the last 100-200,000 years for all human populations. There wasn’t much of a perturbation in this march toward paradigm ascendancy in the aughts, except that there was now also a trickle of papers which claimed to find phylogenetic “long branches” in the human genome. The 2006 Evans et al. paper, Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage, was probably the one that made the biggest media splash. But these were inferences. Subsequent analysis of the draft Neandertal genome seems to suggest that in fact the microcephalin allele in question did not introgress.
Case closed? Obviously not. Now we’re in a different era. The Evans et al. paper may have been wrong in the specifics, but its general framework seems likely to have been validated: there are genetic lineages in the modern human genome which are not derived from the neo-Africans. But let us remember that the overwhelming majority of the human genome is neo-African. A reasonable interval for non-Africans is 90-99% neo-African. Still, a non-trivial minority has introgressed or admixed from other lineages. Out-of-Africa is mostly correct, but in some ways so is Multiregionalism. But how do we describe this? “Weighted multiregionalism”? “Mostly Out-of-Africa”? The old terms were nice because they were punchy and precise. If you look at Multiregionalism or Out-of-Africa in Wikipedia the newest results are noted, but it doesn’t seem that they’ve been integrated into the analytic narrative. Yet.
Everyone who is literate knows that the Sahara desert is the largest of its kind in the world. The chasm it creates in cultural, biological, and physical geography is very noticeable. Northern Africa is part of the Palearctic zone, and the peoples north of the Sahara have long been part of the circum-Mediterranean population continuum. The primary continuous habitable corridor is the Nile valley. And yet scholars have long known that the climatic regime of the Sahara has varied. The pharaohs of ancient Egypt seem to have hunted a wider range of fauna than is to be found in the deserts surrounding the current Nile valley, perhaps relics from a more humid period. Rock art in some regions of the desert depicts aquatic life, and species more characteristic of the savanna. And yet we should not think of the Sahara as a recent phenomenon; it does seem to be geologically ancient, despite periodic humid interregnums.
A new paper in PNAS attempts to map the hydrography of the Sahara over the Holocene, as well as back to the Pleistocene. The ultimate aim seems to be to better frame the geographic constraints on the expansion of humanity from its African homeland, and refute a simple projection from the present to the past. In this case, it is the existence of the Nile as a verdant and habitable watercourse which connects the north and south, and bisects the continuous desert. Ancient watercourses and biogeography of the Sahara explain the peopling of the desert:
Dilettante human genetics blogger Dienekes Pontikos has a post up with a somewhat oblique title, Is multi-regional evolution dead? I say oblique because a straightforward title would be “Multi-regionalism lives!” He posted a chart from a 2008 paper which outlines various models of human origins, and their relationship to molecular data at the time. I have also posted the chart, but with a little creative editing on the “assimilation” scenario to reflect the possible Neandertal and Denisovan admixture events. Of these models the “candelabra” can be rejected as highly implausible. It posits very deep roots in a given region for distinct human populations. Unless you accept some sort of hominin population structure in Africa which was maintained by distinctive migrations out of Africa, the “replacement” model can be discarded (since the classic replacement model did not posit that ancient African population structure was of any relevance outside of Africa, you’d have to salvage it with a modification in light of new results).
So the two primary disputants are a resurrected multi-regional model, and the assimilation model. But these two are really endpoints on a spectrum of models. What you need to do is vary the number of discrete populations and the rate of migration between the populations over time. The beauty of the replacement model was its parsimony: as far as recent human origins were concerned past gene flow via migration was a relatively academic concern. It was an exceedingly simple narrative framework. Consider this first episode of a 2009 British documentary, The Incredible Human Journey:
Quick review. In the 19th century once the idea that humans were derived from non-human ancestral species was injected into the bloodstream of the intellectual classes there was an immediate debate as to the location of the proto-human homeland; the Urheimat of us all. Charles Darwin favored Africa, but in many ways this ran against the cultural grain. The theory of evolution was birthed before the highest tide of the age of white supremacy and European hegemony, and Darwin’s model had to swim against the conviction that Africans were the most primitive of the colored races. After the waning of the ideological edifice of white supremacy, and the shock it received during and after World War II, the debates as to the origin of humanity still remained contentious and followed the same outlines (though without the charged normative inferences). But as the decades wore on many more researchers began to believe that Darwin was correct, and that the origin of humanity lay in the African continent. First, the deep origin of the human lineage in Africa was accepted, but eventually a more recent expansion out of Africa was argued for by one school. The turning point in these academic disputes was the popularization of the “mitochondrial Eve” theory of the 1980s.
What some paleontologists had long argued, that anatomically modern humans have their locus of origin in Africa, was supported now by research from genetics which indicated that Africans were the most basal clade of humans on a continental scale, so that non-Africans could be conceived of as a subset of Africans. From this originates the chestnut of wisdom that Africans have more genetic diversity than all other human populations combined. By the year 2000 one could say that the “Out of Africa” triumphalism had proceeded to the point where an almost exterminationist model had taken hold when it came to the relationships of anatomically modern H. sapiens, and other groups which had evolved outside of Africa over the past million or so years, such as the Neandertals.
But the theoretical dichotomies turned out to be too coarse and absolute. A division between multiregionalist phyletic gradualism, where H. sapiens evolved out of its hominin ancestors concurrently on a worldwide scale, and a model of rapid expansion of one tribe in Africa to replace all others in totality, may have been warranted in the age of classical genetics and morphometric analysis, but now we can look at the raw genomic material in a more fine-grained fashion. In fact, we can now look at the genomic patterns of variation among extinct hominins! There have long been hints from the genetic and morphological data that the expansion-and-replacement paradigm was too extreme. But with the publication last spring in Science of a paper which claimed admixture between Neandertals and non-Africans in the range of 1-4% in all non-African groups, based on a comparison of Neandertal and modern human genetic variation, one can no longer treat absolutist expansion-and-replacement as self-evidently true orthodoxy. Yet one orthodoxy has not given way to another, and the shock to the old models presented by the data has not resulted in the coalescence of new robust paradigms. We live in a time of scientific troubles, so to speak.
Although I’ve cautioned against taking PCA plots too literally as Truth, unvarnished and without any interpretive juice needed, papers which rely on them are almost magnetically attractive to me. They transform complex patterns of variation which you can’t perceive via your gestalt psychology into a two- or at most three-dimensional representation which you can grok immediately. That is why The History and Geography of Human Genes was so engrossing. You recognize patterns which were otherwise unrecognizable. But how you interpret those patterns, that’s a wholly different matter. And how those patterns arise is also not something one can ignore.
First, let’s start with an easy case. To the left is a PCA plot with four populations: Nigerians, East Asians (Chinese + Japanese), Europeans (whites from Utah), and finally, African Americans. The x-axis is the first principal component of variation, and the y-axis the second. That means that the x-axis is the independent dimension within the patterns of genetic data which explains the largest fraction of the total genetic variation. The totality of the variation can be decomposed into a large set of independent dimensions which can be rank ordered by number, from the largest explanatory components to the smaller ones. In a human genetic context the first principal component invariably separates Africans from non-Africans, and the second principal component often maps onto a west-east axis from Europe to the New World. Subsequent principal components can often be useful in smoking out fine scale distinctions, or relationships which are confused by the existence of similar but different signals in admixed populations.
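For readers who want to see the mechanics, here is a bare-bones PCA on simulated genotypes using NumPy’s SVD. The three “populations” and their allele frequencies are invented, and published analyses typically also normalize each SNP and prune linked markers before this step. The point is only that the components come out rank-ordered by the share of variance they explain.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def simulate_genotypes(freqs: np.ndarray, n: int) -> np.ndarray:
    """Draw n diploid genotypes (0, 1 or 2 copies of the allele) at every SNP."""
    return rng.binomial(2, freqs, size=(n, freqs.size))


n_snps, n_per_pop = 2_000, 50
# Invented allele frequencies: a shared baseline plus population-specific drift.
baseline = rng.uniform(0.05, 0.95, n_snps)
pop_freqs = {
    name: np.clip(baseline + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)
    for name in ("pop_A", "pop_B", "pop_C")
}
genotypes = np.vstack([simulate_genotypes(f, n_per_pop) for f in pop_freqs.values()]).astype(float)
labels = np.repeat(list(pop_freqs), n_per_pop)

# Center each SNP, then decompose. Singular values come back sorted, so the
# components are already rank-ordered by how much variance they explain.
centered = genotypes - genotypes.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S  # each row: one individual's coordinates on the components
explained = S**2 / np.sum(S**2)

print("share of variance, PC1-PC3:", np.round(explained[:3], 3))
for name in pop_freqs:
    pc1, pc2 = scores[labels == name, :2].mean(axis=0)
    print(f"{name}: mean PC1 {pc1:+.1f}, mean PC2 {pc2:+.1f}")
```

Plotting the first two columns of `scores` would give the kind of scatter the paragraph describes.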
The interpretation of the plot to the left is rather easy. You see that African Americans lie along a continuum between Nigerians and Europeans, skewed toward Nigerians, with some outliers toward East Asians. We know from other genetic findings that ~20% of African American ancestry is European, but that share is not equally distributed across the population. ~10% of the African American population is more than 50% European in ancestry, while 90% is less than 50% European. And so you have a distribution which reflects this variation. As for the outliers, I will speculate and suggest that these are indications of Native American ancestry among some African Americans.
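The ~20% figure is a population average. One simple way such an average can be estimated, sketched here as an illustration rather than as the method behind that figure, treats the admixed group’s allele frequency at each SNP as a weighted mix of the two source frequencies and solves for the weight by least squares. All frequencies below are simulated, and the true proportion of 0.2 is a hypothetical chosen to echo the text.

```python
import random


def estimate_admixture(p_admixed, p_source1, p_source2):
    """Least-squares estimate of the share of ancestry from source 1.

    Model: p_admixed = alpha * p_source1 + (1 - alpha) * p_source2 at every SNP,
    i.e. (p_admixed - p_source2) = alpha * (p_source1 - p_source2).
    """
    num = sum((pa - p2) * (p1 - p2) for pa, p1, p2 in zip(p_admixed, p_source1, p_source2))
    den = sum((p1 - p2) ** 2 for p1, p2 in zip(p_source1, p_source2))
    return num / den


random.seed(1)
n_snps = 5_000
# Simulated source frequencies (hypothetical, not real population data).
p_src1 = [random.uniform(0.05, 0.95) for _ in range(n_snps)]
p_src2 = [random.uniform(0.05, 0.95) for _ in range(n_snps)]
true_alpha = 0.2
# Admixed frequencies plus a little sampling noise.
p_mix = [true_alpha * a + (1 - true_alpha) * b + random.gauss(0, 0.02) for a, b in zip(p_src1, p_src2)]

estimate = estimate_admixture(p_mix, p_src1, p_src2)
print(f"estimated share from source 1: {estimate:.3f}")
```

This recovers only the average; the spread of individual ancestry described in the paragraph calls for individual-level methods such as STRUCTURE or ADMIXTURE.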
The story I presented above is plausible as an explanation of the visual because we have a wealth of historical data to corroborate that narrative. The fit between the results of genetic analysis and what scholars have long inferred from textual sources is relatively easy to establish. It is far more difficult to look at a PCA plot and, with little external support, generate a plausible narrative that you yourself accept with a high degree of confidence. It is with that caveat in mind that I present Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping:
I assume by now that everyone has read A Draft Sequence of the Neandertal Genome. It’s free to all, so you should. At least look at the figures. Also, if you haven’t at least skimmed the supplement, you should do that as well. It’s nearly 200 pages, and basically feels more like a collection of minimally edited papers than anything else. There’s no point in me reviewing the paper, since you can read it, and plenty of others have hit the relevant ground already.
Since there seem to be three main segments of the paper, here are a few minimal thoughts on each.
Alan Templeton, whose text Population Genetics and Microevolutionary Theory is right below Hartl & Clark in my book, recently published a strongly worded paper, Coherent and incoherent inference in phylogeography and human evolution. The possibility of statistical errors in published work is not shocking; I have heard that when statisticians are asked to sort through papers in medical genetics journals, they find elementary errors in ~3/4 of those which have made it past peer review. That being said, Templeton seems to be making a stronger case than a simple refutation of basic errors; in particular he is suggesting that the “ABC” method which lay at the heart of the paper I reviewed last week is incoherent at the root. Here’s Templeton’s abstract:
A hypothesis is nested within a more general hypothesis when it is a special case of the more general hypothesis. Composite hypotheses consist of more than one component, and in many cases different composite hypotheses can share some but not all of these components and hence are overlapping. In statistics, coherent measures of fit of nested and overlapping composite hypotheses are technically those measures that are consistent with the constraints of formal logic. For example, the probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is said to be incoherent. An example of incoherence is shown in human evolution, for which the approximate Bayesian computation (ABC) method assigned a probability to a model of human evolution that was a thousand-fold larger than a more general model within which the first model was fully nested. Possible causes of this incoherence are identified, and corrections and restrictions are suggested to make ABC and similar methods coherent. Another coalescent-based method, nested clade phylogeographic analysis, is coherent and also allows the testing of individual components of composite hypotheses, another attribute lacking in ABC and other coalescent-simulation approaches. Incoherence is a highly undesirable property because it means that the inference is mathematically incorrect and formally illogical, and the published incoherent inferences on human evolution that favor the out-of-Africa replacement hypothesis have no statistical or logical validity.
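The coherence requirement in the abstract is simple enough to state as a check: whenever one hypothesis is nested inside another, the probability assigned to it must not exceed the probability of the more general one. A minimal sketch; the model names and probabilities are hypothetical and are not the values from either paper.

```python
def incoherent_pairs(probability, nested_in):
    """List (special, general) pairs that violate P(special) <= P(general).

    probability: hypothesis name -> probability assigned to it.
    nested_in:   hypothesis name -> the more general hypothesis that contains it.
    """
    return [
        (special, general)
        for special, general in nested_in.items()
        if probability[special] > probability[general]
    ]


# Hypothetical assignments mirroring the kind of violation the abstract describes:
# the special case scores a thousand-fold higher than the model that contains it.
assigned = {"special_case": 0.5, "general_model": 0.0005}
nesting = {"special_case": "general_model"}
print(incoherent_pairs(assigned, nesting))
```

The check enforces only the logical constraint. Whether ABC model-choice outputs should be read as probabilities of nested hypotheses in this way is the point at issue in the exchange discussed next.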
The method which Templeton favors is naturally one which he has pushed in the past. In any case, I don’t know the statistical details well enough to comment with much knowledge, but I see that a statistician has responded to Templeton already, so I would recommend checking that out. I immediately went looking for responses because the paper uses really strong and dismissive language, and I am somewhat wary of that sort of thing when it is used to tear down the fundamentals of a whole field of research (I want to emphasize that overall I enjoy Templeton’s work, but the paper reminded me a bit too much of Jerry Fodor). His citation of Popper in particular seems an appeal to authority aimed at convincing the non-statisticians in the audience, and I don’t see the point of that besides rhetorical utility. I do tend to accept, to some degree, Templeton’s critique of models which assume very little gene flow between hominin populations before the Out-of-Africa migration, though from what I can tell Africa south of the Sahara does seem to have received relatively little back-migration over the past 50,000 years, so perhaps this is an older dynamic as well. I am cautiously optimistic that DNA extracted from the fossils themselves may put to bed some of these arguments over the dance of parameters, though naturally interpretation is always an issue outside of pure mathematics.
For what it’s worth, here’s the model which Templeton’s method favors: