The geography of genes tells us only so much about history

By Razib Khan | August 24, 2011 11:29 pm

L. L. Cavalli-Sforza’s The History and Geography of Human Genes is a book I reference a great deal. Cavalli-Sforza is the godfather of the field of historical population genetics, the phylogeography of humankind. Though his work was on classical autosomal markers, the huge literature which drew inferences from Y chromosomal and mtDNA variation followed in the wake of The History and Geography of Human Genes. Spencer Wells, the director of the Genographic Project, alluded to Cavalli-Sforza’s influence in The Journey of Man. But at this point I think we have to be very careful about making inferences about the past from present patterns of genetic variation. This is made most stark by the fact that ancient DNA, which is a snapshot of the past as opposed to an inference of it, sometimes diverges in surprising ways from our expectations based on present patterns of variation.

This to me is the big lesson to draw from a new paper in The Proceedings of the Royal Society B, The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. The results focus on two issues. First, the distribution of Y chromosomal lineages in Europe, in particular R-M269. And second, the time to the last common ancestor of branches of the overall lineage. Patterns of distribution and variation of a lineage are informative, insofar as regions with higher variation are presumed to be the core zone from which the lineage expanded. This is the logic which underpins the conclusion that Africa is the locus of modern humanity; Africa has more genetic diversity than other continents. The time until the last common ancestor between two given lineages is contingent upon a “molecular clock” model, which converts accumulated genetic differences into elapsed time by assuming a mutation rate. From what I have heard and read, this is a very dicey proposition for Y chromosomal variation, and this paper confirms the erratic nature of these estimates.
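To make the molecular clock point concrete, here is a minimal sketch of a variance-based age estimate for Y-STR haplotypes. It is not the method used in the paper. It assumes a simple single-step mutation model in which the variance in repeat count grows linearly with time, and the function names, the toy haplotypes, the 25-year generation time, and both mutation rates are hypothetical values chosen only for illustration.

```python
from statistics import pvariance


def mean_str_variance(haplotypes):
    """Average per-locus variance in repeat count across the sampled haplotypes."""
    loci = list(zip(*haplotypes))  # regroup repeat counts locus by locus
    return sum(pvariance(locus) for locus in loci) / len(loci)


def tmrca_years(haplotypes, mutation_rate, generation_years=25):
    """Rough age estimate: variance / mutation rate gives generations.

    Assumes a single-step mutation model with one shared per-locus rate,
    which is a strong simplification.
    """
    generations = mean_str_variance(haplotypes) / mutation_rate
    return generations * generation_years


# Hypothetical toy sample: four men typed at two STR loci.
sample = [(12, 14), (13, 14), (12, 15), (13, 15)]

# Two illustrative mutation rates (per locus per generation), not real estimates.
for rate in (0.002, 0.0005):
    print(rate, tmrca_years(sample, rate))
```

The same sample yields an age four times older under the slower rate than under the faster one, so the estimate is only as good as the assumed rate and the choice of markers.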

Since the paper is free, I suggest you go read it. The major takeaway seems to be that the representativeness of a sample matters a lot, and never trust estimates of coalescence between two lineages. The statistical associations between geography and R-M269 diversity found by earlier researchers disappeared when the database was expanded and the markers typed more thoroughly. Maju and Dienekes have a lot to say on this paper in the broader context. I’m not too interested in arguing in detail about the results and what they mean; I am of the opinion that ancient DNA is going to be the ultimate arbiter. But I do believe that a lot of our models are way too simple, which is one of the reasons why inferences are so often faulty. I wouldn’t be surprised, for example, if R-M269 is a male lineage which expanded rapidly very recently in Western Europe after the Neolithic, but whose point of origin has now come to be dominated by other lineages, obscuring the patterns of the past.

  • Paul Ó Duḃṫaiġ

    I do hope that advances in extracting viable Y-chromosome aDNA (ancient DNA) over the next 10 years will give us a lot clearer picture. Jean Manco has a list of ancient DNA recovered in Europe here:

    So far most of the aDNA extracted from Neolithic males appears to belong to Haplogroup G, with Haplogroup I being the other major component. The earliest R1a appears to be from the Bronze Age. However, without a lot more samples from across the continent it’s too early to derive any valid inferences yet.


  • Justin Loe

    This claim is made in the conclusion, and I quote so that I do not do disservice to it:
    “Age estimates based on sets of Y-STRs carefully selected to possess the attributes necessary for uncovering deep ancestry (for example, from the almost 200 recently characterized here [33]), and from whole Y chromosome sequence comparisons, will provide robust dates for this haplogroup in the future.”

    Many hobbyists have made similar arguments before, and have asserted that many papers by academics in this field based on 10 STR or 17 STR haplotypes made unsupported claims. Possibly, the need for headlines in the past has trumped more reasonable arguments (and better supported science). Instead, a reasonable assertion that the question is currently unanswered is the better answer, rather than making claims based on the desire for a headline snippet that later turns out to be unsupported by the data.

    Good for them: “For now, we can offer no date as to the age of R-M269 or R-S127”

  • ihateaphids

    Razib, when you say “never trust estimates of coalescence between two lineages” are you referring to the specific timing (in years or whatever) of coalescence?


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

