The first, second, and third nations

By Razib Khan | July 11, 2012 10:47 pm

By now you’ve probably read about the paper that reports that there seem to have been three waves of humans migrating into the New World prior to the arrival of Europeans. A major aspect of this result is that it does not emerge out of a vacuum, but rather comes close to settling an old question in linguistics. The late Joseph Greenberg generated a series of audacious phylogenies of the languages of the world. Greenberg’s attempts received mixed reviews. It seems that there is little controversy about some of his classifications of African languages, but linguists who specialize in Native American languages rejected his division of the languages of the New World into three broad families: Eskimo-Aleut, Na-Dene, and Amerind. Eskimo-Aleut is rather self-evident. Na-Dene encompasses a group of languages in northwest North America, along with some significant outliers such as Navajo. Amerind seems to be roughly a grab-bag of everything else. The linguistic trichotomy also lent itself to a narrative of three migrations. L. L. Cavalli-Sforza gave his support to Greenberg’s framework in The History and Geography of Human Genes, and it seems most non-linguists are particularly congenial toward his tendency of ‘lumping.’ In contrast, linguists remain more skeptical ‘splitters,’ at least those who have a more ethnographic disciplinary bent. Geneticists have not always supported Greenberg’s suppositions. For example, many of the members of the same group which authored this paper implicitly put the kibosh on the attempt to construct a unified linguistic family which spanned the Andaman Islanders and the Papuans.

The method of the paper was relatively straightforward, assuming you are already somewhat familiar with the statistical genetic esoterica which was unveiled a few years ago by this group and others. Basically, you take genetic data in the form of hundreds of thousands of SNPs, and you test the patterns of variation in that data across populations against explicit models of demographic history, represented visually by phylogenetic trees. You can see here that the sampling was relatively thick, except for the United States. Chalk this up to politics. I’ve been hearing about this particular problem in relation to this paper for over a year now. Not having asked any of the members of the group directly, I am obviously going off hearsay, but the lack of American samples is most definitely not a feature. It’s a bug. In the supplement they also note that they couldn’t get Na-Dene data from another research group. Almost certainly that’s because of bioethical issues and legal or contractual constraints.
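To make the idea of testing SNP data against a tree a bit more concrete, here is a minimal sketch in Python. It is not the paper's actual pipeline. It assumes one commonly used statistic of this general kind, the four-population f4 statistic, which averages (pA - pB)(pC - pD) over SNPs, where each p is a population's allele frequency at that SNP. If the tree ((A,B),(C,D)) is correct and there has been no admixture, that average should be close to zero. The function name `f4` and the four toy frequency lists are made up purely for illustration.

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Average of (pA - pB) * (pC - pD) over SNPs.

    Each argument is a list of allele frequencies, one entry per SNP,
    in the same SNP order for all four populations.
    """
    return mean(
        (a - b) * (c - d)
        for a, b, c, d in zip(p_a, p_b, p_c, p_d)
    )


# Toy allele frequencies at four SNPs (invented numbers, not real data).
# C and D are given identical frequencies, so any tree that pairs them
# against A and B yields a statistic of exactly zero.
pop_a = [0.10, 0.50, 0.80, 0.30]
pop_b = [0.20, 0.40, 0.90, 0.20]
pop_c = [0.60, 0.10, 0.30, 0.70]
pop_d = [0.60, 0.10, 0.30, 0.70]

# Consistent with the tree ((A,B),(C,D)): the statistic is zero.
consistent = f4(pop_a, pop_b, pop_c, pop_d)

# Pairing A with C and B with D instead gives a clearly non-zero value,
# which would count as evidence against the tree ((A,C),(B,D)).
inconsistent = f4(pop_a, pop_c, pop_b, pop_d)

print("f4(A,B;C,D) =", consistent)
print("f4(A,C;B,D) =", inconsistent)
```

With real data you would use hundreds of thousands of SNPs, and you would also need a standard error for the statistic before calling any deviation from zero meaningful. This sketch leaves that step out.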

Despite all this drama, the science isn’t too hard to understand. Aside from the nifty statistics, one problem is that many of these native groups have European and African admixture, but there are workarounds to that (e.g., just pull out the genomic segments which are indigenous, and use those). The outcome is neatly visualized in the figure below:

