Mark Liberman at Language Log has looked through the Science paper “Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa.” Overall he seems to think it is an interesting paper, but he has some pointed criticisms. What makes the post useful to me is that Liberman draws analogies to domains I can follow, such as genomics. My main problem with linguistic evolution is that I’m so ignorant I barely understand the features being discussed. I may know their dictionary.com definitions, but I have almost no deeper understanding against which to test the inferences. By analogy, imagine trying to evaluate a morphological cladistic model with no understanding of anatomy. Here’s the part that may be of particular interest to readers of this weblog:
However, this combination of coarse binning into ranges, for functionally-defined subsets of elements with radically different numbers of members, seems to me to be much more problematic for Atkinson’s purposes. It’s as if a human genomic survey made geographically localized counts of the number of alleles involved in color vision and in blood physiology, divided each set of counts into a few bins (“a little variation”, “a medium amount of variation”, “a lot of variation”), standardized the binned counts for each functional class separately, and averaged the results, thus giving as much weight to each color-vision variant as to several orders of magnitude more blood-physiology variants. This might be OK, but choosing to give this kind of boost to features that happen to be enriched in one region or another will obviously push the results around by a considerable amount
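To make the weighting problem in the quoted passage concrete, here is a minimal Python sketch of Liberman’s genomic analogy. The populations (A, B, C), the counts, the bin cutoffs, and the function names are all invented for illustration; this is not Atkinson’s actual procedure or data. Each functional class is binned into three ordinal levels, standardized separately, and the standardized values are then averaged into one composite score per population.

```python
from statistics import mean, pstdev


def to_bin(count, cutoffs):
    """Ordinal bin: 0 = 'a little', 1 = 'a medium amount', 2 = 'a lot' of variation."""
    return sum(count > c for c in cutoffs)


def standardize(values):
    """Z-score a list of values (population SD; assumes the values are not all equal)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_score(counts_by_class, cutoffs_by_class):
    """Bin each functional class, standardize it on its own, then average across classes."""
    per_class = [
        standardize([to_bin(c, cutoffs_by_class[name]) for c in counts])
        for name, counts in counts_by_class.items()
    ]
    # zip(*...) regroups the standardized values by population.
    return [mean(vals) for vals in zip(*per_class)]


# Hypothetical variant counts for populations A, B, C (invented for illustration).
counts = {
    "color_vision": [6, 2, 4],
    "blood_physiology": [900, 1100, 1000],
}
# Hypothetical cutoffs separating the three bins for each class.
cutoffs = {
    "color_vision": [3, 5],
    "blood_physiology": [950, 1050],
}

scores = composite_score(counts, cutoffs)
raw_totals = [sum(vals) for vals in zip(*counts.values())]
print(scores)      # A and B end up tied
print(raw_totals)  # [906, 1102, 1004]
```

Population A has four more color-vision variants than B but 200 fewer blood-physiology variants, yet the two receive identical composite scores. After per-class standardization, a one-bin shift in a class with a handful of members counts exactly as much as a one-bin shift in a class with hundreds.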
Even if you can’t evaluate the technique in its guts, it is easy to spot possible problems in how the data fed into the method are coded or categorized. I hope this becomes less and less of an issue in the near future, but it’s a problem I can understand easily without knowing much about the linguistic details. Also, Liberman’s last paragraph is funny. In the paper’s defense, though, I think we need to evaluate its plausibility in terms of the overall conditional probability: we often have strong prior models of the origin and expansion of modern humanity, and so we attach a specific significance to this result. Of course, that can lead us into greater error than we would otherwise make if our priors turn out to be less robust than we thought.