The arcane art of ancient admixture

By Razib Khan | October 21, 2012 10:29 pm


I have mentioned the PLoS Genetics paper, The Date of Interbreeding between Neandertals and Modern Humans, before because a version of it was put up on arXiv. The final paper has a few additions. For example, it mentions the generally panned (at least in the circles I run in) PNAS paper which suggested that ancient population structure could produce the same patterns which were earlier used to infer admixture with Neandertals (the authors also point to Yang et al. as support for the proposition of admixture rather than structure). The primary result, dating the admixture between Neandertals and anatomically modern humans to roughly 40,000-80,000 years before the present, is reiterated.

An interesting aspect is that their method utilizes linkage disequilibrium (LD) decay. It’s interesting because tens of thousands of years is a hell of a long time to be able to detect an admixture event via LD! In particular because there’s likely a palimpsest effect where there are intervening admixtures and other assorted demographic events (e.g., bottlenecks and selective sweeps can also generate LD). So how’d they do it? Basically the authors figured out a way to ascertain which pairs of SNPs may have introgressed from Neandertals by comparing the frequency in modern humans to Neandertals at those given SNPs (in particular, by looking at variants at low frequency in Africans and derived in Neandertals). A major technical problem here is the “genetic map.” This map is what allows one to assess how recombination, over time, breaks apart the associations which are the hallmark of LD, and it is not precise enough to robustly allow them to make the inferences that they want.
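The basic logic of dating by LD decay can be sketched in a few lines. This is a minimal illustration, not the authors' statistic: it assumes that LD between introgressed alleles falls off as exp(-t * d), where d is genetic distance in Morgans and t is the number of generations since admixture, and it recovers t from noise-free synthetic data with a log-linear fit. The function name, the amplitude, the 2,000 generation value, and the 29 years per generation conversion are all assumptions made for the example. Real data is noisy (so the log transform can fail on non-positive values), and the paper's corrections for genetic map uncertainty and SNP ascertainment are not reproduced here.

```python
import numpy as np


def estimate_admixture_generations(dist_cm, ld):
    """Fit ld ~ A * exp(-t * d) by least squares on log(ld).

    dist_cm : genetic distances between SNP pairs, in centiMorgans
    ld      : average LD at each distance (must be strictly positive here)
    Returns (t, A): generations since admixture and the fitted amplitude.
    """
    d_morgans = np.asarray(dist_cm, dtype=float) / 100.0  # cM -> Morgans
    log_ld = np.log(np.asarray(ld, dtype=float))
    # log(ld) = log(A) - t * d, so the slope of the line is -t
    slope, intercept = np.polyfit(d_morgans, log_ld, 1)
    return -slope, np.exp(intercept)


# Synthetic, noise-free decay curve (all values are illustrative assumptions)
true_generations = 2000.0
true_amplitude = 0.1
dist_cm = np.linspace(0.01, 0.5, 50)
ld = true_amplitude * np.exp(-true_generations * dist_cm / 100.0)

t_hat, a_hat = estimate_admixture_generations(dist_cm, ld)

YEARS_PER_GENERATION = 29  # assumed generation time for the conversion
print(f"estimated generations: {t_hat:.1f}")
print(f"estimated years: {t_hat * YEARS_PER_GENERATION:.0f}")
```

Note the scale in this toy setup: at 2,000 generations the signal has already decayed by a factor of e at 0.05 cM, which is why sub-centiMorgan errors in the genetic map matter so much for a date this old.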

The methods which they used to correct for these problems are ingenious, and as is usually the case with this group the supporting information is well worth the read if you are a geneticist. But I am of a mind to recall what Dr. Joseph Pickrell’s statement about the nature of peer review in such a specialized and frankly arcane field implied: that the number of genuine peers is relatively small. Unlike physicists or economists, most biologists are not formally trained in a common technical mathematical language. This explains the surplus of people from physics and mathematics backgrounds in many genomic laboratories. These are people who can parse and analyze big data, and extract signal from the noise by generating their own statistical tools as needed. But despite the forbidding formal aspect to the methods, the results coming out of these laboratories are still of interest, both academically (scientists are interested in stuff, period) and professionally (scientists like to use the methods that others develop) to those outside the discipline.

And yet I believe that a divergence is developing here, as the methods which developers are blazing to cut deep into the swell of data are moving well ahead of where other biologists can follow. Of course it is not just biologists. These particular questions about deep history and the human phylogenetic tree are of great interest to paleoanthropologists, most of whom clearly cannot follow with any fluency the debates about ancient structure or admixture, and the relevance of D-statistics. This is clearly what happened when Richard Klein convinced The New York Times to write an article which brought to light his professional gripes with the statistical geneticists who have upturned his nicely situated apple-cart, and offered up a compelling competitor to him in his domain-specific specialty. But in Klein’s defense his elegant verbal models were at least clear to the general public. There is a methodological opacity to statistical genomics which we have to admit is undeniable.

Ultimately from my own personal experience there is one primary way to truly grokk what is going on in a paper like this: replicate their analyses with the same computational techniques, and develop one’s own intuition. Unfortunately this takes time, and everyone has their own tasks before them, so less of this happens than should be the case (e.g., thousands of simulations are not cheap computationally). But all groups like the one above can do is provide the software tools, and point to where the data is (this emphasizes the crucial importance of open science today). Others can reanalyze, and importantly replicate simulations and modulate parameters to their own liking. This is all much more useful than armchair critiques, peer or not. Magic becomes a skill once you become familiar with it.

Citation: Sankararaman S, Patterson N, Li H, Pääbo S, Reich D (2012) The Date of Interbreeding between Neandertals and Modern Humans. PLoS Genet 8(10): e1002947. doi:10.1371/journal.pgen.1002947

  • Dm

    Very interesting. Even the “emerging classic” ROLLOFF has major issues with sub-centiMorgan distances and time scales over a couple thousand years, as discussed in Moorjani et al. Part of it is non-admixture LD and part is uncertainty in genetic maps; the bias generally results in overestimated time scales, and the calculated CIs are somewhat meaningless as they “only” describe the level of reproducibility of the estimates within the model (see also a recent discussion here).

    These guys operate on a 0.01 cM (!) scale, and one of the improvements to the ROLLOFF algorithm seems to be very straightforward: only consider those SNP pairs where both alleles differ in the “main” vs. “admixing” populations. Dienekes? Is there anything you can apply to your ancient-admixture ROLLOFF results? Or is the similarity between the human groups such that there will be too few pairs of adjacent introgressed alleles to analyze?

  • S.J. Esposito

    Regarding the opaque nature of quantitative genomics: I wonder if this will persist into the future or if people (read: other biologists, anthropologists, etc) will be forced to develop a comfortable familiarity with the major subject matter..? I mean, if ‘big data’ is really all the rage, then it seems weird to me that many people who “follow the literature” are not able to sit down and grasp the theoretical underpinnings, much less replicate the results… My thinking is that it’s still early on in the game and that things will change in the next couple of years, but I see a lot of students (the next generation) who are still pursuing benchwork-only or, in my particular field, bone-only research and accepting the results of genomics on the face of it.

  • https://plus.google.com/109962494182694679780/posts Razib Khan

    genomics on the face of it.

    shit! :=) beware, lest someone rips your face off.

  • Chad

    Very few new grad students in Biology enter with the skill sets to begin analyzing such data and unfortunately I find that very few students with those skills start with the Biological knowledge to understand the experiment, let alone the results. I once took a CS class focused on Bioinformatics Algorithms where every class for the first couple weeks, I typically had to review the Central Dogma with the CS students and explain the purpose of experiments where one quantifies mRNA abundance or where one mutates genes.

    I have noticed recently a lot of typically bench labs diving into sequencing projects and throwing new grad students on these projects, so I think the situation will change 4-5 years from now when these students become post-docs……the downside: (1) few of the PIs seem to have the patience to let the students develop the needed skill sets (I was lucky in that my PI put up with the delays as I learned almost from scratch) (2) many of the PIs delve into these projects with bad experimental design. End result, you see a lot of grad students posting online at places like seqanswers.com asking how to rescue a project that has no biological replicates……

  • Steviepinhead

    There’s only one ‘k’ in “grok.”

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com
