Category: Statistics

Of association & evolution

By Razib Khan | January 9, 2011 1:23 pm

Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome, often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces: mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle: the ultimate raison d’être, replication. Stochastic forces such as random genetic drift reflect sampling error in the transmission of genes from generation to generation during reproduction, while adaptation through natural selection is an outcome of variation in reproductive fitness as a function of variation in heritable traits. Both of these forces have been implicated in diseases and traits which come under the purview of GWAS (and linkage mapping).
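Drift as sampling error can be made concrete with the Wright-Fisher model, a standard idealization of a finite population. The sketch below is illustrative only: the helper name `wright_fisher` and its parameters are hypothetical, and it assumes a diploid population of constant size N with non-overlapping generations and no mutation, migration, or selection, so the only thing moving the allele frequency is the random draw of gene copies each generation.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=None):
    """Hypothetical helper: one allele's frequency under pure drift.

    Assumes a diploid Wright-Fisher population of constant size N with
    non-overlapping generations and no mutation, migration, or selection.
    Each generation, 2N gene copies are drawn with replacement from the
    previous generation, so the frequency changes only by sampling error.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial(2N, p) draw, done one gene copy at a time with the stdlib.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
        if p in (0.0, 1.0):
            break  # allele lost or fixed: drift alone cannot move it again
    return trajectory


# A small population (N = 10) starting at frequency 0.5.
small = wright_fisher(p0=0.5, pop_size=10, generations=1000, seed=42)
print(f"absorbed at frequency {small[-1]} after {len(small) - 1} generations")
```

Across many replicates, the fraction of runs in which a neutral allele eventually fixes approaches its starting frequency, and larger values of `pop_size` slow the random wandering down.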

GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases, schizophrenia for example. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant hypothesis has yielded spectacular results for pigmentation; it does seem that a few common variants are responsible for most of the variation in this trait. But this has been the exception rather than the rule.

One reason for this disjunction between the promise of GWAS and its concrete outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect associations between traits and genetic variants of small marginal effect is obviously limited. In contrast, it seems that about half a dozen genes can explain most of the between-population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. This SNP is fixed in Europeans, nearly absent in Africans and East Asians, and segregating for both ancestral and derived variants in groups such as South Asians and African Americans. Traits such as schizophrenia and height are also substantially heritable, in that much of the population-level variation in the trait is explainable by variation in genes, but the effect size at any given locus may be small, or the variation may be accumulated through the sum of larger-effect variants of low frequency. In other words, the heritability may be spread across many common variants of small effect, or across numerous distinctive rare variants of large effect.
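Why small marginal effects are hard to detect can be seen from a rough power calculation for a single SNP tested at the conventional genome-wide significance threshold of 5 × 10⁻⁸. This is a sketch under stated assumptions, not a method from the studies discussed: it treats the trait as quantitative, approximates the 1-df association test's noncentrality as N·q²/(1 − q²), where q² is the fraction of trait variance the SNP explains, and uses a normal approximation. The helper name `gwas_power` and the example values are hypothetical; in particular, the 25-40% pigmentation figure above describes a between-population difference, so it is not plugged in here.

```python
from statistics import NormalDist


def gwas_power(n, q2, alpha=5e-8):
    """Hypothetical helper: approximate power to detect one SNP in a GWAS.

    n:     number of individuals
    q2:    fraction of trait variance explained by the SNP (0 < q2 < 1)
    alpha: per-test significance threshold (5e-8 is the usual
           genome-wide convention)

    Treats the association test as a two-sided z-test whose squared mean
    shift is the approximate noncentrality n * q2 / (1 - q2).
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = (n * q2 / (1 - q2)) ** 0.5
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


# Illustrative values only: a large-effect variant vs. a small-effect one.
for n, q2 in [(1_000, 0.30), (5_000, 0.0005), (100_000, 0.0005)]:
    print(f"N={n:>7,}  q2={q2:<7}  power={gwas_power(n, q2):.3f}")
```

With these made-up numbers, a variant explaining 30% of the variance is detectable with a thousand subjects, while one explaining 0.05% is essentially invisible at five thousand and only becomes detectable at around a hundred thousand.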


Bayes & Out-of-Africa vs. Alan Templeton

By Razib Khan | April 26, 2010 8:00 am

Alan Templeton, whose text Population Genetics and Microevolutionary Theory is right below Hartl & Clark in my book, recently published a strongly worded paper, Coherent and incoherent inference in phylogeography and human evolution. The possibility of statistical errors in published work is not shocking; I have heard that when statisticians are asked to sort through papers in medical genetics journals, they find elementary errors in ~3/4 of those which have made it through peer review. That being said, Templeton seems to be making a stronger case than simple refutation of basic errors; in particular, he is suggesting that the “ABC” method which lies at the heart of the paper I reviewed last week is incoherent at the root. Here’s Templeton’s abstract:

A hypothesis is nested within a more general hypothesis when it is a special case of the more general hypothesis. Composite hypotheses consist of more than one component, and in many cases different composite hypotheses can share some but not all of these components and hence are overlapping. In statistics, coherent measures of fit of nested and overlapping composite hypotheses are technically those measures that are consistent with the constraints of formal logic. For example, the probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is said to be incoherent. An example of incoherence is shown in human evolution, for which the approximate Bayesian computation (ABC) method assigned a probability to a model of human evolution that was a thousand-fold larger than a more general model within which the first model was fully nested. Possible causes of this incoherence are identified, and corrections and restrictions are suggested to make ABC and similar methods coherent. Another coalescent-based method, nested clade phylogeographic analysis, is coherent and also allows the testing of individual components of composite hypotheses, another attribute lacking in ABC and other coalescent-simulation approaches. Incoherence is a highly undesirable property because it means that the inference is mathematically incorrect and formally illogical, and the published incoherent inferences on human evolution that favor the out-of-Africa replacement hypothesis have no statistical or logical validity.
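The coherence constraint in the abstract is a basic rule of probability: if hypothesis A is a special case of B, then P(A) ≤ P(B), because P(B) = P(A) + P(B and not A). The toy sketch below shows how ABC model choice can nonetheless give a nested special case the larger posterior share. Everything in it is a hypothetical choice for illustration, not the demographic models or settings of the ABC study: the helper name `abc_model_choice`, a binomial model with success probability fixed at 0.5 pitted against a general model with that probability drawn uniformly from 0 to 1, a 50/50 prior over the two models, and plain rejection sampling with a fixed tolerance.

```python
import random


def abc_model_choice(observed, n_trials=100, n_sims=20000, tolerance=2, seed=None):
    """Hypothetical toy: ABC rejection sampling to choose between two models.

    "special": success probability fixed at 0.5 (a nested special case).
    "general": success probability drawn from Uniform(0, 1).
    Each simulation picks a model with prior probability 1/2, draws its
    parameter, simulates a binomial count, and is accepted if the count is
    within `tolerance` of the observed count. A model's estimated posterior
    probability is its share of the accepted simulations.
    """
    rng = random.Random(seed)
    accepted = {"special": 0, "general": 0}
    for _ in range(n_sims):
        model = rng.choice(["special", "general"])
        theta = 0.5 if model == "special" else rng.random()
        simulated = sum(1 for _ in range(n_trials) if rng.random() < theta)
        if abs(simulated - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; raise n_sims or tolerance")
    return {model: count / total for model, count in accepted.items()}


posterior = abc_model_choice(observed=50, seed=1)
print(posterior)
```

With 50 successes out of 100, the fixed model should take roughly 0.88 to 0.89 of the accepted simulations (the exact target for these settings works out to about 0.885). In this toy the two models are scored as competing alternatives, each given half the prior mass, and under the general model the exact value 0.5 has zero prior probability; that framing is what lets the nested model come out ahead.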

The method which Templeton favors is naturally one which he has pushed in the past. In any case, I don’t know the statistical details well enough to comment authoritatively, but I see that a statistician has responded to Templeton already, so I would recommend checking that out. I immediately went looking for responses because the paper uses really strong and dismissive language, and I am somewhat wary of that sort of thing when it is deployed to tear down the fundamentals of a whole field of research (I want to emphasize that overall I enjoy Templeton’s work, but the paper reminded me a bit too much of Jerry Fodor). His citation of Popper in particular seems an appeal to authority aimed at convincing the non-statisticians in the audience, and I don’t see the point of it besides rhetorical utility. I am somewhat inclined to accept Templeton’s critique of models which assume very little gene flow between hominin populations before the Out-of-Africa migration, though from what I can tell there has been relatively little back-migration into Africa south of the Sahara over the past 50,000 years, so perhaps this is an older dynamic as well. I am cautiously optimistic that DNA extracted from the fossils themselves may put to bed some of these arguments over the dance of parameters, though naturally interpretation is always an issue outside of pure mathematics.

For what it’s worth, here’s the model which Templeton’s method favors:



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!
