The Pith: The rarer the genetic variant, the more likely it is to be specific to a distinct population. Incorporating information on the distribution of these rare variants, which current techniques tend to miss, can greatly increase the precision of statistical inferences.
A few days ago I mentioned in passing an article in The New York Times reporting on a paper which illustrated how starkly differentiated populations can be with respect to rare alleles, that is, genetic variants present at very low frequencies. It turns out that many of these low-frequency variants are private to particular populations, in contrast to higher-frequency variants, which are shared across diverse human populations. The explanation offered by one of the paper's authors was that higher-frequency variants presumably date back to a time before human populations had dispersed across the world. Shared variants at higher frequencies, then, are shadows of a shared past. Rare variants, in contrast, reflect more recent events, narrowing the circle of those affected.
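To make the link between frequency and population specificity concrete, here is a minimal Python sketch. It assumes we already have, for each variant site, the number of chromosomes carrying the derived allele in a sample from each of two populations. It tabulates the unfolded site frequency spectrum (how many sites carry the derived allele on exactly i of n sampled chromosomes) and, for each total count, the fraction of sites seen in only one population. The function names and toy counts are hypothetical; this is descriptive bookkeeping, not the demographic inference carried out in the paper.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Site frequency spectrum for one sample of n chromosomes.

    Entry i - 1 of the returned list is the number of sites where the derived
    allele appears on exactly i chromosomes (i = 1 .. n - 1); sites that are
    monomorphic in the sample (count 0 or n) are dropped.
    """
    sfs = [0] * (n + 1)
    for c in derived_counts:
        sfs[c] += 1
    return sfs[1:n]


def private_fraction_by_count(pop_a, pop_b):
    """For each total derived-allele count across both samples, the fraction
    of sites where the derived allele was observed in only one population."""
    totals = Counter()
    private = Counter()
    for a, b in zip(pop_a, pop_b):
        k = a + b
        if k == 0:
            continue  # derived allele absent from both samples
        totals[k] += 1
        if a == 0 or b == 0:
            private[k] += 1
    return {k: private[k] / totals[k] for k in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical derived-allele counts at six sites, six chromosomes per sample.
    pop_a = [1, 0, 2, 1, 3, 5]
    pop_b = [0, 1, 0, 1, 2, 4]
    print(unfolded_sfs(pop_a, 6))                    # [2, 1, 1, 0, 1]
    print(private_fraction_by_count(pop_a, pop_b))   # {1: 1.0, 2: 0.5, 5: 0.0, 9: 0.0}
```

On real data, the pattern described above would show up as a private fraction that falls as the total count rises; the toy counts here are chosen only so the output is easy to check by hand.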
I have now read the paper in question, Demographic history and rare allele sharing among human populations. From what I can gather, The New York Times article was really an elaboration upon some of the issues highlighted in the paper's discussion. The “meat” of the paper, its methods and results, is rather technical and deeply embedded in the language of mathematical statistics. For example:
After further consideration, I have decided to spare you my own clumsy plain-English exposition of the details of the site frequency spectrum calculations. There are, after all, enough other points of interest in the paper where my verbal talents can be put to better use. First, the abstract:
Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome, often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces: mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle, the ultimate raison d’être: replication. Stochastic forces such as random genetic drift reflect the sampling error in the transmission of genes from generation to generation during reproduction, while adaptation through natural selection is an outcome of variation in reproductive fitness as a function of variation in heritable traits. Both of these forces have been implicated in the diseases and traits which come under the purview of GWAS (and linkage mapping).
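The idea of drift as sampling error, with selection as a deterministic nudge, is captured by the classic Wright-Fisher model. Below is a minimal Python sketch, assuming a single biallelic locus, a constant population of N diploids, and genic selection with coefficient s applied before sampling (a simplification chosen for brevity). The function and parameter names are illustrative, not taken from any particular library.

```python
import random


def wright_fisher(p0, n_diploid, generations, s=0.0, seed=None):
    """Track one allele's frequency under a Wright-Fisher model.

    Each generation has a deterministic selection step (genic selection:
    relative fitness 1 + s for the allele, an assumption for simplicity),
    followed by a stochastic step that draws 2N gene copies from the
    post-selection frequency. The randomness of that draw is genetic drift.
    Stops early once the allele is lost (0.0) or fixed (1.0).
    """
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = p * (1 + s) / (1 + s * p)                          # selection
        count = sum(rng.random() < p_sel for _ in range(two_n))    # drift (binomial sampling)
        p = count / two_n
        trajectory.append(p)
        if p == 0.0 or p == 1.0:                                   # allele lost or fixed
            break
    return trajectory


if __name__ == "__main__":
    # Neutral allele starting at 50%: generations run and final frequency,
    # for a small and a larger population.
    for n in (10, 1000):
        traj = wright_fisher(0.5, n, 200, seed=42)
        print(n, len(traj) - 1, traj[-1])
```

With s = 0 the only force is drift, and small populations tend to hit loss or fixation within a few dozen generations, while large ones wander slowly; a positive s biases the walk toward fixation.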
GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases, such as schizophrenia. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant hypothesis has yielded spectacular results for pigmentation; it does seem that a few common variants are responsible for most of the variation in this trait. But this has been the exception rather than the rule.
One reason for this disjunction between the promise of GWAS and their concrete outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect associations with loci of small marginal effect is obviously limited. By contrast, it seems that about half a dozen genes can explain most of the between-population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. This SNP is fixed in Europeans, nearly absent in Africans and East Asians, and segregating for both the ancestral and derived alleles in groups such as South Asians and African Americans. Traits such as schizophrenia and height are also substantially heritable, so much of their population-level variation is explainable by variation in genes, but that genetic variation is not concentrated in a handful of loci. The effect size at any given locus may be small, or the variation may be accumulated through the sum of larger-effect variants of low frequency. In other words: many common variants of small effect, or numerous distinctive rare variants of large effect.
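Two quantitative points in this paragraph can be checked with a little arithmetic. Under an additive model with Hardy-Weinberg genotype frequencies, a biallelic locus with effect-allele frequency p and per-allele effect β contributes 2p(1 - p)β² to phenotypic variance, so a single common variant of small effect and a single rare variant of large effect each explain very little. Summing many such loci, in turn, produces a continuous, roughly bell-shaped distribution. The Python sketch below assumes phenotype measured in standard-deviation units, independent loci, and no environmental term; the function names and effect sizes are hypothetical.

```python
import random
import statistics


def locus_variance(p, beta):
    """Variance contributed by one additive biallelic locus under Hardy-Weinberg:
    the effect-allele count (0, 1, 2) has variance 2p(1 - p), scaled by beta^2."""
    return 2 * p * (1 - p) * beta ** 2


def polygenic_scores(freqs, betas, n_people, seed=None):
    """Genetic values for simulated individuals: sum over loci of
    (effect-allele count) x (per-allele effect)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        total = 0.0
        for p, beta in zip(freqs, betas):
            allele_count = (rng.random() < p) + (rng.random() < p)  # two independent allele draws
            total += allele_count * beta
        scores.append(total)
    return scores


if __name__ == "__main__":
    # Hypothetical effect sizes, in phenotypic SD units.
    print(locus_variance(0.5, 0.05))    # common variant, small effect: about 0.00125
    print(locus_variance(0.001, 0.5))   # rare variant, large effect: about 0.0005
    # 200 common loci of small effect: expected mean 10, expected variance 200 * 0.00125 = 0.25.
    scores = polygenic_scores([0.5] * 200, [0.05] * 200, 1000, seed=1)
    print(statistics.mean(scores), statistics.variance(scores))
```

With these assumed numbers each locus explains on the order of a tenth of a percent of phenotypic variance or less, which is why hundreds of loci are needed before the total becomes substantial, and why each one is hard to detect individually.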
Over the past decade evolutionary geneticist Mike Lynch has been articulating a model of genome complexity in which stochastic factors are the primary force driving increases in genome size. The argument is laid out in a 2003 paper and further elaborated in his book The Origins of Genome Architecture. There are several moving parts in the thesis, some of which require a rather fine-grained understanding of the biophysical structural complexity of the genome, of Mendelian inheritance as a process, and of population genetics. But the core of the model is simple: there is an inverse relationship between long-term effective population size and genome complexity. Small effective population sizes go along with large genomes, both in base pairs and in counts of genetic elements such as introns.
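The population-genetic intuition behind that inverse relationship can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 - e^(-2s)) / (1 - e^(-4 Ne s)). The Python sketch below treats a genomic addition (say, a new intron) as mildly deleterious with a hypothetical selection coefficient s, assumes genic selection and a census size equal to Ne, and reports fixation probability as a multiple of the neutral value 1/(2Ne). It illustrates the general drift-versus-selection logic only; it is not Lynch's own model, and the names are illustrative.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation at frequency 1/(2 Ne) under
    genic selection: u = (1 - exp(-2s)) / (1 - exp(-4 Ne s)).
    Rewritten with expm1/exp so strongly deleterious cases do not overflow."""
    if s == 0:
        return 1.0 / (2 * n_e)
    x = -4.0 * n_e * s
    if x > 0:
        # Deleterious: factor exp(x) out of the denominator to avoid overflow.
        return math.expm1(-2.0 * s) * math.exp(-x) / (-math.expm1(-x))
    return math.expm1(-2.0 * s) / math.expm1(x)


def relative_to_neutral(s, n_e):
    """Fixation probability as a multiple of the neutral expectation 1/(2 Ne)."""
    return fixation_probability(s, n_e) * 2 * n_e


if __name__ == "__main__":
    s = -1e-5  # hypothetical fitness cost of a new intron
    for n_e in (1e4, 1e6, 1e8):
        print(f"Ne = {n_e:.0e}: {relative_to_neutral(s, n_e):.3g}")
```

With these assumed numbers, the slightly costly insertion fixes at about 0.81 times the neutral rate when Ne = 10^4 (4Ne|s| = 0.4, so it is effectively neutral), but essentially never when Ne = 10^6 (4Ne|s| = 40). Small populations cannot efficiently purge such additions, which is the sense in which low long-term Ne permits genome expansion.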