Razib’s daughter’s ancestry composition
Genome-wide association studies are rather simple in their methodological philosophy. You take cases (affected) and controls (unaffected) of the same genetic background (i.e. ethnically homogeneous) and look for alleles whose frequencies diverge greatly between the two pooled populations. Visually, the risk alleles, which exhibit higher odds ratios, show up as peaks in Manhattan plots. But please note the clause: ethnically homogeneous study populations. In practice this means white Europeans, and to a lesser extent East Asians and African Americans (the last because the biomedical industrial complex in the United States performs many GWAS, and the USA is a diverse nation). Looking within ethnic groups eliminates many of the false positives one might obtain due to population stratification. Basically, alleles which differ in frequency between groups because of their history may produce spurious associations when the groups themselves differ in their propensity for the trait of interest (e.g. hypertension in blacks vs. whites).
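To make the stratification problem concrete, here is a minimal sketch with made-up allele counts (the numbers and the helper name `allelic_odds_ratio` are illustrative assumptions, not data from any study). Within each hypothetical group the allele is equally common in cases and controls, yet pooling the two groups manufactures an association, because the group with the higher allele frequency also happens to have more cases.

```python
def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Hypothetical allele counts, ordered (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Group A: allele frequency 0.8 everywhere, trait common (800 case alleles, 200 control).
group_a = (640, 160, 160, 40)
# Group B: allele frequency 0.2 everywhere, trait rare (200 case alleles, 800 control).
group_b = (40, 160, 160, 640)
# Pooling ignores group membership and simply adds the counts together.
pooled = tuple(a + b for a, b in zip(group_a, group_b))

or_a = allelic_odds_ratio(*group_a)
or_b = allelic_odds_ratio(*group_b)
or_pooled = allelic_odds_ratio(*pooled)

print(f"group A odds ratio: {or_a:.2f}")
print(f"group B odds ratio: {or_b:.2f}")
print(f"pooled odds ratio:  {or_pooled:.2f}")
```

The allele does nothing in either group, but the pooled odds ratio is well above 1. That is the false positive which an ethnically homogeneous sample is meant to avoid.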
Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome, often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces: mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle: the ultimate raison d’être, replication. Stochastic forces such as random genetic drift reflect the sampling error of genes from generation to generation during the process of reproduction, while adaptation through natural selection is an outcome of variation in reproductive fitness as a function of variation in heritable traits. Both of these forces have been implicated in diseases and traits which come under the purview of GWAS (and linkage mapping).
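The "error of sampling" behind drift can be sketched with a bare-bones Wright-Fisher style simulation. This is an illustration under simplifying assumptions (constant population size, non-overlapping generations, no selection, mutation, or migration); the function name and parameter values are hypothetical.

```python
import random


def wright_fisher(p0, n_individuals, generations, rng):
    """Track one allele's frequency under pure drift in a diploid population."""
    n_copies = 2 * n_individuals  # two gene copies per diploid individual
    freqs = [p0]
    p = p0
    for _generation in range(generations):
        # Each gene copy in the next generation is drawn at random from the
        # current pool, so the new count is a binomial sample of size n_copies.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


rng = random.Random(1)  # fixed seed so the run is repeatable
small = wright_fisher(0.5, n_individuals=20, generations=100, rng=rng)
large = wright_fisher(0.5, n_individuals=2000, generations=100, rng=rng)

print(f"N=20:   frequency after 100 generations = {small[-1]:.3f}")
print(f"N=2000: frequency after 100 generations = {large[-1]:.3f}")
```

The per-generation sampling variance is p(1-p)/2N, so the small population's frequency tends to wander much further from 0.5 (often all the way to loss or fixation) than the large one's. Once the frequency hits 0 or 1 it stays there, because there is nothing left to sample.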
GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases. For example, schizophrenia. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant model has yielded spectacular results for pigmentation; it does seem that a few common variants are responsible for most of the variation in this trait. But this has been the exception rather than the rule.
One reason for this disjunction between the promise of GWAS and the concrete tangible outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect correlations between trait variation and genes of small marginal effect is obviously limited. In contrast, it seems that about half a dozen genes can explain most of the between-population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. The derived variant of this SNP is fixed in Europeans and nearly absent in Africans and East Asians, while both ancestral and derived variants are segregating in groups such as South Asians and African Americans. Traits such as schizophrenia and height, on the other hand, are substantially heritable, in that much of the variation at the population level is explainable by variation in genes, yet the effect size at any given locus may be small, or the variation may accumulate through the sum of larger effect variants of low frequency. In other words, many common variants of small effect, or numerous distinctive rare variants of large effect.
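A toy additive model shows why a polygenic trait ends up looking continuous and roughly normal while no single locus explains much. The sketch below assumes 200 independent loci with identical allele frequency and identical effect size, which is a convenience for illustration and not a claim about any real trait. All names and values are hypothetical.

```python
import random
import statistics


def simulate_additive_trait(n_people, n_loci, allele_freq, effect, rng):
    """Trait value = effect * number of 'plus' alleles summed over all loci."""
    values = []
    for _person in range(n_people):
        # Two allele copies per locus, each a 'plus' allele with prob. allele_freq.
        n_plus = sum(1 for _copy in range(2 * n_loci) if rng.random() < allele_freq)
        values.append(effect * n_plus)
    return values


rng = random.Random(42)  # fixed seed so the run is repeatable
n_loci = 200
trait = simulate_additive_trait(2000, n_loci, allele_freq=0.5, effect=1.0, rng=rng)

mean = statistics.mean(trait)
sd = statistics.stdev(trait)
within_one_sd = sum(1 for v in trait if abs(v - mean) <= sd) / len(trait)

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"fraction within one sd of the mean = {within_one_sd:.2f}")
# With equal, independent loci each one contributes 1/n_loci of the variance.
print(f"variance share per locus = {1 / n_loci:.3%}")
```

Roughly two thirds of the simulated individuals fall within one standard deviation of the mean, as a bell curve would predict, and each locus accounts for only half a percent of the variance. A study needs a very large sample to pick a signal that faint out of the noise.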
I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits1, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait2, 3. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
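The percentages in the abstract can be unpacked with a little arithmetic. The sketch below treats the abstract's approximate figures as if they were exact, which is an assumption on my part, and the variable names are just labels for illustration.

```python
# Approximate figures quoted in the abstract, treated here as exact.
explained_now = 0.10         # phenotypic variance explained by the identified loci
projected_phenotypic = 0.16  # projected share of phenotypic variance
projected_heritable = 0.20   # the same projection, as a share of heritable variance

# If 16% of phenotypic variance equals 20% of heritable variance, the
# heritability implied by those two numbers is their ratio.
implied_heritability = projected_phenotypic / projected_heritable

# Share of the heritable variance accounted for by the loci found so far.
heritable_share_now = explained_now / implied_heritability

# Heritable variance still unaccounted for even if the projection is reached.
heritable_still_missing = 1 - projected_heritable

print(f"implied heritability of height:   {implied_heritability:.0%}")
print(f"heritable variance explained now: {heritable_share_now:.1%}")
print(f"heritable variance still missing: {heritable_still_missing:.0%}")
```

So the two projected figures together imply a heritability of about 0.8, which fits the abstract's description of height as highly heritable, and even the optimistic projection leaves roughly four fifths of the heritable variation unexplained by common variants of this kind.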
The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on though if you can’t get beyond the paywall.
Nature has two papers out about something called “Behçet’s disease.” It has apparently also been termed the “Silk Road Disease”, because of its associations with populations connected to the Central Eurasian trade networks. Though described by Hippocrates 2,500 years ago, apparently it was “discovered” only in the 20th century by a Turkish physician. The reason that that might be is obvious; the prevalence of Behçet’s disease is far higher in Turkey than in any other nation. There is a two orders of magnitude difference between Northwest Europeans and Turks. East Asian populations are somewhere between Europeans and Turks, while the coverage of Inner Asia itself is thin (the first case diagnosed in Mongolia was in 2003). Additionally, the relatively similar frequency in Morocco and Iran, despite the latter nation being strongly influenced by Turkic migration (25-30% of Iranian citizens are ethnically Turk), and the former not at all, leads me to wonder if there may be convergence or parallelism, rather than common ancestry, at work (or, more likely, a combination of both). The direct relationship of Morocco and Japan to the Silk Road is tenuous at best. These were two polities which managed to be just outside the maximum expanse of Turanian empires. The Japanese famously repulsed the Mongol invasion ordered by Kublai Khan, while the Arab rulers of Morocco never fell under Ottoman control. And the early documentation by Hippocrates makes me wonder about the frequency of the disease in Greece itself. Greeks presumably contributed to the ancestry of modern Anatolian Turks, but because of the nature of the Ottoman system it is far less likely that Turks would have contributed to the ancestry of Greeks. I can’t find prevalence data for Greece, but it may be an open question in what direction the disease spread along the Silk Road.
But studies like these are nice because they are steps toward overcoming one of the main issues with genome-wide associations: they use a narrow population sample, and so are not necessarily of worldwide relevance. Remember that even if a risk allele is not the direct cause of the disease, if it is closely associated with the alleles which are, it is of diagnostic utility. At least within that particular population. This study used groups from western and eastern Eurasia to check the power of particular single nucleotide polymorphisms (SNPs) to predict disease risk. First, Genome-wide association studies identify IL23R-IL12RB2 and IL10 as Behçet’s disease susceptibility loci:
It looks like Genomes Unzipped has their own Mortimer Adler, with an excellent posting, How to read a genome-wide association study. For those outside the biz I suspect that #4, replication, is going to be the easiest. In the early 2000s a biologist who’d been in the business for a while cautioned about reading too much into early association results which were sexy, as the same had occurred when linkage studies were all the vogue, but replication was not to be. Goes to show that history of science can be useful on a very pragmatic level. It can give you a sense of perspective on the evanescent impact of some techniques over the long run.