It is famously noted that when Charles Darwin published The Origin of Species he had no plausible theory of inheritance to drive his hypothesis. Specifically, one of the major issues of the “blending” model, whereby the phenotypes of the parents average out in the subsequent generation, is that such mixing eliminates the variation which is a necessary precondition for natural selection. At the same time that Darwin was revolutionizing our conceptualization of how the tree of life came to be, Gregor Mendel was performing the experiments which solidified his eponymous theory of inheritance. Though ignored in his own day, by ~1900 Mendelism reemerged and offered a relatively parsimonious abstraction which could explain why variation was not eliminated through the fusion of sexual reproduction. The discrete genes themselves were simply rearranged every generation in a digital manner, with a genotype translated into a phenotype, in contrast to the more analog model of phenotypic mixing which underpins a blending theory.* The fusion of genetics and quantitative evolutionary biology resulted in population genetics (see The Origins of Theoretical Population Genetics), while the cross-fertilization with ecology, natural history and paleontology eventually crystallized into what we would term the ‘Neo-Darwinian Synthesis’ by the middle of the 20th century.
And it was then that Francis Crick and James Watson elucidated specifically the biophysical substrate, DNA, through which Mendelian inheritance occurred. It was then that Crick also outlined his famous and infamous ‘central dogma,’ whereby information was transmitted unidirectionally from DNA to protein via RNA. While molecular biology was flowering, the theorists who relied on the older abstractions were relatively unperturbed (see The Narrow Roads of Gene Land 1 by W. D. Hamilton). In Darwin’s Dangerous Idea the philosopher Daniel Dennett asserted that evolution was fundamentally substrate neutral; that is, how genetic information is transmitted biophysically is of less relevance than the abstract parameter of natural selection which operates upon the character of that information through the mediation of fitness and phenotype. In a broad philosophical sense this may be true. Assuming infinite population sizes and time this is indubitably so. But there is much that transpires from the beginning to the end, and more recent work has suggested that the physical realities and constraints of molecular function cannot simply be abstracted away on a realistic time scale. It is, I think, somewhat peculiar to push the abstraction too far when speaking of biology in particular, because biological processes often operate under physical constraint or scarcity as a matter of course.
To understand evolution today in any non-trivial sense, that is, to understand evolution as a process which operates on scales shorter than the heat-death of the universe, it seems that one must consider the details of the substrate. In other words, the great wall between molecular biology and evolutionary science must be torn down once and for all. We have come far from the isolated alleles operating in a statistical sea of random variation which R. A. Fisher conceived of when he attempted to reformulate Darwin’s theories so that they were as precise and crisp as the laws of thermodynamics (see The Genetical Theory of Natural Selection). The recent debates between Sean Carroll and Michael Lynch (or Sean Carroll and Jerry Coyne) put into sharp relief the relevance of substrate, the importance of gene regulation and particularly cis-regulatory elements.**
Gene regulation entails the modulation of the expression of some genes by other genes, by any means possible. A new letter to Nature, Understanding mechanisms underlying human gene expression variation with RNA sequencing, gives us a possible taste of the future, using the familiar HapMap data set to explore variation in gene expression:
Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project…By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
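For readers who have not run into eQTLs before, the basic idea can be sketched in a few lines: regress each individual's expression level for a gene on the number of copies of an allele they carry at a nearby SNP. What follows is only a toy illustration with simulated data, not the method of the paper; the function name `eqtl_slope`, the effect size of 0.8 and the noise level are all assumptions made up for the example (only the sample size of 69 echoes the abstract).

```python
import random

def eqtl_slope(genotypes, expression):
    """Least-squares slope and Pearson correlation of expression on allele dosage.

    genotypes: allele counts per individual (0, 1 or 2)
    expression: expression level per individual, same order
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx                 # change in expression per extra allele copy
    r = sxy / (sxx * syy) ** 0.5      # strength of the linear association
    return slope, r

# Simulated (hypothetical) data: 69 individuals, each extra allele copy
# raises expression by 0.8 units on top of Gaussian noise.
rng = random.Random(0)
genotypes = [rng.choice([0, 1, 2]) for _ in range(69)]
expression = [5.0 + 0.8 * g + rng.gauss(0, 0.5) for g in genotypes]

slope, r = eqtl_slope(genotypes, expression)
print(f"estimated effect per allele: {slope:.2f}, correlation: {r:.2f}")
```

A real analysis repeats this kind of test across many thousands of SNP and gene pairs and has to correct for multiple testing and confounders, which is where most of the difficulty lies.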
The mapping of a genotype to a phenotype through the production of proteins is complex. All the cells in your body have the same set of genes, but they obviously express differently. If you have a background in biology you will probably recall examples of this issue in the case of the liver, whose finely tuned balance is essential to our health. But think of something more prosaic: some haplotypes around the HERC2-OCA2 locus seem to correlate with somewhat lighter skin color, and also result in blue eyes. Pigmentation genes seem to vary in how they express (or don’t express) in various tissues, primarily the eyes, skin and hair.
Add to this the tangle that is RNA splicing in eukaryotes, and it gets very complicated indeed. The appeal of Fisherian abstraction is very strong, but after nearly one century of abstracting away the concrete, I suspect that to genuinely understand how the tree of life came to be we may have to understand its physical accidents in more depth. The paper finishes with an observation on the importance of SNPs around splice sites:
We proposed that, as in the example described earlier, the mechanism of many of these associations acts through disruption of the splicing machinery. To test this, we extended a Bayesian hierarchical model used previously to include exon-specific effects…This model allows us to estimate the odds ratio for different types of SNPs to affect splicing. First, we considered the binding sites for the U1 small nuclear ribonucleoprotein (snRNP) and U2AF splice factor (of which the canonical splice sites are a part); we found that SNPs throughout these binding sites are highly enriched among sQTLs relative to non-splice site intronic SNPs…We considered whether SNPs within the canonical 2 bp of the splice site alone are enriched for sQTLs; we find that they are…in contrast to previous studies using exon microarrays…Furthermore, SNPs within the spliced exon itself are also significantly enriched among sQTLs and, as expected, non-genic SNPs are markedly under-represented among sQTLs….
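To make "enriched" concrete, here is a rough sketch of what an odds ratio measures in this setting. Note that the authors estimate theirs within a Bayesian hierarchical model; the version below is the much simpler textbook odds ratio from a 2x2 table, with an approximate 95% interval on the log scale. The counts are hypothetical numbers invented for illustration and are not taken from the paper.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: splice-site SNPs that are sQTLs
    b: splice-site SNPs that are not sQTLs
    c: other intronic SNPs that are sQTLs
    d: other intronic SNPs that are not sQTLs
    """
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval using the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, for illustration only.
a, b, c, d = 20, 180, 100, 9900
estimate = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)
print(f"odds ratio {estimate:.1f} (approx. 95% interval {low:.1f} to {high:.1f})")
```

An odds ratio well above 1, with an interval that excludes 1, is what it means to say that a class of SNPs is enriched among sQTLs relative to the comparison class.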
It is not too surprising that the QTLs of note are near locations which we know to be important in a molecular genetic context. Obviously we’ll have to get much further in understanding variation on this level of complexity before we can talk much about evolution. But if we want to understand something like height with any greater depth than Francis Galton, I suspect that the long climb is just beginning….
Citation: Pickrell, JK et al., Understanding mechanisms underlying human gene expression variation with RNA sequencing, doi:10.1038/nature08872
* I am aware that there were many theories of inheritance between Darwin and Mendelism.