One of the elementary aspects of understanding genetics on a biophysical scale is to characterize the set of processes which span the chasm between the raw sequence information of base pairs (e.g. AGCGGTCGCAAG…) and the assorted macromolecules which are woven together to create the collection of tissues, and enable the physiological processes, which result in the organism. This suite of phenomena is encapsulated most succinctly in the often maligned Central Dogma of Molecular Biology. In short, the information of the DNA sequence is transcribed into RNA and translated into proteins, though for greater accuracy and precision one must always add caveats for phenomena such as splicing. The range of these processes is so baroque that molecular genetics has become a massive enterprise, to a great extent superseding classical Mendelian genetics.
One critical structural detail from an evolutionary perspective is that the amino acids which are the building blocks of proteins are generally encoded by multiple nucleotide triplets, or codons. For example the amino acid glycine is “four-fold degenerate”: GGA, GGG, GGC, and GGU all encode it (in RNA, uracil, U, substitutes for the thymine, T, found in DNA). Notice that the variation is confined to the third position in the codon. Altering the first or second position would change the amino acid end product, and possibly perturb the function of the final protein (or, by creating a stop codon, perhaps disrupt translation altogether in some cases). Changes at that third position are synonymous substitutions because they don’t change the functional import of the sequence, as opposed to nonsynonymous substitutions (which may abolish or change function). In an evolutionary context one may presume that these synonymous substitutions are “silent.” Because natural selection operates upon heritable variation in phenotype, and synonymous substitutions presumably do not change phenotype, it is often assumed that evolutionary change at these bases is selectively neutral. In contrast, nonsynonymous changes may be deleterious or beneficial (far more likely the former than the latter, because breaking contingent complexity is easier than creating new contingent complexity). Therefore the ratio of genetic change at nonsynonymous versus synonymous sites across lineages has been a common measure of possible selection on a gene.
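To make the degeneracy concrete, here is a minimal Python sketch that encodes the standard genetic code and classifies single-base changes as synonymous or not. The helper names (`classify`, `is_fourfold_degenerate`, `synonymous_sites`, `dn_ds`) are hypothetical, chosen for illustration. The per-codon site count is a simplified version of the usual counting approach, and the ratio applies no correction for multiple hits, so treat it as a toy rather than as how published estimates are computed.

```python
from itertools import product

# Standard genetic code (RNA alphabet). Codons are enumerated in the
# conventional U, C, A, G order with the third base varying fastest;
# '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def mutate(codon: str, pos: int, base: str) -> str:
    """Return `codon` with the base at 0-based position `pos` replaced."""
    return codon[:pos] + base + codon[pos + 1:]


def classify(codon: str, pos: int, base: str) -> str:
    """Label a point mutation as synonymous, nonsynonymous, or nonsense."""
    before, after = CODE[codon], CODE[mutate(codon, pos, base)]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # premature stop codon truncates translation
    return "nonsynonymous"


def is_fourfold_degenerate(codon: str, pos: int = 2) -> bool:
    """True if all four bases at `pos` encode the same amino acid."""
    return len({CODE[mutate(codon, pos, b)] for b in BASES}) == 1


def synonymous_sites(codon: str) -> float:
    """Simplified count of synonymous sites in one codon: for each position,
    the fraction of the three alternative bases that keep the amino acid
    unchanged (changes creating a stop codon count as nonsynonymous)."""
    total = 0.0
    for pos in range(3):
        alternatives = [b for b in BASES if b != codon[pos]]
        total += sum(classify(codon, pos, b) == "synonymous" for b in alternatives) / 3
    return total


def dn_ds(nonsyn_diffs: int, syn_diffs: int, nonsyn_sites: float, syn_sites: float) -> float:
    """Uncorrected ratio of nonsynonymous to synonymous differences per site."""
    return (nonsyn_diffs / nonsyn_sites) / (syn_diffs / syn_sites)


print(sorted(c for c, aa in CODE.items() if aa == "G"))  # ['GGA', 'GGC', 'GGG', 'GGU']
print(classify("GGA", 2, "C"), classify("GGA", 0, "A"))  # synonymous nonsynonymous
print(is_fourfold_degenerate("GGA"), synonymous_sites("GGA"))  # True 1.0
```

Dividing differences by the number of sites of each class matters because a typical codon offers far more nonsynonymous than synonymous opportunities; a glycine codon, for instance, has one synonymous site and two nonsynonymous ones.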
At this point I have sketched out in the most superficial sense a set of propositions which span the concrete physical realm of the biochemical mechanics of DNA to the abstract formal evolutionary genetic models which outline the trajectory of allele frequencies over time and space. But propositions are always embedded in axioms, and those axioms may not always be literally true. For example some codons, which are notionally equivalent in terms of their amino acid output, are favored due to biases derived from the varying efficiencies of the cell’s translational machinery. After a fashion this too is natural selection, but it does not manifest in a straightforward fashion via the fitness of individual organisms at some stage of life history. Then there are cases where synonymous mutations change the regulatory pathway in a significant manner. And so on. Despite all these deviations from the ideal, presumably most researchers have accepted that the neutral framework was useful enough to justify the prior assumption that synonymous mutations were not subject to selection.
A new paper in PLoS GENETICS, Strong Purifying Selection at Synonymous Sites in D. melanogaster, takes aim at the robustness of this axiom by highlighting the likelihood that many synonymous positions in Drosophila are subject to strong purifying selection. That is, a putatively silent change produces significant functional differences which result in a major decrease in the fitness of the organisms carrying it, removing the mutant alleles from the pool of polymorphisms. Note the key qualifier here: the selection is strong. Dynamics such as mutational bias and regulatory differences mean that many would acknowledge weak and gentle purifying selection even on synonymous sites. These authors contend something rather more radical.
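A rough feel for why the strength of selection matters comes from the textbook recursion for selection against an allele. The sketch below assumes a haploid (genic) model with no genetic drift, and the starting frequency and selection coefficients are illustrative values, not drawn from the paper; `allele_trajectory` is a hypothetical helper name.

```python
def allele_trajectory(q0: float, s: float, generations: int) -> list[float]:
    """Frequency of a deleterious allele under deterministic haploid (genic)
    selection: carriers have relative fitness 1 - s, so
    q' = q(1 - s) / (1 - s*q). Genetic drift is ignored."""
    freqs = [q0]
    q = q0
    for _ in range(generations):
        q = q * (1 - s) / (1 - s * q)  # divide by mean fitness 1 - s*q
        freqs.append(q)
    return freqs


# Hypothetical parameters, chosen only to contrast strong and weak selection.
strong = allele_trajectory(q0=0.01, s=0.1, generations=50)
weak = allele_trajectory(q0=0.01, s=0.001, generations=50)
print(f"after 50 generations: strong {strong[-1]:.2e}, weak {weak[-1]:.2e}")
```

Tracking the allele’s odds, q/(1 - q), which shrink by a factor of 1 - s every generation, gives the same trajectory in closed form; under strong selection the allele is pushed down to very low frequencies far faster.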
To be frank the paper is rather abstruse and dense in its prose, though impressive in its disciplinary breadth, ranging from statistical genetics to developmental biology. But the core result can be boiled down to raw counts of SNPs. In particular they used introns, which like synonymous sites are putatively neutral because they are spliced out of the mature RNA transcript which is translated into the protein, as a reference against which to check their sites of interest. Though it is subtle, you can observe in the panel at the top of this post that there seems to be some deviation from neutrality at the 4D (four-fold degenerate) sites. It is clearer in the second panel above. The synonymous sites seem to have less genetic variation than they should. This is a tell for purifying selection, which continuously removes low frequency deleterious mutations from the population. But why is this strong selection? The issue highlighted by the authors is that the data sets from previous research were simply not dense and rich enough to distinguish between strong and weak purifying selection, as on a coarser scale of analysis the effects would be rather similar. In contrast, here the authors used more than 100 Drosophila lines, and assembled nearly 1 million 4D SNPs. With such a deep sample of the population they were able to probe even small differences, as strong selection would be discernibly more effective at flushing out very low frequency alleles (consider that in smaller samples rare variants are simply likely to be missed).
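The point about sample size can be quantified with a one-line formula: a variant at population frequency p is seen at least once among n sampled chromosomes with probability 1 - (1 - p)^n. This minimal sketch assumes independent (binomial) draws and ignores how the actual lines were sampled and sequenced; `prob_detected` and the 0.5% frequency are illustrative choices.

```python
def prob_detected(p: float, n: int) -> float:
    """Probability that a variant at population frequency p appears at least
    once among n sampled chromosomes, assuming independent (binomial) draws."""
    return 1.0 - (1.0 - p) ** n


# Illustrative values only: a variant carried by 0.5% of chromosomes.
for n in (10, 50, 100, 200):
    print(n, round(prob_detected(0.005, n), 3))
```

With ten chromosomes such a variant is almost always missed, while with a hundred or more it is seen a substantial fraction of the time, which is why a deep sample is needed to tell selection that keeps variants rare from selection that removes them almost immediately.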
Being a paper in PLoS GENETICS it is free for all to read, so I will spare you the gory details of how they corrected for biases in GC content, possible selective sweeps distorting the signal from flanking regions, etc. They were able to use resampling techniques to confirm the robustness of their inferences, though the slicing of the data into numerous categories does concern me a bit. Additionally there is mention of utilizing “parsimony,” which is somewhat concerning, in particular because the authors themselves concede that it may produce false conclusions. But the big picture result is rather impressive even if the details have a daunting number of moving parts. I should mention as well that they explored the possible role that codon bias might have in generating this pattern, and it does not seem likely (in particular because purifying selection seems to affect both optimal and non-optimal codons). And there were some rather strange results too, such as their finding that purifying selection was weaker on the X chromosome than on the autosomes (contrary to my expectation, and I think theirs).
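The paper’s exact resampling scheme isn’t described here, so the following is only a generic illustration of the idea: a nonparametric percentile bootstrap that resamples whole genes with replacement and recomputes the ratio of SNP density at 4D sites to SNP density in introns. The per-gene counts are randomly generated toy numbers, not the paper’s data, and the function names are hypothetical.

```python
import random


def snp_density_ratio(genes):
    """Pooled SNPs per 4D site divided by pooled SNPs per intronic site.
    Each gene is a hypothetical tuple (snps_4d, sites_4d, snps_intron, sites_intron)."""
    snps_4d = sum(g[0] for g in genes)
    sites_4d = sum(g[1] for g in genes)
    snps_in = sum(g[2] for g in genes)
    sites_in = sum(g[3] for g in genes)
    return (snps_4d / sites_4d) / (snps_in / sites_in)


def bootstrap_ci(genes, reps=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval, resampling whole genes with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        snp_density_ratio([rng.choice(genes) for _ in genes]) for _ in range(reps)
    )
    return stats[int(reps * alpha / 2)], stats[int(reps * (1 - alpha / 2)) - 1]


# Toy data, NOT from the paper: 200 hypothetical genes with 500 sites of each class.
toy_rng = random.Random(0)
toy_genes = [(toy_rng.randint(20, 40), 500, toy_rng.randint(40, 60), 500) for _ in range(200)]

point = snp_density_ratio(toy_genes)
low, high = bootstrap_ci(toy_genes)
print(f"ratio {point:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Resampling whole genes rather than individual sites is one way to respect the fact that sites within a gene share a history and are not independent. In this toy data the interval sits below 1 by construction, which is the shape a deficit of synonymous variation relative to introns would take.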
The “back end” of the paper is different in that it analyzes the functional and developmental aspects of the genome regions of interest (4D sites). They report for example that purifying selection is operating on sites conserved across Drosophila species. Not surprising. But there also seems to be a significant amount of substitution across lineages at sites which are subject to purifying selection within lineages. This hints that some of the mutations which distinguish Drosophila species were gains of function. Finally, there are also broad patterns in the timing of gene expression that relate to strongly conserved 4D sites. As I am not well versed in developmental biology I will leave that to others, though the results seem suggestive, if opaque to me.
One paper does not overthrow 40 years of molecular evolution. And even if some of the primary assumptions and results validating neutral theory are wrong, that does not negate the utility of neutrality as a null hypothesis. But if synonymous sites are taken as a benchmark for neutrality, and have been subject to strong purifying selection all the while, then our understanding of the balance of forces shaping the evolutionary genetic history of Drosophila may be quite wrong. The qualifier about Drosophila is, I think, warranted, because from what I recall earlier results reported ubiquitous selection in this model organism, and that may not hold for all taxa. The authors make the case for the generality of their results, and they may be right, but I think one should be more cautious about such claims. What this does tell us is that modern genomics and the scaling up of data are not just revealing nature on a finer scale, but may actually be smoking out structure and patterns which have long been hiding in plain sight.
Citation: Lawrie DS, Messer PW, Hershberg R, Petrov DA (2013) Strong Purifying Selection at Synonymous Sites in D. melanogaster. PLoS Genet 9(5): e1003527. doi:10.1371/journal.pgen.1003527