Natural selection happens. It was hypothesized in copious detail by Charles Darwin, and has been confirmed in the laboratory, through observation, and also by inference via the methods of modern genomics. But science is more than broad brushes. We need to drill down to a more fine-grained level to understand the dynamics with precision and detail, and so generate novel inferences which may then be tested. For example, there are various flavors of natural selection: stabilizing selection, negative selection, and positive directional selection. In the first case natural selection buffets the phenotype about an ideal mean, in the second case deleterious phenotypes and their associated alleles are purged from the genome, and finally, natural selection can also drive a novel trait toward greater prominence, along with the allelic variants which are associated with the fitter phenotype.
The last case is of particular interest to many because it is often through positive natural selection that evolution as descent with modification occurs. Over time trait values and the nature of traits themselves shift such that a lineage changes its character beyond recognition. This phyletic gradualism and the scale independence of evolutionary process have been challenged, in particular from the domain of developmental biology (albeit by not all, or even most, developmental biologists). But ultimately no one doubts that a classical understanding of evolution as change in allele frequency, often driven by natural selection, is part of the larger puzzle of how the tree of life came to be.
One of the phenomena associated with positive directional evolution is the selective sweep. How a selective sweep occurs, and its consequences, are rather straightforward. A genome consists of a sequence of base pairs (e.g., humans have about 3 billion base pairs). If a new mutation emerges at a particular base pair, a novel single nucleotide polymorphism (SNP), and that allelic variant is ~10% fitter than the ancestral variant, natural selection could drive up its frequency (the conditionality is due to the fact that in all likelihood it would still go extinct because of the power of stochastic forces when a mutant is at low frequency). So the variant could in theory shift from ~0% (1 out of N, N being the number of individuals in a population, 2N if diploid, and so forth) to ~100%. This would be the fixation of the novel variant, driven by selective dynamics. So what’s the sweep aspect? The sweep in this case refers to the effect of the very rapid rise in frequency of the SNP in question on the adjacent genomic region. What is termed a genetic hitchhiking dynamic results if the sweep occurs rapidly, so that nearby regions of the genome also move to fixation along with the favored SNP. But in a diploid organism with sexual reproduction genetic recombination persistently breaks apart associations across the physical genome. Therefore the span of the sequence of genetic markers near a favored SNP which form a haplotype is dependent on the rate of recombination as well as the rate of the rise in frequency of the allele, which is contingent on the strength of selection. A powerful selective sweep has the effect of homogenizing wide regions of the genome flanking the favored mutant; in other words the sweep “cleans” the gene pool of variation as one very long haplotype replaces many shorter haplotypes.
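To put rough numbers on the sweep described above, here is a minimal Python sketch. It makes assumptions that are mine, not the paper's: a deterministic haploid-style selection model, a hypothetical diploid population of N = 10,000, and Kimura's diffusion approximation for the chance that a single new copy escapes early loss (with effective size set equal to census size). The function names are just illustrative.

```python
import math

def next_frequency(p, s):
    """One generation of deterministic selection: carriers of the
    favored allele have relative fitness 1 + s, everyone else 1."""
    return p * (1 + s) / (1 + p * s)

def generations_to_reach(p0, target, s):
    """Count generations for the favored allele to climb from p0 to target."""
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations

def fixation_probability(s, gene_copies):
    """Kimura's diffusion approximation for one new copy among
    `gene_copies` (2N in a diploid). Assumes effective size = census size."""
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-2 * gene_copies * s))

N = 10_000          # hypothetical diploid population size
s = 0.10            # the ~10% fitness advantage from the text
p0 = 1 / (2 * N)    # a single new mutant copy

# Length of the sweep (in generations) if the mutant is not lost early
print(generations_to_reach(p0, 0.99, s))
# Chance that the mutant escapes loss to drift in the first place
print(round(fixation_probability(s, 2 * N), 3))
```

Under these assumptions the sweep takes on the order of 150 generations, and the new mutant has only about an 18% chance of surviving its early low-frequency phase, which is the "in all likelihood it would still go extinct" caveat in numerical form.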
As an example, in the genomes of Northern Europeans the locus LCT is characterized by a very long haplotype, which itself seems to correlate well with the trait of lactase persistence. The implication here is that the lactase persistence conferring variant arose relatively recently, and was swept up to near fixation by positive directional natural selection.
That’s the broad theory. But as you know, evolution and its subcomponents are more than “just a theory,” they’re a set of models which are amenable to testing, whether through observation, or via controlled laboratory experiments. A new letter to Nature elaborates how exactly selective sweeps play out in Drosophila melanogaster, a classic “model organism.” Interestingly, this is a case of experimental evolution, something we are more familiar with from Richard Lenski’s E. coli. Genome-wide analysis of a long-term evolution experiment with Drosophila:
Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast…Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ~20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different than what has been observed in asexual species; in our sexual populations, adaptation is not associated with ‘classic’ sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include ‘incomplete’ sweep models, in which mutations have not had enough time to fix, and ‘soft’ sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time
Critical to understanding what’s going on here is the distinction they make between ‘classic’ ‘hard sweeps’ and ‘soft sweeps.’ Hard sweeps follow the spare description I outlined above:
1) A new mutant arises in the genetic background
2) Selection favors the mutant
3) The mutant rises in frequency and sweeps to fixation, 0% → 100%, replacing the ancestral variants
In contrast, for a soft sweep:
1) Selection favors a set of minor polymorphisms already segregating in the gene pool
2) These polymorphisms rise in frequency
3) But they may not sweep to fixation
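The contrast between the two lists can be made concrete with a toy calculation. This is a sketch under my own simplifying assumptions: no recombination, deterministic selection, and made-up starting frequencies. In the "hard" case the favored allele sits on a single rare haplotype; in the "soft" case it is already present on three different haplotype backgrounds. For simplicity the soft case is also run all the way to fixation, even though step 3 above notes that it need not get there.

```python
def haplotype_diversity(freqs):
    """Probability that two randomly drawn haplotypes differ: 1 - sum(f_i^2)."""
    return 1.0 - sum(f * f for f in freqs)

def sweep(haplotypes, s, generations):
    """Deterministic selection with no recombination.
    haplotypes: list of (frequency, carries_favored_allele) pairs."""
    freqs = [f for f, _ in haplotypes]
    fitness = [1 + s if favored else 1.0 for _, favored in haplotypes]
    for _ in range(generations):
        weighted = [f * w for f, w in zip(freqs, fitness)]
        total = sum(weighted)
        freqs = [x / total for x in weighted]
    return freqs

# Hypothetical starting gene pools (frequencies sum to 1)
hard = [(0.001, True), (0.333, False), (0.333, False), (0.333, False)]
soft = [(0.05, True), (0.05, True), (0.05, True),
        (0.85 / 3, False), (0.85 / 3, False), (0.85 / 3, False)]

for name, pool in [("hard", hard), ("soft", soft)]:
    before = haplotype_diversity([f for f, _ in pool])
    after = haplotype_diversity(sweep(pool, s=0.1, generations=300))
    print(name, round(before, 3), round(after, 3))
```

In this toy model the hard sweep drives haplotype diversity to essentially zero, while the soft sweep leaves substantial diversity behind because several backgrounds rise together.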
In the first case the signature of natural selection will be clear, distinct, and indubitable. The result will be a novel haplotype which has replaced the ancestral variants and produced a wide region of genetic homogeneity, as all other allele states are expunged by the sweep. That isn’t what they saw at the genomic level.
But first, what did they do? The flies used in this experiment derive from a 30-year-old lineage, and they selected them for 600 generations in the case of the treatments which were being driven to new phenotype values. 600 generations for humans would be about 15,000 years assuming 25 years per generation. If a trait is heritable, and you select offspring deviated away from the mean, over time you will see a shift in the trait value. This is classic quantitative genetics, and that’s what they saw. They had five lineages which exhibited accelerated development (ACO), and five which were controls which exhibited the ancestral phenotypes (CO). “Eclosion” refers to the fly’s emergence from the pupa. The lineages which were subject to selection had very different life histories from the control groups. The cluster of traits here shouldn’t be too surprising; we know from other taxa that short-lived fast-developing species tend to be smaller and metabolically more under-the-gun than the inverse.
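The back-of-the-envelope arithmetic here, and the quantitative genetic logic behind it, fit in a few lines. This is only a sketch: the breeder's equation (response = heritability × selection differential) is the textbook form of "select offspring away from the mean and the mean shifts," and the heritability and selection differential below are hypothetical values of mine, not estimates from the paper.

```python
def years_elapsed(generations, years_per_generation):
    """Convert a number of generations into calendar time."""
    return generations * years_per_generation

def response_to_selection(heritability, selection_differential):
    """Breeder's equation, R = h^2 * S: the expected shift in the
    trait mean after one generation of selection."""
    return heritability * selection_differential

# 600 generations at a human generation time of 25 years
print(years_elapsed(600, 25))

h2 = 0.3    # hypothetical narrow-sense heritability of development time
S = -2.0    # hypothetical: chosen parents develop 2 hours faster than the mean
print(response_to_selection(h2, S))   # expected change in the mean, in hours
```

A single-generation response like this cannot simply be multiplied by 600, since heritability and the selection differential both change as the variation underlying the trait is used up.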
But the real interesting aspects of this study are not the phenotypes. Who hasn’t seen weird things among the Drosophila? That’s one of the reasons they were chosen as model organisms in the first place! Rather, they explored the patterns of genomic variation within and across the lineages, and integrated the results into a broader theoretical framework of how evolutionary processes occur, and their implications for the genome-wide structure one should see. Below I’ve stitched together figure 2 & 3, which illustrate particular patterns of genomic variation.
The left figure shows differences in allele frequencies between the ACO and CO pooled lineages. The spikes indicate large differences, with the dotted line representing the threshold where there’s a 0.1% random chance of such a between-population frequency difference. The vertical axis is log-scaled. The grey line at the bottom indicates the differences between one particular ACO lineage and the pooled ACO sample. In the right panel you see heterozygosities, with blue denoting the CO lineages, and red the selected ACO lineages which have shortened life histories. The grey again is a particular ACO lineage. Each vertical panel corresponds to a chromosomal arm of the Drosophila melanogaster genome.
First, note the widespread distribution of allele frequency differences between ACO and CO. Additionally, there’s little difference between the specific ACO lineage and the pooled sample. Despite their independent histories they seem to exhibit the same allelic configuration. Second, note that the heterozygosities in the case of the ACO pooled sample are lower than in the CO ancestral phenotype lineages. Why? Remember that selective sweeps should expunge genomic variation. But the sweeps do not seem to have gone to fixation, otherwise we’d see many more inverted peaks converging to heterozygosity of ~0, as the selected variant replaces all others in the population.
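Why does a partial sweep lower heterozygosity without driving it to zero? A quick sketch, assuming a biallelic SNP and the standard expected heterozygosity 2p(1 - p); the frequencies are illustrative, not values from the figures.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic site where one allele
    has frequency p (and the other 1 - p)."""
    return 2 * p * (1 - p)

# Illustrative frequencies of the favored allele
scenarios = [("before selection", 0.5),
             ("partial sweep", 0.9),
             ("complete sweep", 1.0)]

for label, p in scenarios:
    print(label, round(expected_heterozygosity(p), 3))
```

Only the complete sweep produces the inverted peak bottoming out at zero. A variant stalled at 90% still leaves heterozygosity of 0.18, which is depressed relative to the starting point but clearly not expunged.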
What’s going on in the regions which exhibit differences between the controls and selected lineages? They looked at the ~650 non-synonymous SNPs on ~500 genes which were most differentiated between ACO and CO (L10FET score > 4) and found the following categories of genes enriched: imaginal disc development, smoothened signalling pathway, larval development, wing disc development, larval development (sensu Amphibia), metamorphosis, organ morphogenesis, imaginal disc morphogenesis, organ development and regionalization. Life history is complex. Combine the wide class of genes with the dispersed genomic impact of selection as evident in figures 2 and 3, and you get a good sense of the sort of consequences on the substrate level which quantitative genetic evolutionary dynamics have. Also of interest, they found that the X chromosome seemed enriched for signatures of selection and evolution. Why? They note that this chromosome would be more subject to selection for recessive or partially recessive expressing SNPs.
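As I read it, the L10FET score is the negative log10 of the p-value from a Fisher's exact test on allele counts in the two pooled samples, so a score above 4 means a p-value below 0.0001. Here is a standard-library sketch of that calculation. The two-sided test below is a generic textbook implementation, not the authors' pipeline, and the read counts are invented for illustration.

```python
from math import comb, log10

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables (with the same margins) that
    are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)

def l10fet(a, b, c, d):
    """-log10 of the Fisher's exact test p-value."""
    return -log10(fisher_exact_two_sided(a, b, c, d))

# Hypothetical read counts at one SNP: (allele 1, allele 2) in ACO, then CO
print(round(l10fet(40, 10, 10, 40), 2))   # strongly differentiated SNP
print(round(l10fet(26, 24, 24, 26), 2))   # barely differentiated SNP
```

With counts like the first row, the allele frequencies are 80% versus 20% and the score clears the threshold of 4 easily; the second row, at 52% versus 48%, scores near zero.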
Clearly this study did not find the clean hard sweeps which theory may have predicted. Rather, the researchers found a lot of partially completed sweeps distributed all across the genome. Sound familiar? Before we move on to broader considerations, here are their explanations:
- The sweeps are hard, but haven’t reached fixation. So the selection coefficients have to be rather small for them to still be in transit
- Selection is operating on “standing variation.” That is, the genetic variation extant naturally within a given population, and which may be operated upon by natural selection to change the population trait value mean through classical breeding techniques
- And finally, selection coefficients (the greater fitness of positively selected variants against the population mean) may not be static parameters, but change over time as a function of allele frequency. This shouldn’t be that surprising. Frequency dependence and epistasis can impact on linear assumptions within a statistical genetic model. The authors refer to deleterious alleles or antagonistic pleiotropy as possible genetic level forces which also prevent fixation
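To see how these explanations differ in practice, here is a deterministic sketch that tracks an allele for 600 generations under three hypothetical selection regimes. Everything numerical is an assumption of mine: the starting frequency of 5% (a variant already segregating), the coefficients, and the particular frequency-dependent form, in which the advantage shrinks linearly and vanishes at a frequency of 60%.

```python
def final_frequency(p0, generations, selection_coefficient):
    """Iterate deterministic selection, where `selection_coefficient`
    is a function returning s for the current allele frequency p."""
    p = p0
    for _ in range(generations):
        s = selection_coefficient(p)
        p = p * (1 + s) / (1 + p * s)
    return p

def strong_constant(p):
    return 0.02                     # hypothetical constant 2% advantage

def weak_constant(p):
    return 0.002                    # hypothetical constant 0.2% advantage

def frequency_dependent(p):
    return 0.1 * (1 - p / 0.6)      # hypothetical: advantage vanishes at p = 0.6

for regime in (strong_constant, weak_constant, frequency_dependent):
    print(regime.__name__, round(final_frequency(0.05, 600, regime), 3))
```

Under these assumptions the strong constant coefficient effectively fixes the allele within 600 generations, the weak one leaves it stranded at roughly 15%, and the frequency-dependent one parks it at an intermediate equilibrium no matter how long you wait. The last two both look like "incomplete" sweeps in a snapshot, which is part of why they are hard to tell apart.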
I personally lean against the first option, because it seems like we see a similar pattern in human evolutionary genomics, lots of partial sweeps and incomplete fixation. How much time does a brother need? In the long run we’re dead, and heat death swallows the universe. In the short run evolutionary pressures are always shifting. Fix now, or forget it say I! The wide distribution of allelic differences as well as moderate heterozygosities seems to be an indication that a quantitative trait, life history, is being modified through mass action on genetic variation. Interestingly, there’s also the parallel to humans insofar as the X chromosome seems to have more signatures of selection and variation in this evolutionary experiment. Next question: who’s working on experimental evolution of 600 generations in mice?
Citation: Burke, Molly K., Dunham, Joseph P., Shahrestani, Parvin, Thornton, Kevin R., Rose, Michael R., & Long, Anthony D. (2010). Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature. doi:10.1038/nature09352
Image Credit: Karl Magnacca