Evolutionary genetics as a field emerged in the early 20th century. There were some upsides to this. R. A. Fisher was alive, so incredibly brilliant theoretical minds could focus upon the project of formalizing evolutionary process and fusing it with Mendelian genetics. And, frankly, there are situations where data-free theorizing is best, because that sort of theorizing is at least blind to what the solutions should be. But there were also many downsides to this early flowering of theoretical evolutionary biology. The reality that biologists were not clear as to the nature of the biomolecular substrate of inheritance, DNA, was not a hindrance for most of the high-level abstraction. But to trace patterns of transmission of characters, and implicitly genotypes, within populations, researchers relied upon classical phenotypic markers. This meant that theoretical speculation advanced rapidly into confusing and tendentious terrain, while the empirical data sets needed to test the questions at issue were simply not sufficient to resolve the debates. The emergence of molecular markers in the 1960s, and the maturation of genomics in the 2000s, have revolutionized the empirical domain of evolutionary genetics. To use a rough analogy, the large data sets of the present offer up raw material for the machinery of theory to sift, process, and refine.
A new paper in Nature, Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations, is a perfect illustration of this:
The dynamics of adaptation determine which mutations fix in a population, and hence how reproducible evolution will be….Here we use whole-genome whole-population sequencing to examine the dynamics of genome sequence evolution at high temporal resolution in 40 replicate Saccharomyces cerevisiae populations growing in rich medium for 1,000 generations. We find pervasive genetic hitchhiking: multiple mutations arise and move synchronously through the population as mutational ‘cohorts’. Multiple clonal cohorts are often present simultaneously, competing with each other in the same population. Our results show that patterns of sequence evolution are driven by a balance between these chance effects of hitchhiking and interference, which increase stochastic variation in evolutionary outcomes, and the deterministic action of selection on individual mutations, which favours parallel evolutionary solutions in replicate populations.
The specific question here falls under the general set of queries relating to the trajectory of mutations within populations. A stylized model may be that within large populations a favored mutation emerges periodically. Not at a uniform rate, but at one defined by a Poisson distribution. This basically means that new favored mutations are rare events, and that the variance in their number per interval equals the mean (a hallmark of the Poisson). A perfect “spherical cow” model might be one where a sequence of favored mutations emerges, and each rapidly sweeps up to fixation (frequency 0→1.0), one after the other as independent events. Conveniently these independent events can be analyzed with simpler models than a more cluttered space of various favored mutations crowding each other out.
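The idealized arrival process can be sketched in a few lines of Python. The population size and beneficial mutation rate below are hypothetical illustration values, chosen only to make the Poisson character of the arrivals visible:

```python
import math
import random

# Toy model of the idealized dynamic: beneficial mutations arrive as rare,
# independent Poisson events. N and mu_b are hypothetical, not from the paper.
random.seed(1)

N = 10**5        # population size (hypothetical)
mu_b = 2e-8      # beneficial mutation rate per genome per generation (hypothetical)
lam = N * mu_b   # expected new beneficial mutations per generation

def poisson(lam):
    """Draw a Poisson variate via Knuth's algorithm (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

counts = [poisson(lam) for _ in range(100_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# For a Poisson process the variance equals the mean
print(f"mean={mean:.4f}  variance={var:.4f}  (lambda={lam})")
```

Most generations produce zero new beneficial mutations, and the sample mean and variance come out essentially identical, which is what the stylized model assumes.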
There are some issues with this model on the face of it. First, multiple mutations may emerge at the same time. There’s nothing in nature that prevents this, even if it is theoretically inconvenient. Second, these mutations are embedded in the physical genome, which is arranged sequentially. The favored variant is flanked by a large region of sequence with which it is “linked.” Therefore its rise in frequency is going to bring along other variants in its sweep, through the hitchhiking process. This second phenomenon illustrates the stochasticity of selection itself. Recall that random genetic drift changes allele frequencies through conventional sampling processes, with greater generation-to-generation variation in small populations. But even notionally deterministic forces such as selection, which favor particular alleles in a biased manner, are going to have random effects, because there is no rhyme or reason to which variants happen to sit in the flanking regions.
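Hitchhiking is easy to see in a minimal haploid Wright-Fisher sketch (my toy model, not the paper's): a neutral marker that happens to start on the same background as a new beneficial mutation gets dragged along whenever the sweep succeeds. The population size and selection coefficient are hypothetical:

```python
import random

# Toy hitchhiking model: each individual is a (driver, marker) pair, the
# marker is selectively neutral, and there is no recombination, so the
# marker's fate is chained to the driver's. N and s are hypothetical.
random.seed(2)

N = 500    # population size (hypothetical)
s = 0.05   # selective advantage of the driver mutation (hypothetical)

def run_to_fixation():
    # One founder carries both the driver and the linked neutral marker.
    pop = [(1, 1)] + [(0, 0)] * (N - 1)
    while 0 < sum(d for d, _ in pop) < N:
        weights = [1 + s * d for d, _ in pop]          # selection on the driver only
        pop = random.choices(pop, weights=weights, k=N)  # Wright-Fisher resampling
    driver_fixed = pop[0][0] == 1
    marker_freq = sum(m for _, m in pop) / N
    return driver_fixed, marker_freq

results = [run_to_fixation() for _ in range(100)]
hitchhiked = [m for fixed, m in results if fixed]
print(f"{len(hitchhiked)} of 100 attempted sweeps succeeded; "
      f"marker frequencies in successful sweeps: {set(hitchhiked)}")
```

Because the marker is completely linked to the driver, its frequency tracks the driver's exactly, so every successful sweep fixes the neutral marker too; with recombination the hitchhiker would instead be shed at a rate set by its distance from the driver.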
A major step forward in this paper is that the authors combined the large population sizes available in a model organism like S. cerevisiae with whole-genome analysis. The latter allows them to pick up favored mutational variants, annotate them, and examine them for patterns. Basically they looked at 40 haploid yeast lineages over 1,000 generations (haploid yeast can recombine) where mutations at a frequency of 0.10 were already known, and performed 100-fold coverage whole-genome sequencing. There were a total of 480 data points, so the sampling was around 80 generations apart. Their extensive time coverage allowed for further correction of false positives (sequencing errors), above and beyond the 100-fold coverage. The design seems relatively straightforward and elegant to me, though I would have preferred a bigger range of population sizes, as they compared 14 populations at N = 10^6 (large) to 26 at N = 10^5 (small). It seems likely that stochastic effects would be more discernible at lower N's than what they looked at, but that's surely for follow-up papers.
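A back-of-the-envelope sketch shows what 100-fold coverage buys you (this is my illustration of the sampling noise, not the paper's pipeline): the reads supporting a variant at true frequency f are roughly Binomial(depth, f), so a variant at 10% frequency is estimated with a standard error of about sqrt(f(1-f)/depth) ≈ 3 percentage points, and repeated time points help separate real trajectories from one-off sequencing artifacts:

```python
import random

# Simulate read sampling at 100x coverage for a variant at true frequency
# 0.10: each read independently carries the variant with probability f.
random.seed(3)

depth = 100   # sequencing coverage
f = 0.10      # true variant frequency

estimates = [sum(random.random() < f for _ in range(depth)) / depth
             for _ in range(20_000)]
mean = sum(estimates) / len(estimates)
sd = (sum((e - mean) ** 2 for e in estimates) / len(estimates)) ** 0.5
# Binomial theory predicts sd = sqrt(0.1 * 0.9 / 100) = 0.03
print(f"frequency estimate at {depth}x: mean={mean:.3f}, sd={sd:.3f}")
```

A single time point near the 0.10 detection threshold is therefore fairly noisy, which is why the dense time series matters.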
As the curve above illustrates, they found an excess of time points where there were either zero mutations or more than the expected number of mutations (as defined by a Poisson distribution). In the idealized model I outlined above you’d have periodic novel mutations, with occasional clusters, tailing off rapidly as the number segregating increases. These mutations would quickly sweep through the population to fixation. What the results here illustrate is that the real dynamic is more dispersed: more samples with no mutations, and more with many mutations, than would be expected. This seems to be underpinned by the fact that in multiple cases you have combined mutations driving the same sweep. Not only does this increase the probability of fixation, but it also interferes with other mutational sweeps. Increasing the population size seems to make the multiple-mutation scenario more common…but it also results in more interference, so strangely these mutations are less likely to fix in the population! (This is where I would like a larger range of N’s to test how robust this prediction is.)
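A compound-Poisson toy model shows why cohorts of hitchhiking mutations produce exactly this pattern: an excess of both empty and crowded time points relative to a plain Poisson with the same mean. The rates and cohort sizes below are hypothetical, not fitted to the paper's data:

```python
import math
import random

# Compare two arrival models with the same mean count per sampled interval:
# (1) independent single mutations, and (2) rarer arrivals each dragging a
# cohort of 1-4 linked mutations. All rates here are hypothetical.
random.seed(4)

def poisson(lam):
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n = 50_000
# Independent sweeps: one mutation per arrival, rate 1.0 per interval.
solo = [poisson(1.0) for _ in range(n)]
# Cohort sweeps: arrivals at rate 0.4, each carrying 1-4 linked mutations
# (mean cohort size 2.5, so the overall mean is also 0.4 * 2.5 = 1.0).
cohort = [sum(random.randint(1, 4) for _ in range(poisson(0.4)))
          for _ in range(n)]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

m1, v1 = mean_var(solo)
m2, v2 = mean_var(cohort)
print(f"independent: mean={m1:.2f} var={v1:.2f}")  # variance ~ mean
print(f"cohorts:     mean={m2:.2f} var={v2:.2f}")  # variance >> mean
print(f"fraction of empty intervals: independent={solo.count(0)/n:.2f} "
      f"cohorts={cohort.count(0)/n:.2f}")
```

With the same average number of mutations, the cohort model yields both more intervals with zero mutations and a fatter tail of intervals with many, which is the signature of overdispersion the authors report.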
There are many empirical results in this work that don’t fall into elegant verbal models which starkly present “A Grand Unified Theory of Evolutionary Genetics.” So I thought I would step quickly into the distribution of favored mutations as illustrated in table 2 of the paper. What you see there are variants found in genes where more mutations were observed than expected by chance, which likely biases the set toward adaptively favored variants. Nonsynonymous mutations are those which change the amino acid. They are more numerous than silent mutations (synonymous), or those outside of genes (intergenic). You also see the odd pattern of less fixation within large populations, due to interference among the more numerous favored mutations. Not shown in this table: 24 genes were “hit” by mutations two or more times, across multiple populations. These genes repeatedly targeted by selection carried 141 mutations, but only one was synonymous. Among the rest there was an enrichment for frameshift or nonsense mutations, as opposed to missense. The latter alter one amino acid at a time, and many only modify protein function, rather than radically changing or abolishing it (as is likely in the first two cases). In other words, “driver genes” which seem very likely to be subject to selection through multiple mutations, over and over across populations, tend to suffer drastic mutations. In addition, depending on functional category (e.g., mating vs. cell wall assembly) there were different proportions of missense vs. nonsense/frameshift mutations.
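A rough plausibility check makes clear how extreme that synonymous deficit is. The 25% neutral expectation for the synonymous fraction of coding mutations below is my assumption for illustration, not a number from the paper:

```python
from math import comb

# If roughly a quarter of random coding mutations were synonymous, how
# likely is it to see at most 1 synonymous change among 141 mutations in
# the repeatedly hit genes? Exact binomial tail probability.
n, k, p_syn = 141, 1, 0.25   # p_syn = 0.25 is an assumed neutral expectation

p_at_most_k = sum(comb(n, i) * p_syn**i * (1 - p_syn)**(n - i)
                  for i in range(k + 1))
print(f"P(<= {k} synonymous out of {n} under neutrality) = {p_at_most_k:.2e}")
```

The tail probability is vanishingly small, so whatever the exact neutral expectation, the near-total absence of synonymous changes among these 141 mutations is strong evidence that selection, not chance, is repeatedly targeting these genes.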
I have a hard time summarizing this sort of research in a few sentences. In any case, the review of the results here is cursory at best. This is ultimately only the beginning of a huge area of evolutionary genetics which utilizes the power of genomics to test decades-old theories about general patterns. But I wonder if perhaps what will be uncovered is that old and stale arguments about stylized verbal models are without deep substance. For years public intellectuals such as Richard Dawkins and Stephen Jay Gould argued about the roles of determinism and contingency in evolutionary biology. What results such as the above are telling us is that all this commentary was beside the point, because both chance and determinism have complex and interleaved roles in the evolutionary process. The exact nature of this dance is to be empirically determined, though there is thankfully a robust theoretical scaffold already in place.