Some of the topics that I discuss in this space may seem abstruse, but they’re often elaborations upon rather elementary models of the world. When it comes to a subject like evolutionary genetics, deep thinking extending from a few simple conceptual anchors yields great insight. Those anchors trace back to the foundations of Mendelian genetics. For diploid organisms the law of equal segregation states that each of an organism’s two gene copies has an equal probability of being passed on to any given offspring. This explains the simple power of Punnett squares and the inheritance patterns of recessive traits. The law of independent assortment states that genes (and therefore implicitly Mendelian traits) are passed independently of each other from parents to offspring. These abstractions are concretized on the cytological and molecular genetic scale during meiosis, as homologous chromosomes, which are composed of packed sequences of genes, partition themselves into separate haploid gametes (sperm and egg*). Early in meiosis, during prophase I, crossing-over between homologs results in genetic recombination, which largely preserves the law of independent assortment even when genes are on the same chromosome, by breaking apart associations between physical genetic regions which would otherwise be co-inherited (if the genes are very close together they remain linked).
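To make the two laws concrete, here is a minimal Python sketch of a Punnett square. It is an illustration under stated assumptions, not a general genetics tool: the function names (`gametes`, `punnett`, `phenotype`) are made up for this example, alleles are single letters with uppercase meaning dominant, and the loci are assumed to be unlinked.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All equally likely gametes for a genotype given as one allele pair per locus.

    Equal segregation: each allele of a pair is passed on with probability 1/2.
    Independent assortment: the choice at one locus does not affect another,
    so the gametes are the Cartesian product over loci (assumes unlinked loci).
    """
    return list(product(*genotype))


def punnett(mother, father):
    """Offspring genotype probabilities from crossing two parents."""
    counts = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        # Sort alleles within each locus so that "aA" and "Aa" count as one genotype.
        child = tuple("".join(sorted(pair)) for pair in zip(egg, sperm))
        counts[child] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def phenotype(child):
    """Dominant at a locus if at least one uppercase allele is present."""
    return tuple(
        "dominant" if any(allele.isupper() for allele in locus) else "recessive"
        for locus in child
    )


# Monohybrid cross of two carriers: Aa x Aa.
mono = punnett(["Aa"], ["Aa"])

# Dihybrid cross with two unlinked loci: AaBb x AaBb.
di = punnett(["Aa", "Bb"], ["Aa", "Bb"])
di_pheno = Counter()
for child, prob in di.items():
    di_pheno[phenotype(child)] += prob
```

In the monohybrid cross `mono[("aa",)]` comes out to 1/4, the recessive fraction. In the dihybrid cross the phenotype classes fall into the familiar 9:3:3:1 proportions, which is what independent assortment predicts when linkage is ignored.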
These patterns of classical Mendelian genetics served as the basis for analysis and abstraction which preceded the molecular understanding of these patterns by decades. To move to the stage where I often put the spotlight in this blog, population genetics, one must imagine repeated meioses across sets of individuals. In some ways Mendelian dynamics can be recapitulated at the population genetic scale. If two parents are heterozygote carriers of a gene for a recessive trait there is an expectation that 1/4 of their offspring will exhibit the trait in question. But if the parents have only three offspring, obviously the expectation cannot be met exactly, since no whole number of children is 1/4 of three. Within any given family there will be deviation from the expectation because of sampling noise in the meiotic processes and subsequent fertilization, but by aggregating across the population you can replicate the 3:1 ratio at the heart of recessive traits when heterozygotes are paired. This also falls out of the Hardy-Weinberg Equilibrium, p² + 2pq + q² = 1, where q = 1 – p and q² defines the fraction with recessive expression: with p = 0.5, q² = 0.25.
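The gap between a single family and the pooled population can be sketched in a few lines of Python. This is a toy simulation under assumptions of my choosing: the helper names (`hardy_weinberg`, `carrier_family`), the family size of three, the 20,000 families, and the random seed are all arbitrary illustration values.

```python
import random


def hardy_weinberg(p):
    """Expected genotype fractions (AA, Aa, aa) for frequency p of allele A."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


def carrier_family(n_offspring, rng):
    """Number of recessive (aa) children from one Aa x Aa cross."""
    affected = 0
    for _ in range(n_offspring):
        # Each parent transmits 'a' with probability 1/2 (equal segregation).
        from_mother = rng.random() < 0.5
        from_father = rng.random() < 0.5
        if from_mother and from_father:
            affected += 1
    return affected


rng = random.Random(0)  # arbitrary seed so the run is repeatable
families = [carrier_family(3, rng) for _ in range(20000)]

# No single family of three can show exactly 1/4 affected children,
# but the fraction pooled across all families should sit close to it.
pooled_fraction = sum(families) / (3 * len(families))
expected_recessive = hardy_weinberg(0.5)[2]
```

Each entry of `families` is 0, 1, 2, or 3, so individual families scatter around the expectation, while `pooled_fraction` should land near `expected_recessive`, which is 0.25.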
Population genetics can be thought of as an attempt to understand the flux and flow of the fractions p and q. More traditionally it is the study of the change in allele frequencies as modified by parameters such as mutation, migration, selection, and drift. An allele is a genetic variant, which arises through mutation. Today many people think of a mutation as a change in a DNA base, such as A → G. Such a change produces a single-nucleotide polymorphism (SNP). But the idea of a mutation goes back decades before the understanding that DNA was the physical basis for genetic inheritance. Mutations were initially observed via radical phenotypic transformations, albeit ones assumed to be coded by discrete genetic elements which were variable. Physically, mutations can be due to duplications or deletions of chromosomal regions, not just single base changes. One could argue that rearrangements of karyotype, the number and nature of chromosomes, are mutations as well. All of these have physical explanations. SNPs can emerge via faulty DNA repair. Duplications and deletions can occur during crossing-over. Meiosis is a complex and fraught process, and molecular geneticists have documented many ways by which it can produce copies of the genome with errors (often resulting in pathology). But for the purposes of population genetics what you need to remember is that mutations are changes, variation injected into the gene pool. Migration, like mutation, often introduces variation as well (though one can imagine situations where variation is reduced, such as the case of a population which is swamped out by an isogenic lineage).
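The two faces of migration can be illustrated with a small deterministic sketch in Python. It assumes the textbook one-way migration recursion, in which a fraction m of the gene pool is replaced by migrants each generation, and uses expected heterozygosity 2pq as a stand-in for "variation". The function names, the migration rate of 0.05, and the 50 generations are illustrative assumptions, not values from any dataset.

```python
def migrate(p, m, p_source):
    """One generation of one-way migration: a fraction m of the pool is replaced by migrants."""
    return (1 - m) * p + m * p_source


def heterozygosity(p):
    """Expected heterozygosity 2pq at a locus with two alleles."""
    return 2 * p * (1 - p)


def run(p, m, p_source, generations):
    """Allele frequency after repeated rounds of migration."""
    for _ in range(generations):
        p = migrate(p, m, p_source)
    return p


# Case 1: a monomorphic population (p = 0) receives migrants from a polymorphic source.
gained = run(0.0, 0.05, 0.5, 50)

# Case 2: a polymorphic population (p = 0.5) is swamped by a source fixed for one allele.
swamped = run(0.5, 0.05, 1.0, 50)
```

In the first case heterozygosity rises from zero, so migration adds variation. In the second it falls below its starting value of 0.5 as the recipient converges on the fixed source, which is the swamping scenario mentioned above.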
Selection and drift tend to work in the other direction. They usually remove variation. Positive selection can sweep one allele to fixation, from near zero to 100 percent. As a side effect of genetic linkage, the regions flanking the favored allele are also swept to fixation, resulting in a long homogeneous region of the genome identical to the copy carried by the individual in whom the mutation first arose. Negative selection favors homogeneity by tamping down the variation which mutation introduces, purifying the gene pool. Random genetic drift may seem an odd candidate for a parameter removing variation, but recall that an allele frequency which reaches zero or 100 percent results in a locus which is no longer polymorphic, and which will remain in that state barring new variation from mutation or migration. Over time, without new variation, all polymorphic sites will become monomorphic as one variant comes to dominate each locus. Drift is just a consequence of the sampling variance which you can see within meiosis itself. The law of equal segregation implies you have an equal probability of contributing either of your copies of a gene to your offspring, but with finite offspring you are likely to be somewhat biased toward one or the other allele by chance. Working over a whole population, this sampling results in unequal transmission of alleles from generation to generation, and the inequalities accumulate. This is information lost through the imperfect fidelity with which the gene pool of one generation is replicated in the next. But whereas this imperfection on a finer genetic scale introduces variation (mutation), pooling across the population you are losing variation, as only a subset of the individuals defining particular allele frequencies in a given generation give rise to the whole of the next generation.
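Drift's tendency to erase variation can be seen in a short Python sketch. It assumes a Wright-Fisher style model, which is my choice of illustration: a constant number of diploid individuals, non-overlapping generations, no mutation, migration, or selection. The function name `wright_fisher`, the population size of 20, the 200 replicates, and the seed are all arbitrary.

```python
import random


def wright_fisher(n_individuals, p0, rng, max_generations=100000):
    """Track one allele's frequency under drift alone until it is lost or fixed."""
    n_copies = 2 * n_individuals  # diploid population
    count = round(p0 * n_copies)
    trajectory = [count / n_copies]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # the locus is monomorphic and stays that way
        p = count / n_copies
        # Each gene copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


rng = random.Random(1)  # arbitrary seed so the run is repeatable
runs = [wright_fisher(20, 0.5, rng) for _ in range(200)]

fixed = sum(1 for t in runs if t[-1] == 1.0)
lost = sum(1 for t in runs if t[-1] == 0.0)
```

Every replicate should end at a frequency of zero or one, with `fixed` and `lost` each accounting for roughly half of the runs when the allele starts at 0.5. Nothing favors either allele here; the loss of polymorphism comes purely from sampling a finite number of gene copies each generation.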
Working up the ladder of abstraction, I now want to introduce the concept of a site frequency spectrum. It is truly a simple idea, and is illustrated in the figure at the top of this post. If you imagine a population where individuals are reproducing, there are some common mutations, derived alleles different from the ancestral state, which are nearly fixed in the population. Others are at middling frequency, perhaps being maintained by some sort of balancing selection. But an enormous number of mutations will be singletons, which help define an individual’s distinct mutational load due to de novo variants introduced during the process of meiosis. Many of these will immediately be lost due to chance, or perhaps removed through negative selection. What you see above is the spectrum of derived mutations, binned by the number of copies of each (i.e., the first bar indicates the frequency of derived mutants which can be classed as singletons, the last the class where there are 15 copies segregating within the population). Demography and selection will perturb this distribution. Positive selection will probably skew it more toward the right, as there are more derived variants with many copies. Negative selection should skew it to the left, as mutants deviating from the ancestral state are selected against.
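Computing a site frequency spectrum from data is little more than counting. The Python sketch below assumes a sample of phased chromosomes encoded as strings, with '0' for the ancestral state and '1' for the derived state at each site. Both the encoding and the toy sample are made up for illustration and do not correspond to the figure.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 is the number of sites with exactly i derived copies.

    `haplotypes` is a list of equal-length strings, one per sampled chromosome,
    with '0' for the ancestral state and '1' for the derived state.
    """
    n = len(haplotypes)
    derived_counts = Counter(
        sum(1 for allele in site if allele == "1")
        for site in zip(*haplotypes)  # iterate over sites (columns)
    )
    # Only segregating sites (1 to n-1 derived copies) belong in the spectrum.
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy sample: 4 chromosomes (rows) by 6 sites (columns), made-up data.
sample = [
    "101100",
    "001100",
    "010100",
    "000001",
]
sfs = site_frequency_spectrum(sample)
```

For this toy sample `sfs` is `[3, 1, 1]`: three singletons, one site with two derived copies, and one with three. The site where every chromosome is ancestral is not segregating and drops out of the spectrum.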
Genetics can seem complex, especially theoretical evolutionary models. But the reality is that these models are ultimately grounded in a finite number of relatively simple elements. The difficulty comes from the fact that these interlocking foundational units are numerous, and can be difficult to keep track of. One doesn’t have to know that crossing-over during genetic recombination occurs during synapsis, but at the end of the day the different dimensions of genetics are complementary, as they cross-link and construct a seamless whole.
* I am aware that in the case of the egg only one of the four products of meiosis may develop into a gamete.