The trait of lactase persistence (lactose tolerance) is probably one of the better schoolbook examples of natural selection in human populations. The reasons for this are two-fold. There is a very strong signature of selection around a specific gene known to be associated with the trait in many populations. And there is a very compelling historical narrative which explains rather neatly how this particular functional change could have undergone such strong selection within the past ~5,000 years across these populations. But the elucidation of the origin and spread of this genetic adaptation is also interesting because it looks as if it was not a singular event. Populations as disparate as Arabians, Danes, and Masai seem to carry different alleles around the locus of interest which confer the ability to digest milk. This illustrates the fact that when selection pressures have a viable target, there is a rapid response on the genomic level. At some point during a mammal’s maturation the regulatory pathway which produces the lactase enzyme shuts down. Yet within numerous human populations this gradual shutdown process has been short-circuited.
The variety of response in relation to this adaptation was brought home to me as I read Diversity of Lactase Persistence Alleles in Ethiopia – Signature of a Soft Selective Sweep, in the latest issue of The American Journal of Human Genetics:
The two phylogenies above represent Mycobacterium tuberculosis, to the left, and human mitochondrial DNA (passed down the maternal line) on the right. The figure was pulled from the paper Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans, which just came out and has naturally been making a splash. As the title implies, the paper concludes that humans and tuberculosis have been each other’s “partners,” after a fashion, for the whole existence of modern humanity. The main method here is brute force and straightforward: by sequencing 259 tuberculosis strains from all across the world, the authors managed to make relatively robust phylogeographic inferences. Throwing data at a question usually resolves something. The correspondence between human and pathogen lineages is qualitatively uncanny, and there is plenty of statistical footwork within the body of the text to confirm it more rigorously.
For many the image of evolutionary processes brings to mind something on a macro scale: perhaps the changing nature of protean life on earth writ large, depicted on a broad canvas over millions of years and across geological scales, as in David Attenborough’s majestic documentaries. But one can also reduce the phenomenon to a finer grain on a concrete level, as in specific DNA molecules. Or one can transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.
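As a minimal sketch of that algebraic rendering, assuming the textbook deterministic model of genic (haploid) selection with no drift, mutation, or migration, the frequency p of a favored allele with selective advantage s changes each generation as p′ = p(1 + s) / (1 + s·p). The function name `allele_trajectory` and the parameter values are illustrative only.

```python
def allele_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Deterministic genic selection: p' = p * (1 + s) / (1 + s * p).

    p0 is the starting frequency of the favored allele and s its selective
    advantage per generation. Drift, mutation and migration are ignored.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Mean fitness is 1 + s * p when the alternative allele has fitness 1.
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers: a rare allele (1%) with a 5% advantage, followed
    # for 200 generations (on the order of 5,000 years at ~25 years each).
    traj = allele_trajectory(0.01, 0.05, 200)
    print(f"start {traj[0]:.3f}, end {traj[-1]:.3f}")
```

Under this model the odds p / (1 − p) are multiplied by exactly (1 + s) every generation, which is why even a modest advantage can carry a rare allele close to fixation within a few hundred generations.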
Obviously evolutionary processes are not fundamentally just the flux of genetic elements, but genes are crucial to the phenomenon in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it cannot be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being (Creationists depict evolution as a religion-like cult, which imparts the general flavor of some of the meta-science and philosophy which serves as intellectual subtext).
There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, when you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates with regard to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*
Evolutionary genetics as a field emerged in the early 20th century, long before there were rich data sets to test it against. There were some upsides to this. R. A. Fisher was alive, so there were some incredibly brilliant theoretical minds who could focus upon the project of formalizing evolutionary process and fusing it with Mendelian genetics. And, frankly, there are situations where data-free theorizing is best because that sort of theorizing is at least blind to what the solutions should be. But there were also many downsides to this early flowering of theoretical evolutionary biology. The fact that biologists were not clear as to the nature of the biomolecular substrate of inheritance, DNA, was not a hindrance for most of the high-level abstraction. But to trace patterns of transmission of characters, and implicitly genotypes, within populations, researchers relied upon classical phenotypic markers. This meant that the theoretical speculation advanced rapidly into confusing and tendentious terrain, while the empirical data sets needed to resolve the questions at issue were simply not sufficient. The emergence of molecular markers in the 1960s, and the maturation of genomics in the 2000s, have revolutionized the empirical domain of evolutionary genetics. To use a rough analogy, the large data sets of the present offer up raw material for the machinery of theory to sift, process, and refine.
A new paper in Nature is a perfect illustration of this, Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations:
One of the elementary aspects of understanding genetics on a biophysical scale is to characterize the set of processes which span the chasm between the raw sequence information of base pairs (e.g. AGCGGTCGCAAG….) and the assorted macromolecules which are woven together to create the collection of tissues, and enable the physiological processes, which result in the organism. This suite of phenomena is encapsulated most succinctly in the often maligned Central Dogma of Molecular Biology. In short, the information of the DNA sequence is transcribed into RNA, which is in turn translated into proteins. For greater accuracy and precision one must always add caveats for phenomena such as splicing. The range of processes is so baroque that molecular genetics has become a massive enterprise, to a great extent superseding classical Mendelian genetics.
One critical structural detail from an evolutionary perspective is that the amino acids which are the building blocks of proteins are generally encoded by multiple nucleotide triplets, or codons. For example the amino acid glycine is “four-fold degenerate”: GGA, GGG, GGC, and GGU (in RNA, uracil, U, substitutes for the thymine, T, of DNA) all encode it. Notice that the variation is confined to the third position in the codon. Altering the first or second position would change the amino acid end product, and possibly perturb the function of the final protein (or, if it created a stop codon, truncate translation altogether). Changes at the third position here are synonymous substitutions because they don’t change the functional import of the sequence, as opposed to nonsynonymous substitutions (which may abolish or change function). In an evolutionary context one may presume that these synonymous substitutions are “silent.” Because natural selection operates upon heritable variation of a phenotype, and synonymous substitutions presumably do not change phenotype, it is often assumed that evolutionary change at these bases is selectively neutral. In contrast, nonsynonymous changes may be deleterious or beneficial (far more likely the former than the latter, because breaking contingent complexity is easier than creating new contingent complexity). Therefore the ratio of genetic change at nonsynonymous and synonymous sites across lineages has been a common measure of possible selection on a gene.
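To make the degeneracy concrete, here is a minimal sketch that builds the standard genetic code (as RNA codons, with U in place of T) and classifies every single-base change to a codon as synonymous or nonsynonymous. The names `CODON_TABLE`, `synonymous_codons` and `classify_point_mutations` are illustrative, and changes that create a stop codon are counted as nonsynonymous, which is one common convention. Tallies of these two classes of sites are the raw material for the nonsynonymous-to-synonymous ratio described above.

```python
BASES = "UCAG"
# Standard code, one-letter amino acids in U/C/A/G order for positions 1, 2, 3;
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons encoding the given one-letter amino acid, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def classify_point_mutations(codon: str) -> list[tuple[int, str, str]]:
    """Return (position, mutant codon, class) for all 9 single-base changes."""
    original = CODON_TABLE[codon]
    results = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # A change to a stop codon ("*") also counts as nonsynonymous here.
            kind = "synonymous" if CODON_TABLE[mutant] == original else "nonsynonymous"
            results.append((pos + 1, mutant, kind))
    return results


if __name__ == "__main__":
    print("Glycine codons:", synonymous_codons("G"))
    for position, mutant, kind in classify_point_mutations("GGA"):
        print(f"position {position}: GGA -> {mutant} ({CODON_TABLE[mutant]}) {kind}")
```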
Every now and then I’m asked about the ‘aquatic ape hypothesis’. My standard response is that there’s nothing to see, and everyone should just move on. But reading a new (open access) paper in Nature, Great ape genetic diversity and population history, brought it to mind again. The reason is this section of the legend of figure 1: “The Sanaga River forms a natural boundary between Nigeria–Cameroon and central chimpanzee populations whereas the Congo River separates the bonobo population from the central and eastern chimpanzees.” I knew of the latter division. The former was novel to me. In fact I’d never even heard of the Sanaga River prior to this paper. Though the Congo is clearly a significant geological and hydrological entity, I’m not quite so sure of the Sanaga. The division between the chimpanzees of Nigeria-Cameroon and those of the western Congo region may be one with an overdetermined number of causes. Nevertheless, taking these riverine features as given parameters in generating allopatric speciation and subspecies-level differences, I am struck by the contrast between ourselves and our cousins. In particular, the phylogeny above seems to imply that bonobos and common chimpanzees diverged on the order of ~2 million years ago, while the Nigeria-Cameroon population separated from the western Congo population ~500,000 years before the present (depending on the method of inference you rely on, though the qualitative insight here is preserved even if you switch them around). Though it took H. sapiens sapiens to break out of the world island of Afro-Eurasia, even our erectine cousins pushed on toward the southeastern extremities of Eurasia over 1 million years ago. It seems then that our savanna ape lineage is characterized by wanderlust and a lack of fear of water.
The Y chromosome is strange. It’s gene poor and loaded with repeats. That’s one reason mtDNA phylogenetic and phylogeographic analysis preceded that of the Y chromosome by about 10-15 years (the other major reason in the pre-PCR age is that mtDNA is present in many copies per cell, and so is very abundant). While the hypervariable region of mtDNA is an excellent molecular clock because of its high mutation rate (though at a deep enough time depth this causes problems, as bases start to turn over repeatedly), in the pre-next-generation-sequencing era hunting around the Y chromosome for SNPs was tedious (a significant portion of Spencer Wells’ Journey of Man focused on the nitty gritty of extraction and preparation).
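The turnover problem can be sketched with the Jukes-Cantor model, used here only as an assumption: it is the simplest textbook substitution model, with equal rates among all bases, and real hypervariable-region data violate it. Under that model the observed fraction of differing sites saturates at 0.75 however much time passes, so raw counts increasingly underestimate deep divergences unless corrected. The function names are illustrative.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jukes_cantor_distance(p: float) -> float:
    """Correct an observed p-distance for multiple hits; undefined for p >= 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Observed differences track true distance at first, then flatten out.
    for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
        p = expected_p_distance(d)
        print(f"true d = {d:4.2f}  observed p = {p:.3f}  corrected = {jukes_cantor_distance(p):.3f}")
```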
Despite all this, one of the weirder stories over the past decade in relation to the Y chromosome is the peculiar theory promoted by Oxford geneticist Bryan Sykes, and outlined in his book Adam’s Curse: A Future without Men. As I observed above, the Y chromosome has a tendency to fill up with genetic garbage (since it does not recombine, deleterious mutations tend to accumulate). There are a few important functional regions (e.g., SRY), but there’s also a reason that sex-linked diseases occur: because the Y carries so few genes, males must rely on their single X chromosome, with no second copy to pick up the slack when it carries a defective allele. Extrapolating this genetic decay, Sykes posited that human males would disappear within ~10 million years as the process worked out its inevitable logic. Needless to say, most scientists were skeptical. Extrapolating without seeing if the projections pass the sniff test is a fool’s errand. And in any case there’s no Law of Nature that sex determination has to be via the Y chromosome. Birds and reptiles have males despite sex determination systems that differ from ours.
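The accumulation argument is usually formalized as Muller’s ratchet. Below is a minimal simulation sketch under stated assumptions: multiplicative fitness (1 − s)^k for an individual carrying k deleterious mutations, Poisson-distributed new mutations, and no recombination or back mutation. The function names and parameter values are hypothetical. Because the least-loaded class can be lost by chance but never rebuilt, the minimum load can only ratchet upward; how quickly it clicks depends heavily on population size, mutation rate, and the strength of selection.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mullers_ratchet(pop_size: int, mut_rate: float, s: float,
                    generations: int, seed: int = 1) -> list[int]:
    """Return the smallest mutation load present in each generation.

    Each asexual individual is reduced to its count k of deleterious mutations,
    with fitness (1 - s) ** k. There is no recombination and no back mutation.
    """
    rng = random.Random(seed)
    population = [0] * pop_size  # everyone starts mutation-free
    min_loads = [0]
    for _ in range(generations):
        # Selection plus drift: parents drawn in proportion to fitness.
        weights = [(1.0 - s) ** k for k in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Mutation: offspring inherit the parent's load plus new hits.
        population = [k + poisson(rng, mut_rate) for k in parents]
        min_loads.append(min(population))
    return min_loads


if __name__ == "__main__":
    loads = mullers_ratchet(pop_size=100, mut_rate=0.5, s=0.02, generations=500)
    print("least-loaded class every 100 generations:", loads[::100])
```

Without recombination a lost class cannot be reassembled from pieces of other genomes, which is one standard formal rendering of why non-recombining chromosomes such as the Y accumulate damage.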
Sexual selection is a big deal. A few years ago Geoffrey Miller wrote The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature, which seemed to herald a renaissance in public awareness of this evolutionary phenomenon, triggered in part by debates over Amotz Zahavi’s Handicap Principle in the 1970s. Of course Charles Darwin discussed the process in the 19th century, and it has always been part of the arsenal of the evolutionary biologist (I first encountered it in Jared Diamond’s The Third Chimpanzee, where he lent some credence to Darwin’s supposition that human racial differences may be a consequence of sexual selection). But this bump in recognition for sexual selection seems to be accompanied by its co-option as a deus ex machina for all sorts of unexplained events. And as they say, that which explains everything explains nothing.
To get a better sense of the current scientific literature I consulted A Guide to Sexual Selection Theory in the Annual Review of Ecology, Evolution, and Systematics. The image above is from an actual box in this review! Normally technical boxes illuminate with an air of superior authority (e.g., “it therefore follows from eq. 1…”), but it seems to me that the admission that a parameter can be represented by the verbal assertion that it’s complicated tells us something about the state of sexual selection theory. In short: its formal basis is baroque because the dynamic itself is not amenable to easy decomposition.
It’s an exciting time for those interested in the evolutionary genomics of the dog. In 2010 a big SNP-array paper came out, Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Today we’re going whole genome, which is important because many of the SNP-arrays are ascertained on domestic dogs (i.e., they are designed to pick up dog variation, and so may distort our perception of the variation in wolves). Recently I talked about an analysis of the evolutionary genomics of the dog, The genomics of selection in dogs and the parallel evolution between dogs and humans. The most interesting result from that group was to push the divergence of the dog and wolf lineages further back in time, to ~30,000 years ago, in line with some archaeological and mtDNA finds. I did not find their arguments for the origin of the dog in East Asia convincing. Now a new preprint on arXiv, Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs, pushes this even further.
Since the last post on genomic tools was a bit parochial, I figure it’s acceptable to put up this notice for the Bay Area Population Genomics meeting on June 8th. Registration closes on June 3rd (that is, Monday). Here’s the announcement:
We are excited to be hosting the 8th meeting of the Bay Area Population Genomics group at UCSF Mission Bay on June 8th! Thanks to support from Ancestry.com and the Institute for Quantitative Biosciences (QB3 @ UCSF), this conference will include breakfast and lunch. In addition, we will also have a reception during the poster session, so we highly encourage you to preview your work at BAPG before heading out to summer conferences.
Please register at http://tinyurl.com/a8h6uo8, and sign up to give a talk or poster. Registration is again free, but required by June 3rd.
There is paid parking in the lot/garage at the corner of 4th and 16th streets, and we have a limited number of parking passes for people that sign up to present and/or make a strong effort to carpool (please email me for details).
We are very much looking forward to seeing you at UCSF in a few weeks!
To the left is a figure which illustrates the phylogenetic inferences from a new paper in Nature Communications, The genomics of selection in dogs and the parallel evolution between dogs and humans (see Carl Zimmer’s coverage in The New York Times). Why is this paper important? The first thing that jumped out at me is that because they’re using whole genomes (~10X coverage) of a selection of dogs and wolves, the results aren’t as subject to the bias introduced by genotyping wolves on “chips” of polymorphisms discovered in dogs (see: Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication). The second is that the coalescence of the dog and wolf lineages is pushed further back in time than in earlier genetic work, by a factor of three. A standard model for the origin of dogs is that they arose in the Middle East ~10,000-15,000 years ago, possibly as part of the broad shift of lifestyles which culminated in the Neolithic Revolution.
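A toy illustration of that ascertainment problem, using made-up allele frequencies at six hypothetical sites rather than any real data: a chip built only from sites found variable in dogs never contains variants private to wolves, so wolf diversity measured on it is truncated. The site list and function names are illustrative assumptions.

```python
# Hypothetical alternate-allele frequencies at six sites: (dog, wolf).
SITES = [
    (0.5, 0.4),  # polymorphic in both
    (0.3, 0.5),  # polymorphic in both
    (0.2, 0.0),  # dog-only variant
    (0.0, 0.3),  # wolf-only variant
    (0.0, 0.5),  # wolf-only variant
    (0.0, 0.1),  # wolf-only variant
]


def is_polymorphic(freq: float) -> bool:
    """A site is variable if neither allele is fixed."""
    return 0.0 < freq < 1.0


def ascertain(sites, discovery_index: int):
    """Mimic array design: keep only sites variable in the discovery panel."""
    return [s for s in sites if is_polymorphic(s[discovery_index])]


def wolf_variation_captured(sites, chip) -> float:
    """Fraction of wolf-polymorphic sites that made it onto the chip."""
    wolf_all = sum(is_polymorphic(w) for _, w in sites)
    wolf_on_chip = sum(is_polymorphic(w) for _, w in chip)
    return wolf_on_chip / wolf_all


if __name__ == "__main__":
    dog_chip = ascertain(SITES, discovery_index=0)
    print(f"{len(dog_chip)} of {len(SITES)} sites on a dog-ascertained chip")
    print(f"wolf polymorphisms captured: {wolf_variation_captured(SITES, dog_chip):.0%}")
```

Whole-genome sequencing of wolves avoids this particular filter in principle, since variants are discovered in every sample that is sequenced.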
This model is now in serious question. Though there have long been claims, based on morphology, of domestic canid fossils older than the ~15,000-year-old ones discovered in the Middle East, this year saw the publication of ancient mtDNA results from ~30,000 years before the present which imply that putative domestic and wolf lineages had separated by at least that date. Over the past few years I have wondered about the specific nature of the emergence of both modern humans and modern dogs, and their co-evolutionary trajectory over the Pleistocene and into the Holocene, in light of these results.
What a great age we live in. Until recently critical parameters in population genetics such as mutation rates had to be inferred and assumed, even though they served as the bases for much more complex inferences. Now with humans (and humans are only the beginning!) much of what was inferred is being assessed in a more direct fashion. Catarina Campbell and Evan Eichler have a review in Trends in Genetics which surveys the field as it stands now, Properties and rates of germline mutations in humans. Notice that there’s a rough convergence, using pedigree analysis, on a mutation rate in the low 10⁻⁸ range (per base pair per generation). Additionally, it does seem that a disproportionate number of novel mutations come through the paternal lineage via sperm. This should increase our moderate worry about older fathers (something reiterated in the piece, with caveats). Finally, the authors suggest these results are a floor for the mutation rate, in part due to the long-standing conflict with the inferred ‘evolutionary rates,’ which are higher. This matters because the value of the mutation rate is obviously critical when inferring the last common ancestors between lineages.
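Why the rate matters can be shown with the simplest strict-clock arithmetic, a sketch that assumes a constant rate and ignores ancestral polymorphism: if two lineages differ at a fraction d of sites and each accumulates mutations at rate μ per site per generation, they split roughly T = d / (2μ) generations ago. The divergence value and the 25-year generation time below are hypothetical, chosen only to show that halving μ doubles the inferred date.

```python
def divergence_time_years(divergence: float, mu: float,
                          generation_years: float = 25.0) -> float:
    """Strict-clock split time in years: T = d / (2 * mu) generations.

    divergence: fraction of sites differing between the two lineages
    mu: mutation rate per site per generation
    generation_years: assumed (hypothetical) generation time
    The factor of 2 is there because both lineages mutate independently.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    generations = divergence / (2.0 * mu)
    return generations * generation_years


if __name__ == "__main__":
    d = 0.001  # hypothetical: one difference per 1,000 sites
    for mu in (2.5e-8, 1.2e-8):  # a higher rate vs. one in the low 10^-8 range
        years = divergence_time_years(d, mu)
        print(f"mu = {mu:.1e} per site per generation -> about {years:,.0f} years")
```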
E. O. Wilson has an op-ed in the WSJ which I find quite interesting, Great Scientist ≠ Good at Math:
For many young people who aspire to be scientists, the great bugbear is mathematics. Without advanced math, how can you do serious work in the sciences? Well, I have a professional secret to share: Many of the most successful scientists in the world today are mathematically no more than semiliterate.
This imbalance is especially the case in biology, where factors in a real-life phenomenon are often misunderstood or never noticed in the first place. The annals of theoretical biology are clogged with mathematical models that either can be safely ignored or, when tested, fail. Possibly no more than 10% have any lasting value. Only those linked solidly to knowledge of real living systems have much chance of being used.
Wilson has been on this for a bit now, to the bewilderment of some of the scientists I follow on Twitter (granted, the people I follow tend to be quantitative genomics types whose backgrounds may have been in math, physics, or statistics). Two immediate things come to mind reading this. First, a disproportionate number of the famous and successful scientists alive today are old, like E. O. Wilson. Just because you could get by with a certain level of mathematical fluency as an enfant terrible in the 1970s does not mean that it will cut it in the 2010s. Second, great scientists who are mathematically weak often have collaborators, post-docs, and graduate students who do their bidding. It might be a different matter if you aren’t one of the Great Ones of the earth. From what I can tell, scientists doing the hiring who don’t have mathematical skills prefer candidates who do.
One could argue that William Donald Hamilton is one of the most prominent scientific figures to have shaped the public understanding of the world around us, and yet one of whom the public is nonetheless totally unaware. Many well-educated individuals with an interest in science have some understanding of the concept of inclusive fitness, at least in an inchoate sense. And there is also an awareness that sex is somehow a biological conundrum, with the Red Queen hypothesis (amongst others) stepping into the explanatory void. Hamilton’s standing within science is beyond question, and it was externally validated when he was awarded the Crafoord Prize, which attempts to fill in the disciplinary gaps in the Nobel awards. And yet to the world at large he is a shadowy entity in the diffuse and anonymous background of science from which writers draw their source material.
Hamilton’s influence was particularly strong upon the popular expositions of evolutionary biology of Richard Dawkins and Matt Ridley. His theories as to the origins of altruism shaped how E. O. Wilson and Robert Trivers viewed the question more broadly. Finally, one could argue that the Hamiltonian paradigm was one of the primary sources of antagonism for Stephen Jay Gould and his relationship to adaptationism and the biological basis of human behavior. Hamilton’s scientific opinions were complex, and too often they have been reduced to inaccurate essences. This is somewhat evident in Hamilton’s own collections of papers, especially the first volume, Narrow Roads of Gene Land: The Evolution of Social Behavior. The legend of Hamilton’s framework for inclusive fitness had had many decades to mature and twist into shapes which the creator did not necessarily agree with, and he attempted to set the record straight where he thought it appropriate (e.g., inclusive fitness is not just about the origin of eusocial insects). In Nature’s Oracle Ullica Segerstrale extends Hamilton’s own reflections, and introduces a more objective third-party observer into the process of evaluating the historical arc of scientific production of this one particular man.
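The core of inclusive fitness is usually summarized as Hamilton’s rule: an altruistic act is favored when r·B > C, where r is the coefficient of relatedness between actor and recipient, B the benefit to the recipient, and C the cost to the actor. A minimal sketch follows; the benefit and cost values are hypothetical, the relatedness values are the standard ones for an outbred diploid pedigree, and this bare form glosses over the assumptions (additive costs and benefits, among others) under which the rule holds.

```python
# Standard coefficients of relatedness in an outbred diploid pedigree.
RELATEDNESS = {"full sibling": 0.5, "half sibling": 0.25, "first cousin": 0.125}


def hamilton_favors(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: altruism is favored when r * benefit exceeds cost."""
    return r * benefit > cost


if __name__ == "__main__":
    benefit, cost = 3.0, 1.0  # hypothetical fitness units
    for relative, r in RELATEDNESS.items():
        verdict = "favored" if hamilton_favors(r, benefit, cost) else "not favored"
        print(f"{relative}: r*B = {r * benefit:.3f} vs C = {cost}: {verdict}")
```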
My friend Aziz Poonawalla* asked me to comment on this piece in The New Scientist, The father of all men is 340,000 years old. My primary thought: the title probably confuses people, while the article itself is quite serviceable. The first paragraph condenses the specific and precise scientific detail well:
Albert Perry carried a secret in his DNA: a Y chromosome so distinctive that it reveals new information about the origin of our species. It shows that the last common male ancestor down the paternal line of our species is over twice as old as we thought
In the post below I alluded to the views of R. A. Fisher. This was a moderately dangerous move on my part because many of Fisher’s views have been transmitted only through later researchers, who may have lacked a clear understanding of what Fisher himself was trying to say. Heap on top of that the reality that the debate between Fisher and Sewall Wright was often abstruse for the evolutionary biologists who nevertheless managed to take sides and transmit their understandings of the conflict, and it’s a recipe for misrepresentation. With that in mind let me enter into the record an email from a friend who has engaged in a deep reading of Fisher, and attempted to understand his reasoning (no, this is not A. W. F. Edwards!):