There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, as well as the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).
The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.
One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.
- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!
- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.
- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.
- Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian’ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.
- Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).
- If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the likely probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).
The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.
As you may know, when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:
Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.
The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you can not ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
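To put a rough number on "probability of extinction is high," here is a minimal sketch using Kimura's diffusion approximation for the fixation probability of a single new mutant copy. This is a textbook formula, not something taken from the paper, and it leans on assumptions I am adding: an idealized Wright-Fisher population where the effective size equals the census size N, additive (genic) selection with heterozygote advantage s, and a starting frequency of 1/(2N). The function name is mine.

```python
import math


def fixation_probability(n_diploid: int, s: float) -> float:
    """Kimura's diffusion approximation for one new mutant copy.

    Assumptions (not from the paper): Wright-Fisher population of
    n_diploid individuals, effective size equal to census size,
    additive selection with heterozygote advantage s, and a starting
    frequency of 1/(2N).
    """
    p0 = 1.0 / (2 * n_diploid)
    if abs(s) < 1e-12:
        # Neutral limit: fixation probability equals the starting frequency.
        return p0
    # (1 - e^{-2s}) / (1 - e^{-4Ns}); expm1 keeps precision for tiny s.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_diploid * s)


if __name__ == "__main__":
    for s in (0.0, 0.001, 0.01):
        print(s, fixation_probability(10_000, s))
```

When s is small and 4Ns is large the result is close to 2s, so under these assumptions even a new variant with a 1% advantage is lost about 98% of the time.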
Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):
A friend pointed me to the heated comment section of this article in Nature, Rebuilding the genome of a hidden ethnicity. The issue is that Nature originally stated that the Taino, the native people of Puerto Rico, were extinct. That resulted in an avalanche of angry comments, which one of the researchers, Carlos Bustamante, felt he had to address. Eventually Nature updated their text:
CORRECTED: This article originally stated that the Taíno were extinct, which is incorrect. Nature apologizes for the offence caused, and has corrected the text to better explain the research project described.
Here’s Wikipedia on the Taino today:
The Pith: The human X chromosome is subject to more pressure from natural selection, resulting in less genetic diversity. But, the differences in diversity of X chromosomes across human populations seem to be more a function of population history than differences in the power of natural selection across those populations.
In the past few years there has been a finding that the human X chromosome exhibits less genetic diversity than the non-sex regions of the genome, the autosome. Why? On the face of it this might seem inexplicable, but a few basic structural factors derived from the architecture of the human genome present themselves.
First, in males the X chromosome is hemizygous, rendering it more exposed to selection. This is rather straightforward once you move beyond the jargon. Human males have only one copy of genes which express on the X chromosome, because they have only one X chromosome. In contrast, females have two X chromosomes. This is the reason why sex linked traits in humans are disproportionately male. For genes on the X chromosome women can be carriers of many diseases because they have two copies of a gene, and one copy may be functional. In contrast, a male has only a functional or nonfunctional version of the gene, because he has one copy on the X chromosome. This is different from the case on the autosome, where both males and females have two copies of every gene.
This structural divergence matters for the selective dynamics operative upon the X chromosome vs. the autosome. On the autosome recessive traits pay far less of a cost in terms of fitness than they do on the X chromosome, because in the case of the latter they’re much more often exposed to natural selection via males. In the rest of the genome recessive traits only pay the cost of their shortcomings when they’re present as two copies in an individual, homozygotes. A simple quasi-formal example illustrates the process.
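As one possible version of such an example (a sketch of my own for illustration, not necessarily the one the original post went on to give), consider what fraction of the copies of a fully recessive allele are actually "visible" to selection. The assumptions here are mine: Hardy-Weinberg proportions, an equal sex ratio, the same allele frequency q in both sexes, and complete recessivity. The function names are hypothetical.

```python
def exposed_fraction_autosome(q: float) -> float:
    """Fraction of copies of a recessive allele that sit in homozygotes.

    Homozygotes occur at frequency q**2 and carry two copies each, while
    the allele has 2*q copies per individual on average, so the ratio
    simplifies to q. Requires q > 0.
    """
    return (2 * q**2) / (2 * q)


def exposed_fraction_x(q: float) -> float:
    """Fraction of copies of an X-linked recessive allele that are expressed.

    With an equal sex ratio, one third of X chromosomes are in males
    (hemizygous, always exposed) and two thirds are in females, where a
    copy is exposed only if the partner X carries the same allele
    (probability q).
    """
    return 1.0 / 3.0 + (2.0 / 3.0) * q


if __name__ == "__main__":
    q = 0.01
    print("autosome:", exposed_fraction_autosome(q))
    print("X chromosome:", exposed_fraction_x(q))
```

With q = 0.01, about 1% of autosomal copies are exposed in homozygotes, versus about 34% of X-linked copies, nearly all of them via males. That gap is the intuition for why selection against recessive variants is more efficient on the X.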
The Pith: In this post I review a paper which covers the evolutionary dimension of human childbirth. Specifically, the traits and tendencies peculiar to our species, the genes which may underpin those traits and tendencies, and how that may relate to broader public health considerations.
Human babies are special. Unlike the offspring of organisms such as lizards or snakes human babies are exceedingly helpless, and exhibit an incredible amount of neoteny in relation to adults. This is true to some extent for all mammals, but obviously there’s still a difference between a newborn foal and a newborn human. One presumes that the closest analogs to human babies are those of our closest relatives, the “Great Apes.” And certainly the young of chimpanzees exhibit the same element of “cuteness” which is appealing to human adults. Still there is a difference of degree here. As a childophobic friend observed human infants resemble “larvae.” The ultimate and proximate reason for this relative underdevelopment of human newborns is usually attributed to our huge brains, which run up against the limiting factor of the pelvic opening of women. If a human baby developed for much longer through extended gestation then the mortality rates of their mothers during childbirth would rise. Therefore natural selection operated in the direction it could: shortening gestation times. You might say that in some ways then the human newborn is an extra-uterine fetus.
A new paper in PLoS Genetics attempts to fix upon which specific genomic regions might be responsible for this accelerated human gestational clock. An Evolutionary Genomic Approach to Identify Genes Involved in Human Birth Timing:
Does the chart above strike you as strange? What it shows is that the mean fitness of a population drops as you increase the rate of deleterious mutation (many more mutations are deleterious than favorable)…but at some point the fitness of the population bounces back, despite (or perhaps because of?) the deleterious mutations! This would seem, to me, an illustration of bizarro-world evolution. Worse is better! More is less! Deleterious is favorable? By definition deleterious isn’t favorable, so one would have to back up and check one’s premises.
And yet this seems just what a new paper in PLoS ONE is reporting. Purging Deleterious Mutations under Self Fertilization: Paradoxical Recovery in Fitness with Increasing Mutation Rate in Caenorhabditis elegans:
Compensatory mutations can be more frequent under high mutation rates and may alleviate a portion of the fitness lost due to the accumulation of deleterious mutations through epistatic interactions with deleterious mutations. The prolonged maintenance of tightly linked compensatory and deleterious mutations facilitated by self-fertilization may be responsible for the fitness increase as linkage disequilibrium between the compensatory and deleterious mutations preserves their epistatic interaction.
Got that? OK, you probably need some background first….
The number 1 gets a lot more press than -1, and the concept of heterozygosity gets more attention than homozygosity. Concretely the difference between the latter two is rather straightforward. In diploid organisms the genes come in duplicates. If the alleles are the same, then they’re homozygous. If they’re different, then they’re heterozygous. Sex chromosomes can be an exception to this because in the heterogametic sex you generally have only one copy of a gene as one of the chromosomes is sharply truncated. This is why human males are subject to X-linked recessive traits at such a great frequency in comparison to females; recessive expression is irrelevant when you don’t have a compensatory X chromosome to mask the malfunction of one allele.
Of course recessive traits are not simply a function of sex-linked traits. Consider microcephaly, an autosomal recessive disease. To manifest the trait you need two malfunctioning copies of the gene, one from each parent. In other words, you exhibit a homozygous genotype with two mutant copies. I suspect that this particularly common context of homozygosity, recessive autosomal diseases, is one reason why it is less commonly discussed outside of specialist circles: there is a whole cluster of medical and social factors which lead to homozygosity which are already the focus of attention. The genetic architecture of the trait is of less note than the etiology of the disease and the possible reasons in the family’s background which might have increased the risk probability, especially inbreeding. In contrast heterozygosity is generally not so disastrous. Even if functionality is not 100%, it is close enough for “government work.” The deleterious consequences of a malfunctioning allele are masked by the “wild type” good copy. The exceptions are in areas such as breeding for hybrid vigor, when heterozygote advantage may be coming to the fore. The details of complementation of two alleles matter a great deal to the bottom line, and the concept of hybrid vigor has percolated out to the general public, with the more informed being cognizant of heterozygosity.
But homozygosity is of interest beyond the unfortunate instances when it is connected to a recessive disease. Like heterozygosity, homozygosity exists in spades across our genome. My 23andMe sample comes up as 67.6% homozygous on my SNPs (which are biased toward ~500,000 base pairs which tend to have population wide variation), while Dr. Daniel MacArthur’s results show him to be 68.1% homozygous across his SNPs. This is not atypical for outbred individuals. In contrast someone whose parents were first cousins can come up as ~72% homozygous. This is important: zygosity is not telling you simply about the state of two alleles, in this case base pairs, it may also be telling you about the descent of two alleles. Obviously this is not always clear on the base pair level; mutations happen frequently enough that even if you carry two minor alleles it is not necessarily evidence that they’re identical by descent (IBD), or autozygous (just a term which denotes ancestry of the alleles from the same original copy). What you need to look for are genome-wide patterns of homozygosity, in particular “runs of homozygosity” (ROH). These are long sequences biased toward homozygous genotypes.
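To make the difference between "percent homozygous" and "runs of homozygosity" concrete, here is a toy sketch. Everything in it is an illustrative assumption of mine: genotypes are coded as the number of copies of the alternate allele (0, 1 or 2), SNPs are assumed to be listed in chromosomal order, and a run is broken by any single heterozygous call. Real ROH callers work with physical distances and tolerate some heterozygous or missing calls, which this does not.

```python
from typing import List, Tuple


def fraction_homozygous(genotypes: List[int]) -> float:
    """Share of SNPs with a homozygous call (coded 0 or 2, not 1)."""
    return sum(g != 1 for g in genotypes) / len(genotypes)


def runs_of_homozygosity(genotypes: List[int], min_snps: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each stretch of
    at least min_snps consecutive homozygous calls."""
    runs = []
    start = None  # index where the current homozygous stretch began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current stretch, if any.
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a stretch that reaches the end of the list.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


if __name__ == "__main__":
    toy = [0, 2, 1, 0, 0, 2, 2, 0, 1, 2]
    print(fraction_homozygous(toy))
    print(runs_of_homozygosity(toy, min_snps=4))
```

The point of separating the two functions is that two people can have nearly the same overall fraction of homozygous SNPs while differing a great deal in how those homozygous calls are clumped along the chromosome, and it is the clumping that hints at identity by descent.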
Natural selection happens. It was hypothesized in copious detail by Charles Darwin, and has been confirmed in the laboratory, through observation, and also by inference via the methods of modern genomics. But science is more than broad brushes. We need to drill-down to a more fine-grained level to understand the dynamics with precision and detail, and so generate novel inferences which may then be tested. For example, there are various flavors of natural selection: stabilizing selection, negative selection, and positive directional selection. In the first case natural selection buffets the phenotype about an ideal mean, in the second case deleterious phenotypes and their associated alleles are purged from the genome, and finally, natural selection can also drive a novel trait toward greater prominence, and concomitantly the allelic variants which are associated with the fitter phenotype.
The last case is of particular interest to many because it is often through positive natural selection that evolution as descent with modification occurs. Over time trait values and the nature of traits themselves shift such that a lineage changes its character beyond recognition. This phyletic gradualism and the scale independence of evolutionary process has been challenged, in particular from the domain of developmental biology (albeit not all, or even most, developmental biologists). But ultimately no one doubts that a classical understanding of evolution as change in allele frequency, often driven by natural selection, is part of the larger puzzle of how the tree of life came to be.
One of the phenomena associated with positive directional evolution is the selective sweep. How a selective sweep occurs, and its consequences, are rather straightforward. A genome consists of a sequence of base pairs (e.g., we have 3 billion base pairs). If a new mutation emerges at a particular base pair, a novel single nucleotide polymorphism (SNP), and that allelic variant is ~10% fitter than the ancestral variant, natural selection could drive up its frequency (the conditionality is due to the fact that in all likelihood it would still go extinct because of the power of stochastic forces when a mutant is at low frequency). So the variant could in theory shift from ~0% (1 out of N, N being the number of individuals in a population, 2N if diploid, and so forth) to ~100%. This would be the fixation of the novel variant, driven by selective dynamics. So what’s the sweep aspect? The sweep in this case refers to the effect of the very rapid rise in frequency of the SNP in question on the adjacent genomic region. What is termed a genetic hitchhiking dynamic results if the sweep occurs rapidly, so that nearby regions of the genome also move to fixation along with the favored SNP. But in a diploid organism with sexual reproduction genetic recombination persistently breaks apart associations across the physical genome. Therefore the span of the sequence of genetic markers nearby a favored SNP which form a haplotype is dependent on the rate of recombination as well as the rate of the rise in frequency of the allele, which is contingent on the strength of selection. A powerful selective sweep has the effect of homogenizing wide regions of the genome flanking the favored mutant; in other words the sweep “cleans” the gene pool of variation as one very long haplotype replaces many shorter haplotypes.
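The dependence of haplotype length on selection strength and recombination can be sketched with a deliberately crude deterministic model. The assumptions are mine and should be read as such: the favored allele follows the simple haploid-style recursion p' = p(1 + s)/(1 + sp) with no drift, it starts at 1/(2N), and the chance that a nearby marker stays on the original haplotype is approximated as the chance of no crossover between the two sites over the whole sweep, (1 - r·d)^t. The recombination rate of 1e-8 per base pair per generation is an illustrative placeholder, not a figure from the post.

```python
def sweep_generations(n_diploid: int, s: float, target: float = 0.99) -> int:
    """Generations for a favored allele to rise from 1/(2N) to `target`
    under deterministic selection (drift is ignored)."""
    p = 1.0 / (2 * n_diploid)
    generations = 0
    while p < target:
        # Carriers have relative fitness 1 + s; mean fitness is 1 + s*p.
        p = p * (1 + s) / (1 + s * p)
        generations += 1
    return generations


def linked_fraction(r_per_bp: float, distance_bp: int, generations: int) -> float:
    """Crude probability that no crossover separates the favored site from
    a marker `distance_bp` away during the sweep."""
    return (1.0 - r_per_bp * distance_bp) ** generations


if __name__ == "__main__":
    n = 10_000
    r = 1e-8  # assumed per-bp, per-generation recombination rate
    for s in (0.1, 0.01):
        t = sweep_generations(n, s)
        print(f"s={s}: ~{t} generations, "
              f"linked at 100 kb: {linked_fraction(r, 100_000, t):.2f}")
```

Under these assumptions the ~10% advantage sweeps in roughly a tenth of the time that a 1% advantage needs, so far more of the flanking sequence is still attached to the favored SNP when it reaches high frequency. That is the sense in which a strong, recent sweep leaves a long haplotype.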
As an example, in the genomes of Northern Europeans the locus LCT is characterized by a very long haplotype, which itself seems to correlate well with the trait of lactase persistence. The implication here is that the lactase persistence conferring variant arose relatively recently, and was swept up to near fixation by positive directional natural selection.
With all the justified concern about “missing heritability”, the age of human genomics hasn’t been a total bust. As I have observed before, in 2005’s excellent book Mutants the evolutionary geneticist Armand M. Leroi asserted that we really didn’t have a good understanding of normal variation of human pigmentation. At the time I think it was a defensible claim, but within three years I’d say that most of the mystery had been cleared up. Though there are still some holes to be plugged, and details to be elucidated, the genetic architecture of pigmentation is now understood more or less. By the fall of 2006 Richard Sturm penned a review titled A golden age of human pigmentation genetics, an age I think which in some ways probably was closed with his 2009 review Molecular genetics of human pigmentation diversity. It’s not surprising that many of the traits that 23andMe tells you about have to do with your pigmentation. Of course there’s some limited utility in this, one assumes that most individuals don’t gain much benefit from the knowledge that they have an “85% chance of having brown eyes,” though it may be useful in terms of offspring prediction (I would say I have an 85% chance of having brown eyes, but since I’m not European the genetic background isn’t right to make that probability assertion).
But as the golden age of pigmentation genetics comes to a close and the low hanging fruit is stripped bare, where next? I wonder if it may be altitude adaptations. Like pigmentation altitude genetics has been around for a while, but it seems there’s a recent cresting of papers in the area, focusing in particular on the three canonical high altitude peoples, the Tibetans, Andeans, and the Ethiopians. Last spring two major groups came out with papers on the genetics of Tibetan altitude adaptation, and its evolutionary history, using somewhat different techniques. A new paper in PLoS Genetics builds upon that work (verifying two of the loci as targets of selection in Tibetans implicated in the previous papers), and, adds Andean populations to the mix to assess the possibilities of convergent adaptations. Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data:
Over the past decade evolutionary geneticist Mike Lynch has been articulating a model of genome complexity which relies on stochastic factors as the primary motive force by which genome size increases. The argument is articulated in a 2003 paper, and further elaborated in his book The Origins of Genome Architecture. There are several moving parts in the thesis, some of which require a rather fine-grained understanding of the biophysical structural complexity of the genome, the nature of Mendelian inheritance as a process, and finally, population genetics. But the core of the model is simple: there is an inverse relationship between long term effective population size and genome complexity. Low individual numbers ~ large values in terms of base pairs and counts of genetic elements such as introns.
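One standard way to see why population size could matter here is to ask how often a very slightly deleterious change (say, a small insertion) fixes relative to a strictly neutral one. The sketch below uses Kimura's diffusion approximation; it is my illustration of the general drift-versus-selection logic, not code or numbers from Lynch's paper. It assumes additive selection, a new mutant starting at frequency 1/(2Ne), and an effective size equal to the census size; the function name and the selection coefficient are hypothetical.

```python
import math


def relative_fixation_rate(n_e: float, s: float) -> float:
    """Fixation probability of a new mutant divided by that of a neutral
    mutant, using Kimura's diffusion approximation.

    Assumes additive selection and a starting frequency of 1/(2*n_e).
    """
    if abs(s) < 1e-15:
        return 1.0
    exponent = -4.0 * n_e * s
    if exponent > 700:
        # exp() would overflow; the mutant is so strongly selected against
        # that its fixation probability is effectively zero.
        return 0.0
    neutral = 1.0 / (2.0 * n_e)
    selected = math.expm1(-2.0 * s) / math.expm1(exponent)
    return selected / neutral


if __name__ == "__main__":
    s = -1e-5  # illustrative cost of a slightly deleterious insertion
    for n_e in (1e3, 1e4, 1e5, 1e6):
        print(f"Ne={n_e:.0e}: {relative_fixation_rate(n_e, s):.3g}")
```

In this toy setup the same tiny cost is invisible to selection when Ne is around a thousand (the mutant fixes almost as often as a neutral one) and is essentially never tolerated when Ne is around a million, which is the flavor of the inverse relationship described above.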
One of the great things about evolutionary theory is that it is a formal abstraction of specific concrete aspects of reality and dynamics. It allows us to squeeze inferential juice from incomplete prior knowledge of the state of nature. In other words, you can make predictions and models instead of having to observe every last detail of the natural world. But abstractions, models and formalisms often leave out extraneous details. Sometimes those details turn out not to be so extraneous. Charles Darwin’s original theory of evolution had no coherent or plausible mechanism of inheritance. R. A. Fisher and others imported the empirical reality of Mendelism into the logic of evolutionary theory, to produce the framework of 20th century population genetics. Though it accepted the genetic inheritance process of Mendelism, this original synthesis was not informed by molecular biology, because it pre-dated molecular biology. After James Watson and Francis Crick uncovered the biophysical basis for Mendelism molecular evolution came to the fore, and neutral theory emerged as a response to the particular patterns of genetic variation which new molecular techniques were uncovering. And yet through this much of R. A. Fisher’s image of an abstract genetic variant floating against a statistical soup of background noise variation persisted, sometimes dismissed as “bean bag genetics”.
We’ve come a long way from the initial wave of discussions which were prompted by the molecular genetic revolution. We have epigenetics, evo-devo and variation in gene regulation. None of these processes “overthrow” evolutionary biology, though in some ways they may revolutionize aspects of it. Science is over the long haul after all an eternal revolution, as the boundaries of comprehension keep getting pushed outward. A few days ago I pointed to Sean Carroll’s recent work, which emphasizes that one must think beyond the sequence level, and focus on particular features such as cis-regulatory elements. Here we’ve been tunneling down to the level of the gene, but what about the traits, the phenotypes, which are affected by genetic variation?
It is well known that the sparest abstraction of genotypic-phenotypic relationship can be illustrated like so:
genetic variation → phenetic variation
But each element of this relation has to be examined in greater detail. What type of genetic variation? Sequence level variation? Epigenetic variation? The second component is perhaps the most fraught, with the arrow waving away the myriad details and interactions which no doubt lurk between genotype and phenotype. And finally you have the phenotype itself. Are they all created alike in quality so that we can ascribe to them dichotomous values and quantities?
A new paper in PNAS examines the particulars of morphological phenotypes and physiological phenotypes, and their genetic control, as well as rates of evolution. Contrasting genetic paths to morphological and physiological evolution: