I should be careful about being flip on this issue. As recently as the mid-aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known to a reasonable clarity as far as these things go. The 50,000 foot perspective is this: we lost our fur millions of years ago, and developed dark skin, and many of us lost our pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).
The trait of lactase persistence (lactose tolerance) is probably one of the better textbook examples of natural selection in human populations. The reasons for this are probably two-fold. First, there is a very strong signature of selection within a specific gene known to associate with the trait in question in many populations. Second, there is a very compelling historical narrative which explains rather neatly how this particular functional change could have undergone such strong selection within the past ~5,000 years across these populations. But the elucidation of the origin and spread of this genetic adaptation is also interesting because it looks as if it was not a singular event. Populations as disparate as Arabians, Danes, and Masai seem to carry different alleles around the locus of interest which confer the ability to digest milk. This illustrates the fact that when selection pressures have a viable target, there is a rapid response on the genomic level. At some point during the maturation of a mammal the regulatory pathway which produces lactase enzyme shuts down. Yet within numerous human populations this gradual shutdown process has been short-circuited.
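To get a feel for how quickly a dominant, advantageous allele can rise under selection, here is a minimal sketch of the standard deterministic one-locus recursion. The function names, the starting frequency, and the selection coefficients are my own illustrative assumptions, not estimates for any lactase persistence allele, and the model ignores drift, migration, and population structure.

```python
def next_freq(p, s):
    """One generation of selection on a dominant advantageous allele.

    p is the current allele frequency; carriers (homozygous or heterozygous)
    have relative fitness 1 + s, non-carriers have fitness 1.
    """
    q = 1.0 - p
    # Mean fitness: carriers make up 1 - q^2 of the population.
    mean_fitness = (1.0 + s) * (1.0 - q * q) + q * q
    return p * (1.0 + s) / mean_fitness


def generations_to_reach(p0, target, s, max_generations=100_000):
    """Count generations until the allele frequency reaches `target`.

    Returns None if the target is not reached within max_generations
    (for example when s is zero and nothing changes).
    """
    if not (0.0 < p0 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_freq(p, s)
    return None


if __name__ == "__main__":
    # Hypothetical values only: a rare allele (1%) rising to 50%.
    for s in (0.02, 0.05, 0.10):
        print(s, generations_to_reach(0.01, 0.5, s))
```

Multiplying the generation count by an assumed generation time (say 25 to 30 years) gives a rough sense of whether a given selection coefficient is compatible with a sweep over a few thousand years.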
The variety of response in relation to this adaptation was brought home to me as I read Diversity of Lactase Persistence Alleles in Ethiopia – Signature of a Soft Selective Sweep, in the latest issue of The American Journal of Human Genetics:
There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, when you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates in regards to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*
There were two papers in Science which came out on the Y chromosome, Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females and Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny. I can recommend what Dienekes had to say, and I wasn’t going to comment until I saw this egregious piece in The New Scientist: Arabian flights: Early humans diverged in 150 years. Because of the title I did not initially think that this had anything to do with the Y chromosome, but it turns out that the piece uses the finding that three primary non-African haplogroups diverged in rapid succession from each other as the hook for the headline. In fact not only does the Y not offer definitive accounts of human history, it doesn’t even necessarily tell us about the history of men. It’s a marker, not a time machine. To repeat: the history of a specific genetic locus is not the history of a population. It has to be said.
The above figure is from a paper in PLoS GENETICS, Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration. The authors synthesize two diverse domains of human genomics. First, there are biomedically focused genome-wide association studies and their like which attempt to identify risk alleles for particular diseases. In some cases these risk alleles are very penetrant, in that a particular state predicts with high likelihood a disease phenotype. But in most cases the yield is elevated or decreased risks for highly complex traits such as type 2 diabetes. Second, there is the domain of evolutionary genomics which attempts to reconstruct a phylogenetic and population genetic history so as to frame contemporary patterns of variation in their proper context. How this might be important or of interest is obvious in the case of malaria resistance genes. Alleles conferring resistance have arisen in multiple populations due to parallel environmental pressures. Phylogenetic relationships between these populations should inform your predictions as to the likely similarities of the mutations between the populations. Meanwhile, population genetic theory can give you clues as to the likelihood of multiple adaptations.
There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, as well as the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).
The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.
One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.
A few days ago I was browsing Haldane’s Sieve, when I stumbled upon an amusing discussion which arose on its “About” page. This “inside baseball” banter got me to thinking about my own intellectual evolution. Over the past few years I’ve been delving more deeply into phylogenetics and phylogeography, enabled by the rise of genomics, the proliferation of ‘big data,’ and accessible software packages. This entailed an opportunity cost. I did not spend much time focusing on classical population and evolutionary genetic questions. Strewn about my room are various textbooks and monographs I’ve collected over the years, and which have fed my intellectual growth. But I must admit that it is a rare day now that I browse Hartl and Clark or The Genetical Theory of Natural Selection without specific aim or mercenary intent.
Like a river inexorably coursing over a floodplain, with the turning of the new year it is now time to take a great bend, and double-back to my roots, such as they are. This is one reason that I am now reading The Founders of Evolutionary Genetics. Fisher, Wright, and Haldane are like old friends, faded, but not forgotten, while Muller was always but a passing acquaintance. But ideas 100 years old still have power to drive us to explore deep questions which remain unresolved, but where new methods and techniques may shed greater light. A study of the past does not allow us to make wise choices which can determine the future with any certitude, but it may at least increase the luminosity of the tools which we have to illuminate the depths of the darkness. The shape of nature may become just a bit less opaque through our various endeavors.
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale of the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest: more than 100,000 on the autosomes, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above, Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that may introduce in your population genetic and phylogenetic inferences).
To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible, and their protons and neutrons are in turn made of quarks), but it still has conceptual utility.
Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.
While I was at Spencer Wells’ poster at ASHG I was primarily curious about bar plots. He’s got really good spatial coverage, so I’m moderately excited about the paper (though I didn’t see much explicit testing of phylogenetic hypotheses, which I think this sort of paper has to do now; we’re beyond PCA and bar plots only papers). That being said, Spencer was more interested in me promoting the Scientific Grants Program. Here’s some more information:
The Genographic Project’s Scientific Grants Program awards grants on a rolling basis for projects that focus on studying the history of the human species utilizing innovative anthropological genetic tools. The variety of projects supported by the scientific grants will aim to construct our ancient migratory and demographic history while developing a better understanding of the phylogeographic structure of world populations. Sample research topics could include subjects like the origin and spread of the Indo-European languages, genetic insights into Papua New Guinea’s high linguistic diversity, the number and routes of migrations out of Africa, the origin of the Inca, or the genetic impact of the spread of maize agriculture in the Americas.
Recipients will typically be population geneticists, students, linguists, and other researchers or scientists interested in pursuing questions relevant to the Genographic Project’s broad goal of exploring our migratory history. Recipients of Genographic scientific grant funds will become members of the Genographic Consortium, and will be expected to act as agents of the greater Genographic mission, participating in and reporting on multiple aspects of Genographic fieldwork, in addition to their own proposed and mission‐aligned pilot projects. Openness and transparency within the Consortium are the key values of the project’s research team, and grantees will be expected to abide by this code of conduct.
- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!
- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.
- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.
- Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian’ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.
- Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).
- If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).
A new paper in Molecular Biology and Evolution, The timing of pigmentation lightening in Europeans, is rather interesting. It’s important because skin pigmentation has been one of the major successes of the first age of human genomics. In 2002 we really didn’t know the nature of normal human variation in skin color in terms of specific genes (basically, we knew about MC1R). This is what Armand Leroi observed in Mutants in 2005, wondering about our ignorance of such a salient trait. Within a few years though Leroi’s contention was out of date (in fact, while Mutants was going to press it became out of date). Today we do know the genetic architecture of pigmentation. This is why GEDmatch can predict that my daughter’s eyes will be light brown from just her SNPs (they are currently hazel). This genomic yield was facilitated by the fact that pigmentation seems to be a trait where most human variation is controlled by half a dozen genes. In contrast, height or I.Q. are controlled by innumerable genes.
1) Remember these are not papers, and some of the abstracts may never become papers, at least in recognizable form
2) Speaking of which, Estimating a date of mixture of ancestral South Asian populations:
Over the years one issue that crops up repeatedly in human evolutionary genetics and paleoanthropology (or more precisely, the popular exposition of the topics in the media) is the idea that “population X are the most ancient Y.” X will always refer to a population within a larger set, Y, which is defined by relative marginalization or retention of older cultural folkways. So, for example, I have seen it said that the Andaman Islanders are the “most ancient Asian population.” Why? The standard model for a while now has been that non-Africans derive from a line of Africans which left the ancestral continent 50 to 100 thousand years ago, and began to diversify. Presumably Andaman Islanders have ancestry which goes back to this original dispersion, just as Europeans and Chinese do (revisions which suggest that Aboriginals may have been part of an earlier wave, still put the Andamanese in the second wave). The reason that the Andaman populations are termed ancient is pretty straightforward: they’re Asia’s last hunter-gatherers, literally chucking spears at outsiders. An ancient lifestyle gets conflated with ancient genetics.
This is a much bigger problem with the hunter-gatherers of Africa, the Pygmies, Hadza, and Bushmen. The reason is that these populations are of particular interest because they seem to have diverged from the rest of humanity rather early on. Both Y chromosomes and mtDNA confirmed this, and now autosomal analyses looking across the whole genome are confirming it. In other words, they’re basal to the rest of humanity. I believe this is moderately misleading. With the Bantu Expansion much of African genetic diversity disappeared. The hunter-gatherers seem like exceptionally long and bare branches on the phylogenetic tree because all their relatives are gone!
The new article in The American Journal of Human Genetics, A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root, is open access, so you should check it out. The discussion gets to the heart of the matter:
Supported by a consensus of many colleagues and after a few years of hesitation, we have reached the conclusion that on the verge of the deep-sequencing revolution…when perhaps tens of thousands of additional complete mtDNA sequences are expected to be generated over the next few years, the principal change we suggest cannot be postponed any longer: an ancestral rather than a “phylogenetically peripheral” and modern mitogenome from Europe should serve as the epicenter of the human mtDNA reference system. Inevitably, the proposed change could raise some temporary inconveniences. For this reason, we provide tables and software to aid data transition.
What we propose is much more than a mere clerical change. We use the Ptolemaian geocentric versus Copernican heliocentric systems as a metaphor. And the metaphor extends further: as the acceptance of the heliocentric system circumvented epicycles in the orbits of planets, switching the mtDNA reference to an ancestral RSRS will end an academically inadmissible conjuncture where virtually all mitochondrial genome sequences are scored in part from derived-to-ancestral states and in part from ancestral-to-derived states. We aim to trigger the radical but necessary change in the way mtDNA mutations are reported relative to their ancestral versus derived status, thus establishing an intellectual cohesiveness with the current consensus of shared common ancestry of all contemporary human mitochondrial genomes.
Note that the problem is not restricted to mtDNA. Indeed, in the much larger perspective of complete nuclear genomes in which comparisons are often currently made relative to modern human reference sequences, often of European origin, it seems worthwhile to begin considering, as valuable alternatives, public reference sequences of ancestral alleles (common in all primates) whereby derived alleles (common to some human populations) would be distinguished.
Perhaps the first generation or so of human molecular evolutionary genetics might be thought of as a “first draft.” A serviceable first draft which rendered in broad strokes the gist of the truth as we understand it, but lacking in some essential details.
On a minor note, there are some theoretical reasons why mtDNA did not yield much evidence for archaic admixture, which is clear in the nuclear genomics (e.g., higher rate of change due to lower effective population size, so more rapid extinction of ancient lineages). But perhaps now that the number of complete mtDNA genomes is increasing in size we might start to see “long branches,” which reflect the inferences generated from the ancient nuclear genomes.
The face is an important aspect of our phenotype. So important that facial recognition is one of many innate reflexive cognitive competencies. By this, I mean that you can recognize a face in a gestalt manner, just like you can recognize a set of three marbles. You don’t have to think about it in a step-by-step fashion. Particular types of brain injuries can actually result in disablement of this faculty, and a minority of humans seem to lack it altogether at birth (prosopagnosia). That’s why I’ve long been interested in the genetic architecture and evolution of craniofacial traits. I long ago knew the potential range of pigmentation phenotypes for my daughter because both her parents have been genotyped, but when it comes to facial features we’re stuck with the old ‘blending inheritance’ heuristic. The most obvious importance of teasing apart the genetic architecture of craniofacial traits is forensics. It might not put the sketch artist out of a job, but it would be an excellent supplement to problematic eye witness reports.
But it isn’t just forensics. The issue has evolutionary relevance. It looks like in terms of morphology our own lineage has had a lot of diversity up until recently. I’m thinking in particular of the ‘archaic’ looking humans recently discovered in China and Nigeria, who seem to have persisted down into the Holocene. More generally, humans as a whole have become more gracile over the last 10,000 years. Why? There are two extreme answers we can look to. First, gracile humans have replaced robust humans. Second, natural selection for gracility has resulted in the in situ evolution of many populations over the last ~10,000 years. An interesting aspect of this is that it looks as if many salient traits have been targets of selection, and therefore evolution and population differentiation.
Here are the top 10 SNPs which deviate from the overall phylogenetic tree of population relationships in the HGDP data set:
The latest edition of The American Journal of Human Genetics has two papers using “old fashioned” uniparental markers to trace human migration out of Africa and Siberia respectively. I say old fashioned because the peak novelty of these techniques was around 10 years ago, before dense autosomal SNP marker analyses, let alone whole genome sequencing. But mtDNA, passed down the maternal line, and Y chromosomes, passed from father to son, are still useful. Prosaically they’re useful because the data sets are now so large for these sets of markers after nearly 20 years of surveying populations. More technically, because these two regions of the genome do not recombine they lend themselves to excellent representation as a tree phylogeny. Finally, mtDNA in particular is amenable to estimates via molecular clock methodologies (it has a region with a higher mutational rate, so you can sample a larger range of variation over a given number of base pairs; you can use STRs, which mutate rapidly, for Y chromosomes, but there seems to be a lot of controversy in dating).
The papers are The Arabian Cradle: Mitochondrial Relicts of the First Steps along the Southern Route out of Africa and Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians. Dienekes has already commented on the first paper. I am not going to take a detailed position on either, but I have to add that we need to be very careful about extrapolating from maternal or paternal lineages, and about assuming that population turnover is low enough that we can make phylogeographic inferences about the past from the present. For example, if you look at mtDNA, South Asians as a whole strongly cluster with East Asians and not Europeans, while if you look at Y chromosomes you see the reverse. The whole genome gives a more mixed picture. Additionally, ancient DNA analyses in Northern Eurasia are showing strong discontinuities between past and present populations. So coalescence back to the last common ancestor between two different lineages in two different regions may actually be due to diversity in a more recent common source population, which entered into demographic expansion and replaced other groups.
If you need the papers, email me. Some of you know the alphabet soup of haplogroups better than I do. Below are two figures which I think give the top line results.
The excellent site io9 has a piece up today which is a fascinating indicator of the nature of popular science publications as a lagging indicator. It is a re-post of a piece published last April, How Mitochondrial Eve connected all humanity and rewrote human evolution. In it you have an encapsulation of a particular period in our understanding of human natural history through evolutionary genetics. Notice for example the focus on uniparentally transmitted lineages, mtDNA and Y chromosomes. And the citations on genealogy date to the middle aughts. The science is mostly correct as far as it goes in the details (or at least it is defensible, last I checked there was still debate as to the validity of the molecular clocks used for Y chromosomal lineages), but it misses the big picture of how we’ve reframed our understanding of the human past over the last few years. The distance between 2011 and 2009 is far greater in this sense than between 2009 and 1999 (or even 2009 and 1989!). The io9 piece is a reflection of the era before the paradigmatic rupture.
I have blogged about the genetics of altitude adaptation before. There seem to be three populations in the world which have been subject to very strong natural selection, resulting in physiological differences, in response to the human tendency toward hypoxia. Two of them are relatively well known, the Tibetans and the indigenous people of the Andes. But the highlanders of Ethiopia have been less well studied, nor have they received as much attention. But the capital of Ethiopia, Addis Ababa, is nearly 8,000 feet above sea level!
Another interesting aspect to this phenomenon is that it looks like the three populations respond to adaptive pressures differently. Their physiological response varies. And the more recent work in genomics implies that though there are similarities between the Asian and American populations, there are also differences. This illustrates the evolutionary principle of convergence, where different populations approach the same phenotypic optimum, though by somewhat different means. To my knowledge there has not been as much investigation of the African example. Until now. A new provisional paper in Genome Biology is out, Genetic adaptation to high altitude in the Ethiopian highlands:
Dienekes and Maju have both commented on a new paper which looked at the likelihood of lactase persistence in Neolithic remains from Spain, but I thought I would comment on it as well. The paper is: Low prevalence of lactase persistence in Neolithic South-West Europe. The location is on the fringes of the modern Basque country, while the time frame is ~3000 BC. Table 3 shows the major result:
Lactase persistence is a dominant trait. That means any individual with at least one copy of the T allele is persistent. As Maju noted, a peculiarity here is that the genotypes are not in Hardy-Weinberg Equilibrium. Specifically, there is an excess of homozygotes. Using the SJAPL location as a potentially random mating scenario you should expect ~7 T/C genotypes, not 2. Interestingly the persistent individual in the Longar location is also a homozygote.
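For readers who want to check this sort of thing themselves, here is a minimal sketch of the Hardy-Weinberg expectation: estimate the T allele frequency from the observed genotype counts, then compute the counts you would expect under random mating. The function names are my own, and the counts in the example are hypothetical placeholders, not the values from Table 3 of the paper.

```python
def hwe_expected_counts(n_tt, n_tc, n_cc):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    Takes observed counts of T/T, T/C and C/C individuals and returns
    the expected counts given the observed allele frequencies.
    """
    n = n_tt + n_tc + n_cc
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_tt + n_tc) / (2 * n)  # frequency of the T allele
    q = 1.0 - p                      # frequency of the C allele
    return {"TT": p * p * n, "TC": 2 * p * q * n, "CC": q * q * n}


def heterozygote_deficit(n_tt, n_tc, n_cc):
    """F = 1 - observed / expected heterozygotes.

    Positive values mean fewer heterozygotes (more homozygotes) than
    random mating predicts.
    """
    expected_tc = hwe_expected_counts(n_tt, n_tc, n_cc)["TC"]
    if expected_tc == 0:
        return 0.0  # only one allele present, so no deficit is defined
    return 1.0 - n_tc / expected_tc


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT from the paper).
    observed = {"n_tt": 2, "n_tc": 2, "n_cc": 16}
    print(hwe_expected_counts(**observed))
    print(heterozygote_deficit(**observed))
```

With ancient DNA sample sizes this small, a formal test (an exact test rather than chi-square) would be needed before reading much into the deviation, and allelic dropout in degraded samples is one mundane reason heterozygotes can be undercounted.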