There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, and the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).
The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.
One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.
A few days ago I was browsing Haldane’s Sieve, when I stumbled upon an amusing discussion which arose on its “About” page. This “inside baseball” banter got me to thinking about my own intellectual evolution. Over the past few years I’ve been delving more deeply into phylogenetics and phylogeography, enabled by the rise of genomics, the proliferation of ‘big data,’ and accessible software packages. This entailed an opportunity cost. I did not spend much time on classical population and evolutionary genetic questions. Strewn about my room are various textbooks and monographs I’ve collected over the years, and which have fed my intellectual growth. But I must admit that it is a rare day now that I browse Hartl and Clark or The Genetical Theory of Natural Selection without specific aim or mercenary intent.
Like a river inexorably coursing over a floodplain, with the turning of the new year it is now time to take a great bend, and double-back to my roots, such as they are. This is one reason that I am now reading The Founders of Evolutionary Genetics. Fisher, Wright, and Haldane are like old friends, faded, but not forgotten, while Muller was always but a passing acquaintance. But ideas 100 years old still have power to drive us to explore deep questions which remain unresolved, but where new methods and techniques may shed greater light. A study of the past does not allow us to make wise choices which can determine the future with any certitude, but it may at least increase the luminosity of the tools which we have to illuminate the depths of the darkness. The shape of nature may become just a bit less opaque through our various endeavors.
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only >100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).
To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible into subatomic particles, some of which are in turn made of quarks), but it still has conceptual utility.
Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.
While I was at Spencer Wells’ poster at ASHG I was primarily curious about bar plots. He’s got really good spatial coverage, so I’m moderately excited about the paper (though I didn’t see much explicit testing of phylogenetic hypotheses, which I think this sort of paper has to do now; we’re beyond PCA and bar plots only papers). That being said, Spencer was more interested in me promoting the Scientific Grants Program. Here’s some more information:
The Genographic Project’s Scientific Grants Program awards grants on a rolling basis for projects that focus on studying the history of the human species utilizing innovative anthropological genetic tools. The variety of projects supported by the scientific grants will aim to construct our ancient migratory and demographic history while developing a better understanding of the phylogeographic structure of world populations. Sample research topics could include subjects like the origin and spread of the Indo-European languages, genetic insights into Papua New Guinea’s high linguistic diversity, the number and routes of migrations out of Africa, the origin of the Inca, or the genetic impact of the spread of maize agriculture in the Americas.
Recipients will typically be population geneticists, students, linguists, and other researchers or scientists interested in pursuing questions relevant to the Genographic Project’s broad goal of exploring our migratory history. Recipients of Genographic scientific grant funds will become members of the Genographic Consortium, and will be expected to act as agents of the greater Genographic mission, participating in and reporting on multiple aspects of Genographic fieldwork, in addition to their own proposed and mission‐aligned pilot projects. Openness and transparency within the Consortium are the key values of the project’s research team, and grantees will be expected to abide by this code of conduct.
- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!
- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.
- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.
- Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian’ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.
- Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).
- If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).
OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:
As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.
We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.
It might be helpful if other evolution/genetics bloggers reblog this so we can push it up the Google search results. If you google “Haldane’s Sieve” some of the results are interesting…and not necessarily in a good way. I do feel guilt blogging on stuff my readers can’t get, so the more preprints become acceptable the more we (as in, the general public) can understand about evolution.
Yesterday I pointed out that David Reich had a moderately dismissive attitude toward the new paper in PNAS, Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Here’s what Reich said:
…But Reich believes that the discussion would have been different if it had happened in the open. The PNAS paper questioning the Neanderthal admixture addresses issues swirling around two years ago, but not Reich and Slatkin’s latest work. “It’s been an issue for several years. They were right to work on this,” says Reich. But now, “it’s kind of an obsolete paper,” he says.
Here’s what Nick Patterson, Reich’s colleague told me via email:
Ancient structure in Africa was considered when we wrote the Green et al. paper, and we were aware that this could explain D-statistics. But the hypothesis is no longer viable as the major explanation of Neandertal genetics in Eurasia. This was discussed in the recent paper of Yang et al. (MBE, 2012). (Not referenced by the PNAS paper).
A very simple argument, that convinces me, is that the allelic frequency spectrum of Neandertal alleles in Eurasia falls off very quickly. A bottleneck flattens out the spectrum, and it turns out that the Neandertal gene flow has to be placed after the out of Africa bottleneck or the spectrum is much too flat.
The paper on the arXiv from the Reich lab (Sankararaman et al.) is trying to do something much more subtle than this and date the flow. I personally am no longer interested in explaining the introgression as ancient structure. That ship has sailed.
Of course the question of what was the genetic structure of Ancient Africa is quite open, and remains very interesting.
If Nick’s explanation is a bit cryptic for you (he was a cryptographer!), figure 2 from the Yang et al. paper lays it out quite clearly:
1) Remember these are not papers, and some of the abstracts may never become papers, at least in recognizable form
2) Speaking of which, Estimating a date of mixture of ancestral South Asian populations:
Over the years one issue that crops up repeatedly in human evolutionary genetics and paleoanthropology (or more precisely, the popular exposition of the topics in the media) is the idea that “population X are the most ancient Y.” X will always refer to a population within a larger set, Y, which is defined by relative marginalization or retention of older cultural folkways. So, for example, I have seen it said that the Andaman Islanders are the “most ancient Asian population.” Why? The standard model for a while now has been that non-Africans derive from a line of Africans which left the ancestral continent 50 to 100 thousand years ago, and began to diversify. Presumably Andaman Islanders have ancestry which goes back to this original dispersion, just as Europeans and Chinese do (revisions which suggest that Aboriginals may have been part of an earlier wave, still put the Andamanese in the second wave). The reason that the Andaman populations are termed ancient is pretty straightforward: they’re Asia’s last hunter-gatherers, literally chucking spears at outsiders. An ancient lifestyle gets conflated with ancient genetics.
This is a much bigger problem with the hunter-gatherers of Africa, the Pygmies, Hadza, and Bushmen. The reason is that these populations are of particular interest because they seem to have diverged from the rest of humanity rather early on. Both Y chromosomes and mtDNA confirmed this, and now autosomal analyses looking across the whole genome are confirming it. In other words, they’re basal to the rest of humanity. I believe this is moderately misleading. With the Bantu Expansion much of African genetic diversity disappeared. The hunter-gatherers seem to be exceptionally long and bare branches on the phylogenetic tree because all their relatives are gone!
The new article in The American Journal of Human Genetics, A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root, is open access, so you should check it out. The discussion gets to the heart of the matter:
Supported by a consensus of many colleagues and after a few years of hesitation, we have reached the conclusion that on the verge of the deep-sequencing revolution…when perhaps tens of thousands of additional complete mtDNA sequences are expected to be generated over the next few years, the principal change we suggest cannot be postponed any longer: an ancestral rather than a “phylogenetically peripheral” and modern mitogenome from Europe should serve as the epicenter of the human mtDNA reference system. Inevitably, the proposed change could raise some temporary inconveniences. For this reason, we provide tables and software to aid data transition.
What we propose is much more than a mere clerical change. We use the Ptolemaian geocentric versus Copernican heliocentric systems as a metaphor. And the metaphor extends further: as the acceptance of the heliocentric system circumvented epicycles in the orbits of planets, switching the mtDNA reference to an ancestral RSRS will end an academically inadmissible conjuncture where virtually all mitochondrial genome sequences are scored in part from derived-to-ancestral states and in part from ancestral-to-derived states. We aim to trigger the radical but necessary change in the way mtDNA mutations are reported relative to their ancestral versus derived status, thus establishing an intellectual cohesiveness with the current consensus of shared common ancestry of all contemporary human mitochondrial genomes.
Note that the problem is not restricted to mtDNA. Indeed, in the much larger perspective of complete nuclear genomes in which comparisons are often currently made relative to modern human reference sequences, often of European origin, it seems worthwhile to begin considering, as valuable alternatives, public reference sequences of ancestral alleles (common in all primates) whereby derived alleles (common to some human populations) would be distinguished.
Perhaps the first generation or so of human molecular evolutionary genetics might be thought of as a “first draft.” A serviceable first draft which rendered in broad strokes the gist of the truth as we understand it, but lacking in some essential details.
On a minor note, there are some theoretical reasons why mtDNA did not yield much evidence for archaic admixture, which is clear in the nuclear genomics (e.g., higher rate of change due to lower effective population size, so more rapid extinction of ancient lineages). But perhaps now that the number of complete mtDNA genomes is increasing in size we might start to see “long branches,” which reflect the inferences generated from the ancient nuclear genomes.
The face is an important aspect of our phenotype. So important that facial recognition is one of many innate reflexive cognitive competencies. By this, I mean that you can recognize a face in a gestalt manner, just like you can recognize a set of three marbles. You don’t have to think about it in a step-by-step fashion. Particular types of brain injuries can actually result in disablement of this faculty, and a minority of humans seem to lack it altogether at birth (prosopagnosia). That’s why I’ve long been interested in the genetic architecture and evolution of craniofacial traits. I long ago knew the potential range of pigmentation phenotypes for my daughter because both her parents have been genotyped, but when it comes to facial features we’re stuck with the old ‘blending inheritance’ heuristic. The most obvious importance of teasing apart the genetic architecture of craniofacial traits is forensics. It might not put the sketch artist out of a job, but it would be an excellent supplement to problematic eye witness reports.
But it isn’t just forensics. The issue has evolutionary relevance. It looks like, in terms of morphology, our own lineage has had a lot of diversity up until recently. I’m thinking in particular of the ‘archaic’ looking humans recently discovered in China and Nigeria, who seem to have persisted down into the Holocene. More generally, humans as a whole have become more gracile over the last 10,000 years. Why? There are two extreme answers we can look to. First, gracile humans have replaced robust humans. Second, natural selection for gracility has resulted in the in situ evolution of many populations over the last ~10,000 years. An interesting aspect of this is that it looks as if many salient traits have been targets of selection, and therefore evolution and population differentiation.
Here are the top 10 SNPs which deviate from the overall phylogenetic tree of population relationships in the HGDP data set:
The excellent site io9 has a piece up today which is a fascinating indicator of the nature of popular science publications as a lagging indicator. It is a re-post of a piece published last April, How Mitochondrial Eve connected all humanity and rewrote human evolution. In it you have an encapsulation of a particular period in our understanding of human natural history through evolutionary genetics. Notice for example the focus on uniparentally transmitted lineages, mtDNA and Y chromosomes. And the citations on genealogy date to the middle aughts. The science is mostly correct as far as it goes in the details (or at least it is defensible, last I checked there was still debate as to the validity of the molecular clocks used for Y chromosomal lineages), but it misses the big picture of how we’ve reframed our understanding of the human past over the last few years. The distance between 2011 and 2009 is far greater in this sense than between 2009 and 1999 (or even 2009 and 1989!). The io9 piece is a reflection of the era before the paradigmatic rupture.
I have blogged about the genetics of altitude adaptation before. There seem to be three populations in the world which have been subject to very strong natural selection, resulting in physiological differences, in response to the human tendency toward hypoxia. Two of them are relatively well known, the Tibetans and the indigenous people of the Andes. The highlanders of Ethiopia have been less well studied and have received less attention, even though the capital of Ethiopia, Addis Ababa, is nearly 8,000 feet above sea level!
Another interesting aspect to this phenomenon is that it looks like the three populations respond to adaptive pressures differently. Their physiological response varies. And the more recent work in genomics implies that though there are similarities between the Asian and American populations, there are also differences. This illustrates the evolutionary principle of convergence, where different populations approach the same phenotypic optimum, though by somewhat different means. To my knowledge there has not been as much investigation of the African example. Until now. A new provisional paper in Genome Biology is out, Genetic adaptation to high altitude in the Ethiopian highlands:
Dienekes and Maju have both commented on a new paper which looked at the likelihood of lactase persistence in Neolithic remains from Spain, but I thought I would comment on it as well. The paper is: Low prevalence of lactase persistence in Neolithic South-West Europe. The location is on the fringes of the modern Basque country, while the time frame is ~3000 BC. Table 3 shows the major result:
Lactase persistence is a dominant trait. That means any individual with at least one copy of the T allele is persistent. As Maju noted, a peculiarity here is that the genotypes are not in Hardy-Weinberg Equilibrium. Specifically, there is an excess of homozygotes. Using the SJAPL location as a potentially random mating scenario you should expect ~7 T/C genotypes, not 2. Interestingly, the persistent individual in the Longar location is also a homozygote.
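To make the Hardy-Weinberg reasoning concrete, here is a minimal sketch of how the expected number of heterozygotes falls out of observed genotype counts. The counts below are invented for illustration and are not the values from Table 3; the function name `hardy_weinberg_expected` is likewise just a label chosen for this example.

```python
def hardy_weinberg_expected(n_tt, n_tc, n_cc):
    """Return expected genotype counts under Hardy-Weinberg equilibrium.

    Allele frequencies are estimated from the observed genotype counts,
    then expanded as p^2, 2pq and q^2 times the sample size.
    """
    n = n_tt + n_tc + n_cc
    p = (2 * n_tt + n_tc) / (2 * n)  # frequency of the T allele
    q = 1 - p                        # frequency of the C allele
    return {"TT": p * p * n, "TC": 2 * p * q * n, "CC": q * q * n}


# Hypothetical counts for illustration only (NOT the paper's data).
observed = {"TT": 3, "TC": 2, "CC": 15}
expected = hardy_weinberg_expected(observed["TT"], observed["TC"], observed["CC"])

# Deficit of heterozygotes relative to expectation (an F-like statistic):
# 0 means the sample matches random mating, positive values mean too few hets.
het_deficit = 1 - observed["TC"] / expected["TC"]

for genotype in ("TT", "TC", "CC"):
    print(genotype, observed[genotype], round(expected[genotype], 2))
print("heterozygote deficit:", round(het_deficit, 3))
```

With samples this small a formal test would want an exact method rather than a chi-square approximation, so treat the deficit statistic as descriptive only.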
In the near future I will be analyzing the genotype of an individual where all four grandparents have been typed. But this got me thinking about my own situation: is there a way I could “reconstruct” my own grandparents? None of them are living. The easiest way to type them would be to obtain tissue samples from hospitals. This is not totally implausible, though in this case these would be Bangladeshi hospitals, so they might not have saved samples or even have a good record of them. Another way would be to extract DNA from the burial site. This is not necessarily palatable. But assuming you did this, if you have access to a forensic lab it might be pretty easy (though I think most forensic labs use VNTRs rather than SNP chips, so I don’t know if they’d touch every chromosome). I’m not sure that the quality would be optimal for more vanilla typing operations, especially for older samples which are likely to be contaminated with a lot of bacteria.
For me the simplest option is to look at relatives. Each of my grandparents happens to have had siblings, so there are many sets of relatives related to just each of those individuals of interest. I also have many cousins, so pooling all the genotypes together and using the information of a pedigree one could ascertain which chromosomal segments are likely to derive from a particular grandparent. To give a concrete example, my mother has a maternal cousin to whom she is quite close. By typing my mother and her cousin one could infer that the segments shared across the two individuals derive from the common maternal grandparents. Of course there’s a problem that cousins have a coefficient of relatedness of only 1/8th, so there is going to be a lot of information missing. But, if you had lots of cousins you could presumably reconstruct the genotypes far better.
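The 1/8 figure for first cousins comes from the standard path-counting rule: for each chain of descent connecting two relatives through a common ancestor, count the meioses L along the chain and add up (1/2)^L. A small sketch, assuming no inbreeding in the pedigree (the helper name `relatedness` is made up for this example):

```python
from fractions import Fraction


def relatedness(path_lengths):
    """Coefficient of relatedness from pedigree paths.

    Each entry is the number of meioses on one path linking the two
    relatives through a common ancestor. Assumes the common ancestors
    are themselves not inbred.
    """
    return sum(Fraction(1, 2) ** length for length in path_lengths)


# First cousins share two grandparents; each path has 4 meioses
# (cousin -> parent -> grandparent -> aunt/uncle -> cousin).
first_cousins = relatedness([4, 4])

# Grandparent and grandchild: a single path of 2 meioses.
grandparent_grandchild = relatedness([2])

# Full siblings: two parents, each path 2 meioses long.
full_siblings = relatedness([2, 2])

print(first_cousins, grandparent_grandchild, full_siblings)
```

These are expectations over many meioses. The realized sharing between any two particular cousins varies around the expectation, which is part of why pooling many relatives helps.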
There is a new paper in PLoS Genetics out which purports to characterize the ancestry of the populations of northern Africa in greater detail. This is important. The HGDP data set does have a North African population, the Mozabites, but it’s not ideal to represent hundreds of millions of people with just one group. The first author on this new paper is Brenna Henn, who was also first author on another paper with a diverse African data set. Importantly the data was posted online. Unfortunately though most of the populations didn’t have too many markers. This isn’t an issue in and of itself, but it becomes a big deal when trying to combine it with other data sets. If you limit the markers to those which intersect across two data sets you start to thin them down a lot, to the point where they’re not useful. Though the results of the paper are worth talking about, the authors claim that they’ll be putting the data online. This is important because they used a large number of markers, so the intersections will be nice (I can, for example, envisage exploring the relationship between the North Africans and the IBS Iberian sample in the near future).
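The thinning problem is easy to see with a toy example. The sketch below uses made-up marker IDs and panel sizes (nothing here reflects the actual chips or data sets discussed) and simply intersects the marker lists of two panels, which is roughly what happens when you merge data sets typed on different arrays.

```python
def shared_markers(*panels):
    """Return the set of marker IDs present in every panel."""
    return set.intersection(*(set(panel) for panel in panels))


# Hypothetical panels: IDs and sizes are invented for illustration.
sparse_panel = {f"snp{i}" for i in range(1000)}           # 1,000 markers
dense_panel = {f"snp{i}" for i in range(0, 3000, 5)}      # 600 markers, different design

common = shared_markers(sparse_panel, dense_panel)

print(len(sparse_panel), len(dense_panel), len(common))
print("fraction of sparse panel retained:", len(common) / len(sparse_panel))
```

Real merges also have to reconcile strand orientation, genome build, and allele coding before the intersection means anything, so the surviving marker count is often lower still.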
As for the paper itself, Genomic Ancestry of North Africans Supports Back-to-Africa Migrations:
Hominin increase in cranial capacity, courtesy of Luke Jostins
A few years ago a statistical geneticist at Cambridge’s Sanger Institute, Luke Jostins, posted the chart above using data from fossils on cranial capacity of hominins (the human lineage). As you can see there was a gradual increase in cranial capacity until ~250,000 years before the present, and then a more rapid increase. I should also note that from what I know about the empirical data, mean human cranial capacity peaked around the Last Glacial Maximum. Our brains have been shrinking, even relative to our body sizes (we’re not as large as we were during the Ice Age). But that’s neither here nor there. In the comments Jostins observes:
The data above includes all known Homo skulls, but none of the results change if you exclude the 24 Neandertals. In fact, you see the same results if you exclude Sapiens but keep Neandertals; the trends are pan-Homo, and aren’t confined to a specific lineage….
In the middle years of the last decade there were many papers which came out which reported many ‘hard’ selective sweeps reshaping the human genome. By this, I mean that you had a novel mutation arise against the genetic background, and positive selection rapidly increased the frequency of that mutation. Because of the power and rapidity of the sweep many of the flanking regions of the genome would “hitchhike” along, generating long homogenized regions of linkage disequilibrium. If that’s a little dense for you, just understand that very strong selective events tend to result in disorder and distinctiveness in the local genomic region.
But the late aughts and the early years of the teens are shaping up to give us a more subtle picture. Instead of classic hard sweeps, researchers are suggesting that there may also be many ‘soft’ sweeps, where selection draws upon the well of standing genic variation. Instead of a novel trait becoming prominent, one tail of the distribution would rise in frequency. The ‘problem’ with this model is that it’s not as tractable as the earlier one of hard sweeps, and selection on quantitative traits with many loci of small effect is more difficult to detect. Its effect on the genome is more subtle and understated, which means that statistical tests often lack the power to grasp onto the underlying dynamics. Naturally this means that there is an extension of statistical techniques to ever greater degrees of sophistication. A new paper in PLoS Genetics attempting to tease apart the various potential selective pressures in the human genome is reflective of that tendency. Signatures of Environmental Genetic Adaptation Pinpoint Pathogens as the Main Selective Pressure through Human Evolution:
Last August I had a post up, The point mutation which made humanity, which suggested that it may be wrong to conceive of the difference between Neanderthals and the African humans which absorbed and replaced them ~35,000 years ago as a matter of extreme differences at specific genes. I was prompted to this line of thinking by Svante Pääbo’s admission that he and his colleagues were searching for locations in the modern human genome which differed a great deal from Neanderthals as a way through which we might understand what makes us distinctively human. This sort of method has a long pedigree. Much of the past generation of chimpanzee genetics and now genomics has focused on finding the magic essence which differentiates us from our closest living relatives. Because of our perception of massive phenotypic differences between H. sapiens and Pan troglodytes the 95-99% sequence level identity is thought by some to be perplexing. Therefore models have emerged which appeal to gene regulation and expression, or perhaps other forms of variation such as copy number, to clear up how it can be that chimpanzees and humans differ so much. Setting aside that the perception of difference probably has some anthropocentric bias (i.e., would an alien think that chimpanzees and humans are actually surprisingly different in light of their phylogenetic similarities? I’m not so sure), it doesn’t seem to be unreasonable on the face of it to plumb the depths of the genomes of hominids so as to ascertain the source of their phenotypic differentiation.
But can this model work for differentiating different hominin lineages? Obviously there’s going to be a quantitative difference. The separation between chimpanzees and modern humans is on the order of 5 million years. The separation between Neanderthals and modern humans (or at least the African ancestors of modern humans ~50,000 years B.P.) is on the order of 500,000 years. An order of magnitude difference should make us reconsider, I think, the plausibility of fixed differences between two populations explaining phenotypic differences.