Standard apologies that I have not had the marginal time to blog much, but I thought it was important that I at least note that Dr. Peter Ralph and Dr. Graham Coop’s paper on identity-by-descent segments and European populations and history is out in its final form in PLoS Biology, The Geography of Recent Genetic Ancestry across Europe. I’ve been familiar with the outlines of these results for about a year now, and to be frank I am still digesting them. The media hype will come and go, with true but to some extent trivial headlines that “all Europeans are related,” but the consequences of these sorts of genetic inquiries into the relatedness of populations are going to be long lasting. At least they should be.
But before I go on about that, if you find the paper itself a bit daunting (though the main body of the text strikes me as eminently readable for a piece of statistical genetics), see Carl Zimmer’s condensation. With this sort of result there is liable to be confusion, so note that Graham Coop has been posting comments on Carl’s blog (and elsewhere, and you can always send him a note on Twitter). Additionally he has a very readable FAQ out. Dr. Coop told me on Twitter that there would even be updates tomorrow as well! In particular, one aspect of the paper which I noticed is that most relatively short but detectable segments (~10 cM) shared between any two individuals in many nationalities are not going to be evidence of recent genealogical affinities, but of deeper historical processes.
An old argument going back to the origins of theoretical population genetics has to do with the nature of the genetic effects which control traits and are subject to change in allele frequency due to adaptation. Often these are bracketed as part of the controversies between R. A. Fisher and Sewall Wright (see Sewall Wright and Evolutionary Biology). In short, Fisher contended that most evolution through adaptation was driven by selection operating upon additive genetic variation. That is, variation due to alleles across the genome, each having independent and additive effects on the trait. One might think of these as linear effects. In contrast Wright’s views were more complex or confused, depending upon your perspective on the sum totality of his theories. In the domain of genetic architecture he presented a model where gene-gene interactions, epistasis, played an important role in the evolutionary trajectory of populations, which traversed ‘adaptive landscapes’ in a contingent fashion.
To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., fundamental particles turn out to be divisible into quarks), but it still has conceptual utility.
Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.
The title here is somewhat misleading. This is not just a plea for population genetics, but for quantitative genetics as well. Genetics is a big field. But today it is defined by and large by DNA, the concrete entity in which the abstraction of the gene is embedded. Look at the header of this website, or the background to my Twitter account. Mind you, I’m pathetically informed about molecular genetics, and don’t have a strong interest in the topic! I did consider using the H.W.E. or the breeder’s equation for the header, but in the end I judged it too abstruse and unfamiliar to most readers. DNA dominates when it comes to the modern mental conception of genetics, and we have to live with it to some extent.
But there is also great value in the genetics which has intellectual roots in the pre-DNA Mendelians and biometricians. This genetics exhibits a symbiotic, but not necessary, association with genetics as a branch of biophysics. Yet I come here not to insult or impugn my friends who toil in the trenches of the molecular wars. Rather, I simply want to point out that our world needs balance, and the systematic aerial perspective of population, evolutionary, and quantitative genetics can provide a different kind of intellectual ballast. More importantly, for the mnemonically lazy in the audience, pop, evo, and quant give you information for free. By this, I mean that these are highly theoretical fields, and theory can predict and allow you to infer facts about the world. You don’t need to immerse yourself in every scrap of data if you can derive the likely pattern from theory.
It seems a new field is being born! Jeff Wall & Monty Slatkin have a pretty thorough review out, Paleopopulation Genetics:
Paleopopulation genetics is a new field that focuses on the population genetics of extinct groups and ancestral populations (i.e., populations ancestral to extant groups). With recent advances in DNA sequencing technologies, we now have unprecedented ability to directly assay genetic variation from fossils. This allows us to address issues, such as past population structure, changes in population size, and evolutionary relationships between taxa, at a much greater resolution than can traditional population genetics studies. In this review, we discuss recent developments in this emerging field as well as prospects for the future.
Nothing very new for close readers of this weblog, but the references are useful for later mining.
OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:
As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.
We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.
It might be helpful if other evolution/genetics bloggers reblog this so we can push it up the Google search results. If you google “Haldane’s Sieve” some of the results are interesting…and not necessarily in a good way. I do feel guilt blogging on stuff my readers can’t get, so the more preprints become acceptable the more we (as in, the general public) can understand about evolution.
A reader emailed me to ask what I thought would be a good way to better understand some of the more technical posts I put up.
First, two course notes which I’ve found useful as personal references:
- Evolutionary Quantitative Genetics, Uppsala University (if you are ambitious, bookmark this too)
Some people might argue that John Gillespie’s Population Genetics: A Concise Guide (Kindle edition) is a touch too abstruse and cryptic for the introductory reader. It’s short, and the mathematics isn’t challenging, but because of its concision the author can sometimes unleash upon you nearly cryptic formalism, perhaps defeating the purpose of a soft introduction in the first place. To get the most out of this book you probably, ironically, have to have a more thorough textbook on hand to clear up those particular points which you find confusing. But to get the general logic of population genetics and establish familiarity this seems to be the right entry point (assuming you’re not too terrified by algebra).
Of course most readers of this weblog are focused more particularly on the topic of evolution, and evolutionary genetics. Evolutionary Genetics by the late great John Maynard Smith is a rather good full-spectrum introduction into this topic. It covers many of the topics in the Gillespie book, though less mathematically intensively. But this is not just a population genetics book, and expands into other topics of relevance to evolutionary genetics (e.g., of course phylogenetics gets a big shout out). And Smith has richer empirical examples too. This is probably a less intimidating book than Population Genetics, but I’d recommend you hit it second because it will make much more sense if you’ve got some of the foundations underneath you.
Some of you though might be a little on the unbalanced side. I actually “learned” population genetics (if it can be said I know population genetics, I’d more honestly state I’m familiar with population genetics) through Principles of Population Genetics, the “Hartl & Clark” book. This is not really an introductory book, but I think in some ways it’s more comprehensible than Population Genetics, because it doesn’t need to be concise. Sometimes formal methods in pop gen only make sense with lots of empirical examples and worked out problems, and this is a book which has the scope in terms of space for that. There’s really no big downside I can give for getting this book. I’ve got the 2nd, 3rd, and 4th editions, mostly because I couldn’t find an affordable copy of the 3rd in the early 2000s, while the 4th came out literally right after I purchased the 3rd!
In the survey below I asked if you knew about how many migrants per generation were needed to prevent divergence between populations. About 80 percent of you stated you did not know the answer. That was not totally surprising to me. The reason I asked is that the result is moderately obscure, but also rather surprisingly simple and fruitful. The rule of thumb is that 1 migrant per generation is needed to prevent divergence.*
It doesn’t tell you much in and of itself of course. But if you think about it you can inject that fact into all sorts of other population genetic phenomena. For example, to have selection across two populations which is not reducible to selection within those populations (i.e., inter-demic selection) you need group-level genetic differences. These differences can be measured by the Fst statistic. In short the value of Fst tells you the proportion of variation which can be attributed to between-group differences (e.g., Fst across human races is ~0.15). For natural selection to have any adaptive effect you also need heritable variation. If you have lots of heritable variation selection can be weaker, while if you have little heritable variation selection has to be very strong (see response to selection). Fst is a rough gauge of heritable variation when you are evaluating group level differences. An Fst of 1.0 would imply that the groups are nearly perfectly distinct at the loci of interest, while an Fst of 0.0 would imply that the groups are not genetically distinct at all. With no distinction selection would have no efficacy in terms of driving adaptation. All this is a long way of saying that the 1 migrant rule is one reason that evolutionary biologists take a skeptical position in relation to group selection. Migration tends to quickly erase the variation which group selection depends upon.
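To put rough numbers on the rule of thumb, here is a minimal sketch in Python. It assumes Wright's infinite-island model, in which equilibrium Fst is approximately 1 / (4Nm + 1), with Nm the number of migrants per generation. That formula is a standard textbook approximation and is my assumption about what sits behind the rule; the function name is purely illustrative.

```python
def equilibrium_fst(migrants_per_generation):
    """Approximate equilibrium Fst under Wright's island model.

    Assumption: Fst ~ 1 / (4 * N * m + 1), where N * m is the number of
    migrants entering each population per generation.
    """
    if migrants_per_generation < 0:
        raise ValueError("migrants per generation cannot be negative")
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


# Compare a trickle of migration with heavier gene flow.
for nm in (0.1, 1, 10):
    print(nm, round(equilibrium_fst(nm), 3))
```

Under these assumptions one migrant per generation holds Fst at about 0.2, and ten migrants push it down to roughly 0.02, which is the sense in which even modest gene flow erases between-group variation.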
I recently heard an eminent geneticist declare that population genetics began with Theodosius Dobzhansky’s Genetics and the Origin of Species in 1937. My immediate reflex was to be skeptical of this, at least going by Will Provine’s treatment in The Origins of Theoretical Population Genetics, which seemed to push back the timing to the 1920s.
So I looked up “population genetics” in Ngram viewer.
These results are not consistent with my expectations. Looks like my intuition was wrong. At least for the term population genetics. Score one for experience and wisdom.
Over at A Replicated Typo they are talking about a short paper in Science, Mother Tongue and Y Chromosomes. In it Peter Forster and Colin Renfrew observe that “A correlation is emerging that suggests language change in an already-populated region may require a minimum proportion of immigrant males, as reflected in Y-chromosome DNA types.” But there’s a catch: they don’t calculate a correlation in the paper. Rather, they’re making a descriptive verbal observation. This observation seems plausible on the face of it. In addition to the examples offered, one can add the Latin American case, where mestizo populations tend to have European Y chromosomal profiles and indigenous mtDNA.
The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.
In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many genes. This may seem a simple formulation, but to try and understand how thousands of genes may act in concert to modulate variation on a trait is often a more difficult task to grok (yes, you can appeal to the central limit theorem, but that means little to most intuitively). This is probably why heritability is such a knotty issue in terms of public understanding of science, as it concerns the component of variation in quantitative continuous traits which is dispersed across the genome. These are the traits where there is no “gene for X.” Additionally, quantitative traits are likely to have a substantial environmental component of variation, confounding a simple genotype to phenotype mapping.
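For those who find the central limit theorem appeal unhelpful, a small simulation may make the contrast between the two extremes easier to see. This is only a toy sketch under assumptions of my own choosing: every locus is diallelic with a "plus" allele at frequency 0.5, all effects are equal and additive, loci are independent, and there is no environmental noise. The function name and the sample sizes are illustrative.

```python
import random
import statistics


def simulate_trait(n_loci, n_individuals, seed=0):
    """Return one trait value per individual: the count of 'plus' alleles.

    Assumptions (illustrative only): diallelic loci, 'plus' allele
    frequency of 0.5, equal additive effects, independent loci, and no
    environmental component.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_individuals):
        # Two allele copies per locus, each a 'plus' allele with probability 0.5.
        plus_count = sum(rng.random() < 0.5 for _ in range(2 * n_loci))
        values.append(plus_count)
    return values


monogenic = simulate_trait(n_loci=1, n_individuals=2000)
polygenic = simulate_trait(n_loci=200, n_individuals=2000)

# One locus gives at most three trait classes (0, 1, or 2 'plus' alleles).
print("distinct monogenic values:", len(set(monogenic)))
# Many loci give a wide, roughly bell-shaped spread of values.
print("distinct polygenic values:", len(set(polygenic)))
print("polygenic mean:", statistics.fmean(polygenic))
print("polygenic sd:", statistics.pstdev(polygenic))
```

With one locus the trait falls into discrete Mendelian classes, while with a couple of hundred loci the same additive rule produces a near-continuous distribution.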
Arguably the classic quantitative trait is height. It is clear and distinct (there aren’t arguments about the validity of measurement as occurs in psychometrics), and it is substantially heritable. In Western societies with a surfeit of nutrition height is ~80-90% heritable. What this means is that ~80-90% of the variance of the trait value within the population is due to variance of the genes within the population. Concretely, there will be a very strong correspondence between the heights of offspring and the average height of the two parents (controlled for sex, so you’re thinking standard deviation units, not absolute units). And yet height is at the heart of the question of the “missing heritability” in genetics. By this, I mean the fact that so few genes have been associated with variation in height, despite the reality that who your parents are is the predominant determinant of height in developed societies.
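The variance-ratio definition and the parent-offspring correspondence can be written down in a few lines. The sketch below assumes the heritability in question is narrow-sense (additive) heritability, so that the expected offspring deviation from the population mean is the heritability times the midparent deviation. The variance numbers are made up for illustration, and the function names are my own.

```python
def heritability(var_genetic, var_environment):
    """Proportion of trait variance attributable to genetic variance."""
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


def expected_offspring_deviation(h2, midparent_deviation):
    """Expected offspring deviation from the mean, in standard deviation units.

    Assumption: h2 is narrow-sense heritability, so the regression of
    offspring on midparent value has slope h2.
    """
    return h2 * midparent_deviation


# Hypothetical variance components chosen to give 80% heritability.
h2 = heritability(var_genetic=8.0, var_environment=2.0)

# Parents who average one standard deviation above the mean.
print(h2, expected_offspring_deviation(h2, midparent_deviation=1.0))
```

The same slope is what drives the breeder's equation: the response to selection is the heritability times the selection differential.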
I own a book of Motoo Kimura’s collected papers, and of course I have a copy of John Gillespie’s Population Genetics: A Concise Guide. But I’d forgotten the acrimony between the two men. Gillespie has been retired for half a decade now, while Kimura died in 1994. I randomly stumbled onto an old newspaper story from 1992 covering the feud between these two eminent population geneticists, Scientists in Open War over “Neutral Theory” of Genetics. It was in the Sacramento Bee, which is based near Gillespie’s university. The background is that both were principals in the “neutralist–selectionist” debate of the 1970s and 1980s. Kimura was one of the main theoretical architects of the neutral theory of molecular evolution, which eventually spread its influence to the point where old-line adaptationists such as Richard Dawkins had to offer up a counter-argument.
Some choice bits:
To the left you see a zoom in of a PCA which Dienekes produced for a post, Structure in West Asian Indo-European groups. The focus of the post is the peculiar genetic relationship of Kurds, an Iranian-speaking people, with Iranians proper, as well as Armenians (Indo-European) and Turks (not Indo-European). As you can see in some ways the Kurds seem to be the outgroup population, and the correspondence between linguistic and genetic affinity is difficult to interpret. For those of you interested in historical population genetics this shouldn’t be that surprising. West Asia is characterized by endogamy, language shift, and a great deal of sub and supra-national communal identity (in fact, national identity is often perceived to be weak here). A paper from the mid-2000s already suggested that western and eastern Iran were genetically very distinctive, perhaps due to the simple fact of geography: central Iran is extremely arid and relatively unpopulated in relation to the peripheries.
But this post isn’t about Kurds; rather, observe the very close relationship between Turks and Armenians on the PCA. The _D denotes Dodecad samples, those which Dienekes himself has collected. This affinity could easily be predicted by the basic parameters of physical geography. Armenians and Anatolian Turks were neighbors for nearly 1,000 years. Below is a map which shows the expanse of the ancient kingdom of Armenia:
This morning I received an email from the communication director of the American Anthropology Association. The contents are on the web:
AAA Responds to Public Controversy Over Science in Anthropology
Some recent media coverage, including an article in the New York Times, has portrayed anthropology as divided between those who practice it as a science and those who do not, and has given the mistaken impression that the American Anthropological Association (AAA) Executive Board believes that science no longer has a place in anthropology. On the contrary, the Executive Board recognizes and endorses the crucial place of the scientific method in much anthropological research. To clarify its position the Executive Board is publicly releasing the document “What Is Anthropology?” that was, together with the new Long-Range Plan, approved at the AAA’s annual meeting last month.
The “What Is Anthropology?” statement says, “to understand the full sweep and complexity of cultures across all of human history, anthropology draws and builds upon knowledge from the social and biological sciences as well as the humanities and physical sciences. A central concern of anthropologists is the application of knowledge to the solution of human problems.” Anthropology is a holistic and expansive discipline that covers the full breadth of human history and culture. As such, it draws on the theories and methods of both the humanities and sciences. The AAA sees this pluralism as one of anthropology’s great strengths.
Changes to the AAA’s Long Range Plan have been taken out of context and blown out of proportion in recent media coverage. In approving the changes, it was never the Board’s intention to signal a break with the scientific foundations of anthropology – as the “What is Anthropology?” document approved at the same meeting demonstrates. Further, the long range plan constitutes a planning document which is pending comments from the AAA membership before it is finalized.
Anthropologists have made some of their most powerful contributions to the public understanding of humankind when scientific and humanistic perspectives are fused. A case in point is the AAA’s $4.5 million exhibit, “RACE: Are We So Different?” The exhibit, and its associated website at www.understandingRACE.org, was developed by a team of anthropologists drawing on knowledge from the social and biological sciences and humanities. Science lays bare popular myths that races are distinct biological entities and that sickle cell, for example, is an African-American disease. Knowledge derived from the humanities helps to explain why “race” became such a powerful social concept despite its lack of scientific grounding. The widely acclaimed exhibit “shows the critical power of anthropology when its diverse traditions of knowledge are harnessed together,” said Leith Mullings, AAA’s President-Elect and the Chair of the newly constituted Long-Range Planning Committee.
Tishkoff et al.
Reading Peter Bellwood’s First Farmers: The Origins of Agricultural Societies, I’m struck by how much of a difference five years has made. When Bellwood was writing, the ‘orthodoxy’ on the nature of the expansion of farming into Europe leaned toward cultural diffusion. Today the paradigm is in flux, as a new generation of genomic studies using ancient DNA, wider sets of markers, and a broader sampling of populations makes solid old truths untenable. I’m reading Bellwood’s work in part because from what I have read elsewhere his model seems less and less ridiculous in light of the new information bubbling out of human genomics. The swell of data in this field is such that it’s hard to keep up. You never know what you’re going to wake up to in the morning.
The assertions of archaeologists and pre-historians such as Bellwood have clear implications and offer up specific predictions about the shape of the tree of human phylogenetics. Now the results are getting robust enough that the models can be tested, and alternatives refuted or accepted. But sometimes you need to take stock. Many of my posts make the assumption that you have a lot of the background information in hand, but I know that’s not always possible. With that, I’d like to bring your attention to a paper in Human Molecular Genetics, Fine-scale population structure and the era of next-generation sequencing:
An interesting readable review in PLoS Genetics taking on population genetics, Frail Hypotheses in Evolutionary Biology:
In conclusion, I return to Michael Lynch’s challenging questions about blind spots and bad wheels in evolutionary biology which motivated this review…Concerning blind spots I have pointed out some limitations of current population genetics. There is too much emphasis on elegant mathematics, and not enough concern for the real values of the critical parameters -in particular, in models of mutation spread and fixation, or in models of optimal mutation rates. Recombination, a crucial genetic mechanism, is misrepresented in the models. Features that looked anecdotal, such as recombination between sister chromatids and germ-line mutations are perhaps central to the mechanisms of evolution in higher organisms. My proposals on mutation strategies…—see also Amos…—lead to rather precise insights on compensatory mutations or polymorphism propagation, yet they are largely ignored by population geneticists.
The beauty of population genetics is that it leads to relatively simple algebras which one can use to guide one’s intuitions. Phenomena such as selection or drift are more than words, they’re specific values. That being said, plenty of readers of this weblog have expressed caution, and skepticism, at the over-utilization of monogenic diallelic models as “quick & dirty” prototypes for evolution more generally. More concretely, small changes in parameter values can lead to radically different inferences within the real context of natural history. Excessive reliance on elegant population genetic theory can lead one astray just as excessive reliance on economic theory can. The real world introduces so many complications that discarding too many of them to make a model tractable may render the framework of trivial importance, or even lead one down false paths. I don’t find the author’s specific objections of distinction, but the paper is useful as an entry-point into the debate within the literature. The fact that R. A. Fisher, J. B. S. Haldane, and Sewall Wright did not predict the full path of empirical discovery over the 20th century indicates very concretely the limitations of theoretical frameworks within biology.
Hopefully by now the image to the left is familiar to you. It’s from a paper in Human Genetics, Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study. The paper is interesting in and of itself, as it combines a wide set of populations and puts the focus on the extent of disjunction between self-identified ethnic identity, and the population clusters which fall out of patterns of genetic variation. In particular, the authors note that the “Native Hawaiian” identification in Hawaii is characterized by a great deal of admixture, and within their sample only ~50% of the ancestral contribution within this population was Polynesian (the balance split between European and Asian). The figure suggests that subjective self assessment of ancestral quanta is generally accurate, though there are a non-trivial number of outliers. Dienekes points out that the same dynamic holds (less dramatically) for European and Japanese populations within their data set.
All well and good. And I like these sorts of charts because they’re pithy summations of a lot of relationships in a comprehensible geometrical fashion. But they’re not reality, they’re a stylized representation of a slice of reality, abstractions which distill the shape and processes of reality. More precisely the x-axis is an independent dimension of correlations of variation across genes which can account for ~7% of the total population variance. This is the dimension with the largest magnitude. The y-axis is the second largest dimension, accounting for ~4%. The magnitudes decline precipitously as you descend the rank orders of the principal components. The 5th component accounts for ~0.2% of the variance.
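To make "accounts for X% of the variance" less abstract, here is a toy sketch of the arithmetic. It is not the method of the paper: real analyses work on hundreds of thousands of markers, while this example uses just two made-up variables so that the eigenvalues of the covariance matrix have a closed form. The proportion of variance for each component is its eigenvalue divided by the sum of all eigenvalues. The function name and the data are hypothetical.

```python
import math
import statistics


def variance_explained_2d(xs, ys):
    """Share of total variance along each principal axis of 2-D data.

    Uses the closed-form eigenvalues of the 2x2 covariance matrix
    [[sxx, sxy], [sxy, syy]].
    """
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = statistics.pvariance(xs)
    syy = statistics.pvariance(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    first, second = half_trace + spread, half_trace - spread

    total = first + second
    if total == 0:
        raise ValueError("data have no variance")
    return first / total, second / total


# Hypothetical, perfectly correlated data: one axis carries everything.
print(variance_explained_2d([1, 2, 3, 4], [2, 4, 6, 8]))

# Hypothetical, loosely correlated data: the variance is shared out.
print(variance_explained_2d([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

In genome-wide data no single axis dominates like this, which is why the top component in the figure carries only a few percent of the total.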
The first two components in these sorts of studies usually conform to our intuitions, and add a degree of precision to various population scale relations. Consider this supplement chart from a 2008 paper (I’ve rotated and reedited for clarity):
Yesterday I pointed to a new paper, Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people. P. vivax is the least virulent of the malaria inducing pathogens, and it is presumably responsible for the fact that the Duffy antigen locus is one of the more ancestrally informative ones in the human genome. In most of Eurasia the Duffy negative null allele* is present at very low frequencies, less than 5%, and often simply absent. In contrast, in Sub-Saharan Africa the Duffy negative variant reaches frequencies as high as 95% in West Africa, and 90% in many other regions. In North Africa and the Middle East the frequencies are intermediate, likely due to the necessity for local adaptation to malaria in many regions, and the historical introduction of the Duffy negative allele via the slave trade.
Before genomics, looking at the Duffy locus was one simple way that geneticists ascertained the proportion of white admixture in the African American population. The Duffy negative allele was nearly absent in Europeans, and present in frequencies of ~95% in West Africa. Therefore, the ~70% frequency in African Americans indicates what we know from other sources, a substantial minority European contribution to their ancestry. The people of Madagascar are similar insofar as they are a byproduct of admixture between African and non-African populations. The source of the non-African ancestry is rather easy to determine: unlike most African countries Madagascar has one language, Malagasy, and it is of the Barito family of languages. Aside from Malagasy the Barito languages are spoken only in a small region of southern Borneo in Indonesia. There are other aspects of the Malagasy culture which make their Southeast Asian provenance clear. The photo above is of Andry Rajoelina, the current President of Madagascar. Two aspects of his visage are salient: his youth (he used to be a disk jockey!), and the fact that his features do not seem typically Sub-Saharan African. Many of the leaders of Madagascar, including the former royal family, are from the highlands where Asiatic features and folkways are more prevalent.
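Going back to the Duffy arithmetic at the start of this passage, the inference from allele frequencies to admixture can be sketched in a few lines. This assumes simple linear mixing at a single locus, with no selection or drift since admixture, and it treats "nearly absent" in Europeans as a frequency of exactly 0.0, which is my simplification. The function name is illustrative.

```python
def admixture_fraction(p_admixed, p_source_a, p_source_b):
    """Fraction of ancestry from source B under simple linear mixing.

    Assumes p_admixed = (1 - m) * p_source_a + m * p_source_b, with no
    selection or drift at the locus since the admixture event.
    """
    if p_source_a == p_source_b:
        raise ValueError("source populations must differ in allele frequency")
    return (p_source_a - p_admixed) / (p_source_a - p_source_b)


# Duffy negative frequencies quoted in the post; 0.0 for Europeans is a
# stand-in for "nearly absent".
european_share = admixture_fraction(p_admixed=0.70, p_source_a=0.95, p_source_b=0.0)
print(round(european_share, 2))
```

On these inputs the European share comes out at roughly a quarter, in line with the "substantial minority" described above. A single locus is of course a noisy estimator, which is part of why genome-wide data superseded this approach.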
But there is also a clear African component to the Malagasy, more obvious among coastal populations, but also possibly dominant in a genetic sense in terms of proportion to the Asian according to research using uniparental markers. An analysis of Y lineage Fst genetic distances suggests that the Malagasy are, on the whole, somewhat closer to East Africans than to people from Borneo. I stipulate on the whole because as implied above there seems to be regional variation, with Southeast Asian ancestry and culture least hybridized with Sub-Saharan African ancestry in the central highlands, likely for ecological reasons.
If you are like me, and if you are reading this weblog there is a significant probability you are like me, you read L. L. Cavalli-Sforza’s History and Geography of Human Genes in the 1990s, and in the early aughts Spencer Wells’ The Journey of Man. Science has come very far in the last 10-15 years; even Cavalli-Sforza’s magnum opus pales in comparison to the literal tsunami of data and analysis which the “post-genomic era” has ushered in. Instead of a gene here and there, or even the mtDNA and Y chromosome, researchers are now looking at hundreds of thousands of genetic variants, SNPs, across genomes. We’re rapidly approaching the era of whole genome sequencing, even if we’re not quite there yet.
But what’s the purpose of advances in technique and computation? Though the long-term project is to understand human variation and genetic function so as to have biomedical utility, in the short-term there is an enormous wealth of more abstract population genetic insight which can be extracted. Because of the biomedical focus of contemporary genomics we take a somewhat anthropocentric view, which is fine by me as I am an unregenerate speciesist. The fish, fowl and crawling things of the earth can come later. And in any case, the beauty of the human focus of modern evolutionary genomics is that there are whole disciplines such as paleoanthropology which can serve as partners in interdisciplinary projects.
Humans are like any other organism, buffeted by conventional evolutionary genetic dynamics (drift, migration, natural selection), as well as processes which are more biophysically rooted such as recombination and mutation. Each of these processes leaves its tell-tale marks on the genome. Mutation replenishes variation which drift and selection often eliminate, the former by chance and the latter in the form of negative selection. Migration serves to homogenize across populations through gene flow, while diversifying within populations by introducing novel variants. Finally, recombination breaks up linear associations of genetic variants along a DNA sequence, and has been used to explain sex.
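The way recombination breaks up associations between variants has a simple textbook form, sketched below. It assumes a large, randomly mating population with no selection, in which linkage disequilibrium D between two loci decays by a factor of (1 - r) each generation, where r is the recombination rate between them. The starting value of D and the rates are made-up numbers for illustration.

```python
def ld_after(d0, recombination_rate, generations):
    """Linkage disequilibrium remaining after a number of generations.

    Assumes D_t = D_0 * (1 - r) ** t, i.e. random mating, a large
    population, and no selection on either locus.
    """
    if not 0.0 <= recombination_rate <= 0.5:
        raise ValueError("recombination rate must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations cannot be negative")
    return d0 * (1.0 - recombination_rate) ** generations


# Hypothetical starting association, tracked for tightly and loosely linked loci.
for r in (0.001, 0.01, 0.5):
    print(r, [round(ld_after(0.25, r, t), 4) for t in (0, 10, 100)])
```

Tightly linked variants keep their association for many generations while unlinked ones lose it almost immediately, which is why the length of shared segments carries information about how long ago common ancestry lies.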