Google’s Ngrams viewer yields some interesting results when you query the mentions of various fields within genetics over the past century or so. Nothing too surprising, though I would have thought that molecular genetics would have surpassed population genetics even earlier than it did.
Note that the default settings go to 2000. But I was curious about genomics’ trajectory, and I pushed it to the max (2008). The qualitative result was not a surprise to me, but the magnitude did take me aback:
For many the image of evolutionary processes brings to mind something on a macro scale. Perhaps that of the changing nature of protean life on earth writ large, depicted on a broad canvas such as in David Attenborough’s majestic documentaries over millions of years and across geological scales. But one can also reduce the phenomenon to a finer grain on a concrete level, as in specific DNA molecules. Or one can transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.
Obviously evolutionary processes are not just fundamentally the flux of genetic elements, but genes are crucial to the phenomena in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it cannot be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being (Creationists depict evolution as a religion-like cult, which imparts the general flavor of some of the meta-science and philosophy which serves as intellectual subtext).
I hope we don’t bomb Syria. El Yucateco XXXtra Hot Kutbil-ik Mayan Style Habanero Hot Sauce is delicious.
One of the things that people like to do when thinking about evolutionary processes is to consider future predictions of a model. The problem with this is that evolutionary trajectories are not defined just by linear transitions driven by powerful positive selection. There are long term balancing forces which result in modulated equilibria. In some cases the power of selection to reshape the genome under the impetus of a new adaptation is clear and evident. Lactase persistence and malaria come to mind. But in other domains the outcomes are not so clear. For example, human intuition may tell us that higher intelligence, or in the case of men greater height, is beneficial. But the reality is that these are heritable traits which exhibit a great deal of genetic variation. The latest research also suggests that variation in these traits is controlled by innumerable genes, rather than a few of large effect. Going back to what I learned in Principles of Population Genetics, my initial impulse is to assume that continuous quantitative traits which are highly heritable are not subject to strong positive selection. But sometimes textbook wisdom needs to be updated.
There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, when you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates in regards to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*
This is probably the longest stretch of time with me not posting much since May/June of 2010. But I’ll be back to offering my opinions and analyses soon enough. Also, I should probably mention that I’ll be presenting at the 7th International Conference on Advances in Canine and Feline Genomics and Inherited Diseases (a.k.a. the cat & dog conference) at the end of the month (I’ll be in Cambridge/Boston 23rd to 29th). I’ll be rather busy the whole week, but I thought I would mention it in case some readers see someone who looks like me around the Broad Institute and are curious. Higher chance than normal that it is me.
I don’t currently have time to read Emily Oster’s Expecting Better: Why the Conventional Pregnancy Wisdom Is Wrong and What You Really Need to Know, but I am very excited that it came out. Having had a pregnant wife and becoming a parent has made it clear to me that much of the ‘conventional wisdom’ in regards to both parenting and pregnancy consists of socially enforced norms which have marginal empirical grounding (this is clear when you look at the huge variation in cross-cultural expectations even in developing societies). So I’m glad that Oster is pushing this issue in a somewhat more rational and hard-headed fashion than has previously been the case (i.e., some of the people who I think have good ideas about skepticism of the idea that every pregnancy is a medical emergency waiting to happen at any moment also try to sell you on ‘alternative medicine’ more generally). It doesn’t help that she’s within the penumbra of University of Chicago’s academic celebrity.
But the reason I’m posting right now is that the book’s Amazon page is a case in point in regards to the intersection between pregnancy and the culture wars. Of 50 reviews as of this writing 20 give it five stars, 2 give it two stars, and 28 give it 1 star!
The Pith: In India 5,000 years ago there were the hunter-gatherers. Then came the Dravidian farmers. Finally came the Indo-Aryan cattle herders.
There is a new paper out of the Reich lab, Genetic Evidence for Recent Population Mixture in India, which follows up on their seminal 2009 work, Reconstructing Indian Population History. I don’t have time right now to do justice to it, but as noted this morning in the press, it is “carefully and cautiously crafted.” Since I am not associated with the study, I do not have to be cautious and careful, so I will be frank in terms of what I think these results imply (note that confidence in many assertions below is modest). Though less crazy in a bald-faced sense than another recent result which came out of the Reich lab, this paper is arguably more explosive because of its historical and social valence in the Indian subcontinent. There has been a trend over the past few years of scholars in the humanities engaging in deconstruction and intellectual archaeology which overturns old historical orthodoxies and understandings, and leaves the historiography of a particular topic of study in a chaotic mess. From where I stand the Reich lab and its confederates are doing the same, but instead of attacking the past with cunning verbal sophistry (I’m looking at you, postcolonial “theorists”), they are taking a sledge-hammer of statistical genetics and ripping apart paradigms woven together by innumerable threads. I am not sure that they even understand the depths of the havoc they’re going to unleash, but all the argumentation in the world will not stand up to science in the end, we know that.
Since the paper is not open access, let me give you the abstract first:
For various reasons the ideas of mitochondrial Eve and Y chromosomal Adam capture the public imagination. This frustrates many people, including me. I’ve gotten into the fatigue stage on this topic, but some sort of counter-attack is necessary against malignant memes. Even geneticists who don’t usually work with populations can get confused by the implications of mtDNA and Y chromosomal phylogenies. Melissa Wilson Sayres, who works on Y chromosomes, has a useful post (promised first of two) at Panda’s Thumb, Y and mtDNA are not Adam and Eve: Part 1. If you have friends/acquaintances who are confused by this issue, it might be a good place to start.
The above figure is from Norton et al.’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. It shows that rs16891982 on the SLC45A2 locus exhibits strong differentiation between Europe and the rest of the world. This is in contrast to SLC24A5, where the well known allele which differentiates Africans/East Asians from Europeans is found at very high frequencies across Western Eurasia (both my parents are homozygotes for the “European” variant; in fact SLC24A5’s derived variant is found at fractions on the order of ~50% in eastern and southern India). The ancestral allele on SLC24A5 is very difficult to find in Europeans, because the derived variant is so close to fixation. In contrast SLC45A2’s minor allele is segregating at appreciable frequencies in places like southern Spain, and the derived allele is not fixed even in Northern Europe.
I won’t review the literature on the genomics and evolution of human pigmentation at this point. Rather, I’ll just note that it seems most of the inter-population variation is controlled by a handful of genes. It’s a polygenic trait, but just. Second, a fair amount of evidence has emerged that some of the lightening derived variants have increased in frequency only very recently (e.g., on the order of ~10,000 years). Pigmentation is then a peculiar trait where the genetic underpinnings can give historical phylogenetic information because of the varied dates of differentiation and selective sweeps.
Below I’ve collated results from several studies on frequencies of SLC45A2. I invite readers to peruse them. I will say two things. First, the frequency of the “European” variant in ~140 northern Ethiopians is 0%. This is peculiar for a population which may be on the order of ~50% West Eurasian. Second, the fraction of SLC45A2 derived variant in South Asians coincidentally tracks the “NE Euro” percentage in Zack Ajmal’s results.
Sports Illustrated writer David Epstein has a new book out, The Sports Gene: Inside the Science of Extraordinary Athletic Performance. The title strikes me as coarse and reductive, but I am aware that authors do not always have control over such things. I’ve corresponded with Epstein a bit over the past year, and he’s sent me some passages relating to human evolutionary genetics and paleoanthropology to make sure they don’t sound crazy. I haven’t had time to read the book, but judging from the interview I listened to on NPR it’s data rich and theory subtle. Though the title seems to imply that athleticism is a single gene trait where most of the variation in the population is due to genetic variation, Epstein denies this and instead presents the reality that athleticism is a complex trait with many dimensions, subject to numerous genetic and environmental variables, and to interactions across those variables. That would make for a less sexy subtitle, but it would have had the attribute of being correct.
There were two papers in Science which came out on the Y chromosome, Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females and Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny. I can recommend what Dienekes had to say, and I wasn’t going to comment until I saw this egregious piece in The New Scientist: Arabian flights: Early humans diverged in 150 years. Because of the title I did not initially think that this had anything to do with the Y chromosome, but it turns out that the piece uses the finding that three primary non-African haplogroups diverged in rapid succession from each other as the hook for the headline. In fact not only does the Y not offer definitive accounts of human history, it doesn’t even necessarily tell us about the history of men. It’s a marker, not a time machine. To repeat: the history of a specific genetic locus is not the history of a population. It has to be said.
Before there was Structure there was just structure. By this, I mean that population substructure has always been. The question is how we as humans shall characterize and visualize it in a manner which imparts some measure of wisdom and enlightenment. A simple fashion in which we can assess population substructure is to visualize the genetic distances across individuals or populations on a two dimensional plot. Another way which is quite popular is to represent the distance on a neighbor joining tree, as on the left. As you can see this is not always satisfying: dense trees with too many tips are often almost impossible to interpret beyond the most trivial inferences (though there is an aesthetic beauty in their feathery topology!). And where graphical representations such as neighbor-joining trees and MDS plots remove too much relevant information, cluttered FST matrices have the opposite problem. All the distance data is there in its glorious specific detail, but there’s very little Gestalt comprehension.
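To make the point about matrices concrete, here is a minimal sketch of how a pairwise FST matrix might be built for a single biallelic locus. It uses the textbook heterozygosity form, FST = (H_T - H_S) / H_T, and assumes equal-sized populations. The population names and allele frequencies are invented for illustration and are not taken from any data set discussed here.

```python
def fst(p1, p2):
    """Wright's FST for one biallelic locus and two equal-sized populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)           # expected heterozygosity, pooled
    if h_total == 0.0:                              # both fixed for the same allele
        return 0.0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
FREQS = {"PopA": 0.10, "PopB": 0.50, "PopC": 0.90}


def fst_matrix(freqs):
    """Return {(pop_i, pop_j): FST} for every ordered pair of populations."""
    return {(a, b): fst(pa, pb)
            for a, pa in freqs.items()
            for b, pb in freqs.items()}


if __name__ == "__main__":
    matrix = fst_matrix(FREQS)
    names = list(FREQS)
    print("      " + "  ".join(f"{n:>6}" for n in names))
    for a in names:
        print(f"{a:>6}" + "  ".join(f"{matrix[(a, b)]:6.3f}" for b in names))
```

Even with three populations the output is a grid of numbers that has to be read cell by cell. With dozens of populations and genome-wide averages the same grid keeps every detail and offers no picture.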
My friend Zack Ajmal has been running the Harappa Ancestry Project for several years now. This is a non-institutional complement to the genomic research which occurs in the academy. His motivation was in large part to fill in the gaps of population coverage within South Asia which one sees in the academic literature. Much of this is due to politics, as the government of India has traditionally been reluctant to allow sample collection (ergo, the HGDP data uses Pakistanis as their South Asian reference, while the HapMap collected DNA from Indian Americans in Houston). Of course this sort of project is not without its own blind spots. Zack must rely on public data sets to get a better picture of groups like tribal populations and Dalits, because they are so underrepresented in the Diaspora from which he draws many of the project participants.
Once Zack has the genotype one of the primary things he does is add it to his broader data set (which includes many public samples) and analyze it with the Admixture model-based clustering package. What Admixture does is take a specific number of populations (e.g. K = 12) and generate quantity assignments to individuals. So, for example individual A might be assigned 40% population 1 and 60% population 2 for K = 2. Individual B might be 45% population 1 and 55% population 2. These are not necessarily ‘real’ populations. Rather, the populations and their proportions are there to allow you to discern patterns of relationships across individuals.
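A minimal sketch of what those assignments mean, assuming the usual model-based clustering setup in which an individual's expected allele frequency at a marker is the ancestry-weighted mix of the inferred populations' allele frequencies. The ancestry fractions are the K = 2 example above. The per-population allele frequencies are invented, and none of this is Admixture's actual code or interface.

```python
# Ancestry fractions for the two example individuals at K = 2.
Q = {
    "A": [0.40, 0.60],
    "B": [0.45, 0.55],
}

# Hypothetical allele frequencies at one SNP in inferred populations 1 and 2.
# These values are made up purely for illustration.
P_SNP = [0.10, 0.80]


def is_valid_assignment(fractions, tol=1e-9):
    """Fractions must be non-negative and sum to 1 across the K populations."""
    return all(f >= 0.0 for f in fractions) and abs(sum(fractions) - 1.0) < tol


def expected_allele_frequency(fractions, pop_freqs):
    """Ancestry-weighted average of the population allele frequencies."""
    return sum(q * p for q, p in zip(fractions, pop_freqs))


if __name__ == "__main__":
    for name, fractions in Q.items():
        assert is_valid_assignment(fractions)
        print(name, round(expected_allele_frequency(fractions, P_SNP), 3))
```

The populations here are just columns of allele frequencies that, mixed in the right proportions, explain the genotypes well. That is why they need not correspond to any ‘real’ group.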
Since Zack has put his results online, I thought it would be useful to review what patterns have emerged over the past two years, as his sample sizes for some regions are now moderately significant. Though he has K=16 populations, not all of them will concern us, because South Asians do not tend to exhibit many of the components. I will focus on seven: S Indian, Baloch, Caucasian, NE Euro, SE Asian, Siberian and NE Asian. These are not real populations, but the labels tell you in which region these components are modal. So, for example, the “S Indian” component peaks in southern India. The “Baloch” peaks among the Baloch people of southeastern Iran and southwest Pakistan. The “NE Euro” peaks among the eastern Baltic peoples. The last three are Asian components, running the latitude from south to north to center. They only concern the first population of interest, Bengalis. I will combine these last three together as “Asian.”
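The combining step is simple bookkeeping. A rough sketch follows, with made-up percentages for a single hypothetical individual. The component names follow the labels above, the numbers are not from Zack’s results, and the helper name is my own.

```python
# The three components merged into a single "Asian" column.
ASIAN_COMPONENTS = ("SE Asian", "Siberian", "NE Asian")

# Hypothetical percentages for one individual, for illustration only.
individual = {
    "S Indian": 40.0,
    "Baloch": 30.0,
    "Caucasian": 5.0,
    "NE Euro": 5.0,
    "SE Asian": 10.0,
    "Siberian": 4.0,
    "NE Asian": 6.0,
}


def collapse_asian(components):
    """Sum the three Asian components into one and keep the rest unchanged."""
    merged = {name: value for name, value in components.items()
              if name not in ASIAN_COMPONENTS}
    merged["Asian"] = sum(components.get(name, 0.0) for name in ASIAN_COMPONENTS)
    return merged


if __name__ == "__main__":
    for name, value in collapse_asian(individual).items():
        print(f"{name:>10}: {value:5.1f}%")
```

Collapsing loses the distinction among the three sources but leaves each individual’s total unchanged, which is all that matters when the components are only appreciable in one group.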
Below is a table, mostly individuals from Zack’s results (though there are some aggregate results from public data sets). Comments below.
I thought it might be useful for new readers to understand a bit about my comments policy and how I’ve come to this stance. Let me give you an example of one individual who occasionally left comments on my blog, often combative, though just on the legitimate side of the trolling boundary. One of the major tactics of argument of this individual was to impute to me particular life experiences which he thought I must have had, and which must have shaped my opinions. Though I do not share much about my personal life online, I do go by my “real name,” and over 11+ years of writing on the internet one can construct a rough narrative from stray anecdotes. The key, though, is that this picture is rough. After one exchange where my interlocutor made an inference based on his own perception of various likelihoods about me, I tired of the one sided game (he was anonymous), and looked him up on Facebook. I left a quick comment to that effect, and asserted that now the scales were somewhat balanced. He never left a comment after that incident.