Credit: Albozagros
The genetics and history of Tibet are fascinating to many. To be honest the primary reason here is elevation. The Tibetan plateau has served as a fortress for populations who have adapted biologically and culturally to the extreme conditions. Naturally this means that there has been a fair amount of population genetics on Tibetans, as hypoxia is a side effect of high altitude living which dramatically impacts fitness. I have discussed papers on this topic before. And I will probably talk more about it in the future, considering rumblings at ASHG 2012.
But to understand the character of the effect of natural selection on a population it is often very important to keep in mind the phylogenetic context. By this, I mean that evolutionary processes occur over history, and those historical events shape the course of subsequent phenomena. Concretely, to understand how the Tibetans came to be adapted to high altitudes one must understand who they are related to, and what their long term history is. There is a paper in Molecular Biology and Evolution which attempts to do just that, Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau:

Byzantine Emperor Leo “the Khazar” with his son Constantine VI. Credit: Cplakidas
For the past year or so I’ve been getting queries about what I think about Eran Elhaik’s preprint on the genetic character of European Jews. I found some of the conclusions frankly a little weird, but I assumed that things would be cleaned up for publication. Well, it’s been out for a while now: The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses. But some reporting in The Jewish Daily Forward has brought the author and his detractors a bit into the spotlight. The reason is that, as you can tell from the title, the author takes a position on the Khazarian origin model of Ashkenazi Jews (in favor). Here is a non-genetic take over at GeoCurrents, the thrust of which I basically concur with.
In any case, many of the problems with the paper remain. Really it all begins and ends here:
Because of Angelina Jolie’s revelation, the Myriad Genetics case is in the news again. If you don’t know what I’m talking about, look it up. Because of the patent Myriad can charge thousands of dollars for a test which would otherwise be much cheaper (putting it out of reach of many without health insurance). My question here is simple: if you are a geneticist do you think Myriad’s position has any validity? The reason I ask is that I know many geneticists, and I know many geneticists read me, and I follow many geneticists on Twitter, but I’ve never encountered one who would be willing to defend Myriad’s position as plausible and passing the smell test. If you are one of those geneticists please leave a comment, because I’m honestly curious.
I went to the talks about the Myriad case at ASHG, and I have to say it was all law, and no science. The science was confused and laughable. The panelists themselves rolled their eyes and expressed resignation as to the garbled ratiocinations of the judges who reviewed the case. There is a classic “two cultures” problem.

Don’t forget the deep structure in Italy!
Credit: Rita Molnar
Standard apologies that I have not had the marginal time to blog much, but I thought it was important that I at least note that Dr. Peter Ralph and Dr. Graham Coop’s paper on identity-by-descent segments and European populations and history is out in its final form in PLoS Biology, The Geography of Recent Genetic Ancestry across Europe. I’ve been familiar with the outlines of these results for about a year now, and to be frank I am still digesting them. The media hype will come and go, with true but to some extent trivial headlines that “all Europeans are related,” but the consequences of these sorts of genetic inquiries into the relatedness of populations are going to be long lasting. At least they should be.
But before I go on about that, if you find the paper itself a bit daunting (though the main body of the text strikes me as eminently readable for a piece of statistical genetics), see Carl Zimmer’s condensation. With this sort of result there is liable to be confusion, so note that Graham Coop has been posting comments on Carl’s blog (and elsewhere, and you can always send him a note on Twitter). Additionally he has a very readable FAQ out. Dr. Coop told me on Twitter that there would even be updates tomorrow as well! In particular one aspect of the paper which I noticed is that most relatively short but detectable segments (~10 cM) shared between any two individuals in many nationalities are not going to be evidence of recent genealogical affinities, but of deeper historical process.
A few years back I was rather fixated on issues of maternal fetal health. In particular I was worried about gestational diabetes in relation to my wife because I come from an ethnic group with an elevated risk for these sorts of problems, and the effect when you are in mixed-race marriages seems to be additive (i.e., unlike some risk factors associated with pregnancies the mother’s ethnicity is not the only relevant variable). This is embedded in the broader suite of metabolic diseases which exhibit ethnic variation. Early work on genome-wide selection in humans yielded the result that there was a strong enrichment for signals of adaptation within regions of the genome associated with metabolism, so this should not be that surprising. Humans are a geographically dispersed species that inhabits a wide range of environments, so natural selection would shape the distribution of phenotypes within populations if evolution is a significant historical process (it is).
A paper in last month’s Trends in Genetics highlights more precisely how natural selection would operate in a life history context in specific cases. Many ways to die, one way to arrive: how selection acts through pregnancy:
When considering selective forces shaping human evolution, the importance of pregnancy to fitness should not be underestimated. Although specific mortality factors may only impact upon a fraction of the population, birth is a funnel through which all individuals must pass. Human pregnancy places exceptional energetic, physical, and immunological demands on the mother to accommodate the needs of the fetus, making the woman more vulnerable during this time-period. Here, we examine how metabolic imbalances, infectious diseases, oxygen deficiency, and nutrient levels in pregnancy can exert selective pressures on women and their unborn offspring. Numerous candidate genes under selection are being revealed by next-generation sequencing, providing the opportunity to study further the relationship between selection and pregnancy. This relationship is important to consider to gain insight into recent human adaptations to unique diets and environments worldwide.

German woman, product of Mid-Neolithic?
Source: Siebbi
Yesterday I pointed to a paper which was interesting enough, but didn’t pass the smell test in relation to other evidence we have (at least in my opinion!). A primary concern was the fact that uniparental markers (male and female lineages) show a peculiar distribution of variation in comparison to autosomal genetic variation (i.e., the vast majority of the genome) in the case of Europe (genome-wide analyses suggest more of Europe’s variation is partitioned north-south, but Y and mtDNA results often imply an east-west split). But a secondary concern I had was that I felt the models were a bit too stylized. In particular following Cavalli-Sforza and Ammerman the authors concluded that demic diffusion better fits their results of genetic variation in Europe (as opposed to continuity of Paleolithic hunter-gatherers). This is likely correct, but these are not the only two models.
A paper out in Nature Communications, using analysis of the phylogenetics of whole ancient mitochondrial genomes, outlines my primary concern when it comes to the models being tested, Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans:
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
There’s a new paper in PLoS ONE, Female and Male Perspectives on the Neolithic Transition in Europe: Clues from Ancient and Modern Genetic Data, which uses a combination of contemporary and ancient (that is, from subfossils) Y and mitochondrial DNA to understand the demographic past of Europe. Recall that the Y traces the direct male lineage, and the mtDNA the direct female lineage. Because they don’t recombine they converge cleanly back to a last common ancestor (there is no reticulation because there is no sex on these loci; they’re inherited from one of the two parents), and so they’re amenable to a lot of nifty demographic inference. In this paper they test specific models, and produce probability distributions of those models. Since it is open access I invite you to read the paper. The problem with these sorts of papers is I have a hard time trusting them until I replicate the results or have a sense of how cranky the software/code is!
Well, not really. But a new paper in PLOS GENETICS has a really weird speculation nested into the discussion of what seems a relatively banal paper on the phylogeography of South Americans. It’s a Y chromosomal survey of the populations of the New World, so it’s tracing the male lineage only. Because Amerindian populations likely went through at least one (more if you accept multiple migrations) bottleneck the variation on the Y chromosome is low. Ideally you’d be looking at tens of thousands of markers on the autosome, the non-sex inherited genome. But this group had a very good population coverage. Over 1,000 men from 50 tribal populations, with a focus on South America. Additionally, non-recombining markers are more manageable in terms of reconstructing demographic histories.

Inbred lineage. The Role of Inbreeding in the Extinction of a European Royal Dynasty, Alvarez et al.
Every now and then Richard Dawkins stirs controversy by bringing up the topic of eugenics. This is not surprising in terms of Dawkins’ intellectual pedigree. The most influential British evolutionary biologist in the generation before Dawkins, R. A. Fisher, was a eugenicist. Arguably the most eminent evolutionist of Dawkins’ own generation, W. D. Hamilton, clearly had eugenical sympathies, though he was keenly aware how unfashionable that had become.* University College London’s Galton Laboratory still had the word eugenics in its title until 1965. More recently Dawkins has brought up the issue of consanguinity amongst the British Pakistani community, a practice which one might argue is non-eugenical due to the high rate of recessive diseases.
As recently as 10 years ago one could plausibly talk about mtDNA Eve and Y chromosomal Adam. The “Human Story” might then be stylized into a rapid expansion from a small core East African population which flourished ~100,000 years ago, and engaged in a jailbreak sweep out of Africa and across the rest of the World Island, and beyond, to Oceania and the New World. In the process all other human lineages were extirpated, marginalized, and eliminated, their culture and genes consigned to oblivion. No longer. The origin of our species may have been characterized by several admixture events with “other” lineages, both within, and outside of, Africa. Instead of a bifurcating tree, imagine a graph with reticulation. A phylogenetic tree with a light, but noticeable lattice scaffold, tying together disparate branches.
Over at Slate the advice columnist received an email from a man who found out that his wife is really his half-sister. If you don’t want to follow the link, the back story is straightforward: the couple’s parents were lesbians, and used sperm donors. Recently the man sought out the identity of his biological father at the urging of his wife, because they have three children and she thought it would be important to have that information for them. That is how he found out that they shared the same biological father. Here is the part that has me concerned about realism on the part of the advice columnist:
I don’t see how you can keep this information to yourself. She’s bound to sense something off in your behavior and you simply can’t say, “I’m struggling with father issues.” I think you have to sit her down and show you what you’ve discovered. Then you two should likely seek out a counselor who deals with reproductive technology to help you sort through your emotions. I don’t see why your healthy children should ever be informed of this. That Dad didn’t want to find out who his sperm donor was is a sufficient answer when they get old enough to ask about this.
There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, as well as the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).
The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.
One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.
Yesterday I re-ran Plink with a narrower European-biased data set, and generated some MDS plots. I only had a few Asian and African populations, mostly so that I could replicate the standard dimensions 1 and 2, producing the classic “v-shape” which you’ve seen before. But what’s more interesting are lower coordinates. They may not capture as much of the variation in the distance matrix, but illustrate important dynamics. I haven’t used the directlabels package yet, so right now the labels are still imperfect. I’m giving black text as well as colored text. Also, here’s the original data (as in MDS results, not the raw data).
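If you want to look at those lower coordinates yourself, here is a minimal sketch for summarizing them by population. It assumes the MDS results are in the whitespace-delimited layout Plink writes for an MDS run (a header of FID, IID, SOL, then C1, C2, and so on); the helper names and the sample rows are made up for illustration.

```python
from collections import defaultdict

def read_mds(text):
    """Parse Plink-style .mds text into dict rows; C1..Cn become floats."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    out = []
    for fields in rows:
        rec = dict(zip(header, fields))
        for key in header:
            if key.startswith("C") and key[1:].isdigit():
                rec[key] = float(rec[key])
        out.append(rec)
    return out

def centroids(rows, dims=("C3", "C4")):
    """Average the chosen dimensions per population (FID)."""
    sums = defaultdict(lambda: [0.0] * len(dims))
    counts = defaultdict(int)
    for rec in rows:
        counts[rec["FID"]] += 1
        for i, dim in enumerate(dims):
            sums[rec["FID"]][i] += rec[dim]
    return {fid: tuple(s / counts[fid] for s in sums[fid]) for fid in sums}

# Made-up rows, only to show the expected layout.
sample = """FID IID SOL C1 C2 C3 C4
French f1 0 0.01 0.02 0.10 -0.20
French f2 0 0.01 0.02 0.30 -0.40
Sardinian s1 0 0.02 0.01 -0.50 0.10
"""
cents = centroids(read_mds(sample))
```

Population centroids on dimensions 3 and 4 are a quick way to see which groups a lower coordinate is separating before you bother with labeled scatter plots.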
A reader points me to a talk given by David Reich at the Center for Human Genetic Research 2013 Retreat. One of the issues Reich brought up is old, but perhaps worth reemphasizing: due to endogamy many South Asians carry a higher load of recessive ailments. This is not due to recent inbreeding (which is barred by custom in many South Asian groups, which enforce kin-level exogamy), but long term genetic isolation. Over time even a moderate sized population can be affected by drift. This was one of the major points in the 2009 paper Reconstructing Indian History, but not one particularly emphasized in the press follow up. A major implication is that a relatively simple public health measure for South Asians would be to marry outside of their jati. The social or genetic distance need not be great. But one generation of outbreeding should “mask” many of the deleterious alleles. If this model is correct one should be able to track decreases in morbidity within the American South Asian population, where there are many inter-caste and inter-regional marriages (yes, this is between people of putative high status, but this doesn’t matter).
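To make the “masking” logic concrete, here is a back-of-the-envelope sketch. It assumes a single fully recessive allele, random union of gametes, and allele frequencies that I have simply invented for illustration; none of these numbers come from Reich’s talk or the 2009 paper.

```python
def affected_within(q):
    """Chance a child is homozygous when both parents come from a group
    where the recessive allele has frequency q."""
    return q * q

def affected_between(q_a, q_b):
    """Chance a child is homozygous when the parents come from two groups
    with allele frequencies q_a and q_b."""
    return q_a * q_b

# Hypothetical frequencies: the allele drifted up to 5% in one group
# but stayed rare (0.1%) in another.
within = affected_within(0.05)
between = affected_between(0.05, 0.001)
fold_reduction = within / between
```

With these made-up numbers the expected frequency of affected children drops about fifty-fold in a single generation, because each group’s drifted alleles are mostly private to it.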
Most people in South Asia speak one of two varieties of language, Indo-Aryan and Dravidian. These two are not particularly closely related. Indo-Aryan is an Indo-European language, as is evident in the plethora of obvious cognates with other Indo-European dialects. I have a minimal fluency in Bengali, the easternmost of the Indo-European languages, and quite a bit more fluency with English, one of the westernmost, and it was evident to me rather early on (e.g., grass vs. gash, man vs. manush, nose vs. nak). In contrast, to me Dravidian languages are peculiar because the accent and cadence are clearly South Asian, but they are utterly impenetrable (though there are many loan words into Indo-Aryan from Dravidian).
In the links below I alluded to a controversy over the “Neurodiversity movement”. The basic issue is that people with Asperger syndrome and high functioning autism are being accused of putting their concerns above and beyond those of the large number of mentally disabled autistic individuals (some of whom are non-verbal, and exhibit severe cognitive deficits) in the grab for “rights.” Rights here understood as the rights which black Americans, women, and gays have claimed, to be recognized as equal before the law and endowed with the same value in the eyes of society. As a deep philosophical matter I’m skeptical of Rights in a fundamental sense. As a conservative I’m skeptical of the push for a huge array of rights by a plethora of identity groups. Socially recognized rights are valuable, and are cheapened and debased by dispensing them too liberally.

Citation:
Q Fu, M Meyer, X Gao, U Stenzel, H A. Burbano, J Kelso, S Pääbo
DNA analysis of an early modern human from Tianyuan Cave, China
PNAS 2013 ; published ahead of print January 22, 2013, doi:10.1073/pnas.1221359110
The above is a graph which illustrates phylogenetic relationships using the TreeMix package. It is from the paper I alluded to yesterday. The paper, DNA analysis of an early modern human from Tianyuan Cave, China, is open access, so everyone should be able to read it. Its mtDNA analysis shows that the Tianyuan sample, from the region of Beijing and dating to ~40,000 years B.P., is a basal clade in haplogroup B, which is common in eastern Eurasia and the New World. This is a satisfying result insofar as the understanding in relation to this haplogroup is that it diversified ~50,000 years B.P. There is very strong support in these data for the proposition that Tianyuan forms a distinct clade with the populations you see above, as opposed to western Eurasians. This is important because this sample seems to date with relatively good precision to 40,000 years B.P., supporting the archaeological contention that modern humans were already diversifying into western and eastern lineages 40-50,000 years ago. In contrast statistical genomic inferences tend toward a lower date for divergence. We can be moderately confident at this point that some aspect of the west-east divergence predates later gene flow events, which might confuse archaeology-blind methods.
Over the past decade or so much of the reconstruction of the human genetic past has occurred through inferences generated from variation of extant human beings. In more plain English the patterns of genetic variation of modern populations have been used to map out the patterns of the past. There are serious difficulties with these sorts of inferences. For example you generate a huge number of potential phylogenetic trees and zero in on the “most probable tree” (or, the distribution of trees). But at the end of the day these inferences are only as good as your assumptions.
The above figure is from a paper which leaves me somewhat befuddled, Genome-wide data substantiate Holocene gene flow from India to Australia. The authors ran several hundred thousand SNPs through treemix, and generated the above graph which leads one to the conclusion that there has been significant gene flow from Indian populations to Australia. More precisely, from Dravidian populations to the Aboriginal peoples of Northern Australia. In plain English the authors found the tree which was the best fit to the data, and then they improved it by adding migration across branches which were the poorest fits.
Obviously the whole paper is not going to rest on the above graph. They performed some clustering analysis on the data, which you’ll recognize. PCA and Admixture:
In my earlier posts where I gave a short intro to using Plink I distributed a data set termed PHLYO. One thing I did not mention is that I’ve also been running it on Admixture. But here’s an important point: I ran the data set 10 times from K = 2 to K = 15. Why? Because the algorithm produces somewhat different results on each run (if you use a different seed, which you should), and I wanted to not be biased by one particular result. Additionally, I also turned on cross-validation error, which gives me a better sense of which K’s to trust. But after I select the K which I want to visualize which replicate run will I then use to generate the bar plots? I won’t pick any specific one. Rather, I’ll merge them together with an off-the-shelf algorithm. Additionally, I also want to sort the individuals by their modal population cluster.
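For the replicate runs, something like the following sketch can generate the job list. This is not my actual pipeline; the file name PHLYO.bed, the seed scheme, and the directory and log names are assumptions, and you should check the `--cv` and `--seed=` flags against the Admixture manual for your version. The code only builds the commands, it does not run them.

```python
def admixture_jobs(bed_path, k_min=2, k_max=15, replicates=10, base_seed=1000):
    """Build one Admixture command per (replicate, K) pair, each with its own seed."""
    jobs = []
    for rep in range(1, replicates + 1):
        for k in range(k_min, k_max + 1):
            seed = base_seed + rep * 100 + k  # unique while K stays below 100
            jobs.append({
                "workdir": f"rep{rep:02d}",   # hypothetical per-replicate directory
                "log": f"K{k}.log",           # hypothetical file for captured stdout
                "argv": ["admixture", "--cv", f"--seed={seed}", bed_path, str(k)],
            })
    return jobs

# "PHLYO.bed" is an assumed file name for the data set.
jobs = admixture_jobs("PHLYO.bed")
```

Each replicate gets its own working directory because Admixture names its output after the input file and K, so replicates run in the same directory would overwrite one another.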
This sounds rather convoluted, and it is somewhat. I have a pipeline that I use, but it’s not too user friendly. One of my projects is to clean it up, document it, and publish it online. Though if you have your own pipeline all ready to go, please post it in the comments with a link! The general steps are as follows for me:
1) Convert Admixture Q files into Structure format, transform family identifications to numeric values, and generate a file with family identification and numeral pairs
2) Merge the results across runs using Clumpp
3) Sort the individual results within populations
4) Then use Distruct to produce an output file
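Steps 1 and 3 are easy to sketch in a few lines. What follows is an illustration rather than my pipeline: it numbers the populations and sorts individuals within each population by their modal cluster. It does not reproduce the file formats that Clumpp or Distruct expect, and it assumes the Q rows have already been merged across runs. The function names and toy values are hypothetical.

```python
def number_populations(labels):
    """Map each population label to a numeric ID, in order of first appearance."""
    ids = {}
    for label in labels:
        ids.setdefault(label, len(ids) + 1)
    return ids

def sort_within_populations(labels, q_rows):
    """Order individuals by population, then modal cluster, then descending
    membership in that modal cluster."""
    pop_ids = number_populations(labels)

    def sort_key(item):
        label, q = item
        modal = max(range(len(q)), key=lambda i: q[i])
        return (pop_ids[label], modal, -q[modal])

    return sorted(zip(labels, q_rows), key=sort_key)

# Toy data: four individuals, K = 2.
labels = ["French", "Han", "French", "Han"]
q_rows = [[0.7, 0.3], [0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]
pop_ids = number_populations(labels)
ordered = sort_within_populations(labels, q_rows)
```

Sorting on descending membership within the modal cluster is what gives the bar plots their smooth, stepped look inside each population.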
Before I show you the resultant bar plot, here are the cross-validation results with standard deviation ticks:
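A minimal sketch of how such a summary can be computed from the run logs, assuming each log contains lines of the form "CV error (K=3): 0.52" (which is how I understand Admixture reports cross-validation error; verify against your own logs). The log contents below are invented.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def summarize_cv(log_texts):
    """Return {K: (mean CV error, standard deviation)} across replicate logs."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return {
        k: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for k, vals in sorted(errors.items())
    }

# Invented log excerpts from two replicate runs.
logs = [
    "CV error (K=2): 0.60\nCV error (K=3): 0.50\n",
    "CV error (K=2): 0.62\nCV error (K=3): 0.54\n",
]
summary = summarize_cv(logs)
best_k = min(summary, key=lambda k: summary[k][0])
```

The K with the lowest mean error is a reasonable starting point, but where the standard deviations overlap it is worth looking at neighboring K values too.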