I’ve been thinking about how best to visualize PCA/MDS type of results, which allow for the two dimensional representation of genetic variation. Below are a few of my efforts with a data set I have. You can see the individuals in gray, but also ellipses which cover ~95% of the distribution of a given population.
Please click the images for a larger version. They represent coordinate 1 on the y axis and coordinate 2 on the z axis, derived from a multidimensional scaling representing identity by state across individuals.
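For anyone who wants to draw this sort of ellipse themselves, here is a minimal sketch of one common way to do it. It assumes each population's points are roughly bivariate normal, which is an assumption of the sketch and not necessarily how the plots in this post were produced. The function names and the synthetic points are hypothetical.

```python
import math
import random

def covariance_2d(points):
    """Sample mean and covariance terms (sxx, syy, sxy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, syy, sxy)

def coverage_ellipse(points, coverage=0.95):
    """Center, (major, minor) semi-axes and rotation in degrees of an
    ellipse that covers about `coverage` of a bivariate-normal cloud."""
    (mx, my), (sxx, syy, sxy) = covariance_2d(points)
    # Chi-square quantile with 2 degrees of freedom has a closed form.
    scale = -2.0 * math.log(1.0 - coverage)
    # Eigenvalues of the 2x2 covariance matrix, also in closed form.
    mid = (sxx + syy) / 2.0
    half_gap = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(scale * (mid + half_gap))
    minor = math.sqrt(scale * max(mid - half_gap, 0.0))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (major, minor), math.degrees(angle)

def fraction_inside(points, coverage=0.95):
    """Share of points whose Mahalanobis distance falls inside the ellipse."""
    (mx, my), (sxx, syy, sxy) = covariance_2d(points)
    det = sxx * syy - sxy ** 2
    scale = -2.0 * math.log(1.0 - coverage)
    inside = 0
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        inside += d2 <= scale
    return inside / len(points)

# Synthetic, made-up "population" of correlated coordinates (not real data).
random.seed(1)
cloud = []
for _ in range(5000):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    cloud.append((2.0 * a, a + 0.5 * b))
center, axes, angle_deg = coverage_ellipse(cloud)
```

The returned center, semi-axes, and angle are what a plotting library's ellipse primitive typically needs. If a population is strongly non-normal, the ellipse will cover more or less than the nominal 95%.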
Obviously the news over the past week has been filled with the events in the Middle East, and the broader Muslim world, in reaction to an anti-Muslim film. I think the most eloquent commentary is from The Onion (NSFW!!!), No One Murdered Because Of This Image. That being said, there are some serious broader issues here. A friend of mine who lives in India (he is Indian American, though raised for several years in India, so not totally unfamiliar with the culture) has expressed to me his frustration with having to defend American liberalism in a society where American liberalism is an abstraction, rather than concrete. The frustration has to do with the fundamental divergence in basic values. For example, his interlocutors have argued to him (he is a practicing Christian of libertarian political orientation) that if someone committed an act of blasphemy against his faith of course he would react in anger and violence. And yet of course the clause “and” is false, though he is greeted with skepticism when he asserts he wouldn’t react violently. As a matter of fact I can attest to the reality that he wouldn’t react angrily necessarily, because in interactions where I’ve made casually blasphemous comments he’s only rolled his eyes. Just as Americans have a vague, even misleading, understanding of the broader historical forces which engender resentment of American hegemony in the broader world, so many non-Americans lack a proper awareness of the broader historical forces, and cultural reality, of the particular American radicalism and extremism in the domain of free expression.
I was alerted to Samuel Arbesman’s new paper, The Life-Spans of Empires, by the fact that he pointed to his research on his weblog. Interestingly I’m not the only one who was interested, as after I pointed to it on my link round up a few people asked if they could get a copy of the paper (yes, I almost always send papers if I have access). Luckily it’s a nicely elegant piece of work, basically quantifying what we’ve already probably known qualitatively. There isn’t that great a value-add to quantification as such, but with a mathematical understanding of a topic one can engage in an algebra of mental manipulations so as to construct models with which one can project other facts. Quantitative information is often an excellent way to generate “free information” from theoretical models. The figure above is the primary result of the paper. Basically Arbesman took a data set which was lying around which measured the lengths of various empires (N = 41), and showed that the rise and fall of these political entities tends to follow an exponential distribution: e^(−λt). This is an incredibly elegant summation of what we know qualitatively: some empires last a long time, but most do not.
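To make the exponential idea concrete, here is a small sketch. It reads e^(−λt) as the probability of lasting longer than t, which is the survival function of an exponential distribution, and estimates λ as one over the mean duration (the standard maximum-likelihood estimate). The durations below are synthetic draws, not the paper's data, and the "true" rate is an arbitrary hypothetical value.

```python
import math
import random

def fit_exponential_rate(durations):
    """Maximum-likelihood estimate of lambda: 1 / mean duration."""
    return len(durations) / sum(durations)

def survival(t, rate):
    """Probability of lasting longer than t under an exponential model."""
    return math.exp(-rate * t)

# Hypothetical example: 41 synthetic "empire" durations in years.
# The rate of 1/200 per year is made up for illustration only.
random.seed(0)
synthetic_durations = [random.expovariate(1.0 / 200.0) for _ in range(41)]

rate_hat = fit_exponential_rate(synthetic_durations)
median_lifetime = math.log(2) / rate_hat  # time by which half have fallen
```

The memoryless property is the interesting part: under this model the chance of surviving another century is the same whether the empire is fifty years old or five hundred.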
The image above is adapted from the 2010 paper A Predominantly Neolithic Origin for European Paternal Lineages, and it shows the frequencies of Y chromosomal haplogroup R1b1b2 across Europe. As you can see as you approach the Atlantic the frequency converges upon ~100%. Interestingly the fraction of R1b1b2 is highest among populations such as the Basque and the Welsh. This was taken by some researchers in the late 1990s and early 2000s as evidence that the Welsh adopted a Celtic language, prior to which they spoke a dialect distantly related to Basque. Additionally, the assumption was that the Basques were the ur-Europeans, descendants of the Paleolithic populations of the continent both biologically and culturally, so that the peculiar aspects of the Basque language were attributed by some to its ancient Stone Age origins.
As indicated by the title the above paper overturned such assumptions, and rather implied that the origin of R1b1b2 haplogroup was in the Near East, and associated with the expansion of Middle Eastern farmers from the eastern Mediterranean toward western Europe ~10,000 years ago. Instead of the high frequency of R1b1b2 being a confident peg for the dominance of Paleolithic rootedness of contemporary Europeans, as well as the spread of farming mostly through cultural diffusion, now it had become a linchpin for the case that Europe had seen one, and perhaps more than one, demographic revolution over the past 10,000 years.
This is made very evident in the results from ancient DNA, which are hard to superimpose upon a simplistic model of a two way admixture between a Paleolithic substrate and a Neolithic overlay. Rather, it may be that there were multiple pulses into a European cul-de-sac since the rise of agriculture from different starting points. We need to be careful of overly broad pronouncements at this point, because as they say this is a “developing” area. But, I want to go back to the western European fringe for a moment.
The Pith: Over the past 10,000 years a small coterie of farming populations expanded rapidly and replaced hunter-gatherer groups which were once dominant across the landscape. So, the vast majority of the ancestry of modern Europeans can be traced back to farming cultures of the eastern Mediterranean which swept over the west of Eurasia between 10 and 5 thousand years before the present.
Dienekes Pontikos points me to a new paper in PNAS which uses a coalescent model of 400+ mitochondrial DNA lineages to infer the pattern of expansions of populations over the past ~40,000 years. Remember that mtDNA is passed just through the maternal lineage. That means it is not subject to the confounding dynamic of recombination, allowing for easier modeling as a phylogenetic tree. Unlike the autosomal genome there’s no reticulation. Additionally, mtDNA tends to be highly mutable, and many regions have been presumed to be selectively neutral. So they are the perfect molecular clock. The straightforward drawback is that the history of one’s foremothers may not be a good representative of the history of one’s total lineage. Additionally the haploid nature of mtDNA means that genetic drift is far more powerful in buffeting gene frequencies and introducing stochastic fluctuations, which eventually obscure past mutational signals through myriad mutations. Finally, there are serious concerns as to the neutrality of mtDNA…though the authors claim to address that in the methods. I should also add that it also happens to be the case that there is less controversy and more surety as to the calibration of mutational rates of mtDNA than the Y chromosomal lineages of males. They’re good for determining temporal patterns of demographic change, and not just tree structures.
Here’s the abstract, Rapid, global demographic expansions after the origins of agriculture:
The more I see fine-scale genomic analyses of population structure across the world, the more I believe that the “stylized” models which were in vogue in the early 2000s which explained how the world was re-populated after the last Ice Age (and before) were wrong in deep ways. I’m talking about the grand narratives outlined in works such as Bryan Sykes’ The Seven Daughters of Eve, the subtitle of which was “The Science That Reveals Our Genetic Ancestry.” If I had less faith in science to always ultimately right its course I’d probably become a post-modernist type who asserts that all these stories are fictions. Sykes’ model in particular seems to be very likely incorrect because of the utilization of ancient DNA to elucidate past population movements in Europe. From what we can gather it looks like coarse attempts to infer past distributions from current distributions (of specific lineages and their diversity) resulted in a great deal of false clarity. We’re not talking differences on the margins, but fundamental confusions. For example, Basques were always assumed to be a viable “reference” population for descendants of European hunter-gatherers. This was one of the linchpins of older historical genetics models. It turns out that this fixed assumption may have been a false one.
Not only were our past assumptions in simple models wrong, but the real explanations may also be rather complex. It turns out that ancient DNA of the “first farmers” and their “hunter-gatherer” neighbors in Central Europe reveals a lot of discontinuity between both these groups and modern Europeans. Why? It may be that in fact there were multiple migrations, and the palimpsest is going to be a tough cookie to excavate. But there’s no need to be disheartened; the old paradigms came crashing down thanks to data.
With that in mind I’ve been particularly interested in the European fringe, the far west and north. If any hunter-gatherer descendants survive in large numbers, it will be here. This is why I’m curious as to the genetics of the Sami as well as the archaeology which tracks the spread of agriculture in Northern Europe. A new paper in PLoS ONE focuses on Sweden, Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data:
Since I know plenty of friends are getting, or just got, their V3 results, I thought I’d pass this on, Open-ended submission opportunity for 23andMe data (#2):
Who is eligible
Everyone who is of European, Asian, or North African ancestry and all four of his/her grandparents are from the same European, Asian, or North African ethnic group or the same European, Asian, or North African country.
Also, Zack has more than 30 individuals in HAP. The “cow belt” is still way underrepresented. The only Bengalis in the data set are my parents.
Everyone who is literate knows that the Sahara desert is the largest of its kind in the world. The chasm in cultural, biological, and physical geography is very noticeable. Northern Africa is part of the Palearctic zone, while the peoples north of the Sahara have long been part of the circum-Mediterranean population continuum. The primary continuous habitable corridor is that of the Nile valley. And yet scholars have long known that there has been variation in the climatic regime of the Sahara. The pharaohs of ancient Egypt seem to have hunted a wider range of fauna than is to be found in the deserts surrounding the current Nile valley, perhaps relics from a more humid period. Rock art in some regions of the desert indicates aquatic life, and species more characteristic of the savanna. And yet we should not think of the Sahara as a recent phenomenon; it does seem to be geologically ancient, despite periodic humid interregnums.
A new paper in PNAS attempts to map the hydrography of the Sahara over the Holocene, as well as back to the Pleistocene. The ultimate aim seems to be to better frame the geographic constraints on the expansion of humanity from its African homeland, and refute a simple projection from the present to the past. In this case, it is the existence of the Nile as a verdant and habitable watercourse which connects the north and south, and bisects the continuous desert. Ancient watercourses and biogeography of the Sahara explain the peopling of the desert:
I decided to take the Dodecad ADMIXTURE results at K = 10, and redo some of the bar plots, as well as some scatter plots relating the different ancestral components by population. Don’t try to pick out fine-grained details, see what jumps out in a gestalt fashion. I removed most of the non-European populations to focus on Western Europeans, with a few outgroups for reference.
Here’s a table of the correlations (I bolded the ones I thought were interesting):
| W Asian | NW African | S Europe | NE Asian | SW Asian | E Asian | N European | W African | E African | S Asian |
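If you want to build a correlation table like this from your own ADMIXTURE output, here is a rough sketch using Pearson's r across populations. The component names are taken from the header above, but the fractions are made up purely for illustration, and I'm assuming Pearson correlation, which the original table may or may not have used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_table(columns):
    """Map every ordered pair of component names to their correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for a in names for b in names}

# Hypothetical ancestry fractions for five populations (NOT Dodecad data).
# Each list holds one component's share in each population, same order.
components = {
    "N European": [0.70, 0.55, 0.40, 0.25, 0.10],
    "S Europe":   [0.20, 0.30, 0.40, 0.45, 0.35],
    "W Asian":    [0.05, 0.10, 0.15, 0.25, 0.45],
}
table = correlation_table(components)
```

Keep in mind that ancestry fractions sum to one within each population, so some negative correlation between components is built in before any history enters the picture.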
When I was in college I would sometimes have late night conversations with the guys in my dorm, and the discussion would random-walk in very strange directions. During one of these quasi-salons a friend whose parents were from Korea expressed some surprise and disgust at the idea of wet earwax. It turns out he had not been aware of the fact that the majority of the people in the world have wet, sticky, earwax. I’d stumbled onto that datum in the course of my reading, and had to explain to most of the discussants that East Asians generally have dry earwax, while convincing my Korean American friend that wet earwax was not something that was totally abnormal. Earwax isn’t something we explore in polite conversation, so it makes sense that most people would be ignorant of the fact that there was inter-population variation on this phenotype.
But it doesn’t end there. Over the past five years the genetics of earwax has come back into the spotlight, because of its variation and what it can tell us about the history and evolution of humans since the Out of Africa event. Not only that, it seems the variation in earwax has some other phenotypic correlates. The SNPs in and around ABCC11 are a set where East Asians in particular show signs of being different from other world populations. The variants which are nearly fixed in East Asia around this locus are nearly disjoint in frequency with those in Africa. Here are the frequencies of the alleles of rs17822931 on ABCC11 from ALFRED:
There’s a lot of stuff you stumble upon via Google Public Data Explorer which you kind of knew, but is made all the more stark through quantitative display. For example, consider Saudi Arabia and Yemen. In gross national income per capita the difference between these two nations is one order of magnitude (PPP and nominal). Depending on the measure you use (PPP or nominal) the difference between the USA and Mexico is in the range of a factor of 3.5 to 5. Until recently most Americans did not know much about Yemen. It was famous for being the homeland of Osama bin Laden’s father and the Queen of Sheba.
Let’s do some comparisons.