This is a follow up to my post from yesterday. In case you care about the technical details (after I clean this stuff up I will put it on GitHub) I’m using R’s adehabitat package to create a 95% distribution curve after smoothing with kernel density. The goal is to give you a better intuition about where the populations are dispersed across two dimensional visualizations of genetic variation.
Thinking about how to plot text, I came up with a quick hack, which just used the initial data and found the median x and y position. That explains why some of the labels are shifted: in populations with a huge range the label position is sensitive to the fact that the raw data are not smoothed (if you know how to pull the centroid out of the kver, tell!). I’ve given them colors and also used black. The latter actually seems to be clearer!
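For what it's worth, here is a minimal sketch of the two label positions in question. It is in Python rather than R, and it assumes the 95% contour can be exported as a plain list of (x, y) vertices; the function names are hypothetical and nothing here is adehabitat's own API. The first function is the median hack described above, the second is the area centroid of a polygon via the shoelace formula.

```python
from statistics import median


def median_label(points):
    """Label position from the quick hack: per-axis median of the raw points."""
    xs, ys = zip(*points)
    return median(xs), median(ys)


def polygon_centroid(vertices):
    """Area centroid of a simple (non-self-intersecting) polygon.

    `vertices` is a list of (x, y) tuples in order around the outline,
    assumed here to be the vertices of one smoothed contour.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Standard formula divides by 6 * area, i.e. 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

If a population's contour breaks into several disjoint polygons, one option would be to label the centroid of the largest one, but that is a design choice rather than anything the package dictates.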
Note: This is not just for fun, as I plan to start rolling out results and methods from some of the data sets I have more regularly in the near future.
A reader points me to a talk given by David Reich at the Center for Human Genetic Research 2013 Retreat. One of the issues Reich brought up is old, but perhaps worth reemphasizing: due to endogamy many South Asians carry a higher load of recessive ailments. This is not due to recent inbreeding (which is barred by custom in many South Asian groups, which enforce kin-level exogamy), but long term genetic isolation. Over time even a moderate sized population can be affected by drift. This was one of the major points in the 2009 paper Reconstructing Indian History, but not one particularly emphasized in the press follow up. A major implication is that a relatively simple public health measure for South Asians would be to marry outside of their jati. The social or genetic distance need not be great. But one generation of outbreeding should “mask” many of the deleterious alleles. If this model is correct one should be able to track decreases in morbidity within the American South Asian population, where there are many inter-caste and inter-regional marriages (yes, this is between people of putative high status, but this doesn’t matter).
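The "masking" argument can be made concrete with a little arithmetic. The sketch below assumes random mating within each group and a fully recessive allele; the allele frequencies are made-up illustrative numbers, not estimates from any paper.

```python
def recessive_incidence(q_parent_group_a, q_parent_group_b):
    """Expected fraction of offspring homozygous for a recessive allele.

    Each parent passes on the allele with probability equal to its
    frequency in that parent's group, so the two frequencies multiply.
    Passing the same frequency twice gives the familiar q squared.
    """
    return q_parent_group_a * q_parent_group_b


# Hypothetical numbers: drift has pushed an allele to 5% in one jati,
# while it sits at 0.1% in another.
within_group = recessive_incidence(0.05, 0.05)    # both parents from the drifted group
across_groups = recessive_incidence(0.05, 0.001)  # one parent from each group
fold_reduction = within_group / across_groups
```

With these illustrative frequencies the expected incidence drops about fifty-fold in the first outbred generation. The key assumption is that different groups have drifted to high frequency for different alleles, which is what long term isolation would predict.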
In the links below I alluded to a controversy over the “Neurodiversity movement”. The basic issue is that people with Asperger syndrome and high functioning autism are being accused of putting their concerns above and beyond those of the large number of mentally disabled autistic individuals (some of whom are non-verbal, and exhibit severe cognitive deficits) in the grab for “rights.” Rights here understood as the rights which black Americans, women, and gays have claimed, to be recognized as equal before the law and endowed with the same value in the eyes of society. As a deep philosophical matter I’m skeptical of Rights in a fundamental sense. As a conservative I’m skeptical of the push for a huge array of rights by a plethora of identity groups. Socially recognized rights are valuable, and are cheapened and debased by dispensing them too liberally.
Over the past decade or so much of the reconstruction of the human genetic past has occurred through inferences generated from variation of extant human beings. In more plain English the patterns of genetic variation of modern populations have been used to map out the patterns of the past. There are serious difficulties with these sorts of inferences. For example you generate a huge number of potential phylogenetic trees and zero in on the “most probable tree” (or, the distribution of trees). But at the end of the day these inferences are only as good as your assumptions.
Over at Scientific American Christie Wilcox has a post up with the provocative title, People With Brown Eyes Appear More Trustworthy, But That’s Not The Whole Story, which reports on a new PLoS ONE paper, Trustworthy-Looking Face Meets Brown Eyes. Like Christie I would enjoy illustrating this post with my own trustworthy and youthful brown eyed visage, but I worry that my mien is a bit on the sly side! In any case, what of the paper? Wilcox reviews the salient points of the results. In short, the issue here is that brown eyed men seem to have more ‘trustworthy faces’ than blue eyed men. When the eyes were digitally manipulated it turned out that color had no influence on perception. Rather, it was the correlation between eye color and facial proportion which was driving the initial association. Christie finishes:
Given the importance of trust in human interactions, from friendships to business partnerships or even romance, these findings pose some interesting evolutionary questions. Why would certain face shapes seem more dangerous? Why would blue-eyed face shapes persist, even when they are not deemed as trustworthy? Are our behaviors linked to our bodies in ways we have yet to understand? There are no easy answers. Face shape and other morphological traits are partially based in genetics, but also partially to environmental factors like hormone levels in the womb during development. In seeking to understand how we perceive trust, we can learn more about the interplay between physiology and behavior as well as our own evolutionary history.
While reading The Founders of Evolutionary Genetics I encountered a chapter where the late James F. Crow admitted that he had a new insight every time he reread R. A. Fisher’s The Genetical Theory of Natural Selection. This prompted me to put down The Founders of Evolutionary Genetics after finishing Crow’s chapter and pick up my copy of The Genetical Theory of Natural Selection. I’ve read it before, but this is as good a time as any to give it another crack.
Almost immediately Fisher aims at one of the major conundrums of 19th century theory of Darwinian evolution: how was variation maintained? The logic and conclusions strike you like a hammer. Charles Darwin and most of his contemporaries held to a blending model of inheritance, where offspring reflect a synthesis of their parental values. As it happens this aligns well with human intuition. Across their traits offspring are a synthesis of their parents. But blending presents a major problem for Darwin’s theory of adaptation via natural selection, because it erodes the variation which is the raw material upon which selection must act. It is a famously peculiar fact that the abstraction of the gene was formulated over 50 years before the concrete physical embodiment of the gene, DNA, was ascertained with any confidence. In the first chapter of The Genetical Theory R. A. Fisher suggests that the logical reality of persistent copious heritable variation all around us should have forced scholars to the inference that inheritance proceeded via particulate and discrete means, as these processes do not diminish variation indefinitely in the manner which is entailed by blending.
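Fisher's point about blending can be checked numerically. The sketch below assumes random mating and that each offspring is exactly the average of its two parents (the pure blending model); the starting trait values are arbitrary. Under those assumptions the variance halves every generation.

```python
from statistics import pvariance


def blend_one_generation(trait_values):
    """Offspring trait values under pure blending with random mating.

    Enumerates every ordered pair of parents (selfing included), which
    reproduces the exact distribution of midparent values rather than
    a random sample of it.
    """
    return [(a + b) / 2 for a in trait_values for b in trait_values]


def blending_variance(initial_variance, generations):
    """Closed form implied by the model: variance halves each generation."""
    return initial_variance / 2 ** generations


parents = [0, 1, 2, 3]                    # arbitrary illustrative trait values
offspring = blend_one_generation(parents)
parent_var = pvariance(parents)
offspring_var = pvariance(offspring)
```

After ten generations of this only about a thousandth of the original variance would remain, which is why new variation would have to appear at an implausible rate to keep selection supplied under blending. Particulate inheritance has no such decay built in.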
The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility of and rationale for the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only >100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation on a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).
There’s an interesting piece in Slate, The Great Schism in the Environmental Movement, which seems to be a distillation of trends which have been bubbling within the modern environmentalist movement for a generation now (I’ve read earlier manifestos in a similar vein). I can’t assess the magnitude of the shift, but here’s the top-line:
But that is a false construct that scientists and scholars have been demolishing the past few decades. Besides, there’s a growing scientific consensus that the contemporary human footprint—our cities, suburban sprawl, dams, agriculture, greenhouse gases, etc.—has so massively transformed the planet as to usher in a new geological epoch. It’s called the Anthropocene.
Modernist greens don’t dispute the ecological tumult associated with the Anthropocene. But this is the world as it is, they say, so we might as well reconcile the needs of people with the needs of nature. To this end, Kareiva advises conservationists to craft “a new vision of a planet in which nature—forests, wetlands, diverse species, and other ancient ecosystems—exists amid a wide variety of modern, human landscapes.”
The New Republic has a piece up, How Older Parenthood Will Upend American Society, which won’t have surprising data for readers of this weblog. But it’s nice to see this sort of thing go “mainstream.” My daughter was born when her parents were in their mid-30s, so I know all the statistics. They aren’t good bed-time reading (she’s healthy and robust so far!). If I had to do it over again I definitely wouldn’t have waited this long. After becoming a father it brought home to me that waiting was one of the worst decisions of my life. Why postpone something this incredible for the far more prosaic pleasures of an extended adolescence? Granted, I’m not sure that I would have been the best father at 25, but I don’t think there’s much I can say in reply to the argument that I should have become a father by 30.
More concretely, we would have had sperm and egg “banked” if we had been smart about delaying parenthood. The article notes that storage of sperm costs $850 up front, and $300 to $500 per year after that, and that many balk at the cost. And how much do you spend on your cell phone every year? The issue here seems to be time preference.
Most people are aware that altitude imposes constraints on individual performance and function. Much of this is flexible; athletes who train at high altitudes may gain a performance edge. But over the long term there are costs, just as there are with computers which are ‘overclocked.’ This is the point where you make the transition from physiology to evolution. Residence at high altitude entails strong selective pressures on populations. Over the past few years there has been a great deal of exploration of the genetics of long resident high altitude groups, the Tibetans, Peruvians, and Ethiopians.
In many cases there are questions of a historical and ethnographic nature which are subject to controversy and debate. Scholarly arguments are laid out, and further dispute ensues. For decades progress seems fleeting, as one hypothesis is accepted, only to be subject to later revision. This sort of pattern gives succor to the most cynical and jaded of the ‘Post Modern’ set, especially when the ‘discourse’ in question is in the domain of science.
But thankfully these debates can come to an end in some cases. So it is with the origins of the European Romani, better known as ‘Gypsies’ (though the Roma are the most well known of the Romani, other groups within Europe have different ethnonyms). Obviously many of the basic elements have long been there, but I think the most recent genetic work now establishes a level of closure. Taking a step back, what do we know?
1) The Romani language seems to be Indo-Aryan, with a likely affinity with the northwest group of Indo-Aryan languages
2) The Romani presence in Europe only dates to the past ~1,000 years, with an entry point in the Byzantine Empire
3) They are an admixture between an ancestral Indian element, and local populations
4) Their history of endogamy has resulted in a strong genetic drift effect
The two papers which seem to nail the coffin shut on these questions use somewhat different methodologies. One relies on Y chromosomal STRs (hypervariable repeat regions) to generate a paternal phylogeny. Focusing just on the paternal phylogeny allows for one to make very robust genealogical inferences. Additionally, the authors had a very large data set across India. Their goal was to ascertain the exact region of origin of the Romani before they left India. As noted in bullet #1 there is already some evidence from their language that this must be in northwest India. The second paper uses a SNP-chip; hundreds of thousands of autosomal markers. This has been done to death for other populations, so the method isn’t new. Rather, it is that it is now being applied to the Romani.
First, the Y chromosomal paper. The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations:
Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
Two trees illustrate the results succinctly:
The bottom line:
- This particular Y chromosomal lineage which is highly diagnostic of South Asian origin in the Romani shows that the Romani seem to derive from the populations of northwest India
- Additionally, within these populations the Romani Y chromosomal lineages derive from the lower caste elements, the scheduled castes and scheduled tribes
But the above results don’t get directly at genome-wide admixture. The second paper does, using hundreds of thousands of markers to explore the Romani affinity to other populations. Reconstructing the Population History of European Romani from Genome-wide Data:
The Romani, the largest European minority group with approximately 11 million people…constitute a mosaic of languages, religions, and lifestyles while sharing a distinct social heritage. Linguistic…and genetic…studies have located the Romani origins in the Indian subcontinent. However, a genome-wide perspective on Romani origins and population substructure, as well as a detailed reconstruction of their demographic history, has yet to be provided. Our analyses based on genome-wide data from 13 Romani groups collected across Europe suggest that the Romani diaspora constitutes a single initial founder population that originated in north/northwestern India ∼1.5 thousand years ago (kya). Our results further indicate that after a rapid migration with moderate gene flow from the Near or Middle East, the European spread of the Romani people was via the Balkans starting ∼0.9 kya. The strong population substructure and high levels of homozygosity we found in the European Romani are in line with genetic isolation as well as differential gene flow in time and space with non-Romani Europeans. Overall, our genome-wide study sheds new light on the origins and demographic history of European Romani.
The plot to the left illustrates the relationship of the Romani to world-wide populations using multi-dimensional scaling, where genetic variation is decomposed into dimensions, and individuals are plotted on those dimensions. In short, the Romani exhibit a classic admixture cline pattern. That is, they are the products of a two-way admixture between populations which occupy distinct positions along a cline, and Romani individuals and populations are distributed along the cline in proportion to their admixture. One notable aspect is that the Romani are actually two clusters; one which manifests a strong ‘east’-‘west’ distribution, and another which seems located purely within the European cluster. The latter seems to be the Welsh Romani, who in the neighbor-joining tree (see the supplements) fall on the same branch as European populations, as opposed to the other Romani, who form their own clade.
To drill down further you need to ascertain admixture with a model-based clustering algorithm. Ergo, ADMIXTURE. I’ve reedited the figure to illustrate the salient points. In particular, it is clear that the Roma populations except the Welsh have significant South Asian ancestry. The question is how much? To answer this question you need to know the source population in South Asia. A peculiar aspect of this plot is that the Romani have very little of the green ancestral component, which happens to be modal in the Middle East (not shown). This element happens to be highly enriched in many Pakistani populations, but not necessarily northwest Indian ones. Nevertheless, the issue that leaves me suspicious of this particular finding is that many of the European populations, in particular those groups (e.g., Balkans) which may have admixed with the Romani, have this element to an extent not evident in one of their presumed ‘daughter’ populations. I wonder if perhaps the peculiarities of Romani inbreeding have skewed the allele frequency distribution so much that you get strangeness like this. I am not showing higher K’s because those break out with a Romani-cluster. Just like the Kalash-cluster this is to a great extent a feature of the long term endogamy of these communities. With high levels of drift the allele frequency of these groups moves into a very peculiar space in relation to their parental populations, but one must not become confused and assume that the Romani or Kalash are themselves appropriate independent clusters in the same way that Europeans or East Asians are.
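To make the cline idea concrete, here is a rough sketch that reads an ancestry fraction off a two-dimensional plot by projecting an individual onto the line joining the two source populations. The coordinates and the function name are hypothetical, and this is not how the paper estimates admixture; positions on an MDS plot are only approximately linear in ancestry, and drift can pull a population off the line.

```python
def cline_fraction(point, source_a, source_b):
    """Approximate fraction of ancestry from source_b for an individual at `point`.

    All three arguments are (x, y) coordinates on the same plot. The point
    is projected onto the segment from source_a to source_b, and the
    result is clamped to the range 0..1.
    """
    ax, ay = source_a
    bx, by = source_b
    px, py = point
    dx, dy = bx - ax, by - ay
    length_squared = dx * dx + dy * dy
    if length_squared == 0:
        raise ValueError("source populations coincide on the plot")
    t = ((px - ax) * dx + (py - ay) * dy) / length_squared
    return min(1.0, max(0.0, t))
```

An individual sitting 30% of the way from the European centroid to the South Asian centroid would come out at roughly 0.3, regardless of how far off the line it has wandered on the perpendicular axis.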
Using various forms of admixture analysis the authors seem to conclude that the Balkan Romani are 30-50% South Asian. This seems in line with intuition. But that still leaves open the question of who those South Asians were. As I noted above the most thorough Y chromosomal data point to the lower caste elements of northwest India. What do the autosomes say?
I don’t want to get into the technical details of how they tested the models, but it seems that one of the likely parental populations to the Romani had a close relationship to the Meghwal, a scheduled caste from northwest India. In other words, the autosome results align very well with the Y chromosomal inferences. Additionally, the models tested imply that the Romani likely left South Asia ~1,000 years before the present, which aligns well with what is known from the historical record (though this is a case where I put much more stock in the historical record than inferences from population genetic models; look at the intervals).
Finally, there is the question of inbreeding. One aspect of the Romani genome that jumps out at you is that they have many long “runs-of-homozygosity” (ROH). This is totally expected, as decades of uniparental analyses suggested a great deal of population bottleneck events as the Romani spread throughout Europe. But the ROH patterns also unearth an interesting fact: some of the Balkan Romani clearly have recent European admixture, while the non-Balkan Romani had an initial period of admixture followed by endogamy. The latter scenario seems to resemble Ashkenazi Jews, while the former would suggest that the boundary between Romani and non-Romani in the Balkans is more fluid than is sometimes portrayed.
So there we have it. The Romani derive from lower caste populations from the northwest Indian subcontinent who seem to have left ~1,000 years ago. Over time they admixed with local populations, and are now 50-70% non-South Asian, with some groups being ~90% European (e.g., Welsh Romani). And, they have a long history as an endogamous group, judging by their inbreeding.
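For readers who have not run into ROH before, the idea can be sketched in a few lines. This is a deliberately simplified illustration: it assumes genotypes coded as 0, 1 or 2 copies of the alternate allele along one chromosome, counts run length in SNPs rather than physical distance, and tolerates no heterozygous or missing calls inside a run. Real tools are more forgiving on all three counts; the function name and threshold are hypothetical.

```python
def runs_of_homozygosity(genotypes, min_snps=5):
    """Return (start, end) index pairs for runs of homozygous calls.

    `genotypes` is a sequence of 0/1/2 allele counts in chromosomal order.
    0 and 2 are homozygous; anything else (a heterozygous 1, or a missing
    value such as None) ends the current run. `end` is exclusive.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs
```

Many long runs point to parents who share recent ancestors, while an excess of short runs is more in line with older bottlenecks, which is roughly the contrast being drawn between the two groups of Romani above.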
As a follow up to my post from yesterday, I decided to run TreeMix on a data set I happened to have had on hand (see Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data for more on TreeMix). Basically I wanted to display a tree with, and without, gene flow.
The technical details are straightforward. I LD pruned ~550,000 SNPs down to ~150,000. I ran TreeMix without and with migration parameters with the Bantu Kenya population being the root. Finally, when I did turn on the migration parameter I set it for 5. You can see the results below.
Most of the flows are pretty expected. The West Eurasian flow from the Turks to the Uygurs makes sense, because there is a large West Asian component to what the Uygurs have (from East Iranians?). The Chuvash are a Turkic group with a minor, but significant, Turkic component. The HGDP Russian sample does have some East Eurasian ancestry. And the Moroccans also have African ancestry. But your guess is as good as mine with the flow into the Bantu. These are, I think, the Kenyan Bantu, so it might be trying to interpret Nilotic admixture as generalized Eurasian.
A minor note: installing TreeMix and generating the appropriate files from pedigree format is not too difficult. But you might be confused about how to generate the pedigree input file. You do it like so in PLINK:
./plink --noweb --bfile YourFile --freq --within YourGroupNamesFile --out YourOutPutFile
It’s the last you want to put into TreeMix’s python conversion script. The YourGroupNamesFile is basically the .fam file with an extra column, the population names for each individual.
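If you don't already have the group names file, something like the following sketch would build one. It assumes you have the lines of your .fam file and your own lookup from individual ID to population label (the `population_of` dictionary is hypothetical). The output uses the three-column layout that PLINK's --within documents: family ID, individual ID, cluster name.

```python
def build_within_lines(fam_lines, population_of):
    """Turn .fam lines into 'FID IID population' lines for PLINK's --within.

    `fam_lines` is an iterable of whitespace-delimited .fam rows, where the
    first two fields are family ID and individual ID. `population_of` maps
    individual ID to a population label. A KeyError is raised if someone
    has no label, which is preferable to silently dropping them.
    """
    out = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid = fields[0], fields[1]
        out.append(f"{fid} {iid} {population_of[iid]}")
    return out
```

Write the returned lines to a file, one per line, and pass that file as YourGroupNamesFile in the command above.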
I mentioned this in passing on my post on ASHG 2012, but it seems useful to make explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. To a great extent in the paper which is forthcoming there is the likely answer to the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture on the order of 2-3,000 years before the present.* If the admixture was so recent we should find many of the “first people,” no? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.
Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were of a quality that I could accept that these populations were not being well modeled by the variation which dominated their data set. Though Cushitic in language the Sanye had far less of the West Eurasian element present among other Cushitic speaking populations of the Horn of Africa. Neither were their African ancestral components quite like that of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).
Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe this is a population which seems to have been hunter-gatherers very recently, and to some extent still engage in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.
* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.
- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!
- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.
- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.
- Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian’ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.
- Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).
- If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the likely probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).
I have mentioned the PLoS Genetics paper, The Date of Interbreeding between Neandertals and Modern Humans, before because a version of it was put up on arXiv. The final paper has a few additions. For example, it mentions the generally panned (at least in the circles I run in) PNAS paper which suggested that ancient population structure could produce the same patterns which were earlier used to infer admixture with Neandertals (the authors also point to Yang et al. as a support for the proposition of admixture rather than structure). The primary result, dating the admixture between Neandertals and anatomically modern humans ~40-80,000 years before the present, is reiterated.
An interesting aspect is that their method is to utilize linkage disequilibrium (LD) decay. It’s interesting because tens of thousands of years is a hell of a long time to be able to detect an admixture event via LD! In particular because there’s likely a palimpsest effect where there are intervening admixtures and other assorted demographic events (e.g., bottlenecks and selective sweeps can also generate LD). So how’d they do it? Basically the authors figured out a way to ascertain which pairs of SNPs may have introgressed from Neandertals by comparing the frequency in modern humans to Neandertals at those given SNPs (in particular, by looking at variants at low frequency in Africans and derived in Neandertals). A major technical problem here is that the “genetic map,” which allows one to assess how recombination over time breaks apart the associations which are the hallmark of LD, is not precise enough to robustly allow them to make the inferences that they want.
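The logic of dating by LD decay can be sketched with a toy model. Assume a single pulse of admixture, after which the LD between two markers separated by d Morgans shrinks roughly as exp(-t × d) over t generations. Then the slope of log(LD) against genetic distance gives an estimate of t. This is an illustration of the principle only, not the statistic the authors actually use, and the function name is hypothetical.

```python
import math


def generations_since_admixture(distances_morgans, ld_values):
    """Estimate t from LD ~ A * exp(-t * d) by least squares on log(LD).

    `distances_morgans` are genetic distances between SNP pairs, and
    `ld_values` the corresponding (positive) LD measurements. The fitted
    slope of log(LD) on distance is -t, so its negation is returned.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Note that the distances enter the fit directly, so any systematic error in the genetic map feeds straight into the estimate of t. Converting generations to years needs a further assumption about generation time.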
A new short communication in Scientific Reports suggests that most demographic expansion as ascertained using mtDNA occurred before the Neolithic. MtDNA analysis of global populations support that major population expansions began before Neolithic Time:
Agriculture resulted in extensive population growths and human activities. However, whether major human expansions started after Neolithic Time still remained controversial. With the benefit of 1000 Genome Project, we were able to analyze a total of 910 samples from 11 populations in Africa, Europe and Americas. From these random samples, we identified the expansion lineages and reconstructed the historical demographic variations. In all the three continents, we found that most major lineage expansions (11 out of 15 star lineages in Africa, all autochthonous lineages in Europe and America) coalesced before the first appearance of agriculture. Furthermore, major population expansions were estimated after Last Glacial Maximum but before Neolithic Time, also corresponding to the result of major lineage expansions. Considering results in current and previous study, global mtDNA evidence showed that rising temperature after Last Glacial Maximum offered amiable environments and might be the most important factor for prehistorical human expansions.
After yesterday’s post I feel it is important again to reiterate that there is an unfortunate tyranny of the gene-as-physical-entity when it comes to our understanding of human heredity. To clarify what I mean, I think it is useful to borrow a framework from Andrew Brown. On the one hand you have a conventional modern mainstream understanding of the gene as a molecular biological entity, fundamentally derived from DNA and its role as envisaged by Francis Crick and James Watson, but with roots deeper back into the physiological genetic tradition which Sewall Wright was embedded within. In contrast to this concrete and biophysical conception of the gene there are those who conceive of the gene as an abstract unit of analysis. Richard Dawkins is the primary proponent of this viewpoint on the public intellectual scene, though men such as William D. Hamilton self-consciously understood the difference between their own genetics, and that which arose out of the insights of Crick and Watson.
When it comes to the human genetics of the Khoe-San there’s a little that’s stale and unoriginal for me in terms of presentation. The elements are always composed the same. The Bushmen are the “most ancient” humans, who can tell us something about “our past,” about “our evolution.” Tried & tested banalities just bubble forth unbidden. I have no idea why. There’s a new paper in Science on the genetics of the Khoe-San, which includes Bushmen, which brought to mind this issue for me because of the outrageous nature of the press releases.
The title of the paper itself is a testament to vanilla: Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. This is absolutely not surprising. Are you shocked that the Khoe-San have adaptations? Or that African history is complex? The wonder of it all! This paper actually revisits much of the same ground as Pickrell et al.’s originally titled The genetic prehistory of southern Africa. Before Dr. Pickrell executes throw-down on me on Twitter let me concede that I have no creative ideas to offer in terms of an alternative title. Rather, I have an idea: perhaps in the future scientists could explore the evolutionary genetic basis for steatopygia? The trait is not limited to the Khoe-San; my distant cousins the Andaman Islanders also exhibit it. Perhaps this is the ancestral state of the human lineage? This is a situation where the titles just write themselves!
Over at Haldane’s Sieve there are more than preprints posted; there are commentaries from the authors as well. For example, for The genetic prehistory of southern Africa, the first author, Dr. Joseph K. Pickrell, has an extended comment up.
But occasionally you get contributions & perspectives from non-authors which are very interesting. And it is to one of these I want to draw your attention, Thoughts on: The date of interbreeding between Neandertals and modern humans. It’s a comment on The date of interbreeding between Neandertals and modern humans. In the post Dr. Graham Coop contends:
At this point you are likely saying: well we know that Neandertals existed as a [somewhat] separate population/species who are these population X you keep talking about and where are their remains? Population X could easily be a subset of what we call Neandertals, in which case you’ve been reading this all for no reason [if you only want to know if we interbred with Neandertals]. However, my view is that in the next decade of ancient human population history things are going to get really interesting. We have already seen this from the Denisovian papers [1,2], and the work of ancient admixture in Africa (e.g. Hammer et al. 2011, Lachance et al. 2012). We will likely discover a bunch of cryptic somewhat distinct ancient populations, that we’ve previously [rightly] grouped into a relatively small number of labels based on their morphology and timing in the fossil record. We are not going to have names for many of these groups, but with large amounts of genomic data [ancient and modern] we are going to find all sorts of population structure. The question then becomes not an issue of naming these populations, but understanding the divergence and population genetic relationship among them.
This is a bold contention, and I suspect some physical anthropologists will take issue with it. But it’s a testable prediction. We’ll know if it’s panned out in 2020. I may still be blogging between now and then, and so I will now self-importantly label this “Coop’s Conjecture.” Is there anyone who wants to wager some money on Coop’s Conjecture? Any side of the bet you think is a sure thing?
The map to the right shows the frequencies of HGDP populations on SLC45A2, which is a locus that has been implicated in skin color variation in humans. It’s for the SNP rs16891982, and I yanked the figure from IrisPlex: A sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Brown represents the genotype CC, green CG, and blue, GG. Europeans who have olive skin often carry the minor allele, C. While SLC24A5 is really bad at distinguishing West Eurasians from each other, SLC45A2 is better. Though both are fixed in Northern Europe, the former stays operationally fixed in frequency outside of Europe, in the Near East. As I stated earlier the proportions of the ancestral SNP in the Middle Eastern populations in the HGDP seem to be easily explained by the Sub-Saharan admixture you can find in these groups.
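If you want to turn genotype proportions like the ones on that map into allele frequencies, the arithmetic is simple: each CC individual contributes two C alleles, each CG contributes one, and every individual carries two alleles in total. Here is a minimal Python sketch of that calculation. The function name and the counts are made up for illustration; they are not taken from the IrisPlex paper or the HGDP data.

```python
def allele_frequencies(n_cc, n_cg, n_gg):
    """Return (freq_C, freq_G) from counts of the three genotypes at one SNP."""
    total_alleles = 2 * (n_cc + n_cg + n_gg)  # two alleles per individual
    if total_alleles == 0:
        raise ValueError("need at least one genotyped individual")
    # CC homozygotes carry two C alleles, CG heterozygotes carry one.
    freq_c = (2 * n_cc + n_cg) / total_alleles
    return freq_c, 1.0 - freq_c


# Hypothetical counts for illustration only (not real HGDP numbers).
freq_c, freq_g = allele_frequencies(n_cc=2, n_cg=6, n_gg=12)
print(f"C: {freq_c:.2f}, G: {freq_g:.2f}")
```

A population can have a modest frequency of the C allele and still show very few CC homozygotes, since most copies of a minor allele sit in heterozygotes.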
In contrast major SNPs in SLC45A2 are closer to disjoint between Europeans and South Asians. For example I’m a homozygote for the C allele. And yet even here we need to be careful. I want in particular to draw your attention to the frequencies in the Middle Eastern populations, the Sardinians, and the Kalash of Pakistan.
The Kalash, and their Nuristani cousins, have often been observed to have “European” physical features. These populations even trade in legends of descent from the Macedonians of Alexander. And the genetics here shows why. Though the Kalash are far more closely related to other Northwest South Asians than to Europeans, on the subset of genes which are implicated in pigmentation many of them could actually “pass” for Europeans. In fact, it is interesting to me that by these measures the Sardinians are no more European than groups like the Kalash and the Druze (in contrast to the total genome, where Sardinians may be the best reference for Western Europeans). They have a lower frequency of the SNP strongly associated with blue eyes than either of these groups, for example.
In the above paper they also produced a chart which illustrated the relationships of HGDP populations as a function only of the six SNPs they used in their prediction method. These are markers which distinguish blue and brown eye color in Europeans efficiently.