A phylogenetic tree is an essential tool in understanding the broad scope of natural history, placing particular lineages in specific evolutionary contexts of relatedness. These sorts of trees range from Ernst Haeckel’s classical attempt, depicting relationships which biologists derived from intuition within the framework of a grand evolutionary scheme, all the way down to modern methods implemented in software packages such as MrBayes, which many frankly utilize in a “turnkey” manner. These trees are abstractions, in that they reduce a wide range of phenomena into schematic representations which impart aspects of particular interest in a stylized form. This is important, because the actual nature of the phenomena being represented may be more complex than the representation suggests. A simple illustration of what I’m getting at is clear when you look at the long history of phylogenetics and phylogeography utilizing mitochondrial DNA (mtDNA) lineages. Because mtDNA is copious in comparison to nuclear DNA, it is easy to obtain. And, as there is no recombination and it is inherited in a haploid fashion (mother to daughter), it makes the inference of gene trees much easier. The key problem is that the genealogy of this particular sequence is used to infer aspects of population history, when it may not represent the history of other regions of the genome very well. Different genes may have different histories.
“There were giants in the earth in those days…when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.” -Genesis 6:4
Seven years ago I wrote a short post, Why patriarchy?, which attempted to present a concise explanation for the ubiquity of what we might term patriarchy in complex societies (i.e., not “small-scale societies”). Broadly speaking my conjecture is that social and political dominance of small groups of males (proportionally) over the past several thousand years is an example of “evoked culture”. The higher population densities in agricultural societies produced a relative surfeit of accessible marginal surplus, which could be given over to supporting non-peasant classes who specialized in trade, religion, and war, all of which were connected. This new economic and cultural context served to trigger a reorganization of the typical distribution of power relations of human societies because of the responses of the basic cognitive architecture of our species inherited from Paleolithic humans. Agon, or intra-specific competition, has always been part of the game of human socialization. The scaling up and channeling of this instinct in bands of males totally transformed human societies (another dynamic is elaboration of cooperative structures, though this often manifests as agonistic competition between coalitions of humans).
There has been a lot of attention to Erika Check Hayden’s piece Ethics: Taboo genetics, at least judging by people commenting on my Facebook feed. In some ways this is not an incredibly empirically grounded argument, because the biological basis of complex traits is going to be rather difficult to untangle on a gene-by-gene basis. In other words, this isn’t a clear and present “concern.” The heritability of many behavioral traits has long been known. This is not revolutionary, though for cultural reasons many well-educated people are totally surprised when confronted with data that many traits, such as intelligence and personality, have robust heritabilities* (the proportion of trait variation explained by variation in genes across the population). The literature reviewed in The Nurture Assumption makes clear that a surprising proportion of the contribution any parents make to their offspring is through their genetic composition, and not their modeled example. You wouldn’t know this if you read someone like Brian Palmer of Slate, who seems to be getting paid to reaffirm the biases of the current age among the smart set (pretty much every single one of his pieces that touch upon genetics is larded with phrases which could have been written by a software program designed to soothe the concerns of the cultural Zeitgeist). But the new genomics is confirming the broad outlines of the findings from behavior genetics. There’s nothing really to see there. The bigger issue of any interest is normative; the values we hold dear as a culture.
The above figure is from Norton et al.’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. It shows that rs16891982 on the SLC45A2 locus exhibits strong differentiation between Europe and the rest of the world. This is in contrast to SLC24A5, where the well known allele which differentiates Africans/East Asians from Europeans is found at very high frequencies across Western Eurasia (both my parents are homozygotes for the “European” variant; in fact SLC24A5’s derived variant is found at fractions on the order of ~50% in eastern and southern India). The ancestral allele on SLC24A5 is very difficult to find in Europeans, because the derived variant is so close to fixation. In contrast SLC45A2’s minor allele is segregating at appreciable frequencies in places like southern Spain, and the derived allele is not fixed even in Northern Europe.
I won’t review the literature on the genomics and evolution of human pigmentation at this point. Rather, I’ll just note that it seems most of the inter-population variation is controlled by a handful of genes. It’s a polygenic trait, but just. Second, a fair amount of evidence has emerged that some of the lightening derived variants have increased in frequency only very recently (e.g., on the order of ~10,000 years). Pigmentation is then a peculiar trait where the genetic underpinnings can give historical phylogenetic information because of the varied dates of differentiation and selective sweeps.
Below I’ve collated results from several studies on frequencies of SLC45A2. I invite readers to peruse them. I will say two things. First, the frequency of the “European” variant in ~140 northern Ethiopians is 0%. This is peculiar for a population which may be on the order of ~50% West Eurasian. Second, the fraction of SLC45A2 derived variant in South Asians coincidentally tracks the “NE Euro” percentage in Zack Ajmal’s results.
The inimitable Joe Pickrell has dropped his Khoisan-are-part-Italian preprint onto arXiv, Ancient west Eurasian ancestry in southern and eastern Africa. I’m being glib in my characterization of the paper’s core conclusion, but there’s a reason for such a flip response: the inferences that he seems to draw from the genetic data strike me as verging on crazy. But that’s OK, what genetics is telling us is that history was a whole lot crazier than we had imagined.
Let’s back up for a moment here. For several decades now geneticists have assumed that the Bushmen of the Kalahari, the Khoisan-qua-Khoisan, Africa’s last hunter-gatherers who retain their ancestral language along with the Hadza, are the ur-humans. The basal lineage that first diverged from the rest of mankind at the cusp of the Out of Africa event. This is evident in Y chromosomal and mtDNA phylogenies, where the Bushmen and their kin harbor variants which coalesce deeply in time with those of others. And, a few years ago another group revealed the likelihood that Bushmen also are products of an admixture event in the last ~50,000 years with a distinct hominin lineage which diverged ~1 million years before the present from the main line which led up to anatomically modern humanity. Now Pickrell et al. present us with a twist which is perhaps even more astringent than a lime: in their genomes the Bushmen and their Khoisan kin, the Khoe herders, reflect an ancient admixture event with East Africans, who themselves were the outcomes of hybridizations between West Eurasians and indigenous African populations. More relevantly for my concise summation of the conclusion, the West Eurasian component does not necessarily reflect modern Middle Eastern populations, so much as Southern Europeans!
It is well known that Alexander the Great invaded the Indus river valley. Coincidentally in the mountains shadowing this region are isolated groups of tribal populations whose physical appearance is at variance with South Asians. In particular, they are much lighter skinned, and often blonde or blue eyed. Naturally this led to 19th and early 20th century speculation that they were lost white races, perhaps descended from some of the Macedonian soldiers of Alexander. This was partly the basis of the Rudyard Kipling story The Man Who Would Be King. Naturally over time some of these people themselves have forwarded this idea. In the case of a group such as the Kalash of Pakistan this conjecture is supported by the exotic nature of their religion, which seems to be Indo-European, and similar to Vedic Hinduism, with minimal influence from Islam.
Well, not quite. You have to read the paper, Genomic Analysis of Natural Selection and Phenotypic Variation in High-Altitude Mongolians, to see why I’m skeptical. Frankly it doesn’t seem like they found too much of note in their results, so I’m kind of confused why this paper got into PLOS GENETICS (and to give due credit, this group has published very interesting work in the past which I have smiled upon). So why am I even posting about this paper? Because I was pretty sure they’d release their data, and they have (just page down to the bottom). All researchers who take the trouble to do this should be praised, highlighted, and respected. This improves science. After the AHA fiasco I’m going to redouble the effort to put the spotlight on those who release their data.
Addendum: It must be noted that a “Mongolian” identity is very much an outcome of Genghis Khan’s rise and paramountcy. The Mongols were just one of numerous tribes across what is today Mongolia. With the rise of the Mongol Empire many populations, including Turkic populations who were not part of a dialect continuum in close proximity to the Mongols, were assimilated into that ethnic identity within a few generations. The “Zulu” identity is similar, as it is a function of the rise to prominence of Shaka’s particular clan.
Last year a paper came out in AJHG which reported that Ethiopian populations seem to be a compound of West Eurasians and Sub-Saharan Africans. This result itself is not too surprising for a host of reasons. First, Ethiopians and other populations of the Horn of Africa are physically equidistant between West Eurasians and Sub-Saharan Africans. 20th century physical anthropologists sometimes placed them in the “Caucasoid” racial classification for this reason. Second, the languages of the Horn of Africa have Afro-Asiatic affinities. The Cushitic languages (e.g. Somali) have deep connections with more familiar tongues such as Arabic, but Semitic Ethiopian languages (e.g. Amharic) are much closer in historical distance. Third, there has been a fair amount of previous genetic analysis of these populations, and their synthetic character was obvious from those (e.g. mtDNA and Y results suggest a diverse array of haplogroups). What the AJHG paper reported was that the Eurasian ancestors of the Ethiopians admixed with the presumably Sub-Saharan indigenes ~3,000 years ago in a single pulse event, and that their closest modern relations in West Asia today are Levantines. To put a mild gloss on it, the dating is controversial (it uses patterns of decayed genetic correlations of markers across the length of the genome). This is not just clinal variation.
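To make the dating logic a bit more concrete, here is a minimal sketch of the general idea behind estimating admixture time from decaying correlations between markers. It assumes the simplest single-pulse model, where the correlation falls off as exp(-t * d), with d the genetic distance in Morgans and t the number of generations since admixture. The function name, the synthetic curve, and the value of 100 generations are all my own illustrative assumptions, not the AJHG paper's actual pipeline or data.

```python
import numpy as np

def estimate_generations(distance_morgans, ld):
    """Estimate generations since a single admixture pulse.

    Assumes ld ~ A * exp(-t * d), so a straight-line fit of
    log(ld) against d has slope -t.
    """
    slope, _intercept = np.polyfit(distance_morgans, np.log(ld), 1)
    return -slope

# Synthetic, noise-free decay curve (illustrative assumption, not real data)
d = np.linspace(0.005, 0.2, 40)      # genetic distance between marker pairs, in Morgans
true_t = 100                         # hypothetical generations since admixture
ld = 0.05 * np.exp(-true_t * d)      # admixture-induced correlation at each distance

print(estimate_generations(d, ld))
```

With real data the curve is noisy, the amplitude depends on how well your reference populations match the true sources, and multiple pulses or continuous gene flow will bend the curve away from a single exponential. That is a large part of why dates of this sort get argued over.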
Right before I was to sleep a reader sent me an email which pointed to a Nick Wade piece in The New York Times, Gene Sleuths Find How Some Naturally Resist Cholera. It’s about new research in Science Translational Medicine, Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta. The authors use the “composite of multiple signals” (CMS) test to ascertain regions of the genome subject to natural selection (look for long haplotypes, high frequency derived alleles, and alleles with high cross population frequency differences). The results aren’t too surprising; I was born in Bangladesh, and I can attest to the fact that it’s a germaphobe’s nightmare. Rather, it is a secondary and very minor aspect of the paper which frankly draws my ire. First let’s quote Wade’s treatment:
As a necessary preliminary to testing for natural selection, the researchers looked at the racial composition of the Bengali population and found that they are an Indian population with a 9 percent admixture of East Asian genes, probably Chinese. The admixture occurred almost exactly 52 generations ago, according to statistical calculation, or around A.D. 500, assuming 29 years per generation. The Gupta empire in India was in decline at this time, but it is unclear whether the intermarriage with East Asians took place through trade or conquest. “We can now go back to the historians and see what happened then,” Dr. Karlsson said.
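The date arithmetic in that passage is easy to check, and it is worth seeing how much the calendar date moves with the assumed generation time. This is just a sketch: the 52 generations and 29 years per generation come from Wade's text, while the reference year of 2013 and the alternative generation time of 25 years are my own assumptions for illustration.

```python
def admixture_year(generations, years_per_generation, reference_year):
    """Calendar year (A.D.) of an event that happened `generations` ago."""
    years_ago = generations * years_per_generation
    return reference_year - years_ago

# 52 generations and 29 years/generation are from the quoted passage;
# the reference year 2013 is an assumption for illustration
print(admixture_year(52, 29, 2013))   # 505

# A shorter assumed generation time (hypothetical) shifts the date by centuries
print(admixture_year(52, 25, 2013))   # 713
```

So "around A.D. 500" follows from the stated inputs, but the phrase "almost exactly" is carrying more weight than the arithmetic can bear once you allow the generation time to vary.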
But sometimes science gets garbled in transmission. What do they say in the paper? Again, the relevant section:
A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy had to do with the fact that some individuals today claim to be Taino, and therefore, they were not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which reconstructs the recent demographic history of the Caribbean, both maritime and mainland, using genomics. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP) the focus is on the populations highlighted in the map above.
The above figure is from a paper in PLoS GENETICS, Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration. The authors synthesize two diverse domains of human genomics. First, there are biomedically focused genome-wide association studies and their like which attempt to identify risk alleles for particular diseases. In some cases these risk alleles are very penetrant, in that a particular state predicts with high likelihood a disease phenotype. But in most cases the yield is elevated or decreased risks for highly complex traits such as type 2 diabetes. Second, there is the domain of evolutionary genomics which attempts to reconstruct a phylogenetic and population genetic history so as to frame contemporary patterns of variation in their proper context. How this might be important or of interest is obvious in the case of malaria resistance genes. Alleles conferring resistance have arisen in multiple populations due to parallel environmental pressures. Phylogenetic relationships between these populations should inform your predictions as to the likely similarities of the mutations between the populations. Meanwhile, population genetic theory can give you clues as to the likelihood of multiple adaptations.
What a great age we live in. Until recently critical parameters in population genetics such as mutation rates had to be inferred and assumed, even though they served as bases for much more complex inferences. Now with humans (and humans are only the beginning!) much of what was inferred is being assessed in a more direct fashion. Catarina Campbell and Evan Eichler have a review in Trends in Genetics which surveys the field as it stands now, Properties and rates of germline mutations in humans. Notice that there’s a rough convergence using pedigree analysis on a mutation rate in the low 10⁻⁸ range. Additionally, it does seem that a disproportionate number of novel mutations come through the paternal lineage via sperm. This should increase our moderate worry about older fathers (something reiterated in the piece, with caveats). Finally, the authors suggest these results are a floor for the mutational rate, in part due to the long term conflict with the inferred ‘evolutionary rates,’ which are higher. This matters because to infer the last common ancestors between lineages the value of the mutation rate is obviously critical.
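Why the rate is so critical is clear from a back-of-the-envelope sketch. Under a strict molecular clock, differences accumulate along both lineages since the split, so the time in generations is the per-site divergence divided by twice the per-generation mutation rate. Every number below (the divergence, the two rates, the 25 year generation time) is an illustrative assumption of mine, not a figure from the review.

```python
def split_time_years(divergence_per_site, mutation_rate_per_gen, years_per_generation):
    """Years since two lineages shared an ancestor, under a strict molecular clock.

    Differences accumulate along both lineages, hence the factor of 2.
    """
    generations = divergence_per_site / (2.0 * mutation_rate_per_gen)
    return generations * years_per_generation

# All values are hypothetical, chosen only to show the sensitivity
d = 0.001                      # pairwise differences per site
for mu in (1.2e-8, 2.5e-8):    # a lower and a higher per-generation rate
    print(mu, round(split_time_years(d, mu, 25)))
```

The estimate is inversely proportional to the rate, so roughly halving the assumed mutation rate roughly doubles every date that depends on it.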
Standard apologies that I have not had the marginal time to blog much, but I thought it was important that I at least note that Dr. Peter Ralph and Dr. Graham Coop’s paper on identity-by-descent segments and European populations and history is out in its final form in PLoS Biology, The Geography of Recent Genetic Ancestry across Europe. I’ve been familiar with the outlines of these results for about a year now, and to be frank I am still digesting them. The media hype will come and go, with true but to some extent trivial headlines that “all Europeans are related,” but the consequences of these sorts of genetic inquiries into the relatedness of populations are going to be long lasting. At least they should be.
But before I go on about that, if you find the paper itself a bit daunting (though the main body of the text strikes me as eminently readable for a piece of statistical genetics), see Carl Zimmer’s condensation. With this sort of result there is liable to be confusion, so note that Graham Coop has been posting comments on Carl’s blog (and elsewhere, and you can always send him a note on Twitter). Additionally he has a very readable FAQ out. Dr. Coop told me on Twitter that there would even be updates tomorrow as well! In particular one aspect of the paper which I noticed is that most relatively short but detectable segments (~10 cM) shared between any two individuals in many nationalities are not going to be evidence of recent genealogical affinities, but of deeper historical processes.
There’s a new paper in PLoS ONE, Female and Male Perspectives on the Neolithic Transition in Europe: Clues from Ancient and Modern Genetic Data, which uses a combination of contemporary and ancient (that is, from subfossils) Y and mitochondrial DNA to understand the demographic past of Europe. Recall that the Y traces the direct male lineage, and the mtDNA the direct female lineage. Because they don’t recombine, and so converge cleanly back to a last common ancestor (there is no reticulation because there is no sex on these loci; they’re inherited from one of the two parents), they’re amenable to a lot of nifty demographic inference. In this paper they test specific models, and produce probability distributions of those models. Since it is open access I invite you to read the paper. The problem with these sorts of papers is I have a hard time trusting them until I replicate the results or have a sense of how cranky the software/code is!
Well, not really. But a new paper in PLOS GENETICS has a really weird speculation nested into the discussion of what seems a relatively banal paper on the phylogeography of South Americans. It’s a Y chromosomal survey of the populations of the New World, so it’s tracing the male lineage only. Because Amerindian populations likely went through at least one (more if you accept multiple migrations) bottleneck the variation on the Y chromosome is low. Ideally you’d be looking at tens of thousands of markers on the autosome, the non-sex inherited genome. But this group had a very good population coverage. Over 1,000 men from 50 tribal populations, with a focus on South America. Additionally, non-recombining markers are more manageable in terms of reconstructing demographic histories.
Every now and then Richard Dawkins stirs controversy by bringing up the topic of eugenics. This is not surprising in terms of Dawkins’ intellectual pedigree. The most influential British evolutionary biologist in the generation before Dawkins, R. A. Fisher, was a eugenicist. Arguably the most eminent evolutionist of Dawkins’ own generation, W. D. Hamilton, clearly had eugenical sympathies, though he was keenly aware how unfashionable that had become.* University College London’s Galton Laboratory still had the word eugenics in its title until 1965. More recently Dawkins has brought up the issue of consanguinity amongst the British Pakistani community, a practice which one might argue is non-eugenical due to the high rate of recessive diseases.
Over at Slate the advice columnist received an email from a man who found out that his wife is really his half-sister. If you don’t want to follow the link, the back story is straightforward: the couple’s parents were lesbians, and used sperm donors. Recently the man sought out the identity of his biological father at the urging of his wife, because they have three children and she thought it would be important to have that information for them. That is how he found out that they shared the same biological father. Here is the part that has me concerned about realism on the part of the advice columnist:
I don’t see how you can keep this information to yourself. She’s bound to sense something off in your behavior and you simply can’t say, “I’m struggling with father issues.” I think you have to sit her down and show you what you’ve discovered. Then you two should likely seek out a counselor who deals with reproductive technology to help you sort through your emotions. I don’t see why your healthy children should ever be informed of this. That Dad didn’t want to find out who his sperm donor was is a sufficient answer when they get old enough to ask about this.
Yesterday I re-ran Plink with a narrower European-biased data set, and generated some MDS plots. I only had a few Asian and African populations, mostly so that I could replicate the standard dimensions 1 and 2, producing the classic “v-shape” which you’ve seen before. But what’s more interesting are lower coordinates. They may not capture as much of the variation in the distance matrix, but illustrate important dynamics. I haven’t used the directlabels package yet, so right now the labels are still imperfect. I’m giving black text as well as colored text. Also, here’s the original data (as in MDS results, not the raw data).
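For readers who wonder what the MDS coordinates are and why the lower ones capture less of the variation, here is a minimal sketch of classical (metric) MDS in NumPy. It is a toy under stated assumptions, not PLINK's own implementation: the function name is mine, and the distance matrix is built from six made-up points rather than from genotype data.

```python
import numpy as np

def classical_mds(dist, k):
    """Classical MDS: embed n items in k dimensions from an n x n distance matrix.

    Returns the coordinates and the share of total (non-negative) eigenvalue
    mass captured by each of the k dimensions.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to get an inner-product matrix
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)        # drop tiny negative noise
    coords = eigvecs[:, :k] * np.sqrt(positive[:k])
    share = positive / positive.sum()
    return coords, share[:k]

# Toy data (hypothetical): six points stretched unequally along three axes
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3)) * np.array([5.0, 2.0, 0.5])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, share = classical_mds(dist, 3)
print(share)   # dimension 1 captures the most, dimension 3 the least
```

The later dimensions account for a smaller share by construction, but in real data they are often where finer structure within a continent shows up, which is why I bother plotting them.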