Category: Genomics

Selection happens; but where, when, and why?

By Razib Khan | November 8, 2013 3:49 am
Distribution of SLC24A5 variation at SNP rs1426654. Credit: HGDP Browser

Nina Davuluri, Miss America 2014, Credit: Andy Jones

One of the secondary issues which cropped up with Nina Davuluri winning Miss America is that it seems implausible that someone with her complexion would be able to win any Indian beauty contest. A quick skim of a Google Images search for “Miss India” will make clear the reality that I’m alluding to. The Indian beauty ideal, especially for females, is skewed to the lighter end of the complexion distribution of native South Asians. Nina Davuluri herself is not particularly dark skinned compared to the average South Asian; in fact she is likely at the median. But it would be surprising to see a woman who looks like her held up as conventionally beautiful in the mainstream Indian media. When I’ve pointed this peculiar aspect out to Indians* some of them will submit that there are dark skinned female celebrities, but when I look up the actresses in question they are invariably not very dark skinned, though perhaps they are by comparison to what is the norm in that industry. But whatever the cultural reality is, the fraught relationship of color variation to aesthetic variation prompts us to ask: why are South Asians so diverse in their complexions in the first place? A new paper in PLoS Genetics, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, explores this genetic question in depth.

Much of the low hanging fruit in this area was picked years ago. A few large effect genetic variants which are known to be polymorphic across many populations in Western Eurasia segregate within South Asian populations. What this means in plainer language is that a few genes which cause major changes in phenotype are floating around in alternative flavors even within families among people of Indian subcontinental origin. Ergo, you can see huge differences between full siblings in complexion (African Americans, as an admixed population, are analogous). While loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.

CATEGORIZED UNDER: Anthropology, Genetics, Genomics
MORE ABOUT: Pigmentation

The age of the sword

By Razib Khan | October 17, 2013 2:35 am


Credit: Aviok

“Think not that I am come to send peace on earth: I came not to send peace, but a sword.” -Matthew 10:34

“There were giants in the earth in those days…when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.” -Genesis 6:4

Seven years ago I wrote a short post, Why patriarchy?, which attempted to present a concise explanation for the ubiquity of what we might term patriarchy in complex societies (i.e., not “small-scale societies”). Broadly speaking my conjecture is that the social and political dominance of small groups of males (proportionally) over the past several thousand years is an example of “evoked culture”. The higher population densities in agricultural societies produced a relative surfeit of accessible marginal surplus, which could be given over to supporting non-peasant classes who specialized in trade, religion, and war, all of which were connected. This new economic and cultural context served to trigger a reorganization of the typical distribution of power relations in human societies because of the responses of the basic cognitive architecture of our species inherited from Paleolithic humans. Agon, or intra-specific competition, has always been part of the game of human socialization. The scaling up and channeling of this instinct in bands of males totally transformed human societies (another dynamic is the elaboration of cooperative structures, though this often manifests as agonistic competition between coalitions of humans).

To get a sense of what I mean when I say transforming, consider this section of an article in The Wall Street Journal which profiles the wife of one of the perpetrators of the 2012 New Delhi gang rape:

Tiger, tiger! (genome of the day edition)

By Razib Khan | September 17, 2013 5:09 pm

Credit: Derek Ramsey

Getting a paper published with a newly sequenced genome is considered somewhat passé and so aughts at this point, but there are cases which are exceptions to this rule. Tigers are a charismatic and rare (<10,000 in the wild) super-predator, so when you see that they, along with a few other Panthera species, have been sequenced you take some note. The paper in question is open access, so you can read it yourself: The tiger genome and comparative analysis with lion and snow leopard genomes (not to spoil it, but there’s a Venn diagram!).

Before today only Felis silvestris catus had a reference sequence within the mammalian family Felidae. This fact should make you reconsider the idea that a new genome sequence is always boring and not noteworthy, as most lineages of mammals are represented by only one individual from one representative species. In ~5 years it is true that we’ll be beyond this stage of data scarcity in the sense of phylogenetic coverage, but we’re not there yet.

MORE ABOUT: Genomics, Tiger genome

Bay Area Population Genomics (BAPG) IX Conference

By Razib Khan | September 11, 2013 12:54 am

Registration is free. It will be hosted by the Nielsen Group in Berkeley on October 5th. As I am not going to be at the methods-orgy (OK, my own peculiar perspective) that is going to be ASHG 2013 I am definitely going to BAPG to get a preview of anything that might be unveiled in Cambridge a few months later (and to be frank I registered last June when it was announced).

MORE ABOUT: Genomics

Peeling back the palimpsest, and finding selection again

By Razib Khan | September 7, 2013 2:04 am

Layers and layers….

There is the fact of evolution. And then there is the long-standing debate about how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, when you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon view of what these debates imply about the power of selection, as opposed to stochastic contingency, can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*

Indo-Aryans, Dravidians, and waves of admixture (migration?)

By Razib Khan | August 8, 2013 12:46 pm

Citation: Genetic Evidence for Recent Population Mixture in India
Moorjani et al.

The Pith: In India 5,000 years ago there were the hunter-gatherers. Then came the Dravidian farmers. Finally came the Indo-Aryan cattle herders.

There is a new paper out of the Reich lab, Genetic Evidence for Recent Population Mixture in India, which follows up on their seminal 2009 work, Reconstructing Indian Population History. I don’t have time right now to do justice to it, but as noted this morning in the press, it is “carefully and cautiously crafted.” Since I am not associated with the study, I do not have to be cautious and careful, so I will be frank in terms of what I think these results imply (note that my confidence in many of the assertions below is modest). Though less crazy in a bald-faced sense than another recent result which came out of the Reich lab, this paper is arguably more explosive because of its historical and social valence in the Indian subcontinent. There has been a trend over the past few years of scholars in the humanities engaging in deconstruction and intellectual archaeology which overturns old historical orthodoxies and understandings, and leaves the historiography of a particular topic of study in a chaotic mess. From where I stand the Reich lab and its confederates are doing the same, but instead of attacking the past with cunning verbal sophistry (I’m looking at you, postcolonial “theorists”), they are taking a sledgehammer of statistical genetics and ripping apart paradigms woven together by innumerable threads. I am not sure that they even understand the depths of the havoc they’re going to unleash, but all the argumentation in the world will not stand up to science in the end; we know that.

Since the paper is not open access, let me give you the abstract first:

Pigmentation, phylogeny, history, and adaptation

By Razib Khan | August 7, 2013 2:45 am

SLC45A2 rs16891982 frequency. Norton, Heather L., et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians.” Molecular Biology and Evolution 24.3 (2007): 710-722.


The above figure is from Norton et al.’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. It shows that rs16891982 at the SLC45A2 locus exhibits strong differentiation between Europe and the rest of the world. This is in contrast to SLC24A5, where the well known allele which differentiates Africans/East Asians from Europeans is found at very high frequencies across Western Eurasia (both my parents are homozygotes for the “European” variant; in fact SLC24A5’s derived variant is found at frequencies on the order of ~50% in eastern and southern India). SLC24A5 is so close to fixation for the derived variant in Europeans that the ancestral allele is very difficult to find there. In contrast, SLC45A2’s minor allele is segregating at appreciable frequencies in places like southern Spain, and the derived allele is not fixed even in Northern Europe.
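
To make “close to fixation” concrete: under Hardy-Weinberg proportions an allele at frequency q appears in homozygous form at frequency q², so a rare ancestral allele is rarer still as a homozygote. This is a minimal sketch assuming random mating; the frequencies used are hypothetical round numbers for illustration, not estimates from Norton et al. or the HGDP.

```python
def hwe_genotype_freqs(q):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    q is the frequency of the allele of interest (e.g., an ancestral allele);
    p = 1 - q is the frequency of the alternative allele.
    Returns (homozygous for q, heterozygous, homozygous for p).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return q * q, 2.0 * p * q, p * p


def allele_freq_from_counts(n_qq, n_pq, n_pp):
    """Estimate the frequency of allele q from diploid genotype counts."""
    n = n_qq + n_pq + n_pp
    return (2 * n_qq + n_pq) / (2 * n)


# Hypothetical ancestral-allele frequencies: near fixation vs. segregating.
for q in (0.02, 0.3):
    hom, het, _ = hwe_genotype_freqs(q)
    print(f"q={q:.2f}: ancestral homozygotes {hom:.4f}, carriers {hom + het:.4f}")
```

Nearly all carriers of a rare allele are heterozygotes, which is why an allele can linger at a few percent while homozygotes for it almost never turn up in a sample.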

I won’t review the literature on the genomics and evolution of human pigmentation at this point. Rather, I’ll note two things. First, it seems that most of the inter-population variation is controlled by a handful of genes. It’s a polygenic trait, but only just. Second, a fair amount of evidence has emerged that some of the lightening derived variants have increased in frequency only very recently (e.g., on the order of ~10,000 years). Pigmentation is then a peculiar trait where the genetic underpinnings can give historical phylogenetic information because of the varied dates of differentiation and selective sweeps.
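
A back-of-the-envelope way to see how a lightening allele could rise on a ~10,000 year time scale: under a deterministic model of genic (multiplicative) selection, p' = p(1 + s)/(1 + sp), the log-odds of the allele grow by exactly ln(1 + s) per generation, so the time from p0 to p1 is [logit(p1) − logit(p0)] / ln(1 + s) generations. This sketch ignores drift, migration, and dominance; the selection coefficients, start and end frequencies, and the 25-year generation time are all assumptions for illustration.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sweep_generations(p0, p1, s):
    """Generations for an allele to rise from p0 to p1 under genic selection.

    Assumes the deterministic recursion p' = p(1 + s) / (1 + s p), under which
    the log-odds of the allele increase by exactly ln(1 + s) per generation.
    """
    if not (0.0 < p0 < p1 < 1.0):
        raise ValueError("need 0 < p0 < p1 < 1")
    if s <= 0:
        raise ValueError("selection coefficient must be positive")
    return (logit(p1) - logit(p0)) / math.log1p(s)


def simulate(p0, s, generations):
    """Iterate the same recursion directly, as a cross-check."""
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
    return p


GENERATION_YEARS = 25  # assumed human generation time
for s in (0.01, 0.02, 0.05):  # hypothetical selection coefficients
    g = sweep_generations(0.01, 0.99, s)
    print(f"s={s}: ~{g:.0f} generations, ~{g * GENERATION_YEARS:,.0f} years")
```

Under these assumptions a selection coefficient of a couple of percent is enough to take an allele from rare to nearly fixed in roughly ten thousand years.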

Below I’ve collated results from several studies on frequencies of SLC45A2. I invite readers to peruse them. I will say two things. First, the frequency of the “European” variant in ~140 northern Ethiopians is 0%. This is peculiar for a population which may be on the order of ~50% West Eurasian. Second, the frequency of the SLC45A2 derived variant in South Asians coincidentally tracks the “NE Euro” percentage in Zack Ajmal’s results.

Population structure, concrete and ineffable

By Razib Khan | August 5, 2013 2:58 am

Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly. “Inference of population structure using multilocus genotype data.” Genetics 155.2 (2000): 945-959.

Before there was Structure there was just structure. By this, I mean that population substructure has always existed. The question is how we as humans shall characterize and visualize it in a manner which imparts some measure of wisdom and enlightenment. A simple fashion in which we can assess population substructure is to visualize the genetic distances across individuals or populations on a two-dimensional plot. Another way which is quite popular is to represent the distances on a neighbor-joining tree, as on the left. As you can see this is not always satisfying: dense trees with too many tips are often almost impossible to interpret beyond the most trivial inferences (though there is an aesthetic beauty in their feathery topology!). And where graphical representations such as neighbor-joining trees and MDS plots remove too much relevant information, cluttered FST matrices have the opposite problem. All the distance data is there in its glorious specific detail, but there’s very little Gestalt comprehension.
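
For readers who have never built one, here is roughly how such a matrix comes about. This minimal sketch uses a simple Nei-style estimator for each pair of populations, FST = (H_T - H_S)/H_T from expected heterozygosities, combined across loci as a ratio of averages; it ignores sample-size corrections (unlike, e.g., Weir-Cockerham or Hudson estimators). The population labels and allele frequencies are invented for illustration.

```python
from itertools import combinations


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style FST between two populations over biallelic loci.

    freqs_a, freqs_b: allele frequencies at the same loci.
    Per locus, H_S is the mean within-population expected heterozygosity and
    H_T the expected heterozygosity at the averaged frequency.
    Loci are combined as sum(H_T - H_S) / sum(H_T), a "ratio of averages".
    """
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def fst_matrix(pops):
    """Symmetric matrix (dict of dicts) of pairwise FST for named populations."""
    names = list(pops)
    mat = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        mat[a][b] = mat[b][a] = pairwise_fst(pops[a], pops[b])
    return mat


# Hypothetical allele frequencies at four loci for four populations.
pops = {
    "PopA": [0.10, 0.50, 0.80, 0.30],
    "PopB": [0.15, 0.45, 0.75, 0.35],
    "PopC": [0.60, 0.20, 0.30, 0.70],
    "PopD": [0.90, 0.10, 0.05, 0.85],
}
mat = fst_matrix(pops)
print(f"{'':>6}  " + "  ".join(f"{n:>6}" for n in pops))
for a in pops:
    print(f"{a:>6}  " + "  ".join(f"{mat[a][b]:6.3f}" for b in pops))
```

Four populations already produce a grid of numbers; with dozens of populations the same printout becomes the wall of detail without Gestalt described above.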

CATEGORIZED UNDER: Anthropology, Genomics

An illusion of neutrality and synonymous sites

By Razib Khan | July 11, 2013 7:42 am
Central Dogma

One of the elementary aspects of understanding genetics on a biophysical scale is to characterize the set of processes which span the chasm between the raw sequence information of base pairs (e.g. AGCGGTCGCAAG….) and the assorted macromolecules which are woven together to create the collection of tissues, and enable the physiological processes, which result in the organism. This suite of phenomena is encapsulated most succinctly in the often maligned Central Dogma of Molecular Biology. In short, the information of the DNA sequence is transcribed into RNA and translated into proteins, though for greater accuracy and precision one must always add caveats for phenomena such as splicing. The range of processes is so baroque that molecular genetics has become a massive enterprise, to a great extent superseding classical Mendelian genetics.

One critical structural detail from an evolutionary perspective is that the amino acids which are the building blocks of proteins are generally encoded by multiple nucleotide triplets, or codons. For example the amino acid glycine is “four-fold degenerate”: GGA, GGG, GGC, and GGU (in RNA, uracil, U, substitutes for the thymine, T, of DNA) all encode it. Notice that the variation is confined to the third position in the codon. Altering the first or second position would change the amino acid end product, and possibly perturb the function of the final protein (or in some cases disrupt translation altogether, e.g., by creating a premature stop codon). Changes at the third position here are synonymous substitutions because they don’t change the functional import of the sequence, as opposed to nonsynonymous changes (which may abolish or change function). In an evolutionary context one may presume that these synonymous substitutions are “silent.” Because natural selection operates upon heritable variation of a phenotype, and synonymous substitutions presumably do not change phenotype, it is often assumed that evolutionary change on these bases is selectively neutral. In contrast, nonsynonymous changes may be deleterious or beneficial (far more likely the former than the latter, because breaking contingent complexity is easier than creating new contingent complexity). Therefore the ratio of genetic change at nonsynonymous and synonymous sites across lineages has been a common measure of possible selection on a gene.
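
The degeneracy of glycine is easy to check against the standard genetic code. This sketch encodes the standard code (NCBI translation table 1) as a 64-character string and classifies a single-base change as synonymous, nonsynonymous, or nonsense; it works in the DNA alphabet, so U is converted to T. The function names are illustrative, not from any particular library.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """One-letter amino acid for a codon ('*' = stop); accepts RNA or DNA."""
    return CODE[codon.upper().replace("U", "T")]


def classify_change(codon, position, new_base):
    """Classify a point mutation at a 0-based position within a codon."""
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # premature stop codon: truncates translation
    return "nonsynonymous"


print([c for c, aa in CODE.items() if aa == "G"])  # the four glycine codons
print(classify_change("GGA", 2, "C"))  # third position: GGC is still glycine
print(classify_change("GGA", 0, "A"))  # first position: AGA encodes arginine
print(classify_change("GGA", 0, "U"))  # first position: UGA is a stop codon
```

Tallying, for every codon, what share of the possible point changes are synonymous is the first step in counting the synonymous and nonsynonymous sites by which that ratio is normalized.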

Genetic diversity and intellectual disability

By Razib Khan | July 5, 2013 2:17 am

Illustration of runs of homozygosity for affected and unaffected siblings
Credit: Intellectual Disability Is Associated with Increased Runs of Homozygosity in Simplex Autism

It is generally understood that inbreeding has some negative biological consequences for complex animals. Recessive diseases are the most straightforward. The rarer a recessive disease is, the higher the fraction of its sufferers who will be products of pairings between relatives (the reason for this is simple: extremely rare alleles which express in a deleterious fashion in homozygotes will be unlikely to come together in unrelated individuals). But when it comes to traits associated with inbred individuals, recessive diseases are not what comes to mind for most; the boy from the film Deliverance is usually the more gripping image (contrary to what some of the actors claimed, the young boy did not have any condition).
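
The logic can be made quantitative with the textbook result that a child with inbreeding coefficient F is homozygous for a recessive allele at frequency q with probability q² + F·p·q (where p = 1 - q), versus q² for a child of unrelated parents. This sketch assumes first-cousin parents (F = 1/16), treats all other births as non-inbred, and uses a hypothetical figure of 1% of births from cousin unions; the allele frequencies are likewise illustrative.

```python
def affected_risk(q, f):
    """Probability a child is homozygous for a recessive allele.

    q: allele frequency; f: the child's inbreeding coefficient.
    Equivalent to q^2 (1 - f) + q f.
    """
    return q * q + f * q * (1.0 - q)


def share_from_consanguineous(q, f=1 / 16, c=0.01):
    """Fraction of affected births that come from consanguineous unions.

    f = 1/16 corresponds to first-cousin parents; c is the assumed fraction
    of all births from such unions. Other births are treated as f = 0.
    """
    inbred = c * affected_risk(q, f)
    outbred = (1.0 - c) * affected_risk(q, 0.0)
    return inbred / (inbred + outbred)


for q in (0.1, 0.01, 0.001):  # hypothetical allele frequencies
    print(f"q={q}: {share_from_consanguineous(q):.1%} of affected have cousin parents")
```

As q shrinks, q² falls much faster than F·p·q, so cousin unions come to account for a growing share of affected children even though they are a small minority of all births.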

Some are curious about the consequences of inbreeding for a trait such as intelligence. The scientific literature here is somewhat muddled. But it seems likely that, all things being equal, if two people of average intelligence who are first cousins pair up, the I.Q. of their offspring will be expected to be 0-5 points lower than would otherwise be the case. By this, I mean that the studies you can find in the literature suggest, when correcting for other variables, that the inbreeding depression on the phenotypic level is greater than 0 (there is an effect) but less than 5 (it is not that large: I.Q. is scaled to a standard deviation of 15 points, so this is less than 1/3 of a standard deviation of the trait value). Presumably for higher levels of inbreeding the consequences are going to be more dire.

The four million year odyssey of the horse

By Razib Khan | June 28, 2013 7:31 pm

Credit: Ealgdyth

The horse is a beautiful animal.* That is not a trivial matter, but there is the added fact that historically it has been of great consequence. Obviously the rise of horses as vehicles of war is preeminent in our minds, but on a more prosaic level draft horses revolutionized many societies via their effect on agricultural productivity. Dogs may be man’s best friend, but horses are arguably** man’s most useful friend. Or at least they were. The critical importance of the horse is probably lost on modern people, but until the rise of the automobile horses were ubiquitous in many large cities (this is clear when you view early films). Today horses are perceived to be luxurious playthings (ergo, the term “horse country”), but during the heyday of the horse-powered world they served the roles of tractors, tanks, automobiles, and telegraphs.

These are just some of the reasons that horse genomics may be of more than passing curiosity for those who are not to the manor born. Horses are part of our history, and as a large charismatic mammal there is a particular interest in the origins of this lineage. This is part of the reason that a new paper in Nature, Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse, is important. But this is not the only reason that this paper is worthy of note. It extends the time frame of ancient DNA sequencing back by an order of magnitude, from ~50,000 years before the present to ~500,000 years before the present. Obviously that is a big leap, though it is not surprising that this DNA was retrieved from remains in Canada’s Yukon, where permafrost slows the degradation of DNA. Often mammalian ancient DNA breakthroughs, which entail the destruction of fossils, presage prehistoric analysis of our own lineage. But I am not quite sure that this will necessarily happen here (with the caveat that there is going to be a lot of ancient human DNA publication of more recent vintage over the next few years), as the expansion of Homo into the far north truly reached an extensive scope only with our particular lineage of sapiens sapiens.*** Nevertheless, this publication no doubt solidifies the new era in phylogenetics, where inferences of trees can be calibrated and checked against long extinct nodes and branches which had heretofore only been posited.

Yes Virginia, trans-ethnic inferences from GWAS are kosher

By Razib Khan | June 24, 2013 12:00 am

Razib’s daughter’s ancestry composition

An F1, r = 0.5 to Razib

Genome-wide association studies are rather simple in their methodological philosophy. You take cases (affected) and controls (unaffected) of the same genetic background (i.e. ethnically homogeneous) and look for alleles whose frequencies diverge greatly between the two pooled groups. Visually, the results are represented via Manhattan plots, on which the risk alleles, which exhibit higher odds ratios, stand out as peaks of statistical significance. But please note the clause: ethnically homogeneous study populations. In practice this means white Europeans, and to a lesser extent East Asians and African Americans (the last because the biomedical-industrial complex in the United States performs many GWAS, and the USA is a diverse nation). Looking within ethnic groups eliminates many false positives one might obtain due to population stratification. Basically, alleles which differ between groups because of their history may produce associations when the groups themselves differ in the propensity of the trait of interest (e.g. hypertension in blacks vs. whites).
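
Here is a toy illustration of that stratification problem. Two hypothetical populations differ in allele frequency (0.8 vs. 0.2) and in how many cases versus controls each contributes, but within each population cases and controls have identical allele frequencies, so there is no true association. Pooling them nonetheless yields a large allelic odds ratio. All counts are invented for the example.

```python
def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio for the alt allele from a 2x2 table of allele counts."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def allele_counts(n_people, alt_freq):
    """(alt, ref) allele counts for n diploid people at a given frequency."""
    n_alleles = 2 * n_people
    alt = round(n_alleles * alt_freq)
    return alt, n_alleles - alt


# Hypothetical strata: (cases, controls, alt allele frequency in both groups).
strata = {"population 1": (1000, 200, 0.8), "population 2": (200, 1000, 0.2)}

pooled = [0, 0, 0, 0]  # case_alt, case_ref, ctrl_alt, ctrl_ref
for name, (n_case, n_ctrl, freq) in strata.items():
    case_alt, case_ref = allele_counts(n_case, freq)
    ctrl_alt, ctrl_ref = allele_counts(n_ctrl, freq)
    table = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    print(f"{name}: OR = {allelic_odds_ratio(*table):.2f}")  # no association
    pooled = [x + y for x, y in zip(pooled, table)]

print(f"pooled: OR = {allelic_odds_ratio(*pooled):.2f}")  # spurious association
```

The pooled signal comes entirely from the fact that cases are drawn mostly from one population and controls mostly from the other, which is why the analysis is run within ethnically homogeneous samples.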

The Tuatara genome

By Razib Khan | June 17, 2013 2:26 am

Credit: Benchill

The whole “genome paper” genre is probably in decline now, as sequencing is so easy that there is little value in just throwing out data with no questions attached. That being said, I think the new project to sequence the Tuatara genome is pretty worthwhile. The reason is evident to the right, as this lineage represents an outgroup to many other reptiles. Not only that, but there is now a blog dedicated to the project. It’s nice to see science which aims to be out in the open. I wish the project the best of luck, and I’ll definitely be keeping an eye out for this particular “genome paper.”

MORE ABOUT: Tuatara, Tuatara genome

Genes not patentable

By Razib Khan | June 13, 2013 10:40 am

Supreme Court Rules Human Genes May Not Be Patented:

“A naturally occurring DNA segment is a product of nature and not patent eligible merely because it has been isolated,” Justice Clarence Thomas wrote for a unanimous court. But manipulating a gene to create something not found in nature is an invention eligible for patent protection.

The case concerned patents held by Myriad Genetics, a Utah company, on genes that correlate with increased risk of hereditary breast and ovarian cancer.

Believe it or not USA Today was out fast with a long story on this issue with quotes and everything. Here’s the full text of the decision. Needless to say this is a pretty big deal, and I’m somewhat surprised this was a unanimous decision. Perhaps the justices actually listen to scientists and their bleating sometimes?

I’ll be checking in to the Genomics Law Report regularly today….

Update: Just to note, several friends have noted that aspects of the science in the ruling seem to have some howlers. That is not surprising (see Scalia’s admission of ignorance in the concurrence). But from listening to the panel discussion on the Myriad case at ASHG 2012 this ruling is still a huge step forward. Being wrong is preferable to “not even wrong.”

CATEGORIZED UNDER: Genetics, Genomics
MORE ABOUT: Gene Patent, Myriad

Mother of all microsatellites

By Razib Khan | June 13, 2013 9:21 am

MDS of all samples

Noah Rosenberg’s lab has put out the mother of all microsatellite papers, Population Structure in a Comprehensive Genomic Data Set on Human Microsatellite Variation. It seems to me that this is the culmination of all the work with microsatellite markers which has come out of his lab over the past decade, applying all sorts of fancy analytic techniques they’ve developed (for example, Procrustes transformation). The big thing to note is that the human sample size is nearly 6,000 individuals with over 600 loci. Because microsatellites mutate and diverge very fast (mutation rates on the order of 10^-4 rather than 10^-8 as with SNPs), 600 loci is more than sufficient to differentiate populations. Because of this rapid mutation I’m a little dubious about their attempt to explore human-chimp differences using a smaller set ascertained on humans, though that may simply be a proof of principle (if the markers evolve too fast they might not tell you much that is informative about very deep divergences).
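
Procrustes analysis superimposes one configuration of points on another (for example, genetic MDS or PCA coordinates on geographic coordinates) using only translation, rotation, uniform scaling, and optionally reflection. This is a generic textbook version for two dimensions, not the Rosenberg lab's code: treating each 2-D point as a complex number turns the least-squares rotation plus scale into a single complex ratio. The example points are hypothetical.

```python
import cmath


def procrustes_2d(reference, target):
    """Superimpose `target` onto `reference` (equal-length lists of (x, y)).

    Uses translation, rotation, uniform scaling, and (if it fits better)
    reflection. Returns (aligned_points, d), where d is the residual sum of
    squares divided by the reference's centered sum of squares (0 = perfect).
    """
    if len(reference) != len(target) or len(reference) < 2:
        raise ValueError("need two equally sized sets of at least 2 points")
    x = [complex(a, b) for a, b in reference]
    y = [complex(a, b) for a, b in target]
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    xc = [v - x_mean for v in x]
    ss_x = sum(abs(v) ** 2 for v in xc)
    if ss_x == 0:
        raise ValueError("reference points must not all coincide")

    best = None
    for reflect in (False, True):  # PCA/MDS axes have arbitrary sign
        yc = [(v - y_mean).conjugate() if reflect else v - y_mean for v in y]
        ss_y = sum(abs(v) ** 2 for v in yc)
        if ss_y == 0:
            raise ValueError("target points must not all coincide")
        # Least-squares complex multiplier z = scale * exp(i * angle).
        z = sum(a * b.conjugate() for a, b in zip(xc, yc)) / ss_y
        fitted = [z * v for v in yc]
        d = sum(abs(a - b) ** 2 for a, b in zip(xc, fitted)) / ss_x
        if best is None or d < best[0]:
            best = (d, [((f + x_mean).real, (f + x_mean).imag) for f in fitted])
    return best[1], best[0]


# Hypothetical check: "genetic" coordinates that are the reference points
# reflected, rotated by 0.7 rad, scaled by 2.5, and shifted.
ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
tgt = []
for a, b in ref:
    w = complex(a, -b) * 2.5 * cmath.exp(0.7j) + complex(5, -3)
    tgt.append((w.real, w.imag))

aligned, d = procrustes_2d(ref, tgt)
print(f"d = {d:.2e}, similarity t = {(1 - d) ** 0.5:.4f}")
```

The quantity sqrt(1 - d) is a convenient similarity score: it is 1 when the two configurations match perfectly after superimposition and falls toward 0 as they diverge.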

Intelligence is still heritable

By Razib Khan | June 12, 2013 1:23 am

Sir Francis Galton

Modern evolutionary genetics owes its origins to a series of intellectual debates around the turn of the 20th century. Much of this is outlined in Will Provine’s The Origins of Theoretical Population Genetics, though a biography of Francis Galton will do just as well. In short, what happened is that during this period there were conflicts between the heirs of Charles Darwin as to the nature of inheritance (an issue Darwin left muddled from what I can tell). On the one side you had a young coterie around William Bateson, the champion of Gregor Mendel’s ideas about discrete and particulate inheritance via the abstraction of genes. Arrayed against them were the acolytes of Charles Darwin’s cousin Francis Galton, led by the mathematician Karl Pearson and the biologist Walter Weldon. This school of “biometricians” focused on continuous characteristics and Darwinian gradualism, and its members are arguably the forerunners of quantitative genetics. There is some irony in their espousal of a “Galtonian” view, because Galton was himself not without sympathy for a discrete model of inheritance!

William Bateson

In the end, science and truth won out. Young scholars trained in the biometric tradition repeatedly defected to the Mendelian camp (e.g. Charles Davenport). Eventually, R. A. Fisher, one of the founders of modern statistics and evolutionary biology, merged both traditions in his seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance. The intuition for why Mendelism does not undermine classical Darwinian theory is simple (granted, some of the original Mendelians did seem to believe that it was a violation!). Many discrete genes of moderate to small effect upon a trait can produce a continuous distribution via the central limit theorem. In fact, classical genetic methods often had difficulty perceiving traits with more than a half dozen significant loci as anything but quantitative and continuous (consider pigmentation, which we know through genomic methods to vary across populations mostly due to half a dozen segregating genes or so).
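
The central-limit intuition can be checked directly. Assume, hypothetically, n unlinked loci, each with two alleles at frequency 0.5 and equal additive effects, in Hardy-Weinberg proportions; a genotypic score is then the count of “+” alleles among 2n, which is binomially distributed. The sketch compares that distribution with a normal density of the same mean and variance. Equal effects and frequencies are simplifying assumptions, not features of any real trait.

```python
import math


def score_distribution(n_loci, p=0.5):
    """P(k '+' alleles) for k = 0..2n under additive, unlinked, HWE loci."""
    m = 2 * n_loci
    return [math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]


def max_gap_to_normal(n_loci, p=0.5):
    """Largest absolute gap between the binomial pmf and a normal density
    with the same mean and variance, evaluated at each integer score."""
    m = 2 * n_loci
    mean, var = m * p, m * p * (1 - p)
    dist = score_distribution(n_loci, p)
    return max(
        abs(prob - math.exp(-((k - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var))
        for k, prob in enumerate(dist)
    )


for n in (1, 3, 6, 30):
    print(f"{n:>2} loci: {2 * n + 1} genotypic classes, "
          f"max gap to normal = {max_gap_to_normal(n):.4f}")
```

One locus gives only three classes; by half a dozen loci the thirteen classes already sit close to a bell curve, and any environmental noise would blur the remaining steps into an apparently continuous trait.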

What is a population?

By Razib Khan | June 10, 2013 11:12 pm

Anyone who reads the genomic posts on this weblog with any interest must read Daniel Lawson’s fine review of the topic, which he has posted on arXiv: Populations in statistical genetic modelling and inference (via Haldane’s Sieve). Even if you don’t have a population genetic and genomic background, the gist is entirely accessible. If you do have that background but haven’t used packages such as STRUCTURE or EIGENSOFT yourself, I would recommend reading Lawson’s characterizations, as they are all spot on.

Also, if you have not, I recommend Lawson’s website for ChromoPainter and fineSTRUCTURE. The utility of these methods is outlined in the paper Inference of Population Structure using Dense Haplotype Data.

CATEGORIZED UNDER: Evolution, Genomics
MORE ABOUT: Evolution, Genomics

The genetic legacy of the conquistadors

By Razib Khan | June 9, 2013 9:07 pm

Christopher Columbus

A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy had to do with the fact that some individuals today claim to be Taino, and therefore that the Taino were not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which reconstructs the recent demographic history of the Caribbean, both maritime and mainland, using genomics. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP), the focus is on the populations highlighted in the map above.

Bay Area Population Genomics meeting at UCSF (Mission Bay)

By Razib Khan | June 1, 2013 5:31 am

Since the last post on genomic tools was a bit parochial, I figure it’s acceptable to put up this notice for the Bay Area Population Genomics meeting on June 8th. Registration closes on June 3rd (that is, Monday). Here’s the announcement:

Hello Everyone,

We are excited to be hosting the 8th meeting of the Bay Area Population Genomics group at UCSF Mission Bay on June 8th! Thanks to support from and the Institute for Quantitative Biosciences (QB3 @ UCSF), this conference will include breakfast and lunch. In addition, we will also have a reception during the poster session, so we highly encourage you to preview your work at BAPG before heading out to summer conferences.

Please register at, and sign up to give a talk or poster. Registration is again free, but required by June 3rd.

There is paid parking in the lot/garage at the corner of 4th and 16th streets, and we have a limited number of parking passes for people that sign up to present and/or make a strong effort to carpool (please email me for details).

We are very much looking forward to seeing you at UCSF in a few weeks!



WDIST, Plink, but faster?

By Razib Khan | June 1, 2013 1:07 am

Long time readers know that I spend a lot of time with Plink, developed by Shaun Purcell. That being said, even with the modest data sets I play with I’ve had to resort to writing shell scripts to perform various Plink manipulations serially and let them run overnight. Well, perhaps no more. Here’s the description of the WDIST genomic analysis toolset:

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!
