I should be careful about being flip on this issue. As recently as the mid-aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known with reasonable clarity, as far as these things go. The 50,000 foot perspective is this: we lost our fur millions of years ago, and developed dark skin, and many of us lost our pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).
I didn’t go that route…I’ve been writing for 10+ years now, and long time readers can probably attest to the fact that I’ve become more and more focused on genetics as time goes by. This is due to the reality that I really like genetics. Really. The friend with whom I was having the conversation about our various interests admitted she couldn’t even imagine an alternative universe version of me who would nerd out on neuroscience. That would be a bizarro-world Razib.
There has been a lot of attention to Erika Check Hayden’s piece Ethics: Taboo genetics, at least judging by people commenting on my Facebook feed. In some ways this is not an incredibly empirically grounded argument, because the biological basis of complex traits is going to be rather difficult to untangle on a gene-by-gene basis. In other words, this isn’t a clear and present “concern.” The heritability of many behavioral traits has long been known. This is not revolutionary, though for cultural reasons many well-educated people are totally surprised when confronted with data showing that many traits, such as intelligence and personality, have robust heritabilities* (the proportion of trait variation explained by variation in genes across the population). The literature reviewed in The Nurture Assumption makes clear that a surprising proportion of the contribution any parents make to their offspring is through their genetic composition, and not their modeled example. You wouldn’t know this if you read someone like Brian Palmer of Slate, who seems to be getting paid to reaffirm the biases of the current age among the smart set (pretty much every single one of his pieces that touch upon genetics is larded with phrases which could have been written by a software program designed to soothe the concerns of the cultural Zeitgeist). But the new genomics is confirming the broad outlines of the findings from behavior genetics. There’s nothing really to see there. The bigger issue of any interest is normative; the values we hold dear as a culture.
Some of the topics that I discuss in this space may seem abstruse, but really they’re often elaborations upon rather elementary models of the world. When it comes to a subject like evolutionary genetics, deep thinking extending from a few simple conceptual anchors yields great insight. Those anchors trace back to the foundations of Mendelian genetics. For diploid organisms the law of equal segregation states that of the two gene copies an organism has, either is equally likely to be contributed to its offspring. This explains the simple power of Punnett squares and the inheritance patterns of recessive traits. The law of independent assortment states that genes (and therefore implicitly Mendelian traits) are passed independently from each other from parents to offspring. These abstractions are concretized on the cytological and molecular genetic scale during meiosis, as homologous chromosomes, which are composed of packed sequences of genes, partition themselves into separate haploid gametes (sperm and egg*). Early in meiosis, during prophase 1, crossing-over between homologs results in genetic recombination, which preserves the law of independent assortment even when genes are on the same chromosome by breaking apart associations between specific physical genetic regions which might exhibit co-inherited distinctiveness (if the genes are very close they are linked).
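To make the law of equal segregation concrete, here is a minimal Python sketch of a single-locus Punnett square. The function name `punnett` and the two-letter genotype strings are my own illustrative choices, and the sketch assumes one locus with two alleles and nothing else going on (no linkage, no selection).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Return offspring genotype probabilities for a single-locus cross.

    Each parent is a two-character genotype string such as "Aa". Under
    equal segregation either copy is transmitted with probability 1/2,
    so every pairing of one copy from each parent is equally likely.
    """
    counts = Counter(
        "".join(sorted(a + b))  # sort so "aA" and "Aa" count as one genotype
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Cross two heterozygous carriers of a recessive allele "a".
for genotype, probability in sorted(punnett("Aa", "Aa").items()):
    print(genotype, probability)
```

For a cross between two carriers, the `aa` entry is the fraction of offspring expected to show the recessive trait.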
For many the image of evolutionary processes brings to mind something on a macro scale. Perhaps that of the changing nature of protean life on earth writ large, depicted on a broad canvas such as in David Attenborough’s majestic documentaries over millions of years and across geological scales. But one can also reduce the phenomenon to a finer-grain on a concrete level, as in specific DNA molecules. Or, transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.
Obviously evolutionary processes are not just fundamentally the flux of genetic elements, but genes are crucial to the phenomena in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it cannot be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being (Creationists depict evolution as a religion-like cult, which imparts the general flavor of some of the meta-science and philosophy which serves as intellectual subtext).
My friend Zack Ajmal has been running the Harappa Ancestry Project for several years now. This is a non-institutional complement to the genomic research which occurs in the academy. His motivation was in large part to fill in the gaps of population coverage within South Asia which one sees in the academic literature. Much of this is due to politics, as the government of India has traditionally been reluctant to allow sample collection (ergo, the HGDP data uses Pakistanis as their South Asian reference, while the HapMap collected DNA from Indian Americans in Houston). Of course this sort of project is not without its own blind spots. Zack must rely on public data sets to get a better picture of groups like tribal populations and Dalits, because they are so underrepresented in the Diaspora from which he draws many of the project participants.
Once Zack has the genotype, one of the primary things he does is add it to his broader data set (which includes many public samples) and analyze it with the Admixture model-based clustering package. What Admixture does is take a specified number of populations (e.g. K = 12) and assign each individual a proportion of ancestry from each of them. So, for example, individual A might be assigned 40% population 1 and 60% population 2 for K = 2. Individual B might be 45% population 1 and 55% population 2. These are not necessarily ‘real’ populations. Rather, the populations and their proportions are there to allow you to discern patterns of relationships across individuals.
Since Zack has put his results online, I thought it would be useful to review what patterns have emerged over the past two years, as his sample sizes for some regions are now moderately significant. Though he has K=16 populations, not all of them will concern us, because South Asians do not tend to exhibit many of the components. I will focus on seven: S Indian, Baloch, Caucasian, NE Euro, SE Asian, Siberian and NE Asian. These are not real populations, but the labels tell you in which region each component is modal. So, for example, the “S Indian” component peaks in southern India. The “Baloch” peaks among the Baloch people of southeastern Iran and southwest Pakistan. The “NE Euro” among the eastern Baltic peoples. The last three are Asian components, running the latitude from south to north to center. They only concern the first population of interest, Bengalis. I will combine these last three together as “Asian.”
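For those who want to see the bookkeeping, here is a small Python sketch of what folding the three Asian components into one looks like for a single individual. The percentages are invented for illustration and are not taken from Zack’s results, and `combine_asian` is a hypothetical helper, not part of Admixture.

```python
# Component labels follow the text above; the numbers are made up.
ASIAN_COMPONENTS = ("SE Asian", "Siberian", "NE Asian")


def combine_asian(proportions):
    """Fold the three Asian components into a single "Asian" entry.

    `proportions` maps component label -> percentage for one individual.
    """
    combined = {
        label: value
        for label, value in proportions.items()
        if label not in ASIAN_COMPONENTS
    }
    combined["Asian"] = sum(proportions.get(label, 0) for label in ASIAN_COMPONENTS)
    return combined


# A hypothetical individual whose percentages sum to 100.
person_a = {
    "S Indian": 45,
    "Baloch": 30,
    "Caucasian": 5,
    "NE Euro": 5,
    "SE Asian": 8,
    "Siberian": 2,
    "NE Asian": 5,
}

for label, percent in combine_asian(person_a).items():
    print(f"{label:10s} {percent:3d}%")
```

The total is unchanged by the merge; only the resolution of the Asian ancestry is reduced.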
Below is a table, mostly individuals from Zack’s results (though there are some aggregate results from public data sets). Comments below.
Thanks to the efforts of geneticists the story of the extinction of the Spanish Habsburgs is now well known. They are in short a case study in the disastrous consequences of an inbred pedigree. The downsides of inbreeding are to some extent intuitively understood by all, especially consanguineous relations between first order relatives. Though I’m willing to bet that all things equal inbred individuals are not as attractive or intelligent as outbred individuals, the literature in this area for humans is surprisingly thin. A major problem is controlling for confounds; all things are often not equal (e.g., imagine if inbreeding is more common in marginal isolated communities, which is often true in the West. See Consanguinity, Inbreeding, and Genetic Drift in Italy, where it is obvious that the less developed areas of Italy had elevated rates of marriage between relatives despite Catholic discouragement of the practice). But the case that inbreeding results in the expression of deleterious recessive diseases is more straightforward. The rarer the disease, the higher the proportion of individuals who are affected who are the consequence of inbreeding. This is due to the logical fact that very rare alleles tend not to come back together in homozygote form due to the character of the Hardy-Weinberg equilibrium. If the recessive trait is caused by a minor allele with a frequency of p, p² can converge upon zero very rapidly as p decreases in frequency. At p = 0.1 the recessive trait will express in 1% of the population (so p/p² = 10). At p = 0.01 the recessive trait will express in 0.01% of the population (so p/p² = 100). And so forth.
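The arithmetic above is easy to tabulate. Here is a short Python sketch, assuming random mating at Hardy-Weinberg equilibrium; the function name `recessive_expression` is just an illustrative label, and exact fractions are used so the ratios come out as whole numbers.

```python
from fractions import Fraction


def recessive_expression(p):
    """Return (homozygote frequency p^2, ratio p / p^2) for allele frequency p.

    Assumes Hardy-Weinberg proportions, so the recessive trait is
    expressed by the p^2 fraction of the population that is homozygous.
    """
    p = Fraction(p)
    homozygotes = p ** 2
    return homozygotes, p / homozygotes


for p in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    affected, ratio = recessive_expression(p)
    print(f"p = {float(p):g}: affected = {float(affected):.4%}, p/p^2 = {ratio}")
```

Each tenfold drop in allele frequency cuts the affected fraction a hundredfold, which is why the ratio grows as the allele becomes rarer.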
One of the elementary aspects of understanding genetics on a biophysical scale is to characterize the set of processes which span the chasm between the raw sequence information of base pairs (e.g. AGCGGTCGCAAG….) and the assorted macromolecules which are woven together to create the collection of tissues, and enable the physiological processes, which result in the organism. This suite of phenomena is encapsulated most succinctly in the often maligned Central Dogma of Molecular Biology. In short, the information of the DNA sequence is transcribed and translated into proteins, though for greater accuracy and precision one must always add the caveats of phenomena such as splicing. The range of processes is so baroque that molecular genetics has become a massive enterprise, to a great extent superseding classical Mendelian genetics.
One critical structural detail from an evolutionary perspective is that the amino acids which are the building blocks of proteins are generally encoded by multiple nucleotide triplets, or codons. For example the amino acid Glycine is “four-fold degenerate”: GGA, GGG, GGC, and GGU (in RNA Uracil, U, substitutes for the Thymine, T, of DNA) all encode it. Notice that the change is confined to the third position in the codon. Altering the first or second position would transform the amino acid end product, and possibly perturb the function of the final protein (or perhaps disrupt transcription altogether in some cases). Changes at the third position here are synonymous substitutions because they don’t change the functional import of the sequence, as opposed to nonsynonymous substitutions (which may abolish or change function). In an evolutionary context one may presume that these synonymous substitutions are “silent.” Because natural selection operates upon heritable variation of a phenotype, and synonymous substitutions presumably do not change phenotype, it is often assumed that evolutionary change on these bases is selectively neutral. In contrast, nonsynonymous changes may be deleterious or beneficial (far more likely the former than the latter because breaking contingent complexity is easier than creating new contingent complexity). Therefore the ratio of genetic change on nonsynonymous and synonymous bases across lineages has been a common measure of possible selection on a gene.
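As a rough illustration of the Glycine example, the Python sketch below builds the standard genetic code and sorts every single-base change to a codon into synonymous and nonsynonymous. The function name `classify_point_mutations` is my own, and this only counts possible changes; it is not a full dN/dS calculation, which would also need observed substitutions across lineages.

```python
# Standard genetic code in RNA letters; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutations(codon):
    """Split the nine single-base changes to `codon` into two lists.

    Returns (synonymous, nonsynonymous), where a change is synonymous
    if the mutant codon encodes the same amino acid as the original.
    """
    original = CODON_TABLE[codon]
    synonymous, nonsynonymous = [], []
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            if CODON_TABLE[mutant] == original:
                synonymous.append(mutant)
            else:
                nonsynonymous.append(mutant)
    return synonymous, nonsynonymous


syn, nonsyn = classify_point_mutations("GGA")
print("synonymous:", syn)
print("nonsynonymous:", [(m, CODON_TABLE[m]) for m in nonsyn])
```

For GGA every synonymous change falls at the third position, which matches the four-fold degeneracy described above.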
It is generally understood that inbreeding has some negative biological consequences for complex animals. Recessive diseases are the most straightforward. The rarer a recessive disease is, the higher the fraction of sufferers of that disease who will be products of pairings between relatives (the reason for this is straightforward, as extremely rare alleles which express in a deleterious fashion in homozygotes will be unlikely to come together in unrelated individuals). But when it comes to traits associated with inbred individuals, recessive diseases are not what comes to mind for most; the boy from the film Deliverance is usually the more gripping image (contrary to what some of the actors claimed, the young boy did not have any condition).
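To put a number on the claim about rare recessive diseases, here is a hedged Python sketch. It assumes the textbook result that an individual with inbreeding coefficient F is homozygous for an allele of frequency q with probability q² + Fq(1 − q), with F = 1/16 for the offspring of first cousins, and it assumes a population in which only a small fraction of unions are between first cousins and all others are between unrelated people. The function name and the 1% cousin-marriage rate are illustrative assumptions, not figures from any study.

```python
from fractions import Fraction

# Inbreeding coefficient for the offspring of first cousins.
F_FIRST_COUSINS = Fraction(1, 16)


def share_from_cousin_unions(q, cousin_fraction):
    """Fraction of affected individuals who are offspring of first cousins.

    q               -- frequency of the recessive disease allele
    cousin_fraction -- assumed fraction of births from first-cousin unions
    """
    q = Fraction(q)
    c = Fraction(cousin_fraction)
    risk_outbred = q ** 2
    risk_inbred = q ** 2 + F_FIRST_COUSINS * q * (1 - q)
    affected_inbred = c * risk_inbred
    affected_outbred = (1 - c) * risk_outbred
    return affected_inbred / (affected_inbred + affected_outbred)


# Assume 1% of births are to first cousins (an illustrative figure).
for q in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    share = share_from_cousin_unions(q, Fraction(1, 100))
    print(f"q = {float(q):g}: share of cases from cousin unions = {float(share):.1%}")
```

Under these assumptions the share of cases coming from cousin unions rises steadily as the allele gets rarer, even though cousin unions themselves stay at the same small fraction of births.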
Some are curious about the consequences of inbreeding for a trait such as intelligence. The scientific literature here is somewhat muddled. But it seems likely that all things equal if two people of average intelligence pair up and are first cousins the I.Q. of their offspring will be expected to be 0-5 points lower than would otherwise be the case. By this, I mean that the studies you can find in the literature suggest when correcting for other variables that the inbreeding depression on the phenotypic level is greater than 0 (there is an effect) but less than 5 (it is not that large, less than 1/3 of a standard deviation of the trait value). Presumably for higher levels of inbreeding the consequences are going to be more dire.
Modern evolutionary genetics owes its origins to a series of intellectual debates around the turn of the 20th century. Much of this is outlined in Will Provine’s The Origins of Theoretical Population Genetics, though a biography of Francis Galton will do just as well. In short what happened is that during this period there were conflicts between the heirs of Charles Darwin as to the nature of inheritance (an issue Darwin left muddled from what I can tell). On the one side you had a young coterie around William Bateson, the champion of Gregor Mendel’s ideas about discrete and particulate inheritance via the abstraction of genes. Arrayed against them were the acolytes of Charles Darwin’s cousin Francis Galton, led by the mathematician Karl Pearson, and the biologist Walter Weldon. This school of “biometricians” focused on continuous characteristics and Darwinian gradualism, and are arguably the forerunners of quantitative genetics. There is some irony in their espousal of a “Galtonian” view, because Galton was himself not without sympathy for a discrete model of inheritance!
In the end science and truth won out. Young scholars trained in the biometric tradition repeatedly defected to the Mendelian camp (e.g. Charles Davenport). Eventually, R. A. Fisher, one of the founders of modern statistics and evolutionary biology, merged both traditions in his seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance. The intuition for why Mendelism does not undermine classical Darwinian theory is simple (granted, some of the original Mendelians did seem to believe that it was a violation!). Many discrete genes of moderate to small effect upon a trait can produce a continuous distribution via the central limit theorem. In fact classical genetic methods often had difficulty perceiving traits with more than half a dozen significant loci as anything but quantitative and continuous (consider pigmentation, which we know through genomic methods to vary across populations mostly due to half a dozen segregating genes or so).
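The central limit intuition can be seen with very little machinery. The Python sketch below computes the exact distribution of a trait under a deliberately simple toy model of my own choosing: each locus has two alleles at frequency 1/2, every “plus” allele adds one unit to the trait, and there is no dominance, no interaction between loci, and no environmental noise. Real traits are messier on every one of those counts.

```python
from fractions import Fraction
from math import comb


def additive_trait_distribution(n_loci):
    """Exact distribution of the number of "plus" alleles an individual carries.

    With n_loci diploid loci and allele frequency 1/2 there are 2 * n_loci
    allele copies, so the count follows a Binomial(2 * n_loci, 1/2) law.
    Entry k of the returned list is the probability of carrying k copies.
    """
    copies = 2 * n_loci
    return [Fraction(comb(copies, k), 2 ** copies) for k in range(copies + 1)]


for n_loci in (1, 3, 6):
    distribution = additive_trait_distribution(n_loci)
    print(f"{n_loci} loci -> {len(distribution)} phenotype classes")
    for k, probability in enumerate(distribution):
        # Crude text histogram: bar length is proportional to probability.
        print(f"  {k:2d} {'#' * round(float(probability) * 60)}")
```

One locus gives the familiar three Mendelian classes, while even half a dozen loci give a bell-shaped spread of classes that would be hard to tell from a continuous trait once any measurement noise is added.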
The above figure is from a paper in PLoS GENETICS, Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration. The authors synthesize two diverse domains of human genomics. First, there are biomedically focused genome-wide association studies and their like which attempt to identify risk alleles for particular diseases. In some cases these risk alleles are very penetrant, in that a particular state predicts with high likelihood a disease phenotype. But in most cases the yield is elevated or decreased risks for highly complex traits such as type 2 diabetes. Second, there is the domain of evolutionary genomics which attempts to reconstruct a phylogenetic and population genetic history so as to frame contemporary patterns of variation in their proper context. How this might be important or of interest is obvious in the case of malaria resistance genes. Alleles conferring resistance have arisen in multiple populations due to parallel environmental pressures. Phylogenetic relationships between these populations should inform your predictions as to the likely similarities of the mutations between the populations. Meanwhile, population genetic theory can give you clues as to the likelihood of multiple adaptations.
The genetics and history of Tibet are fascinating to many. To be honest the primary reason here is elevation. The Tibetan plateau has served as a fortress for populations who have adapted biologically and culturally to the extreme conditions. Naturally this means that there has been a fair amount of population genetics on Tibetans, as hypoxia is a side effect of high altitude living which dramatically impacts fitness. I have discussed papers on this topic before. And I will probably talk more about it in the future, considering rumblings at ASHG 2012.
But to understand the character of the effect of natural selection on a population it is often very important to keep in mind the phylogenetic context. By this, I mean that evolutionary processes occur over history, and those historical events shape the course of subsequent phenomena. Concretely, to understand how the Tibetans came to be adapted to high altitudes one must understand who they are related to, and what their long term history is. There is a paper in Molecular Biology and Evolution which attempts to do just that, Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau:
A few years ago Malcolm Gladwell made the “10,000 hour rule” famous in his book Outliers. In practice (e.g., discussions with people day to day or on this blog) the rule gets translated into the inference “practice is what matters.” When talking about genetics this often implicitly also entails that “genes don’t matter.” I’m not saying that this is necessarily what Gladwell’s own exposition taken literally would suggest, but ideas have a way of evolving once they’re outside of the pages of a book.
My own response is that this sort of rhetorical device is silly. In domains of virtuosity the intersection of innate talent and conscientiousness is often critical. That’s because for outstanding excellence gains on the extreme margin of performance are critical. There are many born with talent, and those who hone and refine that talent will have an edge over those who do not exhibit the same work ethic. But the converse is that there are those born without talent for whom 10,000 hours of invested effort is lunacy.
Kevin Mitchell of Wiring the Brain has a very long post up inveighing against the specter of eugenics. I don’t have a great deal of time to engage Kevin right now.* But in addition to Kevin’s post I highly recommend this episode of WBUR’s On Point. It has Steve Hsu on, and he articulates many of the positions that I myself hold. Steve’s work with BGI has triggered the latest discussion of eugenics thanks to Vice’s sensational representation of the research project and its aims. But it’s a useful discussion to engage in, even if the starting point is a little unfortunate.
I will state, though, that Kevin’s argument seems to be predicated on the implicit assumption that his interlocutors hold to some sort of Platonic ideal of the most-perfect-human. There’s no such thing, obviously, and even those who sympathized with eugenic policies, such as W. D. Hamilton, rejected this notion at the end of the day. Rather, human traits are evaluated in terms of how they serve the flourishing of individuals and society according to understood values. Intelligence is generally assumed to benefit individuals, and I believe that it benefits society as well through innovation. Innovation drives the productivity growth which is the foundation of our post-Malthusian age.
The genetics of schizophrenia is a fertile if fraught topic. But I won’t be discussing that in this post. Rather, I want to put the spotlight on a peculiarly contradictory and illogical tendency in the contemporary American Zeitgeist: the gene is all-powerful, and the gene is irrelevant. The same people who raise eyebrows with skepticism about the heritability of endophenotypes nevertheless seem to believe that when it comes to the domain of disease genes are perfectly and frighteningly predictive! I know scientifically educated people who have expressed to me their confidence in the power of nurture, as opposed to nature, in the determination of the character of their potential offspring. And yet these same individuals may express serious worry that genetic testing might render the whole field of health insurance null and void. The problem with this perspective is that it is a robust behavior genetic finding that many traits have substantial heritable components. That is, the correlation between parent and offspring in a trait (e.g., personality) is not simply a function of environmental input. Similarly, for many diseases which have a biological basis the predictive value of a given set of genes, or even family history, is often imperfect. Biological development has a strong random component, which we can’t predict or control. This is true for environmental inputs as well; there are people who have never smoked who die of lung cancer.
I was reviewing some literature for a blog-post-to-come and I noticed a figure in a paper I’ve long been aware of which indicates to me that Afrikaners surely have a non-trivial proportion of non-European ancestry. The paper is Population differences of two coding SNPs in pigmentation-related genes SLC24A5 and SLC45A2. It’s a forensics result. Basically SLC24A5 is useful for differentiating West Eurasians from Africans and East Asians, Amerindians, and Oceanians. But it is not too useful in distinguishing between West Eurasians. The “European” derived variant SNP within this locus is actually present at ~50% frequency as far south and east as India. In contrast, the “European” derived variant of SLC45A2 decreases much more rapidly outside of Europe, so it is a more plausible European-diagnostic-marker.
The figure below illustrates the results from the paper:
Bears are a big deal today. I’ve talked about this before, so I won’t belabor the point in this post. Rather, I want to persuade you that there’s a really interesting paper out in PLOS Genetics right now, Genomic Evidence for Island Population Conversion Resolves Conflicting Theories of Polar Bear Evolution. I know that seems like a mouthful, and despite the fact that I nodded to the reality that this is highly relevant in part because of policy concerns, the paper itself makes salient the reality that oftentimes we are confronted with the juxtaposition between useful abstractions and the empirical shape of the world. In this case the abstraction is that of species, the one taxonomic category which many people find to be a natural kind, so to speak. These sorts of confusions of our expectations are often highly informative. They illustrate the limits of our abstractions, and drive us toward more complex and/or elegant formalisms which are capable of modeling nature as it is, rather than as we wish it would be.
This is a follow up to my post from yesterday. In case you care about the technical details (after I clean this stuff up I will put it on GitHub) I’m using R’s adehabitat package to create a 95% distribution curve after smoothing with kernel density. The goal is to give you a better intuition about where the populations are dispersed across two dimensional visualizations of genetic variation.
Thinking about how to plot text, I came up with a quick hack, which just used the initial data and found the median x and y position. That explains why some of the labels are shifted: in populations with a huge range the label position is going to be sensitive to not being smoothed (if you know how to pull the centroid out of the kver, tell!). I’ve given them colors and also used black. The latter actually seems to be clearer!
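For the curious, the label hack amounts to something like the following. This is a Python sketch of the idea rather than the R code I actually ran, the points are invented, and the plain mean shown for comparison is the centroid of the raw points, not the centroid of the smoothed 95% contour.

```python
from statistics import mean, median


def label_position(points):
    """Place a population label at the median x and median y of its points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return median(xs), median(ys)


def raw_centroid(points):
    """Mean x and mean y of the raw points, shown only for comparison."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return mean(xs), mean(ys)


# Hypothetical coordinates for one population, with a single far-flung individual.
population = [(0, 0), (1, 1), (2, 0), (1, 2), (10, 9)]

print("median label position:", label_position(population))
print("raw centroid:", raw_centroid(population))
```

The median stays with the bulk of the points while the mean is dragged toward the outlier, but neither one knows anything about the shape of the smoothed contour, which is why labels for widely dispersed populations can land in odd places.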
Note: This is not just for fun, as I plan to start rolling out results and methods from some of the data sets I have more regularly in the near future.