I noticed during Peter Ralph and Graham Coop’s Ask Me Anything about their new paper, The Geography of Recent Genetic Ancestry across Europe, someone brought up the effects of plague. Recall that ~1/3 of Europe’s population died during the Black Death. And population size reductions on the order of ~50% due to epidemics are not unknown in human history. Surely this would have a major genetic effect? Well, in fact it would have a genetic effect due to possible adaptations to disease (see CCR5). But there would be little overall impact on genetic diversity, at least in the short term. That is because for bottlenecks to produce major change in the genetic character of a population they have to be rather extreme in magnitude.
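To make the point about magnitude concrete, here is a minimal sketch under textbook assumptions: an idealized, randomly mating diploid population with neutral loci, where expected heterozygosity decays by a factor of (1 - 1/(2N)) per generation. The function name and the example sizes are illustrative choices, not anything taken from the paper.

```python
def heterozygosity_retained(effective_size: int, generations: int = 1) -> float:
    """Expected fraction of heterozygosity remaining after drift.

    Uses the standard neutral expectation H_t / H_0 = (1 - 1/(2*N_e)) ** t,
    where N_e is the effective population size during the bottleneck and
    t is the number of generations spent at that size.
    """
    if effective_size < 1 or generations < 0:
        raise ValueError("effective_size must be >= 1 and generations >= 0")
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Compare a large post-epidemic population with progressively tighter bottlenecks.
for n_e in (1_000_000, 10_000, 100, 10):
    print(n_e, round(heterozygosity_retained(n_e, generations=10), 5))
```

What matters in this model is the absolute number of survivors and how long the population stays small, not the proportion that died. Halving a population of millions barely moves the expectation, while ten generations at a size of ten removes roughly 40% of the heterozygosity.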
This issue came to mind for me in 2009 when I watched Star Trek. If you haven’t watched the J. J. Abrams reboot, and are a spoilerphobe, read no more! Now, with that out of the way, you may recall that during this film the Vulcans suffered a genocidal attack. Out of billions of Vulcans only ~10,000 survived. Here’s some commentary on the possible consequences, New Star Trek Movie: A Vulcan Holocaust?:
My own inclination has been to not get bogged down in the latest race and IQ controversy because I don’t have that much time, and the core readership here is probably not going to get any new information from me, since this is not an area of hot novel research. But that doesn’t mean the rest of the world isn’t talking, and I think perhaps it might be useful for people if I stepped a bit into this discussion between Andrew Sullivan and Ta-Nehisi Coates specifically. My primary concern is that here we have two literary intellectuals arguing about a complex topic which spans the humanities and the sciences. Ta-Nehisi, as one who studies history, feels confident that he can dismiss the utility of racial population structure categorization because as he says, “no coherent, fixed definition of race actually exists.” I am actually more of a history guy than a math guy, not because I love history more than math, but because I am not very good at math. And I’ve even read books such as The Rise and Fall of the Caucasian Race and The History of White People (as well as biographies of older racial theorists, such as Madison Grant). So I am not entirely ignorant of Ta-Nehisi’s bailiwick, but, I think it would be prudent for the hoarders of old texts to become a touch more familiar with the crisp formalities of the natural sciences.
Because of Angelina Jolie’s revelation, the Myriad Genetics case is in the news again. If you don’t know what I’m talking about, look it up. Because of the patent Myriad can charge thousands of dollars for a test which would otherwise be much cheaper (putting it out of reach of many without health insurance). My question here is simple: if you are a geneticist do you think Myriad’s position has any validity? The reason I ask is that I know many geneticists, and I know many geneticists read me, and I follow many geneticists on Twitter, but I’ve never encountered one who would be willing to defend Myriad’s position as plausible and passing the smell test. If you are one of those geneticists please leave a comment, because I’m honestly curious.
I went to the talks about the Myriad case at ASHG, and I have to say it was all law, and no science. The science was confused and laughable. The panelists themselves rolled their eyes and expressed resignation as to the garbled ratiocinations of the judges who reviewed the case. There is a classic “two cultures” problem.
A few years back I was rather fixated on issues of maternal-fetal health. In particular I was worried about gestational diabetes in relation to my wife because I come from an ethnic group with an elevated risk for these sorts of problems, and the effect when you are in mixed-race marriages seems to be additive (i.e., unlike some risk factors associated with pregnancies the mother’s ethnicity is not the only relevant variable). This is embedded in the broader suite of metabolic diseases which exhibit ethnic variation. Early work on genome-wide selection in humans yielded the result that there was a strong enrichment for signals of adaptation within regions of the genome associated with metabolism, so this should not be that surprising. Humans are a geographically dispersed species that inhabits a wide range of environments, so natural selection would shape the distribution of phenotypes within populations if evolution is a significant historical process (it is).
A paper in last month’s Trends in Genetics highlights more precisely how natural selection would operate in a life history context in specific cases. Many ways to die, one way to arrive: how selection acts through pregnancy:
When considering selective forces shaping human evolution, the importance of pregnancy to fitness should not be underestimated. Although specific mortality factors may only impact upon a fraction of the population, birth is a funnel through which all individuals must pass. Human pregnancy places exceptional energetic, physical, and immunological demands on the mother to accommodate the needs of the fetus, making the woman more vulnerable during this time-period. Here, we examine how metabolic imbalances, infectious diseases, oxygen deficiency, and nutrient levels in pregnancy can exert selective pressures on women and their unborn offspring. Numerous candidate genes under selection are being revealed by next-generation sequencing, providing the opportunity to study further the relationship between selection and pregnancy. This relationship is important to consider to gain insight into recent human adaptations to unique diets and environments worldwide.
Yesterday I pointed to a paper which was interesting enough, but didn’t pass the smell test in relation to other evidence we have (at least in my opinion!). A primary concern was the fact that uniparental markers (male and female lineages) show a peculiar distribution of variation in comparison to autosomal genetic variation (i.e., the vast majority of the genome) in the case of Europe (genome-wide analyses suggest more of Europe’s variation is partitioned north-south, but Y and mtDNA results often imply an east-west split). But a secondary concern I had was that I felt the models were a bit too stylized. In particular, following Cavalli-Sforza and Ammerman, the authors concluded that demic diffusion better fits their results of genetic variation in Europe (as opposed to continuity of Paleolithic hunter-gatherers). This is likely correct, but these are not the only two models.
A paper out in Nature Communications, using analysis of the phylogenetics of whole ancient mitochondrial genomes, outlines my primary concern when it comes to the models being tested, Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans:
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
Update: First, people coming to this weblog for the first time should know that I moderate comments. So if you leave an obnoxious one it’s basically like an email to me (no one will see it). Second, the correlation between height and intelligence is not that high. This association is probably not going to be intuitively visible to anyone, but rather only shows up in large data sets. So please stop offering yourself as a counter-example of the trend (also, the key is to look within families, because the signal here is going to be swamped by other factors when you compare across populations). Third, a friend has sent me another paper which does confirm that even within sibling cohorts there does seem to be a correlation between height and I.Q. The problem is that it is a very small one, so you need large data sets with a lot of power to see it.
One moderately interesting social science finding is that there is a positive correlation between height and measured intelligence (e.g., on an I.Q. test). Setting aside the possibility that I.Q. test designs are culturally biased against shorter people, one wonders why this is so. Height is a highly heritable trait where most of the variation within the population is due to variation at numerous genes. In other words, there isn’t a “tall” or “short” gene, but thousands and thousands of variants which shape the variation of the trait across the population. When I say it is highly heritable, I mean to imply that most of the variation in height in developed societies is due to genes (80-90%). As it happens intelligence is somewhat similar in its genetic architecture, heritable due to small effects across many genes. In general estimates for the heritability of intelligence tend to be somewhat lower, on the order of ~50% rather than 80-90%.
It is due to the highly polygenic nature that both of these traits have been posited as candidates for a “good genes” model of sexual selection. Presumably individuals with a higher mutational load will have lower intelligence and be shorter, all things equal, because these traits have extensive genome-wide coverage and are big targets. Geoffrey Miller’s The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature, was predicated on this logic. If the mutational load argument holds then the reduced I.Q. of shorter individuals may simply be due to the same cause: “bad genes.”
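The logic of a shared cause can be sketched with a toy simulation. Everything here is an assumption made for illustration: a single standardized "load" score per person, equal effects of that load on both traits, independent noise for everything else, and the function names themselves. It is not a model of real height or I.Q. data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_shared_load(n_people=20000, load_effect=0.5, seed=1):
    """Correlation between two traits that are both depressed by one load score.

    Each trait = -load_effect * load + independent noise (hypothetical toy model).
    The expected correlation is load_effect**2 / (load_effect**2 + 1).
    """
    rng = random.Random(seed)
    heights, scores = [], []
    for _ in range(n_people):
        load = rng.gauss(0.0, 1.0)  # standardized mutational load (assumed)
        heights.append(-load_effect * load + rng.gauss(0.0, 1.0))
        scores.append(-load_effect * load + rng.gauss(0.0, 1.0))
    return pearson(heights, scores)


print(round(simulate_shared_load(), 3))                 # shared cause present
print(round(simulate_shared_load(load_effect=0.0), 3))  # no shared cause
```

Even when the load explains only a modest slice of each trait, the two traits end up weakly but positively correlated. That is the kind of signal that only shows up in large data sets.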
I have put 1 million markers (from a combination of Illumina SNP-chips) of mine online. I’m also going to put my sequence online when I get it done. Why? What do I gain from this? Hopefully I don’t gain anything from it. By this, I mean that the only major information that is actionable in a life altering sense is likely to be disease related. Though I’ve been contacted about possible loss of function mutations through imputation, so far my genotype has not illuminated any more risk susceptibilities. Rather, I am trying to make it clear by my openness that your genetic information has more power when pooled together with that of others, and one small step in creating that vast pool of information is demystifying the sharing of it, and practicing what you (that is, me) preach. My soul is not in my genes, and certainly my genotype reflects me with far less obvious fidelity than a photograph would. By this, I mean that there are many traits that one could predict about me, but many one would be at a loss to predict.
Every now and then Richard Dawkins stirs controversy by bringing up the topic of eugenics. This is not surprising in terms of Dawkins’ intellectual pedigree. The most influential British evolutionary biologist in the generation before Dawkins, R. A. Fisher, was a eugenicist. Arguably the most eminent evolutionist of Dawkins’ own generation, W. D. Hamilton, clearly had eugenical sympathies, though he was keenly aware how unfashionable that had become.* University College London’s Galton Laboratory still had the word eugenics in its title until 1965. More recently Dawkins has brought up the issue of consanguinity amongst the British Pakistani community, a practice which one might argue is non-eugenical due to the high rate of recessive diseases.
One of the pitfalls of talking about genetics, especially human genetics, is that the public wants a specific gene for a specific trait. Ergo, the “God gene” or the “language gene.” In some cases science has been able to pull a rabbit out of the hat, and offer up a gene for a trait. But in most of those instances these are going to be single gene recessive diseases. Not exactly what the doctor ordered. In other cases the association seems trivial. For example, wet or dry earwax?* What people are truly interested in is the genetic basis of complex traits, such as intelligence, personality, and height. Unfortunately complex traits often have a complex genetic basis. A trait such as height, which is highly heritable (i.e., most of the variation in the population is due to variation in genes), turns out to be subject to the control of innumerable genes, each of which has a small impact on the value of the final trait. Then there is the possibility that the heritability is tied up in interaction effects across genes.
A paper on the genetics of the Roma (“Gypsies”), Reconstructing Roma History from Genome-Wide Data, has finally come out in a journal. It’s been on arXiv for a while, so nothing too surprising. But, reading through the paper I have to note one rather clear aspect for me: there is a crispness and detail to the way they outlined and integrated their methods into the results section. Unfortunately there is an obvious tendency in the pressure to publish for people to use methods and tools (which usually consist of software written by others which you use in a blackbox fashion) in a slapdash manner with an aim toward arriving at a publishable unit. Because of the specialization within science it seems one can entirely make it through peer review by using methods which signal that one does not really know what one is talking about. To give a concrete example, a year ago I was told about a phylogenetic package in moderate usage which seems to basically be a “random number generator.” The fact that this package is used is a testament to the fact that many researchers who are not phylogeneticists simply reach for the nearest method at hand, and trust the results if they make some intuitive sense (presumably in this case they would simply report the results which were intelligible).
The ultimate future, I’m hoping, is for open data, open code, and open methods. When a shady or sketchy paper makes it through peer review there is now visible public anger which bubbles out of the scientific community, but the process of reproducing the results can still be tedious (see Arsenic life). This is less true in cases where the means are more computational. The only things stopping the process of science from operating more efficiently are human barriers (e.g., cultural norms, institutional barriers toward data release).
Bears are a big deal today. I’ve talked about this before, so I won’t belabor the point in this post. Rather, I want to persuade you that there’s a really interesting paper out in PLOS Genetics right now, Genomic Evidence for Island Population Conversion Resolves Conflicting Theories of Polar Bear Evolution. I know that seems like a mouthful, and despite the fact that I nodded to the reality that this is highly relevant in part because of policy concerns, the paper itself makes salient the reality that oftentimes we are confronted with the juxtaposition between useful abstractions and the empirical shape of the world. In this case the abstraction is that of species, the one taxonomic category which many people find to be a natural kind, so to speak. These sorts of confusions of our expectations are often highly informative. They illustrate the limits of our abstractions, and drive us toward more complex and/or elegant formalisms which are capable of modeling nature as it is, rather than as we wish it would be.
Over at IEEE Spectrum Eliza Strickland has a long piece, The Gene Machine and Me, which reports on her experience with exome sequencing (this refers to the ~1 percent of the genome, or 30 million base pairs, which is coding). Being IEEE Spectrum there is much focus on genomic technology, but I suspect this will have as much interest 10-20 years down the line as varieties of combustion engine circa 1900 are to us. In other words, it will be standard technology which we use, not a novel technique which is of interest to non-specialists.
Last month I noted that a paper on speculative inferences as to the phylogenetic origins of Australian Aborigines was hampered in its force of conclusions by the fact that the authors didn’t release the data to the public (more accurately, peers). There are likely political reasons for this in regards to Australian Aborigine data sets, so I don’t begrudge them this (Well, at least too much. I’d probably accept the result more myself if I could test drive the data set, but I doubt they could control the fact that the data had to be private). This is why when a new paper on a novel phylogenetic inference comes out I immediately control-f to see if they released their data. In regards to genome-wide association studies on medical population panels I can somewhat understand the need for closed data (even though anonymization obviates much of this), but I don’t see this rationale as relevant at all for phylogenetic data (if concerned one can remove particular functional SNPs).
I have very little with which I can disagree in this Mark Thomas piece, To claim someone has ‘Viking ancestors’ is no better than astrology. His conclusion:
Exaggerated claims from the consumer ancestry industry can also undermine the results of serious research about human genetic history, which is cautiously and slowly building up a clearer picture of the human past for all of us.
Many of the commercial companies plant stories in the media that sound exciting and seem scientific. But very often they are trivial or wrong, are not published in peer-reviewed scientific journals, and just serve as disguised PR for the company.
The only caveat I would offer is that the sort of confusions and misrepresentations that occur with Y and mtDNA phylogeography are dampened when you are looking at a million markers throughout the whole genome. This does not mean that no confusions and misrepresentations remain (e.g., the reference populations matter a great deal when you present someone as a linear combination of X populations, and that summary is still not reality as such, but an informative model). One alarming aspect of the trade in Y and mtDNA is that I’ve met several people who somehow believe that only these lineages are ancestrally informative. That is probably a function of the ease with which you can say someone is “descended from Niall of the Nine Hostages.”
Addendum: I actually asked Jim Wilson on Twitter if I could get a look at the raw results (not even raw data) for the claims made. One major problem when scientists have a go-to-media-first strategy is that things get out of hand very quickly.
Last summer Neuroskeptic posted on The Coming Age of Fetal Genomics. It seems likely to me that this “age” won’t be ushered in with a bang, but we’ll be there before we know it. After all, most people aren’t thinking about having children at any given moment, and don’t track biomedical advances in genetic disease screening until they’re crossing that bridge. Over at Xconomy Luke Timmerman has a post up, Natera Joins Quest in Four-Way Battle for Prenatal Genetic Tests. Here are some important details:
Yesterday I re-ran Plink with a narrower European-biased data set, and generated some MDS plots. I only had a few Asian and African populations, mostly so that I could replicate the standard dimensions 1 and 2, producing the classic “v-shape” which you’ve seen before. But what’s more interesting are lower coordinates. They may not capture as much of the variation in the distance matrix, but illustrate important dynamics. I haven’t used the directlabels package yet, so right now the labels are still imperfect. I’m giving black text as well as colored text. Also, here’s the original data (as in MDS results, not the raw data).
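For readers wondering what goes into the distance matrix behind an MDS plot, here is a minimal sketch of an identity-by-state distance between individuals. It assumes genotypes coded as minor-allele counts (0, 1 or 2) with no missing data. The function names and the toy genotypes are made up for illustration, and this shows the general idea rather than a reproduction of Plink's exact computation.

```python
def ibs_similarity(genotype_a, genotype_b):
    """Proportion of alleles two individuals share identical by state.

    Genotypes are sequences of minor-allele counts (0, 1 or 2) at the same
    markers. At each marker the pair shares 2 - |a - b| alleles out of 2.
    """
    if len(genotype_a) != len(genotype_b) or not genotype_a:
        raise ValueError("genotypes must be non-empty and the same length")
    shared = sum(2 - abs(a - b) for a, b in zip(genotype_a, genotype_b))
    return shared / (2 * len(genotype_a))


def ibs_distance_matrix(genotypes):
    """Pairwise distances (1 - IBS similarity) keyed by (name, name)."""
    names = list(genotypes)
    return {
        (a, b): 1.0 - ibs_similarity(genotypes[a], genotypes[b])
        for a in names
        for b in names
    }


# Hypothetical toy data: three individuals typed at four markers.
toy = {"ind1": [0, 1, 2, 2], "ind2": [1, 1, 0, 2], "ind3": [0, 1, 2, 2]}
distances = ibs_distance_matrix(toy)
print(distances[("ind1", "ind2")], distances[("ind1", "ind3")])
```

Multidimensional scaling then looks for coordinates whose pairwise distances approximate this matrix. The first couple of coordinates capture the most variation, and the lower ones pick up finer structure.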
This is a follow up to my post from yesterday. In case you care about the technical details (after I clean this stuff up I will put it on GitHub) I’m using R’s adehabitat package to create a 95% distribution curve after smoothing with kernel density. The goal is to give you a better intuition about where the populations are dispersed across two dimensional visualizations of genetic variation.
Thinking about how to plot text, I came up with a quick hack, which just used the initial data and found the median x and y position. That explains why some of the labels are shifted so; in populations with a huge range the label position is going to be sensitive to not being smoothed (if you know how to pull the centroid out of the kver, tell!). I’ve given them colors and also used black. The latter actually seems to be clearer!
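The label hack is easy to sketch outside of R. The version below is an illustration in Python, not the code used for these plots: it takes raw (population, x, y) points and returns one label position per population, using either the median of each coordinate or the plain mean of the raw points as an alternative. The function name and the toy points are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, median


def label_positions(points, how="median"):
    """Return {population: (x, y)} label anchors from raw plot coordinates.

    `points` is an iterable of (population, x, y) tuples. With how="median"
    each coordinate is summarized independently by its median; with
    how="mean" the arithmetic mean of the raw points is used instead.
    """
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")
    xs, ys = defaultdict(list), defaultdict(list)
    for population, x, y in points:
        xs[population].append(x)
        ys[population].append(y)
    centre = median if how == "median" else fmean
    return {pop: (centre(xs[pop]), centre(ys[pop])) for pop in xs}


# Toy example: population "A" has one far-flung individual.
toy_points = [("A", 0, 0), ("A", 1, 10), ("A", 100, 2), ("B", 5, 5)]
print(label_positions(toy_points))
print(label_positions(toy_points, how="mean"))
```

The median is taken on each axis separately, so the resulting point need not sit near any actual individual. In a widely dispersed population it can land away from the smoothed contour, which matches the shifted labels described above.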
Note: This is not just for fun, as I plan to start rolling out results and methods from some of the data sets I have more regularly in the near future.
I’ve been thinking about how best to visualize PCA/MDS type of results, which allow for the two dimensional representation of genetic variation. Below are a few of my efforts with a data set I have. You can see the individuals in gray, but also ellipses which cover ~95% of the distribution of a given population.
Please click the images for a larger version. They represent coordinate 1 on the y axis and 2 on the z axis, derived from a multidimensional scaling representing identity by state across individuals.
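One common way to get an ellipse covering roughly 95% of a population is sketched below. It assumes the points in each population are approximately bivariate normal, which is a modeling assumption and not necessarily how the ellipses in these plots were produced. The function name and return format are invented for the example.

```python
import math


def covariance_ellipse(xs, ys, coverage=0.95):
    """Centre, semi-axes and orientation of a normal-theory coverage ellipse.

    Assumes the points are roughly bivariate normal. The squared Mahalanobis
    radius for a given coverage is the chi-square quantile with 2 degrees of
    freedom, which has the closed form -2 * ln(1 - coverage).
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 points and equal-length inputs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of the 2x2 sample covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    largest, smallest = half_trace + root, max(half_trace - root, 0.0)

    scale = -2.0 * math.log(1.0 - coverage)
    return {
        "centre": (mean_x, mean_y),
        "semi_major": math.sqrt(scale * largest),
        "semi_minor": math.sqrt(scale * smallest),
        # Angle of the major axis from the x axis, in radians.
        "angle": 0.5 * math.atan2(2.0 * sxy, sxx - syy),
    }


ellipse = covariance_ellipse([2, -2, 0, 0], [0, 0, 1, -1])
print(ellipse)
```

For populations with a long tail or several clusters the normal assumption breaks down. That is one reason to prefer a kernel density contour over a single ellipse.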
My daughter has four grandparents. Genetically she is a little over 25 percent her paternal grandfather and maternal grandmother, and a little under 25 percent her maternal grandfather and paternal grandmother.* Why? Because she is 50 percent genetically identical by descent with her mother and likewise with her father. This is all rather straightforward. But what about culturally?
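The "a little over" and "a little under" can be illustrated with a deliberately crude toy model. Assume each parent transmits some number of independently inherited genome blocks, each block equally likely to come from that parent's mother or father. The number of blocks is an assumed parameter (real recombination is messier), and the function names are hypothetical.

```python
import math
import random


def grandparent_share_sd(n_blocks: int) -> float:
    """Standard deviation of the genome fraction from one grandparent.

    Toy model: the child gets n_blocks from each parent (2 * n_blocks total),
    and each parental block comes from a given grandparent with probability
    1/2. The share is Binomial(n_blocks, 1/2) / (2 * n_blocks), so its
    standard deviation is 1 / (4 * sqrt(n_blocks)).
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be >= 1")
    return 1.0 / (4.0 * math.sqrt(n_blocks))


def simulate_paternal_side(n_blocks: int, rng: random.Random):
    """Return (paternal grandfather share, paternal grandmother share)."""
    from_grandfather = sum(rng.random() < 0.5 for _ in range(n_blocks))
    grandfather_share = from_grandfather / (2 * n_blocks)
    return grandfather_share, 0.5 - grandfather_share


rng = random.Random(0)
print(grandparent_share_sd(64))
print(simulate_paternal_side(64, rng))
```

The two grandparents on the same side always sum to exactly 50 percent in this model, so when one is a little over 25 percent the other must be a little under by the same amount.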
With biological heredity we can speak of genes, the substrate by which inheritance occurs. With culture, memes have been far less fruitful as anything more than an illustration, as opposed to the basis of a formal system of logic and analysis. Nevertheless, we can describe with relative clarity many aspects of culture as a trait or phenotype. And this is important. Recall that the evolutionary process was characterized by Charles Darwin despite lacking a satisfying theory of inheritance.
A reader points me to a talk given by David Reich at the Center for Human Genetic Research 2013 Retreat. One of the issues Reich brought up is old, but perhaps worth reemphasizing: due to endogamy many South Asians carry a higher load of recessive ailments. This is not due to recent inbreeding (which is barred by custom in many South Asian groups, which enforce kin-level exogamy), but long term genetic isolation. Over time even a moderate sized population can be affected by drift. This was one of the major points in the 2009 paper Reconstructing Indian History, but not one particularly emphasized in the press follow up. A major implication is that a relatively simple public health measure for South Asians would be to marry outside of their jati. The social or genetic distance need not be great. But one generation of outbreeding should “mask” many of the deleterious alleles. If this model is correct one should be able to track decreases in morbidity within the American South Asian population, where there are many inter-caste and inter-regional marriages (yes, this is between people of putative high status, but this doesn’t matter).
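The "masking" argument comes down to simple arithmetic, sketched here under strong simplifying assumptions: a single locus, a fully recessive and fully penetrant allele, and random mating within each group. The allele frequencies are invented for illustration and do not come from the talk or the paper.

```python
def affected_risk(freq_in_mother_group: float, freq_in_father_group: float) -> float:
    """Chance a child is homozygous for a recessive disease allele.

    Each parent transmits the allele with probability equal to its frequency
    in that parent's group, so the risk is the product of the two frequencies.
    """
    for freq in (freq_in_mother_group, freq_in_father_group):
        if not 0.0 <= freq <= 1.0:
            raise ValueError("allele frequencies must be between 0 and 1")
    return freq_in_mother_group * freq_in_father_group


# Hypothetical numbers: drift has pushed the allele to 5% in one jati,
# while it stays at 0.1% in another group.
within_group = affected_risk(0.05, 0.05)
between_groups = affected_risk(0.05, 0.001)
print(within_group, between_groups, within_group / between_groups)
```

Because different isolated groups tend to have drifted to high frequency for different deleterious alleles, the between-group product is small at most loci. The alleles are still transmitted to the children as carriers, which is why the effect is masking and not removal.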