A new press release is circulating on the paper which I blogged about a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:
Native Americans and Northern Europeans More Closely Related Than Previously Thought
Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America
Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS.
The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.
What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We are each ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
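The contrast between the two paintings can be mimicked with a toy simulation. What follows is a crude sketch, not how 23andMe paints chromosomes: it assumes recombination breakpoints between ancestries pile up as a Poisson process at roughly one per Morgan per generation since admixture, that chromosome 1 is about 2.8 Morgans long (a rough figure I am assuming), and that each segment draws its ancestry independently. The function name and parameters are hypothetical.

```python
import random


def simulate_haplotype(generations, length_morgans=2.8, asian_fraction=0.5, rng=None):
    """Return a list of (start, end, ancestry) segments along one chromosome copy.

    Toy model (assumption): breakpoints form a Poisson process with rate
    `generations` per Morgan; each segment is independently "Asian" with
    probability `asian_fraction`, otherwise "European".
    """
    rng = rng or random.Random()

    # Lay down breakpoints by drawing exponential gaps until we run off the end.
    breakpoints = []
    position = 0.0
    if generations > 0:
        while True:
            position += rng.expovariate(generations)
            if position >= length_morgans:
                break
            breakpoints.append(position)

    # Paint each interval, merging neighbours that share an ancestry.
    edges = [0.0] + breakpoints + [length_morgans]
    segments = []
    for start, end in zip(edges, edges[1:]):
        ancestry = "Asian" if rng.random() < asian_fraction else "European"
        if segments and segments[-1][2] == ancestry:
            segments[-1] = (segments[-1][0], end, ancestry)
        else:
            segments.append((start, end, ancestry))
    return segments


rng = random.Random(1)
first_generation = simulate_haplotype(0, rng=rng)   # one parent's intact chromosome
ancient_mixture = simulate_haplotype(100, rng=rng)  # many generations of shuffling
print(len(first_generation), "segment vs.", len(ancient_mixture), "segments")
```

With zero generations of shuffling each chromosome copy is a single unbroken block, as in my friend's painting; with a hundred generations the same 50/50 total is chopped into many short tracts, which is the qualitative pattern in mine.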
So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.
A month ago I posted Don’t trust an archaeologist about genetics, don’t trust a geneticist about archaeology, in response to James Fallows at At 5% Neanderthal, You Are an Outlier. Fallows has now put up a follow up, The Neanderthal Defense Committee Swings Into Action, where he links to my response post. This prompted the original archaeologist in question to reach out to me via email. I am posting the letter, with their permission, below.
The Chronicle of Higher Education has a piece out by Nathaniel Comfort, The Eugenic Impulse. I would just like to offer that to a great extent we already live in the second age of eugenics. The high frequency of abortions of fetuses which come back positive for Down syndrome is well known. But it seems possible that we’ll be able to reduce the frequency of many Mendelian diseases as well. Basically those ailments which are due to a major mutation of large effect and high penetrance (i.e., you have the mutation, you have the disease).
A major goal which we’re very far from though is the ability to select for quantitative traits. There are technical hurdles, both tactical and strategic, here. The major issue is that there are simply too many variants for one to be able to select a ‘perfect’ genetic profile. Those who’ve talked to me know my response in this domain: select for low mutational load. High coverage fetal whole genome sequencing would do that. The marketing pitch for this writes itself: imagine you, but bright of mind, and beautiful of face!
While I was at Spencer Wells’ poster at ASHG I was primarily curious about bar plots. He’s got really good spatial coverage, so I’m moderately excited about the paper (though I didn’t see much explicit testing of phylogenetic hypotheses, which I think this sort of paper has to do now; we’re beyond PCA and bar plots only papers). That being said, Spencer was more interested in me promoting the Scientific Grants Program. Here’s some more information:
The Genographic Project’s Scientific Grants Program awards grants on a rolling basis for projects that focus on studying the history of the human species utilizing innovative anthropological genetic tools. The variety of projects supported by the scientific grants will aim to construct our ancient migratory and demographic history while developing a better understanding of the phylogeographic structure of world populations. Sample research topics could include subjects like the origin and spread of the Indo-European languages, genetic insights into Papua New Guinea’s high linguistic diversity, the number and routes of migrations out of Africa, the origin of the Inca, or the genetic impact of the spread of maize agriculture in the Americas.
Recipients will typically be population geneticists, students, linguists, and other researchers or scientists interested in pursuing questions relevant to the Genographic Project’s broad goal of exploring our migratory history. Recipients of Genographic scientific grant funds will become members of the Genographic Consortium, and will be expected to act as agents of the greater Genographic mission, participating in and reporting on multiple aspects of Genographic fieldwork, in addition to their own proposed and mission‐aligned pilot projects. Openness and transparency within the Consortium are the key values of the project’s research team, and grantees will be expected to abide by this code of conduct.
Last week Luke Jostins (soon to be Dr. Luke Jostins) published an interesting paper in Nature. To be fair, this paper has an extensive author list, but from what I understand this is the fruit of the first author’s Ph.D. project. In any case, you may know Luke because I have used his loess curve on hominin encephalization for years. His bread & butter is statistical genetics, and it shows in this Nature paper. God knows how he managed to cram so much density into ~5.5 pages of plain text. Luke is also a contributor to Genomes Unzipped, and has put up a post over there on one implication of the paper, Dozens of new IBD genes, but can they predict disease? The short answer is that for individual prediction complex traits are going to be a hard haul over the long term.*
They are subject to what Jim Manzi would term “high causal density.” A simple way to state this is that outcome X is dependent on a host of variables, and if you capture only a small number of variables, you aren’t going to be explaining much in a general fashion. This is obvious from the text of Luke’s paper. Let’s look at the abstract, Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease:
With the election coming up, California Proposition 37, Mandatory Labeling of Genetically Engineered Food, is on my mind. From Ballotpedia:
If Proposition 37 is approved by voters, it will:
* Require labeling on raw or processed food offered for sale to consumers if the food is made from plants or animals with genetic material changed in specified ways.
* Prohibit labeling or advertising such food as “natural.”
* Exempt from this requirement foods that are “certified organic; unintentionally produced with genetically engineered material; made from animals fed or injected with genetically engineered material but not genetically engineered themselves; processed with or containing only small amounts of genetically engineered ingredients; administered for treatment of medical conditions; sold for immediate consumption such as in a restaurant; or alcoholic beverages.”
James Wheaton, who filed the ballot language for the initiative, refers to it as “The California Right to Know Genetically Engineered Food Act.”
Michael Eisen has two posts up on this which get at the meat of the issue for me. I disagree with Prop 37, though at first blush I think the idea of transparency is radically empowering. Before I get to my reasoning, I want to set aside some ancillary considerations. Some are voting for the measure because they oppose agribusiness in general, or have a particular bone to pick with the way that some firms enforce their intellectual property on seed lines. These are fine critiques, but I’m not going to address them, because I think they’re separate from the science.
Here’s a caption from a Time article, What Your Doctor Isn’t Telling You About Your DNA:
Nice to know that two physicians in Philadelphia not only have medical degrees, but specialize in mind-reading the parents of this nation! Above the caption is a photo of the two concerned and worried looking professionals in question. Let me quote the first two paragraphs of the article:
The test results were crystal clear, and still the doctors didn’t know what to do. A sick baby whose genome was analyzed at the Children’s Hospital of Philadelphia turned out to possess a genetic mutation that indicated dementia would likely take root around age 40. But that lab result was completely unrelated to the reason the baby’s DNA was being tested, leaving the doctors to debate: Should they share the bad news?
When it comes to scanning DNA or sequencing the genome — reading the entire genetic code — what to do with unanticipated results is one of the thorniest issues confronting the medical community. Many conflicted discussions followed the dementia discovery at the Children’s Hospital of Philadelphia (CHOP) before a decision was reached: the parents would not be told that this fatal memory-sapping disease likely lurks in their child’s future. Given the hopelessness of the situation, with no treatment and no cure, the doctors said forwarding such information along felt pointless. “We came around to the realization that we could not divulge that information,” says Nancy Spinner, who directs the hospital laboratory that tested the infant. “One of the basic principles of medicine is to do no harm.”
Around the same time, Spinner’s lab also tested another child — an unusually short 2-year-old referred for kidney disease — and discovered the toddler had a gene linked to a rare form of colon cancer. In some cases, polyps arising from this kind of cancer have been known to develop as early as age 7. This time, the decision to inform the parents was easier: “We feel good about that one,” says Spinner. “Proper screening can make a huge difference.”
There is a high likelihood that you know which ABO blood group you belong to. I am A. My daughter is A. My father is B. My mother is A. I have siblings who are A, O, B, and AB. The inheritance is roughly Mendelian, with O being “recessive” to A and B (which are co-dominant with each other, ergo, AB). It is also generally common knowledge that O is a “universal donor,” while A and B can only give to individuals within their respective blood group and AB.
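The family pattern above can be checked with a small Punnett-square sketch. It assumes the simple three-allele model just described (A and B co-dominant, O recessive) and ignores rare exceptions. Since I have an O sibling, both parents must carry an O allele, so the father is taken to be BO and the mother AO; the function names are my own for illustration.

```python
from itertools import product


def blood_group(genotype):
    """Map a two-allele genotype such as 'AO' to its ABO blood group."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"
    if alleles == {"A", "B"}:
        return "AB"
    return "A" if "A" in alleles else "B"


def offspring_groups(parent1, parent2):
    """Expected blood group proportions among children of two genotypes."""
    counts = {}
    # Each parent passes on one of their two alleles with equal probability.
    for allele1, allele2 in product(parent1, parent2):
        group = blood_group(allele1 + allele2)
        counts[group] = counts.get(group, 0) + 1
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


print(offspring_groups("AO", "BO"))
```

An AO mother and a BO father are expected to produce all four groups in equal proportions, which is why one family can contain A, B, AB, and O siblings.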
Because ABO was easy to assay it was one of the earliest Mendelian markers utilized in human genetics. In the first half of the 20th century while some anthropologists were measuring skulls, others were mapping out the frequency of A, B, and O. Today with much more robust genetic methods ABO has lost its old luster as a genetic marker, especially since there is a strong suspicion that the variants are strongly shaped by natural selection. This makes them only marginally useful for systematics, which rely upon loci which are honest mirrors of demographic history.
The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.
As you may know when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:
Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.
The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you cannot ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
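To put a number on how high the probability of extinction is for a new mutation, here is a minimal sketch using Kimura's classic diffusion approximation for the fixation probability. This is a textbook result I am bringing in for illustration, not the method of the paper. It assumes a diploid population whose effective size equals its census size N, additive selection with coefficient s, and a single new copy starting at frequency 1/(2N).

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at frequency 1/(2n).

    s: selection coefficient of the new allele (assumed additive)
    n: diploid population size (assumed equal to the effective size)
    """
    if s == 0:
        # A neutral allele fixes with probability equal to its frequency.
        return 1.0 / (2 * n)
    # (1 - e^{-2s}) / (1 - e^{-4ns}), written with expm1 for numerical accuracy.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


for s in (0.0, 0.001, 0.01, 0.05):
    p_fix = fixation_probability(s, n=10_000)
    print(f"s = {s:<6} P(fixation) = {p_fix:.5f}  P(loss) = {1 - p_fix:.5f}")
```

Under these assumptions even an allele with a healthy 1% advantage is lost about 98% of the time, since the fixation probability is close to 2s. That is the sense in which drift dominates while the allele is still rare.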
Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):
Update II: This comment sums up the pertinent issues.
Update: Please see comments below. This may be an infectious disease story, and not a genetic one.
When a reader sent me an email about the story, I assumed it was a rather sophisticated hoax. The short of it is that an 11 year old boy, Colman Chadam, has been pulled out of his school in Palo Alto because he carries alleles for cystic fibrosis, though he is asymptomatic (i.e., he has never manifested any symptoms of c.f.). Administrators are worried that he might pose a risk to some students who do show progression of c.f. (bacterial infections can spread from child to child). As you probably know about 1 out of 30 people of European descent is a carrier for one of the many thousands of mutant variants for cystic fibrosis. The details of Colman Chadam’s results are not totally clear. Is he just a carrier? Or does he have two copies of cystic fibrosis alleles, but somehow they differ enough that they can functionally complement each other?
We don’t know. But we do know that the lawyer from the school, Lenore Silverman, has stated that “The district is not willing to risk a potentially life-threatening illness among kids.” The risk here is non-existent. Not to be creepy, but does the school require that people with various venereal diseases also avoid the premises? With the small, but non-trivial, frequency of sexual abuse at school they too pose an illness risk to the kids. In fact, more than Colman Chadam.
You can read everything at The San Francisco Chronicle. Perhaps you want to contact some of the staff at Jordan Middle School in Palo Alto. These people should be ashamed of themselves. I know we live in a litigious society, and being in education isn’t the easiest job, but we deserve better than this!
After yesterday’s post I feel it is important again to reiterate that there is an unfortunate tyranny of the gene-as-physical-entity when it comes to our understanding of human heredity. To clarify what I mean, I think it is useful to borrow a framework from Andrew Brown. On the one hand you have a conventional modern mainstream understanding of the gene as a molecular biological entity, fundamentally derived from DNA and its role as envisaged by Francis Crick and James Watson, but with roots deeper back into the physiological genetic tradition which Sewall Wright was embedded within. In contrast to this concrete and biophysical conception of the gene there are those who conceive of the gene as an abstract unit of analysis. Richard Dawkins is the primary proponent of this viewpoint on the public intellectual scene, though men such as William D. Hamilton self-consciously understood the difference between their own genetics, and that which arose out of the insights of Crick and Watson.
I was a little sad when I heard my friend Steve Hsu had accepted a position at Michigan State some months back. My reasons were two-fold. First, I swing by Eugene now and then, and I wouldn’t have the opportunity to drop in on his office. Second, it seemed that Steve was becoming an Administrator. To some extent I feel like that’s going over to the dark side. But ultimately it’s his decision, and Steve has a lot of things going on at any given moment, and I’m hopeful he’ll continue to be involved in the production of scholarship in some form (he’s more of a scholar than most as it is).
Now apparently his move has resulted in submerged tensions coming to the fore. You can read the article in The Lansing Journal, New director’s experience a plus for MSU, but his controversial views concern some. Let’s qualify who these “some” are:
There’s an open access paper/preprint on Y chromosomal lineages that just came out, A calibrated human Y-chromosomal phylogeny based on resequencing. Since it is open access you can read the whole thing (it’s short). Let me quote from the discussion:
Nevertheless, the rapid expansion of R1b (and possibly I1) in Europe contrasts with the less starlike expansion of E1b1a in Africa, which has been associated with the spread of farming, ironworking and Bantu languages in Africa over the last 5,000 years (Berniell-Lee et al. 2009). Both R1b and E1b1a samples are from a mixture of indigenous donors (from Europe and Africa, respectively) and admixed American donors, so sampling strategy does not provide an obvious explanation for the difference. Instead, the different phylogenetic structure, with far more resolution of the individual E1b1a branches, may reflect expansion starting from a larger and more diverse population, and thus retaining more ancestral diversity.
Rice is a pretty big deal. There’s really no need to justify research on this crop. It feeds literally billions, so the funding will always flow. Would that we knew rice as well as we know C. elegans. After yesterday’s travesty of a paper on barley I thought that readers might find a new paper in Nature, A map of rice genome variation reveals the origin of cultivated rice, more interesting and illuminating. The authors used genomic sequencing, of varied coverage (i.e., very deep, repeated, and therefore accurate coverage vs. a single pass which is a very rough draft), to assess the relationship between Asian wild rice and two of the dominant domestic cultivars, indica (long-grain paddy rice) and japonica (short-grain dry cultivation rice). Presumably the two cultivars derive from a wild ancestor, but the details are still being hashed out.
The heritability of a trait within a population is the proportion of observable differences in a trait between individuals within a population that is due to genetic differences. Factors including genetics, environment and random chance can all contribute to the variation between individuals in their observable characteristics (in their “phenotypes”)…Heritability thus analyzes the relative contributions of differences in genetic and non-genetic factors to the total phenotypic variance in a population. For instance, some humans in a population are taller than others; heritability attempts to identify how much genetics are playing a role in part of the population being extra tall.
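The definition above boils down to a ratio of variances, which a toy calculation makes concrete. This is only a sketch under strong assumptions: the genetic values and environmental deviations below are invented numbers, in real data neither is observed directly, and the example is constructed so that genes and environment are uncorrelated and do not interact. The function name is hypothetical.

```python
from statistics import pvariance


def broad_sense_heritability(genetic_values, environmental_deviations):
    """Share of phenotypic variance attributable to genetic differences.

    Assumes phenotype = genetic value + environmental deviation for each
    individual (no gene-environment interaction).
    """
    phenotypes = [g + e for g, e in zip(genetic_values, environmental_deviations)]
    return pvariance(genetic_values) / pvariance(phenotypes)


# Hypothetical heights in cm for four individuals in one population.
genetic_values = [170, 175, 180, 185]
environmental_deviations = [2, -2, -2, 2]

h2 = broad_sense_heritability(genetic_values, environmental_deviations)
print(f"heritability = {h2:.3f}")
```

Note what the number is not: it says nothing about how much of any one person's height is "due to genes," only how much of the spread across this particular population, in this particular range of environments, tracks genetic differences.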
Over at Haldane’s Sieve Dr. Joseph Pickrell has a commentary up on a preprint on explaining the ‘missing heritability’ using yeast genetics. All good reading. I long ago gave up on the hope that the idea of ‘heritability’ would ever be widely internalized among the educated public in any precise sense. But we muddle on. The next decade is going to be big for the genomics of complex traits. Or so people keep telling me!
But this gives me the excuse to point to a commentary which you really should read again and again. It is A commentary on ‘common SNPs explain a large proportion of the heritability for human height’ by Yang et al. (2010):
It seems a new field is being born! Jeff Wall & Monty Slatkin have a pretty thorough review out, Paleopopulation Genetics:
Paleopopulation genetics is a new field that focuses on the population genetics of extinct groups and ancestral populations (i.e., populations ancestral to extant groups). With recent advances in DNA sequencing technologies, we now have unprecedented ability to directly assay genetic variation from fossils. This allows us to address issues, such as past population structure, changes in population size, and evolutionary relationships between taxa, at a much greater resolution than can traditional population genetics studies. In this review, we discuss recent developments in this emerging field as well as prospects for the future.
Nothing very new for close readers of this weblog, but the references are useful for later mining.
Via Haldane’s Sieve, Genetics has a new preprint policy:
POLICY ON PRE-PRINT DEPOSITS
GENETICS allows authors to deposit manuscripts (currently under review or those for intended submission to GENETICS) in non-commercial, pre-print servers such as ArXiv. Upon final publication in GENETICS, authors should insert a journal reference (including DOI), and link to the published article on the GENETICS website, and include the acknowledgment: “The published article is available at www.genetics.org.” See http://arxiv.org/help/jref for details.
Here’s a more thorough list of preprint guidelines by journal. For all practical purposes this means that population genetics can now percolate more freely among the masses. Many of the differences between “draft” preprints and the final manuscript have to do with formatting, etc., from what I have seen. So the content shall flow!
To the left is a PCA from The History and Geography of Human Genes. If you click it you will see a two dimensional plot with population labels. How were these plots generated? In short, these are really visual representations of a matrix of genetic distances (those distances generally being FST), which L. L. Cavalli-Sforza and colleagues computed from classical autosomal markers. Basically what the distances measure are the differences across populations in regards to their genetics. The unwieldy matrix tables can be visualized as a neighbor-joining tree, or a two dimensional plot as you see here. But that’s not the end of the story.
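To make the "matrix of genetic distances" less abstract, here is a minimal sketch of how a pairwise FST table can be built from allele frequencies. It uses the simple heterozygosity form, FST = (H_T - H_S) / H_T, summed over biallelic loci with the two populations weighted equally. That is a simplification and not necessarily the estimator Cavalli-Sforza and colleagues used, and the population names and frequencies are made up.

```python
def pairwise_fst(frequencies):
    """Build a dict {(pop_a, pop_b): FST} from per-locus allele frequencies.

    frequencies: dict mapping population name to a list of frequencies of one
    allele at each biallelic locus (same locus order in every population).
    """
    names = sorted(frequencies)
    matrix = {}
    for a in names:
        for b in names:
            numerator = denominator = 0.0
            for p_a, p_b in zip(frequencies[a], frequencies[b]):
                p_bar = (p_a + p_b) / 2
                # Expected heterozygosity if the two populations were pooled.
                h_total = 2 * p_bar * (1 - p_bar)
                # Average expected heterozygosity within each population.
                h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
                numerator += h_total - h_within
                denominator += h_total
            matrix[(a, b)] = numerator / denominator if denominator > 0 else 0.0
    return matrix


# Hypothetical allele frequencies at three loci in three populations.
toy_frequencies = {
    "PopA": [0.10, 0.50, 0.80],
    "PopB": [0.15, 0.45, 0.75],
    "PopC": [0.70, 0.20, 0.30],
}

for (a, b), value in pairwise_fst(toy_frequencies).items():
    if a < b:
        print(f"{a} vs {b}: FST = {value:.3f}")
```

A tree or a two dimensional plot is then just a way of drawing a table like this so that populations with small pairwise values sit close together.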
In the past ten years, with high density SNP-chip arrays, instead of just representing the relationship of populations, these plots often can now illustrate the position of an individual (the methods differ, from principal components analysis or principal coordinates analysis, to multi-dimensional scaling, but the outcomes are the same).
Today there was a short article in Discover on a paper published last spring on the models for the settling of Madagascar. I didn’t pay too much attention when the paper came out for two reasons. First, it focused on Y and mtDNA, and I’ve been playing with Malagasy autosomes. Second, it seemed a ridiculously brutal computational attack on a question which seems to have a straightforward intuitive explanation: yes, Madagascar was settled by a small founding group. With hindsight I may have spoken too soon, or passed judgement too hastily. Looking at the paper the explicit model building of demography does still seem like overkill, but they obtain some important precision here. The phylogenetics and the archaeology align nicely.
Though the authors of the article talk about future directions, I think we will find that the Malagasy originate from a small group of Malayo-Polynesians who did find themselves stranded on Madagascar (later to absorb African admixture). This is not controversial. Rather, when I came to this position with enough solidity I began to look at the cultural anthropology of Madagascar. In particular, what do the Malagasy remember about their own past in Southeast Asia? From what I could tell (the literature on Madagascar is not too rich in English) the Malagasy don’t recall much. This is important, because it tells us just how fragile oral memory can be when you have a major geographical and demographic rupture. The influence of Sanskrit is apparently evident within Malagasy, attesting to the early period of Indic influence in Southeast Asia. But the Malagasy are not part of Dharmic or Islamic civilization. They are the people forgotten by time. I think what little we know about the Malagasy can shed light on memories and legends preserved by peoples who we suspect were migrants into the only homelands they knew (e.g., how could the Aryans be exogenous if they didn’t record any memory of lands before India?).
When it comes to the human genetics of the Khoe-San there’s a little that’s stale and unoriginal for me in terms of presentation. The elements are always composed the same. The Bushmen are the “most ancient” humans, who can tell us something about “our past,” about “our evolution.” Tried & tested banalities just bubble forth unbidden. I have no idea why. There’s a new paper in Science on the genetics of the Khoe-San, which includes Bushmen, which brought to mind this issue for me because of the outrageous nature of the press releases.
The title of the paper itself is a testament to vanilla, Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. This is absolutely not surprising. Are you shocked that the Khoe-San have adaptations? Or that African history is complex? The wonder of it all! This paper actually revisits much of the same ground as Pickrell et al.’s originally titled The genetic prehistory of southern Africa. Before Dr. Pickrell executes throw-down on me on Twitter let me concede that I have no creative ideas to offer in terms of an alternative title. Rather, I have an idea: perhaps in the future scientists could explore the evolutionary genetic basis for steatopygia? The trait is not limited just to Khoe-San, my distant cousins the Andaman Islanders also exhibit it. Perhaps this is the ancestral state of the human lineage? This is a situation where the titles just write themselves!