As many of you know, around the year 2000 the analyses of Y chromosomal human lineages became a pretty big deal. The reason these lineages are important and useful is that they record the uninterrupted ancestry of males, from father to son, along the Y chromosome. Instead of the complexities of the whole genome, as with mtDNA you have a simple and elegant phylogenetic tree to interpret. The clusters along this tree are defined as broad haplogroups, united by derived states from a common ancestor. One of the largest haplogroups is R1a1a. It happens to be my paternal lineage, as well as Dr. Daniel MacArthur’s and Dr. Zack Ajmal’s.
The map above illustrates the peculiarity of R1a1a: it is geographically enormously expansive. How to explain this distribution? A naive response might be that this distribution is surprisingly similar to that of the Indo-European languages. Unfortunately this runs up against the conundrum that low caste South Indian groups, relatively untouched by Indo-Aryan culture (at least until the past few hundred years), also manifest high frequencies of R1a1a.
To make a long story short it seems that R1a1a is an old haplogroup with a lot of structure across Eurasia. Maju points me to a paper in American Journal of Physical Anthropology which simply & elegantly brings home to us some obvious insights, New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1:
A paper on the psychology of religious belief, Paranormal and Religious Believers Are More Prone to Illusory Face Perception than Skeptics and Non-believers, came onto my radar recently. I used to talk a lot about the theory of religious cognitive psychology years ago, but the interest kind of faded when it seemed that empirical results were relatively thin in relation to the system building (Ara Norenzayan’s work being an exception to this generality). The theory is rather straightforward: religious belief is a naturally evoked consequence of the general architecture of our minds. For example, gods are simply extensions of persons, and make natural sense in light of our tendency to anthropomorphize the world around us (this may have had evolutionary benefit, in that false positives for detection of other agents were far less costly than false negatives; think an ambush by a rival clan).*
In the comments below there is a lot of talk about the worry of transferring gene X from organism 1 to organism 2, where the two organisms are very far apart on the tree of life. I’m a little sanguine about this, but that’s because there is already much of this going on through natural processes.
Carl Zimmer for example points out that 8 percent of the human genome seems to derive from endogenous retroviruses (the post draws on material from his book A Planet of Viruses). This is probably a lower-bound number, as he notes in the comments. Additionally, this isn’t just limited to viruses. See: Horizontal gene transfer between bacteria and animals.
I think one of the chasms between geneticists and the public is that a lot of things that seem creepy and strange to the public are part & parcel of the geneticist’s professional toolkit. For example, to my knowledge no transgenic mice have turned into the Brain. I have friends that order weird mouse varieties, and then do weirder things to them, every week.
With the election coming up, California Proposition 37, Mandatory Labeling of Genetically Engineered Food, is on my mind. From Ballotpedia:
If Proposition 37 is approved by voters, it will:
* Require labeling on raw or processed food offered for sale to consumers if the food is made from plants or animals with genetic material changed in specified ways.
* Prohibit labeling or advertising such food as “natural.”
* Exempt from this requirement foods that are “certified organic; unintentionally produced with genetically engineered material; made from animals fed or injected with genetically engineered material but not genetically engineered themselves; processed with or containing only small amounts of genetically engineered ingredients; administered for treatment of medical conditions; sold for immediate consumption such as in a restaurant; or alcoholic beverages.”
James Wheaton, who filed the ballot language for the initiative, refers to it as “The California Right to Know Genetically Engineered Food Act.”
Michael Eisen has two posts up on this which get at the meat of the issue for me. I disagree with Prop 37, though at first blush I think the idea of transparency is radically empowering. Before I get to my reasoning, I want to set aside some ancillary considerations. Some are voting for the measure because they oppose agribusiness in general, or have a particular bone to pick with the way that some firms enforce their intellectual property on seed lines. These are fine critiques, but I’m not going to address them, because I think they’re separate from the science.
The above infographic from The New York Times article For Asians, School Tests Are Vital Steppingstones, was titled “1027-asians” when I tried to save it. No idea why, but I think that’s an amusing file name. My offensively titled post is inspired by the cliche reference to Confucianism in the piece. As my previous posts on “Tiger Moms” indicate, I am not a big fan of the “Asian” way of obtaining academic laurels through brute force alone. In places like South Korea a cram-school bidding war has distorted the culture. The single-minded focus on a specific test means that the whole society has to shift to keep up with the innovators in the educational “arms race.” Think of it as the analog to the doping scandal in cycling. And it’s an irony that the term innovation is being used here by me, because this sort of “education” destroys the creativity, flexibility, and originality which are the engine that motors modern civilization. Sufficient for producing engineers, but I doubt fruitful as the seedbed for an individualistic scientific culture which aims to shift paradigms.
Here’s a caption from a Time article, What Your Doctor Isn’t Telling You About Your DNA:
Nice to know that two physicians in Philadelphia not only have medical degrees, but specialize in mind-reading the parents of this nation! Above the caption is a photo of the two concerned and worried looking professionals in question. Let me quote the first two paragraphs of the article:
The test results were crystal clear, and still the doctors didn’t know what to do. A sick baby whose genome was analyzed at the Children’s Hospital of Philadelphia turned out to possess a genetic mutation that indicated dementia would likely take root around age 40. But that lab result was completely unrelated to the reason the baby’s DNA was being tested, leaving the doctors to debate: Should they share the bad news?
When it comes to scanning DNA or sequencing the genome — reading the entire genetic code — what to do with unanticipated results is one of the thorniest issues confronting the medical community. Many conflicted discussions followed the dementia discovery at the Children’s Hospital of Philadelphia (CHOP) before a decision was reached: the parents would not be told that this fatal memory-sapping disease likely lurks in their child’s future. Given the hopelessness of the situation, with no treatment and no cure, the doctors said forwarding such information along felt pointless. “We came around to the realization that we could not divulge that information,” says Nancy Spinner, who directs the hospital laboratory that tested the infant. “One of the basic principles of medicine is to do no harm.”
The fourth in a five-part series exploring the promise and pitfalls of sequencing children’s genomes
Around the same time, Spinner’s lab also tested another child — an unusually short 2-year-old referred for kidney disease — and discovered the toddler had a gene linked to a rare form of colon cancer. In some cases, polyps arising from this kind of cancer have been known to develop as early as age 7. This time, the decision to inform the parents was easier: “We feel good about that one,” says Spinner. “Proper screening can make a huge difference.”
Image credit: Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings
I really love the paper Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. I first read it about six years ago. The result is rather straightforward, but the problem is empirically a moderately deep one. Modern analytic genetics as the fusion between Mendelism and biometrics began with R. A. Fisher’s The Correlation between Relatives on the Supposition of Mendelian Inheritance in 1918. But note, that paper assumed particular relatedness between relatives. As highlighted in the above paper the expected values for most categories of relatedness always had a variance component which was unaccounted for, and so reduced the power of the methodology to ascertain the extent of heritability. The relatedness you can expect between any two siblings is ~0.50, and that is also the average across all siblings. But the reality is that in most cases two given siblings will not share half their genes identical by descent.
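To make the point about variance around the expected 0.50 concrete, here is a minimal simulation sketch. It is not the method of the paper; it is a toy model that assumes the genome is inherited as a fixed number of independent segments (the value of `N_SEGMENTS` is a hypothetical choice of mine), and that at each segment two full siblings share the paternal copy with probability 1/2 and the maternal copy with probability 1/2.

```python
import random
import statistics


def sibling_ibd_fraction(n_segments, rng):
    """Fraction of the genome two full sibs share IBD, under a toy model."""
    shared = 0
    for _ in range(n_segments):
        # At each segment the sibs inherit the same paternal copy with
        # probability 1/2, and independently the same maternal copy likewise.
        paternal = rng.random() < 0.5
        maternal = rng.random() < 0.5
        shared += paternal + maternal
    return shared / (2 * n_segments)


rng = random.Random(1)
N_SEGMENTS = 80  # hypothetical number of independently inherited segments
fractions = [sibling_ibd_fraction(N_SEGMENTS, rng) for _ in range(5000)]

mean_sharing = statistics.mean(fractions)
sd_sharing = statistics.pstdev(fractions)
print(round(mean_sharing, 3), round(sd_sharing, 3))
```

The average across simulated sibling pairs sits near one half, but individual pairs scatter around it, and the size of that scatter depends entirely on the assumed number of independent segments. Real chromosomes recombine in a more complicated way, so treat the spread here as illustrative only.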
Egg freezing is no longer an experimental procedure, according to the American Society for Reproductive Medicine (ASRM), which on 22 October issued new guidelines on the controversial practice. The change in policy is expected to accelerate the growth of clinics that offer egg freezing to women who face fertility-damaging treatment for cancer or other conditions, and to women wishing to delay having a baby — although the society stopped short of endorsing the procedure for that purpose.
You can read the full guidelines, with caveats, online. Last I checked this costs on the order of $10,000. Nothing to sneeze at, but definitely not insane when you consider how much money many couples spend on fertility technologies when women are between 35 and 40.
And of course I recommend freezing sperm too. That’s far less costly.
There is a high likelihood that you know which ABO blood group you belong to. I am A. My daughter is A. My father is B. My mother is A. I have siblings who are A, O, B, and AB. The inheritance is roughly Mendelian, with O being “recessive” to A and B (which are co-dominant with each other, ergo, AB). It is also generally common knowledge that O is a “universal donor,” while A and B can only give to individuals within their respective blood group and AB.
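The dominance relationships can be written out as a small sketch. Under the simple three-allele model (an idealization, hence “roughly” Mendelian), a type A mother and a type B father who have a type O child must each carry an O allele, so the genotypes `"AO"` and `"BO"` below are an inference from that model rather than a measured result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def phenotype(genotype):
    """Map a pair of ABO alleles to a blood group; O is recessive."""
    expressed = set(genotype) - {"O"}
    if not expressed:
        return "O"
    return "".join(sorted(expressed))  # "A", "B", or "AB"


def offspring_distribution(parent1, parent2):
    """Blood-group probabilities for children of two parental genotypes."""
    counts = Counter(phenotype(a + b) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {group: Fraction(n, total) for group, n in counts.items()}


# Assumed genotypes: a type A parent and a type B parent, each carrying O.
cross = offspring_distribution("AO", "BO")
print(cross)
```

With these two genotypes all four blood groups are possible among the children, each with probability one quarter, which is consistent with a sibship containing A, O, B, and AB.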
Because ABO was easy to assay it was one of the earliest Mendelian markers utilized in human genetics. In the first half of the 20th century while some anthropologists were measuring skulls, others were mapping out the frequency of A, B, and O. Today with much more robust genetic methods ABO has lost its old luster as a genetic marker, especially since there is a strong suspicion that the variants are strongly shaped by natural selection. This makes them only marginally useful for systematics, which rely upon loci which are honest mirrors of demographic history.
Interesting story in The San Jose Mercury News, Open-source science helps San Carlos father’s genetic quest:
“We used materials that are public, freely available,” said Rienhoff, a physician and scientist, as Beatrice frolicked nearby. “And everything we’ve learned we’ve put back out there, in the public domain. It’s for the patient’s good, and the public good.”
Born with small, weak muscles, long feet and curled fingers, Beatrice confounded all the experts.
No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.
Rienhoff — a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle — launched a search.
He combed the publicly available medical literature, researching diseases, while jotting down each new clue or theory. Because her ailment is so rare, he knew no big labs or advocacy groups would be interested.
I noticed today that GEDmatch is trying to raise funds to cover the cost of their web services. What are those services? Basically if you get raw data back from direct-to-consumer genotyping firms GEDmatch allows you to run further analytics. You can do some of these yourself…but most people aren’t going to know how to convert their files into pedigree format and use PLINK. As genotype data becomes more and more common there’ll be more need for analytic services like GEDmatch, whether for profit or for fun.
One can imagine a near future where much of the work can be offloaded to desktop applications (e.g., Promethease does this for traits and diseases). But the problem is that there are greater returns to the analysis when you aggregate the source data into huge agglomerations, and if people are doing their own analyses on their own systems that’s not going to happen (the GWAS information in Promethease uses aggregated information implicitly with the studies they rely on). This is why GEDmatch and openSNP are important.
In any case, if you have used GEDmatch and wish to give, this would be a good time.
The domestication of the dog is a complex and unresolved topic. But at this point I am convinced that this is one domestication event which well predates agriculture. To some extent this is common sense. There are tentative archaeological finds of domestic dogs in the New World almost immediately after widespread human habitation of the Western hemisphere, >10,000 years ago. More concretely domestic dog DNA has been retrieved from ~9,250 year old coprolites in Texas. The distinctiveness of the New World dogs is well attested genetically. Eskimo dogs for example are nested in a well diverged clade with “ancient dogs” (e.g., Basenji), indicating their early separation from the main Eurasian stock. Additionally, from talking to a dog geneticist I am to understand that the Eskimo dogs themselves are likely new arrivals, and superseded older dog lineages in the far north.
I have mentioned the PLoS Genetics paper, The Date of Interbreeding between Neandertals and Modern Humans, before because a version of it was put up on arXiv. The final paper has a few additions. For example, it mentions the generally panned (at least in the circles I run in) PNAS paper which suggested that ancient population structure could produce the same patterns which were earlier used to infer admixture with Neandertals (the authors also point to Yang et al. as a support for the proposition of admixture rather than structure). The primary result, dating the admixture between Neandertals and anatomically modern humans ~40-80,000 years before the present, is reiterated.
An interesting aspect is that their method is to utilize linkage disequilibrium (LD) decay. It’s interesting because tens of thousands of years is a hell of a long time to be able to detect an admixture event via LD! In particular because there’s likely a palimpsest effect where there are intervening admixtures and other assorted demographic events (e.g., bottlenecks and selective sweeps can also generate LD). So how’d they do it? Basically the authors figured out a way to ascertain which pairs of SNPs may have introgressed from Neandertals by comparing the frequency in modern humans to Neandertals at those given SNPs (in particular, by looking at variants at low frequency in Africans and derived in Neandertals). A major technical problem here is that the “genetic map,” which allows one to assess how recombination over time breaks apart the associations that are the hallmark of LD, is not precise enough to robustly support the inferences that they want to make.
Every now and then I get emails/inquiries about using my simple “quick & dirty” charts. I always give permission, and in fact I never complain when other (usually more popular!) weblogs use them. Even on the very rare occasions where attribution is not given (I suspect this is usually an oversight in haste). This is fair enough, as I regularly use figures and tables from scientific papers in my blog posts. I believe this is all reasonably under “fair use.” But I’ve decided to explicitly assert that the charts I produce are under the Creative Commons license. This is not to force people to attribute. Even if someone uses my chart and does not attribute I won’t really sue or anything like that (unless they’re somehow miraculously making millions off them). Rather, it’s a nudge to those who would use the charts, and should also make it so that people don’t have to contact me directly via email or Twitter. If you want to use the chart or data, permission is given implicitly.
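The logic of dating by LD decay can be sketched with a deliberately simplified model. This is not the authors’ actual estimator: it assumes a single pulse of admixture, after which LD between two sites separated by a genetic distance d (in Morgans) falls off as exp(-t·d), where t is the number of generations elapsed. The function names and the value of `TRUE_GENERATIONS` are hypothetical, chosen only for illustration.

```python
import math


def expected_ld(distance_morgans, generations, amplitude=1.0):
    """Toy model: admixture LD decays as amplitude * exp(-t * d)."""
    return amplitude * math.exp(-generations * distance_morgans)


def estimate_generations(distances, ld_values):
    """Least-squares slope of log(LD) against distance; the slope is -t."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var


TRUE_GENERATIONS = 2000  # hypothetical value, for illustration only
# Genetic distances from 0.01 cM to 0.5 cM, expressed in Morgans.
distances = [i * 0.0001 for i in range(1, 51)]
ld_curve = [expected_ld(d, TRUE_GENERATIONS) for d in distances]

estimated = estimate_generations(distances, ld_curve)
print(round(estimated))
```

In this noise-free toy the fit simply recovers the value that went in. The point of the sketch is that the estimate is a slope with respect to genetic distance, so any error in the map’s distances feeds straight into the date.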
The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.
As you may know, when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:
Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.
The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you cannot ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
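The fate of a de novo mutation starting at 1/(2N) can be illustrated with a small Wright-Fisher style simulation. This is a sketch under assumptions of my own, not anything from the paper: a constant population of N diploids treated as 2N gene copies, a simple multiplicative advantage s for the new allele, and parameter values (`N`, `S`, `REPLICATES`) picked only to keep the run short.

```python
import random


def new_mutation_fixes(pop_size, s, rng):
    """Follow one new beneficial allele among 2N gene copies until it is
    lost (returns False) or fixed (returns True)."""
    copies_total = 2 * pop_size
    count = 1  # a de novo mutation starts at frequency 1/(2N)
    while 0 < count < copies_total:
        p = count / copies_total
        # Selection shifts the expected frequency before random sampling.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Binomial sampling of the next generation (genetic drift).
        count = sum(rng.random() < p_sel for _ in range(copies_total))
    return count == copies_total


rng = random.Random(42)
N, S, REPLICATES = 100, 0.05, 500  # hypothetical parameter choices
fixed = sum(new_mutation_fixes(N, S, rng) for _ in range(REPLICATES))
fixation_fraction = fixed / REPLICATES
print(fixation_fraction)
```

Even with a healthy selective advantage, most replicates lose the new allele to drift in the first few generations; the classic approximation puts the fixation probability of a new beneficial mutation at roughly 2s. Only the minority that escape low frequency go on to sweep.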
Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):
You know that Newsweek is ending its print edition. This was a long time coming. What I find interesting is that apparently its circulation peaked in 1991, at 3.3 million. It declined to 3.1 million in 2007, and literally cratered over the past 5 years to 1.5 million! Thinking back to my own past I remember my interest in Newsweek‘s cartoons, as well as the in-class discussion triggered by the latest issue of Time. There was a time when these were relevant publications. But that ended in 1995, with the rise of the internet. For years the weeklies still maintained the illusion of relevance, but I think they were living on borrowed time. People went through the motions because they always had. After all, the cover of Time was important, everyone knew it. Until no one did. The collapse in circulation is just a reflection of the fact that this emperor had long ago shed its clothes.
Part of the popularity of our demonstration archive is that it is free for end users. We are happy to provide this service. It is a valuable resource for the academic community and it also publicizes the value of our SDA software. However, the flip side of providing this free service is that it does not generate any income to offset the cost of providing the infrastructure required. We receive no funding from GSS for hosting their datasets — which is often a surprise to our users. Almost all of our income comes from the fees provided by licensing the SDA software to other data archives (like ICPSR and IPUMS), and virtually all of that income goes to support the programming and technical support that we provide them. We obviously need some additional sources of revenue.
Update II: This comment sums up the pertinent issues.
Update: Please see comments below. This may be an infectious disease story, and not a genetic one.
When a reader sent me an email about the story, I assumed it was a rather sophisticated hoax. The short of it is that an 11 year old boy, Colman Chadam, has been pulled out of his school in Palo Alto because he carries alleles for cystic fibrosis. Though he is asymptomatic (i.e., he has never manifested any symptoms of c.f.), administrators are worried that he might pose a risk to some students who do show progression of c.f. (bacterial infections can spread from child to child). As you probably know, about 1 out of 30 people of European descent is a carrier for one of the many thousands of mutant variants for cystic fibrosis. The details of Colman Chadam’s results are not totally clear. Is he just a carrier? Or does he have two cystic fibrosis alleles, but somehow they differ enough that they can functionally complement each other?
We don’t know. But we do know that the lawyer from the school, Lenore Silverman, has stated that “The district is not willing to risk a potentially life-threatening illness among kids.” The risk here is non-existent. Not to be creepy, but does the school require that people with various venereal diseases also avoid the premises? With the small, but non-trivial, frequency of sexual abuse at school they too pose an illness risk to the kids. In fact, more than Colman Chadam.
You can read everything at The San Francisco Chronicle. Perhaps you want to contact some of the staff at Jordan Middle School in Palo Alto. These people should be ashamed of themselves. I know we live in a litigious society, and being in education isn’t the easiest job, but we deserve better than this!