Tag: Genomics

Using your 23andMe data in Plink

By Razib Khan | January 7, 2013 4:58 am

With the recent $99 price point for 23andMe many of my friends have purchased kits (finally!). 23andMe’s interpretive results are pretty rich now, but there are still things missing. There are plenty of third party tools you can use, but I know some people might want to do their own data analysis. There are many ways you could go about this, but I want to put up some posts on DIY genomic data analysis to make the learning curve a little less steep, and get people started. Motivation to actually begin going down this road is a big issue, but I think once you get over the hump it gets a lot easier.

First, you need Plink. It is really preferable that you work on a Mac or in Linux to engage in heavy duty analysis, but in this post I’ll assume you are working on the Windows platform. Again, the point here is to make this accessible. Download Plink if you don’t have it, and extract it wherever you like.
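To give a flavor of what getting 23andMe data into Plink involves, here is a minimal Python sketch. It is not necessarily the route the rest of this post takes. It assumes the raw download is a tab-separated text file with `#` comment lines followed by rsid, chromosome, position and genotype columns, and it writes Plink's plain-text PED/MAP pair. The function name, the family and individual IDs, and the choice to treat anything other than A/C/G/T as missing are all illustrative assumptions.

```python
def convert_23andme(lines, family_id="FAM1", individual_id="ME"):
    """Turn 23andMe-style raw data lines into (MAP text, PED text).

    Assumed input columns (tab-separated): rsid, chromosome, position, genotype.
    """
    map_rows = []
    alleles = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chrom, pos, genotype = line.split("\t")
        if len(genotype) == 1:
            genotype *= 2  # haploid call (e.g. male X): write it as a homozygote
        if len(genotype) != 2 or any(base not in "ACGT" for base in genotype):
            genotype = "00"  # no-calls and indel codes become missing
        # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), bp position
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.append(f"{genotype[0]} {genotype[1]}")
    # PED columns: family, individual, father, mother, sex (0 = unknown),
    # phenotype (-9 = missing), then two alleles per SNP
    ped_line = " ".join([family_id, individual_id, "0", "0", "0", "-9"] + alleles)
    return "\n".join(map_rows) + "\n", ped_line + "\n"
```

If the two strings are saved as `mydata.map` and `mydata.ped` (hypothetical names), Plink should be able to read them with `--file mydata`.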


Why the future won’t be genetically homogeneous

By Razib Khan | January 5, 2013 10:52 pm

While reading The Founders of Evolutionary Genetics I encountered a chapter where the late James F. Crow admitted that he had a new insight every time he reread R. A. Fisher’s The Genetical Theory of Natural Selection. This prompted me to put down The Founders of Evolutionary Genetics after finishing Crow’s chapter and pick up my copy of The Genetical Theory of Natural Selection. I’ve read it before, but this is as good a time as any to give it another crack.

Almost immediately Fisher aims at one of the major conundrums of 19th century theory of Darwinian evolution: how was variation maintained? The logic and conclusions strike you like a hammer. Charles Darwin and most of his contemporaries held to a blending model of inheritance, where offspring reflect a synthesis of their parental values. As it happens this aligns well with human intuition. Across their traits offspring are a synthesis of their parents. But blending presents a major problem for Darwin’s theory of adaptation via natural selection, because it erodes the variation which is the raw material upon which selection must act. It is a famously peculiar fact that the abstraction of the gene was formulated over 50 years before the concrete physical embodiment of the gene, DNA, was ascertained with any confidence. In the first chapter of The Genetical Theory R. A. Fisher suggests that the logical reality of persistent copious heritable variation all around us should have forced scholars to the inference that inheritance proceeded via particulate and discrete means, as these processes do not diminish variation indefinitely in the manner which is entailed by blending.
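Fisher's point can be checked with a toy simulation. The sketch below is my own illustration, not anything from the book. It assumes random mating, a constant population size, and a trait controlled by a single locus with two alleles; the function names and parameter values are made up for the example. Under blending each offspring is the average of two randomly chosen parents, so the variance roughly halves every generation. Under particulate inheritance each offspring draws one allele from each parent, and the variance stays put apart from a little drift.

```python
import random
import statistics


def blending_generation(values, rng):
    """Each offspring is the average of two randomly chosen parents."""
    return [(rng.choice(values) + rng.choice(values)) / 2 for _ in values]


def particulate_generation(genotypes, rng):
    """Each offspring gets one randomly chosen allele from each of two random parents."""
    return [(rng.choice(rng.choice(genotypes)), rng.choice(rng.choice(genotypes)))
            for _ in genotypes]


def simulate(generations=5, n=2000, seed=1):
    """Return the trait variance per generation under each model of inheritance."""
    rng = random.Random(seed)
    # Alleles are coded 0/1 at frequency ~0.5; the trait is the number of '1' alleles.
    genotypes = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]
    values = [float(a + b) for a, b in genotypes]
    blend_var = [statistics.pvariance(values)]
    part_var = [statistics.pvariance([a + b for a, b in genotypes])]
    for _ in range(generations):
        values = blending_generation(values, rng)
        genotypes = particulate_generation(genotypes, rng)
        blend_var.append(statistics.pvariance(values))
        part_var.append(statistics.pvariance([a + b for a, b in genotypes]))
    return blend_var, part_var
```

Calling `simulate()` returns two lists of variances, one per generation. The blending list collapses toward zero within a handful of generations, while the particulate list hovers around its starting value.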


We are Nature

By Razib Khan | December 13, 2012 10:03 am

There’s an interesting piece in Slate, The Great Schism in the Environmental Movement, which seems to be a distillation of trends which have been bubbling within the modern environmentalist movement for a generation now (I’ve read earlier manifestos in a similar vein). I can’t assess the magnitude of the shift, but here’s the top-line:

But that is a false construct that scientists and scholars have been demolishing the past few decades. Besides, there’s a growing scientific consensus that the contemporary human footprint—our cities, suburban sprawl, dams, agriculture, greenhouse gases, etc.—has so massively transformed the planet as to usher in a new geological epoch. It’s called the Anthropocene.

Modernist greens don’t dispute the ecological tumult associated with the Anthropocene. But this is the world as it is, they say, so we might as well reconcile the needs of people with the needs of nature. To this end, Kareiva advises conservationists to craft “a new vision of a planet in which nature—forests, wetlands, diverse species, and other ancient ecosystems—exists amid a wide variety of modern, human landscapes.”


A lighter shade of brown: Dan MacArthur, look east or south!

By Razib Khan | December 12, 2012 4:58 pm

South Indian Udupi cuisine

In the post below I offered up my supposition that Dan MacArthur’s ancestry is unlikely to be Northwest Indian, which precludes a Romani origin for his South Asian ancestry. Indeed this is almost certainly so: Dienekes Pontikos followed up my crude analyses with IBD-sharing calculations (IBD = ‘identity by descent,’ which is basically what you would think it is). The South Asian population which MacArthur has the closest affinity to is from Karnataka, which is one of the Dravidian speaking states of the South. This does not necessarily refute my earlier contention, as aside from Brahmins most Bengalis seem to have broad South Indian affinities, except for the fact that they often have more East Asian ancestry.


We don’t know why Ethiopians breathe easy

By Razib Khan | December 11, 2012 11:40 pm

Most people are aware that altitude imposes constraints on individual performance and function. Much of this is flexible; athletes who train at high altitudes may gain a performance edge. But over the long term there are costs, just as there are with computers which are ‘overclocked.’ This is the point where you make the transition from physiology to evolution. Residence at high altitude entails strong selective pressures on populations. Over the past few years there has been a great deal of exploration of the genetics of long resident high altitude groups, the Tibetans, Peruvians, and Ethiopians.


The origins of the Romani determined definitively

By Razib Khan | December 9, 2012 1:52 pm

In many cases there are questions of a historical and ethnographic nature which are subject to controversy and debate. Scholarly arguments are laid out, and further dispute ensues. For decades progress seems fleeting, as one hypothesis is accepted, only to be subject to later revision. This sort of pattern gives succor to the most cynical and jaded of the ‘Post Modern’ set, especially when the ‘discourse’ in question is in the domain of science.

But thankfully these debates can come to an end in some cases. So it is with the origins of the European Romani, better known as ‘Gypsies’ (though the Roma are the most well known of the Romani, other groups within Europe have different ethnonyms). Obviously many of the basic elements have long been there, but I think the most recent genetic work now establishes a level of closure. Taking a step back, what do we know?

1) The Romani language seems to be Indo-Aryan, with a likely affinity with the northwest group of Indo-Aryan languages

2) The Romani presence in Europe only dates to the past ~1,000 years, with an entry point in the Byzantine Empire

3) They are an admixture between an ancestral Indian element, and local populations

4) Their history of endogamy has resulted in a strong genetic drift effect

The two papers which seem to nail the coffin shut on these questions use somewhat different methodologies. One relies on Y chromosomal STRs (hypervariable repeat regions) to generate a paternal phylogeny. Focusing just on the paternal phylogeny allows for one to make very robust genealogical inferences. Additionally, the authors had a very large data set across India. Their goal was to ascertain the exact region of origin of the Romani before they left India. As noted in point #1 there is already some evidence from their language that this must be in northwest India. The second paper uses a SNP-chip; hundreds of thousands of autosomal markers. This has been done to death for other populations, so the method isn’t new. Rather, it is that it is now being applied to the Romani.

First, the Y chromosomal paper. The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations:

Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.


Two trees illustrate the results succinctly:

The bottom line:

– This particular Y chromosomal lineage which is highly diagnostic of South Asian origin in the Romani shows that the Romani seem to derive from the populations of northwest India

– Additionally, within these populations the Romani Y chromosomal lineages derive from the lower caste elements, the scheduled castes and scheduled tribes

But the above results don’t get directly at genome-wide admixture. The second paper does, using hundreds of thousands of markers to explore the Romani affinity to other populations. Reconstructing the Population History of European Romani from Genome-wide Data:

The Romani, the largest European minority group with approximately 11 million people…constitute a mosaic of languages, religions, and lifestyles while sharing a distinct social heritage. Linguistic…and genetic…studies have located the Romani origins in the Indian subcontinent. However, a genome-wide perspective on Romani origins and population substructure, as well as a detailed reconstruction of their demographic history, has yet to be provided. Our analyses based on genome-wide data from 13 Romani groups collected across Europe suggest that the Romani diaspora constitutes a single initial founder population that originated in north/northwestern India ∼1.5 thousand years ago (kya). Our results further indicate that after a rapid migration with moderate gene flow from the Near or Middle East, the European spread of the Romani people was via the Balkans starting ∼0.9 kya. The strong population substructure and high levels of homozygosity we found in the European Romani are in line with genetic isolation as well as differential gene flow in time and space with non-Romani Europeans. Overall, our genome-wide study sheds new light on the origins and demographic history of European Romani.

The plot to the left illustrates the relationship of the Romani to world-wide populations using multi-dimensional scaling, where genetic variation is decomposed into dimensions, and individuals are plotted on those dimensions. In short, the Romani exhibit a classic admixture cline pattern. That is, they are the products of a two-way admixture between populations which occupy distinct positions along a cline, and Romani individuals and populations are distributed along the cline in proportion to their admixture. One notable aspect is that the Romani are actually two clusters; one which manifests a strong ‘east’-‘west’ distribution, and another which seems located purely within the European cluster. The latter seems to be the Welsh Romani, who in the neighbor-joining tree (see the supplements) fall on the same branch as European populations, as opposed to the other Romani, who form their own clade.
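To make "distributed along the cline in proportion to their admixture" concrete, here is a small Python sketch of the geometry. It is not what the authors did. It assumes you already have two-dimensional MDS coordinates for an individual and for the centroids of the two parental clusters, and it reads off a rough admixture fraction by projecting the individual onto the line between the centroids. The function name and the coordinates are hypothetical, and with a heavily drifted population the real picture will be noisier than this.

```python
def admixture_fraction(individual, parent_a, parent_b):
    """Rough share of ancestry from parent_b, from position along the a-to-b line.

    All three arguments are (x, y) coordinates on the first two MDS dimensions.
    """
    ax, ay = parent_a
    bx, by = parent_b
    ix, iy = individual
    dx, dy = bx - ax, by - ay
    # Scalar projection of (individual - a) onto (b - a), as a fraction of |b - a|
    t = ((ix - ax) * dx + (iy - ay) * dy) / (dx * dx + dy * dy)
    # Clamp so that points beyond either cluster read as unadmixed
    return min(1.0, max(0.0, t))
```

With made-up centroids at `(0.0, 0.0)` and `(1.0, 0.0)`, an individual at `(0.4, 0.1)` would come out at 0.4 from the second population.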

To drill down further you need to ascertain admixture with a model-based clustering algorithm. Ergo, ADMIXTURE. I’ve reedited the figure to illustrate the salient points. In particular, it is clear that the Roma populations except the Welsh have significant South Asian ancestry. The question is how much? To answer this question you need to know the source population in South Asia. A peculiar aspect of this plot is that the Romani have very little of the green ancestral component, which happens to be modal in the Middle East (not shown). This element happens to be highly enriched in many Pakistani populations, but not necessarily northwest Indian ones. Nevertheless, the issue that leaves me suspicious of this particular finding is that many of the European populations, in particular those groups (e.g., Balkans) which may have admixed with the Romani, have this element to an extent not evident in one of their presumed ‘daughter’ populations. I wonder if perhaps the peculiarities of Romani inbreeding have skewed the allele frequency distribution so much that you get strangeness like this. I am not showing higher K’s because those break out with a Romani-cluster. Just like the Kalash-cluster this is to a great extent a feature of the long term endogamy of these communities. With high levels of drift the allele frequency of these groups moves into a very peculiar space in relation to their parental populations, but one must not become confused and assume that the Romani or Kalash are themselves appropriate independent clusters in the same way that Europeans or East Asians are.

Using various forms of admixture analysis the authors seem to conclude that the Balkan Romani are 30-50% South Asian. This seems in line with intuition. But that still leaves open the question of who those South Asians were. As I noted above the most thorough Y chromosomal data point to the lower caste elements of northwest India. What do the autosomes say?

I don’t want to get into the technical details of how they tested the models, but it seems that one of the likely parental populations to the Romani had a close relationship to the Meghwal, a scheduled caste from northwest India. In other words, the autosome results align very well with the Y chromosomal inferences. Additionally, the models tested imply that the Romani likely left South Asia ~1,000 years before the present, which aligns well with what is known from the historical record (though this is a case where I put much more stock in the historical record than inferences from population genetic models; look at the intervals).

Finally, there is the question of inbreeding. One aspect of the Romani genome that jumps out at you is that they have many long “runs-of-homozygosity” (ROH). This is totally expected, as decades of uniparental analyses suggested a great deal of population bottleneck events as the Romani spread throughout Europe. But the ROH patterns also unearth an interesting fact: some of the Balkan Romani clearly have recent European admixture, while the non-Balkan Romani had an initial period of admixture followed by endogamy. The latter scenario seems to resemble Ashkenazi Jews, while the former would suggest that the boundary between Romani and non-Romani in the Balkans is more fluid than is sometimes portrayed.
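For readers who want to see what a run of homozygosity looks like in data, here is a deliberately naive Python sketch. It is not the paper's method. Real ROH callers work in physical distance and tolerate the odd genotyping error, whereas this one simply scans an ordered list of two-letter genotype calls and reports stretches of consecutive homozygous markers. The function name and the `min_length` threshold are illustrative assumptions.

```python
def runs_of_homozygosity(genotypes, min_length=5):
    """Return (start, end) index pairs for runs of consecutive homozygous calls.

    genotypes: two-letter calls such as "AA" or "AG", ordered along a chromosome.
    Anything that is not a clean A/C/G/T homozygote (including no-calls) ends a run.
    """
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        homozygous = len(call) == 2 and call[0] == call[1] and call[0] in "ACGT"
        if homozygous:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the chromosome
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs
```

An individual from a bottlenecked, endogamous population would be expected to return more and longer runs than someone from a large outbred one.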

So there we have it. The Romani derive from lower caste populations from the northwest Indian subcontinent who seem to have left ~1,000 years ago. Over time they admixed with local populations, and are now 50-70% non-South Asian, with some groups being ~90% European (e.g., Welsh Romani). And, they have a long history as an endogamous group, judging by their inbreeding.

Northern Europeans and Native Americans are not more closely related than previously thought

By Razib Khan | December 1, 2012 3:38 pm

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS


The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We are both ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
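The same contrast can be put in a few lines of Python. This is a toy with invented data, not anything exported from 23andMe. Each individual is represented by two strings, one per copy of the chromosome, with "A" for an Asian window and "E" for a European one. The function reports the overall Asian fraction and the number of distinct ancestry tracts. The function name and both paintings are hypothetical.

```python
from itertools import groupby


def summarize_painting(haplotypes):
    """Return (fraction of 'A' windows, number of ancestry tracts) for one person.

    haplotypes: strings of 'A'/'E' labels, one string per chromosome copy.
    """
    total = sum(len(h) for h in haplotypes)
    asian = sum(h.count("A") for h in haplotypes)
    # groupby yields one group per unbroken stretch of the same label
    tracts = sum(len(list(groupby(h))) for h in haplotypes)
    return asian / total, tracts


# First-generation mix: one copy entirely from each parent
recent = ("A" * 20, "E" * 20)
# Old admixture: the same proportions, chopped up by many generations of recombination
ancient = ("AAEEAEEAAAEEAEAAEEAE", "EAAEEEAAEAEEAAEAEEAA")
```

Both `summarize_painting(recent)` and `summarize_painting(ancient)` give the same ancestry fraction, but the number of tracts is very different, which is exactly the information a bare percentage throws away.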

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.

Reflections on the evolution at ASHG 2012

By Razib Khan | November 11, 2012 1:54 pm

As most readers know I was at ASHG 2012. I’m going to divide this post in half. First, the generalities of the meeting. And second, specific posters, etc.

Generalities:

– Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!

– Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.

– I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.

– Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian’ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.

– Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).

– If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the likely probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).


Inflammatory bowel syndrome is nature’s side effect

By Razib Khan | November 4, 2012 9:23 pm

Last week Luke Jostins (soon to be Dr. Luke Jostins) published an interesting paper in Nature. To be fair, this paper has an extensive author list, but from what I understand this is the fruit of the first author’s Ph.D. project. In any case, you may know Luke because I have used his loess curve on hominin encephalization for years. His bread & butter is statistical genetics, and it shows in this Nature paper. God knows how he managed to cram so much density into ~5.5 pages of plain text. Luke is also a contributor to Genomes Unzipped, and has put up a post over there on one implication of the paper, Dozens of new IBD genes, but can they predict disease? The short answer is that for individual prediction complex traits are going to be a hard haul over the long term.*

They are subject to what Jim Manzi would term “high causal density.” A simple way to state this is that outcome X is dependent on a host of variables, and if you capture only a small number of variables, you aren’t going to be explaining much in a general fashion. This is obvious from the text of Luke’s paper. Let’s look at the abstract, Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease:


Buddy can you spare a selective sweep

By Razib Khan | October 21, 2012 12:52 pm

The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.

As you may know when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:

Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.

The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you cannot ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
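The parenthetical about extinction is worth seeing in numbers. Below is a minimal Wright-Fisher sketch in Python. It is my own illustration and not the inference method of the paper. It assumes a constant population of N diploids, a simple genic selection model in which each copy of the favored allele has relative fitness 1 + s, and binomial sampling of 2N gene copies every generation. The function names and default values are hypothetical. Setting `start_copies` to 1 mimics a de novo mutation at frequency 1/(2N), and a larger value mimics selection acting on standing variation.

```python
import random


def allele_fixes(n_diploid, s, rng, start_copies=1, max_generations=100000):
    """Follow a favored allele until it is lost or fixed; return True if it fixes."""
    two_n = 2 * n_diploid
    count = start_copies
    for _ in range(max_generations):
        if count == 0:
            return False
        if count == two_n:
            return True
        p = count / two_n
        # Selection: expected frequency after weighting the allele by fitness 1 + s
        p_selected = p * (1 + s) / (1 + p * s)
        # Drift: binomial sampling of 2N gene copies for the next generation
        count = sum(1 for _ in range(two_n) if rng.random() < p_selected)
    return count == two_n


def fixation_fraction(replicates=500, n_diploid=100, s=0.05, start_copies=1, seed=1):
    """Fraction of independent replicates in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(allele_fixes(n_diploid, s, rng, start_copies) for _ in range(replicates))
    return fixed / replicates
```

Even with a healthy selective advantage, the large majority of single-copy mutations are lost in the first few generations. The classic approximation for the fixation probability of a new beneficial mutation is about 2s, so roughly one in ten for the default s here. Raising `start_copies` shows how much better an allele does when it is already present at appreciable frequency when selection begins.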

Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):


Introgressing toward becoming rice

By Razib Khan | October 4, 2012 12:24 am

Rice is a pretty big deal. There’s really no need to justify research on this crop. It feeds literally billions, so the funding will always flow. Would that we knew rice as well as we know C. elegans. After yesterday’s travesty of a paper on barley I thought that readers might find a new paper in Nature, A map of rice genome variation reveals the origin of cultivated rice, more interesting and illuminating. The authors used genomic sequencing, of varied coverage (i.e., very deep, repeated, and therefore accurate coverage vs. a single pass which is a very rough draft), to assess the relationship between Asian wild rice and two of the dominant domestic cultivars, indica (long-grain paddy rice) and japonica (short-grain dry cultivation rice). Presumably the two cultivars derive from a wild ancestor, but the details are still being hashed out.


Signal of Indo-Aryan admixture in South Indian Brahmins

By Razib Khan | September 29, 2012 12:59 am

I’ve mentioned a few times that the Reich lab has been finding suggestive evidence for admixture between indigenous South Asians and a West Eurasian group on the order of ~3,000 years before the present. The modal explanation is probably an Indo-Aryan intrusion. Dienekes used rolloff in ADMIXTOOLS to repeat these general findings. Specifically, he found signal for an admixture event analogous to one between non-Brahmin South Indians and Northern Europeans. I say analogous because I do not mean to imply that the admixture was exactly of this form. Rather, there are general resemblances in the genetic profiles across the four groups (i.e., Orcadian & North Kannadi, and population X and Y which merged to form South Indian Brahmins).


The Bushmen tell us a lot about human evolution because they are humans who have evolved

By Razib Khan | September 21, 2012 12:32 am

When it comes to the human genetics of the Khoe-San there’s a little that’s stale and unoriginal for me in terms of presentation. The elements are always composed the same. The Bushmen are the “most ancient” humans, who can tell us something about “our past,” about “our evolution.” Tried & tested banalities just bubble forth unbidden. I have no idea why. There’s a new paper in Science on the genetics of the Khoe-San, which includes Bushmen, which brought to mind this issue for me because of the outrageous nature of the press releases.

The title of the paper itself is a testament to vanilla, Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. This is absolutely not surprising. Are you shocked that the Khoe-San have adaptations? Or that African history is complex? The wonder of it all! This paper actually revisits much of the same ground as Pickrell et al.’s originally titled The genetic prehistory of southern Africa. Before Dr. Pickrell executes throw-down on me on Twitter let me concede that I have no creative ideas to offer in terms of an alternative title. Rather, I have an idea: perhaps in the future scientists could explore the evolutionary genetic basis for steatopygia? The trait is not limited just to Khoe-San, my distant cousins the Andaman Islanders also exhibit it. Perhaps this is the ancestral state of the human lineage? This is a situation where the titles just write themselves!


Not all genes are created the same

By Razib Khan | August 28, 2012 11:52 pm

The map to the right shows the frequencies of HGDP populations on SLC45A2, which is a locus that has been implicated in skin color variation in humans. It’s for the SNP rs16891982, and I yanked the figure from IrisPlex: A sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Brown represents the genotype CC, green CG, and blue, GG. Europeans who have olive skin often carry the minor allele, C. While SLC24A5 is really bad at distinguishing West Eurasians from each other, SLC45A2 is better. Though both are fixed in Northern Europe, the former stays operationally fixed in frequency outside of Europe, in the Near East. As I stated earlier the proportions of the ancestral SNP in the Middle Eastern populations in the HGDP seem to be easily explained by the Sub-Saharan admixture you can find in these groups.

In contrast major SNPs in SLC45A2 are closer to disjoint between Europeans and South Asians. For example I’m a homozygote for the C allele. And yet even here we need to be careful. I want in particular to draw your attention to the frequencies in the Middle Eastern populations, the Sardinians, and the Kalash of Pakistan.

The Kalash, and their Nuristani cousins, have often been observed to have “European” physical features. These populations even trade in legends of descent from the Macedonians of Alexander. And the genetics here shows why. Though the Kalash are far more closely related to other Northwest South Asians than to Europeans, on the subset of genes which are implicated in pigmentation many of them could actually “pass” for Europeans. In fact, it is interesting to me that by these measures the Sardinians are no more European than groups like the Kalash and the Druze (in contrast to the total genome, where Sardinians may be the best reference for Western Europeans). They have a lower frequency of the SNP strongly associated with blue eyes than either of these groups, for example.

In the above paper they also produced a chart which illustrated the relationships of HGDP populations as a measure only of the six SNPs they used in their prediction method. These are markers which distinguish blue and brown eye color in Europeans efficiently.

Read More

Consent and genomics

By Razib Khan | August 26, 2012 5:15 pm

Interesting story in The New York Times, Genes Now Tell Doctors Secrets They Can’t Utter:

One of the first cases came a decade ago, just as the new age of genetics was beginning. A young woman with a strong family history of breast and ovarian cancer enrolled in a study trying to find cancer genes that, when mutated, greatly increase the risk of breast cancer. But the woman, terrified by her family history, also intended to have her breasts removed prophylactically.

Her consent form said she would not be contacted by the researchers. Consent forms are typically written this way because the purpose of such studies is not to provide medical care but to gain new insights. The researchers are not the patients’ doctors.

But in this case, the researchers happened to know about the woman’s plan, and they also knew that their study indicated that she did not have her family’s breast cancer gene. They were horrified.

“We couldn’t sit back and let this woman have her healthy breasts cut off,” said Barbara B. Biesecker, the director of the genetic counseling program at the National Human Genome Research Institute, part of the National Institutes of Health. After consulting the university’s lawyer and ethics committee, the researchers decided they had to breach the consent stipulations and offer the results to the young woman and anyone else in her family who wanted to know if they were likely to have the gene mutation discovered in the study. The entire family — about a dozen people — wanted to know. One by one, they went into a room to be told their result.

“It was a heavy and intense experience,” Dr. Biesecker recalled.

Around the same time, Dr. Gail Jarvik, now a professor of medicine and genome science at the University of Washington, had a similar experience. But her story had a very different ending.

She was an investigator in a study of genes unrelated to breast cancer when the study researchers noticed that members of one family had a breast cancer gene. But because the consent form, which was not from the University of Washington, said no results would be returned, the investigators never told them, arguing that their hands were tied. The researchers said an ethics board — not they — made the rules.

Dr. Jarvik argued that they should have tried to persuade the ethics board. But, she said, “I did not hold sway.”

Read More

Cultures & genes: Paleolithic to the Neolithic

By Razib Khan | August 16, 2012 11:23 pm

Period | Spatial linguistic variation | Spatial genetic variation | Temporal linguistic variation | Temporal genetic variation
Paleolithic | Very high | High | Moderate-to-high | Moderate-to-low
Neolithic | Moderate | Moderate-to-low | Moderate | High
Bronze Age | Moderate-to-low | Low | Moderate | Moderate-to-high
Iron Age | Low | Low | Moderate-to-low | Moderate
Modern Age | Very low | Low | Low | Moderate-to-low

In the comments below I posited a scenario to explain a strange inference from a paper from a few years back, Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude:

Population historical models were estimated (8) from the two-dimensional frequency spectrum of synonymous sites in the two populations. The best-fitting model suggested that the Tibetan and Han populations diverged 2750 years ago, with the Han population growing from a small initial size and the Tibetan population contracting from a large initial size (fig. S2). Migration was inferred from the Tibetan to the Han sample, with recent admixture in the opposite direction.

2,750 years would place the divergence of modern Tibetans and Chinese a few hundred years before Confucius. In fact, it would technically post-date the first historically attested Chinese writing, from the Shang dynasty. This result was pretty incredible, though one of the main authors believes it is a reasonable estimate. There are many ways you can explain this sort of divergence time, but one way which I elucidated below is rather simple. Imagine, if you will, a large set of populations which are culturally very distinct, but engage in gene flow with each other. This is not a preposterous scenario. Because of the restrictions on the manner in which genes are inherited, and the flexibility of cultural traits in terms of transmission, you often have situations where change in allele frequency is clinal while change in culture is punctuated. To give a concrete example, moving along a transect on the North European plain will result in a gradual change in allele frequencies, but a crisper shift in languages spoken. The two are not totally distinct. Allele frequencies will tend to shift more at language boundaries, but whereas most of the difference is between groups across languages, in relation to genes usually the differences are within groups.

Read More

CATEGORIZED UNDER: Anthropology, History
MORE ABOUT: Genetics, Genomics

Not all homozygosity is created the same (way)

By Razib Khan | August 15, 2012 11:39 pm

Browsing the most recent issue of The American Journal of Human Genetics I stumbled upon a paper with some neat figures, Genomic Patterns of Homozygosity in Worldwide Human Populations. More specifically they focused on patterns of “runs of homozygosity” (ROH), that is, sequences of the genome which exhibited a strong bias toward homozygous SNPs. The figure above illustrates a pooled set of populations with individual variation in total length of ROH aggregated from three classes, short, medium, and long ROHs. The small and medium length ROH exhibit the pattern of increasing total ROH as a function of distance from Africa. But not so with the large ROH. Why?
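For readers who want a feel for what a run of homozygosity looks like in raw genotype calls, here is a deliberately naive sketch. It is not the method used in the paper, which is not described in this excerpt; it simply scans for stretches of consecutive homozygous SNPs. The function names, the `min_snps` threshold, and the toy genotypes and positions are all assumptions made for illustration.

```python
def homozygous_runs(genotypes, min_snps=3):
    """Return (start, end) index pairs, end exclusive, for each stretch of
    at least `min_snps` consecutive homozygous calls."""
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        # A call such as "AA" is homozygous; "AG" is not. In this sketch a
        # no-call such as "--" also breaks the run.
        is_hom = len(g) == 2 and g[0] == g[1] and g[0] in "ACGT"
        if is_hom:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


def total_run_length(positions, runs):
    """Sum the base-pair span (first to last SNP) of each run."""
    return sum(positions[end - 1] - positions[start] for start, end in runs)


# Toy data: eleven SNPs spaced 100 bp apart on one chromosome.
genotypes = ["AA", "GG", "CC", "AG", "TT", "TT", "CT", "GG", "GG", "GG", "AA"]
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]

runs = homozygous_runs(genotypes)
print(runs)                               # [(0, 3), (7, 11)]
print(total_run_length(positions, runs))  # 500
```

Binning runs into short, medium, and long classes would then be a matter of comparing each run's span against length cutoffs; real analyses also tolerate occasional heterozygous calls from genotyping error, which this sketch does not.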

Read More

CATEGORIZED UNDER: Genomics, Human Genomics

Azores to Atlantis: Africa through the shadows

By Razib Khan | July 27, 2012 9:55 pm

In many ways the image of Africa in the minds of Westerners has become a trope. The “Dark Continent,” eternal, and primal. Like many tropes the realized existence of this Africa is only within the imagination. The real Africa is far different. For there is no real Africa, there are Africas. This truth is on my mind this week as two papers of great importance in understanding African genetic history finally saw the light of day. First, Dr. Joseph Pickrell et al. posted their preprint, The genetic prehistory of southern Africa, to arXiv. Second, out of the Tishkoff lab came Evolutionary History and Adaptation from High-Coverage Whole-Genome Sequences of Diverse African Hunter-Gatherers. Let me step aside here and observe a secondary, but non-trivial, detail. The former is an open access preprint. The second is a complete paper published in a relatively high impact journal, Cell, for which the paper itself does not seem typical or appropriate. This is fair enough, most people do not read journals front to back in this day. But unlike Dr. Joseph Pickrell’s paper the paper in Cell is paywalled, and from what I can tell you cannot obtain the supporting information without getting beyond the gate! So if you need that paper, email me and I will send it onward (I would just post it on a server, but I’ve gotten nasty emails from the legal departments of publishers, so I am wary of doing that).

Read More

MORE ABOUT: Africa, Genetics, Genomics

Inbred shorter people

By Razib Khan | July 20, 2012 12:10 am

Evidence of Inbreeding Depression on Human Height, a paper with over 1,000 authors! (I exaggerate) It’s interesting because it seems to establish that inbreeding does have a deleterious effect on traits whose genetic architecture is presumably polygenic and additive. Why is this theoretically important? Because inbreeding depression is often assumed to be driven by the exposure of rare recessive larger effect alleles, which recombine in near relations. Using tens of thousands of individuals from across a dozen European nations the authors found that there is a consistent relationship between inbreeding and reduction in height.

As the authors note height is a convenient trait to explore. First, it’s highly heritable. 80 to 90 percent of the variation in the population is explained by variation in genes. Second, it’s easy to measure. Also, implicit in the paper is the fact that in Europe today there is far less of an environmental effect on height (that’s why the heritability value is high). Even in poor European nations most people have enough to eat, so height is highly heritable, allowing for appropriate cross-national comparison.

Read More

CATEGORIZED UNDER: Genetics, Genomics
MORE ABOUT: Genomics, Height

The first, second, and third nations

By Razib Khan | July 11, 2012 10:47 pm

By now you’ve probably read about the paper which reports that there seem to have been three waves of humans migrating into the New World prior to the arrival of Europeans. A major aspect of this result is that it does not emerge out of a vacuum, but rather comes close to settling an old question in linguistics. The late Joseph Greenberg generated a series of audacious phylogenies of languages of the world. Greenberg’s attempts received mixed reviews. It seems that there is little controversy about some of his classifications of African languages, but linguists of American native dialects rejected his division of the languages of the New World into three broad families, Eskimo-Aleut, Na-Dene, and Amerind. Eskimo-Aleut is rather self-evident. Na-Dene encompasses a group of languages in northwest North America, along with some significant outliers such as Navajo. Amerind seems to roughly be a grab-bag of everything else. The linguistic trichotomy also lent itself to a narrative of three migrations. L. L. Cavalli-Sforza gave his support to Greenberg’s framework in The History and Geography of Human Genes, and it seems most non-linguists are particularly congenial toward his tendency of ‘lumping.’ In contrast, linguists remain more skeptical ‘splitters,’ at least those who have a more ethnographic disciplinary bent. Geneticists have not always supported Greenberg’s suppositions. For example, many of the members of the same group which authored this paper implicitly put the kibosh on the attempt to construct a unified linguistic family which spanned the Andaman Islanders and the Papuans.

The method of the paper was relatively straightforward, assuming you are already somewhat familiar with the statistical genetic esoterica which was unveiled a few years ago by this group and others. Basically you take genetic data in the form of hundreds of thousands of SNPs, and you test the patterns of variation in that data across populations against explicit models of demographic history, represented visually by phylogenetic trees. You can see here that the sampling was relatively thick, except for the United States. Chalk this up to politics. I’ve been hearing about this particular problem in relation to this paper for over a year now. Not having asked any of the members of the group directly I obviously am going off hearsay, but the lack of American samples is most definitely not a feature. It’s a bug. In the supplement they also note that they couldn’t get Na-Dene data from another research group. Almost certainly that’s because of bioethical issues and legal contractual constraints.

Despite all this drama, the science isn’t too hard to understand. Aside from the nifty statistics one problem is that many of these native groups have European and African admixture, but there are workarounds to that (e.g., just pull out genomic segments which are indigenous, and use those). The outcome is neatly visualized in the figure below:

Read More

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!