By Razib Khan | June 28, 2011 1:04 am


In the comments below Antonio pointed me to this working paper, What Do DNA Ancestry Tests Reveal About Americans’ Identity? Examining Public Opinion on Race and Genomics. I am perhaps being a bit dull, but I can’t figure out where its latest version is found online (I stumbled upon what looks like another working paper version on one of the authors’ websites). Here’s the abstract:

Genomics research will soon have a deep impact on many aspects of our lives, but its political implications and associations remain undeveloped. Our broad goal in this research project is to analyze what Americans are learning about genomic science, and how they are responding to this new and potentially fraught technology.

We pursue that goal here by focusing on one arena of the genomics revolution — its relationship to racial and ethnic identity. Genomic ancestry testing may either blur racial boundaries by showing them to be indistinct or mixed, or reify racial boundaries by revealing ancestral homogeneity or pointing toward a particular geographic area or group as likely forebears. Some tests, or some contexts, may permit both outcomes. In parallel fashion, genomic information about race can emphasize its malleability and social constructedness or its possible biological bases. We posit that what information individuals choose to obtain, and how they respond to genomic information about racial ancestry will depend in part on their own racial or ethnic identity.

We evaluate these hypotheses in three ways. The first is a public opinion survey including vignettes about hypothetical individuals who received contrasting DNA test results. Second is an automated content analysis of about 5,500 newspaper articles that focused on race-related genomics research. Finally, we perform a finer-grained, hand-coded, content analysis of about 700 articles profiling people who took DNA ancestry tests.

Three major findings parallel the three empirical analyses. First, most respondents find the results of DNA ancestry tests persuasive, but blacks and whites have very different emotional responses and effects on their racial identity. Asians and Hispanics range between those two poles, while multiracials show a distinct pattern of reaction. Second, newspaper articles do more to teach the American reading public that race has a genetic component than that race is a purely social construction. Third, African Americans are disproportionately likely to react with displeasure to tests that imply a blurring of racial classifications. The paper concludes with a discussion, outline of next steps, and observations about the significance of genomics for political science and politics.

A mismeasured Mismeasurement of Man

By Razib Khan | June 8, 2011 2:32 am

I would say The Mismeasure of Man is one of the most commonly cited books on this weblog over the years (in the comments). It comes close to being a “proof-text” in many arguments online, because of the authority and eminence in the public mind of its author, Stephen Jay Gould. I am in general not particularly a fan of Gould’s work or thought, with many of my sentiments matching the attitudes of Paul Krugman in this 1996 essay:

….Like most American intellectuals, I first learned about this subject [evolutionary biology] from the writings of Stephen Jay Gould. But I eventually came to realize that working biologists regard Gould much the same way that economists regard Robert Reich: talented writer, too bad he never gets anything right. Serious evolutionary theorists such as John Maynard Smith or William Hamilton, like serious economists, think largely in terms of mathematical models. Indeed, the introduction to Maynard Smith’s classic tract Evolutionary Genetics flatly declares, “If you can’t stand algebra, stay away from evolutionary biology.” There is a core set of crucial ideas in his subject that, because they involve the interaction of several different factors, can only be clearly understood by someone willing to sit still for a bit of math. (Try to give a purely verbal description of the reactions among three mutually catalytic chemicals.)

But many intellectuals who can’t stand algebra are not willing to stay away from the subject. They are thus deeply attracted to a graceful writer like Gould, who frequently misrepresents the field (perhaps because he does not fully understand its essentially mathematical logic), but who wraps his misrepresentations in so many layers of impressive, if irrelevant, historical and literary erudition that they seem profound.

Yes, I am aware that some biologists would disagree with this assessment of Gould’s relevance. But I remain generally skeptical of his arguments, though over the years I have become more accepting of the necessity of openness to a sense of ‘pluralism’ when it comes to the forces which shape evolutionary processes. And certainly there is interesting exposition in a book like The Structure of Evolutionary Theory, but there was no need for ~1500 pages (Brian Switek did fine with a little over ~300 pages in covering similar territory as the first half of the book). Whatever valid positions Gould staked out in opposition to excessive adaptationist thinking on the part of the neo-Darwinian orthodoxy of the mid-20th century, his penchant for self-marketing and repackaging of plausible but not particularly novel concepts was often destructive in my experience to the enterprise of a greater public understanding of science.

When I was in 8th grade my earth science teacher explained to the class proudly that he was not a “Darwinian”; rather, he accepted punctuated equilibrium. One must understand that much of his audience was Creationist in sympathy because of the demographics of the region, but I was frankly appalled by his explicit verbal rejection of “Darwinism,” because I knew how the others would take it (my best friend in the class was a Creationist and he kept chuckling about “monkeys turning into men” throughout the whole period). I remained after class to further explore this issue with my teacher. I expressed my bewilderment as best as I could, and it came to pass that my teacher explained that he had arrived at his skepticism of the rejected model of Darwinism via the works of Stephen Jay Gould. With his silver tongue Gould had convinced him that the future of evolutionary science lay with punctuated equilibrium, which had already overthrown the older order. A 13 year old can only go so far, and so I moved on.

The coincidental intersection of sociology & genetics

By Razib Khan | April 20, 2011 11:57 pm

Hispanic – Definitions in the United States:

The 1970 Census was the first time that a “Hispanic” identifier was used and data collected with the question. The definition of “Hispanic” has been modified in each successive census. The 2000 Census asked if the person was “Spanish/Hispanic/Latino”.

The U.S. Office of Management and Budget currently defines “Hispanic or Latino” as “a person of Mexican, Puerto Rican, Cuban, South or Central American, or other Spanish culture or origin, regardless of race.”

Because Hispanics can be any race, you need to look at their own self-identification. The breakdowns as per the American census are that somewhat over 50% of American Hispanics/Latinos identify as white, most of the rest as “some other race,” with a small minority as black, Native American, etc.

This came to mind when I saw this paper in BMC Genetics, Comparing self-reported ethnicity to genetic background measures in the context of the Multi-Ethnic Study of Atherosclerosis (MESA). The issue is that when you’re doing association studies between genes and diseases you want to control for population structure. For example, if disease X is found in Chinese Americans to a higher degree than the general population, then all the alleles distinctive to Chinese Americans would correlate with disease X in an aggregated pool. Self-reports are pretty good, but on the margin there is now some juice to squeeze out of the data sets by using ancestrally informative markers to “clean up” the outliers within the populations.
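To make the stratification problem concrete, here is a minimal sketch in Python. All of the numbers are invented for illustration and none come from the MESA paper: a hypothetical minority population of 1,000 people in which 80% carry an allele and 20% have disease X, and a majority population of 9,000 in which 20% carry the allele and 5% have the disease. Within each population the allele and the disease are independent by construction, so any association in the pooled sample comes purely from population structure.

```python
def expected_table(n, carrier_freq, disease_rate):
    """Expected 2x2 counts when carrier status and disease are independent.

    Returns (carriers affected, carriers unaffected,
             non-carriers affected, non-carriers unaffected).
    """
    return (
        n * carrier_freq * disease_rate,
        n * carrier_freq * (1 - disease_rate),
        n * (1 - carrier_freq) * disease_rate,
        n * (1 - carrier_freq) * (1 - disease_rate),
    )


def odds_ratio(table):
    """Odds ratio of disease for carriers versus non-carriers."""
    a, b, c, d = table
    return (a * d) / (b * c)


# Hypothetical populations (illustrative numbers only).
minority = expected_table(1000, carrier_freq=0.8, disease_rate=0.20)
majority = expected_table(9000, carrier_freq=0.2, disease_rate=0.05)

# Pool the two populations while ignoring ancestry.
pooled = tuple(x + y for x, y in zip(minority, majority))

print("minority odds ratio:", round(odds_ratio(minority), 3))
print("majority odds ratio:", round(odds_ratio(majority), 3))
print("pooled odds ratio:  ", round(odds_ratio(pooled), 3))
```

Under these assumptions the odds ratio is 1 within each population but about 1.86 in the pooled table, even though the allele has no effect on the disease. Stratifying by ancestry, whether self-reported or inferred from markers, removes the spurious signal.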

Here are the results:

Four clusters are identified using 96 ancestry informative markers. Three of these clusters are well delineated, but 30% of the self-reported Hispanic-Americans are misclassified. We also found that MESA SRE provides type I error rates that are consistent with the nominal levels. More extensive simulations revealed that this finding is likely due to the multi-ethnic nature of the MESA. Finally, we describe situations where SRE may perform as well as a GBMA in controlling the effect of population stratification and admixture in association tests.

Below is a principal component analysis plot which illustrates the largest dimensions of genetic variation in their data set for the individuals from four different populations, African Americans, European Americans, Hispanic Americans, and Chinese Americans. I thought of the above census results when I saw the distributions on the plot:

Why race will matter after we all get our full sequences

By Razib Khan | February 9, 2011 10:55 am

In my post “Health care costs and ancestry”, a commenter says:
“Race” is a concept that should have died with disco. I imagine it will soon be feasible for every patient to have their genome analysis included in their medical file and the various risk and other pertinent factors explicated.

The chart to the left shows how race is a social construct. It’s a bar plot which partitions ancestry, and as you can see, the Asian children are a mix of European and Asian. How does that happen? Because in 1980 the US Census included people of South Asian origin as “Asian Americans.” In contrast, those of Middle Eastern origin remain “non-Hispanic white” (this is not totally crazy, think Ralph Nader or Marlo Thomas). But it means that an ethnic Baloch from Pakistan is “Asian,” and an ethnic Baloch from Iran is a “non-Hispanic white.”

Health care costs and ancestry

By Razib Khan | February 8, 2011 1:07 am

The Pith: In this post I examine the relationship between racial ancestry and cancer mortality risks conditioned on particular courses of treatment. I review research which indicates that the amount of Native American ancestry can be a very important signal as to your response to treatment if you have leukemia, as measured by probability of relapse.

If you are an engaged patient who has been prescribed medication I assume you’ve done your due diligence and double-checked your doctor’s recommendations (no, unfortunately an M.D. does not mean that an individual is omniscient). Several times when I’ve been prescribed a medication I have seen a note about different recommended dosages by race when I did further research. Because of my own personal background I am curious when it says “Asian.” The problem with this term in medical literature is that “Asian” in the American context is derived from a Census category constructed in 1980 for bureaucratic and political purposes. It amalgamates populations which are genetically relatively close, East and Southeast Asians, with more distant ones, South Asians (when my siblings were born I remember that my parents listed their race as “Asian” when they filled out paper work for the hospital).

But at least the issues with an “Asian” category are clear. Consider the “Hispanic/Latino” category. In the USA this term also became popular through government fiat around 1970, as a catchall for people whose ancestry derives from the Spanish speaking Americas, with Spaniards, Portuguese, and Brazilians being border-line cases. Additionally, it has become relatively common in the general American culture to code Hispanic as non-white. This despite the fact that all Latin American populations have large self-identified white populations, with some, such as Argentina and Uruguay, being overwhelmingly white. In the USA between 54% and 92% of Hispanics identify as white in terms of their race. The discrepancy arises because some surveys allow for the “Some other race” option, which is the second most popular choice. Surveys which force respondents into a few categories such as white, black, Native American or Asian produce a result where Hispanics default to a white self-identification.

Implicitly we know it’s more complicated than this mishmash of bureaucratic convenience and opportunistic American identity politics. The HapMap has a Mexican American sample from Los Angeles. Above you see K = 3 in ADMIXTURE for Mexican Americans. Each thin “slice” is an individual, with the color proportions reflective of genomic contributions of one of three putative ancestral groups. The full plot had Europeans and Chinese as well. Blue seems to correspond with Native American, and red with white European (the green residual is modal in East Asians). Los Angeles’ Mexican American community is obviously mixed-race, what in Latin America might be termed mestizo. And yet according to the survey data when forced to choose this community seems to affiliate with a white Spanish identity, blanco. Seeing as almost all of them are Spanish speaking and not indigenous (I am aware that the USA has a small and growing non-Spanish speaking Latino population of indigenous immigrants), this would make sense. But another facet of Mexican American identity surfaces in the concept of Aztlán, which is a nod to the Nahua roots of much of the Mexican population.

But whatever the cultural nuance and subtlety, which can be decomposed at length, it is also important to properly characterize the genetic structure of the Hispanic populations. Some Mexican Americans are predominantly white European in ancestry, and some are predominantly Amerindian. Many are mixed in roughly equal proportions. This is not just a minor detail. Going back to my first paragraph, a new letter to Nature Genetics reports on the differential response to treatment in children with leukemia proportional to Native American ancestry. Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia:

Ancestry in the Americas

By Razib Khan | January 10, 2011 2:19 am

The populations of the African Diaspora have a particular interest in the new genomics, and its relationship to ancestry. Unlike other post-Columbian Diasporas they have sketchy, at best, knowledge of the regions from which their ancestors arrived. This probably explains the popularity of Roots and Henry Louis Gates Jr.’s various genealogical projects which have utilized cutting edge genomics. It may seem silly to hang one’s hat on one maternal lineage, but perhaps it seems silly if you are relatively assured of the broad outlines of your own genealogy. The fact that I am U2b is not very interesting to me, but I also happen to know that my maternal grandmother’s mother’s family were long resident in their region of Bengal (and, that her father was a migrant from northwest India). It would be a different matter if my ancestors had been enslaved and dispossessed of their heritage.

A new paper in PLoS ONE surveys the paternal (NRY), maternal (mtDNA), and autosomal (using 175 ancestrally informative markers) heritage of a range of African-origin populations from across the Americas. Dissecting the Within-Africa Ancestry of Populations of African Descent in the Americas:

Our analysis revealed that both continental admixture and within-Africa admixture may be critical to achieving an adequate understanding of the ancestry of African-descended Americans. While continental ancestry reflects gender-specific admixture processes influenced by different socio-historical practices in the Americas, the within-Africa maternal ancestry reflects the diverse colonial histories of the slave trade. We have confirmed that there is a genetic thread connecting Africa and the Americas, where each colonial system supplied their colonies in the Americas with slaves from African colonies they controlled or that were available for them at the time. This historical connection is reflected in different relative contributions from populations of W/WC/SW/SE Africa to geographically distinct Africa-derived populations of the Americas, adding to the complexity of genomic ancestry in groups ostensibly united by the same demographic label.

There isn’t anything too surprising here. Blacks from Brazil have much more ancestry from the former Portuguese colonies of Angola and Mozambique. As we should expect. Because the New World African Diaspora dates to only the past 350-150 years even mtDNA should be a good snapshot of the genetic variation. And, because of the ability to construct clean genealogies due to lack of recombination, mtDNA can be even more informative than total genome surveys in terms of elucidating fine-grained geographical patterns. The map below illustrates the mtDNA results well:

To classify humanity is not that hard

By Razib Khan | December 14, 2010 12:22 pm

In my post below I quoted my interview with L. L. Cavalli-Sforza because I think it gets to the heart of some confusions which have emerged since the finding that most variation on any given locus is found within populations, rather than between them. The standard figure is that 85% of genetic variance is within continental races, and 15% is between them. You can see some Fst values on Wikipedia to get an intuition. Concretely, at a given locus X in population 1 the frequency of allele A may be 40%, while in population 2 it may be 45%. Obviously the populations differ, but the small difference is not going to be very informative of population substructure when most of the difference is within populations.
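As a rough worked example of the 40% versus 45% case, the sketch below computes a simple two-population Fst from expected heterozygosity. It assumes a biallelic locus, two equally sized populations, and Hardy-Weinberg proportions; the function names are my own for illustration, and real estimators correct for sample size and other details that this ignores.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_total - H_within) / H_total for two equally sized populations."""
    p_mean = (p1 + p2) / 2
    h_total = heterozygosity(p_mean)
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2
    return (h_total - h_within) / h_total


# Allele A at 40% in population 1 and 45% in population 2.
print(round(fst_two_populations(0.40, 0.45), 4))
```

For these frequencies the between-population share of the variance comes out at roughly 0.26%, far below the genome-wide average of about 15%, which is why a single locus like this tells you almost nothing about which population an individual comes from.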

But there are loci which are much more informative. Interestingly, one controls variation on a trait which you are familiar with, skin color (unless you happen to lack vision). A large fraction (on the order of 25-40%) of the between population variance in the complexion of Africans and Europeans can be predicted by substitution on one SNP in the gene SLC24A5. The substitution has a major phenotypic effect, and, exhibits a great deal of between population variation. One variant is nearly fixed in Europeans, and another is nearly fixed in Africans. In other words the component of genetic variance on this trait that is between population is nearly 100%, not 15%. This illustrates that the 15% value was an average across the genome, and in fact there are significant differences on the genetic level which can be ancestrally informative. You can take this to the next level: increase the number of ancestrally informative markers to obtain a fine-grained picture of population structure. In the illustration above the top panel shows the frequencies at the SNP mentioned earlier on SLC24A5. The second panel shows variation at another SNP controlling skin color, SLC45A2. This second SNP is useful in separating South and Central Asians from Europeans and Middle Easterners, if not perfectly so. In other words, the more markers you have, the better your resolution of inter-population difference. This is why I found the following comment very interesting:

To study humankind, AAA responds

By Razib Khan | December 13, 2010 1:48 pm

This morning I received an email from the communication director of the American Anthropological Association. The contents are on the web:

AAA Responds to Public Controversy Over Science in Anthropology

Some recent media coverage, including an article in the New York Times, has portrayed anthropology as divided between those who practice it as a science and those who do not, and has given the mistaken impression that the American Anthropological Association (AAA) Executive Board believes that science no longer has a place in anthropology. On the contrary, the Executive Board recognizes and endorses the crucial place of the scientific method in much anthropological research. To clarify its position the Executive Board is publicly releasing the document “What Is Anthropology?” that was, together with the new Long-Range Plan, approved at the AAA’s annual meeting last month.

The “What Is Anthropology?” statement says, “to understand the full sweep and complexity of cultures across all of human history, anthropology draws and builds upon knowledge from the social and biological sciences as well as the humanities and physical sciences. A central concern of anthropologists is the application of knowledge to the solution of human problems.” Anthropology is a holistic and expansive discipline that covers the full breadth of human history and culture. As such, it draws on the theories and methods of both the humanities and sciences. The AAA sees this pluralism as one of anthropology’s great strengths.

Changes to the AAA’s Long Range Plan have been taken out of context and blown out of proportion in recent media coverage. In approving the changes, it was never the Board’s intention to signal a break with the scientific foundations of anthropology – as the “What is Anthropology?” document approved at the same meeting demonstrates. Further, the long range plan constitutes a planning document which is pending comments from the AAA membership before it is finalized.

Anthropologists have made some of their most powerful contributions to the public understanding of humankind when scientific and humanistic perspectives are fused. A case in point is the AAA’s $4.5 million exhibit, “RACE: Are We So Different?” The exhibit, and its associated website at www.understandingRACE.org, was developed by a team of anthropologists drawing on knowledge from the social and biological sciences and humanities. Science lays bare popular myths that races are distinct biological entities and that sickle cell, for example, is an African-American disease. Knowledge derived from the humanities helps to explain why “race” became such a powerful social concept despite its lack of scientific grounding. The widely acclaimed exhibit “shows the critical power of anthropology when its diverse traditions of knowledge are harnessed together,” said Leith Mullings, AAA’s President-Elect and the Chair of the newly constituted Long-Range Planning Committee.

Was the Pocahontas exception necessary?

By Razib Khan | November 12, 2010 12:11 am

In Jonathan Spiro’s Defending the Master Race it is recounted that as American states were passing more robust anti-miscegenation laws and legally enshrining the concept of the one-drop-rule an exception was made in Virginia for those with 1/16th or less Native American ancestry. The reason for this was practical: many of the aristocratic “First Families of Virginia” claimed descent from Pocahontas. Included within this set was Senator Harry F. Byrd Sr. of Virginia, who was 1/16th Native American, being a great-great-grandson of Pocahontas. This sort of background was probably not exceptional among the “Founding Stock” of Anglo-Americans whose ancestors were resident within the boundaries of the American republic at independence. Only around 1700 did the white population of the American British colonies exceed the indigenous, so no doubt some amalgamation did occur.

But from what I’ve seen the extent of admixture with the indigenous substrate was very marginal, especially in comparison to white populations in Argentina or Brazil. Or so I thought. In conversation a friend recently claimed that over 50% of American whites were 5% or more non-European in ancestry. I expressed skepticism, and he dug up the citation. Genetic ancestry: A new look at racial disparities in head and neck cancer:

Not all genes are equal in the eyes of man

By Razib Khan | September 13, 2010 12:19 am

A few days ago I was listening to an interview with a reporter who was kidnapped in the tribal areas of Pakistan (he eventually escaped). Because he was a Westerner he mentioned offhand that to “pass” as a native for his own safety he had his guides claim he was Nuristani when inquiries were made. The Nuristanis are an isolated group in Afghanistan notable for having relatively fair features. His giveaway to his eventual captors was that his accent was clearly not Nuristani, and master logicians that the Taliban are, the inference was made that he was likely a European pretending to be Nuristani.

I thought about this incident when looking over the supplements yesterday of Reconstructing Indian population history. On page 19, Figure 1 of Note S2 includes the Kalash of Pakistan. These are the unconverted cousins of the Nuristanis who were not forcibly brought into the religion of peace in the late 1800s because their region of the Hindu Kush was under British rule, who naturally imposed their late 19th century European value that populations should not be converted by force to a particular religion (Nuristan means “land of light,” whereas before Afghans called it Kafiristan, “land of the unbelievers”). Despite the fair features of the Kalash, which have given rise to rumors that they are the descendants of Alexander the Great’s soldiers, they cluster with Central and South Asian populations, not Europeans. As with the Ainu of Japan, it seems superficial similarities to Europeans, at least in relation to the majority population around them, have resulted in an inordinate expectation of total genome exoticism, when in reality a few particular loci are producing the distinctiveness.

Figure 1 from the 2007 paper, Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, brings home the point:

America in 2050 may still be majority white

By Razib Khan | June 19, 2010 1:50 pm

I have expressed some skepticism at the idea that in the year 2050 the United States of America will perceive itself as a majority-minority nation; that is, non-Hispanic whites will be a minority. This projection is repeated and asserted so often that it’s a plausible background assumption when you’re making a model of the American future. But there are other factors which make this a shakier inference from current trends. A new article in The New York Times which has nothing to do with racial identity as such is a good tell as to the other factor at work, Plea to Obama Led to an Immigrant’s Arrest:

The letter appealing to President Obama was written in frustration in January, by a woman who saw her family reflected in his. She was a white United States citizen married to an African man, and the couple — college-educated professionals in Manhattan — were stymied in their long legal battle to keep him in the country.

One of the principals is introduced as white, but later on, you learn:

“I’ve been feeling very confused and ashamed as an American citizen,” she said, evoking her family’s eclectic immigrant origins.

Her father, an emeritus professor of East Asian languages and cultures at the University of California, Berkeley, is the son of Scottish immigrants; her mother’s family were refugees from North Korea; her stepmother is Chinese; and her sister’s husband is Egyptian.

If her mother is one of the tiny minority of white European-descended Koreans, she happens to be one of those who also has a Korean first name (it isn’t too hard to find these data on the internet). In other words, The New York Times felt that it was permissible for the purposes of this article to frame one of the individuals profiled as white despite the fact that more precisely she’s Eurasian as is clear within the text of the article itself (she may also have identified herself as white to the reporter). I am not sure that she would have been defined as white if her husband was not an African immigrant, as for narrative purposes that is probably a better contrast effect. But imagine if her mother’s family were black immigrants from Jamaica: The New York Times would not define her as white I would hazard in that case.

Image Source: Wikimedia Commons

PCA plots and trees

By Razib Khan | June 1, 2010 3:52 pm

A few years ago I had the pleasure of asking the famed geneticist L. L. Cavalli-Sforza some questions. Here’s part of the Q & A which is germane to my post from a few days ago:

7) Question #3 hinted at the powerful social impact your work has had in reshaping how we view the natural history of our species. One of the most contentious issues of the 20th, and no doubt of the unfolding 21st century, is that of race. In 1972 Richard Lewontin offered his famous observation that 85% of the variation across human populations was within populations and 15% was between them. Regardless of whether this level of substructure is of note or not, your own work on migrations, admixtures and waves of advance depicts patterns of demographic and genetic interconnectedness, and so refutes typological conceptions of race. Nevertheless, recently A.W.F. Edwards, a fellow student of R.A. Fisher, has argued that Richard Lewontin’s argument neglects the importance of differences of correlation structure across the genome between populations and focuses on variance only across a single locus. Edwards’ argument about the informativeness of correlation structure, and therefore the statistical salience of between-population differences, was echoed by Richard Dawkins in his most recent book. Considering the social import of the question of interpopulational differences as well as the esoteric nature of the mathematical arguments, what do you believe the “take home” message of this should be for the general public?

Edwards and Lewontin are both right. Lewontin said that the between populations fraction of variance is very small in humans, and this is true, as it should be on the basis of present knowledge from archeology and genetics alike, that the human species is very young. It has in fact been shown later that it is one of the smallest among mammals. Lewontin probably hoped, for political reasons, that it is TRIVIALLY small, and he has never shown to my knowledge any interest for evolutionary trees, at least of humans, so he did not care about their reconstruction. In essence, Edwards has objected that it is NOT trivially small, because it is enough for reconstructing the tree of human evolution, as we did, and he is obviously right.

PCA plots show you variation that occurs in a correlated fashion across a set of genes. In other words, they’re large systematic signals within the sea of noise in genetic variation. They can tell us a great deal, in concert with other techniques, about the history of our species, and the nature and extent of the relationship between populations within our species. The reason that there is correlated variation across a subset of genes which are highly informative in regards to population identity is simple: human population groups generally have a common shared history. They have been subject to the same evolutionary dynamics, and those dynamics, from drift to selection, have particular effects on the nature of genomic variation (or lack thereof).
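The point that many weakly informative loci add up to a strong signal can be illustrated with a small simulation. To be clear, this is not PCA itself; it is a simpler likelihood-based assignment test that I am substituting to show the same underlying idea. Every locus is assumed to be independent with allele frequencies of 40% in one hypothetical population and 45% in the other, individuals are diploid, and all names and parameters here are invented for the example.

```python
import math
import random


def simulate_genotype(rng, freqs):
    """Draw a diploid genotype: 0, 1, or 2 copies of the allele at each locus."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype given per-locus allele frequencies.

    The binomial coefficient is the same for both populations, so it is omitted.
    """
    return sum(
        k * math.log(p) + (2 - k) * math.log(1 - p)
        for k, p in zip(genotype, freqs)
    )


def assignment_accuracy(n_loci, n_per_pop=200, p1=0.40, p2=0.45, seed=1):
    """Fraction of simulated individuals assigned to their true population."""
    rng = random.Random(seed)
    freqs1 = [p1] * n_loci
    freqs2 = [p2] * n_loci
    correct = 0
    for true_freqs, label in ((freqs1, 1), (freqs2, 2)):
        for _ in range(n_per_pop):
            genotype = simulate_genotype(rng, true_freqs)
            ll1 = log_likelihood(genotype, freqs1)
            ll2 = log_likelihood(genotype, freqs2)
            guess = 1 if ll1 >= ll2 else 2
            correct += guess == label
    return correct / (2 * n_per_pop)


for n_loci in (1, 10, 100, 1000):
    print(n_loci, "loci:", assignment_accuracy(n_loci))
```

With a single locus the assignment is barely better than a coin flip, while with a thousand such loci it is correct for nearly every simulated individual, even though the between-population difference at each locus is tiny. Real genomes have linkage and varying frequency differences, so treat the exact numbers as a feature of the toy model.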

My point in my previous post was to emphasize that this information needs to be integrated into the bigger picture in a nuanced fashion. Broad systematic population-wide patterns of variation, and between-population variation, are important, and of great evolutionary interest. But the genetic uniqueness within families, from recent unique de novo mutations (operationally, family scale private alleles), is also of great interest and importance. PCA plots such as the ones above are naturally not going to tell us much about this aspect of human variation. In the “thought experiment” I presented I indicated that focus on the largest signals of between-population variation alone can miss a great deal.

When America was post-colonial

By Razib Khan | April 30, 2010 1:35 pm

Below I stated:

…until the late 20th century the majority of the ancestry of the white population of the republic descended from those who were counted in the 1790 census.

A commenter questioned the assertion. The commenter was right to question it. My source was a 1992 paper that estimated that only in 1990 did the proportion of American ancestry which derived from those who arrived after the 1790 census exceed 50%. In other words, if you ran the ancestors of all Americans back to 1790, a majority of that set would have been counted in the 1790 census (people of mixed ancestry contribute to both components, weighted by their ancestry).
