The coincidental intersection of sociology & genetics

By Razib Khan | April 20, 2011 11:57 pm

Hispanic – Definitions in the United States:

The 1970 Census was the first time that a “Hispanic” identifier was used and data collected with the question. The definition of “Hispanic” has been modified in each successive census. The 2000 Census asked if the person was “Spanish/Hispanic/Latino”.

The U.S. Office of Management and Budget currently defines “Hispanic or Latino” as “a person of Mexican, Puerto Rican, Cuban, South or Central American, or other Spanish culture or origin, regardless of race.”

Because Hispanics can be any race, you need to look at their own self-identification. The breakdowns as per the American census are that somewhat over 50% of American Hispanics/Latinos identify as white, most of the rest as “some other race,” with a small minority as black, Native American, etc.

This came to mind when I saw this paper in BMC Genetics, Comparing self-reported ethnicity to genetic background measures in the context of the Multi-Ethnic Study of Atherosclerosis (MESA). The issue is that when you’re doing association studies between genes and diseases you want to control for population structure. For example, if disease X is found in Chinese Americans to a higher degree than in the general population, then all the alleles distinctive to Chinese Americans would correlate with disease X in an aggregated pool. Self-reports are pretty good, but on the margin there is now some juice to squeeze out of the data sets by using ancestry informative markers to “clean up” the outliers within the populations.
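To make the stratification problem concrete, here is a minimal simulated sketch. It is not from the paper: the group sizes, allele frequencies, and disease risks are invented for illustration, and the allele has no causal effect on the disease at all. The point is only that pooling two groups that differ in both allele frequency and disease risk produces a correlation that disappears once you look within each group.

```python
import numpy as np

def simulate(n_per_group=5000, seed=0):
    """Two groups that differ in allele frequency AND in disease risk.

    All numbers are hypothetical. Disease status is drawn independently
    of genotype, so any genotype-disease correlation is spurious.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    # Frequency of the marker allele in each group (assumed values).
    allele_freq = np.where(group == 0, 0.1, 0.7)
    # Genotype = number of copies of the allele (0, 1 or 2).
    genotype = rng.binomial(2, allele_freq)
    # Disease risk depends on group only, never on genotype.
    disease_risk = np.where(group == 0, 0.05, 0.20)
    disease = rng.binomial(1, disease_risk)
    return group, genotype, disease

def correlation(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

group, genotype, disease = simulate()

# Aggregated pool: the allele looks associated with the disease.
pooled = correlation(genotype, disease)

# Stratified by group: the association is gone.
within = [correlation(genotype[group == g], disease[group == g]) for g in (0, 1)]

print("pooled correlation:", round(pooled, 3))
print("within-group correlations:", [round(w, 3) for w in within])
```

Stratifying by a group label is the crudest possible control. Self-reported ethnicity and marker-based ancestry estimates are two different ways of getting that label, which is what the paper is comparing.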

Here are the results:

Four clusters are identified using 96 ancestry informative markers. Three of these clusters are well delineated, but 30% of the self-reported Hispanic-Americans are misclassified. We also found that MESA SRE provides type I error rates that are consistent with the nominal levels. More extensive simulations revealed that this finding is likely due to the multi-ethnic nature of the MESA. Finally, we describe situations where SRE may perform as well as a GBMA in controlling the effect of population stratification and admixture in association tests.

Below is a principal component analysis plot which illustrates the largest dimensions of genetic variation in their data set for the individuals from four different populations, African Americans, European Americans, Hispanic Americans, and Chinese Americans. I thought of the above census results when I saw the distributions on the plot:

Granted, there is a big difference between genetic admixture in populations which can vary over a continuous range, and the artificial binning you see in census categories. But the 50% white vs. 50% non-white (white + other) corresponds reasonably well to the PCA in my mind….
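For readers who have not generated one of these plots, here is a rough sketch of how a PCA of genotype data works. It runs on simulated data, not the MESA data: the two source populations, their allele frequencies, and the 50/50 admixed group are all assumptions made for illustration. The marker count of 96 just echoes the number of ancestry informative markers in the abstract.

```python
import numpy as np

def simulate_genotypes(n_per_group=100, n_markers=96, seed=1):
    """Simulate genotypes for two source populations and an admixed group.

    Allele frequencies are random and hypothetical. The admixed group is
    modeled very simply: each individual draws alleles at the average of
    the two source frequencies (a uniform 50/50 admixture).
    """
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.05, 0.95, n_markers)
    freq_b = rng.uniform(0.05, 0.95, n_markers)
    freq_mix = 0.5 * freq_a + 0.5 * freq_b
    labels = np.repeat(["A", "B", "admixed"], n_per_group)
    # One row of allele frequencies per individual.
    freqs = np.vstack(
        [np.tile(f, (n_per_group, 1)) for f in (freq_a, freq_b, freq_mix)]
    )
    # Genotype = allele count (0, 1 or 2) at each marker.
    genotypes = rng.binomial(2, freqs)
    return labels, genotypes

def principal_components(genotypes, k=2):
    """Project individuals onto the top k principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

labels, genotypes = simulate_genotypes()
pcs = principal_components(genotypes)

# Mean position of each group along the first principal component.
pc1_means = {name: float(pcs[labels == name, 0].mean()) for name in ("A", "B", "admixed")}
print(pc1_means)
```

In this toy setup the admixed group lands between the two source populations on the first component. In real data admixture fractions vary from person to person, so an admixed population smears out along that axis instead of forming a tight cluster in the middle.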

CATEGORIZED UNDER: Genetics, Genomics
  • Darkside

    this might sound horrible but are hispanics outcompeting blacks because they are closer to whites? meaning: because they have some white in them? i’ve always wondered that (in relation to whiteness = success in Brazil)

  • dc

    hispanics are outcompeting blacks because of socioeconomics. Hispanics who travel to the US are either very affluent in their native countries, with the education and money to travel to the US, or hardworking immigrants. Cubans, the richest and most educated hispanic group, alone outcompete the native US white population in education and wealth. The NATIVE black community in America is being outcompeted by Haitian and African immigrants, which supports my argument. I think it has to do with entitlements, public policy, and the general attitude of the community. Some people have education and money and it’s easy; others work hard to do what it takes to get an education and help their families. Overall in the black American community there is a regression. A Wall Street Journal article after the census summed this up.

  • Matt Simonson

    I’m amazed such a simple paper was published in BMC Genetics; everyone who has ever worked with the type of data they used in this study could have told you their results long ago. The PC plot you see above is performed in basically every genetic association analysis and has been for almost the past decade, and the fact that genetic ancestry doesn’t align perfectly with society’s notions of what constitutes race is blatantly obvious to anyone in the field. A better plot that includes most of the world’s populations and is shown using 3 dimensions is here:

    If you look closely you can actually see the migration patterns out of Africa across the planet and how there are slight genetic variations between populations.

  • gcochran

    IQ looks to be highly polygenic (and additive), at least in the populations examined so far. If that is generally true, and if the differences in average IQ between human populations are significantly genetic (within-group variation is known to be highly heritable), you would expect average IQ in an admixed population to vary with the admixture fraction. This would be so even though we have trouble finding single alleles that account for much of the variance in IQ.

    This would also be the case for other highly polygenic traits: for example, if the average height of two populations differed, we could make a decent prediction of height for a mixed individual or mixed population by looking at their SNPs and running HAPMIX.

    Anyone with a representative data set containing SNPs and some reasonable measure of IQ could prove or disprove this easily.

    Probably the first thing you do is simply take a quick look to see whether mixed populations tend to have average IQs intermediate between the two ancestral groups. In almost every example I know of, this is the case. In principle, you might have two populations that have some kind of negative genetic interaction, so that admixed individuals score lower. Haven’t heard of that happening in humans. You might also find pairs of populations that ‘nick’ for IQ, that have higher average IQ in hybrid individuals. I have heard of a possible example of that, but haven’t seen enough followup to determine if it was a real effect.

  • Razib Khan

    I’m amazed such a simple paper was published in BMC Genetics, everyone who has ever worked with the type of data they used in this study could have told you their results long ago.

    the paper is not just a PC plot. i’m aware of PCA plots of genetics, i generate them myself 😉

  • Diogenes

    Why concentrate on averages when extremes may be overinfluential regarding IQ?
    Mixed populations may average (or slightly less) between parent populations.
    But how about individuals? These make ALL the difference in evolution, since if they can be linked up (somehow), even group evolution is influenced. If it’s polygenic, and variants differ, how about >140s % in mixed vs less mixed populations?


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

