Pigmentation: the simplest of complex traits not so simple?

By Razib Khan | March 24, 2013 2:46 am

Image credit: Muntuwandi

One of the pitfalls of talking about genetics, especially human genetics, is that the public wants a specific gene for a specific trait. Ergo, the “God gene” or the “language gene.” In some cases science has been able to pull a rabbit out of the hat, and offer up a gene for a trait. But in most of those instances these are going to be single gene recessive diseases. Not exactly what the doctor ordered. In other cases the association seems trivial. For example, wet or dry earwax?* What people are truly interested in is the genetic basis of complex traits, such as intelligence, personality, and height. Unfortunately complex traits often have a complex genetic basis. A trait such as height, which is highly heritable (i.e., most of the variation in the population is due to variation in genes), turns out to be subject to the control of innumerable genes, each of which has a small impact on the value of the final trait. Then there is the possibility that the heritability is tied up in interaction effects across genes.

All of this might compel you to wonder why even tackle the morass that is complex trait genetics? The simple answer is why not? The more concrete answer is that unlike social scientists, geneticists have the gene as an abstract unit from which to construct their theoretical models. It may be a daunting task, but unpacking the causal components of complex traits in a genetic sense is at least more tractable than other intellectual endeavors. Memes are fine as a metaphor, but they haven’t been nearly as useful in constructing a science which generates non-obvious inferences as genes have been.

For most complex traits of any great interest it is not feasible for someone to list off the genes which control most of the normal variation on that trait. Pigmentation is an exception to this. While the continuous variation in height or intelligence seems to be distributed across many, many genes (on the order of hundreds or thousands), most of the variation in pigmentation seems to be collapsed into a few genes of large effect. One way to think about it is as an exponential decay function where each successive gene explains less and less of the variation of the trait within your population(s) of interest. So, SLC24A5 is probably the largest effect locus, with a nearly disjoint allele frequency difference between Europeans and Africans (and East Asians). A paper from the mid aughts reports that a substitution at this locus in Europeans (who carry the derived variant) explains “between 25 and 38% of the European-African difference in skin melanin index.” A few years later another paper reported in relation to the locus KITLG that a variant within this region was responsible for ~20% of the variation in European-African pigmentation difference. Assuming independent effects, these two genes may then account for nearly half of the average difference in trait value across the two populations. There are other loci which crop up in the literature repeatedly: TYR, ASIP, SLC45A2, OCA2, and HERC2, for example.
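As a rough back-of-the-envelope illustration of the two ideas in the paragraph above, the sketch below adds up the reported shares for SLC24A5 and KITLG under the assumption that they are simply additive, and then shows what a geometric ("exponential decay") series of effect sizes looks like. Only the 25-38% and ~20% figures come from the studies quoted; the starting share, the decay rate, the number of loci and the function name `geometric_shares` are hypothetical values chosen purely for illustration.

```python
# Back-of-the-envelope sketch. Additive, independent effects are an assumption.

slc24a5_share = (0.25, 0.38)  # reported range for SLC24A5
kitlg_share = 0.20            # approximate share reported for KITLG

combined_low = slc24a5_share[0] + kitlg_share
combined_high = slc24a5_share[1] + kitlg_share
print(f"Two loci combined: {combined_low:.0%} to {combined_high:.0%}")


def geometric_shares(first_share, decay, n_loci):
    """Share explained by each successive locus when every locus explains a
    fixed fraction `decay` of what the previous one did (hypothetical model)."""
    return [first_share * decay**k for k in range(n_loci)]


# Hypothetical parameters, not estimates from any paper.
shares = geometric_shares(first_share=0.30, decay=0.6, n_loci=8)
cumulative = 0.0
for rank, share in enumerate(shares, start=1):
    cumulative += share
    print(f"locus {rank}: {share:.1%} (cumulative {cumulative:.1%})")
```

Under a decay like this the first handful of loci carry most of the explained difference and the tail adds very little, which is the qualitative pattern described for pigmentation.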

The point in listing off these genes is to emphasize that pigmentation has been one of the major success stories in human genomics over the past decade. This really is the golden age of inquiry into this field. As I like to recount, in 2003 Armand Leroi wrote in the epilogue to his book Mutants that we didn’t even have a good grasp of the genetic basis of the normal variation in skin color in humans. This assertion was totally out of date within five years. Such a radical change is what you want science to be on its best days, when you’re not slamming your head against a problem which presents no clear and obvious solution. Though many of the genes above were discovered by analyzing differences between Europeans and Africans, a similar set seems to be segregating within South Asians. Expanding the sample coverage to more diverse populations does yield more loci, and in many cases you see different mutations within the same gene producing a change in pigmentation across divergent populations (e.g., at OCA2 there are European and East Asian derived variants). All this taken together seems to imply that change in pigmentation occurred repeatedly across human populations over the last 10,000-20,000 years, though targeting the same relatively small space of genes which can modulate melanin pathways.

As you might have guessed I have been keeping track of this literature rather closely for a while. The reason is twofold. First, normal human phenotypic variation is interesting to me. Second, the genetics of pigmentation has been a relative success story. I should also add that the polygenic but large effect genetic character of pigmentation was predicted by the mid-20th century using an analysis of phenotypes and pedigrees in mixed-race populations. See Genetics of Human Populations. All of it has been coming together so well that I’ve started ignoring the literature in this area. Pigmentation genetics was more and more the purview of forensic specialists and the like.

But a new paper in PLOS Genetics makes me reconsider whether the game is quite over yet. Genetic Architecture of Skin and Eye Color in an African-European Admixed Population:

Differences in skin and eye color are some of the most obvious traits that underlie human diversity, yet most of our knowledge regarding the genetic basis for these traits is based on the limited range of variation represented by individuals of European ancestry. We have studied a unique population in Cape Verde, an archipelago located off the West African coast, in which extensive mixing between individuals of Portuguese and West African ancestry has given rise to a broad range of phenotypes and ancestral genome proportions. Our results help to explain how genes work together to control the full range of pigmentary phenotypic diversity, provide new insight into the evolution of these traits, and provide a model for understanding other types of quantitative variation in admixed populations.

That’s the author summary, not the abstract. The primary result is that the authors find the genes of large effect to have proportionately much smaller effects than in earlier studies, and that ancestry can explain a substantial proportion of the variation even when the specific loci are already accounted for. Their study population is in Cape Verde, an archipelago off the coast of West Africa where most of the population is of West African and Portuguese ancestry. This is not an unimportant detail.

Much of their analysis of Cape Verde as a mixed population uses the Yoruba and CEU HapMap data sets. CEU consists of whites from Utah who are of British and other assorted Northern European ancestry. The Yoruba are probably a reasonable representative of the African ancestors of this population (though the HGDP Mandenka would probably have been somewhat better). But I don’t understand why they didn’t use the HapMap Tuscan population, as the parental European population for individuals from Cape Verde is Portuguese, Southern European, not Northern European. When speaking of pigmentation genetics it is important to note that European populations vary a great deal. Previous studies focused on a population which was about 20% Northern European and 80% West African (African Americans). This study focuses on a population which is 40% Southern European and 60% West African. It seems entirely reasonable that admixture source populations would have a strong impact on the nature of genetic effects.

The image (source) to the left illustrates variation in frequency of alleles of SLC45A2. A derived variant associated with lighter skin is prevalent across Europe at frequencies on the order of ~90%. But where in Northern Europe the proportions range from 90 to 100 percent, in Portugal it is present at frequencies of ~80 percent. Alleles associated with light eyes in Europeans are also found at far lower frequency in the southwest and southeast of the continent.

Overall the biggest result out of this paper is found in the abstract: “We identify four major loci…for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%).” The implication, which they lay out, is that in this admixed population the genetic architecture is such that within that 44% there may be smaller effect genes which are diffused through the genome, and strongly correlated with differential ancestry (i.e., European ancestral segments have more “light” alleles, African segments the “dark” ones). This is not entirely unreasonable. If pigmentation loci are targets of selection (their results suggest that this is so) then one might see change on large effect loci first, and then gradual convergence to the adaptive peak via small effect loci. But I also believe that the fact that the European source population is on the darker side is having an effect. The allele frequency differences between Swedes and Yoruba would be larger than between Portuguese and Yoruba (though to be sure the Portuguese and Swedes would still be far closer).
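One way to see why ancestry could keep explaining variance after the large-effect loci are accounted for is a toy simulation. The sketch below is not the paper's method or data: every number in it (four major loci, 100 minor loci, the allele frequencies, effect sizes, noise level and sample size) is an invented assumption, and the helper names are hypothetical. It simulates admixed individuals, regresses the trait on the major-locus score, and then asks how much additional variance ancestry explains. It does this once with small-effect loci whose frequencies differ between the source populations, and once with a control in which they do not.

```python
import random


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def residuals(ys, xs):
    """Residuals of a simple least-squares regression of ys on xs."""
    slope = covariance(xs, ys) / variance(xs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def genotype(rng, q, p_eur, p_afr):
    """Number of 'light' alleles (0, 1 or 2) at one locus for an individual
    whose European ancestry fraction is q."""
    count = 0
    for _ in range(2):
        p = p_eur if rng.random() < q else p_afr  # ancestry of this copy
        count += rng.random() < p
    return count


def simulate(minor_delta, n=2000, seed=1):
    """Return (R^2 of major loci alone, extra R^2 added by ancestry).

    minor_delta is the assumed allele frequency difference between source
    populations at each small-effect locus; all other values are invented.
    """
    rng = random.Random(seed)
    ancestry, major, trait = [], [], []
    for _ in range(n):
        q = rng.uniform(0.1, 0.9)
        big = sum(genotype(rng, q, 0.95, 0.05) for _ in range(4)) * 1.0
        small = sum(
            genotype(rng, q, 0.5 + minor_delta / 2, 0.5 - minor_delta / 2)
            for _ in range(100)
        ) * 0.15
        ancestry.append(q)
        major.append(big)
        trait.append(big + small + rng.gauss(0, 1.5))
    r2_major = covariance(trait, major) ** 2 / (variance(trait) * variance(major))
    # Partial out the major-locus score from both trait and ancestry.
    res_trait = residuals(trait, major)
    res_anc = residuals(ancestry, major)
    partial_r2 = covariance(res_trait, res_anc) ** 2 / (
        variance(res_trait) * variance(res_anc)
    )
    return r2_major, partial_r2 * (1 - r2_major)


tracking = simulate(minor_delta=0.2)  # small loci differ between sources
control = simulate(minor_delta=0.0)   # small loci identical in both sources
print(f"minor loci track ancestry: major R^2 = {tracking[0]:.2f}, "
      f"ancestry adds {tracking[1]:.3f}")
print(f"control:                   major R^2 = {control[0]:.2f}, "
      f"ancestry adds {control[1]:.3f}")
```

In this toy setup ancestry only adds explanatory power when the small-effect loci differ in frequency between the source populations, which is the scenario the authors' interpretation requires. Shrinking `minor_delta` toward zero, as a crude stand-in for a darker European source population, shrinks that contribution.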

Rather than reject previous models, these results refine them, and remind scientists that they need to see how robust their general inferences are. It seems plausible that in the next few years more scholars will explore the genetics of pigmentation in diverse populations, and therefore gain a more nuanced understanding of the genetic architecture of this trait.

Citation: Beleza S, Johnson NA, Candille SI, Absher DM, Coram MA, et al. (2013) Genetic Architecture of Skin and Eye Color in an African-European Admixed Population. PLoS Genet 9(3): e1003372. doi:10.1371/journal.pgen.1003372

* To be fair, these trivial associations are often side effects of other genetic changes which are presumably adaptive in some fashion.

  • ohwilleke

    One line of analysis that seems fruitful to me, at least in cases when the phenotype is well defined, is to use the law of averages to estimate not the extent to which a trait is heritable, but some measure of how many gene loci seem to be at work in determining a phenotype.

    The concept can be illustrated by a case where a genetic trait that influences a phenotype is influenced by N loci of equal influence with an undetermined number of possible mutational variations at each locus. It doesn’t matter where in the autosomal genome the loci arise (apart from NRY DNA and to a lesser extent X chromosomes). You could have a third at one chromosome, a sixth at another, and half at a third, or any other possible mix.

    You statistically strip away environmental phenotype data using a sufficiently large database of closely related individuals (getting twins in the sample isn’t important). If you have a 100 or 1000 equal loci trait, the law of averages is going to very strongly favor intermediate values of a phenotype in people with shared ancestry (e.g. siblings) relative to the sources of the ancestry (e.g. parents). The smaller N gets, the more you will see phenotype value distinctiveness between people of shared ancestry relative to the sources of ancestry and you may not infrequently get children who are more distinctive than their parents with respect to the phenotype. The larger N gets, the more you will see blending of parental traits in children (or the equivalent) with children who are more extreme than their parents on the genetic component of phenotype variation being rare.

    This is easy to guess at in extreme cases of say three or fewer loci, or hundreds of loci, but is hard to quantify in intermediate cases.

    Since loci, characteristically, do not have equal effects, you have to generalize the concept of N to equal loci equivalents, or use confidence intervals (the brute force approach would be to use Monte Carlo analysis with a simplified model), and to establish strong Bayesian priors about one or a small number of models of relative loci effect (e.g. equal effect, linear variation in effect with a slope as a second parameter, or exponential decay, or more generally a power law distribution with an exponent as a second parameter). You might be able to set up an algorithm to numerically estimate (1) effective N or confidence intervals for N, (2) relative probabilities of the three different loci strength models, and (3) expected range of secondary parameters for linear variation in strength or power law variation in strength models.
    The idea would be to first identify phenotypes, then determine if they have a hereditary component and an approximate magnitude of hereditary effect in a simple additive model, and then do a more involved test of a large sample of phenotype data to suggest about how many loci you are looking for and what kind of relative loci strength models are plausible, before even genotyping a single person. If the phenotype distribution strongly suggests, for example, an effective N of 15 loci with a particular power law (say a decay exponent of a 3/5th power), you could then use that information to target how many loci at approximately what thresholds of effect sizes to be looking for as you design your GWAS search, while also letting you know when to stop looking.
    If you had some comfort that you had identified, for example, 90% of the effective loci equivalents involved in some particular phenotype, e.g. extraversion, you could then bootstrap from that to do a second round of Bayesian model comparisons to look at issues like additive v. multiplicative traits, dominance patterns, etc. You could also, at that point, start to pay more attention to gene function which might both help to fine tune the phenotype definitions and to sift random fluke associations from ones that are likely to be causal.
    The hard part, particularly for cognitive traits at least, would be to get good phenotype definition and measurement. This has multiple dimensions of difficulty involved. You have to identify not just the right dimension upon which the phenotype varies or something close, but also properly scale the phenotype against some absolute scale rather than a statistically based one. You want to be fine grained in subcomponents of the trait measured so that components that don’t fit the inheritance pattern can be culled from those that do or separated into multiple phenotypes profiled individually. If you rely mostly on self-reporting your results are likely to be deeply impaired. You need to do lots of broad phenotypic effect from known genotype studies as well (which is hard to do without massive centralized health care and otherwise indexed records) so you get a knack for what true genetically driven phenotypes tend to look like to use as models when crafting new ones without the benefit of genotypes.
    But, in this kind of analysis the best may be the enemy of the good. The very rough estimations one can get on effective genotype loci involved in a trait may be enough to get researchers much closer to the right track without being terribly accurate or sophisticated. Going from 1-100,000 possible loci to 10-100 in one of two plausible genetic models may be enough to break the code entirely (or at least to account for 90%-95% of genetic causes) in practice.
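The intuition in the comment above, that fewer loci means more spread among offspring relative to the parental sources, can be checked with a small Monte Carlo sketch. It uses the classical Castle-Wright estimator, which may or may not be what the commenter has in mind, and it leans on strong assumptions that are labeled in the code: N unlinked loci of equal additive effect, two source populations fixed for opposite alleles, and no environmental noise. The function names and the sample size are hypothetical.

```python
import random


def f2_trait_variance(n_loci, n_offspring=4000, seed=0):
    """Trait variance among offspring of two first-generation hybrids.

    Assumes n_loci unlinked loci of equal additive effect, scaled so that
    the two source populations differ by exactly 1.0 trait unit, and no
    environmental noise.
    """
    rng = random.Random(seed)
    effect = 1.0 / (2 * n_loci)  # contribution of each 'high' allele
    values = []
    for _ in range(n_offspring):
        # Both parents are heterozygous at every locus, so each transmits
        # the 'high' allele with probability 1/2.
        high_alleles = sum(
            (rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci)
        )
        values.append(high_alleles * effect)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def castle_wright(parental_difference, segregation_variance):
    """Classical estimate of the effective number of equal-effect loci."""
    return parental_difference ** 2 / (8 * segregation_variance)


estimates = {}
for true_n in (2, 10, 50):
    var_f2 = f2_trait_variance(true_n)
    estimates[true_n] = castle_wright(1.0, var_f2)
    print(f"true N = {true_n:3d}  offspring variance = {var_f2:.4f}  "
          f"estimated N = {estimates[true_n]:.1f}")
```

With equal effects the offspring variance falls as 1/(8N), so the estimator recovers N. With unequal effects, linkage or environmental variance it tends to underestimate the number of loci, which is why the commenter's "equal loci equivalents" framing matters.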

  • ohwilleke

    “Both John Hawks and Steven Pinker cited a small-sample study that didn’t control for gender in order to promote the politically correct conclusion that MAOA only affects violent behavior in white people.”
    A plausible origin of the statistic could be that most violent behavior in blacks and Hispanics has predominantly environmental origins rooted in economic circumstances that render genetic impacts from MAOA a secondary effect lost in statistical noise from the first order environmental effects, while enough white people have circumstances secure and affluent enough to suppress environmentally caused violent behavior rooted in economic circumstances to make genetic MAOA impacts on violent behavior discernible. This fits the data point that shows much higher coefficients of heritability for IQ in middle class whites than in poor blacks. Genetically caused phenotype effects are often only discernible against a relatively quiet background of environmental effects.

    • razibkhan

      commenter you’re responding to is annoyingly repetitive on this issue, so unless you want to enter into a long exchange… but i’d be curious about sibling comparisons on anything that purports to be large effect, since it would segregate within families.

    • http://theunsilencedscience.blogspot.com/ nooffensebut

      “commenter you’re responding to is annoyingly repetitive on this issue”

      (Laugh out loud.) If you find me annoyingly repetitive, I would not recommend watching all of the warrior-gene copycat documentaries. Maury Povich paternity shows are a more enjoyable guilty pleasure.

      “A plausible origin of the statistic could be that most violent behavior in blacks and Hispanic has predominantly environment origins rooted in economic circumstances…”

      I take your point, but this particular statistic really did result from a study that grossly mismatched the gender ratios of the comparison groups. The study was trumpeted after a scientist committed a James Watson vis-à-vis the Maori. Most of this research is on whites and Asians, so one could safely suspend judgment for Africans and Hispanics, even though data from the National Longitudinal Study of Adolescent Health supports the opposite conclusion.

  • http://skadhitheravernerblog.wordpress.com/ Skadhi_the_Raverner

    According to Gayre in the MQ, people in Africa have different undertones to their black skin. In eastern and southern Africa it is golden, in the Horn of Africa it is greyish and elsewhere it is reddish.

