Beyond visualization of data in genetics

By Razib Khan | May 31, 2010 7:08 am

Hopefully by now the image to the left is familiar to you. It’s from a paper in Human Genetics, “Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study.” The paper is interesting in and of itself, as it combines a wide set of populations and focuses on the extent of disjunction between self-identified ethnic identity and the population clusters which fall out of patterns of genetic variation. In particular, the authors note that the “Native Hawaiian” identification in Hawaii is characterized by a great deal of admixture, and within their sample only ~50% of the ancestral contribution within this population was Polynesian (the balance split between European and Asian). The figure suggests that subjective self-assessment of ancestral quanta is generally accurate, though there are a non-trivial number of outliers. Dienekes points out that the same dynamic holds (less dramatically) for the European and Japanese populations within their data set.

All well and good. And I like these sorts of charts because they’re pithy summations of a lot of relationships in a comprehensible geometrical fashion. But they’re not reality, they’re a stylized representation of a slice of reality, abstractions which distill the shape and processes of reality. More precisely, the x-axis is an independent dimension of correlations of variation across genes which can account for ~7% of the total population variance. This is the dimension with the largest magnitude. The y-axis is the second largest dimension, accounting for ~4%. The magnitudes decline precipitously as you descend the rank order of the principal components. The 5th component accounts for ~0.2% of the variance.
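In the usual PCA notation (my shorthand here, not the paper's), if \lambda_k is the eigenvalue attached to the k-th component, the share of variance that component accounts for can be written as:

```latex
\[
  \text{share}_k = \frac{\lambda_k}{\sum_{j} \lambda_j},
  \qquad \lambda_1 \ge \lambda_2 \ge \cdots
\]
```

Read this way, the figures quoted above correspond to shares of roughly 0.07, 0.04 and 0.002 for the 1st, 2nd and 5th components, so the two plotted axes together cover only about 11% of the total variance.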

The first two components in these sorts of studies usually conform to our intuitions, and add a degree of precision to various population-scale relations. Consider this supplementary chart from a 2008 paper (I’ve rotated and re-edited it for clarity):


The first component separates Africans from non-Africans, the latter being a derived population from a subset of the former. The second component distinguishes West Eurasians from East Eurasians & Amerindians. These two dimensions and the distribution of individuals from the Human Genome Diversity Project reiterate what we know about the evolutionary history of our species.

And yet I wonder if we should be careful about the power of these two-dimensional representations to constrain us excessively when we think about genetic variation and dynamics. Naturally, the character of the dimensions is sensitive to the nature of the underlying data set on which they rely. But consider this thought experiment:

Father = Japanese
Mother = Norwegian
Child = Half Japanese & Half Norwegian

If you projected these three individuals upon the two-dimensional representation above of the worldwide populations, the father would cluster with East Asians, the mother with Europeans, and the child with the groups who span the divide, Uyghurs and Hazaras. So on the plot the child would be far closer to these Central Asian populations than to the groups from which its parents derive. And here’s a limitation of focusing too much on two-dimensional plots derived from population-level data: is the child interchangeable with a Uyghur or Hazara genetically in relation to their parents? Of course not! If the child were female, and the father impregnated her, the consequence (or probability of a negative consequence) would be very different than if he impregnated a Uyghur or Hazara woman.
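To make the contrast concrete, here is a toy simulation of the thought experiment. Everything in it is an assumption for illustration: the allele frequencies are random numbers, the locus count and helper names are made up, and the single axis is a simple stand-in for a principal component rather than a real PCA on real genotypes.

```python
import random

random.seed(0)
N_LOCI = 2000  # hypothetical number of biallelic markers

# Made-up allele frequencies: two diverged parental populations and an
# "intermediate" population whose frequencies sit halfway between them.
freq_a = [random.random() for _ in range(N_LOCI)]
freq_b = [random.random() for _ in range(N_LOCI)]
freq_mid = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]

def sample_person(freqs):
    """Draw two alleles (0 or 1) per locus from the given frequencies."""
    return [(int(random.random() < p), int(random.random() < p)) for p in freqs]

def make_child(father, mother):
    """Mendelian inheritance: one randomly chosen allele from each parent."""
    return [(random.choice(f), random.choice(m)) for f, m in zip(father, mother)]

def axis_score(person):
    """Position on one axis separating population A (near 0) from B (near 1).

    Stand-in for a principal component: genotypes are projected on the
    frequency difference between the two populations and rescaled.
    """
    weights = [b - a for a, b in zip(freq_a, freq_b)]
    raw = sum(w * (x + y) / 2 for w, (x, y) in zip(weights, person))
    lo = sum(w * a for w, a in zip(weights, freq_a))
    hi = sum(w * b for w, b in zip(weights, freq_b))
    return (raw - lo) / (hi - lo)

def opposite_homozygotes(p, q):
    """Count loci where two people share no allele (0/0 versus 1/1)."""
    return sum(1 for (a1, a2), (b1, b2) in zip(p, q)
               if a1 == a2 and b1 == b2 and a1 != b1)

father = sample_person(freq_a)      # stands in for the Japanese father
mother = sample_person(freq_b)      # stands in for the Norwegian mother
child = make_child(father, mother)
stranger = sample_person(freq_mid)  # member of the intermediate population

for name, person in [("father", father), ("mother", mother),
                     ("child", child), ("stranger", stranger)]:
    print(f"{name:8s} axis score {axis_score(person):5.2f}  "
          f"opposite homozygotes vs father {opposite_homozygotes(father, person)}")
```

On the axis, the child and the stranger land in roughly the same place between the two parental populations. Only the child, though, is guaranteed to share at least one allele with the father at every locus, which is the kind of relationship a population-level plot does not show.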

The reason for this difference is obvious (if not, ask in the comments; many readers of this weblog know the ins & outs at an expert level). Abstractions which summarize and condense reality are essential, but they have their uses and limitations. Unlike physics biology can not rely too long on elegance, beauty, and formal clarity. Rather, it always has to dance back and forth between rough & ready heuristics informed by the empirics and theoretical systems which emerge from axioms. Usually a picture has its own sense. But the key is to be precise in understanding what sense it makes to you.

CATEGORIZED UNDER: Genetics, Genomics
  • Peter Marsh

    The purpose of this article appears to be an attempt to discount the value of these charts which contain some very important information. Information that confirms previous studies regarding the origins of the Polynesians.

    Here is an example of some of these studies linking America and Japan with Polynesia

    In Peter Bellwood’s book Man’s Conquest of the Pacific he cites a study showing that Polynesians and NW Coastal Indians have very similar blood. They both have no B, high A, high M, high R2 & moderate Fya. The study showed Polynesians have no blood similarities to S.E. Asians or Melanesians.

    S.W. Serjeantson, “The Colonization of the Pacific – A Genetic Trail” (1989), pp. 135, 162-163, 166-7: “The following genes set them apart: Polynesians lack HLA-B27, whereas it is common amongst Melanesians.
    HLA-Bw48 is commonly found in Polynesian populations, but occurs only sporadically in Melanesia. The only other known population with an appreciable frequency of HLA-Bw48 is that of the North American Indians or more specifically the Tlingit. In Polynesia Bw48 co-occurs with A11, – suggesting a variation since Polynesians departed from the Canadian coast.

    Theodore G Schurr and colleagues (1990): ‘Both the North American Pima and the Central American Maya have high frequencies of the Mitochondrial DNA sequence variation containing the rare Asian RFLP HincII morph 6 in conjunction with an Asian-specific 9-base-pair deletion.’ It appears that both the Pima and the Maya are genetically very close to the Polynesians. The arrival of these genes in America is believed to have been between 6-8,000 years ago, ruling out the possibility of Polynesian origins as Polynesians have only been in the Pacific for 2,200 years. A migration of the Polynesians from America is far more logical.

    Katsushi Tokunaga and colleagues. ‘Genetic link between Asians and Native Americans: Evidence from HLA genes and haplotypes’ in Human Immunology 62 1001-1008 (2001).
    HLA24-Cw8-B48, A24-Cw10-B60 and A24-Cw9-B61 were all commonly observed in Taiwan indigenous populations, Tibetans, Thais, Japanese, Orochon in North East China, Buryat, Man, Yakut, Inuit, Tlingit, Pima, Maya and Maori.’

    Harihara and colleagues (1992) noted, when observing the ‘Frequency of a 9bp deletion in the mitochondrial DNA among Asian populations’: It appears that the Maori & Cook Islanders had ancestors from the Shizuoka prefecture of Japan.

    Fideas E, Leon S, and colleagues, ‘HLA Trans Pacific contacts’ (1995), note that ‘a tribe living near the Pacific Colombian coast named the Noanama/Wanana, are clustered genetically closer to Japanese people than to other American natives. Novick and colleagues concur with this.’

    Yes, Polynesians are related to Japanese and native Taiwanese. They came via the Kuroshio current to America and then sailed down to Hawaii – the Homeland of Polynesians. Yes, they did mix with Caucasians – The Easter Islanders are paleolithic Caucasians from America – as are the Basques – hence their close genetic similarity.

    In 1972 Professor Jean Dausset conducted a study of the Caucasian blue/green eyed, red heads of Easter Island, who are in fact a significant part of the Polynesian story. He found them to have an ancient strain of Caucasian blood, which can also be found in the Basques of Spain, characterised by A29 and B12. The analyses revealed that 39% of unrelated Basques and 37% of the Easter Islanders were carriers of the HLA gene B12. These were the highest and second highest proportions tested throughout the world. The figures for A29 were similar. The Easter Islanders, with 37%, had the highest proportion in the world, while the Basques were second with 24%. The most remarkable thing was that the two genes were found as a haplotype (combined genetic markers) in 11% of Easter Islanders and 7.9% of the Basques. No other people in the world had remotely comparable figures.
    In fact, from the above tests, the Easter Islanders appear to be of a more pure ancient Caucasian racial stock than the Basques!

    So both these very visual graphs tell us exactly like it is. Yes, there has been some recent genetic admixture, but geneticists can see that by looking at the gene tree where they can see the times of recombination.

    Yes America WAS the stepping stone of Polynesians into the Pacific. The second graph in the above article shows this very clearly. This is the trail of Haplogroup B on the West coast of America which arrived 6-8,000 years ago, but in Polynesia its arrival was only 2,200 years ago. Chronology alone suggests the direction of colonisation.

    For more details regarding this alternative much more robust theory regarding the origins of the Polynesians see my website Polynesian Pathways at above url.

  • bioIgnoramus

    “Unlike physics biology can not rely too long on elegance, beauty, and formal clarity”: aye, and physics tends to rely on them when there’s a dearth of experimental data.

  • Razib Khan

    The purpose of this article appears to be an attempt to discount the value of these charts which contain some very important information.

