Although I’ve cautioned against taking PCA plots too literally, as Truth unvarnished and needing no interpretive juice, papers which rely on them are almost magnetically attractive to me. They transform complex patterns of variation, which your gestalt perception could never take in directly, into a two- or at most three-dimensional representation that you can grok immediately. That is why The History and Geography of Human Genes was so engrossing: you recognize patterns which were otherwise unrecognizable. But how you interpret those patterns, that’s a wholly different matter. And how those patterns arise is also something one cannot ignore.
Let’s start with an easy case. To the left is a PCA plot with four populations: Nigerians, East Asians (Chinese + Japanese), Europeans (whites from Utah), and finally, African Americans. The x-axis is the first principal component of variation, and the y-axis the second. That means the x-axis is the single dimension of variation in the genetic data which explains the largest fraction of the total genetic variation. The totality of the variation can be decomposed into a large set of mutually orthogonal (uncorrelated) dimensions, which can be rank ordered and numbered from the component that explains the most variation down to those that explain the least. In a human genetic context, when Africans are in the sample, the first principal component invariably separates Africans from non-Africans, and the second principal component often maps onto a west-east axis from Europe to the New World. Subsequent principal components can often be useful in smoking out fine-scale distinctions, or relationships which are confused by the existence of similar but different signals in admixed populations.
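To make the rank-ordered decomposition concrete, here is a minimal sketch, assuming NumPy, of PCA on a genotype matrix where rows are individuals and columns are SNPs coded as 0, 1, or 2 copies of an allele. Everything in it is hypothetical: the helper names `genotype_pca` and `simulate`, the three toy populations, and their allele frequencies. Population A plays the role of a deeply diverged group (loosely analogous to Africans), while B and C split from each other later. It uses plain mean-centering; smartpca in EIGENSOFT also rescales each SNP by an estimate of its standard deviation, which is left out here.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs matrix of allele counts (0, 1, 2).

    Returns each sample's scores on the top components and the fraction
    of the total variance that each of those components explains.
    """
    X = np.asarray(genotypes, dtype=float)
    Xc = X - X.mean(axis=0)  # center every SNP on its sample mean
    # numpy returns singular values sorted largest first, so the components
    # come out already rank ordered by how much variance they explain.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    scores = U[:, :n_components] * S[:n_components]
    return scores, var[:n_components] / var.sum()


def simulate(rng, n_per_pop=50, n_snps=2000):
    """Hypothetical toy history: A splits off first, B and C split later."""
    base = rng.uniform(0.2, 0.8, n_snps)

    def drift(p, sd):
        # Crude stand-in for genetic drift: jitter frequencies, keep them in (0, 1)
        return np.clip(p + rng.normal(0.0, sd, p.shape), 0.01, 0.99)

    p_bc = drift(base, 0.10)  # shared ancestor of B and C
    freqs = {"A": drift(base, 0.25), "B": drift(p_bc, 0.10), "C": drift(p_bc, 0.10)}
    geno = np.vstack(
        [rng.binomial(2, p, size=(n_per_pop, n_snps)) for p in freqs.values()]
    )
    labels = np.repeat(list(freqs), n_per_pop)
    return geno, labels


rng = np.random.default_rng(1)
geno, labels = simulate(rng)
scores, ratio = genotype_pca(geno)
for pop in "ABC":
    print(pop, "mean PC1/PC2 score:", np.round(scores[labels == pop].mean(axis=0), 1))
print("fraction of variance explained by PC1, PC2:", np.round(ratio, 3))
```

On toy data like this, PC1 should pull A away from B and C, and PC2 should split B from C. The sign of each axis is arbitrary, so a different random seed or library version can mirror the plot without changing its meaning.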
The interpretation of this plot is rather easy. You see that African Americans lie along a continuum between Nigerians and Europeans, skewed toward Nigerians, with some outliers toward East Asians. We know from other genetic findings that ~20% of African American ancestry is European, but that share is not evenly distributed across the population: ~10% of African Americans are more than 50% European in ancestry, while the other 90% are less than 50% European. Because an admixed individual’s position on the line between the two source clusters roughly tracks his or her share of ancestry from each, you get a spread along that line which reflects this variation. As for the outliers, I will speculate and suggest that these are indications of Native American ancestry among some African Americans, since Native Americans tend to fall near East Asians on these first two components.
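A similar toy model, again a hypothetical sketch assuming NumPy, shows why admixed individuals spread along a line between two source clusters. If a person carries a fraction q of ancestry from source 2, the expected allele frequency at each SNP is (1 - q)·p1 + q·p2, and because projecting onto a principal component is a linear operation, the expected PC1 score is the same mixture of the two source means. The simulation is a simplification: it ignores the chromosomal segments (local ancestry) that real admixture produces, and the Beta(2, 8) distribution for q (mean 0.2) is an arbitrary choice, not fit to the African American figures above. The names `project` and `q_hat` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)
n_snps, n_source, n_admixed = 5000, 50, 200

# Hypothetical allele frequencies for two diverged source populations
base = rng.uniform(0.1, 0.9, n_snps)
p1 = np.clip(base + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)  # source 1
p2 = np.clip(base + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)  # source 2

# q = each admixed person's fraction of source-2 ancestry (Beta(2, 8), mean 0.2)
q = rng.beta(2, 8, n_admixed)
# Expected allele frequency per person and SNP is the ancestry-weighted mixture
p_admixed = (1 - q)[:, None] * p1 + q[:, None] * p2
g_admixed = rng.binomial(2, p_admixed)
g1 = rng.binomial(2, p1, size=(n_source, n_snps))
g2 = rng.binomial(2, p2, size=(n_source, n_snps))

# Fit PC1 on the two source samples only, then project everyone onto it
ref = np.vstack([g1, g2]).astype(float)
mu = ref.mean(axis=0)
_, _, vt = np.linalg.svd(ref - mu, full_matrices=False)
pc1 = vt[0]


def project(g):
    """PC1 score of each row of a genotype matrix, using the reference centering."""
    return (g - mu) @ pc1


s1, s2 = project(g1).mean(), project(g2).mean()
s_admixed = project(g_admixed)

# Relative position between the two source means; invariant to PC1's arbitrary sign
q_hat = (s_admixed - s1) / (s2 - s1)
print("corr(q, q_hat):", round(np.corrcoef(q, q_hat)[0, 1], 3))
print("mean |q_hat - q|:", round(np.abs(q_hat - q).mean(), 3))
```

In a setup like this, where a person sits between the two source means should track his or her ancestry fraction closely, which is the sense in which the spread of African Americans along the Nigerian-European line reflects variation in individual admixture.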
The story I presented above is plausible as an explanation of the figure because we have a wealth of historical data to corroborate it. Fitting the results of genetic analysis to what scholars have long inferred from textual sources is relatively easy here. It is far more difficult to look at a PCA plot and, with little external support, generate a narrative that you yourself accept with a high degree of confidence. It is with that caveat in mind that I present Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping: