Visualization of genetic distances, part n

By Razib Khan | April 21, 2011 4:56 pm

Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he’s been driving his computer to crunch out ADMIXTURE results while ascending a ladder of K’s. Because it is the Harappa Ancestry Project, Zack’s populations are overloaded a touch on South Asians. He managed to get hold of the data set from Reconstructing Indian Population History. If you will recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, “Ancestral North Indian” (ANI) and “Ancestral South Indian” (ASI). The ANI are a population which can be compared pretty easily to other West Eurasians. There are no “pure” groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago.

At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:

Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.

The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation in the latter. The % ASI is consistently higher than the % Onge. Why? The last common ancestors of Andaman Islanders and the ASI diverged on the order of tens of thousands of years ago. Dienekes has observed that ADMIXTURE needs good reference populations, and the Onge have been diverged for so long from the last common ancestor with the mainland ASI populations that they are not a perfect proxy for this ancient group. But the underestimate seems to be biased systematically in the same direction, which explains the good fit between the two trends.
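To make the point about a biased but consistent proxy concrete, here is a minimal sketch. Every number in it is made up for illustration: the Onge fractions, the slope of 1.3, the offset of 0.02 and the small deviations are assumptions, not values from Zack's spreadsheet or from his equation, which is not reproduced in this post. It only shows that a proxy which underestimates by a steady amount can still give an r-squared very close to 1.

```python
import numpy as np

# Illustrative Onge proportions (as fractions). These are NOT real data.
onge = np.array([0.10, 0.18, 0.25, 0.33, 0.41, 0.50])

# Hypothetical relationship: ASI is a fixed linear function of the Onge
# proxy, plus small deviations. Slope, offset and deviations are assumptions.
deviations = np.array([0.004, -0.003, 0.002, -0.004, 0.003, -0.002])
asi = 1.3 * onge + 0.02 + deviations

# r-squared for a simple linear fit is the squared Pearson correlation.
r = np.corrcoef(onge, asi)[0, 1]
r_squared = r ** 2

print(f"r-squared: {r_squared:.4f}")
print("ASI above Onge everywhere:", bool(np.all(asi > onge)))
```

The proxy is wrong in level in every population, yet because it is wrong in the same direction and by a predictable amount, nearly all of the variation in one series is accounted for by the other.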

Zack naturally generated a pairwise matrix of Fsts between these inferred ancestral populations. Remember, Fst is the proportion of the total genetic variance across two populations that lies between them rather than within them. So it’s a rough measure of genetic distance.
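As a rough illustration of that definition, the sketch below computes a textbook Wright-style Fst for two populations from per-SNP allele frequencies, as (H_T - H_S) / H_T with sums taken across SNPs. This is an assumption-laden toy: the function name is hypothetical, both populations are weighted equally, there is no sample-size correction, and it is not claimed to be the estimator ADMIXTURE itself reports.

```python
import numpy as np

def wright_fst(p1, p2):
    """Toy Fst for two populations from per-SNP allele frequencies.

    Assumes equal population weights and no sample-size correction.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0

    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0

    # Share of the total that is NOT accounted for within populations.
    return (h_total.sum() - h_within.sum()) / h_total.sum()

# Identical frequencies: no between-population variance.
print(wright_fst([0.2, 0.5], [0.2, 0.5]))
# Modestly different frequencies at one SNP.
print(wright_fst([0.4], [0.6]))
# Fixed for opposite alleles: all variance is between populations.
print(wright_fst([0.0], [1.0]))
```

If every SNP is monomorphic for the same allele in both populations, the denominator is zero and the result is undefined, so real pipelines filter such sites first.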

Here’s the matrix. I’ve renamed some populations:



| | S Asian | Andaman | E Asian | SW Asian | European | Siberian | W African | Papuan | Amerindian | Khoisan/Pygmy | E African |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S Asian | 0 | 0.165 | 0.121 | 0.09 | 0.071 | 0.134 | 0.184 | 0.21 | 0.175 | 0.261 | 0.15 |
| Andaman | 0.165 | 0 | 0.122 | 0.161 | 0.152 | 0.144 | 0.224 | 0.209 | 0.207 | 0.304 | 0.195 |
| E Asian | 0.121 | 0.122 | 0 | 0.152 | 0.137 | 0.067 | 0.216 | 0.205 | 0.139 | 0.294 | 0.187 |
| SW Asian | 0.09 | 0.161 | 0.152 | 0 | 0.048 | 0.163 | 0.179 | 0.235 | 0.208 | 0.257 | 0.143 |
| European | 0.071 | 0.152 | 0.137 | 0.048 | 0 | 0.143 | 0.186 | 0.223 | 0.178 | 0.261 | 0.148 |
| Siberian | 0.134 | 0.144 | 0.067 | 0.163 | 0.143 | 0 | 0.232 | 0.228 | 0.141 | 0.311 | 0.203 |
| W African | 0.184 | 0.224 | 0.216 | 0.179 | 0.186 | 0.232 | 0 | 0.286 | 0.281 | 0.123 | 0.059 |
| Papuan | 0.21 | 0.209 | 0.205 | 0.235 | 0.223 | 0.228 | 0.286 | 0 | 0.29 | 0.367 | 0.26 |
| Amerindian | 0.175 | 0.207 | 0.139 | 0.208 | 0.178 | 0.141 | 0.281 | 0.29 | 0 | 0.364 | 0.252 |
| Khoisan/Pygmy | 0.261 | 0.304 | 0.294 | 0.257 | 0.261 | 0.311 | 0.123 | 0.367 | 0.364 | 0 | 0.133 |
| E African | 0.15 | 0.195 | 0.187 | 0.143 | 0.148 | 0.203 | 0.059 | 0.26 | 0.252 | 0.133 | 0 |

The South Asian population above is very different from the components you’ve seen before. It seems equivalent to ANI more than anything else. This is a good reminder that the labels we’re giving to these ancestral groups are mnemonics; they’re not to be taken literally or concretely. Personally I find Fst matrices hard to read, so I’ve generated a number of multidimensional scaling plots illustrating the relationships within the matrix. Clarity can be achieved by mixing & matching the populations, so that’s what I did. Also, I only display dimension 1 and dimension 2. Remember that dimension 1 is the one with more weight.
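For readers who want to try this themselves, here is a minimal sketch of classical (Torgerson) multidimensional scaling applied to the Fst matrix above, using only NumPy. The post does not say which software or MDS variant produced the original plots, so treat this as one plausible approach under stated assumptions: Fst values are used directly as distances, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np

labels = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian",
          "W African", "Papuan", "Amerindian", "Khoisan/Pygmy", "E African"]

# Pairwise Fst values from the matrix in the post. The Andaman / E African
# entry uses 0.195, the value in the E African row, so the matrix is symmetric.
fst = np.array([
    [0,     0.165, 0.121, 0.09,  0.071, 0.134, 0.184, 0.21,  0.175, 0.261, 0.15],
    [0.165, 0,     0.122, 0.161, 0.152, 0.144, 0.224, 0.209, 0.207, 0.304, 0.195],
    [0.121, 0.122, 0,     0.152, 0.137, 0.067, 0.216, 0.205, 0.139, 0.294, 0.187],
    [0.09,  0.161, 0.152, 0,     0.048, 0.163, 0.179, 0.235, 0.208, 0.257, 0.143],
    [0.071, 0.152, 0.137, 0.048, 0,     0.143, 0.186, 0.223, 0.178, 0.261, 0.148],
    [0.134, 0.144, 0.067, 0.163, 0.143, 0,     0.232, 0.228, 0.141, 0.311, 0.203],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232, 0,     0.286, 0.281, 0.123, 0.059],
    [0.21,  0.209, 0.205, 0.235, 0.223, 0.228, 0.286, 0,     0.29,  0.367, 0.26],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.29,  0,     0.364, 0.252],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364, 0,     0.133],
    [0.15,  0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.26,  0.252, 0.133, 0],
], dtype=float)

def classical_mds(dist, n_dims=2):
    """Classical MDS: double-center the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; dimension 1 carries the most weight.
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals), eigvals[order]

coords, weights = classical_mds(fst)
for name, (dim1, dim2) in zip(labels, coords):
    print(f"{name:14s} {dim1:+.3f} {dim2:+.3f}")
```

To mix and match populations, pick a subset of indices and pass `fst[np.ix_(idx, idx)]` to the same function. The signs of the axes are arbitrary, so a plot may come out mirrored relative to someone else's.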

Do not think of these as real concrete populations from which all modern populations emerged. These eleven populations are abstractions which fulfill the dictates of the algorithm. But, I do think that with that caveat in mind, there are suggestive patterns.

First, the “SW Asian” component isn’t that much closer to “W Africans” than the other West Eurasian groups. Yet we know in reality that Southwest Asian populations are closer to Africans. What’s going on? Southwest Asian populations have African admixture. And that admixture is recent enough that it shakes out rather easily. This is in contrast to the normal South Asian modal components, which are indicative of a greater time since admixture, which was thorough enough that it is not trivial to tease out the two ancestral groups from each other’s genetic background. Fission and fusion are normal parts of the history of any geographically expansive species. ADMIXTURE will capture the earlier parts of fusion. But after a long enough period of time that fusion becomes its own distinctive element.

There is the conventional east-west division you see in Eurasia on PCA, but you see evidence of the north-south secondary component on these plots too. The Andaman populations are closer to East Eurasians than West Eurasians, but, they also occupy their own position which highlights a north-south axis.

Finally, the S. Asian/ANI population seems somewhat closer to “Europeans” than to “SW Asians.” That is interesting. But this is where you have to be very careful and remember that these “pure” ancestral components can themselves fractionate into constituent elements at higher K’s or when you constrain the data set appropriately (Africans and inbred groups tend to hog clusters in ADMIXTURE). If you’ve read all the genome bloggers you will be aware that “European” and “SW Asian” components themselves break apart upon closer inspection. The “SW Asian” component usually divides into a northern and southern branch. The northern branch is often positioned closer to the other “European” groups than it is to the southern branch in terms of genetic distance. Here is a selection of West Eurasian groups sorted by their “S Asian” proportion:


| Population | South Asian % |
|---|---|
| Iranians | 30% |
| Lezgins (Caucasian) | 29% |
| Georgians (Caucasian) | 26% |
| Adygei (Caucasian) | 24% |
| Armenians | 22% |
| Turks | 21% |
| Syrians | 19% |
| Druze | 18% |
| Lebanese | 17% |
| Samaritans | 16% |
| Palestinian | 15% |
| Cypriots | 14% |
| Saudis | 14% |
| Yemenis | 14% |
| Russian | 8% |
| Tuscans | 7% |
| Hungarians | 7% |
| Utah whites | 7% |
| Orcadian | 5% |
| British | 5% |
| French | 5% |
| Italian | 5% |
| Finnish | 4% |

Also observe that the distance between SW Asians and Europeans is smaller than between Europeans and S Asians. Crunching up the K’s, or limiting the data set to West Eurasian groups, would probably show more fine-grained relationships.
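The comparison can be checked directly against the three relevant entries of the Fst matrix. The short sketch below just ranks those pairs; the values are copied from the matrix in this post, and the dictionary layout is only one convenient way to hold them.

```python
# Pairwise Fst values among the three West Eurasian-like components,
# copied from the matrix in the post.
fst = {
    frozenset(["SW Asian", "European"]): 0.048,
    frozenset(["S Asian", "European"]): 0.071,
    frozenset(["S Asian", "SW Asian"]): 0.090,
}

# Sort pairs from most similar (lowest Fst) to least similar.
ranked = sorted(fst.items(), key=lambda item: item[1])
for pair, value in ranked:
    print(" / ".join(sorted(pair)), value)

closest_pair = set(ranked[0][0])
```

SW Asian and European come out as the closest pair, and S Asian sits nearer to European than to SW Asian, which matches both observations in the text.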

CATEGORIZED UNDER: Genetics, Genomics