ADMIXTURE vs. MDS, visualization is just visualization

By Razib Khan | January 18, 2011 1:06 pm

Dienekes did another run of his data with K = 64. He posted a huge plot with the two largest dimensions of variation. He also posted an accompanying spreadsheet with the coordinates of where the Dodecad samples were. So I found my own position pretty quickly. Before going to that, I thought I’d repost a comparison between myself, the HapMap Gujaratis, the North Kannadi sample, and the HGDP Uygurs. This is at K = 10 in ADMIXTURE from Dodecad.

OK, with that in mind, here’s the full MDS with the two largest components of genetic variation. I’ve added large labels. Also, click the image for a larger file so you can read the small labels.


One thing that jumps out at me is the tight clustering of very populous groups such as Europeans. The East Asian and Yoruba samples aren’t as representative of their macro-regions, so that makes some sense. But the Dodecad Ancestry Project has a lot of West Eurasian groups, so the affinity there is still striking. I am basically a touch off the “North Kannadi” cluster, a little toward the Uygurs. In the clustering which is the main focus of Dienekes’ post I also fall into a North Kannadi cluster. Interestingly, in Zack’s preliminary run with the South Asian data set I’m 71% with Nepalis, and 29% with part of the Singapore Indians (most of whom I assume are Tamil). Note the close position of the Uygurs to the North Kannadi, despite the fact that geographically the Uygurs are much closer to Pakistani populations. It just goes to show you what happens when you throw a whole lot of genetic variation into the pot, and then focus on the two largest components of variance. The axis between Europe and East Asia is spanned by South Asians. But some South Asian groups, such as the North Kannadi sample, have an ancestry component somewhat more like East Asians than West Eurasians, so they get placed closer to East Asians on the two-dimensional plots. This is what Dienekes terms the “South Eurasian” element, which has been submerged almost everywhere by West and East Eurasian elements.

Here’s a close-up of the South Asian region of the plot. You can see how close the Uygurs are to the North Kannadi sample, and how close I am to the North Kannadi. But two of the North Kannadi samples are out of the cluster in the MDS. I assume they’re the individuals with a lot of the purple ancestral component, what Dienekes termed “West Asian.” The individual between the Gujarati and North Kannadi clusters is probably the one with the slight orange “East Asian” component. And that gives you insight into what’s going on with me. If you removed the orange component from my ancestry I’d probably be in the Gujarati cluster. I’m “pulled” toward the North Kannadi cluster in direct proportion to my East Asian ancestral component. The MDS plot isn’t “wrong,” it is visualizing the data correctly within the constraints imposed by our own abilities to process information intuitively. But without the ADMIXTURE plot you’d probably make the wrong inference about my population assignment. With that information the likely hypothesis would be that I’m from a liminal population which has interactions with East Asian groups (e.g., Nepali, Assamese, or Bengali).
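To make the “pulled in direct proportion” point concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the marker profiles are random numbers, the group names are just labels, the 0.6/0.4 and 0.15 mixing weights are hypothetical, and the distances are plain Euclidean rather than whatever distance measure and software produced the Dodecad plot. It runs classical (Torgerson) MDS and checks that an individual who is a mixture of a “Gujarati-like” profile and an “East Asian” profile lands on the line between them, displaced by exactly the admixture fraction.

```python
import numpy as np

def classical_mds(points, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between the rows of `points`."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    n = sq_dists.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dists @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest components
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

rng = np.random.default_rng(0)
n_markers = 200  # hypothetical number of markers

# Hypothetical source profiles (random numbers, not real allele frequencies).
west, east, south, african = rng.normal(size=(4, n_markers))

east_fraction = 0.15  # hypothetical East Asian share of the admixed individual
gujarati_like = 0.6 * west + 0.4 * south
admixed = (1 - east_fraction) * gujarati_like + east_fraction * east

points = np.vstack([west, east, south, african, gujarati_like, admixed])
coords = classical_mds(points, n_components=2)

# Where the admixed individual should sit if the displacement is proportional:
expected = (1 - east_fraction) * coords[4] + east_fraction * coords[1]
print("admixed individual at:", coords[5])
print("proportional position:", expected)
print("match:", np.allclose(coords[5], expected))
```

The match is not a coincidence of the random seed. With Euclidean distances, classical MDS amounts to centering the data and projecting it onto its top principal axes, and that kind of projection preserves mixtures. Real genotype data and other distance measures will only approximate this behavior.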

Note: Removing the Africans from the sample, or visualizing different combinations of dimensions, would also certainly clear up the confusion in this case. But again, these sorts of steps require a human understanding of what the techniques are presenting to you.
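A toy version of both remedies, again as a sketch under loud assumptions: the four “groups” below are single hand-picked points in three dimensions (hypothetical coordinates, not real data), and the MDS is classical MDS on Euclidean distances. The “african” point is placed far from the rest, so it claims the first dimension. The small offset that separates “south” from the west-east axis then falls to the third dimension and disappears from the usual two-dimensional plot.

```python
import numpy as np

def classical_mds(points, n_components):
    """Classical (Torgerson) MDS on Euclidean distances between the rows of `points`."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    n = sq_dists.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dists @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Hypothetical toy coordinates: one point per group, chosen by hand.
african = [8.0, 0.0, 0.0]   # far from everyone else
west = [0.0, 3.0, 0.0]
east = [0.0, -3.0, 0.0]
south = [0.0, 0.0, 1.0]     # between west and east, offset on its own axis

full = np.array([african, west, east, south])
subset = np.array([west, east, south])      # same data with the outlier removed

coords_full = classical_mds(full, n_components=3)
coords_subset = classical_mds(subset, n_components=2)

true_dist = np.linalg.norm(np.array(south) - np.array(west))
# South-to-west distance as seen in dimensions 1 and 2 of the full run.
d_with = np.linalg.norm(coords_full[3, :2] - coords_full[1, :2])
# The same pair viewed in dimensions 2 and 3 of the full run.
d_dims23 = np.linalg.norm(coords_full[3, 1:3] - coords_full[1, 1:3])
# The same pair in a two-dimensional run without the outlier.
d_without = np.linalg.norm(coords_subset[2] - coords_subset[0])

print("true distance:            ", round(true_dist, 3))
print("dims 1-2, outlier included:", round(d_with, 3))
print("dims 2-3, outlier included:", round(d_dims23, 3))
print("dims 1-2, outlier removed: ", round(d_without, 3))
```

In this toy setup the first view shrinks the south-to-west distance, while dropping the outlier or switching to dimensions 2 and 3 recovers it. Which groups to drop and which dimensions to inspect is still a judgment the analyst has to make.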

CATEGORIZED UNDER: Genetics, Genomics
  • pconroy

    Razib,

    Very interesting, as this would fall in line with my speculation that the mystery component of the North Kannadi could be Austronesian – and depending on how one defines Austronesians and their expansion – slow boat/fast boat etc – they could be seen as South East Asian + Oceania or at least island South East Asian

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    u wouldn’t happen to know what the detailed origin of the north kannadi sample is, would you?

  • pconroy

    No, but whatever it is, it’s only a minor portion of other Indian samples – so IMO it must be due to:
    1. Isolation by distance – like Sardinia – which is not likely
    2. Isolation by religion/caste – like Assyrian
    3. Ancient substrate
    4. Exogenous to India – like Austronesian admixture

    I guess there is some likelihood of #1 or #2 – if the sample was all sourced from an isolated mountain village or religious minority, but usually in understudied areas, samples are from cities, and rarely rural areas.

  • http://washparkprophet.blogspot.com ohwilleke

    I am aware that a few people have done three dimensional MDS plots and have seen one or two. I’d be curious to hear what people think about their usefulness.


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com
