Eurasia, ADMIXTURE supervised & unsupervised

By Razib Khan | March 16, 2011 12:45 am

After yesterday’s post I thought it might be useful to see how running ADMIXTURE in different modes would impact the outcomes. Probably the major reason I wish more people would use this software is that they’d see that this program is just a program, and stop treating its outputs as divine writ. Over the years I’ve noticed a tendency for individuals to anchor on one specific plot in one specific paper as if it supported their argument definitively. Running ADMIXTURE, or generating PCA plots via EIGENSOFT, makes you very aware of how useless this sort of stance is.

Today I’ve limited the population set to be “South Asia-centric.” Specifically, there are only a few Middle Eastern, European, and East Asian populations, along with one African population. The goal is to figure out how different South Asian groups relate to these non-South Asian groups. First, I ran ADMIXTURE unsupervised from K = 2 to K = 9. Then, I ran ADMIXTURE in “supervised” mode for K = 9. Basically, I set nine populations as “pure” references. They were:

– Tamil Dalit
– French Basque
– Lithuanian
– Adygei
– Palestinian
– Buryat (Altaic region)
– Dai (South China)
– Papuan
– Luhya (Kenya)


The command was:

./admixture -j2 --supervised southasian.bed 9

-j2 is there because I have two cores; it can be omitted. To run in supervised mode you need a .pop file (same name as your .bed file, but with a .pop extension). After reading the docs and consulting someone, it turns out to be really easy to do. The .pop file is a text file with N lines, where N = the number of individuals in your .fam file, in the same order. For individuals who are part of a reference population, put the population ID on their line; for those who are not references, but whose ancestry is to be estimated, leave the line blank. For example, with N = 9 (3 Yoruba, 3 African Americans, and 3 Tuscans, in that order), the .pop file might look like so, with three blank lines in the middle for the African Americans:

Yoruba
Yoruba
Yoruba



Tuscan
Tuscan
Tuscan
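If you don’t want to type the .pop file by hand, something like the following could build it from the .fam file. This is only a minimal sketch, not anything that ships with ADMIXTURE. It assumes that the first column of your .fam file (the family ID) holds the population label, which is true of some merged data sets but not all; the function names are made up for illustration.

```python
def build_pop_lines(fam_lines, reference_pops):
    """Return one .pop line per individual in the .fam input.

    Reference individuals get their population label; everyone else
    gets an empty string, which becomes a blank line in the .pop file.
    """
    pop_lines = []
    for line in fam_lines:
        if not line.strip():
            continue  # ignore stray empty lines in the .fam input
        # ASSUMPTION: column 1 (family ID) is the population label.
        family_id = line.split()[0]
        pop_lines.append(family_id if family_id in reference_pops else "")
    return pop_lines


def write_pop_file(fam_path, pop_path, reference_pops):
    """Write a .pop file next to the .bed/.fam files; returns the line count."""
    with open(fam_path) as fam:
        pop_lines = build_pop_lines(fam, reference_pops)
    with open(pop_path, "w") as pop:
        # One line per individual, in the same order as the .fam file.
        pop.writelines(label + "\n" for label in pop_lines)
    return len(pop_lines)
```

For the run in this post you would call something like write_pop_file("southasian.fam", "southasian.pop", {...}) with the set of nine reference labels exactly as they are spelled in your own .fam file.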

In the slide show below I have some PCA plots. I generated them with EIGENSOFT. It’s rather easy to use this program. Just download it. It depends on some old libraries, so you’ll need to go find those and install them in the appropriate directory. The error message will tell you what to get and where to put it once you try to run the program. Once you have EIGENSOFT, go into the POPGEN directory. That’s where you’ll run the commands, again, from the terminal. There are a bunch of different commands you have to run, and the program takes different formats. You need to convert your .bed files to .ped. So go into the Plink directory, and do this:

./plink --bfile yourbedfile --recode --tab --out yourpedfile

This will generate a .ped and .map file. You also need an .ind file. I wrote a script to generate that. It is here. Just rename it from .txt to .pl, and run it like so in the directory where you have your .ped file:

perl pedToInd.pl "yourpedfile"
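For readers who can’t fetch that script, here is a rough sketch of what such a conversion could look like. It is not the Perl script mentioned above, and I’m assuming the usual three-column EIGENSOFT .ind layout (sample ID, gender as M/F/U, population label) and that the family ID column of the .ped file is what you want as the population label. The function name is hypothetical.

```python
SEX_CODES = {"1": "M", "2": "F"}  # PLINK sex codes; anything else becomes "U"


def ped_to_ind_lines(ped_lines):
    """Turn .ped rows into .ind rows: sample ID, gender, population label."""
    ind_lines = []
    for line in ped_lines:
        if not line.strip():
            continue  # skip empty lines
        # Only the first six columns are metadata; the rest are genotypes.
        fields = line.split(None, 6)[:6]
        family_id, sample_id, _father, _mother, sex, _phenotype = fields
        gender = SEX_CODES.get(sex, "U")
        # ASSUMPTION: the family ID doubles as the population label.
        ind_lines.append(f"{sample_id}\t{gender}\t{family_id}")
    return ind_lines
```

Writing the returned lines to yourpedfile.ind, one per row, gives a file in the same individual order as the .ped.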

Now you have a .ind file. Take the three files, and move them to /EIGENSOFT/POPGEN if they aren’t already there. You can now do your thing by reading the documentation. But to make it simple, I wrote another script by modifying some of the scripts which come with EIGENSOFT. The script has two parameters, the file name, and the number of dimensions you want to generate. You can find it here. Remember to change it from .txt to .pl (or .perl). You run it just like this:

perl eigenscript.pl "yourpedfile" 4

This is in the case of 4 dimensions, which is what I usually look for. In general I find PCA plots come out quicker than ADMIXTURE, for what it’s worth. After the script is done it will output a bunch of files. You will probably find yourpedfile.evec the easiest one to understand. It will have the IDs in the first column, and the eigenvectors (dimensions) in subsequent columns, in order of their eigenvalue (magnitude). It’s easy to import this into a spreadsheet and manipulate it. I usually pull it into R and plot it in an ad hoc fashion. My experience with Open Office Calc is that too many data points make it really sluggish, but it’s doable.
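If the spreadsheet import gives you trouble, the .evec file can be turned into a plain CSV first. The sketch below assumes the layout I usually see: an optional header line starting with #eigvals:, then one row per individual with the ID, the eigenvector values, and possibly a population label in the last column. Treat that layout as an assumption and check it against your own file; the function names are mine.

```python
import csv
import io


def parse_evec(lines):
    """Parse .evec text into (eigenvalues, rows).

    Each row is (sample_id, [pc values], label); label is "" if absent.
    """
    eigenvalues, rows = [], []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("#"):
            if tokens[0] == "#eigvals:":
                eigenvalues = [float(t) for t in tokens[1:]]
            continue  # any other comment line is ignored
        sample_id, rest = tokens[0], tokens[1:]
        label = ""
        if rest:
            try:
                float(rest[-1])
            except ValueError:
                label = rest.pop()  # trailing non-numeric token = label
        rows.append((sample_id, [float(v) for v in rest], label))
    return eigenvalues, rows


def evec_to_csv(lines):
    """Return CSV text with a header row: id, PC1..PCn, label."""
    _, rows = parse_evec(lines)
    n_pcs = max((len(pcs) for _, pcs, _ in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id"] + [f"PC{i}" for i in range(1, n_pcs + 1)] + ["label"])
    for sample_id, pcs, label in rows:
        writer.writerow([sample_id] + pcs + [label])
    return out.getvalue()
```

The resulting CSV opens cleanly in a spreadsheet or in R, and the label column makes it easy to color points by population.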

A few notes on the results below. The Luhya Kenyan sample is an African outgroup. I didn’t post PC 1 vs. 2, because as usual PC 1 simply separates Africans from non-Africans. PC 2 is the west-east divide in Asia, while PC 3 seems to be a south vs. north split in Eurasia. PC 4 divides East Asians into northern and southern branches. Note especially the differences between the unsupervised K = 9 run and the supervised K = 9 run. I know that unsupervised runs are supposed to be free of prior assumptions about populations, but after you’ve run ADMIXTURE a bit I suspect you implicitly begin to realize that the input you throw into the machine constrains the range of outputs.

In terms of concrete results: in the unsupervised run the Kalash generally form their own cluster (albeit one with a low genetic distance to Europeans in comparison to other South Asians). But in the supervised run they’re mostly like the Lithuanians. In contrast, the other Pakistani populations tend to resemble the Adygei, from the Caucasus. Another of my reference populations was Palestinian, and notice that while the Adygei component spans the Iranians and Pakistanis, the Palestinian component in South Asians is very low, while it is substantial in Iran.
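To compare runs like this with numbers rather than by eyeballing bar plots, you can average the ancestry fractions per population. ADMIXTURE writes them to a .Q file (one row per individual, K columns, in .fam order). The sketch below is one way to do the averaging; it again assumes the family ID column of the .fam file is the population label, and the function name is hypothetical.

```python
def mean_ancestry_by_population(fam_lines, q_lines):
    """Average the K ancestry fractions over the individuals of each population.

    fam_lines and q_lines must describe the same individuals in the same order.
    """
    fam = [line for line in fam_lines if line.strip()]
    q = [line for line in q_lines if line.strip()]
    if len(fam) != len(q):
        raise ValueError("the .fam and .Q inputs have different numbers of rows")

    sums, counts = {}, {}
    for fam_line, q_line in zip(fam, q):
        # ASSUMPTION: column 1 (family ID) is the population label.
        pop = fam_line.split()[0]
        fractions = [float(x) for x in q_line.split()]
        if pop not in sums:
            sums[pop] = [0.0] * len(fractions)
            counts[pop] = 0
        sums[pop] = [s + f for s, f in zip(sums[pop], fractions)]
        counts[pop] += 1

    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}
```

Running this on the supervised and unsupervised K = 9 outputs and lining the two tables up side by side makes shifts like the Kalash one easy to spot. Keep in mind that the column order in an unsupervised .Q file is arbitrary, so the components have to be matched up by hand.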

[Slide show: ADMIXTURE plots for K = 2 through K = 9 (unsupervised), the supervised K = 9 run, and four PCA plots]

CATEGORIZED UNDER: Genetics, Genomics, Uncategorized
  • http://www.zackvision.com/weblog/ Zack

    Eigensoft works with bed/bim/pedind files. You don’t need to convert to ped/map.

  • Diogenes

    Interesting about the Adyghe. Their pics look surprisingly familiar, with almost “East European” facial features. I also noticed something like this in Ossetians…
    It seems reasonable to think about two waves of Neolithic coming from the Near East in all directions.
    A first one from the North Fertile Crescent that reached far, since it had only hunter-gatherers ahead.
    A second more advanced one from further South, which had a more limited impact due to “higher drag” as it travelled among already agricultural peoples.
    This may explain some small but visible differences between Lezgins, Georgians on one side and Ossetians, Adighe on the other? Also among Europeans to a smaller degree.
    Seem to recall small “East African” segments in ME and residual ones in South Europeans…
    Maybe the Home of the Soul of Ptah spanned quite wider than thought before?
    http://dienekes.blogspot.com/2009/06/hapmix-for-detection-of-chromosomal.html

  • Diogenes

    sorry repeated comment then edited. feel free to delete

  • onur

    This may explain some small but visible differences between Lezgins, Georgians on one side and Ossetians, Adighe on the other?

    What are those visible differences you mention? Especially what are the visible differences of Adyghei and Ossetians from Lezgins?

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    Eigensoft works with bed/bim/pedind file

    ah. perhaps i didn’t get it to work the first time….

  • Diogenes

    I guess that’s subjective? I was being ironic…
    There are some genetic differences though.

  • onur

    Genetically speaking (presumably also physically speaking), like Adyghei and Ossetians and unlike Georgians, Lezgins are northern Caucasians rather than southern Caucasians (not to be confused with linguistics).

  • Diogenes

    Looking closer, this is a very interesting result, one has to wonder why it hasn’t been done before.
    Looks like some Eastern Near Eastern population split into two similar but distinct pops (like related neighbours): one expanding to the North, another to the South-East. The South-East-bound would have introduced the Fertile Crescent Neolithic techs into Iran and India, explaining their high prevalence there. Presumably carriers of R1a haplogroups.
    The North-bound expanded into the rivers of the Ukraine all the way to the Baltic region.
    Some of those (the Kurgan people- proto-Indo-European speaking?) in the steppe rivers presumably adapted to pastoralism and exploded from there. The Kalash appear to be their descendants?
    The small component in Brahui and Balochi is interesting since the former are not Indo-European speaking and the latter are likely late adopters?
    If I wanted to sound silly I would say this is almost supportive of an Elamo-Dravidian linguistic family hypothesis. :)

  • pconroy

    Diogenes,

    That question of the Kalash was discussed and answered a few weeks ago on Dienekes’ blog…

  • pconroy

    Oh and BTW, that linked post above refers to the “Secrets of the Silk Road” exhibit, which is currently in Philly, and definitely worth a visit – I was there last weekend.

    Overall, it was a good exhibit, but not great, as they deliberately obfuscated the timeline, without clearly stating that European-like peoples were in the Tarim basin for thousands of years before the Chinese. Also they focused mostly on the Southern Tarim basin, whose people are generally seen as possibly Iranic-type, with nothing from the Northern Tarim, where the Tocharians proper are. About 1/3 of the artifacts were Tang dynasty Chinese, and they made a big deal of the fact that Chinese-like people lived in Astana, near Turpan.

    There was an excellent voice commentary available, however, from Victor Mair and Elizabeth Wayland Barber.

    They mentioned that Tocharian was one of 27 languages attested in the Tarim Basin, without mentioning that there were 2 types, separated by possibly thousands of years, and as such the language probably evolved for a considerable time in the area – along with the people speaking it. They made no mention of Buddhism at all. They made little mention of the cultural artifacts introduced by these people into China. The only thing they mentioned being traded West to East was Roman glassware. They erroneously mentioned that “soap” was routinely used by Romans, and seemed to attribute its invention to them.

  • Diogenes

    pconroy, I’m aware all of the ideas I’m defending have been formulated by others before. But knowledge belongs to all and to nobody. We should collate and discuss and present it in new perspectives. If I repeat some of these things it’s because I believe there’s marginal benefit for many people reading my comments in thinking about them again. Especially people who find me annoying. No other purpose.

    Plus I don’t want something as delicious as that to obscurely linger in some recess of Dienekes’ blog. Dienekes seems to figure a lot of things out, but he often doesn’t develop them, I guess he is too busy honourably defending Thermopylae.


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com
