A best case scenario for unsupervised ADMIXTURE?

By Razib Khan | April 7, 2011 2:59 pm

One of the great things about ADMIXTURE is that the population elements shake out of the data through the logic of the program. The worst thing is that it is then left up to you to make sense of those elements. A useful way to use ADMIXTURE and avoid excessive interpretive fogginess is to estimate each individual's proportion of ancestry from X ancestral groups when you have a pretty good idea that an admixture event did occur between very distinct and distantly related population groups. To some extent the whole New World is a good laboratory for this process. Consider, for example, someone from the Dominican Republic or Puerto Rico. There is a good chance that their ancestry will fractionate into three elements:

– An African one

– An Amerindian one

– A European one

These three elements derive from geographically very distant regions. The ancestral populations were separated for tens of thousands of years, with little to no gene flow between them. This means that the allele frequencies of the “source” populations should differ substantially (that is, Fst between them should be high). Mapping the allele frequencies that ADMIXTURE infers for its abstract ancestral populations onto the concrete allele frequencies of known source populations is therefore rather straightforward.
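To make the Fst point concrete, here is a minimal sketch of Wright's Fst for a single biallelic SNP in two equally weighted populations. The function name and the allele frequencies are made up for illustration; they are not taken from the data discussed in this post.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic SNP, two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # SNP is fixed for the same allele everywhere: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: a SNP that barely differs vs. one that differs a lot.
similar = fst_two_pops(0.50, 0.55)
divergent = fst_two_pops(0.10, 0.90)
print(f"similar frequencies:   Fst = {similar:.4f}")
print(f"divergent frequencies: Fst = {divergent:.4f}")
```

The larger the frequency gap at many SNPs, the easier it is for a clustering method to tell the source populations apart, which is the intuition behind calling this a favorable case.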

So here’s an experiment. I have 40 individuals with non-trivial African admixture. Most of them are African Americans, though some are of Latino heritage, and several are of Ethiopian or Somali origin. A minority are people who have only a small quantum of African ancestry, but well above the “noise” threshold. Let’s take four populations from the HapMap: Yoruba, Utah whites, Maasai, and Chinese from Beijing. I merged the data (removing problem individuals), and added the aforementioned 40 individuals. I filtered the data set so that no SNP had more than 0.5% of its genotypes missing across the individuals. I was left with ~120,000 markers.
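The missingness filter can be sketched in a few lines. This is only an illustration: the post does not say which tool was used, and the function name, the `-1` missing code, and the toy matrix are assumptions. In PLINK the analogous per-SNP filter is `--geno 0.005`.

```python
import numpy as np


def filter_snps_by_missingness(genotypes, max_missing=0.005):
    """Drop SNPs whose missing-call rate exceeds max_missing.

    genotypes: individuals x SNPs integer array, with -1 marking a missing
    call (an assumed encoding for this sketch).
    Returns (filtered_genotypes, keep_mask).
    """
    genotypes = np.asarray(genotypes)
    # Fraction of individuals with a missing call at each SNP.
    missing_rate = (genotypes == -1).mean(axis=0)
    keep = missing_rate <= max_missing
    return genotypes[:, keep], keep


# Toy data: 4 individuals x 3 SNPs, so one missing call is already 25%.
toy = np.array([
    [0, 1, -1],
    [1, -1, -1],
    [2, 0, 1],
    [0, 2, 0],
])
# A loose threshold is used here only because the toy sample is tiny.
filtered, keep = filter_snps_by_missingness(toy, max_missing=0.25)
print(keep)
print(filtered.shape)
```

With real data the default of 0.005 corresponds to the 0.5% cutoff described above.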

Then I did two runs of ADMIXTURE: supervised and unsupervised. In the supervised run the HapMap populations were treated as “pure,” while in the unsupervised run the HapMap populations also had their ancestries inferred. Here are the population breakdowns for the HapMap populations in the unsupervised run:


The Maasai are the only group with much intrapopulation variance:

OK, so how did the admixed set that I have vary across the two runs? There were four ancestral components, which I labeled:

– West African

– European

– Chinese

– East African
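For the supervised run described earlier, ADMIXTURE reads a `.pop` file with one line per individual, in the same order as the `.fam` file: a population label for each reference individual and `-` for anyone whose ancestry should be estimated. Below is a rough sketch of how such a file could be assembled. The sample IDs, the labels, and the `merged` file prefix are hypothetical, not the ones used for this post.

```python
def build_pop_lines(sample_ids, reference_labels):
    """Return one .pop line per sample, in the order given.

    sample_ids: IDs in the same order as the .fam file.
    reference_labels: dict mapping reference sample ID -> population label.
    Samples absent from the dict get '-', meaning "estimate this ancestry".
    """
    return [reference_labels.get(sample_id, "-") for sample_id in sample_ids]


# Hypothetical IDs: four reference individuals and one admixed individual.
sample_ids = ["YRI_001", "CEU_001", "MKK_001", "CHB_001", "ADMIXED_001"]
reference_labels = {
    "YRI_001": "YRI",  # Yoruba
    "CEU_001": "CEU",  # Utah whites
    "MKK_001": "MKK",  # Maasai
    "CHB_001": "CHB",  # Chinese from Beijing
}

pop_lines = build_pop_lines(sample_ids, reference_labels)
pop_text = "\n".join(pop_lines) + "\n"
print(pop_text, end="")

# Saved as merged.pop next to merged.bed, the two runs would then be:
#   admixture merged.bed 4                (unsupervised)
#   admixture --supervised merged.bed 4   (supervised)
```

With four labeled reference populations, K=4 gives the four components listed above.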

Here are the correlations between the two runs for the 40 individuals:

– West African, 0.9995

– European, 0.9997

– Chinese, 0.9957

– East African, 0.9988
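Correlations like these can be computed directly from the two ancestry-proportion (Q) matrices. The sketch below assumes each matrix has one row per individual and one column per component. Because unsupervised components come out in arbitrary order, it first pairs each supervised column with its best-correlated unsupervised column. The function name and the toy numbers are invented for illustration and are not the values from these runs.

```python
import numpy as np


def per_component_correlation(q_a, q_b):
    """Pair columns of two Q matrices and report Pearson r for each pair.

    q_a, q_b: individuals x K arrays of ancestry proportions.
    Returns {column_in_a: (matched_column_in_b, r)}.
    The pairing is a simple best match per column, which is not guaranteed
    to be one-to-one when components are poorly separated.
    """
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    k = q_a.shape[1]
    # Cross-correlation block: rows are columns of q_a, columns are columns of q_b.
    cross = np.corrcoef(q_a.T, q_b.T)[:k, k:]
    best = cross.argmax(axis=1)
    return {i: (int(best[i]), float(cross[i, best[i]])) for i in range(k)}


# Toy example with 5 individuals and K=3; the second run has its columns shuffled.
q_supervised = [
    [0.80, 0.18, 0.02],
    [0.60, 0.35, 0.05],
    [0.10, 0.85, 0.05],
    [0.75, 0.05, 0.20],
    [0.30, 0.30, 0.40],
]
q_unsupervised = [
    [0.03, 0.79, 0.18],
    [0.05, 0.61, 0.34],
    [0.04, 0.10, 0.86],
    [0.21, 0.74, 0.05],
    [0.41, 0.29, 0.30],
]

result = per_component_correlation(q_supervised, q_unsupervised)
for col, (matched, r) in result.items():
    print(f"supervised column {col} <-> unsupervised column {matched}: r = {r:.4f}")
```

Note that Pearson r only measures whether the two runs rank individuals the same way. Two runs could correlate almost perfectly and still differ by a constant offset, so it is worth eyeballing the barplots too.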

Not too shabby. Here are the barplots side by side:

Here are the runs so you can see them:


This seems like a best-case scenario for ADMIXTURE smoking out population structure. For all that ADMIXTURE is just a “dumb program,” when used judiciously it can be very illuminating.

CATEGORIZED UNDER: Genetics, Genomics
  • Dragon Horse

    This is interesting, when David runs the unsupervised option his results for me come up very similar to yours. However, when he runs the supervised option I come out 99% West African (which he knows is some type of statistical error, but he thinks it is due to some special affinity I have to one or more of the samples he is using, that other African Americans tend to not have). From your more detailed results with higher K partitioning, it appears I just have more Fulani and Pygmy affinity than average.

  • Eze

    At K=3 supervised, the Horn African samples seem to have a lot more Yoruba (African proxy) affinity than 23andMe suggests in the ancestry painting (usually less than 30%). Perhaps it could be that 23andMe uses very small window sizes to classify segments into African-European-Asian.
