I’ve been playing around with ADMIXTURE and EIGENSOFT, using the HapMap data set with a few friends & family merged into it. It is interesting to see how the intuitive inferences you make from ADMIXTURE bar plots differ somewhat from those you make from PCA scatter plots. In any case, I’ve been posting some of the preliminary results on Facebook (in part because one of my friends is on Facebook and is curious about his own genetic background), and a friend who is a grad student pointed me to Structurama, which infers the best number of categories* (in ADMIXTURE one can get at this with cross-validation). I’ve avoided STRUCTURE because it’s computationally more intensive. Any other recommendations? Specifically, something not mentioned by Dienekes or David.
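For anyone curious about the cross-validation route, here is a minimal Python sketch of picking K from ADMIXTURE logs. It assumes ADMIXTURE was run with its `--cv` option once per K and that each log contains a line of the form `CV error (K=5): 0.52345`. The function names are hypothetical, and the numbers in `sample_log` are made up for illustration, not results from any real run.

```python
import re

# Matches lines such as "CV error (K=5): 0.52345" in ADMIXTURE's --cv output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def cv_errors(log_text):
    """Return {K: cross-validation error} for every CV line in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)


# Made-up values for illustration only (not from a real run).
sample_log = """\
CV error (K=3): 0.5512
CV error (K=4): 0.5478
CV error (K=5): 0.5490
"""

errors = cv_errors(sample_log)
print(best_k(errors))
```

The lowest cross-validation error is only a guide. A K that is slightly worse by this measure can still be the more interpretable one.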
Below the fold is a taste of the games my computer has been up to overnight. K = 5 ancestral populations in ADMIXTURE. The populations are HapMap Utah whites, Tuscans, Mexicans, and Beijing Chinese, in that order. The last 6 bars are: my father, my mother, and then four individuals of European ancestry, Euro 1, Euro 2, Euro 3, and Euro 4. After merging files, pruning founders, and thinning the markers to reduce linkage disequilibrium, I was left with 120,000 SNPs. Just a note: I’ve played around with different numbers of SNPs at various K’s, and some very small differences are surprisingly consistent. My mother is always just a bit more “Asian” than my father.
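If you want the numbers behind the last few bars rather than eyeballing the plot, a rough Python sketch follows. It assumes the usual ADMIXTURE `.Q` layout: one row per individual, in the same order as the input file, with K whitespace-separated ancestry fractions per row. The helper names are hypothetical, and the toy rows (K = 2, three individuals) are invented purely to show the mechanics.

```python
def read_q(text):
    """Parse .Q content: one row per individual, K fractions per row."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def label_last(rows, labels):
    """Pair the final len(labels) rows with the given labels, in order."""
    return dict(zip(labels, rows[-len(labels):]))


# Invented toy data for illustration only: K = 2, three individuals.
toy_q = """\
0.90 0.10
0.25 0.75
0.40 0.60
"""

rows = read_q(toy_q)
for name, fractions in label_last(rows, ["Person A", "Person B"]).items():
    print(name, fractions)
```

With real output you would read the `.Q` file from disk and pass the six labels for the final six individuals in place of the toy names.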
* Some of the download links on the Structurama site do not seem to work right now, but you can download the source code. Didn’t try the Mac links.