Do you want your genotype in a public data set?

By Razib Khan | January 16, 2013 1:54 am

In the near future one of my projects is revising and expanding the “PHYLO” pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100,000-200,000) with wide population coverage. Additionally, I am going to do a few things like rename the family ids to populations, and also release it with scripts to help in running Admixture (for example, shell scripts which will automate replication and later analysis of replicates). Finally, I’m planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the “best” K’s. The reality is that most people are probably only interested in the “most informative” K, +/- 1, so there’s no need for everyone to run K = 2 to K = 20. The time saved should be spent on running replicates, and then on CLUMPP to merge the results.
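To make the replicate plan concrete, here is a minimal Python sketch of how one might lay out the runs and then pick a K from the cross-validation errors. It is not the script I will be releasing. The file name `phylo.bed`, the function names, and the seed scheme are all hypothetical, and the `--cv` and `--seed` flags reflect my understanding of the ADMIXTURE command line, so check them against your installed version. The sketch only builds the command lines; it does not run anything.

```python
from statistics import mean


def build_admixture_commands(bed_file, k_values, n_replicates, cv_folds=10, base_seed=1):
    """Return one command description per (K, replicate) pair.

    Assumption: ADMIXTURE is invoked as
        admixture --cv=<folds> --seed=<seed> <bed_file> <K>
    Each replicate gets a different seed so the runs are not identical.
    """
    commands = []
    for k in k_values:
        for rep in range(n_replicates):
            seed = base_seed + rep
            argv = ["admixture", f"--cv={cv_folds}", f"--seed={seed}", bed_file, str(k)]
            commands.append({"k": k, "replicate": rep, "argv": argv})
    return commands


def best_k(cv_errors):
    """Pick the K with the lowest mean cross-validation error.

    cv_errors maps K -> list of CV errors, one per replicate.
    """
    mean_error = {k: mean(errors) for k, errors in cv_errors.items()}
    return min(mean_error, key=mean_error.get)


# Hypothetical input file; K = 2..20 with 50 replicates each.
commands = build_admixture_commands("phylo.bed", range(2, 21), 50)

# Made-up CV errors purely to illustrate the selection step.
example_errors = {2: [0.60, 0.62], 3: [0.55, 0.57], 4: [0.58, 0.56]}
chosen_k = best_k(example_errors)
```

As far as I know, ADMIXTURE names its output files after the input file and K and writes them to the current directory, so in practice each replicate should run in its own working directory to avoid overwriting the previous one. Once a K is chosen, the extra replicates would be concentrated on that K and its neighbors before merging with CLUMPP.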

I would say that this is for ‘amateurs’ only, but I don’t think it’s betraying a confidence to observe that several academic researchers at prominent institutions have ended up asking me how to get good public data sets. This sort of information still hasn’t percolated to the general public, including scientists who don’t work on population genomics. After a few trial runs with public data sets, people with academic access could move on to things like the POPRES data set.

But the ultimate point of this post is to ask: do you want to be in this data set? If so, I need the file (23andMe format is fine, otherwise, pedigree files only), your name, and some minimal ethnic information. I’m not going to add everyone. I just want to diversify the public data set a little. But I am going to put names in the sample sheet, so you won’t have anonymity. As you know I don’t particularly care about this personally, but your mileage may vary. Researchers might need to contact people or check that they are who they say they are.

Email: contactgnxp -at- gmail -dot- com

