The HGDP made less racist!

By Razib Khan | August 17, 2011 10:30 pm

Back in the 1990s there was a lot of controversy around the Human Genome Diversity Project. In fact there were whole books devoted to the sociology of the project. Though on some of the details critics of the project may have had a point, their overall aim of stalling scientific inquiry in this area failed in totality. A few years ago a team out of the University of Chicago even produced a web browser so you can explore the data yourself. To my knowledge this hasn’t resulted in massive genocidal action against indigenous peoples; the human race doesn’t seem to need any scientific backing for that, alas.

But if I were a Lefty the-man-is-racist type, I think I might assert that the chips which were used to generate the 600,000 markers for the HGDP public data set are racist! I’m not one of those types, so what I am really concerned about is ascertainment bias. From what I have heard, many of the SNP chips floating around today look for the variants found most often in Europeans. That’s because so many study populations in medical genetics are of European descent. This is not a total deal breaker; a lot of European variation is useful in understanding worldwide patterns of variation. But ultimately it’s not optimal.
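To make the ascertainment worry concrete, here is a toy Python sketch. It assumes a SNP only makes it onto a chip if both alleles turn up in a small discovery sample drawn from one population. The allele frequencies, the discovery sample size of four chromosomes, and all function names are invented for illustration; none of this comes from the HGDP chips or the new array.

```python
def discovery_probability(freq, n_chromosomes):
    """Chance that both alleles of a biallelic SNP appear in a discovery
    sample of n_chromosomes drawn from a population with allele frequency freq."""
    return 1.0 - freq ** n_chromosomes - (1.0 - freq) ** n_chromosomes


def heterozygosity(freq):
    """Expected heterozygosity 2p(1-p) at one biallelic site."""
    return 2.0 * freq * (1.0 - freq)


def mean_heterozygosity(freqs, weights=None):
    """Average heterozygosity over sites, optionally weighted by the chance
    that each site was included on the chip."""
    if weights is None:
        weights = [1.0] * len(freqs)
    total = sum(weights)
    return sum(w * heterozygosity(f) for f, w in zip(freqs, weights)) / total


# Hypothetical allele frequencies at five sites in two populations (made up).
pop_a = [0.50, 0.30, 0.05, 0.01, 0.00]  # the discovery population
pop_b = [0.10, 0.02, 0.40, 0.50, 0.30]  # a population not used for discovery

# Assumption: sites are discovered in just four chromosomes from population A.
weights = [discovery_probability(p, 4) for p in pop_a]

true_a = mean_heterozygosity(pop_a)
true_b = mean_heterozygosity(pop_b)
chip_a = mean_heterozygosity(pop_a, weights)
chip_b = mean_heterozygosity(pop_b, weights)

print(f"all sites:  A={true_a:.3f}  B={true_b:.3f}")
print(f"chip sites: A={chip_a:.3f}  B={chip_b:.3f}")
```

With these made-up numbers, population B is the more diverse one across all sites, yet it looks less diverse than A on the sites a chip ascertained in A would favor. That reversal is the kind of distortion a panel with documented ascertainment schemes is meant to let you correct for.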

Today we take a major step in changing this. Nick Patterson sent me a nice heads up on a project out of David Reich’s lab. Using the full genomes of disparate human populations, as well as other primates, and archaic humans, the group has collaborated with Affymetrix to produce a panel which is much more finely tuned toward the concerns of those interested in the demographic and adaptive history of human populations.

You can find the files here. In particular, see the technical document. When I get some time I’ll be playing with this, rest assured.

Finally, Nick adds an important caution:

We hope that this array, and the HGDP data we have produced, will be a major resource for population genetic studies. The data are undoubtedly complicated (13 different ascertainment schemes (!)) and users should read the technical documentation, and especially the short readme file. In particular note that the ancient DNA alleles are not high quality (especially the Neandertal) and there are numerous potential traps in analysis.

  • Paul Ó Duḃṫaiġ

    We often see a similar European focus when it comes to Y-chromosome haplogroups. As you can imagine, most of the people testing on FamilyTreeDNA are from the US, and as Western Europe is heavily R1b (especially Ireland and Britain), this tends to result in a lot of new R1b SNPs being discovered but not a huge amount of coverage of other Y haplogroups.

    Hopefully over time, with a growing database and more testing from across the globe, it will lead to a better picture of Y-chromosome diversity. If you look at the below image you can see a lot of new SNPs discovered in European R1b this year alone (generally those not in the FTDNA deep clade).




Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

