Introducing the Harappa Ancestry Project

By Razib Khan | January 17, 2011 9:22 am

A few weeks ago I hinted at a South Asian equivalent to Dodecad & Eurogenes BGA. It is now public and in the data collection phase. You can read the whole thing here:

This is the feed:

If your ancestry is from these nations:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Read on! If not, “for entertainment purposes only”….

I have been griping in public and in private about the “reference” populations used for South Asian genomics for years. Because of the Permit Raj, the HGDP had to use Pakistani populations. Additionally, because of the HGDP’s mandate to focus on smaller groups which might harbor genetic uniqueness, you have some very obscure tribes, but only one sample set from an Indo-Aryan-speaking population. And even there, it was a minority, not the Punjabi-speaking majority of Pakistan.

Some of this has changed in recent years. Papers such as “Reconstructing Indian population history” and “Genetic diversity in India and the inference of Eurasian population expansion” have added more populations to the mix. The current phase of the HapMap has Gujaratis from Houston. But there is always a problem when you take a small population set to be representative of a broader group. There are ~1.3 billion South Asians. Using Gujaratis from Houston, who are likely to be of a narrow range of castes, is still problematic. Because of the long history of endogamy and the likelihood of fine-grained caste and geographical structure, good population coverage is of the essence for South Asians. Taking the Beijing HapMap sample as representative of Han Chinese is not optimal, but this sort of thing would be far less optimal in South Asia.

So when Dienekes began the Dodecad Ancestry Project I was very curious. I had had ADMIXTURE for a while, but it prompted me to start playing around with it myself. My plan was to wait and see how Dienekes fared, and in particular to see what didn’t pan out as a fruitful use of labor. Mine is finite, like everyone’s. My medium-term plan was to start up a South Asian equivalent to Dodecad at some point in the first half of 2011.

Then Zack approached me. I have known Zack online since 2003, through the blogs. His primary interest in blogging was Pakistani culture and liberal politics (he’s Pakistani American and a liberal). But he also has a doctorate in electrical engineering, so he has some technical skillz. It turns out that because of Zack’s own peculiar genetic background (he’s 1/4 Egyptian) he kept asking me questions. Eventually it became clear that he was interested in starting something similar to Dodecad…and I told him my own future plans, and encouraged him to take up the torch immediately. I knew Zack had the technical chops, and also could probably devote more time and energy at the time than I could.

I immediately gave him my 23andMe sample. Since Dienekes had already run my genome, we kind of knew what to expect. And it looks like Zack has the software running well. He included a Nepali sample, and it turns out that in an MDS clustering I fell 71% into the dominant Nepali cluster. This is kind of what I expected.

In any case, the details:

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better), downloaded from 23andMe, along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.

Data Privacy
The raw genetic data and ancestry information that you send me will not be shared with anyone.

Your data will be used only for ancestry analysis. No analysis of physical or health/medical traits will be performed.

The individual ancestry analysis published on this blog will be done using an ID of the form HRPnnnn known to only you and me.
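
As a rough illustration of the HRPnnnn ID scheme described above, here is a minimal Python sketch. It assumes that “nnnn” means exactly four zero-padded digits and that numbering starts at 1; the post does not say either of these things, and the function names are hypothetical, not anything the project itself uses.

```python
import re

# Assumption: an ID is the literal prefix "HRP" followed by exactly four digits,
# which is one reading of the "HRPnnnn" form mentioned in the post.
HRP_ID_PATTERN = re.compile(r"HRP\d{4}")


def make_project_id(participant_number: int) -> str:
    """Format a participant number as an HRPnnnn-style ID (hypothetical helper)."""
    if not 1 <= participant_number <= 9999:
        raise ValueError("participant number must fit in four digits")
    # Zero-pad to four digits, e.g. 7 becomes "HRP0007".
    return f"HRP{participant_number:04d}"


def is_project_id(text: str) -> bool:
    """Return True if the whole string matches the assumed HRPnnnn form."""
    return HRP_ID_PATTERN.fullmatch(text) is not None
```

The point of a scheme like this is that published results carry only the opaque ID, while the mapping from ID to person stays private between the participant and the project.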

What do you get?
All results of ancestry analysis (individual and group) will be posted on this blog under the Harappa Ancestry Project category. This will include admixture analysis as well as clustering into population groups etc.

I suggest you read about Dienekes’ analysis on South Asians for an idea about what to expect.

You can access all blog posts related to this project from the Harappa Ancestry Project link on the navigation menu on every page of my website. You can also subscribe to the project feed.

If you’re South Asian, Iranian, Burmese, or Tibetan, and have had 23andMe genotyping done, you know what to do. If you know someone from these groups who has had that done, please forward this on.

CATEGORIZED UNDER: Genetics, Genomics

Comments (15)

  1. Ian

    Not that I have done it yet (can’t justify spending the money at the moment, but perhaps in the next year), but would a project like that be interested in half-Indians? (Other half is German) And by Indian, I mean south Asian who emigrated ~150 years ago.

  2. zack is 1/4 egyptian, so not sure if he’d throw stones @ mischlinges 😉 that being said, u caribs are interesting to see how much admixture u’ve got too.

  3. Ian

    Assuming no one lied to their husbands, I can trace all but one of my father’s grandparents to India or to Indian-born parents. That said, I know many people who consider themselves Indian, but have a Chinese or white great-grandparent. While things seem to have changed a little in the last few decades, African ancestry seemed to be the big barrier to group membership. “Douglas” were far less likely to be admitted into the group.

    Still, it’s the Amerindian element that I’m really curious about. Outside of the town of Arima (which retains a small ‘Carib’ community) the Amerindian element pretty much disappeared during the 19th century. But really, the Spanish (and later British) authorities were only aware of the people who were brought into the encomienda system, or were settled in missions (Arima being the last of these to operate). Large areas of the island – including those parts that are closest to Venezuela – were pretty much ‘wild’ until the start of the 20th century. It was cacao cultivation that brought people into the south and the interior, and much of that was pioneered by ‘payols’ – peasants from Venezuela who would clear the land, plant the cacao trees, and then sell it to estates who used both Indian and ‘Spanish’ labour. In those sorts of settings, ‘wild Indians’ (Amerindians, including Waraos from the Orinoco delta who regularly came to Trinidad to trade until the 1930s) and ‘Spanish’ mestizos could easily have been incorporated into a population that saw itself as (Asian) Indian.

    My brother was always convinced that there was Amerindian blood in the more rural Indo-Trinidadians, based on a combination of appearance, culture and some anecdotes.

    While the Indo-Trinidadian (and Indo-Guyanese) identity has been largely exclusionist, in the other islands it has been reversed. Jamaican, St. Lucian, Vincentian and Grenadian Indians are often (usually?) mixed. In the smaller islands they apparently formed the lowest social class, while in Jamaica many of them intermingled with the wealthier (near-)white/Chinese business class. Though others mixed into the African masses, introducing ganja, curry goat (and some say dreadlocks) into the heart of the Jamaican national identity. I don’t know much about the Indians in the French islands, but I believe that the Indian communities in Martinique and Guadeloupe are mostly mixed. Of the Indians in Suriname I know very little. Indians in Barbados are mostly recent immigrants, often secondary migrants from Guyana or Trinidad or East African Gujaratis.

  4. I am still trying to convince the brown dude to do 23andme. It is proving to be an uphill battle. (He’s cheap.)

    Me: Wouldn’t it be a good idea to know if we’re carriers for any of the same recessive diseases?
    Him: Why? You don’t want children anyway.

  5. Ian

    Of course, from a purely Indian perspective, Indo-Trinidadians are mixed anyway. Caste and regional affiliations have largely been forgotten – my (allegedly) Kashmiri Brahmin great-great grandfather (who was put in the kitchen because he was thought to be too light-skinned to survive in the fields) married a very dark-skinned woman. And their daughter married a Bihari Pathan. In my father’s day, ‘Madrasi’ meant something in Trinidad. Today it, like Chamar, is just an insult with, yes, the implication that there’s something wrong with being dark skinned, or darker-skinned than the norm.

  6. who was put in the kitchen because he was thought to be too light-skinned to survive in the fields

    i know in india brahmins are often chefs because people will eat whatever they cook.

    i would not be surprised about amerindian. this always seems to pop up surprisingly among people from the islands.

  7. Ian

    Never thought about it like that. Interestingly, Trinidadians tend to trust Muslims with cooking, especially with meat, because they are ‘more careful’ with it (presumably a reflection on the way they slaughter animals). Chinese are also trusted with food, and used to own almost every bakery.

  8. Hurray for the grass roots effort!

  9. Thanks for the plug, Razib.

    The feed URL for the project (in case you don’t want to read about my liberal politics 😉 ) is:

    Ian, half-South Asians are definitely welcome.

    PS. Razib, this comment box is so tiny in iPhone safari.


  10. @Ian I repasted comment #3 to the Brown Pundits since I’ve always been curious about the exotic Indo-Diaspora; Africa, Oceania and Latam. Particularly the sort of “personal anecdotes”, the unspoken rules of the society which outsiders are never really privy to.

    All the best to the Harappa Ancestry Project.

  11. pconroy

    Based on the reference to “Harappan” in the project name, I was wondering why you don’t include Iranians, Turkmens and some other Central Asian populations – if part of your goals are to try and discover the genetic origins of the Harappan people?

  12. paul, iran is in the list. probably a better proxy than modern central asians, who have a lot of recent turkic ancestry.

  13. pconroy

    Oops, I should have seen that!

    Also, I would suggest to Zack that he divide some areas up, like Southern Iran, from East and West Iran and likewise divide Pakistan into its main ethnic groups.

  14. paul, he’s asking the provenance and background of all 4 grandparents. so that’s taken care of.

  15. J.

    I wonder if you would also consider including countries to the east of Burma such as Thailand or Cambodia. I’ve seen several admixture analyses showing those populations as sharing significant ancestry with South Asians.

