The current bias in genealogical databases

By Razib Khan | May 29, 2012 7:13 pm

As a follow-up to my post below on the thick coverage of European information in genealogical and genomic databases, here are the “Ancestry Finder” matches from 23andMe for my daughter using the default settings:

If I increase sensitivity, India does come up, at 0.1%, second to last in a very long list otherwise made up of European nations. I’m pointing this peculiarity out because my daughter is 50 percent South Asian, yet this element of her ancestry finds few matches because there aren’t many South Asians in the database to match against. In contrast, because she is 1/8th Norwegian (her great-great grandparents were immigrants from the Oslo area; thanks!), this “block” stands out and lines up with many people in the database.

This isn’t just an exceptional case. Here’s the result for a friend who is 50 percent East Asian (Chinese) and 50 percent American white:

The old warning rears its ugly head: the tool is just a tool, and must be used with an understanding of what it can and can’t do. If you decrease sensitivity, many South Asians actually match people from European nations before they match people from India. Why? Part of it is probably that many South Asian groups are highly endogamous, which dampens segment sharing between different South Asian groups. The other part is that the sample of Europeans is so large that random matches with that population are just as likely as, or more likely than, genuine matches with the much smaller number of South Asians.
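
The sample-size argument can be made concrete with a toy model: if each person in the database matches independently with some fixed probability, the expected number of matches is just pool size times that probability. The Python sketch below assumes exactly that; the database sizes and probabilities are invented for illustration, are not 23andMe’s figures, and Ancestry Finder’s real matching is certainly more involved.

```python
# Illustrative sketch only: database sizes and match probabilities are
# invented for this example and are not 23andMe's real figures.

def expected_matches(n_people: int, p_match: float) -> float:
    """Expected number of database members sharing a segment with the customer,
    assuming each member matches independently with probability p_match."""
    return n_people * p_match


# Hypothetical database composition.
N_EUROPEAN = 100_000      # large pool of European customers
N_SOUTH_ASIAN = 500       # small pool of South Asian customers

# Hypothetical per-person probabilities of a reported segment match.
P_SPURIOUS = 0.0005       # a European shares a segment by chance, not recent common descent
P_GENUINE = 0.02          # a South Asian shares a segment through real common ancestry

spurious = expected_matches(N_EUROPEAN, P_SPURIOUS)
genuine = expected_matches(N_SOUTH_ASIAN, P_GENUINE)

print(f"Expected chance matches with Europeans:     {spurious:.1f}")
print(f"Expected genuine matches with South Asians: {genuine:.1f}")
# Chance matches dominate whenever N_EUROPEAN * P_SPURIOUS > N_SOUTH_ASIAN * P_GENUINE,
# even though each individual European here is 40x less likely to match.
```

In this toy model, endogamy would show up as a lower P_GENUINE for a customer whose own group is barely represented in the database, which tilts the balance further toward chance European matches.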

CATEGORIZED UNDER: Personal Genomics

Comments (16)

  1. If I increase sensitivity India does come up, at 0.1%,

    You’re noting India but your graphic points to Ireland.

  2. Please read the sentence before the graphic. I’m going to add a return to make it clearer.

  3. pconroy


    While I think you are probably correct here, there is also the fact that hundreds of thousands of Irish – in the form of the British Army – lived in India (including today’s Pakistan and Bangladesh) not so long ago.

    I’ve brought up the Anglo-Indian thing a few times on Harappa and no one seemed interested in discussing it at all.

    Yet I dated a Jatt Sikh from the Punjab years ago, and she told me that in her small town there were two large Anglo-Indian landowners, one of whom, I remember, was called “Stack”, which is a name from Co Kerry, Ireland. So there must certainly be Indians with Irish ancestry who now shun the association. This girl – who looked like Norah Jones – even told me that her father had reddish hair himself and was much lighter-skinned than her paternal grandfather or grandmother.

    When I search for my last name in the list of the Bengal Army (part of the British Army in India), there are no fewer than 40 Conroy enlisted men and 2 Conroy officers. I find it hard to believe they left no descendants. My father has a distant relative who is Indian – last name Thakrar.

  4. SB

    pconroy, there is a population of “Anglo-Indians” in India. But they have maintained their own distinct identity and have not culturally assimilated into any other population. Indian society, historically strongly supportive of endogamy, did not socially accept unions with those outside the community. This cultural rigidity may be hard for non-South Asians to understand, but it held true until very recently (it only began to change after the 1980s).
    If your dad has a distant relative with an Indian last name, my guess would be that this person identifies as Anglo-Indian or is someone married to an Indian.

  5. pconroy


    So that’s a start: you are not denying the presence of recent Irish ancestry in South Asia/Greater India.

    I know nothing about the situation of the British Army in India, but in almost all cases where large army bases house soldiers who are much wealthier than the locals, a population of prostitutes forms around the bases, and mixed-race children follow.

    Look at Korea or Vietnam or any other war. People have similar motivations everywhere, so I don’t think it’s credible that in India this did not happen – do you?

  6. pconroy


    You also have known Irish people living in Pakistan, such as Jennifer (Wren) Musa – known as the “Queen of Baluchistan”, who hails from Co Kerry, Ireland:

    Her son is the current Pakistani ambassador to Russia.

  7. pconroy


    BTW, my father’s relative identifies on 23andMe as “South Asian”, not “Multiple Regions” or anything like that.

    The person in question is Raam Thakrar – I believe this is him:

    You be the judge!

  8. skid

    Norway, Sweden, the UK, Ireland, and Finland are all places 23andMe will ship to. India is not (neither is China, Japan, Korea…).

    That’s gotta be a huge source of the bias.

  9. Melissa

    The author does not seem to understand why there is a wealth of online resources for some countries but not others. In the case of Norway, two Norwegian universities created a website containing all public censuses, most available emigration/passenger lists, and images of most available church and probate records. These are all free online at the Digitalarkivet. There are also quite a few online databases of UK and Swedish records, some paid, some free, many created through volunteer projects. Countries with a wealth of data like that online are going to encourage and fuel research and genetic testing in those same countries. Those involved know there is interest and demand because of the wealth of data already there, which in turn fuels further support. If you want to see more information available for other countries, look for ways to encourage or get involved in projects that put those countries’ records online. Ten years ago, much of what is out there today wasn’t. We all have a say in what happens over the next 10 years. That’s easy to forget.

  10. Onur

    I have seen Paul Conroy’s genetic results several times in genome analysis blogs (e.g., Dodecad, Eurogenes, Harappa). He is genetically a typical British/Irish person according to them. So if he has some colonial-era South Asian ancestry, it must be so minuscule as to be genetically undetectable.

  11. pconroy

    @Razib, SB, Onur,

    Since posting above, I have gotten my parents’ results back from DNA Tribes – mine should follow in a day or two. This is the FREE offer for Eurogenes members:

    More info here:

    Offer expires tonight, 5/31/12!!!

    Father (IE6) – 100% Native Irish AFAIK:
    100% European

    51.50% Northwestern European
    22.37% Iberian
    21.87% Baltic-Urals
    3.76% Indus Valley
    0.51% Arctic

    1. Cornwall West Britain
    2. Lithuania
    3. West Scotland and Ireland
    4. England
    5. Orkney Islands Scotland
    6. European-American Utah
    7. Slovenia
    8. Hungary
    9. Galicia Spain
    10. Basque Spain
    11. France
    12. Ukraine
    13. Germany and Netherlands
    14. Spain
    15. Scandinavia
    16. Bergamo Italy
    17. Romania
    18. Belarus
    19. Basque France
    20. Finland

    Mother (IE7) – Mostly Native Irish, but also with Huguenot, Norman and Lancashire English ancestry (the last of which may include Alano-Sarmatian):
    100% European

    60.16% Northwest European
    20.47% Iberian
    17.67% Baltic-Urals
    0.99% Horn of Africa
    0.64% Oceanian
    0.05% Central African
    0.03% Southern African

    1. France
    2. West Scotland and Ireland
    3. Basque Spain
    4. Hungary
    5. Cornwall West Britain
    6. England
    7. Belarus
    8. Mordvin
    9. Orkney Islands Scotland
    10. European-American Utah
    11. Ukraine
    12. Lithuania
    13. Slovenia
    14. Spain
    15. Poland West Slavic Mixed
    16. Galicia Spain
    17. Bergamo Italy
    18. Basque France
    19. Finland
    20. Germany and Netherlands

  12. The author does not seem to understand why there is a wealth of online resources for some countries, but not others

    You lack reading comprehension. Norway and Britain have lots of records in the first place because their churches kept detailed records going back centuries.

  13. Onur


    The “Indus Valley” component of DNA Tribes is very diffuse and mixed, not just in DNA Tribes customers but also in the people in the DNA Tribes database. For instance, Southwest Scots in the DNA Tribes database have 5.3% “Indus Valley” component on average, and Orcadians in the DNA Tribes database have 3.7% “Indus Valley” component on average.

    Do all these people have colonial-era genetic connections with South Asia? Obviously not. So there is nothing unusual in your father’s results for a 100% Irish person (Irish people are not included in the DNA Tribes database, so I use Orcadians and Scots as proxies). Another important point: as you know, the results of people in the database are not directly comparable with those of customers like your parents because of what David calls the “calculator effect” (this explains your parents’ low “Northwest European” percentages compared to Northwest Europeans in the database). But if even Northwest Europeans in the database carry that much “Indus Valley”, you should not be surprised at the amount your father has (3.76%).

    As for your mother’s “Horn of Africa” component, at 0.99% it is at noise level for DNA Tribes, so it does not mean anything. Thus your mother’s results are not unusual for the Irish either.

    Lastly, DNA Tribes’ similarity lists are inaccurate most of the time, so you should not make much of them.

  14. Onur


    I have seen Paul Conroy’s genetic results several times in genome analysis blogs (e.g., Dodecad, Eurogenes, Harappa). He is genetically a typical British/Irish person according to them. So if he has some colonial-era South Asian ancestry, it must be so minuscule as to be genetically undetectable.

    This includes Conroy’s parents’ results posted on those blogs; they, too, are genetically typical British/Irish.

  15. pconroy

    Just for the record, here are my own results:

    100% European

    60.01% Northwestern Europe
    20.33% Iberian
    15.00% Baltic-Urals
    4.66% Indus Valley

    1. Scandinavia
    2. Orkney Islands Scotland
    3. Western Scotland and Ireland
    4. England
    5. France
    6. Cornwall West Britain
    7. European-American Utah
    8. Belarus
    9. Basque Spain
    10. Galicia Spain
    11. Ukraine
    12. Spain
    13. Slovenia
    14. Mordvin
    15. Lithuania
    16. Basque France
    17. Bergamo Italy
    18. Romania
    19. Poland West Slavic Mixed
    20. Hungary

  16. Onur


    Typical British/Irish results again.
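
Onur’s argument in comment 13 reduces to two checks: is a component below noise level, and is it any higher than what reference populations already carry? A minimal Python sketch of that reasoning follows. The 1% noise floor and the function are hypothetical, not a DNA Tribes tool, and like Onur’s own comparison it ignores the calculator effect, so it is a rough sanity check rather than a test of ancestry.

```python
from typing import Mapping, Optional

# Assumed noise floor, in percent. Comment 13 calls 0.99% "in noise levels";
# DNA Tribes does not publish this threshold, so 1.0 is a hypothetical choice.
NOISE_FLOOR_PCT = 1.0

# Average "Indus Valley" percentages for two DNA Tribes reference populations,
# as quoted in comment 13 (used there as proxies for the Irish).
INDUS_VALLEY_REFERENCE = {"Southwest Scots": 5.3, "Orcadians": 3.7}


def classify_component(pct: float,
                       reference_avgs: Optional[Mapping[str, float]] = None) -> str:
    """Label a customer's admixture percentage for one component.

    'noise'    -> below the assumed noise floor
    'typical'  -> no higher than the highest reference-population average
    'elevated' -> above every reference average (or no reference supplied)
    """
    if pct < NOISE_FLOOR_PCT:
        return "noise"
    if reference_avgs and pct <= max(reference_avgs.values()):
        return "typical"
    return "elevated"


# Percentages reported in comments 11 and 15.
print("Father, Indus Valley:  ", classify_component(3.76, INDUS_VALLEY_REFERENCE))
print("Mother, Horn of Africa:", classify_component(0.99))
print("pconroy, Indus Valley: ", classify_component(4.66, INDUS_VALLEY_REFERENCE))
```

Comparing against the highest reference average, rather than requiring a match to one specific population, mirrors Onur’s point that if even Northwest Europeans in the database carry this much, a similar value in a customer is unremarkable.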



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

