Finding fake roots

By Razib Khan | May 7, 2012 1:56 am

I haven’t watched much of Henry Louis Gates Jr.’s Finding Your Roots series. It seems like Gates has kind of created a mini-empire in genealogical series on PBS. More power to him, but it hit diminishing returns for me a long time ago. But I see clips online here and there. And something which I saw really kind of disturbed me. From what I can gather Gates regales his subjects with their DNA results, and tells them their ancestral quanta fractions. Nothing too amazing. But it seemed clear to me that when Gates referred to “European”, “Asian” and “African” ancestors, he was communicating to the audience that these quanta really represented those exact populations!

I assume that the geneticists Gates works with explained to him the falsity of this typology. I also understand that the television format results in natural license. But Henry Louis Gates Jr. has produced lots of these shows now. He has the leisure to unpack the concepts for the lay audience. As it is, it seems he is repeating misconceptions about model-based clustering algorithms. Misconceptions, mind you, that persist even within the biological community. But that doesn’t make it any better.

For those who aren’t getting the essence of what I’m saying, above are my 23andMe results. I’m 60 percent European, 40 percent Asian. But those fractions were produced assuming that I could only be a combination of Northern European, East Asian, and West African! Computational algorithms do not return results of the form “I’m sorry, but the input is going to generate useless output.” So they came back with these results, which are highly misleading. Adding another reference population from South Asia makes the results much more plausible. The take-home is that the terms for each quantum are only mnemonics. They should never be taken literally.
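To make the point concrete, here is a minimal toy sketch of my own (it is not 23andMe's algorithm, and every number in it is made up). It assumes each reference population is just a vector of allele frequencies, builds a hypothetical "South Asian" population that sits between two of the references plus some drift of its own, and then looks for the mixture fractions that best reproduce it by brute-force search. The function names `simplex_grid` and `fit_admixture` are invented for this illustration.

```python
import itertools
import random


def simplex_grid(k, steps=20):
    """Yield every length-k tuple of fractions (multiples of 1/steps) summing to 1."""
    for cuts in itertools.combinations_with_replacement(range(k), steps):
        yield tuple(cuts.count(i) / steps for i in range(k))


def fit_admixture(target, refs, steps=20):
    """Return the mixture of reference populations closest to `target`.

    Closeness is plain squared error over allele frequencies. The search
    always returns *some* mixture: it has no way to say "none of these fit".
    """
    names = list(refs)

    def squared_error(q):
        return sum(
            (t - sum(w * refs[n][i] for w, n in zip(q, names))) ** 2
            for i, t in enumerate(target)
        )

    best = min(simplex_grid(len(names), steps), key=squared_error)
    return dict(zip(names, best))


# Made-up allele frequencies at 300 markers (assumption: purely synthetic).
rng = random.Random(0)
M = 300
eur = [rng.uniform(0.05, 0.95) for _ in range(M)]
eas = [rng.uniform(0.05, 0.95) for _ in range(M)]
afr = [rng.uniform(0.05, 0.95) for _ in range(M)]

# Hypothetical fourth population: between two references, plus its own drift.
sas = [
    min(0.99, max(0.01, 0.6 * e + 0.4 * a + rng.uniform(-0.1, 0.1)))
    for e, a in zip(eur, eas)
]

# Idealized individual: carries exactly the fourth population's frequencies.
three_way = fit_admixture(
    sas, {"European": eur, "East Asian": eas, "West African": afr}
)
four_way = fit_admixture(
    sas,
    {"European": eur, "East Asian": eas, "West African": afr, "South Asian": sas},
)

print("three references:", three_way)
print("four references: ", four_way)
```

With only three references the search still hands back a tidy split between "European" and "East Asian", because that is the closest point it is allowed to choose. Add the fourth reference and the same individual is assigned entirely to it. The labels describe the model's inputs, not the person's ancestors.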


Comments (21)

  1. Darkseid

    I was hoping you’d weigh in on his show. I had been assuming it was oversimplified but just decided to take what I could get as far as guiding the public’s knowledge about ancestry. I found the one where Maggie Gyllenhaal tried to sound “informed” about her results to be pretty funny :)

  2. Charles Nydorf

    A great semi-popular exposition of model-based clustering algorithms would be very welcome now. The ideas behind these algorithms originated among mainstream statisticians like Pearson, Fisher and Steinhaus. Then computer scientists picked them up and the field took off. Science popularizers have still not caught up.
    The great thing about Henry Louis Gates is that he realized that the application of population genetics to human groups is not inherently dangerous and can even have very positive social effects.

  3. Sandgroper

    Party game.

    The funny one was when former Australian politician Pauline Hanson got her results.

  4. Amos Zeeberg (Discover Web Editor)

    Is there a certain number of reference populations the analysis could include to avoid highly misleading results? They’ll always be an approximation, I realize, but some approximations you could live with, as opposed to the Northern European/East Asian combo you received.

    If you added, say, five reference populations (e.g. South Asian) to the three that 23andMe is already using, could you have results for any person that were at least generally reasonable?

  5. If you added, say, five reference populations (e.g. South Asian) to the three that 23andMe is already using, could you have results for any person that were at least generally reasonable?

    have to be representative. basically the issue is that you want to capture the combinations of world genetic variation which makes everyone up. if you don’t construct the model with a reasonable range of that variation, the model is going to give you weird results. the extreme case would be to take a swede, and put them into a pot with only bushmen tribes. the algorithm will *try*, but it won’t produce anything intelligible.

    as for the number/character of populations, that depends on what you want to do. yes, include south asians for a test which you want to use world wide. but if you had a finite number of slots, whether you want an amerindian population is going to depend on the possibilities being tested. e.g., if you did it in china it would be “wasting” the model’s time.

  6. Dm

    include south asians for a test which you want to use world wide

    True. But is this “if” condition satisfied? Both Gates’s audience, and 23andMe’s customers, are overwhelmingly American, and only a relatively small proportion of them are concerned about S Asian roots.

    In the US, even the way the South Asians pre-define themselves, in terms of standard multiple-choices for ancestry (be it for the Census or for a medical questionnaire), is totally ambiguous. Given a choice between “Asian”, “Near Eastern”, or “Other”, they made each choice with nearly the same probability. Many are Caucasian and a few, Black.

  7. Dm

    Thanks for the link, Razib. True, the latest installment brought the 2nd S Asian to the series. It doesn’t change the basic fact that the target audience for the program, and for the DNA genealogy testing marketers, is mainstream American. Of course the listeners / customers would be keen to hear how it works in the faraway corners of the world, but most of them should be perfectly satisfied with the idea that at present, the South Asians receive only trivial answers from these tests.

  8. Onur

    have to be representative. basically the issue is that you want to capture the combinations of world genetic variation which makes everyone up.

    That is why DNA Tribes SNP test results make much more sense than 23andMe SNP test results, as DNA Tribes utilizes many more reference populations and from all corners of the world.

  9. #8, but you mistake the thrust of my post. the a priori model that gates is pushing is useful, but false, for even europeans, africans, and east asians!

  10. Nathan

    Razib, is that pie diagram analogous to 23andme’s Ancestry Painting?
    My dad was the only family member tested, and the results of his 23andMe Ancestry Painting are: “73% European, 27% East Asian & <1% African”.
    My initial reaction, one which I still subscribe to, is that these utilities are working on too broad a definition to be of much use.

    Our ethnic background is Tamil from Ceylon. So I assumed the European component to simply denote West Eurasian. The 23andMe global similarity has Central Asian on top followed by East Asian. Now shouldn't it be South Asian followed by whatever? Or is South Asian left out because the similarity bar chart aims to show similarity to non South Asian populations?

  11. Dm

    #10 I think I got it right, more or less, but let’s double-check. I read the test premise as follows: “As long as you are comfortable with, or interested in, a simplification which describes your origins as a triangulation between NE Europe, E Asia, and West Africa, here is the result”.

    For WE Americans and African Americans interested in possible AA/WE/Native Am admixtures in their bloodlines, the simplification appears to be reasonable IMVHO. Of course even for WE/AA customers, if they come with any additional Asian, E African, or S/E European roots, the model may be a stretch … sometimes a potentially misleading stretch. But it isn’t *that* bad for most.

    People play with all sorts of part-meaningless scores for themselves, like think of “personality / psychology trait surveys”. You are 57% extrovert! My score of paternal dedication is 8 out of 10! Her best city to live is Missoula, MT! At least the AA/WE/NA pie chart is an objective metric, not merely an entertaining number.

  12. I read the test premise as follows: “As long as you are comfortable with, or interested in, a simplification which describes your origins as a triangulation between NE Europe, E Asia, and West Africa, here is the result”.

    you read wrong. or more properly, you are having a discussion with yourself, not me. this is *not* african american lives anymore. gates is making pretty grand claims for DNA in his series at this point. if you watch the series *now* that’s how i read the subtext of the *DNA/genomics* part (and of course, there’s always the moral about how we’re all related). you see maps of haplogroup migrations.

  13. ackbark

    Speaking of all related, sometime ago I saw somewhere something to the effect that ‘all women are more closely related to one another than men are to one another’, saying essentially that there is less variability among women than among men.

    This morning I realized this might have something to do with the fact that men will generally agree on what makes an attractive woman but among women there is no real consensus about men.

    So, is this actually right, are women more closely related to one another, or is it just a popular illusion?

  14. #14, 99% sure that’s garbling of lower Fst values for mtDNA vs. Y chromosomes. i.e., it is maternal lineages, not women. also, if i recall correctly mtDNA coalesces more quickly back to a common ancestor, but it is probably a clock artifact.

  15. Dm

    #13 still trying to figure out what we saw differently in Gates’s show and the fallacies thereof (the series which neither you nor myself actually watched LOL). Just checked the post on reification, sorry for missing it earlier. I must have needed a caffeine boost because my mind, seeing “polymers”, instantly conflated “reification” with “rheology” and it was like, wtf, if Razib is talking about glues, then I can check it later? Whoops.

    So I suspect that wrt Gates, I misunderstood that you aren’t happy about his (obsolete / confused / plain fictional) myths of deep ancestral prehistories, with the maps of haplogroup densities used to put a veneer of a scientific fact on quasi-religious legends? Is it the “fake roots” you had in mind? Not the family roots 4,5 generations back, but the prehistoric ancestral roots spanning millennia?

    If it is true, then I concede. I just don’t really dig deep-ancestry as a personal quest, and so I kind of ignored those tales. Prehistoric origins are taking good shapes when groups of people, extant and ancient, are compared. Of course one can find some tantalizing snippets of DNA blocks from far away in a single genome, but it’s always a random remnant of the recombinatory mill. Most of the deep ancestors’ blocks in the autosomes didn’t have a chance to get to you anyway, while lone SNPs with unusual allelic frequencies tend to be far too short on statistical power, at least with the current thin reference sets.

    So one’s individual genome may hold intriguing surprises, but to get a solid understanding of one’s pre-ancestry, one that may rival a myth, you’d still need a large group of people like you.

  16. Not the family roots 4,5 generations back, but the prehistoric ancestral roots spanning millennia?

    he talks repeatedly about thousand year old cousins (using mtDNA, etc.). and i’ve watched about 2/3 of his series on this fwiw (i.e., going back to his first in the mid-2000s). how much have you watched?

    and his use of haplogroups is rooted in a real phylogeny, even if it is trivial or misleading. the problem is that he’s now switching between mtDNA/Y methodologies and autosomal admixture estimates in such a way to elide the distinctions for the audience. i’m not sure he’s doing this purposely either…though it isn’t as if he doesn’t have scientific advisers.

  17. Joanna

    Hi Razib,

    As one of the genetics consultants supporting Harvard University Professor Henry Louis Gates Jr.’s Finding Your Roots series, I’d like to provide a little further detail:

    In analyzing the DNA of each guest through the lens of 23andMe’s service, we took a holistic approach. By combining mitochondrial, Y chromosome (for the men), and autosomal DNA for each guest, we inferred the guest’s overall ancestry, as well as anything that might be unusual or of particular interest to the guest and the show’s audience. As a consultant I recommended that Prof. Gates not present South Asian guests with their 3-way admixture (Ancestry Painting) results, since those are not the most pertinent DNA results. Therefore those results were not presented on the show to the South Asian guests. South Asian ancestry is evident in the DNA, overall, even if not in a 3-way admixture analysis; a feature called “Ancestry Finder” shows very clearly, for example, that Sanjay Gupta’s ancestry traces back to India. For South Asian guests of the Finding Your Roots show the emphasis was on other aspects of their DNA, and other genealogical information, rather than on the admixture percentages.

    As you know, the science underlying the interpretation of DNA in terms of an individual’s geographic ancestry is still young. A number of researchers and genealogists are developing tools aimed at furthering that science. At 23andMe we are excited to be working towards extracting as detailed and yet accurate ancestry information as possible from each customer’s DNA.

  18. #18, i don’t really have a problem with your service (i have my whole family and my wife’s family pedigreed thanks to you!). i’m 99 percent sure that my beef is the way that the show was edited. and i’m 99 percent sure that i wouldn’t be irritated if gates had not had so much license to produce hours and hours and hours of narrative on this topic. as i recently told someone, the main thing that i’m kind of exasperated about is that i think the editing of one of the episodes which i saw was not very clear about the distinction between the uniparental lineages and the admixture tests. the audience is going to be really confused IMO.

  19. Joanna

    #19, That distinction between the uniparental lineages and the admixture tests is one of the most challenging concepts in ancestry genetics. Professor Gates, and the producers he has teamed up with, asked us for explanations of haplogroups and autosomal ancestry a number of times. So some of the episodes include brief explanations. They also asked us to explain this and related concepts in our blog posts for the PBS web site associated with the shows. This post probably gets closest to explaining the distinction you noted. We are now working on one post that discusses the Y and mtDNA lineages and on another on the science behind the admixture estimates.

  20. Dm

    Pondering this issue, the series, the aura, the stretched imagination, has been really an eye-opener for me. I had to break the mold of my “geneticist self” ever-focused on dangerous heritable conditions where we tend to convey to the patients only The Undoubtable (the interpretations with overwhelmingly strong proof … which is typically fraught with uncertainties and equivocations, because it skips some of the most important details which have merely “strong” rather than “overwhelming” proof).

    Part of the problem with medical genetic interpretations may be legalistic (tomorrow, more will be known, and the lawyers may be after you) and part, simply paternalistic (there is no shortage of bioethicists and regulators who insist that the public has to be protected from the need to understand genetics).

    Anyway I realized that with ancestry testing, I need to throw away this baggage, and to call my inner psychic for help instead. You know how, when fortune-telling, it may be at least as important to observe the eyes and the body language cues as the cards. Just like psychic reading, ancestry reading may be a way for the subject to rediscover oneself … because what people typically want to read in their pre-ancestries isn’t a cold scientific fact. It’s that they inherited the goodness of the ancient heroes rather than snippets of ancient DNA strands, and that, when they choose to stray from the boredom of their middle-class suburban lives, it’s because they are coming true to their roots.

    So I tried a mind-reading overlay with DIYDodecad, and I did see my wild, dancing, flirting subject buzzing with pent-up excitement when we came across chromosomes and segments with high-percentage South Asian components. I didn’t have to tell her that most of these segments were about as high on percentage of European components, or that overall, her S Asian admixture was right in the middle of the normal range of the reference population. It really didn’t matter because we’ve made the self-discovery, self-validation hurdle. The ancestral power in your blood, or its reflection in your mind, either way it isn’t about the science of modern genetics!


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at


See More


RSS Razib’s Pinboard

Edifying books

Collapse bottom bar