Which undersampled groups would you like to see?

By Razib Khan | September 9, 2011 10:22 pm

To my excitement I got the Tutsi (almost) and Malagasy genotypes. These are cases where N = 1 is a big deal, as opposed to N = 0. What other groups might be informative? Most of the world’s population is obviously not sampled, but they’re not always of equal interest. What would be equivalent to the Tutsi (politically relevant) or Malagasy (demographically very unique)? If I solicit funds to pay for someone’s genotyping it won’t be a successful solicitation if the interest is very narrow.


Comments (83)

  1. Justin Giancola

    Russia & Northern Central Asia: from the Kama east; anybody.

  2. JoeLynne
  3. Dr Duck

    Sentinelese would be pretty interesting, but not practical, I suppose. Agree with Charles re Tasmanian Aborigines.

  4. Charles Nydorf

    Definitely Karaites from Lithuania, Ukraine, Crimea and Egypt.

  5. Toavina Andriamanerasoa

    Agree re. Tasmanian Aborigines

  6. C. Baer

    How about some of the aboriginal (non-Kinh) Vietnamese groups – Cơ-tu, Xơ-đăng, Giẻ-triêng, Cor, etc.

  7. Huxley

    I’d like to see an analysis of the Germans. The northern and southern Germans seem very different (blonde Nordic/Germanic vs. brunette Alpine/Celtic). I wonder how the Bavarians and Holsteiners ended up speaking the same language (or at least dialects of the same language) as they must have at least somewhat different ancestry.

  8. Filipino negritos.

    They might turn into a hot topic one of these days.

  9. Taylorboy

    How about Irish people both in Northern Ireland and the Irish Republic and including both Catholics and Protestants?

  10. I’d like to see people of Mexican descent resident in the U.S. sorted by average year of their own or their ancestors’ immigration to the U.S.

  11. Dear Justin Giancola:

    By any chance, would you happen to be distantly related to an old friend of mine, recently deceased, the Air Force colonel Dr. John Giancola?

    Steve Sailer

  12. The Donme of Istanbul’s elite, formerly of Salonika.

  13. The hereditary upper class of Monterrey, Mexico.

  14. Self-identified Cockneys. Are they pure English or did living in a cosmopolitan port city have much impact on them?

  15. So-called triracial isolates in the Eastern U.S., such as the Melungeon and others.

  16. Ashkenazi Jews originating in Budapest: why all the famous physicists?

  17. Nuer and Dinka from South Sudan. They are politically in the news, having a new country. They are really interesting looking. And they are famous in cultural anthropology circles going back to Evans-Pritchard as the canonical example of segmentary lineages.

  18. There have been hints in genetic studies of Polynesian islanders that Thor Heyerdahl’s popular Kon-Tiki theory that South Americans rafted to Polynesia isn’t totally wrong. (Also, an American food crop showed up in Polynesia before 1492, but it might have gotten there by seeds floating.) That would be a fun topic.

  19. Another approach would be to make a list of specific famous individuals whose ancestry is subject to controversy and solicit contributions from their descendants. For example, Charlie Chaplin, one of the most famous men of the 20th Century. It is generally believed by the public that his mother was Jewish, but some genealogical researchers doubt this. One possibility is that his mother was actually Roma.

    Chaplin’s ancestry is of interest for all sorts of reasons, such as the weird connection in looks between Chaplin and Hitler. Hitler may have imitated Chaplin’s famous mustache. Chaplin’s 1940 movie The Great Dictator, in which he parodied Hitler, was driven in part by Chaplin’s unease over their similarities.

    Chaplin’s daughter, actress Geraldine Chaplin, is 67-years-old. There’s an outside chance that she’d be amenable to providing a sample.

  20. Madouc

    Cherokee or other NA groups.

  21. Justin Giancola

    Sorry Steve I’m not aware of a connection. There is however a member of Jersey Shore who shares my last name as well. I’d guess that as it’s not a super common name the chances of us all being related is decent.

  22. What part of Italy were your Giancola ancestors from? Col. Dr. Giancola’s parents were from Castel San Vincenzo in the mountains outside Rome in, I believe, Isernia province.

  23. Razib:

    You might look at the works of Israeli journalist Hillel Halkin for ideas. He writes:

    “The fact is that I’ve always been a sucker for this kind of stuff. Ever since I was a kid growing up in Manhattan, I’ve lapped it up: stories about the lost tribes, descendants of the Marranos, shadowy Jewish kingdoms in the Middle Ages, Jews turning up in far places — the mountains of Mexico, the jungles of Peru, Kaifeng, the Malabar Coast, Timbuktu. The Jews of Manhattan were boring. Jews spotted by Marco Polo on the China coast or surviving centuries of the Inquisition in the hills of Portugal gave me goose pimples.

    “Call it the romance of Jewish history. The idea that we were a profoundly more adventurous, infinitely more varied, more far-ranging, more interesting people than the Jews I knew.”

    Halkin’s 2002 book Across the Sabbath River: In Search of a Lost Tribe of Israel makes the case that the obscure Mizo ethnic group on the India-Burma border are really the descendants of Manasseh, one of the Lost Tribes of Israel.

    If you find one person from one of these obscure groups who has some Jewish ancestry, you’d get some publicity for your findings.

  24. One approach would be to contact members of obscure groups who have written about their membership in these groups. For example, Ilgaz Zorlu, an Istanbul accountant, wrote a bestseller in Turkey called “Yes, I Am a Salonikan” about his secretive Donmeh ethnic group (crypto-Jewish followers of the 17th Century false messiah Shabbetai Zevi who concentrated in Salonika before relocating to Istanbul, where they played major roles in the Ataturkian elites). Mr. Zorlu might want the publicity from having his DNA analyzed.

  25. You might ask descendants of individuals rumored to be part black. For example, was President Harding part black? That’s a fairly popular question these days. Unfortunately, he didn’t have any biological children, so you’d have to go to his siblings’ descendants.

    Or, what is Vivien “Scarlett O’Hara” Leigh’s ancestry? There are hints that it includes some kind of Central Asian component. She has one 77 year old daughter, Suzanne Farrington, who is apparently still alive.

  26. South French per regions in Euro context. Critical to prove/disprove the FC refugium theory.

  27. Prem Palver

    Asperger Sailerites of the San Gabriel Valley.

  28. Justin Giancola

    I can’t confirm those details, but I’ve always heard amongst relatives mountains outside Rome!

  29. Miguel Madeira

    “Cape Verde Islanders”

    Nothing to see there – Portuguese + West African.

    More interesting studies:

    – Canary Islanders (pure Spaniards or Spanish/Berber mix?)

    – some villages along Sado river, in southern Portugal (there is a kind of “legend” that many people there descend from African slaves).

  30. Nathan M
  31. Garvan

    I think an Ainu genotype would be interesting. I understand that pure Ainu no longer exist, but even a mixed individual can give interesting information as shown by the Tutsi and Malagasy cases.


  32. FF

    Descendants of Tommy Solomon, the last full blood.

  33. Roland Kuhn

    Re undersampled groups: I suggest collection of north-west North American aboriginal DNA (i.e., na-Dene DNA), to clarify a recent finding. The recent Yotova et al. paper http://mbe.oxfordjournals.org/content/28/7/1957.abstract?sid=facb4e54-49cc-4ce1-b713-be5bccd60993 seemed to show that many people outside Africa have inherited a particular chunk of their X-chromosomes from Neandertal ancestors. In the paper, by far the highest frequency of this “Neandertal” X-chromosome sequence (up to 25%) is found in NW North America; that frequency is much higher than the generally accepted figure of about 3% for the portion of overall DNA inherited from Neandertals by modern non-African populations. Why do NW North Americans have such a high proportion of the “Neandertal” X-chromosome DNA? Most likely, because of a combination of genetic drift and selection. However, it is not completely inconceivable that the proportion of their OVERALL DNA inherited from Neandertals is higher than for other populations (significantly higher than 3%). My understanding is that no-one has specifically measured the contribution of Neandertal DNA to the complete genome of this population; if that’s correct, it would be worthwhile to look into this.

  34. Shay Riley

    Americo-Liberians (who are the descendants of African American — mainly from southeastern and northeastern USA — and Caribbean settlers — mainly Barbados — from the early- to mid-1800s) and the Sierra Leone Krio (who are descendants of Caribbean — especially Jamaican — and African American/African Canadian settlers from the late-1700s to mid-1800s) are definitely undersampled groups. I believe that these are Africa’s only two groups who have significant American/Caribbean ancestry, which makes them demographically unique (and potentially has political relevance, depending on whether it can be shown that they have African ancestry that hails from modern-day Liberia). I don’t recall any study ever documenting their DNA.

    Are they more akin to North American blacks or West Africans, or are they an intermediate group between the two? Is their genetic mix of African-specific ancestries similar to that of North American blacks, or is it its own unique mix (the Americo-Liberians and the Krio apparently assimilated some almost-enslaved Africans who were caught midsea and brought back to African shores)? Also, how much genetic intermingling have they had with surrounding indigenous Liberian and Sierra Leonean populations?

  35. Paul G

    Minority religious groups from the ME. Yezidis, Ma’loula Aramaeans, N Syrian Alawites, Nash Didan Jews, etc.

  36. Paul G

    @ JoeLynne’s suggestion, Kurdish Jews.

    You may wish to keep an eye on the ICHG studies planned for next month. One in particular may contain data on Kurdish Jews.
    “Genetic structure of Jewish populations on the basis of genome-wide single nucleotide polymorphisms.” N. M. Kopelman

    “The collection of Jewish populations studied incorporates a variety of populations not previously included in other genomic population structure studies of Jewish groups.”

  37. C. Baer

    Or how about the Aivilingmiut or other Inuit groups of northern Hudson Bay?

  38. of these responses, the one for which an N = 1 would be informative would be a part ainu individual. we have lots of japanese in the public domain already to compare them to.

  39. “Cape Verde Islanders”

    “Nothing to see there – Portuguese + West African.”

    C’mon, the interesting thing would be how much Jewish ancestry Cape Verdeans have.

    The mainstream media is most interested in Jews and blacks. If Razib could document that, say, Abraham Lincoln was part Jewish and part black, he’d be invited on Oprah and 60 Minutes to trumpet his findings. (Unfortunately, Lincoln doesn’t have any surviving descendants. And I very much doubt that’s true.)

  40. “Minority religious groups from the ME. Yezidis, Ma’loula Aramaeans, N Syrian Alawites, Nash Didan Jews, etc.”

    Excellent suggestion. I’d add Samaritans, Gnostics, and Druze to the list. What about the Turkish Alevi?

  41. How about that black African village that was in the Caucasus/Black Sea region of Czarist Russia?

  42. Jared Diamond, in the Third Chimpanzee, talks about how strikingly different in phenotype people can be from one Solomon Island to another. He names different islands tens of miles apart where skin color differs notably. You could test Darwin’s theory of sexual selection driving racial differences here: do Solomon Islanders all have the same ancestry and have locally diverged for whatever reason? Or are there differences in average ancestry between islands which accounts for their distinct looks?

  43. Gullah dialect speakers from Sea Islands of American southeast. Are they of mixed black African tribal heritages like most African-Americans, or do they come from a particular tribe?

  44. Tangier Island in the Chesapeake — these are perhaps the most genetically isolated British folk in the U.S. Their local accent preserves features of how English was spoken in the 17th Century British Isles.

  45. Is there much interfacing between genetic researchers and genealogical hobbyists? They don’t tend to have a lot in common, but they could be very useful to each other in suggesting topics for research.

    For example, lots of socially prominent white people in Virginia claim to be descended from Pocahontas. That would take N > 1, but it’s a pretty interesting subject.

  46. I’ve heard there are something like 1800 people who claim to be part Tasmanian.

  47. Onur

    What about the Turkish Alevi?

    Also the Kurdish Alevi. The Alevi, whether Turkish or Kurdish, despite lacking orthodoxy and authorized sacred books and the centuries-long suppression by the Sunni Ottoman government and populace, have very similar beliefs and practices. So their oral tradition seems to be strong.

    I have no Alevi roots, but I have a friend who is a Kurdish (Kurmanj Kurdish) Alevi. If I convince him to test with 23andMe or FTDNA, I will inform you.

  48. Onur

    I have no Alevi roots

    at least no known roots

  49. Roger Bigod

    There’s a surviving remnant of Pocahontas’ tribe, the Powhatan.

    Pocahontas had one surviving grandchild, a woman who married a man named Bolling. Armies of genealogists have toiled over the Bollings, so there’s a fairly complete record.

  50. Paul G

    @ Steve Sailer

    Yes. Certainly. There are two Gnostics participating in the Dodecad project (DOD460 and DOD786), and one of them is also participating in the Harappa project (HRP0094).

  51. Stephen Hemenway

    Native peoples from the highlands of the Andes. Since they live at such high altitudes I wonder if they have adaptations similar to the adaptations of Tibetans, and if they separate out from other Native Americans for this reason. (The Andes and the Himalayas really are comparable in altitude: Lhasa is at 11,450 ft, and Cusco is at 11,200 ft. I heard on NPR that if you measure from the center of the earth rather than sea level, because the earth is fatter at the equator, the Andes has the tallest mountain in the world.) It would explain in part why they maintained such a large native population and retained so much of their culture (thankfully); the European settlers, like the Han in Tibet, weren’t adapted to such high altitudes. Just a thought.

  52. Roger Bigod

    The physiologists have studied adaptations to high altitude in Andes populations, but I don’t know how much is due to identified genes. There’s an old study that looked at their antibody repertoire and found a lack of some antibodies for mumps virus, a European import. This may have been based on antibody or protein sequence, rather than DNA.

    We can make over a million different antibodies and if each one were a separate gene, it would take more DNA than the genome has. The solution to this puzzle is that there are 3 sets of partial genes, each arranged in tandem. A developing B lymphocyte chooses one string of DNA from each set to express and splices them into the final gene. The antibody is externalized on the surface of the cell and if it is bound by antigen the cell undergoes multiple mitoses and starts pumping out soluble antibody.

    Sequences of a population’s antibody repertoire might be a great indication of the history of pathogen exposure.

  53. Darkseid

    i vote for blacks in the latin america region

  54. What’s the group in the mountains of Pakistan that speaks a language that seems extraordinarily unrelated to other languages?

  55. The highlands of Yemen are interesting for a variety of reasons.

  56. #62, i think you are thinking burusho. they’re in the HGDP.

  57. TonyGrimes
  58. Muslims from Zanzibar in Tanzania. To what extent were they indigenous converts, Arabs, South Asians, Egyptians or something else?

  59. pconroy

    Guanche – Canary Islands – Looked similar to Cro-Magnons of Europe

    Yaghan – Tierra Del Fuego – similar lifestyle to Neanderthals, wore no clothing in freezing temperatures

    Andaman Islander – Out of Africa interest

    Bashkir – Bashkortostan, Central Asia – Y-DNA R1b frequency of > 70%, maybe origin of R1b

    Ouldeme – Cameroon – very high frequency of R1b

    Ket – Siberia – possibly related to Native Americans

    Ainu – Hokkaido

    Aleut – Alaska

    Ojibwa – Canada/US – have 25% mtDNA Haplogroup X

    Torres Strait Islander

    Lemba – Zimbabwe – possibly Jewish

    Kalash – Nuristan

    Melungeon – Tennessee – possibly Sephardic Jewish

    Copt – Egypt

    Siwa Oasis Kabyle – Egypt – may be related to some of the red-haired pharaohs of Egypt

    Tuareg – Sahara

    Szekler – Hungary – more East Asian than regular Hungarians

  60. Onur

    – Anatolian Greeks (especially the ones who are not from coastal western Anatolia, thus not influenced by the significant Ottoman-era Greek migrations to coastal western Anatolia from the Aegean islands and mainland Greece), whether – when still living in Anatolia during the Ottoman era – the Turkish-speaking ones or Greek-speaking ones

    – The Zoroastrians of Iran; as they, by staying in Iran zamin, are genetically much purer Iranians than the Zoroastrians of South Asia, who are heavily admixed with South Asians despite being a small and relatively isolated minority group; also, they, by most probably lacking any Turkic or Arab admixture, are likely to be genetically purer Iranians than Muslim Iranians

  61. Justin Giancola

    67. As far as the Yaghan, that’s super cool, but where do we know this from? All the Wikipedia stuff needs citation. Aren’t these same people also claimed to be giants? Or even furry beasts? I’m not being sarcastic btw.

  62. Grey

    “Self-identified Cockneys. Are they pure English or did living in a cosmopolitan port city have much impact on them?”

    I guess there were lots of small impacts but i don’t think there were many big impacts before the 60s apart from a sizeable Irish component after the famine. I doubt they’d be much different from a Nascar crowd with maybe 5% more Irish. White southerners look pretty much the same when you adjust for sun, tattoos and weight (although that last element is getting the same way now) so i don’t think it would tell you much.

  63. pconroy wins thread!

    Actually, it’s not a competition … but a lot of interesting suggestions from everybody.

  64. gcochran

    Japan’s Imperial family

  65. Burakumin

    Zana and Khwit!

  66. Roger Bigod

    White southerners for whom Nascar is emblematic come from the Borderlands, most via Ulster with very little intermarriage. The gene pool is probably very different from Cockneys. Both pops are heterogeneous so N=1 wouldn’t be very informative.

    The figure of merit for N=1 might be the date of the last common ancestor. That appears to maximize the number of unique sequences in the resulting tree and therefore the information sensu Shannon.
    Tribes in the mountains or distal islands of Asia look like the best bet. One set of Ainu SNPs would be a big turn-on.

  67. pconroy


    There is a Samaritan tested, check out Benyamin Tsedaka – who has a public profile on DeCodeMe – and who recently tested with 23andMe too.

    More here:

    “We are the real Israelites,” declares Benyamin Tsedaka, a priest among the 600-strong Samaritan community which traces its ancestry back to the northern biblical kingdom of Israel. “Unlike some of our returning Jewish brothers,” says Tsedaka, “we have always been here; we never ever left this land.” Tsedaka claims to represent the 125th generation of his family to live in Israel, which according to him goes back 3630 years to the time of Joshua’s conquest of the Land of Canaan.

  68. pconroy


    I presume that was a joke to test Irish people, as there are literally thousands (probably like 10’s of thousands) of Irish tested by 23andMe.

    Or do you mean specifically people born in Ireland to Irish parents and grandparents – in that case there are probably 100’s – myself included – tested.

    Some of the biggest surprises for me as an Irish person, were:
    1. Having about 750 relatives identified so far – putting me in the range of Colonial US numbers. This is a signal of rapid population expansion and/or inbreeding.
    2. Having 100’s of these Relatives in the US South
    3. Having Jewish relatives from Eastern Europe and US
    4. My father having a relative whose ancestry is South Asian

  69. A speaker of an Ubangian language from the Central African Republic. Why?

    “Greenberg (1963) classified the then little-known Ubangian languages as Niger–Congo and placed them within the Adamawa languages as “Eastern Adamawa”. They were soon removed to a separate branch of Niger–Congo, for example within Blench’s Savanna languages. However, this has become increasingly uncertain, and Dimmendaal (2008) states that, based on the lack of convincing evidence for a Niger–Congo classification ever being produced, Ubangian “probably constitutes an independent language family that cannot or can no longer be shown to be related to Niger–Congo (or any other family).””

    This suggests that Ubangian languages may be a previously unidentified language family in Africa that no one has singled out to genotype, and at the very least is likely to be divergent from other Niger-Congo populations. The N=1 sample could shed a lot of light on the linguistic ambiguity. CAR is generally undersampled in any case and is not geographically far from some very distinct genetic populations (Chadic, Fulani, Bantu, Pygmy, Tutsi, Maasai, Kordofan, Omoro, Ethiosemitic, Cushitic).

    For similar reasons, a Bangi-me speaker from Mali.

  70. Paul Givargidze

    @ pconroy:

    “We are the real Israelites,” declares Benyamin Tsedaka, a priest among the 600-strong Samaritan community…”

    Based on their autosomal DNA, they sure do make a good case. I believe there stands a reasonable chance Samaritans are principally descended from the population of Israel/Judea in the Second Temple period.

    MDS plots, many times, mimic geography to a significant extent, for relatively unadmixed populations. Or, at least in populations with admixture limited mostly to local (negligible genetic distance), as opposed to distant (genetically) sources.

    Please see the slightly modified MDS, based on the Behar et al. plot from the 2010 study of Jewish DNA, which I am providing a link to below. This particular plot was created by David Wesolowski (Eurogenes) for a topic Razib discussed a few months back, regarding the DNA of Jews, Christians and other ME populations. The symbols are a bit off, apologies. The image was rotated and flipped horizontally. The image was also cropped to exclude all non-Semitic-speaking populations. Some individual Arabian admixed Iranians remain, however. The relative positions of the N Mesopotamian cluster, Samaritans, and Yemen Jews, do not deviate much from geography. As these three populations are also the least (recently) admixed among the samples used, it lends support, in my opinion, to the possible principal Israelite/Judean origins of the Samaritans.

    MDS – http://www.box.net/shared/30y5m60d6h5sndalz7lk

  71. AJ

    Rajasathi Jains – many claim Rajput ancestry and a minority shows Central Asian traits

  72. AJ

    That should be “Rajasthani” Jains

  73. Grey

    “White southerners for whom Nascar is emblematic come from the Borderlands, most via Ulster with very little intermarriage. The gene pool is probably very different from Cockneys.”

    Yeah right.

  74. Antonio

    I don’t know whether these populations are undersampled or not, but I would like to see more data from the Americas. For instance, I would like to compare the admixture of the elites in countries such as Argentina, Brazil and the US. Maybe with South Africa as well. From what I remember, most projects such as Dodecad or Eurogenes favor analysis of unadmixed individuals, for several reasons. Instead, I would like to use genetic data from these areas to shed some light upon the political histories of these places. The flow of genes over time can tell us a lot about history! (Btw, by admixed I mean not only “white” versus “non-white” but also admixture within the European-descended population.)

  75. TonyGrimes

    Jarawa Andamanese and Bougainville Islanders would be interesting. I think they are the darkest people in the world, with possible African exceptions. The people of Bougainville exhibit some blondism too.



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com