The Roma have multitudes

By Razib Khan | October 24, 2013 6:17 am

Credit: Dbachmann


Update: Turns out “Maria” is also an ethnic Roma.

There was a recent case in Ireland of a young Roma girl with blonde hair and blue eyes being removed from her home, on the suspicion that she was not in fact the biological child of the presumed parents (who, like most Roma, are reportedly dark in complexion, hair, and eyes). I even saw a report that a hospital was consulted on the probability of such an outcome, and they said it would be “extremely unusual”. It turns out that DNA tests confirmed that this girl was the biological child of the putative parents. And of course all this has to be understood in light of the case of “Maria” in Greece; a little blonde girl who turned out not to be the biological child of the two Roma who claimed her as their daughter (it looks like there was welfare fraud in that case).

My initial response to the Irish case was that the consultant should be fired, because in an admixed population like the Roma it shouldn’t be that unusual to have offspring who deviate a great deal from the parental phenotype. This prompted some interesting reactions. First, there were those who seem blissfully ignorant of the fact that the Roma are an admixed population. That’s easy enough to resolve, as there have been scientific papers published on this issue using genome-wide data. Second, there are claims that a very small fraction of Roma have blonde hair and blue eyes (on the order of less than 1%). The latter may be a defensible claim, though not indisputably so.

Before we move on I have to clarify that there is a distinction between “Roma” and “Romani.” The latter refers broadly to the populations across Europe which were referred to as “Gypsy,” while the former denotes a set of populations with a center of distribution in Southeast Europe, in particular in the Balkans. In much of Northern and Western Europe there are now two populations of Romani with very distinct histories (and genetics): the Roma who have recently arrived from Southeast Europe, and the various non-Roma groups who have a very long history in their nations of residence (e.g., Finnish Kale).

In terms of various traits we know a fair amount about the genetics of pigmentation in humans. Though the fine grained individual predictive models are coarse, most of the genes which have large effects on population-scale differences are now well characterized. This allows me to produce a model which is reasonably plausible to give you an intuition for why brown-skinned populations can generate a wide range of outcomes in realized phenotype.

Imagine five loci rank-ordered in effect size, gene 1, gene 2, gene 3, gene 4, and gene 5. Each gene comes in two flavors, two alleles. One is a “dark” allele (produces dark pigmentation) and another is a “light” allele. From these you can have a distribution of complexion which is referred to as a “melanin index” (it’s dependent on reflectance). Imagine that you assume each allele at each gene exhibits a melanin index value like so in relation to the aggregate:

Gene 1 = 30, 2
Gene 2 = 15, 1
Gene 3 = 10, 1
Gene 4 = 5, 2
Gene 5 = 5, 0


What you see above are potential genotypes (all implicitly heterozygous), with the phenotypic value of each being the sum of its two alleles. One allele at gene 1 contributes 30 melanin units, and the other 2. And so on. Take the “dark” alleles and assume they’re all homozygous (so doubling them), and you get a maximal potential value of 130; make the “light” ones homozygous and you get a minimal one of 12. But of course in most cases you’ll get a combination. What would the outcome be for a given set of frequencies? Since I’m lazy I ran a simulation. I set the frequencies of the dark allele for each gene like so:

Gene 1 = 60%
Gene 2 = 45%
Gene 3 = 35%
Gene 4 = 46%
Gene 5 = 50%
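As a rough check on the numbers so far, here is a minimal Python sketch of the toy model. It assumes the trait is purely additive across the five genes and that the two alleles at each gene are drawn independently at the listed frequencies. The names `EFFECTS`, `DARK_FREQ`, `phenotype_bounds`, and `expected_phenotype` are made up for illustration and are not from any library. Note that one copy of each “light” allele sums to 6, so a genotype that is homozygous “light” everywhere carries two copies and sums to 12.

```python
# Per-allele melanin-index contributions as (dark, light), from the list above.
EFFECTS = [(30, 2), (15, 1), (10, 1), (5, 2), (5, 0)]
# Dark-allele frequency at each gene, from the list above.
DARK_FREQ = [0.60, 0.45, 0.35, 0.46, 0.50]


def phenotype_bounds(effects):
    """Return (lightest, darkest) trait values for a diploid genotype.

    Assumes a purely additive model: two alleles per gene, summed.
    """
    lightest = 2 * sum(light for _, light in effects)
    darkest = 2 * sum(dark for dark, _ in effects)
    return lightest, darkest


def expected_phenotype(effects, dark_freq):
    """Expected trait value when both alleles at each gene are drawn
    independently at the given dark-allele frequency (no noise term)."""
    return sum(
        2 * (p * dark + (1 - p) * light)
        for (dark, light), p in zip(effects, dark_freq)
    )


print(phenotype_bounds(EFFECTS))                         # (12, 130)
print(round(expected_phenotype(EFFECTS, DARK_FREQ), 2))  # 72.26
```

Under these assumptions the expected value sits a little above the midpoint of the range, mostly because the largest-effect gene has the highest dark-allele frequency.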

Then I generated 10,000 multilocus genotypes, and added a “noise” parameter so that the trait wasn’t totally determined by the genes. This is why the phenotypic value can be higher (and lower, though that bound can go no further than ~0) than what the genotype would predict. Here’s the distribution:


The mean value is 73. The 25th percentile is 55. About 1 out of 26 haploid sets of these five genes should be exclusively “light”; assuming random mating, roughly 1 out of 670 individuals should be homozygous “light” at all five genes. The point is that in a polygenic character if you have polymorphism on the genotypic level you’re likely to have it on the phenotypic level.
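For readers who want to rerun something like this, below is a minimal Python sketch of such a simulation, using only the standard library. It is not the code behind the plot. The post does not say how the noise was generated, so the Gaussian noise with standard deviation `NOISE_SD = 10`, the clamping at zero, the seed, and all function names are assumptions for illustration. The sketch also works out the chance of carrying only “light” alleles: the product of the five light-allele frequencies is about 0.039 (roughly 1 in 26) for a single set of alleles, and squaring it for both copies gives about 1 in 670 under random mating.

```python
import random
import statistics

# Per-allele melanin-index contributions as (dark, light), from the post.
EFFECTS = [(30, 2), (15, 1), (10, 1), (5, 2), (5, 0)]
# Dark-allele frequency at each gene, from the post.
DARK_FREQ = [0.60, 0.45, 0.35, 0.46, 0.50]
# ASSUMPTION: the post does not give the noise model; Gaussian, sd 10.
NOISE_SD = 10.0


def simulate_individual(rng):
    """Draw two alleles per gene, sum their effects, add noise.

    Returns (trait_value, carries_only_light_alleles).
    """
    value = 0.0
    n_dark = 0
    for (dark, light), p in zip(EFFECTS, DARK_FREQ):
        for _ in range(2):  # diploid: two independent allele draws
            if rng.random() < p:
                value += dark
                n_dark += 1
            else:
                value += light
    value += rng.gauss(0.0, NOISE_SD)
    return max(value, 0.0), n_dark == 0  # the trait cannot go below zero


def simulate_population(n=10_000, seed=1):
    """Simulate n unrelated individuals with a fixed seed."""
    rng = random.Random(seed)
    return [simulate_individual(rng) for _ in range(n)]


def prob_all_light(copies):
    """Chance that `copies` allele draws at every gene are all light."""
    prob = 1.0
    for p in DARK_FREQ:
        prob *= (1.0 - p) ** copies
    return prob


population = simulate_population()
values = [v for v, _ in population]
mean_value = statistics.mean(values)
q25 = statistics.quantiles(values, n=4)[0]  # 25th percentile
n_all_light = sum(1 for _, all_light in population if all_light)

print(f"mean = {mean_value:.1f}, 25th percentile = {q25:.1f}")
print(f"all-light individuals in sample: {n_all_light}")
print(f"one haploid set all light: 1 in {1 / prob_all_light(1):.0f}")
print(f"both copies all light:     1 in {1 / prob_all_light(2):.0f}")
```

The exact mean and percentile depend on the seed and on the assumed noise, so they will only approximate the figures quoted above.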

The second major question: is this even plausible for the Roma? Yes. They’re very admixed. Two recent papers make the case definitively, “Reconstructing Roma History from Genome-Wide Data” and “Reconstructing the Population History of European Romani from Genome-wide Data.” These papers used tens of thousands to hundreds of thousands of markers. You can see in the bar plot to the left that the Roma have much higher European-like ancestry proportions than other Indians. It is likely their parental population is Punjabi-like, so it seems that they’re ~50% non-Indian in admixture. The second paper offers up a wider population set for comparison, and it suggests that the Roma did not experience much gene flow with Middle Eastern groups (there are still Roma-related populations in the Middle East, the Dom). Rather, their primary phase of admixture occurred ~1,000 years ago in the Balkans.

“Reconstructing the Population History of European Romani from Genome-wide Data” has a wide range of Romani populations, and it seems evident that the Western and Northern Romani have more European admixture than the Balkan Roma. It turns out that the Welsh Romani seem totally Europeanized in their genome. That is, they’re basically now a Northern European population, perhaps with some residual South Asian ancestry. Because these Romani originally spoke an Indo-Aryan language it seems that they are genuine Romani in a cultural sense. The Welsh Romani have simply undergone enough gene flow with the surrounding population over the past hundreds of years to lose their genetic distinctiveness.

You can see a broader population wide comparison in this bar plot. European populations are at the top, and below them are the Romani groups. The South Asian admixture is again evident, but observe the paucity of both of the Middle Eastern components (you can label them “Northern/Caucasian” or “Southern/Arabian” for convenience; they show up repeatedly in Admixture analyses). The authors of the second paper linked above make much of this, but I would be cautious. I would have preferred that they run Admixture in supervised mode, or perhaps used a formal test of admixture (e.g., D-statistic). But, it is strongly suggestive of the possibility that the Roma sojourn in the Middle East was rather short, and that the true ethnogenesis of the group occurred in the Balkans primarily. And, as I said earlier, the European genetic character of Welsh Romani is pretty obvious in this plot (they cluster with Europeans in the PCA as well).

But, despite the Romani history of admixture in Europe, some of them are genetically very isolated now, and have been for hundreds of years. This seems to be the case for the Roma, who have had surprisingly little admixture since the initial settlement. There’s widespread evidence of inbreeding and founder effect across the Romani populations as well, making them both admixed and very distinct. You see long runs of homozygosity, and the clustering bar plots tend to “break out” the Roma rather early on as you step up the number of populations, similar to what you see in groups such as the Kalash. I believe one of the problems with adducing phylogenetic relationships of the Romani with Y and mtDNA markers was simply that bottleneck effects are more powerful for uniparental lines, and they were buffeted more by the small population size. In sum, when it comes to Roma genetic variation there are a few things to keep in mind:

1) South Asian source

2) Admixture with Southeastern Europeans

3) Long period of relative genetic continuity and isolation after the initial phase

4) Genetic homogeneity within the groups. That is, they’re well admixed across most individuals

5) A lot of genetic uniqueness, because a small effective population size means a high rate of drift
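To give a feel for the drift points above, here is a minimal Wright-Fisher sketch in Python. It is a textbook toy, not anything taken from the two papers. The population size `N = 100`, the 20 generations, the starting frequency of 0.5, and the function name `drift_final_freqs` are all hypothetical choices for illustration. It relies on the standard approximation that, with an equal sex ratio, an autosomal locus is carried in about 2N copies while a Y or mtDNA lineage is carried in about N/2, which is why uniparental markers get knocked around more in a small population.

```python
import random
import statistics


def drift_final_freqs(copies, generations=20, start_freq=0.5,
                      replicates=200, seed=0):
    """Allele frequencies after Wright-Fisher drift, one per replicate.

    Each generation, every one of `copies` gene copies is drawn at random
    according to the previous generation's allele frequency.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(replicates):
        freq = start_freq
        for _ in range(generations):
            count = sum(1 for _ in range(copies) if rng.random() < freq)
            freq = count / copies
        finals.append(freq)
    return finals


N = 100  # hypothetical number of breeding individuals

# Autosomal locus: two copies per individual.
autosomal = drift_final_freqs(copies=2 * N)
# Y or mtDNA: one transmitting copy per member of one sex (about N / 2).
uniparental = drift_final_freqs(copies=N // 2)

var_auto = statistics.pvariance(autosomal)
var_uni = statistics.pvariance(uniparental)
print(f"spread of final frequency, autosomal:   {var_auto:.4f}")
print(f"spread of final frequency, uniparental: {var_uni:.4f}")
```

The replicates start from the same frequency, so a larger variance among the final frequencies means more drift; the uniparental case should come out with the wider spread.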


Comments (13)

  1. So proud of myself for not sending this story to any of my friends as I thought that it could be wrong (result of my GNXP training:)

  2. Paul Conroy

    One note of caution, comparing these Roma in Ireland to Welsh Roma is incorrect – as the Roma community in Ireland dates back to 1998, or a mere 15 years, while Roma in Wales are residents for hundreds of years, and are “Welsh Roma”. The vast majority of Roma in Ireland are Romanian citizens, with a few from the rest of the Balkans. The child in question is called “Iancu Muntean”, and is Romanian.

    Check this article:

    I myself have South Asian like ancestry, but this can’t be Roma, it has to be either:
    1. British army in India
    2. Ancient, Bronze Age ancestry

    I’ve noticed that the South Asian like ancestry in Irish people mostly from the South East of the country, less elsewhere.

    Here are my recent Admix results – where I have 2.4% Kalash:
    0.62% Nilotic-Omotic
    0.00% Ancestral-South-Indi
    33.53% North-European-Balti
    1.68% Uralic
    0.27% Australo-Melanesian
    0.00% East-Siberean
    0.00% Ancestral-Yayoi
    4.45% Caucasian-Near-Easte
    0.00% Tibeto-Burman
    0.02% Austronesian
    0.00% Central-African-Pygm
    0.50% Central-African-Hunt
    0.00% Nilo-Saharian
    0.00% North-African
    12.28% Gedrosia-Caucasian
    0.00% Cushitic
    0.12% Congo-Pygmean
    0.00% Bushmen
    0.21% South-Meso-Amerindia
    34.91% South-West-European
    0.31% North-Amerindian
    0.41% Arabic
    0.00% North-Circumpolar
    2.40% Kalash
    0.17% Papuan-Australian
    8.12% Baltic-Finnic
    0.00% Bantu

    [,1] [,2]
    [1,] “3.3% North_Finn + 96.7% Welsh” “2.2006”
    [2,] “4.8% South_Finn + 95.2% Welsh” “2.2198”
    [3,] “5.2% Inkeri + 94.8% Welsh” “2.243”
    [4,] “4.5% Finland + 95.5% Welsh” “2.2477”
    [5,] “5% Finn + 95% Welsh” “2.2527”
    [6,] “3.9% Saami + 96.1% Welsh” “2.2736”
    [7,] “94.4% N._European + 5.6% Stalskoe” “2.3226”
    [8,] “92.9% North-West-European + 7.1% Vepsa” “2.3769”
    [9,] “8.4% North-Russian + 91.6% North-West-European” “2.395”
    [10,] “5.2% Karelian + 94.8% Welsh” “2.4334”
    [11,] “5% Lak + 95% N._European” “2.4943”
    [12,] “4.9% Lezgin + 95.1% N._European” “2.4964”
    [13,] “4.9% Chechen + 95.1% N._European” “2.5073”
    [14,] “3.8% Georgian + 96.2% N._European” “2.511”
    [15,] “95.7% N._European + 4.3% Ossetian” “2.5171”
    [16,] “94.9% N._European + 5.1% Urkarah” “2.5196”
    [17,] “3.8% Abhasian + 96.2% N._European” “2.5204”
    [18,] “3.6% Georgian_Imereti + 96.4% N._European” “2.5407”
    [19,] “4.8% Kumyk + 95.2% N._European” “2.5438”
    [20,] “95.5% N._European + 4.5% North-Ossetian” “2.5454”

    • razibkhan

      i didn’t compare welsh roma to irish roma. i specifically said welsh romani, because the welsh romani are NOT roma. i made that clear in the post paul 🙂

      • Paul Conroy

        My objection is to the use of the terms like “Irish Roma” for Romanians. I rate it alongside the use of terms like “Black Irish” or people calling “Irish Travellers”, Roma/Romani or Gypsy – these create a distorted picture of Irish demographic history. You may not be as sensitive as I am to increasing the US reader’s already dismally poor understanding of what constitutes an “Irish” person. It’s like the way “Albion’s Seed” – one of your favorite books – tries so hard to sideline the history of the Irish in the US from the 1600’s onwards.

        I’m sure as Romanians, the people in your article, should not be compared to Northern or Western Europeans, and have no such ancestry. Blonde hair also occurs in the Balkans at appreciable frequencies, though obviously at much lower levels than among Welsh Romani/Roma.

  3. Dmitry Pruss

    It’s a circular argument. The village grannies across the region told their pesky grandkids, “Behave or the Gypsy will take you!”. Some grew up believing in the Bogeyman. But if we suppose that the Gypsies actually have been stealing kids at an appreciable rate, for centuries, then surely a great many of them will be naturally born blue-eyed today?

    • Dmitry Pruss

      And of course now the Greek blue eyed girl turned out to be an ethnic Roma, left in Greece by her desperately poor transient mother, and both kidnapping and welfare-fraud “theories” are dispelled.

      • kiiski

        The welfare fraud allegation refers to the couple “hosting” the girl in Greece. They had registered a total of 14 children, some of which aren’t accounted for and may not actually exist.

  4. kiiski

    A rarish polygenic combination is one possibility, but in the case of ‘Maria’, the biological parents also have other very light children (photos in linked article). This, and the ‘discreteness’ of the phenotypes, would seem to suggest some form of albinism:

  5. David237856

    I haven’t seen a full account of Maria’s biological family (Bulgarian Roma), but the mother is very dark skinned, while at least three of Maria’s numerous siblings, as well as Maria herself, are very pale skinned. This is surprising unless there is some kind of albinism in the picture, and this has indeed been claimed:

  6. kiiski

    There’s another example of a very light Bulgarian Roma in the documentary “Paradise Hotel”. It’s about a Roma ghetto in Yambol, not far from the town where Maria’s parents were found. Watch the trailer:



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

