Clusters where they "shouldn't be"…

By Razib Khan | February 13, 2012 11:25 pm

Uyghur girls

A few people have pointed me to the paper, Implications for health and disease in the genetic signature of the Ashkenazi Jewish population. You should check it out; even if you don’t have academic access to papers, it’s not gated. Rather than review the paper as a whole, I want to focus on a methodological issue.

In the genetics reader survey only 20 percent of you agreed that you understood how to read an ADMIXTURE plot. After looking at some of the results in this paper I have a lot of sympathy. Understanding what’s going on requires more prior information than is often present in the legends of the figures.

It is known that to a first approximation Ashkenazi Jews, that is, the Jews of Europe, can be understood as an admixture between a European population and a Middle Eastern one. But Ashkenazi Jews also exhibit their own genetic distinctiveness, probably due to long-term endogamy. This shows up in various genetic statistics. In this paper the authors show that the Ashkenazi form their own cluster in both PCA and ADMIXTURE, two ways of ascertaining population structure. Below I’ve re-edited one of their ADMIXTURE plots and highlighted some populations of note. It’s rather informative about the bigger problem with interpreting these sorts of results in the absence of context.

As you can see there is an ancestral element which is predominant in the Ashkenazi. An individual-level analysis also implies that most of those who identify as Ashkenazi but have a lower fraction of this element probably have recent admixture (e.g., only three out of four grandparents were Jewish). What I found striking is that the Uyghur and Hazara both also shook out as having an ancestral element of their own. The reality is that we know this is a total artifact of the ADMIXTURE software; the Hazara have a historical narrative of being the product of intermarriage between Mongols and Persians. The historical evidence for the origin of the Uyghur is sketchier and more confused, but it can be reconstructed. And the genetics make it likely that both these groups emerged over the past 2,000 years, as an admixture between a Western and an Eastern Eurasian set of populations.

What does this have to do with Ashkenazi Jews? I think one should be skeptical of an “Ashkenazi Jewish” modal element when we already know that this plot has useless clusters. It does not seem like they included any “real” East Asian reference populations, so the Hazara and Uyghur stepped up and took that position, despite both populations having ~50 percent West Eurasian admixture. The ADMIXTURE software transformed clearly hybridized populations into their own ‘ancestral’ population. Something similar might be happening with the Jews, especially in light of the fact that the authors had a relatively large Jewish population in their data, geared to exploring the nature of Jewish genetic relationships. This is a case where we know that we probably don’t know that much.
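To make the reference-population point concrete, here is a toy sketch in Python. It is an illustration with made-up allele frequencies, not data from the paper, and it is not how ADMIXTURE works internally (ADMIXTURE fits an unsupervised likelihood model to genotypes). It assumes a hypothetical 50/50 hybrid of a "west" and an "east" source, and shows that when both sources are available as references, a simple least-squares fit recovers the mixing fraction, so no separate "hybrid" component is needed. The function name and every number below are assumptions chosen for illustration.

```python
# Toy illustration only: invented allele frequencies, not real data.

def estimate_mixture_fraction(p_mix, p_a, p_b):
    """Least-squares estimate of `a` in p_mix ~ a * p_a + (1 - a) * p_b.

    Each argument is a list of allele frequencies, one entry per locus.
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(p_mix, p_a, p_b))
    den = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    if den == 0:
        raise ValueError("reference populations are identical at every locus")
    return num / den


# Hypothetical reference populations at five loci.
west = [0.10, 0.80, 0.30, 0.65, 0.20]
east = [0.70, 0.20, 0.90, 0.15, 0.60]

# A hypothetical hybrid population built as an even mix of the two.
hybrid = [0.5 * w + 0.5 * e for w, e in zip(west, east)]

fraction_west = estimate_mixture_fraction(hybrid, west, east)
print(round(fraction_west, 3))
```

If the "east" reference is left out, there is nothing for the eastern half of the hybrid to be assigned to, which is roughly the situation the Uyghur and Hazara seem to be in here.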

The moral: don’t think you can read a scientific figure plainly without any context.

Image credit: Wikipedia

CATEGORIZED UNDER: Human Genetics, Human Genomics

Comments (13)

  1. Nick Patterson

    This is a shrewd and important note illuminating a key issue.
    I work a lot on recovering phylogeny as some readers of this blog will know.

    I use PCA (and ADMIXTURE) output mostly for data exploration rather than formal modeling;
    PCA in some sense has no underlying model at all, and ADMIXTURE’s model is
    a “star shaped phylogeny” where populations all split from a root population at the same time
    (and then remix). Sometimes this model is unrealistic and the results can be misleading,
    as Razib points out.

    Nick Patterson (Broad Institute)

  2. Charles Nydorf

    I am looking at the provisional version of the article and finding it frustrating. The authors found a small amount of geographical intra-population variation within the Ashkenazim, but ‘Additional File 5’, which shows this, does not have a legend explaining the color coding.

  3. marcel

    A minor niggle, and more history of European Jews than you may be interested in.

    It is known that to a first approximation Ashkenazi Jews, that is, the Jews of Europe

    I do not believe that this is quite right. While the Ashkenazim seem to have come originally from groups up and down the Rhine river and neighboring regions, they moved east during and after the Crusades, especially the first Crusade. Until the 19thC, the small Jewish populations in France, the Lowlands, and England (I don’t think there were any populations of note elsewhere in the UK) were almost entirely descended from people exiled from Spain a couple of hundred years earlier. The same is true of Greek Jews, and a bit less firmly, of Italian Jews, though here I think as you move further south, especially away from Austria, the Ashkenazi influence becomes negligible.

    It is certainly true that at the end of the 18th C, there were few Jews in these parts of Europe, that the Ashkenazim were the lion’s share of European Jews, so to that extent, the quoted assertion is not incorrect, but there remained an identifiable minority of Jews in Europe who were not Ashkenazi. Over the course of the 19th and early 20th C, Ashkenazim immigrated in increasing numbers into areas where the Jewish populations had been predominantly Sephardic (i.e., from Iberia).

  4. It is certainly true that at the end of the 18th C, there were few Jews in these parts of Europe, that the Ashkenazim were the lion’s share of European Jews, so to that extent, the quoted assertion is not incorrect

    yes. i am aware of all the rest (e.g., romaniots, etc.).

  5. Onur

    Technically speaking, Ashkenazi Jews are eastern European (not to be confused with East Europe) Jews and Sephardi Jews are western European (not to be confused with West Europe) Jews. The former community originated along the Rhine area and the latter community originated in Iberia. Both communities seem to be a mixture of ancient West Asian Jews (whose places of origin are not clear) with European indigenous populations.

  6. Charles Nydorf

    A few words on the geographical extent of Ashkenazic Jewry. Ashkenazic Jews are traditionally defined by following some religious practices that are slightly different from those of other religious groups. The area within which these practices were followed coincided with the territory in which Jews spoke Yiddish up till the early 19th century. This area includes Germany along with Alsace-Lorraine, a small part of Switzerland, Austria and Holland. To the southeast it extended into the Czech regions and an area south of the Carpathians taking in Slovakia, Hungary, subcarpathian Ukraine and Transylvanian Romania. To the north and east of the Carpathians, Bukovina and Moldavia were traditional Ashkenazic areas. The demographic heartland of the Ashkenaz was the medieval kingdom of Poland-Lithuania. The part belonging to the old Lithuanian Duchy embraced Lithuania, eastern Latvia, Belarus and parts of Ukraine and Poland. The part belonging to the Kingdom of Poland included Poland and most of Ukraine.

  7. I wonder how “well mixed” a hybrid population has to be to appear to be an ancestral rather than hybrid component in an ADMIXTURE analysis. The Ashkenazi example suggests that in the right circumstances it takes less than about sixty generations. The Uyghur and Hazara examples trim that a bit and suggest that the circumstances necessary don’t have to be that exceptional.

    Would five hundred years with less endogamy than the Ashkenazi suffice? For example, would a Mayflower Puritan cluster be treated by ADMIXTURE as an ancestral population (presumably very similar to CEU) distinct from the source populations for them in the United Kingdom?

  8. Eurologist

    The solution to the “problem” of getting artificial ancestral populations out of ADMIXTURE or PCA runs seems to be trivial: use proper reference populations, and project the population you want to study onto that (i.e., exclude it from the initial analysis).

  9. #8, oh wow, you should go school the guys at the broad institute! everything is so trivial for you!

    more seriously, this paper illustrates a ‘trivial’ problem with getting ‘proper’ reference populations:

  10. #7, reductio ad absurdum: a mixed-race family would be an ‘ancestral population.’ that’s why you remove relatives out of these analyses. but it shows that with enough IBD in an admixed group you can create a new group. one can actually argue that that is what the ashkenazi jews are.

  11. Eurologist


    I only addressed one of several issues, and that one indeed seems trivial compared to the others.

    Given that I never shy away from admitting that I am no expert, I certainly didn’t mean to sound condescending. As I mentioned, it seems to me that if one wants to find out the ancestral admixtures of a specific population, then it’s best to exclude that one and do a projection instead. That way, the studied population doesn’t pretend to be its own source and can never form its own cluster (so as to render the analysis useless). So, that’s the trivial part (conceptually; and practically, in PCA – not so sure about ADMIXTURE).

    What are “proper” reference populations? Looks like for this to be meaningful, you want a lot more populations than K-levels, widely spread, and yes, at one point it becomes model- or prior-knowledge dependent (when selecting as clean or cleanly admixed reference populations as possible, and not overloading a particular region or irrelevant outlying groups). For minor contributions, it probably doesn’t matter much, and it’s probably OK to use as few “stand-in” populations as possible. But sooner or later, interpretation also depends on when admixture took place, and whether you are open-minded enough to allow this to have happened any time in the past 50,000 years or so in Eurasia, rather than starting from the preconceived notion that, in order of descending importance, most was done in the iron age, the bronze age, or towards the final stages of the dominance of agriculture.

    The Reich paper has some of those latter elements of “interpretation difficulty.” For example, as I have mentioned many times, large parts of West Asia and Europe clearly descend from populations originally residing in the subcontinent – ~45,000ya. Add some continuous gene flow and the well-known climatic barrier within India, just before LGM parts of Pakistan and NW India likely were genetically closer to people in the Caucasus/Anatolia/Eastern Europe than to people in S and SE India and SE Asia. So, IMO, what today may look like recent / IE admixture may have a base-level cause that dates back 25,000ya (the large diversity of y-DNA haplogroups R, R1, and R1a/ R1b in the subcontinent is also of interest in this context).

    Conversely, if the goal is to identify a person as belonging to a particular group, one could do the opposite and climb up the k-ladder until (almost) every group has become distinct.

    Finally, if the algorithm has no way to put proper weights on the groups, then it makes most sense to me to pare down to equally-sized groups.

  12. Interesting

    This may be Khazar ancestry showing up.

    As a Jew I have no problem with this. However, it is kind of a hot-button issue among some, as it has been a claim that anti-Semites have made in the past.

  13. Onur

    This may be Khazar ancestry showing up.

    Where do you see that?



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

