R1a1a conquers the world…in a few pulses?

By Razib Khan | October 31, 2012 2:52 am

As many of you know, around the year 2000 the analysis of Y chromosomal human lineages became a pretty big deal. The reason these lineages are important and useful is that they record the uninterrupted ancestry of males, from father to son, along the Y chromosome. Instead of the complexities of the whole genome you have, as with mtDNA, a simple and elegant phylogenetic tree to interpret. The clusters along this tree are defined as broad haplogroups, united by derived states from a common ancestor. One of the largest haplogroups is R1a1a. It happens to be my paternal lineage, as well as Dr. Daniel MacArthur’s and Dr. Zack Ajmal’s.

The map above illustrates the peculiarity of R1a1a: it is geographically enormously expansive. How to explain this distribution? A naive response might be that this distribution is surprisingly similar to that of the Indo-European languages. Unfortunately this runs up against the conundrum that low caste South Indian groups, relatively untouched by Indo-Aryan culture (at least until the past few hundred years), also manifest high frequencies of R1a1a.

To make a long story short it seems that R1a1a is an old haplogroup with a lot of structure across Eurasia. Maju points me to a paper in American Journal of Physical Anthropology which simply & elegantly brings home to us some obvious insights, New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1:

Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.

The table to the left shows you an Indian population from Malaysia. Malaysian Indians tend to be Tamils, from the south of the subcontinent. If they were finding individuals who were carriers of R1a1a, the data set is probably somewhat enriched for Tamil Brahmins and people of North Indian ancestry, though this does not alter the basic story. What you see is that all the Indians carry this one distinctive mutation, Z93. I find it unlikely that all these Malaysian Indians are Brahmins or North Indians, especially given that there is a non-trivial proportion of R1a1a in Tamil lower castes. So here you have a population which is probably representative of Indian Y chromosomal phylogeography before the Indo-Aryans arrived. Second, you see that M458 is well represented among Hungarians. This makes sense, insofar as this is a very common variant in Eastern Europe. Z280 also seems to be found in northern Eurasia. An interesting aspect is that in the Uzbek sample Z93 has a high frequency. The Uzbeks are an admixed population: a Turkic component overlain atop an Iranian substrate. The frequency of Z93 suggests to me that the Eastern Iranians share common ancestry with South Asians. This is not a revolutionary finding, but it does imply that Z93 may have come, in part, with the Indo-Aryans (i.e., there were two or more waves of Z93).

The authors note that M458 and Z93 carrying individuals exhibit “star like” phylogenies when STRs were analyzed. They are the top two panels. The Genghis Khan haplotype exhibits a star like phylogeny. In other words, it’s indicative of rapid expansion from a small founder group. In contrast, they argue that Z280 carrying Y chromosomes do not exhibit a star like phylogeny. The implication being that it did not undergo the same expansion. Dates of expansion (looking at the most recent common ancestor) for M458 and Z93 are pegged to 7 and 10 thousand years before the present. I don’t put much stock in these dates personally, but I thought I’d relay them.
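For readers who want a feel for what "star like" means in practice, here is a rough sketch. It is not the method used in the paper. It assumes each Y chromosome is summarized as a tuple of STR repeat counts, takes the most common value at each locus as the "modal" haplotype, and asks what fraction of chromosomes sit within one mutational step of it. The function names, the one-step threshold, and the toy haplotypes are all made up for illustration.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each STR locus."""
    return tuple(Counter(locus).most_common(1)[0][0] for locus in zip(*haplotypes))


def step_distance(a, b):
    """Sum of absolute repeat-count differences across loci."""
    return sum(abs(x - y) for x, y in zip(a, b))


def star_fraction(haplotypes, max_steps=1):
    """Fraction of haplotypes within max_steps of the modal haplotype."""
    modal = modal_haplotype(haplotypes)
    near = sum(1 for h in haplotypes if step_distance(h, modal) <= max_steps)
    return near / len(haplotypes)


# Invented toy data (four loci), NOT taken from the paper.
# A tight cluster around one founder type, as after a recent rapid expansion:
toy_star = [
    (13, 25, 16, 11), (13, 25, 16, 11), (13, 25, 16, 11),
    (13, 25, 17, 11), (14, 25, 16, 11), (13, 24, 16, 11),
]
# A scattered set with no dominant central type:
toy_diffuse = [
    (13, 25, 16, 11), (15, 23, 17, 10), (12, 26, 15, 12),
    (14, 24, 18, 11), (13, 27, 16, 9), (16, 25, 14, 11),
]

print("star-like toy set:", star_fraction(toy_star))
print("diffuse toy set:  ", star_fraction(toy_diffuse))
```

A high fraction is what you would loosely expect from a lineage that expanded quickly from a small founder group; a low one suggests an older or more structured pool. Real analyses use proper network methods and many more loci, so treat this only as intuition.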

What can we say from this? If these results hold what they tell us is that R1a1a is a very lucky haplogroup, and its current range is a function of multiple expansions from a common and diverse R1a1a pool, probably in Central Eurasia. The presence of Z93 in Uzbeks, and Mongols, suggests to me that this variant was and is present in Iranians. Therefore, I don’t think that Z93 is indigenous to South Asia, but is intrusive. I believe it arrived with the “Ancestral North Indians.”


Comments (23)

  1. Davidski

    Gah..this paper is already out of date.

    Z280 and Z93 overlap in Europe, both in Western and Eastern Europe. On the other hand, South Asian R1a is all L342, which is under Z93.

    Here’s the latest tree…


  2. Something I am noticing as I read you and Davidski’s complaints is that all these haplogroups were already listed by ISOGG and that the two European clades are part of a larger haplogroup: R1a1a1b1a (S198/Z282), while the Asian clade is a distinct “brother” lineage to this one: R1a1a1b2 (S202/Z93). So with the data of this paper alone, R1a1 or at least its main subclade R1a1a1b1 (S339/Z283), is equally split between South/Central Asia and Europe.

    It’d be interesting to get some info on upstream “asterisk” (and the like) frequencies. From memory there was high concentration of such top level “asterisk” individuals in Pakistan and NW India but my knowledge of R1a is rusty, stuck some years ago, so I’d appreciate well informed data on this key aspect – even if just to refresh my memory.

    Besides that, Malaysian Indians are (per Wikipedia) 85% Tamils, what makes them not really representative of South Asian R1a genetics overall. Particular interest should be in documenting Pakistan and North India, which previous papers and miscellaneous data suggest as the core of R1a basal diversity (top level “asterisk” paragroups and such only showed up over there or almost).

    I specially agree with you, Razib, in that Indo-Aryan conquest does not look like a probable cause of one of the most common haplogroups of South Asia, much less considering that IA expansion took place in the Bronze to Iron Age, when South Asia already hosted great densities of farmer populations.

  3. Davidski

    ^ All R1a from South Asia (India + Pakistan) tested to date for a wide variety of SNPs is L342+, which is just below Z93. European Z93 are overwhelmingly L342-, plus there are lots of other SNPs in Europe that aren’t found in South Asia.

    How’s that Maju? Do you find that informative enough?

  4. Check Underhill 2010 (supp. tables):

    R1a* is found only in West Asia (Iran, Turkey, Gulf emirates)

    R1a1* is found also in Iran and then towards Europe: Caucasus, Greece and Scandinavia.

    R1a1a* is widespread (India, West and Central Asia, Europe) but now that I think of it it may not be R1a1a(xR1a1a1) but some other larger category, what is not helpful.

    “Do you find that informative enough?”

    You cite no source. You’re asking me to believe in your word blindly, what I’m not ready to do.

  5. Karl Zimmerman

    As someone with a R1a1a patrilineage myself, I hope 23andme updates to include these sub-haplogroups soon. My great-grandfather was (presumably) 100% German (from near Berlin, so possibly a Germanized Slav), but he was unusually dark for a German, leading many to joke he might have actually been a Roma or Jewish behind his back.

  6. Take away points for me:

    * R1a1a was present at significant frequencies in pre-Indo-Aryan India, so Y-DNA estimates of patriline Indo-Aryan admixture in India are probably overstated, particularly in Southern India. This tends to show that the Indo-Aryan superstrate was thinner than previously believed, particularly in Southern India.

    * More generally, R1a1a-Z93 is probably a pre-Indo-European genetic legacy wherever it is found. Until this analysis, any population that was overwhelmingly R1a1a would look like a strong candidate for near total replacement by Indo-Europeans. But, if most of the R1a1a in one of those populations was R1a1a-Z93, then the language shift of that region to Indo-European languages looks more like superstrate driven language shift.

    * Assuming that Razib is right in his suspicion that R1a1a-Z93 is a legacy of a pre-Indo-Aryan component of Ancestral North Indian populations intruding into South India, the question is who was involved and when. The obvious candidate here is the Harappans, and the obvious time would be around the time of the South Indian Neolithic revolution, about a thousand years pre-Indo-Aryan. But, this certainly isn’t the only possible scenario.

    Wouldn’t it be great to be able to retrieve some ancient DNA from a pre-Indo-Aryan Harappan cemetery (e.g. in the desert that arose when the Sarasvati River was diverted and ceased to water the land) to answer these questions?

    * Y-DNA estimates of patriline European admixture in European Roma are probably overstated. Hence, the Roma have probably been quite a bit more endogamous than previously believed. This also favors a “fast” migration from India to Europe with few intermediate stops where local R1a1a would have been acquired en route, as opposed to a gradual migration scenario.

    * This pulls the “center of gravity” of R1a1a haplogroups a bit closer to India.

    * This makes me very interested in the subtype assignment of both ancient Tocharian Y-DNA samples (previously found to be R1a, at least) and modern Y-DNA from the Tarim Basin where Tocharians admixed with East Eurasians. Presumably, the Tocharian sub-haplogroup of R1a is among the sub-haplogroups found in the current admixed population even if other new sub-haplogroups have been introduced since then.

    The leading view is that the Tocharians were not Indo-Iranians, and hence should have different subhaplogroup assignments than Uzbeks and Mongolians with R1a1a. Examination of this could either confirm or contradict this assumption in a way largely independent of the other evidence. It would also shed new light on the specific subgroup of R1a1a that may have been associated with the Proto-Indo-Europeans, since there is no good reason to suspect significant admixture between people with Proto-Indo-European ancestry and people without it among the Tocharians. They settled what was pretty much virgin territory when they arrived.

  7. Eurologist

    Is this a reasonable model: R1a1a1 was present in some (W to N ?) Pontic region, spread with agriculture and perhaps with early steppe technologies, and then later spread again with IE and metallurgy intruding into both agricultural and steppe cultures? Finally, certain, very specific subgroups can be associated with Slavic expansion and with Roma on top of that.

    As for IE in most of Europe, I tend to think that it spread from the much more numerous agriculturalists of the W Pontic region rather than originating from steppe folks. There are known, peaceful interactions between these two before climate deteriorated and more people adopted a pastoralist lifestyle in the region, and before “Kurgan expansion.”

  8. Justin Giancola

    Ok Maju –

    no disrespect, but I see you doing this all the time – and you seem to try very hard to master english language and grammar – but in english we say especially…just so you know. Maybe this is some British thing I am not aware of, but I wanted to look out for you as you do a worthy job with english.

  9. Thanks for your interest in my grammar, Justin, but Wiktionary has both forms: especially and specially.

    Anyhow I’d suggest that you get used to have English corrupted by non-native users: it’s the price to pay for being the lingua franca – it happened to Latin as well.

    For me “especial” and “special” are just alternative spellings with nuances proper of a PhD linguist, sincerely. But that’s only thanks to you, previously I thought “especial” was a Spanish-influenced typo in fact.

  10. Davidski

    Hey Maju, you crazy Mediterranean farmer, here’s a map of old R1a clades with some nice sources.


    This is how it went…West Asia > Europe > Central Asia and South Siberia > South Asia

    The spread from Europe was of R-M417, with the proto-Indo-Europeans. Believe it.

  11. Crazy Atlantic farmer (or fisherman or blacksmith…) if anything, mind you. The only Mediterranean ancestry I have is from Italy.

    Your graph, Polako (Davidski), is beautiful but making any claims based only on FTDNA data (whose catering is very limited to certain regions, because it’s a commercial service), as I feel it is the case for R1a1a*, is not honest (or, if it is honest, it is wrong anyhow).

    Anyhow I find impossible to conceive any archaeologically solid prehistorical frame for an expansion from Western Europe (where according to your map is the only place where R1a1a* exists – ???) to the rest of the world: Eastern Europe, Central Asia and South Asia, where the haplogroups considered in this paper exist.

    So there must be more R1a1a* in West, Central or South Asia. Where exactly? I bet for Iran, because it’s showing all the phylogenetic levels but we’ll see in due time.

  12. Sandgroper

    You explained that very clearly and unambiguously to my native (international) English speaking eyes, Maju, which is as good as it ever needs to be, and a good deal better than some of the loose cannons around.

    When I try to get Davidski’s mapski, I just get ‘login blocked’ for reasons that are a mystery to me. I must be too stupid.

  13. Davidski

    Sandgroper, you can find all the maps here…


    Right click and open in a new window if simply clicking doesn’t work.

  14. Justin Giancola

    @9 – Cool man 🙂

    “previously I thought “especial” was a Spanish-influenced typo in fact”

    I was suspicious of just that, ha!

  15. Tibor Fehér comments on the paper (which was submitted a year ago) and what was left out here:

  16. #15:

    “… the reviewers were so narrow-minded that we had finally to drop all FTDNA samples plus the pedigree calcs”.

    Thank the gods for the reviewers! I’m a new convert to peer-review after reading that.

    Besides that, the only commenter with common sense in that thread is Vadu. And he did not say much.

    The notion that “Asia lacks R1a*/M420, R1a1*/M459” is clearly wrong, as even David’s map shows. In fact in targeted sampling (Underhill 2010), R1a* was only found in Asia.

    While the FTDNA map shows R1a* in Italy, Provence, Rhineland and the Atlantic Islands, the proportion must be extremely low because Underhill sampled at least some of those areas and got nothing:

    region    n    R1a*  R1a1*
    England   310  nd    0
    Germany   95   nd    0

    Only with the huge samples for certain regions that commercial companies like FTDNA manage could some individuals be detected. Comparing that with:

    Iran      150  2     1

    … for example, is comparing apples and oranges, or more like comparing Mount Everest with a pebble. See supplementary tables at Underhill 2010 for more comparable figures.
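The sample-size point in the comment above can be made concrete with a little arithmetic. The sketch below is only an illustration: it uses the R1a1* counts quoted in the comment (England 0 of 310, Germany 0 of 95, Iran 1 of 150) and, for the regions with zero carriers, computes the largest true frequency that would still produce zero hits at least 5% of the time, assuming simple random sampling. The function names and the 95% level are choices made for this example, not anything taken from Underhill 2010.

```python
def zero_hit_upper_bound(n, confidence=0.95):
    """Largest true frequency p for which observing 0 carriers in n samples
    still has probability >= 1 - confidence, i.e. (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# (sample size, R1a1* carriers) as quoted in the comment above
samples = {"England": (310, 0), "Germany": (95, 0), "Iran": (150, 1)}


def summarize(samples):
    """Observed frequency per region, plus an upper bound where nothing was found."""
    rows = {}
    for region, (n, hits) in samples.items():
        observed = hits / n
        bound = zero_hit_upper_bound(n) if hits == 0 else None
        rows[region] = (observed, bound)
    return rows


for region, (observed, bound) in summarize(samples).items():
    if bound is None:
        print(f"{region}: observed {observed:.2%}")
    else:
        print(f"{region}: observed {observed:.2%}, 95% upper bound {bound:.2%}")
```

The smaller the sample, the looser the bound a zero gives you, which is why a lineage that never shows up in a targeted sample of a hundred or so can still surface in a commercial database with a far larger regional sample.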

  17. Davidski

    Every South Asian R1a tested to date, from all sources, including all the 1000 Genome samples, has turned out not only Z93+, but also L342+, which is under Z93.

    Pretty awesome stuff, eh Maju?

  18. What exactly is “every South Asian R1a tested to date”? Because so far I only know of the Malaysian Tamils used in this paper, which is a poor choice of reference population by all accounts.

    I’m not saying that you are wrong, David, I’m saying that you are systematically failing to demonstrate that you are right. Also your insistence without sufficient or actually any solid evidence on certain hypothetical solution to the questions raised by the available data makes me doubt of your capacity to make a rational dispassionate analysis of the matter. And it’s not just you, as we can see in that forum thread someone posted above: wishful thinking should not cloud reason… yet it does.

    At the moment, the greatest basal diversity based on peer-reviewed studies (only) seems to be in Iran. I won’t put my hand on fire for that origin because we need more data first of all but that is what the available data seems to be suggesting.

    This could still allow for a back-migration into South Asia (I’m still assuming P originated in South Asia, because of P* and R2) but this migration could have different origins and time-frames (Neolithic flow from West Asia?) What does not seem probable at all is that R1a1a spread from the Franco-Cantabrian region into Eastern Europe and then to India, as your FTDNA-based map could suggest (wrongly).

  19. Davidski

    Every South Asian R1a tested to date = 1000 Genomes samples, plus private samples from Pakistan and India tested at FTDNA.

    Anyway, the map I posted most certainly doesn’t show a spread of R1a from the Franco-Cantabrian refuge into Eastern Europe and India.

    What it strongly suggests is that R1a entered Western and Central Europe from West Asia during the Neolithic. It then diffused into Eastern Europe with a subset of Central European farmers. That’s why Western Europe now has such high R1a SNP diversity, with old and unique lineages not present anywhere else. And these even pop up in places where people don’t test themselves very much, like Switzerland and Belgium.

    I’m guessing that the Neolithic R1a groups that entered Eastern Europe from the west then mixed with local Mesolithic survivors to form the various kurgan groups, who then set off deep into Asia via the steppes.

    I can’t see any other version of events being plausible based on all the data I’ve seen.

  20. Sandgroper

    #13 – Thank you.

  21. Did FTDNA test Iranians and other West Asians from the 1000 genomes samples? How many R1a individuals are in those samples anyhow? I ask because I really don’t feel you are providing good information that backs your claims but rather dropping bomblets here and there which may sound good but are not clear contrastable data.

    “Anyway, the map I posted most certainly doesn’t show a spread of R1a from the Franco-Cantabrian refuge into Eastern Europe and India”.

    The map you posted only shows (in red) the R1a1a* segment in SW Europe, from Britain to Yugoslavia going through France, Rhineland, Cantabria and North Italy. It shows no R1a1a* in Eastern Europe whatsoever, what is difficult to conciliate with a Kurgan origin of the downstream nodes R1a1a1, etc. – or with any other origin East of a line that goes from, roughly, Rotterdam to Skopje.

    “That’s why Western Europe now has such high R1a SNP diversity, with old and unique lineages not present anywhere else”.

    That you say (in need of confirmation, in my opinion) would indicate an origin of R1a1a in Western Europe (or Yugoslavia/Albania), regardless of what happened at upstream phylogenetic levels (R1a, R1a1) or downstream ones (R1a1a1, etc.) You should be more strict with phylogenetics: the R1a node and the R1a1a node are NOT the same thing, just like R and R1a are not the same thing either, etc.

    “I’m guessing that the Neolithic R1a groups that entered Eastern Europe from the west”…

    Impossible in my understanding. Certainly not in the Neolithic.

    I think that we need more data, specially from Asia (West, Central and South Asia) before we can jump to any strong conclusions. What is clear on Underhill 2010 is that the R1a ultimate ancestors should have lived in West Asia (Zagros area probably) and, on this paper, that R1a1a1b1 (S339/Z283), to which most R1a people belongs, has two branches one South-Central Asia and the other European-Central Asian.

    All what happened in between remains uncertain so far.

  22. pconroy

    There is a very useful Y-DNA SNP visualization tool – which produces world maps with tags:


    So you can generate Maps of SNP’s:

    and hundreds of others

  23. Sandgroper


Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com

