Genetic variation in the Caucasus

By Razib Khan | May 15, 2011 3:05 pm

The Pith: There is a very tight correlation between language and genes in the Caucasus region.

If the Soviet Union was “the Prisonhouse of Nations,” then the Caucasus region must be the refuge of languages. Not only is this region linguistically diverse on a fine-grained scale, but it is also home to several broader language families which are found nowhere else in the world. The widespread Indo-European languages are represented by Armenians, Greeks, and Iranians. The similarly expansive Altaic languages are represented by the Turkic dialects. But in addition to these well-known groups which span Eurasia there are the Northwest Caucasian, Northeast Caucasian, and Kartvelian families. Despite their distinctiveness, these have only a local distribution.

On the one hand we probably shouldn’t be that surprised by the prominence of small and diverse language families in this rugged region between Russia and the Near East. Mountains often serve as the last refuges of peoples and cultures being submerged elsewhere. For example, in the mountains of northern Pakistan there is Burushaski, the language of the Burusho, a linguistic isolate with no known affinity to other languages. It likely once had relatives, but they were assimilated, leaving only this last representative isolated in its alpine fastness. The once-extensive Sogdian dialects (Sogdian was once the lingua franca between Iran and China) are now represented only by Yaghnobi, which persists in an isolated river valley in Tajikistan. How the mighty have fallen! But the mountains are always the last fortresses to succumb.

But the Caucasus are peculiar for another reason: they’re so close to the “action” of history. In fact, history as we know it started relatively near the Caucasus, to the south on the Mesopotamian plain, ~5,000 years ago. Therefore we have shadows and glimmers of what occurred on the southern Caucasian fringe early on, such as the rise and fall of the kingdom of Urartu ~3,000 years ago. The ancient ancestors of the Georgians even show up in Greek myth, as the people of Medea’s Colchis. And this was a busy part of the world. Hittites, Greeks, Romans, and Arabs came and went. The rise of the Turkic peoples resulted in the marginalization of many of their predecessors. Some scholars even argue that the Indo-European and Semitic language families issue from the northern and southern fringes of the Fertile Crescent, respectively. And it isn’t as if history has skirted the Caucasians. The Georgians bore the brunt of the Mongol armies, while the Circassians have famously been present across the greater Middle East as soldiers and slaves.

Ultimately it seems that geography can explain much of the sui generis character of the Caucasus relative to adjacent regions. The homogenizing impact of large political units such as Byzantium, Persia, the great Arab Caliphates, Russia, and the Ottomans was dampened by the fact that the Caucasus was often administered indirectly. The cost of conquering valley after valley was presumably prohibitive, and the natives could always retreat to the mountains (as the Chechens did most recently in the 1990s).

A new paper in Molecular Biology and Evolution illuminates the genetic relationship of Caucasian peoples, both within the region, and to groups outside of it. Parallel Evolution of Genes and Languages in the Caucasus Region:

We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within this geographically isolated populations, probably due to its mountainous terrain.

In some ways this is a paper which would have been more in keeping with the early 2000s. It focuses on Y chromosomal markers, and so on the direct male lineage. This is in contrast to the sort of analyses which focus on hundreds of thousands of autosomal markers across the genome. But there are some benefits to focusing on Y chromosomal lineages, which are highlighted within this paper. First, one can construct very precise trees based on the mutational distance between individuals. Haplogroups can be subdivided cleanly into haplotypes with treelike phylogenetic relationships by comparing mutational differences. Second, one can use molecular clock methods to peg the timing of the separation between two clades.
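As a rough illustration of the molecular-clock idea, here is a minimal Python sketch of one textbook way to date a clade from Y-STR haplotypes: take the average squared difference in repeat counts from an assumed ancestral haplotype (here, the modal one) and divide by a per-locus mutation rate. It assumes a symmetric single-step mutation model and is not necessarily the method used in the paper; the function names, haplotypes, and mutation rate are all hypothetical placeholders.

```python
from statistics import fmean, mode


def modal_haplotype(haplotypes):
    """Most common repeat count at each STR locus (a common proxy for the founder)."""
    return [mode(locus) for locus in zip(*haplotypes)]


def tmrca_generations(haplotypes, mu_per_locus, ancestral=None):
    """Estimate generations since the founder of a Y-STR clade (hypothetical helper).

    Under a symmetric single-step mutation model, the expected squared
    difference in repeat count between a descendant and the founder grows
    by mu per generation at each locus, so time is roughly ASD / mu.
    """
    if ancestral is None:
        ancestral = modal_haplotype(haplotypes)
    # Average squared deviation from the ancestral haplotype, over loci and men
    asd = fmean(
        fmean((h - a) ** 2 for h, a in zip(hap, ancestral))
        for hap in haplotypes
    )
    return asd / mu_per_locus


# Hypothetical four-locus haplotypes and a hypothetical mutation rate
haps = [[13, 24, 14, 11], [13, 23, 14, 11], [14, 24, 14, 11], [13, 24, 15, 11]]
print(tmrca_generations(haps, mu_per_locus=0.0025))  # about 75 generations
```

The estimate scales inversely with the assumed mutation rate, so any disagreement about the correct rate translates directly into proportionally different dates; converting generations to years also requires an assumed generation length.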

I don’t have a good natural grasp of the ethnography of the region, nor am I very well versed in the phylogeography of Y chromosomal lineages (at least in relation to some of the readers of this weblog), so I won’t go into specifics much (see Dienekes Pontikos’ comments). The main step forward here is the enormous sample size and fine-grained coverage of the ethnic groups across the Caucasus. In a region of such linguistic diversity and geographic fragmentation this is of the essence. They found a 0.64 correlation between variance in genes and language, and a 0.60 correlation between variance in genes and geography. Because geography and language are so tightly linked in the Caucasus, they couldn’t obtain statistically significant results when controlling for one variable, but language seems to be a bigger predictor than geography.
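Why controlling for one variable washes out significance can be seen from the first-order partial correlation formula. The Python sketch below plugs in the two correlations reported above (0.64 for genes and language, 0.60 for genes and geography); the language-geography correlation is not given here, so the 0.90 used is a purely hypothetical stand-in for “tightly linked.” Studies of this kind usually compare distance matrices with (partial) Mantel tests that assess significance by permutation; this sketch shows only the correlation arithmetic, not the paper’s actual procedure.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z fixed."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


R_GENES_LANG = 0.64  # reported in the post
R_GENES_GEO = 0.60   # reported in the post
R_LANG_GEO = 0.90    # HYPOTHETICAL: not reported in the post

genes_lang_given_geo = partial_corr(R_GENES_LANG, R_GENES_GEO, R_LANG_GEO)
genes_geo_given_lang = partial_corr(R_GENES_GEO, R_GENES_LANG, R_LANG_GEO)
print(round(genes_lang_given_geo, 3), round(genes_geo_given_lang, 3))
```

With these assumed numbers the genes-language association shrinks to roughly 0.29 and the genes-geography one to roughly 0.07 once the other variable is held fixed. Most of the raw signal is shared, which makes significance hard to reach, yet language retains more of it than geography.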

The following two maps show the distribution of haplogroups across Caucasian populations, as well as how they relate to other groups. A general affinity with Near Eastern groups is evident simply through inspection:

In classic fashion the authors found a very tight correlation between the phylogenetic trees generated from Y chromosomes and linguistics (the Dargins being the exception):

Many researchers, such as Marcus Feldman, assume that this sort of correspondence is a natural outgrowth of the fact that gene flow tends to be demarcated by dialect continuums. By this I mean that, all things being equal, intermarriage between two groups is going to be favored if their speech is mutually comprehensible. In the pre-modern era, before “standard” languages were codified from on high, this means that genes would flow from tribe to tribe across subtle differences of dialect which nevertheless remained intelligible. That is, until you encounter a language family barrier, where despite borrowings across the chasm intelligibility is simply not possible. In the Balkans the Slavic languages Bulgarian and Macedonian reputedly form a dialect continuum. But the barrier between these two languages and Greek is not just one of subtle shading, but of deep differences. This seems to be at work in the Caucasus, where the chasm is even greater in linguistic terms (Greek and the Slavic languages are both Indo-European, though I suspect that at that level of distance the barrier is not much different from one between Greek and Georgian, or between Slavic and Azeri).
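The dialect-continuum argument can be made concrete with a toy stepping-stone model. In this Python sketch (all parameters hypothetical; no drift or selection), ten villages sit in a line and each generation exchange a fraction of their gene pool with each neighbour, except across one language-family boundary where exchange is much rarer. Starting from two differentiated blocks, gene flow smooths differences within each side but leaves a sharp step at the boundary. It illustrates the verbal argument only and is not a model fitted by the paper or by Feldman.

```python
def stepping_stone(freqs, m, barrier_at, m_barrier, generations):
    """Deterministic 1-D stepping-stone migration with one low-exchange boundary.

    freqs: allele frequency in each village, arranged along a line.
    m: fraction exchanged with each neighbour per generation.
    barrier_at: boundary lies between village barrier_at and barrier_at + 1.
    m_barrier: (reduced) exchange rate across that boundary.
    """
    p = list(freqs)
    for _ in range(generations):
        new = p[:]
        for i in range(len(p) - 1):
            rate = m_barrier if i == barrier_at else m
            flow = rate * (p[i + 1] - p[i])  # net exchange between neighbours
            new[i] += flow
            new[i + 1] -= flow
        p = new
    return p


def adjacent_gaps(p):
    """Absolute frequency difference between each pair of neighbouring villages."""
    return [abs(p[i + 1] - p[i]) for i in range(len(p) - 1)]


start = [0.9] * 5 + [0.1] * 5  # hypothetical frequencies in two language blocks
with_barrier = stepping_stone(start, m=0.1, barrier_at=4, m_barrier=0.005, generations=50)
no_barrier = stepping_stone(start, m=0.1, barrier_at=4, m_barrier=0.1, generations=50)
print([round(x, 2) for x in with_barrier])
print([round(x, 2) for x in no_barrier])
```

With the barrier, the largest gap between neighbouring villages stays at the language boundary; without it, the same initial step relaxes into a smooth cline.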

There are lots of details in the paper, ranging from a synthesis with archaeological evidence for the development of Caucasian cultural complexes derived from Near Eastern sources, to the timing of the separation between the major language families or sub-families. To be frank, the weeds here are beyond me. So what can we generalize from this specific case?

At some point in the near future we’ll have thick and robust data sets like this for many regions of the world, so this may be a preview of what is to come. This study focuses on Y chromosomal lineages, and we must remember that male-mediated ancestry can exhibit consistent differences from female-mediated ancestry. I am no longer very confident of the finding, from comparisons of mtDNA and Y chromosomal variation, that the majority of human gene flow has been female-mediated because of patrilocality. But this may be at work in some areas. In general the scholars, such as Bryan Sykes, who have looked at the phylogeography of uniparental lineages tend to notice a difference between Y chromosomal and mtDNA patterns, whereby the former are much more clearly partitioned between groups (e.g., along the Wales-England border) than the latter. The natural inference is that this is a hallmark of “man the warrior,” as male lineages eliminate and marginalize each other in the “great game” of genetic competition. Over the short term in the pre-modern world there is a zero-sum aspect to this: populations are relatively constant, so for Genghis Khan to be fruitful other men must be pushed aside. This does not necessarily entail slaughter. Bonded or landless men may not reproduce at all, or their reproduction may be sharply diminished. A few generations of differential fertility can quickly lead to major differences in the distribution of ancestry.

Assume for example that at generation 1 population A outnumbers population B by a factor of 20. If A has a growth factor of 0.95 per generation and B one of 1.20 per generation, how many generations would it take for B to overtake A in total numbers? 13 generations. We have examples from the New World where Iberian Y chromosomal lineages have totally replaced Amerindian ones among the racially mixed population, while Amerindian mtDNA has been preserved. In areas with generations of European male migration the total genome content has become overwhelmingly European, but the mtDNA still shows the signature of the founding Amerindian population.
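The 13-generation figure can be checked with a short Python sketch that simply compounds each population’s per-generation growth factor until B exceeds A. Equivalently, B overtakes A once (1.20/0.95)^n > 20, i.e. n > ln 20 / ln(1.20/0.95) ≈ 12.8, so 13 generations. The function name is illustrative only.

```python
def generations_to_overtake(ratio, growth_a, growth_b):
    """Generations until B (starting at 1) outnumbers A (starting at `ratio`).

    growth_a and growth_b are per-generation multiplication factors.
    """
    if ratio >= 1 and growth_b <= growth_a:
        raise ValueError("B never overtakes A unless it grows faster")
    a, b, n = float(ratio), 1.0, 0
    while b <= a:
        a *= growth_a
        b *= growth_b
        n += 1
    return n


print(generations_to_overtake(20, 0.95, 1.20))  # 13
```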

I am willing to bet that for the Caucasus we would see much less distinction in the mtDNA if the same study were replicated with the same individuals. From my perspective, the major explanation for why this would not be so would be if the original male Near Eastern groups arrived and intermarried with sharply distinctive local female lineages, and these distinctions have been preserved over time through endogamy, whether culturally conditioned (language barriers) or geographically necessitated.
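The Y-versus-mtDNA contrast invoked above for the New World can be sketched with a simple deterministic model of continuous sex-biased admixture. In this hypothetical Python example, each generation a fixed fraction of fathers (`m_male`) and of mothers (`m_female`) come from an outside source; drift, selection, and differential fertility are ignored, and the rates are illustrative rather than estimates for the New World or the Caucasus.

```python
def sex_biased_admixture(m_male, m_female, generations):
    """Fraction of migrant-derived ancestry after repeated sex-biased admixture.

    Each generation a fraction m_male of fathers and m_female of mothers are
    migrants (ancestry 1); the rest come from the admixed population itself.
    Returns (Y fraction, mtDNA fraction, mean autosomal fraction).
    """
    y = mt = auto = 0.0
    for _ in range(generations):
        y = m_male + (1 - m_male) * y          # Y passes father to son
        mt = m_female + (1 - m_female) * mt    # mtDNA passes mother to child
        father = m_male + (1 - m_male) * auto
        mother = m_female + (1 - m_female) * auto
        auto = (father + mother) / 2           # autosomes: half from each parent
    return y, mt, auto


y, mt, auto = sex_biased_admixture(m_male=0.2, m_female=0.0, generations=10)
print(f"Y: {y:.2f}  mtDNA: {mt:.2f}  autosomes: {auto:.2f}")
```

With purely male-mediated migration at these assumed rates, after ten generations roughly 89% of Y chromosomes and 65% of autosomal ancestry derive from the migrants, while the mtDNA remains entirely local; with more generations the autosomal share keeps climbing toward the Y share.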

Finally, on the broadest canvas these sorts of findings should make us question the contention that nationality is a totally modern invention. These language and genetic clusters clearly denote populations with deep differences which have emerged and persisted over thousands of years. This has resulted in a “Balkan powder keg” in our time (e.g., the Russian government backing the Ossetes against the Chechens, and so on). To some extent contemporary conflicts are rooted in the exigencies of the present. But they often also draw on preexisting differences and allegiances with roots in deep time. Dismissing these differences as purely socially constructed epiphenomena is, I think, the wrong way to approach the question.

Citation: Oleg Balanovsky, Khadizhat Dibirova, Anna Dybo, Oleg Mudrak, Svetlana Frolova, Elvira Pocheshkhova, Marc Haber, Daniel Platt, Theodore Schurr, Wolfgang Haak, Marina Kuznetsova, Magomed Radzhabov, Olga Balaganskaya, Alexey Romanov, Tatiana Zakharova, David F. Soria Hernanz, Pierre Zalloua, Sergey Koshel, Merritt Ruhlen, Colin Renfrew, R. Spencer Wells, Chris Tyler-Smith, Elena Balanovska, & The Genographic Consortium (2011). Parallel Evolution of Genes and Languages in the Caucasus Region. Mol Biol Evol. DOI: 10.1093/molbev/msr126

CATEGORIZED UNDER: Anthropology, Genetics, Genomics

Comments (17)

  1. Onur

    This study is essentially a study of the highland Caucasus, the most isolated areas of the Caucasus, so it is not so surprising to find a clear correlation between languages and genetics in this study, especially as the Northwest and Northeast Caucasian language families are probably very old in the Caucasus (maybe so old as to be directly linked to the first Neolithic colonizations in the Caucasus).

    But the lowland Caucasus and Transcaucasus populations, which are less isolated than the highland Caucasus populations, aren’t included in this study. Previous genetic studies suggested that the Transcaucasus and lowland Caucasus populations have genetic structures correlated with geography rather than language, and that the populations within each of these regions are genetically very close to each other despite speaking languages from very different families and differing in religion.

  2. John Emerson

    Wixman’s book talks about language differences and public policy, etc. in the Caucasus (Azerbaijan, though). One thing that he reports is that marriages were regarded as mixed only if the religions were different, not according to language group. I don’t know the significance of that, but it would be worth looking at.

  3. Onur

    Over the short term in the pre-modern world there is a zero sum aspect to this, populations are relatively constant, and so for Genghis Khan to be fruitful other men must be pushed aside. This does not necessarily entail slaughter. Bonded or landless men may not reproduce their genes, or, their reproduction may be sharply diminished. A few generations of differential fertility can quickly lead to major differences in the distribution of ancestry.

    The dating of the spread of the so-called “Genghis” haplotype to the Mongolian conquests is incompatible with history. The Mongolian conquests did not affect the conquered territories enough to have such a relatively big genetic impact; in fact, the conquering Mongolians were so few in number that they couldn’t even spread their language and religion, not to mention their culture, in any of the conquered territories beyond Mongolia (including Chinese Mongolia and Russian Mongolia), and they quickly disappeared in almost all of the conquered territories. The Turkic expansion, which began roughly 1000 years before the Mongolian conquests, and earlier Altaic expansions explain the spread of that haplotype much better than the Mongolian conquests.

  4. * The link in the linguistic tree illustration of Indo-European as being closer to North Caucasian than to Kartvelian or other languages of the region is, so far as I know, novel. Of course, the Ossets could simply represent a genetically Northwest Caucasian population on the boundary between Northwest Caucasian languages and Indo-European languages that experienced language shift at a possibly recent point in time.

    * The language/genetic concordance does point to a common North Caucasian proto-language and genetic population at some remote time, a common source that is not infrequently questioned.

    * Chechens are Y-DNA hg J2 heavy with a minor J1 component; Dagestanis, in contrast, are Y-DNA hg J1 heavy with very little hg J2. Generally, one tends to associate J2 more with Anatolia and Iran, and J1 more with Arabia. The J1 branch is very basal, the J2 branch is not particularly so. Both, however, are within the NE Caucasian branch. Could this indicate that the Dagestanis are an older layer than the Chechens?

    * The European distributions of the two G2a hgs and J1* seem very similar to each other. So do the European distributions of J2a4b and I2a. This is suggestive of the possibility that these hgs dispersed as parts of two distinct migrations.

    * The close link between genes and language in many of the cases makes mutation rates very attractive as a tool to date the language groups, particularly when accompanied by linguistic phylogenetic dating, although Dienekes has made a quite convincing case that the conventional Y-DNA mutation rate dates tend to be about a factor of three too high.

    The urge to get dates from genes and languages is particularly great here because the languages in question almost surely have prehistoric roots, and the “weeds” of archaeological, linguistic and other data in close proximity make it hard to calibrate this date as easily as we can in some other instances (e.g. Japan’s linguistic diversification, which happened almost entirely in the historic era).

  5. , conquering Mongolians were so few in number that they couldn’t even spread their language and religion, not to mention culture

    stupid. the mongols were mostly shamanists at this time. what kind of person thinks that shamanism is going to overwhelm institutional religions? like how the turks imposed it on the muslims and christians they ended up ruling? stop saying stupid stuff if you are going to make a lot of assertions which can be disputed (e.g., hazaras just fall out of your mental map). make a proactive argument that i can examine with some detailed moving parts.

  6. Onur

    stupid. the mongols were mostly shamanists at this time. what kind of person thinks that shamanism is going to overwhelm institutional religions? like how the turks imposed it on the muslims and christians they ended up ruling? stop saying stupid stuff if you are going to make a lot of assertions which can be disputed.

    Not at all. Anglo-Saxons spread their language and pagan religion to the Christian British natives, and Slavs spread their language and pagan religion to the Christian Balkan natives. Besides, Central Asian Turkic people were shamanist before converting to Islam, preserving many of their shamanist beliefs and practices even after converting, and there were still large numbers of shamanist Turkic people in Central Asia during the Mongolian conquests.

    e.g., hazaras just fall out of your mental map

    I didn’t mention them because they are a very tiny (compared to the total territory conquered by Mongols) remnant population.

  7. #6, make a serious argument now. you didn’t say anything new. if you think i don’t know your examples you’re daft. but your argument is so weak as to make me laugh. i am not asking politely. don’t assert. i’m not impressed by your knowledge, i don’t see it surpassing mine. you know what i expect, now step up or shut up.

    lay out your specific argument in detail about the influence of turks. you rarely say anything i don’t know, so i want to evaluate your argument in detail.

  8. Onur

    Khan, religion was just a small part of my argument, but you somehow fixated on it. I mentioned religion just as an auxiliary element to language. If Mongolians had spread their language and not their religion, not much would change in my evaluation of their genetic impact. But the examples I gave in #6 are valid for the issue we are discussing.

    hey, do you know how to read english? you made a lot of assertions in #1 like you were big shit. put up. don’t fixate on follow up comments. LAY OUT YOUR ARGUMENT IN DETAIL, or shut up. this is not a domain that i’m ignorant in, so let me evaluate your reasoning. what you’ve laid out is vague and impressive to people who might not know of the long term persistence of shamanism in the chagatai khanate as a comparison point to the mission of cyril and methodius or the roman mission to the saxons in the early 7th century. you’re talking to someone capable of evaluating a thick comparative argument, and who has for example read books on the christianization of the saxons, the rise of the slavs in the balkans, and the history of the different mongol khanates. if you have a thick contingent argument, let’s hear it.

    make your argument now. if you don’t have the time or energy to do so, that’s fine. but don’t make comments like #1 then. that’s not a request.

  10. Onur

    You are free to believe what you want, but I am not the only one linking the spread of the haplotype in question to the Turkic expansion and the earlier Altaic expansions; there is a lot of scientific material on this issue available on the Internet.

    you made a lot of assertions in #1

    but don’t make comments like #1 then

    You must have meant #3.

  11. “what kind of person thinks that shamanism is going to overwhelm institutional religions?”

    The strongest examples would be the staying power of Chinese folk religion and Shinto in the face of institutional religions like Buddhism. Shamanist religious schemes like these didn’t overwhelm institutional religions in these cases, but did manage to find a cultural niche in which they could survive vibrantly. Shamanism turns out to be capable of living symbiotically with institutional religions relatively well.

    It wouldn’t be so hard to imagine Mongolian conquerors more intent on molding their subjects in their own image formalizing their shamanistic religious practices into something akin to Shinto. It would need some kind of institutionalization in a class of shamans and less personalized forms of ritual, but it certainly could have evolved that way.

    Certainly it is remarkable how seemingly little of a cultural legacy the Mongolians left despite their vast empire. In what proportion of their former empire have they left a cultural trace that could be discovered if you didn’t have historical evidence that they were there? Their genetic trace seems to have been greater than their cultural legacy.

    Similarly, the survival of the Hindu religion in India, which is the most direct healthy descendant of the Indo-European religious system (indeed, perhaps the last continuously living descendant), has overcome temporary advances by other institutionalized religions and brought the spread of others to a standstill. Of course, Hindus are not shamanists. But the Hindu example does argue against the theory that, in general, some kinds of religions have inherently more staying power in a culture than others. There is no natural hierarchy of religions on a scale of cultural fitness. An alternative theory is that religion is largely part of the baggage that comes with a larger cultural package that causes one set of cultural imperialists to prevail over another culture, and which isn’t necessarily all that intrinsically related in detail to the success of the cultural imperialists.

  12. #11, the anglo-saxon and balkan slav examples were better, because the institutional religions actually seem to have withered and collapsed. in contrast in east asia, and frankly much of the world, shamanism is subordinated and integrated into institutional religion. you see this in local cults in hinduism, bon-po with tibetan buddhism, the persistence of female shamans in korea, etc. shamanism formalized to the point where it could spread is institutionalized. just like state shinto was, but that was an artificiality imposed relatively recently.

    i’m not too surprised that the mongols had little cultural impact. the arabs spread islam and arabic, but the latter only fitfully in places like persia. and much of islam itself is not a purely organic derivation of arab culture, but a synthesis of arab folkways with the models of xtianity and judaism. the statecraft of the arab empires derived from byzantium and persia, not the early rightly guided caliphs.

  13. p.s. one reason the genghis khan model for that haplotype is moderately persuasive though is that one legacy of the mongols was the necessity for a male line descent from the “golden family” across much of eurasia as a precondition toward legitimacy of rule. this is a plausible cultural explanation for why that Y lineage might spread disproportionately. but the influence of that norm outran the mongol domains; my surname is just a title which spread to south asia due to the influence of turco-mongol norms and society.

  14. “my surname is just a title which spread to south asia due to the influence of turco-mongol norms and society.”

    How old are patriline surnames in South Asia?

    Matronymics and patronymics were used rather than patriline surnames, if surnames were used at all, in Northern Europe prior to the 19th century. Lots of prominent Medieval Europeans had a single name with a geolocator (e.g. Francis of Assisi, Martin of Tours) or other descriptive distinguisher, rather than a surname. I’m not very familiar with the timing of surname use elsewhere.

  15. How old are patriline surnames in South Asia?

    it’s complicated. khan is less a surname than a title. like a caste name in hinduism. it was a mughal era honorific and denoted someone of a specific rank. western style surnames are very new. the naming conventions in my family, like among many non-western people, are kind of complicated and baroque.

  16. pconroy

    #14, depends on where in Northern Europe? Your comment is probably true for Scandinavia, but not for Ireland, where surnames are some of the oldest/earliest in Europe, and in use for about 1,000 years.

  17. ryanwc

    Hmm. My reply seems to have disappeared.

    Well, the key thing I wanted to mention is that a critical factor in the spread of Arabic was that it spread primarily in areas with a Semitic base, and hardly exists outside areas where Punic or Aramaic were already established. Punic survivals are attested in Libyan Arabic, Aramaic survivals in Levantine Arabic.



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

