The residual of the genes & geography correlation

By Razib Khan | February 28, 2011 4:14 am

David of the Eurogenes Genetic Ancestry Project has a cautionary post up, When is a genetic map also a geographic map? Always and never. In it, he uses a specific peculiar pattern as a launching point into a broader exploration of the relationship between visualizations of genetic variation and geography. That pattern is that Russians, the easternmost of European peoples, are closer to the Slavs of Central Europe than the Balts are when plotted on the two largest dimensions of variation. I’ve highlighted this pattern from a PCA David extracted from a paper on northeast European genetics. This disjunction between geography and genetics has a pretty straightforward possible explanation: the current distribution of Russian-speaking peoples is a function of a massive demographic expansion to the east by Slavic farmers within the last 2,000 years. We already know that the borderlands between the steppe and the forest were long dominated by North Iranian people, from the Scythians to the Sarmatians, while further north the Great Russians absorbed a Finnic substrate (clear because some of the absorption is attested down to the early modern period).

With that duly noted, I think there’s definitely some margin in more rigorously estimating the deviations from expectation when one attempts to generate a correspondence between a PCA and a geographic map. What I’m imagining is that you simply enter the positions of various ethnic groups on a real map, then overlay the PCA with the ethnic labels on top of that map and shift it until you maximize the correlations. When the correlations are maximized, stop, and then note where the greatest deviations from expectation are. Taking the example above, a vast swath of eastern Europe would show up as a major deviation. Some of these peculiarities will be due to geography. The chasm between Africans and non-Africans will probably be greater than one would expect as a function of distance, but the intervening Sahara presents itself as a good cause. But when you look at the genetic data, sometimes strange and unexpected correspondences emerge. If one can’t immediately spot a reason, then that bears further investigation.
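One way to make the "shift until you maximize the correlations" step concrete is a Procrustes superimposition: center both point sets, rescale them, and find the rotation or reflection of the PCA plot that brings it as close as possible to the map in the least-squares sense. To be clear, this swaps in a standard least-squares fit for the informal procedure described above, and it is only a minimal sketch. The function name `procrustes_residuals`, the labels A to E, and all coordinates are made up for illustration, and longitude and latitude are treated as flat x and y, which is a simplification.

```python
import numpy as np

def procrustes_residuals(geo, pca):
    """Superimpose 2-D PCA coordinates on map coordinates.

    Returns the per-population distance between fitted and actual map
    position, plus the Procrustes correlation (1.0 means a perfect fit).
    """
    geo = np.asarray(geo, dtype=float)
    pca = np.asarray(pca, dtype=float)
    # Center each configuration on its centroid and scale to unit size.
    g = geo - geo.mean(axis=0)
    p = pca - pca.mean(axis=0)
    g = g / np.linalg.norm(g)
    p = p / np.linalg.norm(p)
    # Orthogonal Procrustes: the rotation/reflection R minimizing ||p R - g||
    # comes from the SVD of p^T g.
    u, s, vt = np.linalg.svd(p.T @ g)
    rotation = u @ vt
    correlation = s.sum()  # also the optimal scale for unit-size inputs
    fitted = correlation * (p @ rotation)
    residuals = np.linalg.norm(fitted - g, axis=1)
    return residuals, correlation

# Hypothetical populations on a made-up map (not real coordinates).
names = ["A", "B", "C", "D", "E"]
geo = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [5.0, 5.0]])

# Fake "PCA": the map rotated by 30 degrees and shrunk, with population E
# pushed away from where geography alone would put it.
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
pca = 0.01 * (geo @ rot)
pca[4] += np.array([0.03, 0.0])

residuals, correlation = procrustes_residuals(geo, pca)
# List populations from largest to smallest deviation from expectation.
for i in np.argsort(residuals)[::-1]:
    print(names[i], round(float(residuals[i]), 4))
print("Procrustes correlation:", round(float(correlation), 4))
```

The populations at the top of the printed list are the ones whose genetic position is least well predicted by their place on the map, which is the "residual" worth a second look. With real data you would want more than a handful of groups, since a single far-out population can pull the fit toward itself.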

As I’ve given this some thought, I guess I should admit that I’ve fiddled with R’s mapping functions, and also looked for other applications. But the labor input is such that I’ve put off getting deeper into this topic. I’d be curious if anyone else was interested in this sort of intersection between genetic and geographic data visualization. I think maps are pretty much informational gold.

CATEGORIZED UNDER: Genetics, Genomics
  • bob sykes

    There needs to be a time correction. Populations have moved all over the place, and current locations should correlate poorly or only moderately with descent.

  • onur

    This disjunction between geography and genetics has a pretty straightforward possible explanation: the current distribution of Russian-speaking peoples is a function of a massive demographic expansion to the east by Slavic farmers within the last 2,000 years. We already know that the borderlands between the steppe and the forest were long dominated by North Iranian people, from the Scythians to the Sarmatians, while further north the Great Russians absorbed a Finnic substrate (clear because some of the absorption is attested down to the early modern period).

    I think the reason why the expansion of Slavic farmers had a significant demographic impact in what is now Russia, even when we restrict our focus to the pre-modern expansion and expanded areas, is that before the Slavic expansion what is now Russia, even just its European part, as a land of inhospitable steppes, forests, tundras and extreme climates, was primarily inhabited by nomadic peoples like Scythians, Sarmatians and various Uralic and Altaic folks. As we know, nomadic populations – whether pastoral or hunter-gatherer – usually have small population densities and thus are much more open to significant demographic impacts after migrations from outside than agriculturist/sedentary populations are. A similar significant demographic impact seems to have happened within the last 2,000 years in Central Asia, whose territory was also mostly inhabited by nomads (Scythians), as a result of the Turkic expansion.

    I guess the demographic impact of the Slavic expansion to what is now Russia was more significant than the demographic impact of the Turkic expansion to Central Asia, and even when we restrict our focus to the pre-modern expansions and expanded areas, as, unlike Slavs, Turkic peoples were nomadic too and thus very probably had small population densities.

  • onur

    By demographic impact I mean genetic change in an area due to migrations from outside. Not all migrations come with significant demographic impact even when resulting in the change of language, religion, culture, etc. of an area (as in the phenomenon of elite dominance).

  • onur

    In short, this is what I think is the case: a sparsely populated pre-Slavic what is now Russia (even just its European part) and the resulting significant genetic impact of the Slavic farmer expansion (even just the pre-modern expansion in the pre-modernly expanded areas).

  • chris w

    The Russian sample is from Tver, near Moscow. I wonder if the results would vary if they took a sample from Irkutsk or Vladivostok or the Volga region.

  • Bolek

    It could be Slavic expansion 2000 years ago or 4000 years ago Corded Ware Proto-Slavic expansion. Andronovo was genetically similar to modern Slavs. How do you explain that?

    Fst between Warsaw and Moscow is around 0.0003 and most likely it is so up to Ural mountains and beyond, so Slavs were very homogeneous and their expansion must’ve had a character of demic diffusion over sparsely populated areas inhabited by hunter-gatherers. But such a situation didn’t exist 2000 years ago. It was possible 4000 years ago.

    Population increase rate couldn’t exceed 2-3% per 100 years in those times, so to reach density that was observed in Early Middle Ages a much longer time than few centuries was required.

    Moreover Slavs have very old and diverse R1a haplotypes and their distribution fits very well Corded Ware culture expansion, as noted by Underhill et al. 2009:
    http://www.nature.com/ejhg/journal/v18/n4/fig_tab/ejhg2009194f2.html#figure-title

    If we add to it archaic and conservative character of Slavic languages and their similarity to Indo-Iranian languages Corded Ware Proto-Slavic expansion 4000 years ago seems more likely to me.

  • http://washparkprophet.blogspot.com ohwilleke

    With a judiciously defined statistic, it ought to be possible to devise an index that would rate different historical migrations based on the intensity of the population genetic impact. One possible input into that statistic might be deviation between geographically expected PCA points and actual PCA points for a population.

    Thus, at the high end, there would be events like the movement of LBK farmers into the Dnieper-Don basin, and the Slavic farmer migration, and at the low end, there would be events like the arrival of Uralic language family speaking elites into Hungary and the demographic impact of Turkish language speaking elites into Turkey. In between might be demographic events like the impact of Iberian migrations into North African Berber populations, or the demographic impact of the Indo-Aryans on South Asia. The fact that it is possible to give these examples at all is suggestive of the idea that the concept is sound enough to be quantified and that we have the kind of data needed to reduce intuitions from that data into a summary statistic for demographic impact.

    This statistic could probably be designed in a way that would impart useful information in a way that would not require knowledge of precisely when the migration took place, or the absolute sizes of the populations, as inputs. Honestly, as someone looking at pre-history, the magnitude of the demographic impact of a major migration is often something that I am more interested in knowing about than more exact dates or absolute population levels. Demographic impact gives you a feel for what kind of event this was for the people involved.

    Such a statistic might be a fruitful alternative to endless discussions of competing “replacement” and “cultural assimilation” models to describe migrations that were a little bit of each in an accurate and quantifiable way. This in turn might also allow for more fine grained consideration of the factors that produce heavy, moderate or slight demographic impacts to take place.

  • http://entitledtoanopinion.wordpress.com TGGP

    A post from the past. Just a coincidence that I checked out your David Anthony review leading me to that around the same time David published his post.

  • onur

    I think the key is when and where the split of the Proto-Balto-Slavic language into Proto-Slavic and Proto-Baltic happened and the expanding/spread routes of Proto-Slavic and Proto-Baltic after the split. But we should always keep in mind that language has a very variable correlation with genetics.

  • onur

    When we look from a linguistic perspective, Corded Ware or 4,000 years ago is too early for a Proto-Slavic language (I mean as a language already separated from Proto-Baltic) to exist. That is why I favor much later dates (Iron Age in the earliest scenario) for the existence and spread of Proto-Slavic in Eastern and Central Europe. Also there is little doubt that the Slavic existence/expansion even just in the European part of Russia mostly, if not wholly, happened only within the last 2,000 years.

  • onur

    That is why I favor much later dates (Iron Age in the earliest scenario) for the existence and spread of Proto-Slavic in Eastern and Central Europe.

    or in anywhere else in the world, indeed

  • onur

    As for the Underhill et al. study, Woźniak et al. 2010 heavily criticized it for using the evolutionary mutation rate instead of the much more effective pedigree (=germline) mutation rate. For the genetic marker that correlates remarkably well with the distribution of Slavic-speakers today, Woźniak et al. 2010, using the pedigree mutation rate, arrived at much later dates which are much more compatible with the plausible Slavic expansion dates.

  • onur

    As for Andronovo, possessing a high proportion of the R1a Y-chromosome haplogroup isn’t enough to suspect Slavic or Balto-Slavic origins. If in the future studies they are found to possess the so-called Slavic marker sub-clade of R1a, then we may suspect Slavic or Balto-Slavic origins for the Andronovo people. As R1a is dispersed in a very broad area and the most likely language of the Andronovo people was Indo-Iranian (from the Satem group like the Balto-Slavic languages), their high possession of R1a can be easily explained with a theory which postulates that R1a, irrespective of sub-clades, peaks around the homeland of the Satem IE languages area.

  • onur

    Population increase rate couldn’t exceed 2-3% per 100 years in those times, so to reach density that was observed in Early Middle Ages a much longer time than few centuries was required.

    As the expanded territory in question was very probably sparsely populated, Slavic farmers, when expanding in what is now Russia, may have experienced a temporary period of significant (perhaps exponential in many places) population growth (in which they may temporarily have gone beyond the Malthusian limit) until significantly populating large swaths of the European part of what is now Russia.

    Similar temporary periods of significant population growth happened in the sparsely populated parts of the New World during the early colonization by Europeans (in exponential proportions in many places).

  • Bolek

    Onur, Balto-Slavic unity is just a hypothesis. There are many other theories. Baltic and Slavic languages could develop separately and converge due to close contacts, i.e. Baltic converged to Slavic as Slavs are around 100 times more populous and spread over large area.

    Those who do not accept common Proto-Balto-Slavic estimate Proto-Slavic to originate at 3000-2000 B.C., so it fits Corded Ware expansion as Proto-Slavic very well.

    If Slavs and Balts were separate populations genetically, and this is suggested by Polako’s post, then there is no point in talking about Balto-Slavic linguistic unity, it has to be rejected.

  • onur

    Onur, Balto-Slavic unity is just a hypothesis.

    But that hypothesis is probably much more accepted as true than for instance the Altaic hypothesis.

    Those who do not accept common Proto-Balto-Slavic estimate Proto-Slavic to originate at 3000-2000 B.C., so it fits Corded Ware expansion as Proto-Slavic very well.

    But it doesn’t explain the uniformity of Proto-Slavic well into the period Slavs first appeared in the historical record in the 6th century CE.

    If Slavs and Balts were separate populations genetically, and this is suggested by Polako’s post, then there is no point in talking about Balto-Slavic linguistic unity, it has to be rejected.

    I have another hypothesis: Slavs separated from Balts in the transitional period between the Bronze Age and the Iron Age at the earliest from the western part of the Balto-Slavic homeland and then advanced further west mingling with the western folks they encountered. Balts, on the other hand, haven’t moved much west after their separation from Slavs. This and the relative isolation of Balts after their separation from Slavs may explain the eastern position of Balts in genetic tests. In paternal haplogroups Balts aren’t so different from Slavs in the prevalence of R1a. If Andronovo is Indo-Iranian, then the high frequency of R1a may be a characteristic of the hypothetical Proto-Satem (the hypothetical common ancestor of Balto-Slavic and Indo-Iranian)-speaking population.

  • onur

    BTW, I mentioned the Altaic hypothesis for comparison purposes only (as it is off topic).

  • Bolek

    “In paternal haplogroups Balts aren’t so different from Slavs in the prevalence of R1a.”

    There is a significant difference. Western Slavs and Southern Slavs do not have N1c which is dominant among Balts, they are not mixed with Balts. Kashubian group in Northern Poland for example is 68.8% R1a and 0% N1c. Only Northern Eastern Slavs are mixed with Balts. Balts are also heavily mixed with Slavs. I think R1a there is mainly Slavic. Slavs have much higher Southern European component in Admixture analysis which comes from Neolithic farmers. Slavs took up farming much earlier and therefore they are more populous than Balts who remained at hunter-gatherer stage.

    There is strange non-IE substratum in Baltic languages which is absent in Slavic and therefore I don’t believe in Balto-Slavic unity. They were separate in the beginning and then by close contacts linguistic and genetic convergence had taken place. I agree with Polako.

    Don’t see problem with uniformity of Proto-Slavic. Dialects converge and diverge, then converge again etc. Some languages change very slowly. See Greek, not much change in 3000 years. See how close Avestan and Vedic Sanskrit were. If Slavs were homogeneous why should their languages change much? Homogenous communities preserve languages, only when they mix their languages do change fast. Western Europe is a different story, there was a language shift there and probably several times.

    I am not sure about Andronovo, depends on where Indo-Iranian ethnogenesis took place. I think it could’ve been in Central Asia, BMAC maybe. I don’t believe in mixing genes and cultures without language change.

  • onur

    I strongly suspect that the acquisition and rise of N1c1 among Balts is a recent development. In a process much similar to the spread of the Genghisid lineage in parts of Asia, I think the lineage of Rurik, the founder of the Rurik Dynasty of Kievan Rus, which was N1c1 according to the latest research, rapidly spread around the Baltic area following Rurik. This is supported by the fact that Balts have ~ 0% Mongoloid DNA according to autosomal studies and by the widespread claims of descent from Rurik around the Baltic area.

    As for Greek, until the spread of Koine Greek beginning from the Hellenistic period, there were many divergent dialects of Greek, some of which we can classify as different languages.

    As for the similarities between Avestan and Vedic Sanskrit, they were pretty insignificant compared to the similarities between the dialects of Proto-Slavic in the 7th century CE. There is a reason why Avestan and Vedic Sanskrit are classified as different languages by linguistic scholars.

    In short, it is implausible to think that Proto-Slavic was so uniform for so many millennia. It is clear that it is a pretty recent language (most probably from the 1st millennium BCE).

  • onur

    BTW, Rurik was most probably a Fenno-Ugric-admixed Scandinavian or Scandinavianized Fenno-Ugric.

  • onur

    In short, it is implausible to think that Proto-Slavic was so uniform for so many millennia. It is clear that it is a pretty recent language (most probably from the 1st millennium BCE).

    Also its uniformity makes me think that it was spoken in a small area before the 6th century CE. Its diversification into Slavic languages only began after the 6th century CE as a result of the Slavic expansion.

  • Bolek

    Onur, you know nothing about languages. There was no Proto-Slavic in 7th c., it is nonsense. Proto-Slavic is estimated at 3000-2000 BC. Slavic has most archaic and conservative grammar, archaic lexicon. Archaic vocabulary for wagon, wheel and solar religion survived only in Slavic.

    In 5th c. there were Western Slavic languages and Eastern Slavic languages. There were significant differences between them. Slovenia was settled from Poland and Slovenian is similar to Western Slavic languages, Bulgaria was settled from Ukraine and it is more similar to Eastern Slavic languages.

    From genetics we know that Slavs occupied huge areas in Central Europe in Bronze Age, just read Underhill et al. 2009. The oldest and most diverse R1a is in Oder-Vistula area, this was the most probable Slavic homeland.

    N1c in Balts is very old, it is not Finnic or Scandinavian. This is their major haplogroup. R1a is younger than that among Slavs and most probably came from Slavs.

    Greeks understand Homer, so their language has not changed much. Old established languages do not change fast.

  • onur

    Bolek, it is a waste of time to refute you with more replies, as you are writing with a strong nationalistic bias and there is no common ground that you and I share in which we can reach an agreement. No objective person would write the things you wrote. So I stop here.

  • Bolek

    Onur, no objective person would write the things you wrote. It is contrary to all that we know from genetics, anthropology, linguistics, paleodemography, archeology etc. So let’s stop here.

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    yes. STOP.

    thanks.

  • Markku P.

    “chris w: The Russian sample is from Tver, near Moscow. I wonder if the results would vary if they took a sample from Irkutsk or Vladivostok or the Volga region.”

    I refer to the question of chris w. In my opinion the place of the Russian samples (Tver) was deceptive, due to the Karelian immigration into Tver oblast in the 17th century.
