Swedes not so homogeneous?

By Razib Khan | February 10, 2011 1:37 pm

Credit: David Shankbone

The more fine-scale genomic analyses of population structure across the world I see, the more I believe that the “stylized” models in vogue in the early 2000s, which explained how the world was re-populated after the last Ice Age (and before), were wrong in deep ways. I’m talking about the grand narratives outlined in works such as Bryan Sykes’ The Seven Daughters of Eve, the subtitle of which was “The Science That Reveals Our Genetic Ancestry.” If I had less faith in science to always ultimately right its course I’d probably become a post-modernist type who asserts that all these stories are fictions. Sykes’ model in particular seems very likely to be incorrect, now that ancient DNA is being used to elucidate past population movements in Europe. From what we can gather, it looks like coarse attempts to infer past distributions from current distributions (of specific lineages and their diversity) resulted in a great deal of false clarity. We’re not talking differences on the margins, but fundamental confusions. For example, Basques were always assumed to be a viable “reference” population for descendants of European hunter-gatherers. This was one of the linchpins of older historical genetics models. It turns out that this fixed assumption may have been a false one.

Not only were our past assumptions in simple models wrong, but the real explanations may also be rather complex. It turns out that ancient DNA of the “first farmers” and their “hunter-gatherer” neighbors in Central Europe reveals a lot of discontinuity between both these groups and modern Europeans. Why? It may be that there were in fact multiple migrations, and the palimpsest is going to be a tough cookie to excavate. But there’s no need to be disheartened: the old paradigms came crashing down thanks to data.

With that in mind I’ve been particularly interested in the European fringe, the far west and north. If any hunter-gatherer descendants survive in large numbers, it will be here. This is why I’m curious as to the genetics of the Sami as well as the archaeology which tracks the spread of agriculture in Northern Europe. A new paper in PLoS ONE, Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data, focuses on Sweden:

The use of genome-wide single nucleotide polymorphism (SNP) data has recently proven useful in the study of human population structure. We have studied the internal genetic structure of the Swedish population using more than 350,000 SNPs from 1525 Swedes from all over the country genotyped on the Illumina HumanHap550 array. We have also compared them to 3212 worldwide reference samples, including Finns, northern Germans, British and Russians, based on the more than 29,000 SNPs that overlap between the Illumina and Affymetrix 250K Sty arrays. The Swedes – especially southern Swedes – were genetically close to the Germans and British, while their genetic distance to Finns was substantially longer. The overall structure within Sweden appeared clinal, and the substructure in the southern and middle parts was subtle. In contrast, the northern part of Sweden, Norrland, exhibited pronounced genetic differences both within the area and relative to the rest of the country. These distinctive genetic features of Norrland probably result mainly from isolation by distance and genetic drift caused by low population density. The internal structure within Sweden (FST = 0.0005 between provinces) was stronger than that in many Central European populations, although smaller than what has been observed for instance in Finland; importantly, it is of the magnitude that may hamper association studies with a moderate number of markers if cases and controls are not properly matched geographically. Overall, our results underline the potential of genome-wide data in analyzing substructure in populations that might otherwise appear relatively homogeneous, such as the Swedes.

Having played around with ADMIXTURE myself, I’m happy to see 350,000 SNPs, but less assured by 29,000. After a bunch of pruning I have a data set where individuals have 100,000 SNPs, and that seems marginal when it comes to differentiating variation among populations in Western Europe, though I suppose I didn’t do it very intelligently (i.e., I didn’t try to bias toward ancestry informative markers).

A major “top line” finding of this paper is that Swedes exhibit more geographical substructure than more numerous populations inhabiting expansive Central European regions. Additionally, though not as distinctive as Finns vis-a-vis other Europeans, they are somewhat distinctive, especially those in the north. The bar plot to the left is generated by STRUCTURE, and you see sets of populations at particular K’s, each K being the number of putative ancestral groups.

The differentiation within Sweden is evident at higher K’s. That’s striking, because the Germans and British don’t exhibit the pattern (the authors state in the paper that they looked for geographical patterns). But for me what is striking is the disjunction between Scandinavians and continental Germans, and the relative lack of one between the British and the Germans. At K = 5 a difference does crop up. At the top you see Russians, so it looks like blue = Eastern European, while red = Western European, and the Germans are a mix of the two, with the Russians and British representing extreme “types” (again, these are very stylized facts; there are no pure “types”). But the break with Swedes occurs at lower K’s. Why? The first thought is water. Water blocks gene flow a great deal, but then what about Britain? I doubt all the sampling in Britain was from the old Saxon Shore of East Anglia! I will hazard a rather general explanation: maybe it’s agriculture! More specifically, the switch to agriculture may have occurred via different demographic processes in the two locales. Britain has a milder climate than Sweden, and could presumably support a denser transplanted culture more easily than Sweden.

Let’s look at the data in a different way. The figure to the left shows the top two dimensions of variation in the data. The x axis explains 0.64% of the variance, and the y axis 0.24% (these are genetically close groups, remember). The bottom left of the distribution consists of Germans, the top the Russians, and the far right eastern Finns. Finns are something of a European outlier, along with Basques and Sardinians, but it is interesting that much greater east-west distances correspond to less variance than north-south ones at this scale. On the broader trans-European level north-south differentiation is usually more significant than west-east. Why? I think geography explains it: the Mediterranean and the Atlantic fringe allowed for a rapid expansion of agriculturalists in Southern Europe from their point of origination in Anatolia. The move north was slower, and involved more amalgamation with hunter-gatherers. But within Northern Europe there were local differences. The inland North European Plain, with its rich soil and riverine network, may have allowed for a great deal of demographic expansion in the face of an extremely thin pre-Neolithic population. But the farmers met another point of resistance at the oceanic fringe, where maritime resources were great enough to support denser hunter-gatherer populations. This, I suspect, explains the discontinuity at the Kattegat and Skagerrak.
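To make the “explains 0.64% of the variance” figure concrete, here is a minimal Python sketch of how the share of variance on the first axis can be computed. The genotype matrix is made-up toy data (allele counts of 0, 1 or 2), and `first_pc_share` is a hypothetical helper built on plain power iteration. It is an illustration under those assumptions, not the pipeline the authors used.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean allele count so every column has mean zero."""
    n = len(genotypes)
    means = [sum(col) / n for col in zip(*genotypes)]
    return [[g - m for g, m in zip(row, means)] for row in genotypes]


def first_pc_share(genotypes, iterations=200):
    """Share of total variance captured by the first principal axis.

    Runs power iteration on the individual-by-individual Gram matrix X X^T.
    Its non-zero eigenvalues are proportional to those of the SNP covariance
    matrix, so the ratio (top eigenvalue / trace) is the same.
    Assumes the data are not all identical (otherwise the norm is zero).
    """
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    total = sum(gram[i][i] for i in range(n))  # trace = sum of eigenvalues
    vec = [float(i + 1) for i in range(n)]     # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector = top eigenvalue
    top = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n))
              for i in range(n))
    return top / total


# Toy data: 4 individuals x 2 SNPs, hypothetical values
toy = [[0, 0], [2, 0], [0, 1], [2, 1]]
print(round(first_pc_share(toy), 3))  # 0.8
```

With two toy SNPs the first axis soaks up most of the variance. With hundreds of thousands of SNPs and closely related populations, the same ratio drops below 1%, as in the figure.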

Let’s take another look at genetic distance. The visualization to the left is a representation of the Fst between pairs of populations. I’ve added labels. Fst just measures the proportion of genetic variance which can be partitioned between groups. The x axis is the first dimension, and the y the second. Note that geography is not always a good predictor of genetic distance. Look at how closely the samples from Orkney (off the coast of northern Scotland), the British, the Germans, and the Utah whites (who are mostly British and German in origin) cluster in terms of genetic distance. In contrast, the French and French Basques differ a great deal.
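For readers who want the “proportion of variance between groups” idea spelled out, here is a minimal Python sketch of Fst for a single biallelic SNP in two equally weighted populations, using the textbook form (H_T - H_S) / H_T. The function names and the allele frequencies are hypothetical. Published analyses typically average over many SNPs and use sample-size-corrected estimators, so treat this as the bare concept rather than the paper's method.

```python
def heterozygosity(p):
    """Expected heterozygosity for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst for one SNP given allele frequencies in two populations.

    H_T is the heterozygosity of the pooled population, H_S the average
    heterozygosity within populations. Fst = (H_T - H_S) / H_T.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0  # same allele fixed everywhere: no variance to partition
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: identical, modestly different, fully diverged
print(fst_two_populations(0.5, 0.5))            # 0.0
print(round(fst_two_populations(0.4, 0.6), 4))  # 0.04
print(fst_two_populations(0.0, 1.0))            # 1.0
```

Even a 20-point gap in allele frequency gives an Fst of only 0.04 at that SNP, which helps put the small values seen among European populations in context.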

To illustrate the weirdness of some of the patterns, like a 5-year-old I took a blank map of Europe and just drew a line from region to region based on distances on the first dimension (x axis). So you see a zig-zag in Western Europe, a sweep to the east, and finally the terminus in the east of Finland. You’d be surprised how often I want to scribble on a map nonsensically when I see some of the SNP-chip data. Yes, geography does correspond to genetic distance, roughly, but some of the deviations from expectation are really weird. Sardinians and Finns in particular seem to be the extreme points on some broad underlying pattern of genetic variance in Europe. But obviously the Basques also represent another dimension. A simple model is bound to be wrong, but a complex one is going to be wrong in a lot of the details.

Finally, we’ve been talking about ancestry only. What about functionality? Genes, after all, sometimes code for differences, some of them visible, and many of them significant. Not surprisingly, ancient hunter-gatherers who were resident in Sweden were lactose intolerant. Why would they need to be able to digest milk as adults if they didn’t have herds of cattle?

By and large the authors didn’t find much functional significance in the sharp north-south difference in Sweden. But there were some suggestions (there are some issues with the statistical reliability, due to the lack of particular precautions which would guard against false positives):

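| Phenotype | Unique GWA hits | SNPs within 200 kb (total) | SNPs with regional differences (p < 0.05), observed | Expected | Chi-square p |
| --- | --- | --- | --- | --- | --- |
| Eye color | 9 | 410 | 24 | 34.3 | 0.08 |
| Hair color | 19 | 774 | 73 | 64.8 | 0.29 |
| Skin pigment | 3 | 140 | 22 | 11.7 | 0 |
| Lactase deficiency | LCT gene | 34 | 0 | 2.8 | 0.11 |
| Immune system | MHC region | 1262 | 162 | 105.6 | 0 |
| Blood lipids | 92 | 2465 | 257 | 206.2 | 0 |
| Cardiovascular disease | 32 | 1921 | 190 | 160.7 | 0.02 |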
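As a rough guide to reading the table, the sketch below runs a plain one-degree-of-freedom chi-square goodness-of-fit test on the observed versus expected counts. This is an assumption about the kind of test involved, not a reconstruction of the authors' procedure, and the helper name `chi_square_p` is hypothetical. It may not reproduce every p-value in the table, since the paper may have used a different variant of the test.

```python
import math


def chi_square_p(observed, expected, total):
    """P-value for a 1-df chi-square goodness-of-fit test.

    Compares SNPs with regional differences (observed vs expected) and the
    complementary counts of SNPs without them, out of `total` SNPs.
    """
    other_observed = total - observed
    other_expected = total - expected
    stat = ((observed - expected) ** 2 / expected
            + (other_observed - other_expected) ** 2 / other_expected)
    # For 1 degree of freedom the survival function is erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2.0))


# Counts taken from the table rows for hair color and skin pigment
hair = chi_square_p(observed=73, expected=64.8, total=774)
skin = chi_square_p(observed=22, expected=11.7, total=140)
print(round(hair, 2))  # 0.29
print(skin < 0.01)     # True
```

Note that several phenotypes are tested at once, so a p-value of 0.02 on its own is weaker evidence than it looks, which is the false-positive caveat mentioned above.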

Why the differentiation? I think this is a clear case of “maybe it’s agriculture.” The Sami of northern Sweden were not ethnically cleansed and assimilated until the early modern period. These were traditionally non-agricultural people, the closest Europe had to hunter-gatherers (since they herded reindeer they obviously weren’t hunter-gatherers in the strict sense). Some of the difference may simply be a Sami substrate in the north of Sweden, with all the functional differences entailed by the lack of thousands of years of dense agricultural life.

Citation: Salmela E, Lappalainen T, Liu J, Sistonen P, & Andersen PM (2011). Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data. PLoS ONE. DOI: 10.1371/journal.pone.0016747


  1. JL

    I think the Samis were generally hunter gatherers only a few hundred years ago. They took up reindeer herding relatively recently.

    In Finland, the overall frequency of lactose intolerance is 17 percent, and it is higher in Finnish-speakers and in Eastern Finland than in Western Finland and in Swedish-speakers. In Finnish Samis, the frequency is 40-60 percent.

  2. I’ve seen some studies, such as fine detail mtDNA studies and ancient DNA comparisons that seem to suggest that hunter-gatherer genetic layers in Estonians are probably closer to the “pure type” of Northern European hunter-gatherers than the Saami.

    The genetic links between Berber and Saami populations similarly suggest that, just as the first-order demographic split in the agricultural settlement of Europe was between a Mediterranean/Atlantic branch and a Danubian branch, the hunter-gatherers of Europe may have settled it after the LGM in a similar pattern, with the Berbers and Saami being the fringe populations closest to the Atlantic wave of post-LGM hunter-gatherer settlement (probably emerging from a refugium in or around Iberia) and the Estonians and some Siberian populations perhaps being the closest modern populations to the interior European hunter-gatherer population.

    To be clear, this isn’t to say that overall Berber and Saami remain autosomally similar at the whole genome level, just that there appear to be measurable elements of each population that share a common heritage that is not shared by most European populations. They would not, of course, share the portion of Saami admixture attributable to admixture with Germanic farmers or the portion of Saami admixture that appears to be circumpolar in origin. Similarly, they would not share the portion of Berber admixture attributable to sub-Saharan or Nilo-Saharan language speaking populations.

    In addition to the Basque, Sardinians, and Saami, some of the other populations one would expect to have relatively deep roots in Europe and genetic outlier distinctiveness would be the populations of the Mountain Mari in Russia, the Paleo-Siberians (by which I mean a linguistic grouping distinct from the Altaic and Uralic language speakers of Siberia) and some of the populations of the Caucasus.

  3. I was unaware of any connection between Berber and Saami.

  4. Eurologist

    The common blue between Russia and Germany is misleading in this very limited plot, which does not take into account Mediterranean or Caucasian (central-western Asian) populations, nor Baltic groups.

    Yet, I have always maintained that there are signs for three ancestral populations in Sweden (plus a Finnish contribution). This paper, especially the local PC1+2 and wider 3-D PC1+2+3 plots, nicely demonstrates this. Southern Swedes are essentially almost identical with the most northern Germans (i.e., from the earliest northern agriculturalists, plus frequent mixing through the millennia). But then, in addition to the northernmost Sami contribution, there is always an additional, north-central Swedish cluster that shows up, which I would identify with an original, non-agriculturalists, non-Sami population.

    This study also shows (via the relatively large genetic variation) that the data cannot simply be explained by drift and isolation.

    The large Fst distance to Russians (larger than between Russians and Germans) also has a couple of interesting interpretations. Firstly, it confirms that the R1a in much of central Europe and Scandinavia is very old (at least going back to the beginning of agriculture, if not earlier) – as confirmed with ancient y-DNA studies. That is, R1a from the much more recent Slavic expansion encompasses distinct subclades (which did not penetrate into parts of Poland, the Baltic, and Serbia/Croatia). Secondly, the people that used the Eastern rivers to trade with the very south may not have been Scandinavians but Germans – or else, they did not have a significant genetic impact there, as once was thought.

  5. John Emerson

    If I had less faith in science to always ultimately right its course I’d probably become a post-modernist type who asserts that all these stories are fictions.

    As I’ve said before, with archeology and paleontology (and to a lesser degree, history) the conclusions reached always have to be constructions from finite and often quite small data sets which are almost always defective in key ways, where the one piece you most wish you had is the one that’s missing.

    But on top of that, if it’s an active field the data set is always changing, often radically. So here, more than in almost any other study, the best current view is always (if the field is active, as for example Afghan archeology really isn’t right now) a rather fragile transient.

    Or to put it differently, the distant past never changes, but nothing changes as fast as our knowledge of the distant past.

  6. pconroy

    Saami and Berbers

    They have the same sub-lineage of mtDNA U5b1b, with a TMRCA of 8.6 +/- 2.4 ky

  7. Bolek

    “[…]so it looks like blue = Eastern European, while red = Western European, and the Germans are a mix of the two”

    Razib, I agree with you. Blue looks like Eastern European. I wonder what can be the source of it. Maybe it came from Corded Ware first R1a1-M17 Proto-Slavic expansion which reached Rhine river in the west. Later they were pushed back beyond Elbe river by Bell Beaker R1b-M269 Proto-Celto-Germanic people and R1a1 disappeared from Western Europe. But maybe some Corded Ware women were left behind and they became the source of the blue in Germany. Part of it could come from the Slavs east of Elbe river. Some Corded Ware cultures were also present in Southern Sweden and Finland and they also have some R1a1 and some blue at K=5.

  8. Secondly, the people that used the Eastern rivers to trade with the very south may not have been Scandinavians but Germans

    the rurikid lineage is finnic. the rus may then have been scandinavianized finns.

  9. Justin Giancola

    I’m curious if Norwegians stack up the same way. Seems dumb to not mention them.

  10. Justin Giancola

    I’m curious if Norwegians stack up the same way. Seems dumb for the study to not mention them.

    ugh why can’t I delete my own comments

  11. “Maybe it came from Corded Ware first R1a1-M17 . . . ”

    Given the surprises that ancient mtDNA has brought us, I’m wary of inferences about Y-DNA lineages or associating autosomal mixes to particular historical events.

    If there is anything that we have learned from ancient DNA it is that modern population genomics are not, by themselves, reliable indicators of ancient population genomics. The fact that it is there now tells us that it must have come from somewhere. But determining the time depth when faced with two or more alternative possibilities that could explain the current mix is art more than science (if not scientific theology), and mutation/diversity rate dates aren’t very well calibrated and aren’t as accurate, period, as we might hope.

    In large-SNP-set autosomal genetics one can’t even reliably make statements about phylogenetic order as one can with uniparental markers (NRY and mtDNA) and, to a lesser extent, with particular alleles where we know which one is derived and which one is not from related species or ancient DNA or historical inference.

    Basically, you need archaeology, very detailed knowledge of history, linguistic relationships, uniparental phylogeny, ancient DNA, related species ancient DNA, and specific allele distribution data and similar information to figure out which layers are in the mix and what order they occurred in and to get some order of magnitude sense of the relative impact. Autosomal data then provides much more exact insight into the magnitude of the different contributions and may reveal layers that are swamped by uniparental marker sweeps of some kind (e.g. Neanderthal and Denisovan).

    One of the biggest divisions in opinion I see in discussions of paleogenetics is between those who generically favor old dates and those who generically favor young dates for sources of population genetic contributions, such as the debate over how much of the modern European genome is hunter-gatherer or Neolithic in origin.

    In the several years that I’ve been following the new developments closely, I’ve begun to shift towards a young date bias from where I started, with the ancient DNA evidence and the food production related demographic/linguistic data popularized by Jared Diamond influencing me the most in that regard. The other thing that biases me in that direction is that I have a lot of detailed world history knowledge (in part from a college history minor, and in part from continuing to read about it), which provides lots of examples of historical events that could have demographic impact. On the other hand, every once in a while something like the strong element of Mesolithic Sahul layers relative to Neolithic Austronesian layers that one paper found shows that no general rule is sufficient.

    Still, in Sweden I think that there is a pretty good case for the Saami being perhaps more recent than 6600 BCE (per the paper mentioned by pconroy that I alluded to, as does the new autosomal paper that is the subject of the original post), which is much later than you would expect for the first group of hunter-gatherers to repopulate the area after the last glacial maximum. And I think that there is a good case for associating the Southern Swedish/Germanic component with the late Nordic Bronze Age in Southern Scandinavia around 1100 BCE, when cremation begins to appear in Southern Scandinavia and many metal objects related to horses are found (i.e. for a demic Indo-European influx to the region from somewhere to the South expanding from that area at about that time).

    For once, because of this timeline, I tend to agree with Eurologist about the likelihood that there are three rather than just two main sources for the Swedish gene pool along the lines that Eurologist proposes – a Germanic layer, a Saami layer and a “north-central Swedish cluster that shows up, which I would identify with an original, non-agriculturalists, non-Sami population.” Although, I’m not as comfortable that the Finns have a layer separate from the two non-Germanic populations. I’m more inclined to see the Finns and Estonians as predominantly a mix of the Saami layer and older non-agriculturalist, non-Saami population, perhaps with a little dollop of circumpolar North Siberian thrown into the mix.

  12. I am looking forward to many more old teeth being drilled for their genes. If we can get Neanderthal, we can get Swede. My folks are both of English/Scotch lineage, and we all look it. Fit in anywhere from Britain through Germany. Be fun to see, though, what is really back there, when I get my code read…

  13. Bolek

    I have just noticed that German sample was taken from Kiel area:

    “The Germans were male and female control samples from the PopGen cohort from Kiel area in Schleswig-Holstein in Northern Germany”.

    It explains why that sample was so blue and ‘Slavic’ at K=5. It is from a Slavic area of Germany.


  14. Soron

    Or maybe it’s just all the uhm… people from East Prussia who settled there (every city there has a memorial, very wisely chosen as a big stone, which commemorates it). I do not think that this kind of speculation can yield any outcomes of significance, though it’s still interesting.


