A genetic map of Italy

By Razib Khan | September 12, 2012 10:44 pm

Since the Ralph & Coop paper on IBD patterns across Europe I’ve been keen to see what gets uncovered about Italy. Recall, if you will, that in that paper the authors noted that Italy, in particular among European nations, exhibits a lot of deep population structure. Whereas the network of descent ties together many European nations and regions, in Italy there are deep regional differences which seem to go back to antiquity. Additionally, Sardinia has more recently come into focus as possibly particularly informative about the ethnogenesis of European peoples. Until recently I was moderately skeptical of the utility of the Sardinian samples in the HGDP data set. After all, Sardinia is an isolated island, and perhaps subject to the peculiarities of a low effective population size. Well, it now appears that modern Sardinians may be the best approximation we have today to Southern Europeans of ~5,000 years ago.

A new paper in PLoS ONE has a huge sample of Italians, and applies standard techniques to ascertain population structure. An Overview of the Genetic Structure within the Italian Population from Genome-Wide Data:

In spite of the common belief of Europe as reasonably homogeneous at the genetic level, advances in high-throughput genotyping technology have resolved several gradients which define different geographical areas with good precision. When Northern and Southern European groups were considered separately, there were clear genetic distinctions. Intra-country genetic differences were also evident, especially in Finland and, to a lesser extent, within other European populations. Here, we present the first analysis using the 125,799 genome-wide Single Nucleotide Polymorphisms (SNPs) data of 1,014 Italians with wide geographical coverage. We showed, by using Principal Component analysis and model-based individual ancestry analysis, that the current population of Sardinia can be clearly differentiated genetically from mainland Italy and Sicily, and that a certain degree of genetic differentiation is detectable within the current Italian peninsula population. Pair-wise FST between Northern and Southern Italy amounts to approximately 0.001, and is around 0.002 between Northern Italy and Utah residents with Northern and Western European ancestry (CEU). The Italian population also revealed a fine genetic substructure, underscored by the genomic inflation (Sardinia vs. Northern Italy = 3.040 and Northern Italy vs. CEU = 1.427), warning against confounding effects of hidden relatedness and population substructure in association studies.
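For readers unfamiliar with the "genomic inflation" numbers in the abstract: in its usual definition, lambda_GC is the median of genome-wide 1-df association chi-square statistics divided by the median expected under the null (about 0.455). A minimal sketch follows, assuming the comparisons were run as allelic tests with one population standing in for "cases" and the other for "controls"; the paper's exact procedure may differ, and the function names here are illustrative only.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom
# (approximately 0.4549); this is the null expectation for lambda_GC.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(a1_group1, a2_group1, a1_group2, a2_group2):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts: rows are the two groups, columns are the two alleles."""
    a, b, c, d = a1_group1, a2_group1, a1_group2, a2_group2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # monomorphic SNP or empty group: the test carries no information
    return n * (a * d - b * c) ** 2 / denom


def genomic_inflation(chi2_stats):
    """lambda_GC: observed median chi-square divided by the null median.
    Values near 1 mean little inflation; values well above 1 point to
    population structure or hidden relatedness between the two groups."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

In practice one would compute `allelic_chi2` for every SNP and pass the list to `genomic_inflation`. Because genome-wide allele frequency differences between Sardinia and Northern Italy shift the whole distribution of statistics upward, a lambda as large as 3.04 is the abstract's warning for association studies that mix such samples.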


The number of SNPs is rather good for the tasks they attempted. My personal experience is that for clustering algorithms like ADMIXTURE or PCA you’re hitting diminishing returns beyond ~100,000 SNPs if you are looking at intra-national differences. And the sample size is rather large, though the authors admit that they could have had denser coverage of central Italy. For Italy they pooled a lot of data sets, including some from biomedical studies. Naturally they also took in the HGDP and HapMap Italians.

On some methodological notes, the PCA is really hard to read. I’m not quite sure the labeling is correct (see figure 1 to check me here), so I’ll just report the ADMIXTURE results. I looked at the methods, and I do have some concerns here. It isn’t clear to me whether they ran ADMIXTURE for K = 2 to 10 more than once. The reality is that you should. That’s because ADMIXTURE is sensitive to the value of the seed parameter (you should change it from the default and allow it to be generated pseudo-randomly from the computer’s time), and when you do statistical checks such as cross-validation, the resulting values can themselves vary across runs! What I’m saying is that one run of ADMIXTURE may tell you that K = 4 is the best fit, but another run may tell you that K = 6 is the best fit. It’s happened to me. I once ran a data set up to K = 20, twenty times over, and the cross-validation values themselves exhibited considerable variation across runs, depending upon the K (there were some K’s, though, where the value seemed extremely stable, so I was more confident of the fit of those K’s).
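To make the replicate-runs advice concrete, here is a minimal Python sketch that repeats an ADMIXTURE sweep over K with several seeds, pulls the cross-validation error out of each log, and reports both the best K per replicate and the spread of CV error for each K. It assumes the `admixture` binary is on PATH, that it accepts `--cv` and `-s <seed>`, and that it prints lines of the form `CV error (K=4): 0.52036`; check all three against your installed version. The helper names are hypothetical.

```python
import re
import subprocess
from statistics import mean, stdev

# Assumed ADMIXTURE log format for cross-validation results.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def run_admixture(bed_path, k, seed):
    """One ADMIXTURE fit with cross-validation and an explicit seed; returns stdout.
    Each run writes its .Q/.P files to the working directory, overwriting any
    earlier run with the same K, so copy them out if you need them."""
    proc = subprocess.run(
        ["admixture", "--cv", "-s", str(seed), bed_path, str(k)],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout


def parse_cv_errors(log_text):
    """Map K -> CV error for every CV line found in an ADMIXTURE log."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def collect_runs(bed_path, ks, seeds):
    """One dict (K -> CV error) per seed, i.e. one replicate sweep over K per seed."""
    runs = []
    for seed in seeds:
        run = {}
        for k in ks:
            run.update(parse_cv_errors(run_admixture(bed_path, k, seed)))
        runs.append(run)
    return runs


def best_k_per_run(runs):
    """The K with the lowest CV error in each replicate sweep."""
    return [min(run, key=run.get) for run in runs]


def cv_spread(runs):
    """(mean, sample standard deviation) of CV error for each K across sweeps."""
    summary = {}
    for k in sorted(set().union(*runs)):
        errs = [run[k] for run in runs if k in run]
        summary[k] = (mean(errs), stdev(errs) if len(errs) > 1 else 0.0)
    return summary
```

Usage would look like `runs = collect_runs("italy.bed", range(2, 11), [101, 202, 303])` followed by `best_k_per_run(runs)` and `cv_spread(runs)`; the file name and seeds are placeholders. Explicit, distinct seeds stand in for time-based seeding: they vary the starting point just the same, but the sweep can be reproduced exactly. If `best_k_per_run` disagrees across replicates, that is the instability described above, and a K whose standard deviation is small relative to the gaps between neighbouring Ks is the one to trust.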

Also, there was one passage which makes me a little curious as to how clearly the authors understand the clustering techniques being used here, and what it tells us (and does not tell us):

The average admixture proportion for Northern European ancestry within the current Sardinian population is 14.3%, with some individuals exhibiting very low Northern European ancestry (less than 5% in 36 of 268 individuals, accounting for 13% of the sample).

I’d be careful about labeling a modal component in Northern Europeans “Northern European ancestry.” I’ve posted on enough topics related to this to illustrate how easy it is to generate statistical artifacts which have little correspondence to the real biological world. It’s one thing when you have two populations which are genetically very distinct and which separate into disjoint clusters almost immediately; Africans and Europeans, for example. But when you have intra-European variation, and the clusters don’t distribute in an exclusive fashion, one should be wary of reifying them into real populations. “Northern European modal cluster” may not roll off the tongue, but it has the benefit of being precise and not false.
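The figures in the quoted passage are simple reductions of an ADMIXTURE .Q matrix (one row per individual, K proportions per row): a mean share for one column, and a count of individuals below a cutoff (36 of 268 is about 13.4%). A minimal sketch, deliberately naming the column only by its index so that the label stays an interpretation rather than an output; the function names are illustrative.

```python
def load_q(path):
    """Read an ADMIXTURE .Q file: one row per individual, K whitespace-separated
    ancestry proportions, in the same column order as the run's clusters."""
    with open(path) as fh:
        return [[float(x) for x in line.split()] for line in fh if line.strip()]


def modal_component_summary(q_rows, column, threshold=0.05):
    """Mean share of one cluster, plus how many individuals fall below `threshold`.
    `column` is whichever cluster is modal in some reference group; calling it
    "Northern European ancestry" is the analyst's label, not something ADMIXTURE reports."""
    shares = [row[column] for row in q_rows]
    n_below = sum(1 for s in shares if s < threshold)
    return {
        "mean": sum(shares) / len(shares),
        "n_below": n_below,
        "frac_below": n_below / len(shares),
    }
```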

So what about the results? Nothing too surprising; I invite you to peruse the figures and read the supplements yourself. I did note that the evidence of intra-Italian migration is very obvious in these results. People whose geographic origins are in the north often cluster with southerners (i.e., the southern cluster), but people whose origins are in the south rarely seem to cluster with northerners. In the 20th century there were massive flows of migration from the Italian south to northern cities like Turin, while Mussolini encouraged the migration of southerners to the German-speaking regions of the northeast. In contrast, few northerners headed south. In short, many people in northern Italy have grandparents or great-grandparents who left southern Italy. Far fewer southern Italians have grandparents or great-grandparents who left northern Italy (though they do exist; I actually met a young man recently whose mother was a Neapolitan whose parents were from the Veneto). Additionally, I’m curious about the fact that Sardinians seem to exhibit some level of genetic homogeneity. This surprises many people because of the history of Sardinia under Carthaginian, Roman, and Vandal rule. I have a simple explanation for what’s going on: the coasts of Sardinia were long malarial. The modern population of Sardinia is descended from the indigenous mountaineers, who periodically repopulated the coastal cities.
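One way to put a number on the north/south asymmetry is a cross-tabulation of each individual's recorded origin against the cluster they fall into. The sketch below is a simplification under stated assumptions: it hard-assigns each person to the ADMIXTURE column with their largest share (the post's "cluster with" is a visual judgement from the plots, not this rule), and the origin labels in any example are hypothetical toy data, not the paper's samples.

```python
from collections import Counter, defaultdict


def assign_clusters(q_rows):
    """Hard-assign each individual to the column holding their largest ADMIXTURE share."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_rows]


def crossover_table(origins, clusters):
    """For each recorded origin, the fraction of its individuals in each cluster.
    Migration mostly in one direction shows up as a sizeable off-diagonal share
    in one origin's row and a near-zero off-diagonal share in the other's."""
    counts = defaultdict(Counter)
    for origin, cluster in zip(origins, clusters):
        counts[origin][cluster] += 1
    return {
        origin: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for origin, cnt in counts.items()
    }
```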

I want to note that if you look at the ADMIXTURE runs, the Mozabites have nearly as much of the Sardinian modal component as mainland Italians. This doesn’t mean equal genetic distance; the Mozabites’ dominant cluster is more distant. But it does suggest to me that in the Copper Age the western Mediterranean may have been dominated by a Sardinian-like population, which was later displaced and assimilated by newcomers.

Finally, I have no idea where to get this data. That’s sad, since it is so large a set. But I specifically noted the biomedical origin of some of the data because I suspect that’s going to make it difficult to get it into the public domain.

CATEGORIZED UNDER: Anthropology, Genetics, Genomics
MORE ABOUT: Italy
  • http://forwhattheywereweare.blogspot.com/ Maju

    Very interesting.

    “What I’m saying is that one run of ADMIXTURE may tell you that K = 4 is the best fit, but another run may tell you that K = 6 is the best fit”.

    You’re probably right, but looking at the various K values in fig. S6, they are all roughly the same for Italy, with most of the changes taking place in the small Levantine slice and in stuff that looks more like noise than information, right? I totally agree with not reifying the clusters into “real populations” without at least much more extensive testing.

    “People whose geographic origins are in the north often cluster with southerners (i.e., the southern cluster), but people whose origins are in the south rarely seem to cluster with northerners”.

    Maybe the most interesting of all. My first thought after reading your comment was: “wait, if they are admixed, they won’t show up as southerners anymore but as Central Italians or something else in between; there must be truly genetically South Italian communities in Northern Italy that keep the overall affinity”.

    Luckily, fig. S2 illustrates the same graph by official regions such as Piedmont, Veneto, etc., which are much smaller and more precise than the tripartite N-C-S division. And what do we see? That it is only people from Piedmont and Liguria (and not Lombardia, Veneto or even the tiny Alpine region of Val d’Aosta) who do that. They cluster with Southern Italians, with Central Italians (one Lombard is also here, but too exceptional to matter) and even with “admixed Sardinians” (those Sardinians who tend towards the mainland genetic makeup).

    If it were only Liguria, we could maybe think of Genoese maritime and trade relations, but Piedmont? The only element those two regions have in common is that they were both territory of the historically documented ethnicity of the Ligures, which also extended into SE France, east of the Rhône, probably as a “mountain refuge” population in the face of the Celtic advance.

    This is most interesting and probably more correct than your notion of “intra-Italian migrations”, unless of course we imagine that Ligurians were migrants from the South or something like that. I can find no better explanation than an ancient Ligurian link.

    … “may be that in the Copper Age the western Mediterranean was dominated by a Sardinian-like population, which later was displaced and assimilated by newcomers”.

    North Africans have some 30% of their mtDNA and some 10% of their Y-DNA of likely Iberian and certainly SW European origin. Autosomal analyses also give them a strong Iberian component. I think this should be Paleolithic, because we know that Berbers (Libu, Numides, Mauri) have dominated North Africa west of Egypt since at least 5800 years ago (when Ancient Egypt arises) and probably since the Neolithic (Capsian Neolithic) or even the Late Paleolithic (Early Capsian culture). There’s no other wave until the Arab invasions (except for more localized influences mostly limited to parts of Tunisia: Phoenicians, Romans, Vandals), so the Iberian (SW European) substrate should be pre-Neolithic, because there was an obvious E1b-dominated (African Afroasiatic) wave on top of it.

  • Karl Zimmerman

    As an aside, has anyone yet looked at the DNA of Corsicans? I know it’s a smaller island, and doesn’t have a language as distinct as Sardinian, instead speaking a Pisan, or possibly Tuscan, dialect (opinions differ). That said, looking at S2 from this paper, the Gallurese population on Sardinia (who speak a dialect of Corsican, more or less) is genetically very similar to the rest of the island, and doesn’t even have very many “admixed” seeming individuals. And Corsican does seem to retain a few pre-Indo-European words in its lexicon. So I’d presume the chances are fairly high that the Corsicans are similar to, if not as extreme as, the Sardinians.

  • Onur

    “As an aside, has anyone yet looked at the DNA of Corsicans? I know it’s a smaller island, and doesn’t have a language as distinct as Sardinian, instead speaking a Pisan, or possibly Tuscan, dialect (opinions differ). That said, looking at S2 from this paper, the Gallurese population on Sardinia (who speak a dialect of Corsican, more or less) is genetically very similar to the rest of the island, and doesn’t even have very many “admixed” seeming individuals. And Corsican does seem to retain a few pre-Indo-European words in its lexicon. So I’d presume the chances are fairly high that the Corsicans are similar to, if not as extreme as, the Sardinians.”

    Corsicans are one of the populations whose overall autosomal genetic results I have been most looking forward to seeing for years, for the same reasons, but somehow no population geneticist seems to be interested in them, at least not as much as you and me.

  • https://plus.google.com/109962494182694679780/posts Razib Khan
  • http://dispatchesfromturtleisland.blogspot.com ohwilleke

    “I want to note that if you look at the ADMIXTURE runs the Mozabites have nearly as much of the Sardinian modal component as mainland Italians. This doesn’t mean equal genetic distance; the Mozabite dominant cluster has a higher distance. But, it does suggest to me that it may be that in the Copper Age the western Mediterranean was dominated by a Sardinian-like population, which later was displaced and assimilated by newcomers.”

    Just to connect the dots a little, the Mozabite people are a Berber ethnic group living in Southern Algeria (which is in the Northern Sahara desert). The Berbers, in turn, are indigenous Saharan people who were traditionally herders, and whose indigenous languages form one of the main branches of the Afro-Asiatic language family. While their lifestyle is similar to that of Arabic herders, their indigenous language (spoken until sometime after Arabic was introduced in the 8th century) and genetics are very distinct. They show considerable genetic continuity with the hunter-gatherers of North Africa ca. 12,000 years ago.

    The implication would be genetic ancestry contributions to the Berber people in Northwest Africa from Neolithic people who pre-date the Indo-European expansion and who share common origins with the Sardinians, arriving either across the Strait of Gibraltar or via Egypt and/or the North African coast.

    An exchange across the Strait of Gibraltar (or by sea in the Western Mediterranean) would seem to be a more likely source for this component than a North African coastal route given (1) the existence of some other Y-DNA (e.g. certain clades of Y-DNA hg E in Southern Europe), (2) mtDNA ties back and forth between Spain and Northwest Africa (e.g. mtDNA haplogroup V), and (3) the lack of strong Sardinian-like signal between North Africa and the West Asian Highlands (Anatolia, the Caucasus and Iran).

    Given the lack of Basque-like languages in North Africa (often seen in cases of genetic ancestry contributions in connection with conquest and superstate rule), and the lack of much Y-DNA haplogroup G or haplogroup R1b in Mozabites, my guess would be that Copper Age bride exchange in connection with trade across the Strait of Gibraltar is the most likely source of the Sardinian-like genetic component seen in the Mozabites.

    While the Roman Empire extended to North Africa providing a potential means of gene exchange, a gene exchange with Berbers in this era seems less likely because the Berbers aren’t sufficiently similar to Italians (the core of the Roman empire) genetically to be a good fit for that.

  • http://forwhattheywereweare.blogspot.com/ Maju

    #4 That’s recent migration, and those people, with no local grandparents, would never have been taken into consideration in any half-serious genetic sampling process, right?

    Even if there was such an error in the sampling process, why in Piedmont and Liguria and not in the even more industrial and immigrant-receiving region of Lombardia, correctly mentioned in your article (Milan) as the epicenter of industrial development and immigration (or, for that matter, Emilia-Romagna and Veneto)? There should not be this kind of distinction, yet there is.

    Of course the migration could be from, say, 300 or 500 or 2000 years ago, but I can’t place it anywhere in the historical period. Nor in the prehistoric one, admittedly (except possibly the Cardium Pottery Neolithic), but that is a blurrier zone.

  • pconroy

    @2 Karl, @3 Onur,

    My eldest daughter is partly Corsican – actually 1/2 Irish, 1/8 Breton, 1/8 Normandy, 1/8 Lombard, 1/8 Corsican, and in most analyses, her Breton and Normandy seem to merge with the Irish to give some sort of generic British, while her Lombard (Swiss border, Lake Como) and Corsican give some sort of Southern Italian. So by a process of elimination, the Corsican element must look like Cypriot or something?!

    Here are her Dodecad K12B results:

    Single:
    [1,] "French_D" "4.7444"
    [2,] "French" "5.2463"
    [3,] "Mixed_Germanic_D" "6.8205"
    [4,] "Kent_1KG" "8.3709"
    [5,] "Dutch_D" "8.5303"
    [6,] "English_D" "9.0515"
    [7,] "CEU30" "9.057"
    [8,] "Cornwall_1KG" "9.7711"
    [9,] "British_D" "10.4655"
    [10,] "British_Isles_D" "10.5988"

    Mixed-Mode:
    [1,] "21.6% C_Italian_D + 78.4% English_D" "0.4769"
    [2,] "16.9% S_Italian_Sicilian_D + 83.1% Kent_1KG" "0.511"
    [3,] "22% Greek_D + 78% British_D" "0.5131"
    [4,] "16.9% Sicilian_D + 83.1% Kent_1KG" "0.5201"
    [5,] "23.6% O_Italian_D + 76.4% Kent_1KG" "0.5405"
    [6,] "21.6% C_Italian_D + 78.4% CEU30" "0.577"
    [7,] "75% English_D + 25% O_Italian_D" "0.686"
    [8,] "82% English_D + 18% S_Italian_Sicilian_D" "0.7213"
    [9,] "18% Sicilian_D + 82% English_D" "0.7215"
    [10,] "20.3% C_Italian_D + 79.7% Kent_1KG" "0.7479"

    Whereas HarappaWorld has her as follows:

    Single:
    [1,] "french_hgdp_28" "3.1544"
    [2,] "utahn-white_1000genomes_100" "7.8712"
    [3,] "british_1000genomes_99" "9.1146"
    [4,] "n-european_xing_25" "9.4165"
    [5,] "utahn-white_hapmap_18" "9.6669"
    [6,] "hungarian_behar_19" "11.1076"
    [7,] "orcadian_hgdp_15" "11.9383"
    [8,] "slovenian_xing_25" "13.0643"
    [9,] "spaniard_behar_12" "15.0464"
    [10,] "spaniard_1000genomes_98" "16.143"

    Mixed-Mode:
    [1,] "73.1% british_1000genomes_99 + 26.9% tuscan_hapmap_102" "0.8487"
    [2,] "73.4% british_1000genomes_99 + 26.6% tuscan_hgdp_8" "0.8647"
    [3,] "73.8% british_1000genomes_99 + 26.2% tuscan_1000genomes_11" "0.9465"
    [4,] "67.5% orcadian_hgdp_15 + 32.5% tuscan_hapmap_102" "1.0734"
    [5,] "39.6% italian_hgdp_13 + 60.4% orcadian_hgdp_15" "1.175"
    [6,] "67.8% orcadian_hgdp_15 + 32.2% tuscan_hgdp_8" "1.2094"
    [7,] "30% italian_hgdp_13 + 70% utahn-white_1000genomes_100" "1.2255"
    [8,] "68.3% orcadian_hgdp_15 + 31.7% tuscan_1000genomes_11" "1.3501"
    [9,] "66.8% british_1000genomes_99 + 33.2% italian_hgdp_13" "1.4398"
    [10,] "90.3% french_hgdp_28 + 9.7% mordovian_yunusbayev_15" "1.4592"

    Of course ancient Tuscans (aka Etruscans) colonized the East coast of Corsica in times of yore.
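The "Single" and "Mixed-Mode" lists above rank reference populations by how close their average component proportions are to the individual's, first one population at a time and then as two-population mixtures. Below is a minimal sketch of that idea, assuming plain Euclidean distance and a simple grid search over the mixing proportion; the actual DIYDodecad and HarappaWorld Oracle tools may use a different metric and search, so this will not reproduce the numbers above, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_mode(target, refs, top=10):
    """Reference populations ranked by distance from the target's component vector."""
    ranked = sorted(((name, dist(target, vec)) for name, vec in refs.items()),
                    key=lambda item: item[1])
    return ranked[:top]


def mixed_mode(target, refs, steps=1000, top=10):
    """Best two-population mixture for every pair of references, found by a grid
    search over the mixing proportion p, then ranked by distance from the target."""
    results = []
    for a, b in combinations(sorted(refs), 2):
        best_d, best_p = float("inf"), 0.0
        for i in range(steps + 1):
            p = i / steps
            mix = [p * x + (1 - p) * y for x, y in zip(refs[a], refs[b])]
            d = dist(target, mix)
            if d < best_d:
                best_d, best_p = d, p
        label = f"{100 * best_p:.1f}% {a} + {100 * (1 - best_p):.1f}% {b}"
        results.append((label, best_d))
    results.sort(key=lambda item: item[1])
    return results[:top]
```

Because each fit allows only two sources, an eighth-share ancestry has no slot of its own and can only appear by pulling the mixture toward whichever reference it most resembles.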

  • https://plus.google.com/109962494182694679780/posts Razib Khan

    #6, good points. Need to think.

  • Onur

    “My eldest daughter is partly Corsican – actually 1/2 Irish, 1/8 Breton, 1/8 Normandy, 1/8 Lombard, 1/8 Corsican, and in most analyses, her Breton and Normandy seem to merge with the Irish to give some sort of generic British, while her Lombard (Swiss border, Lake Como) and Corsican give some sort of Southern Italian. So by a process of elimination, the Corsican element must look like Cypriot or something?!”

    Conroy, 1/8 is too low in her case to make a good inference about the genetics of Corsicans, as the rest of her ancestry comes from Caucasoid populations too, making it impossible to discern with confidence which parts of her genome come from Corsican ancestors (this is worsened by the fact that we don’t know jack shit about the overall autosomal genetics of Corsicans, so there is a vicious circle here).

  • James

    @PcConroy

    Corsicans at the MLDP come out as 85% North-Italian, 15% Sardinian; they have little to do with Sicilians or Cypriots. They in fact have one of the highest percentages of the R1b subclade U152.

  • http://emilkirkegaard.com Emil

    I wonder how this squares with Lynn’s paper about Italian IQ differences between northerners and southerners. Even if his study is dubious, it’s nice to see some more data.

    Cf. http://racialreality.blogspot.com/2010/03/richard-lynn-on-italian-iq.html

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com
