Africa in 12 ADMIXTURE chunks

By Razib Khan | April 6, 2011 3:21 pm

Some have asked what the point is in poking around African population structure when Tishkoff et al. and Henn et al. have done such a good job in terms of coverage. First, it is nice to run your own analyses so you can slice & dice to your preference, and not rely on the constrained menu provided by others. There’s value in home cooking; you can flavor to your taste. Second, you never know what data people might leave on your doorstep. I’ve received the genotypes of three Somalis. Nothing too surprising, a touch more Cushitic than the Ethiopians in Behar et al., but interesting nonetheless.

Also, you can see how ADMIXTURE tends to come to weird conclusions in certain circumstances. Below is a K = 12 run on ~50,000 SNPs. I’ve added a few Behar et al. and HGDP populations to the Henn et al. set, and pruned a lot of the African groups which seem redundant in terms of information. I’ve added a few geographically informative labels as well.

Observe below that there is a Fulani cluster. I think this is pretty much an artifact. At K = 7 the Fulani have a majority component which is modal in West Africa & Bantu speakers, and a minority component which is identical to the one modal in Mozabite Berbers from Algeria. The Mozabites reside in the far northern Sahara, and their modal component drops off as one goes east toward western Asia and the eastern Mediterranean. I suspect that what is showing up in ADMIXTURE is the ancient hybridization of the Fulani, and perhaps their demographic expansion from this core group. We have some glimmers of the prehistory of the Fulani, and no expectation for them to be such a distinctive cluster, so I naturally jump to these inferences. But it does make me reconsider the nature of the “Sandawe,” “Mbuti” or “San” clusters in ADMIXTURE. These populations are culturally distinctive in deep ways from their neighbors, so a reflexive inference one might make is that they’re “pure” ancient substrate groups which have been overlain and marginalized by their Bantu neighbors. But their prehistory is far murkier than the Fulani because of their geographical isolation, so there is far less to go on. These “ancient” isolated groups themselves may have gone through the same sort of distinctive recent ethnogenesis processes which we presume occurred with the Fulani (also, in the plot below the Biaka are pure; but in most of the bar plots they have a minor element which they share with their neighbors, probably due to greater admixture and interaction between western Pygmies and their Bantu neighbors than among the eastern ones).

OK, now let’s prune some of the “pure” and extraneous populations. Additionally, I’ll remove some of the K’s, so the proportions are going to be recalculated with a new base. Keep in mind that the South African Bantus show elevated West African in part because the Khoisan proportion was removed, inflating the percentages for all the other elements.
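To make the "new base" point concrete, here is a minimal sketch of what happens when a component is dropped and the rest are rescaled to sum to 1. The function name and the proportions are made up for illustration; they are not values from the actual run.

```python
def renormalize(proportions, dropped):
    """Drop some ancestry components and rescale the rest to sum to 1."""
    kept = {k: v for k, v in proportions.items() if k not in dropped}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("nothing left after dropping components")
    return {k: v / total for k, v in kept.items()}


# Hypothetical individual, NOT real data from this analysis.
example = {"W African": 0.30, "Bantu": 0.45, "Khoisan": 0.25}
rescaled = renormalize(example, dropped={"Khoisan"})
print(rescaled)
```

In this toy case the West African share moves from 0.30 to 0.40 purely because the Khoisan quarter was removed from the denominator, which is the inflation to keep in mind when reading the pruned plot.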

Now let’s look at the pairwise Fst values between inferred populations. Remember, this measures the proportion of genetic variance which can be attributed to between population differences. The bigger the value, the larger the genetic distance. I’ve given the inferred populations labels, but don’t take them too seriously.
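For intuition, the textbook version of Fst at a single biallelic SNP can be sketched as below. This is Wright's definition with two equally weighted populations, written out by hand; it is an illustration only and is not claimed to be the estimator ADMIXTURE uses for its inferred populations.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    # Mean expected heterozygosity within populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed or absent everywhere: no variance to partition
    return (h_t - h_s) / h_t


print(fst_two_pops(0.5, 0.5))  # identical frequencies
print(fst_two_pops(0.2, 0.8))  # roughly 0.36
```

Identical frequencies give 0, and populations fixed for different alleles give 1. Genome-wide values like the ones in the table are summaries over many SNPs.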


Fst divergences between estimated populations:
          Fulani    San       European  Maya      Nilotic   Biaka     W African SW Asian  Sandawe   Mbuti     Mozabite  Bantu
Fulani    0.00      0.19      0.15      0.26      0.11      0.13      0.09      0.14      0.10      0.18      0.12      0.10
San       0.19      0.00      0.27      0.37      0.16      0.11      0.13      0.25      0.13      0.13      0.23      0.13
European  0.15      0.27      0.00      0.18      0.17      0.22      0.19      0.05      0.15      0.26      0.06      0.19
Maya      0.26      0.37      0.18      0.00      0.27      0.31      0.28      0.19      0.25      0.36      0.20      0.28
Nilotic   0.11      0.16      0.17      0.27      0.00      0.10      0.07      0.17      0.08      0.14      0.13      0.07
Biaka     0.13      0.11      0.22      0.31      0.10      0.00      0.07      0.21      0.09      0.09      0.18      0.07
W African 0.09      0.13      0.19      0.28      0.07      0.07      0.00      0.17      0.07      0.12      0.14      0.05
SW Asian  0.14      0.25      0.05      0.19      0.17      0.21      0.17      0.00      0.14      0.25      0.06      0.18
Sandawe   0.10      0.13      0.15      0.25      0.08      0.09      0.07      0.14      0.00      0.13      0.12      0.07
Mbuti     0.18      0.13      0.26      0.36      0.14      0.09      0.12      0.25      0.13      0.00      0.22      0.12
Mozabite  0.12      0.23      0.06      0.20      0.13      0.18      0.14      0.06      0.12      0.22      0.00      0.14
Bantu     0.10      0.13      0.19      0.28      0.07      0.07      0.05      0.18      0.07      0.12      0.14      0.00

Here’s the genetic distance between non-African groups and African ones on a bar plot.

Some consistent trends:

– Mbuti and Khoisan show the largest distance from non-Africans.

– Biaka are next. Again, this may be due to admixture between Biaka and neighboring groups, or, a closer relationship between the Biaka Pygmies and the non-Khoisan/Mbuti African groups with reference to the last common ancestors.

– Roughly equal distance of Bantus and West Africans.

– Marginally smaller distances between the Nilotic cluster and non-Africans.

– Finally, a consistently smaller difference between non-Africans and the Sandawe cluster.
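The trends above can be checked directly against the Fst table. The sketch below copies the relevant columns and ranks the African clusters by their mean Fst to four reference clusters. Treating European, Maya, SW Asian and Mozabite as the reference set is my assumption for illustration (the Mozabites are North African); the bar plot may have used a different selection.

```python
# Columns copied from the Fst table: distance from each African cluster
# to the reference clusters, in the order given by REFERENCE.
REFERENCE = ("European", "Maya", "SW Asian", "Mozabite")
FST_TO_REFERENCE = {
    "Fulani":    (0.15, 0.26, 0.14, 0.12),
    "San":       (0.27, 0.37, 0.25, 0.23),
    "Nilotic":   (0.17, 0.27, 0.17, 0.13),
    "Biaka":     (0.22, 0.31, 0.21, 0.18),
    "W African": (0.19, 0.28, 0.17, 0.14),
    "Sandawe":   (0.15, 0.25, 0.14, 0.12),
    "Mbuti":     (0.26, 0.36, 0.25, 0.22),
    "Bantu":     (0.19, 0.28, 0.18, 0.14),
}


def mean_distance(values):
    """Unweighted mean of a cluster's Fst values to the reference clusters."""
    return sum(values) / len(values)


# Largest mean distance first.
ranked = sorted(
    FST_TO_REFERENCE,
    key=lambda name: mean_distance(FST_TO_REFERENCE[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name:10s} {mean_distance(FST_TO_REFERENCE[name]):.3f}")
```

With these values San and Mbuti come out on top, Biaka next, Bantu and West African close together, then Nilotic, with Fulani and Sandawe at the bottom. An unweighted mean is a crude summary, so the ordering matters more than the exact numbers.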

As always we need to remember that these probably aren’t pure concrete real ancestral groups. I have no hesitation in presuming some low level consistent gene flow over time between the western Mediterranean groups of which Mozabites are part and some of the Nilotic populations in north-central Africa. This equilibration of gene frequencies would reduce the Fst value naturally. Second, the relative closeness of the Sandawe cluster jumped out at me initially when I looked at the African data. It just strikes me as weird.

Here’s Wikipedia on the Sandawe:

The Sandawe are an agricultural ethnic group based in the Kondoa district of Dodoma Region in central Tanzania. In 2000 the Sandawe population was estimated to number 40,000.

The Sandawe language is a tonal language with clicks, apparently related to the Khoe languages of southern Africa. Recent research suggests that the ancestors of the Khoe were pastoralists, and migrated into southern Africa from the northeast, perhaps from the region of the modern Sandawe.

But the Sandawe don’t seem to be that close to the South African Bushmen samples. Here’s a multidimensional scaling of the Fst relationships of selected inferred ancestral African groups (weight the x-axis more):
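For readers who want to reproduce this kind of plot from the Fst table, here is a rough sketch of classical (Torgerson) MDS in plain Python. Which clusters to include and the use of classical MDS via power iteration are my assumptions; the post does not say which MDS variant or software produced the plots.

```python
import math


def classical_mds(dist, dims=2, iters=200):
    """Classical MDS of a symmetric distance matrix, via power iteration."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centre the squared distances (symmetry assumed, so the
    # column means equal the row means).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Deterministic, non-constant unit start vector.
        v = [float(i + 1) for i in range(n)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient estimates the eigenvalue for v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:
            break  # no positive variance left (Fst need not be Euclidean)
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Sub-matrix copied from the Fst table; the choice of clusters is illustrative.
labels = ["San", "Mbuti", "Biaka", "Sandawe", "W African"]
fst = [
    [0.00, 0.13, 0.11, 0.13, 0.13],
    [0.13, 0.00, 0.09, 0.13, 0.12],
    [0.11, 0.09, 0.00, 0.09, 0.07],
    [0.13, 0.13, 0.09, 0.00, 0.07],
    [0.13, 0.12, 0.07, 0.07, 0.00],
]
for name, (x, y) in zip(labels, classical_mds(fst)):
    print(f"{name:10s} {x:+.3f} {y:+.3f}")
```

The first coordinate carries the most variance, which is why the x-axis deserves more weight. The sign of each axis is arbitrary, so a plot may appear mirrored relative to another run.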

An aspect of PCA plots which always jumps out at you is the gap between African groups and non-African ones, often spanned by populations which have likely recent admixture. One hypothesis to explain this is that there’s been little gene flow between Africa and the rest of the world since the Out of Africa event, probably due to ecology (the Sahara). But here’s another explanation: the Bantu expansion has wiped clean much of the genetic variation of central and eastern Africa, the very variation which might span in part the African vs. non-African gap. The archaeology and anthropology indicate that both the groups currently dominant in much of eastern Africa and down to the south, the Bantu and Nilotic peoples, are intrusive on the scale of the past 3,000 years. So groups like the Hadza and the Sandawe are presumed to be relics of the older cultural and genetic variation. This may be why the Sandawe are closer to Eurasians than other African groups once you control for clear likely admixture (e.g., the Fulani). Or, it may be that the Sandawe themselves have an older admixture event due to back-migration from Eurasia….

Finally, let me leave you with a bunch of MDS plots which visualize the Fst differences.


CATEGORIZED UNDER: Genetics, Genomics
  • onur

    Interesting results. Though it would be better if you included Hadza and Khoe in your analyses (especially Fst). BTW, what is your general take on the genetics of Afro-Asiatic (including Semitic among others) speaking Blacks?

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    Interesting results. Though it would be better if you included Hadza and Khoe in your analyses (especially Fst).

    :-) so you and german can have a long discussion? i kind of wondered about doing that too.

    BTW, what is your general take on the genetics of Afro-Asiatic (including Semitic among others) speaking Blacks?

    the hausa are only marginally modified from yoruba and igbo. that seems some sort of elite diffusion of dialect, the inversion of the fulani, who seem *more* extra-sub-saharan than the hausa, but speak a niger-congo language. i guess i lean to a north african/west asian origin for the group, but my confidence is pretty shaky at this point.

  • onur

    so you and german can have a long discussion?

    Why not? :D Joking aside, I am much more open to different possibilities than I might have seemed to be in my discussions on this blog.

    the hausa are only marginally modified from yoruba and igbo. that seems some sort of elite diffusion of dialect, the inversion of the fulani, who seem *more* extra-sub-saharan than the hausa, but speak a niger-congo language. i guess i lean to a north african/west asian origin for the group, but my confidence is pretty shaky at this point.

    My general impression is that Afro-Asiatic Blacks are a genetically pretty heterogeneous bunch, they have no unity (even without the Semitic speaking ones).

  • http://washparkprophet.blogspot.com ohwilleke

    The observations in the original post on the Sandawe are very insightful, particularly when coupled with the recent observation that the oldest Y-DNA hg A/hg B lineages are now found in East Africa.

    * * *

    Another big picture observation worth making is that Bantu, unsurprisingly given Bantu origins, has an Fst distance from West African (0.05) that is lower than any other in the genetic distance chart and West African admixture is shown everywhere Bantu ancestry is known to exist even though we know that the sole source of West African admixture in the non-West African populations is almost exclusively Bantu.

    If one merges the Bantu and West African components as one category, you are left with three black African components that are roughly equidistant – West African, Nilotic and Sandawe, and the pruned K chart looks a lot more straightforward.

    * * *

    The Fulani are a population that I expect from experience to be pretty atypical of non-Bantu Niger-Congo language speakers. This is unsurprising because the Fulani pretty much span across the entire West African boundary between Afro-Asiatic linguistic or Nilo-Saharan linguistic areas and Niger-Congo linguistic areas across the African Sahel, buffering the rest of West Africa from direct contact with outside population influences from the North. To use a crude analogy, they are the welcome mat at the front door that keeps the rest of the Niger-Congo homeland from getting muddy. Unraveling the demographic histories of populations in boundary areas is generally going to be harder than doing it for core areas.

    Other populations and geographic features buffer most non-Bantu Niger-Congo language speakers from much direct interactions with populations to the East. The Congo jungle and Bantu populations seem to be buffer to the South.

    There is also a quite recent modern trend of ethnic identity fusion between the Afro-Asiatic Chadic language speaking peoples such as the Hausa and the Niger-Congo language speaking Fulani due to their common interests as pastoralists of the Sahel that is leading to higher rates of admixture than one might otherwise expect.

    * * *

    The near total absence of West African/Bantu components in Ethiopian populations is surprising. While the combined SW Asian/Mozabite share is about what I would expect in Ethiopia, the fact that all of the Subsaharan African contribution can be attributed to Nilotic and Sandawe components is not. Low resolution uniparental markers do not so unequivocally rule out West African/Bantu contributions in Ethiopia, and usually autosomal measures make populations look more rather than less admixed than uniparental ones do.

    There is no component that strongly distinguishes Chadic language speaking peoples (such as the Mada) from a generalized SW Asian/Mozabite or Nilotic or West African component.

    This seems to suggest that the West African/Nilotic/SW Asian and Mozabite divide may have roots in deep African population structure; but that the Afro-Asiatic language family does not have a single coherent population structure. There is no unified single component that unites the Semitic, Berber, Chadic and Cushitic language families, unlike Khoisan, Sandawe, Niger-Congo and Nilotic language families which each have a clearly identifiable common genetic thread that unifies almost all speakers of those languages, at least at the K=12 level.

    Echoing onur’s comment no. 3, the genetics seem to support a theory of Afro-Asiatic language expansion in which the connections between its major linguistic subfamilies (which have long been controversial) are cultural rather than resulting from demic expansions (although demic expansion is visible within some of the subfamilies). There is very little genetic overlap between the historically Semitic language Saudis, the historic Berber language Mozabites, and the Chadic language speaking Mada.

    The genetic contribution from SW Asian associated with Ethio-Semitic in Ethiopia is pretty distinctive, although intra-Ethiopian breakdowns seem to show a residual SW Asian element in Cushitic populations apart from the Ethio-Semitic SW Asian element. So, Ethiopia is a bit trickier to parse, as is the historically Coptic language family area.

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    #4, thanks for that comment. i am not too versed in the ethnography of sub-saharan africa, so i get confused a lot ;-)

  • Lank

    The near total absence of West African/Bantu components in Ethiopian populations is surprising. While the combined SW Asian/Mozabite share is about what I would expect in Ethiopia, the fact that all of the Subsaharan African contribution can be attributed to Nilotic and Sandawe components is not. Low resolution uniparental markers do not so unequivocally rule out West African/Bantu contributions in Ethiopia, and usually autosomal measures make populations look more rather than less admixed than uniparental ones do.

    While there are mtDNA haplogroups in Ethiopia that are attributed to ancient West African gene flow (for example, mtDNA haplogroup L2b), these are ancient enough that they can be distinguished from West African lineages. The reason for the lack of West African/Bantu autosomal affinities in the Horn of Africa is that the ancient admixture from other parts of Africa is already adequately described by the Sandawe/Nilotic clusters, populations who have higher ancient West African contributions if uniparental markers are to be believed.

    Another reason is that Bantu influence, which came to other parts of East Africa relatively recently, is virtually nonexistent. No uniparental markers in Ethiopia have been linked to the Bantu expansion, and the mtDNA haplogroups that are linked with West Africa are not associated with Bantus. If I recall correctly, one single exception to this rule has been found among the many Ethiopians that have been sampled; a Y-DNA E1b1a sample in a miscellaneous southern Ethiopian group of samples. Not a surprise that these exceptions would exist there due to Kenyan proximity.

  • http://dioegenesartemis.blogspot.com/ Diogenes

    I think there’s definitely a cline in North/East/West Africa with relations through the Green Sahara.
    Maybe Mozabites really are something like the “Fulani from the North”. They don’t appear as their own component because they’re unadmixed, but rather because among their admixture (mostly Western Asian/Egyptian) they have a distinctive ancestral population not present elsewhere (except in other North Africans and the Fulani). I think this might be the western Saharan refugee population.
    Also this population, or an admixed population with it as an ancestral component, seems to have migrated to Iberia probably during the early Neolithic, and is now a small component here. This aggregated into “Nile Core” in my analysis, and is definitely related to it, but further along the cline.
    The Sandawe are highly unusual in being agriculturalists despite their San like language, and maybe they are partly derived from Southeastern Green Saharan populations.
    It appears the Saharan pump has been an important factor not only in connecting Africans to the rest of the World, but as a mechanism provoking vast population movements, exchanges and genetic change in Africa itself.

  • onur

    The Sandawe are highly unusual in being agriculturalists despite their San like language, and maybe they are partly derived from Southeastern Green Saharan populations.

    The Sandawe began to settle and adopt agriculture only beginning from the colonial times, during the late 19th century to be specific (they were all fulltime hunter-gatherers before then), thus through Western influence directly or indirectly and their transition to agriculture and settled life completed as recently as the 1970s and only with the compulsion of the Tanzanian government over them to settle and adopt agriculture. Despite that, they still preserve much of their hunter-gatherer ways.

    http://areainfo.asafas.kyoto-u.ac.jp/english/activities/fsta/17_yatsuka/root.html

  • Eze

    ’’The near total absence of West African/Bantu components in Ethiopian populations is surprising. While the combined SW Asian/Mozabite share is about what I would expect in Ethiopia, the fact that all of the Subsaharan African contribution can be attributed to Nilotic and Sandawe components is not. Low resolution uniparental markers do not so unequivocally rule out West African/Bantu contributions in Ethiopia, and usually autosomal measures make populations look more rather than less admixed than uniparental ones do.’’

    The Bantu expansion only reached Kenya roughly by the 1st century AD. That’s relatively recent. The Bantus never penetrated further than Lake Turkana and the Tana River. Thus Horn African populations were geographically well separated from Bantus for thousands of years. Bantu expansion associated haplogroups are also exceedingly rare in the Horn of Africa (north of the Lake Turkana – Tana river boundary).

  • Eze

    So groups like the Hadza and the Sandawe are presumed to be relics of the older cultural and genetic variation. This may be why the Sandawe are closer to Eurasians than other African groups once you control for clear likely admixture (e.g., the Fulani). Or, it may be that the Sandawe themselves have an older admixture event due to back-migration from Eurasia….

    Naturally the indigenous (pre-Bantu) populations of East Africa (such as the Sandawe) should be genetically closer to Eurasians compared to South- (Khoisan) or West (Niger-Congo) Africans , as we know that the Out-of-Africa migrants branched off from archaic East Africans (L3m and L3n).

  • http://www.kinshipstudies.org German Dziebel

    “Naturally the indigenous (pre-Bantu) populations of East Africa (such as the Sandawe) should be genetically closer to Eurasians compared to South- (Khoisan) or West (Niger-Congo) Africans , as we know that the Out-of-Africa migrants branched off from archaic East Africans (L3m and L3n).”

    But since indigenous South Africans are likely derived from indigenous East Africans (http://mbe.oxfordjournals.org/content/early/2011/04/04/molbev.msr089.short), the proximity of the latter to Eurasians likely suggests that the migration was into Africa and not out of Africa. Eurasians > East Africans (Maasai, Hadza, Sandawe) > South Africans (San).

  • Charles Nydorf

    I like the idea that neolithic expansions in east and central Africa are responsible for the discontinuity between sub-Saharan populations and those of the rest of the world.

  • onur

    But since indigenous South Africans are likely derived from indigenous East Africans (http://mbe.oxfordjournals.org/content/early/2011/04/04/molbev.msr089.short), the proximity of the latter to Eurasians likely suggests that the migration was into Africa and not out of Africa.

    No, it says nothing about the direction of the migration between Africa and Eurasia. It has long been thought by many Out-of-Africa supporters that East Africa is the homeland of modern humans and by even more of them that West Africans (including Bantus from anywhere) and South Africans are genetically more distant to proto-Eurasians and consequently to modern Eurasians than non-Bantu East Africans (even when we exclude Semitic and even Afro-Asiatic speakers among them) are.

    BTW, the genetic position of Hadza isn’t clear, there are many conflicting results. Their extremely small population size and bottlenecks complicate the matter even more.

  • http://www.kinshipstudies.org German Dziebel

    “West Africans (including Bantus from anywhere) and South Africans are genetically more distant to proto-Eurasians and consequently to modern Eurasians than non-Bantu East Africans (even when we exclude Semitic and even Afro-Asiatic speakers among them) are.”

    In the case of South African Khoisans, more distant because more basal than East Africans – that’s one of the tenets of the out-of-Africa theory. There was a time when mtDNA M1 found only in East Africa was considered to be the basal branch within the non-African macrohaplogroup M. Then M1 was demonstrated to be the result of a back-migration. Now, at least from the point of view of Y-DNA the A and B branches found among Khoisan are thought to be derived from East Africa. Hence, another “back-migration” into – now – South Africa. We could of course think of East Africa as a place from which all migrations take place at different times, but, under a different demographic scenario, Africa is not a refuge for archaic lineages but a sink for all kinds of lineages, and South Africa is even more of a sink – more distant from Eurasia because more derived than East Africa.

  • onur

    I think we cannot pin down exactly in which region of Africa modern humans originated, if indeed they originated in a specific region, based on available data. Current African populations do not represent the original modern humans (who probably lived in Africa and/or somewhere nearby), as Africa too has changed much genetically over time.

  • http://www.kinshipstudies.org German Dziebel

    “I think we cannot pin down exactly where in Africa modern humans originated, if indeed they originated in a specific region, based on available data.”

    Following the same logic, we can’t pinpoint on which continent humans originated. If genetics is a tool that tells us that humans originated in Africa, then this very tool should be able to tell us which part of Africa they originated in. If the principle of decreasing diversity and serial bottlenecks works on the global scale, it should work on the regional scale. Maybe we should just admit that we must have come from terra incognita.

  • onur

    Following the same logic, we can’t pinpoint on which continent humans originated. If genetics is a tool that tells us that humans originated in Africa, then this very tool should be able to tell us which part of Africa they originated in. If the principle of decreasing diversity and serial bottlenecks works on the global scale, it should work on the regional scale.

    Following your logic, if we are able to detect the continent of origin, then we should at the same time also be able to detect the exact geographical coordinates of origin on the scale of a village(!), which is of course nonsensical. So it is completely a matter of scales. As new data accumulate, we get to finer scales on the issue of modern human origins.

  • Eze

    But since indigenous South Africans are likely derived from indigenous East Africans (http://mbe.oxfordjournals.org/content/early/2011/04/04/molbev.msr089.short), the proximity of the latter to Eurasians likely suggests that the migration was into Africa and not out of Africa. Eurasians > East Africans (Maasai, Hadza, Sandawe) > South Africans (San).

    The available complete genome sequence of a Southern Khoisan man suggests they are by far the most distant African population compared to Eurasians. The split between proto-SSA (x S-Khoisan) and -OOA likely occurred after the split between proto-S-Khoisan and proto-SSA (x S-Khoisan)+OOA.

  • http://www.kinshipstudies.org German Dziebel

    “proto-SSA”

    What does SSA stand for? Sub-Saharan African? If so, then I think South Khoisan developed their specificity after breaking off from the rest of Sub-Saharan Africans who in turn retained their affinity with Eurasians. That’s what put Khoisan in a more distant position from Eurasians. It’s unlikely that the Khoisan remained underived and isolated during the past 200,000 years. They are fully modern linguistically, culturally and biologically. Also, in the recent paper that onur and I discussed ad infinitum, Hadza exceeded South Khoisan in Fst from Europeans (Tuscans). Hence, it’s likely that this is going to vary between different African and non-African groups.

  • Eze

    Yes, that’s what I meant. It’s either that or the South Khoisan admixed with an archaic South African group which is no longer present in unadmixed form, increasing their genetic divergence.

    PCA suggests the Hadza are closer to Eurasians than the South Khoisan on the first dimension, but Fst tells us they are not closer. However, Fst is prone to sample size bias. IMO, a better method would be ASD.

  • http://www.kinshipstudies.org German Dziebel

    “It’s either that or the South Khoisan admixed with an archaic South African group which is no longer present in unadmixed form, increasing their genetic divergence.”

    Yes, it’s possible.

    “Fst is prone to sample size bias. IMO, a better method would be ASD.”

    I’d love to see how different the results of Fst and ASD are in Africa, with a Hadza sample included. A global comparison of Fst vs. ASD distance methods applied would also be good to see. Do you have any paper, etc. in mind that did Fst next to ASD for Africa or the world? What I know is that it’s typical for small, isolated populations to have huge Fst. Hadza’s Ne is much smaller than that of South African Khoisans.

  • Eze

    It would indeed be interesting to compare Fst to allele sharing distance (ASD) scores of various African groups. I do not have any sources atm. Perhaps it’s an idea for Razib to look further into. ASD tools are freely available.


