In the post yesterday I reported what was generally known about the Horn of Africa, that its populations seem to lie between those of Sub-Saharan African and Eurasia genetically. This is totally reasonable as a function of geography, but there are also suggestions that this is not simply a function of isolation by distance (i.e., populations at position 0.5 on the interval 0.0 to 1.0 would presumably exhibit equal affinities in both directions due to gene flow). For example, you observe the almost total lack of “Bantu” genetic influence on the Semitic and Cushitic populations of the Horn of Africa, and the lack of Eurasian influence in groups to the south and west of the Horn except to some extent the Masai.
Tacking horizontally in terms of discipline, over the past few generations there has been a veritable cottage industry making the case for the recent origin of many ethno-linguistic populations through a process of cultural self-creation. Clearly there are many cases of this, some of them studied in depth by anthropologists (e.g., the shift from Dinka to Nuer identity). But there has been an unfortunate tendency to over-generalize in this direction. In some ways this is peculiar insofar as these models presuppose the infinite plasticity of culture without observing the sharp and strong norms which those very same phenomenon can enforce. The genetic isolation of non-Muslims in the Middle East after the rise of Islam seems rather well validated by the evidence from genomics. The norms of both Muslims and non-Muslims strongly biased them toward endogamy, and nature of Islamic hegemony and domination was such that Muslims were the ones who were likely to have cosmopolitan affinities with the “Islamic international.” In contrast, non-Muslim minorities began a long process of involution after the Islamic Arab conquests, only disrupted in the past century by emigration and to a lesser extent emancipation.
So back to the Horn of Africa. The vast majority of the people of the Horn of Africa speak an Afro-Asiatic language. Arabic and Hebrew are the most famous members of this group, but it is a very broad classification, ranging from the dialects of the Berbers in the Maghreb all the way to ancient Akkaddian. There are two large subfamilies of particular note and interest here: Semitic and Cushitic. The map above shows the distribution within the Horn of Africa. One can “quick & dirty” summarize the pattern here by observing that Semitic languages in Ethiopia tend to be concentrated in the north-central Christian highlands, while Cushitic is found everywhere else. Additionally, there is the confluence between religion and ethnicity, as there are Cushitic Muslims (Somalis, Afar, etc.) and Cushitic Christians (many Oromo, etc.). From what I can gather many Cushitic social and political elites have had a tendency toward assimilating into an Amhara Semitic identity (Haile Selassie’s mother was a Muslim Oromo). We could therefore generate a possible model where Semitic langauges arrived late to Ethiopia and spread through elite emulation, so the difference between Semitic and Cushitic peoples should be marginal in the genomic dimension (such as the marginal differences between Hausa and Yoruba in Nigeria). Or, we could posit that the Semitic element is distinctive from a pre-existent Cushitic substratum.
To make a long story short by running more ADMIXTURE with a Horn of Africa centered data set I have discerned that one can actually differentiate Cushitic and Semitic elements in the Horn and tentatively identify them with different ancestral components. First, the technical details….
Since I started up the African Ancestry Project one of the primary sources of interest has been from individuals whose family hail for Northeast Africa. More specifically, the Horn of Africa, Ethiopia, Eritrea, and Somalia. The problem seems to be that 23andMe’s “ancestry painting” algorithm uses West African Yoruba as a reference population, and East Africans are often not well modeled as derivative of West Africans. So, for example, the Nubian individual who I’ve analyzed supposedly comes up to be well over 50% “European” in ancestry painting. Then again, I”m 55-60% “European” as well according that method! So we shouldn’t take these judgments to heart too much. Obviously something was off, and thanks to Genome Bloggers like Dienekes Pontikos we know what the problem was: the populations of the Horn of Africa have almost no distinctive “Bantu” element to connect them with West Africans like the Yoruba. Additionally, a closer inspection shows that the “Eurasian” component present in these populations is very specific as well, almost totally derived from Arabian-like sources. When breaking apart the West Eurasian populations it is no surprise that Northern Europeans and Arabians are among the most distant pairs, even excluding recent Sub-Saharan African admixture. The HapMap Utah European American sample and the Nigerian Yoruba are very suboptimal for people with eastern African background. In contrast, African Americans are a mixture of West Africans and Northern Europeans, so the ancestry painting algorithm has nearly perfect reference populations for them. The results for African Americans may not be very detailed and rich, but they’re probably pretty accurate at the level of grain which they’re offering results.
Though I’m happy to give people of Northeast African ancestry more detailed results than 23andMe, one of my motivations for the African Ancestry Project was to obtain a data set which would allow me to explore the genomic variation in the east of Africa myself. This region is a strong candidate for “source” populations for non-Africans within the last 100,000 years, and, it seems to have experienced rapid population turnover within the last 2,000-3,000 years. My data set is not particularly adequate to my ambitions, yet. But I do now have 5 unrelated Somalis. To my knowledge there hasn’t much exploration of Somali genomics using thick-marker SNP chips, so why not? N = 5 is better than N = 0 in these cases of extreme undersampling.
Before I proceed to methods and results, I want to note that I put up most of my files here. It’s a ~25 MB compressed folder with images, spreadhseets, as well as raw output from ADMIXTURE and EIGENSOFT. I hope readers will take this as an invitation to poke around themselves.