Last year a paper came out in Science which made a rather large splash, The Genetic Structure and History of Africans and African Americans by Tishkoff et al. Since it’s more than a year old, I recommend that those of you who are curious about the details of the paper but don’t have academic access go through the free registration, as you can then read it in full. Unlike Reich et al., the Science paper didn’t unveil a new method of analysis. It was the standard bread & butter, with PCAs & STRUCTURE plots & phylogenetic trees. But the coverage of populations within Africa was massive. They had a lot of results and relationships to cover, and ended up with a 100-page supplement.
I commend the whole paper to you. But there are two elements I want to highlight. First, a three-dimensional PCA plot. It has the first, second and third principal components of variation. In other words, the three independent dimensions which explain the largest shares of the genetic variation. Panel A includes all world populations, and panel B just Africans.

For panel A, PC1 = 20% of the variance, PC2 = 5%, and PC3 = 3.5%. For panel B the PCs didn’t drop off quite so much: PC1 = 11%, PC2 = 6%, PC3 = 5% and PC4 = 4%. In case you don’t know, the Hadza are Africa’s last obligate hunter-gatherers, and speak a language with clicks in it, just as the Bushmen do. The big division highlighted in this paper is that between the “indigenous” relict populations, the Hadza, Sandawe, Bushmen and Pygmies, and those who belong to the more widespread agriculturalist and pastoralist societies of Africa. Implicit within the paper is the model of a Bantu Expansion of farmers, as well as a possible later Nilotic expansion of herders (which brought the Tutsi and Maasai), in a north-south direction. In the process they assimilated and/or displaced the indigenous populations, of whom the aforementioned peoples are relict islands persisting in ecologically isolated or unfavorable domains.
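For those wondering where per-component percentages like the ones quoted above come from, here is a minimal sketch in Python with NumPy. It is not the pipeline Tishkoff et al. used. The function name `pca_variance_explained`, the simulated allele frequencies and the two toy populations are all assumptions made up for illustration. The idea is simply to center each SNP column, take a singular value decomposition, and read off each component's share of the total variance from the squared singular values.

```python
import numpy as np

def pca_variance_explained(genotypes):
    """PCA on an individuals x SNPs matrix of allele counts (0/1/2).

    Returns (scores, frac): the coordinates of each individual on every
    principal component, and the fraction of total variance per component.
    Assumes the matrix is not constant (total variance > 0).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                    # center each SNP column
    # Squared singular values of the centered matrix are proportional
    # to the variance captured by each principal component.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    frac = var / var.sum()                    # share of variance per PC
    scores = U * s                            # individuals projected onto the PCs
    return scores, frac

# Hypothetical toy data: two populations whose allele frequencies differ.
rng = np.random.default_rng(0)
n_snps = 200
freqs_a = rng.uniform(0.1, 0.9, size=n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0.0, 0.2, size=n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freqs_a, size=(20, n_snps))   # 20 diploid individuals
pop_b = rng.binomial(2, freqs_b, size=(20, n_snps))
genotypes = np.vstack([pop_a, pop_b])

scores, frac = pca_variance_explained(genotypes)
print("Share of variance, PC1 to PC3:", np.round(frac[:3], 3))
```

Real analyses usually also scale each SNP by its expected variance before the decomposition. That step is left out here to keep the sketch short.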
The map to the left shows the population coverage within this paper of African groups. The pie graphs simply show ancestral quanta as inferred by STRUCTURE. You can read the paper for the blow-by-blow. But ultimately it seems there will be a need for finer-grained coverage to the south of the equator. If the Bantu expansion is as recent as archaeologists and linguists assume, on the order of 2,000 years ago, then the gradients of genetic signals should persist. From what I can tell it is assumed on both genetic and phenotypic grounds that the Xhosa have a higher load of Khoisan ancestry than the Zulu or Tswana. The Bantu Expansion is recent enough that the semi-legendary Phoenician circumnavigation of Africa would have encountered many Khoisan peoples along the eastern coast.
Below is a selection of figures from the above paper. After selecting an image it is probably best to hit F11 for “Full Screen” if you aren’t on a very big monitor (you can copy image location and view it in a separate window as well).

Razib Khan’s degrees are in biochemistry and biology. He has blogged about genetics since 2002, previously worked in software development, is an Unz Foundation Junior Fellow and lives in the western US. He loves habaneros.

August 22nd, 2010 at 5:49 pm
I don’t mean to bring up a tangential point to the post, but why does the field of human genetics use PCA to visualize relationships? When I see plots like those shown here that have a ‘geometric pattern’ to them (the sharp right angles; another common pattern is a Y-shape), that tells me that there are lots of samples with zeros for many of the Y-variables (i.e., alleles that are unique to certain populations). Thus, the spatial arrangement of the points is largely an artifact of an inappropriate method: how does one calculate a correlation matrix when many of the things one is correlating have values of zero?
If one really was keen on using PCA, one could calculate a pairwise distance matrix and then use that instead of the correlation matrix (Principal Coordinates Analysis).
Just curious (Really. I’m not implying that TEH DARWINISMZ ARE FALSIFIED! or anything ludicrous like that).
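The suggestion in the comment above, building a pairwise distance matrix and ordinating that, is classical Principal Coordinates Analysis. Here is a minimal sketch with NumPy. The function name `pcoa`, the four toy genotypes and the choice of a Manhattan distance on allele counts are all assumptions for illustration and do not come from the paper. It double-centers the squared distances (Gower's centering) and takes the leading eigenvectors.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical PCoA (metric MDS) on a square matrix of pairwise distances.

    Returns (coords, eigvals): coordinates of each sample on the first
    n_axes axes, and all eigenvalues sorted from largest to smallest.
    """
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # B is symmetric
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean distances can give negative eigenvalues; clip them to 0.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    return coords, eigvals

# Hypothetical toy data: 4 individuals x 5 SNPs, allele counts 0/1/2.
genotypes = np.array([
    [0, 0, 1, 2, 0],
    [0, 1, 1, 2, 0],
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 2],
])
# Manhattan distance between every pair of individuals.
dist = np.abs(genotypes[:, None, :] - genotypes[None, :, :]).sum(axis=2)

coords, eigvals = pcoa(dist)
print(coords)
```

When the distances are plain Euclidean distances computed on the centered data, this gives the same configuration of points as PCA, so any difference between the two methods comes down to the choice of distance.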
August 23rd, 2010 at 6:19 am
“The big division highlighted in this paper is that between the “indigenous” relict populations, the Hadza, Sandawe, Bushmen and Pygmies.” There’s no doubt that Hadza, Sandawe and San are distinct from other African populations. The authors do include Pygmies in the mix, but this inclusion looks forced and derived from a long-standing belief that Pygmies, because they are short, must have lost their original languages. At some point the authors reveal the following: “Both language and geography explained a significant proportion of the genetic variance, but differences exist between and within the language families (table S5 and fig. S33, A to C) (4). For example, among the Niger-Kordofanian speakers, with or without the Pygmies, more of the genetic variation is explained by linguistic variation (r² = 0.16 versus 0.11, respectively; P < 0.0001 for both) than by geographic variation (r² = 0.02 for both; P < 0.0001 for both).” It looks like Pygmy genetic and linguistic affiliations are in sync, which means that either the Niger-Congo family is very old (and the Bantu expansion took place much earlier than we think), or, more likely, that Pygmies have always spoken Niger-Congo (and Bantu) languages and are not a relic population but a foraging “arm” of the Niger-Congo expansion. In mtDNA and Y-DNA terms, Pygmies and Bantu belong to different closely related subclades of the same clades, which is in perfect alignment with their linguistic kinship. In Razib’s “Genetic Distance Tree 1” Pygmies and Bantu again cluster next to each other, with San populations being an outgroup for both.
Tishkoff et al. observe a close correspondence between genes and languages in Africa, and this is one good case in which languages and genes tell the same story. This story is different, however, from the common belief that Pygmies are relic African populations with roots in the Upper and Middle Pleistocene.
Another quote of note: “Thus, modern humans have existed continuously in Africa longer than in any other geographic region and have maintained relatively large effective population sizes, resulting in high levels of within-population genetic diversity (1, 2). Africa contains more than 2000 distinct ethnolinguistic groups representing nearly one-third of the world’s languages (3). Except for a few isolates that show no clear relationship with other languages, these languages have been classified into four major macrofamilies…” The fact that languages and genes in Africa seem to be closely aligned with each other, and that African linguistic diversity falls into a limited number of families, again suggests that African genetic diversity may not be as old as is usually assumed. In contrast, in America between-population diversity is high (see Razib’s recent posts on Fst) and languages fall into 140 language families, which may suggest high antiquity.
August 29th, 2010 at 1:40 am
[...] Last weekend I mentioned a paper, The Genetic Structure and History of Africans and African Americans, which had the best coverage of disparate African populations we’ve seen so far. The map to the left shows the various ancestral population clusters inferred from the samples they had. Really the only failing is that they didn’t have samples from Angola, Zambia, Zimbabwe and Mozambique. Unfortunately, that’s not totally trivial. These are regions which were affected by the Bantu Expansion, with southern Angola in particular still having remnants of Khoisan language speakers which likely attest to the pre-Bantu populations. Luckily for us innovation and scientific ingenuity are such that minor questions can quickly be answered because of how cheap the basic methods have become. A new paper in The European Journal of Human Genetics tackles Mozambique in particular, and discerns a heretofore unknown possible population cluster. A genomic analysis identifies a novel component in the genetic structure of sub-Saharan African populations: Studies of large sets of single nucleotide polymorphism (SNP) data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2841 SNPs in 12 sub-Saharan African populations, including a previously unsampled region of southeastern Africa (Mozambique). We show that robust results in a world-wide perspective can be obtained when analyzing only 1000 SNPs. Our main results both confirm the results of previous studies, and show new and interesting features in sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, much beyond what would be expected by geography. Hunter-gatherer populations (Khoisan and Pygmies) show a clear distinctiveness with very intrinsic Pygmy (and not only Khoisan) genetic features. Populations of the West Africa present an unexpected similarity among them, possibly the result of a population expansion. Finally, we find a strong differentiation of the southeastern Bantu population from Mozambique, which suggests an assimilation of a pre-Bantu substrate by Bantu speakers in the region. [...]