Proper methods and false results

By Razib Khan | May 23, 2011 12:07 am

The Pith: Honorable intent and punctilious adherence to proper form and method does not guarantee a set of results which flesh out a genuine phenomenon. Much of science is tragic.

Most of the time I point to and review papers on this weblog which excite me. But in the interests of “balance” and dampening the bias toward material I find interesting and salient, I thought it would be worthwhile to look at a paper which I didn’t find too interesting. It’s in the Journal of Human Genetics, part of the Nature Publishing Group empire. Also, it is open access, so you can read it yourself and make your own individual judgments.

The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities:

India’s role in the dispersal of modern humans can be explored by investigating its oldest inhabitants: the tribal people. The Soliga people of the Biligiri Rangana Hills, a tribal community in Southern India, could be among the country’s first settlers. This forest-bound, Dravidian speaking group, lives isolated, practicing subsistence-level agriculture under primitive conditions. The aim of this study is to examine the phylogenetic relationships of the Soligas in relation to 29 worldwide, geographically targeted, reference populations. For this purpose, we employed a battery of 15 hypervariable autosomal short tandem repeat loci as markers. The Soliga tribe was found to be remarkably different from other Indian populations including other southern Dravidian-speaking tribes. In contrast, the Soliga people exhibited genetic affinity to two Australian aboriginal populations. This genetic similarity could be attributed to the ‘Out of Africa’ migratory wave(s) along the southern coast of India that eventually reached Australia. Alternatively, the observed genetic affinity may be explained by more recent migrations from the Indian subcontinent into Australia.

To be blunt about it I think the researchers here just randomly stumbled onto a weird result which happened to align with some plausible preconceptions. This happens all the time, and is responsible for the unfortunate confirmation bias which plagues science. Researchers know very well what the expected results are, and may unconsciously or consciously sift through their data for a set of facts which align well with their theoretical preconceptions. In this case it isn’t quite so bald, as there are no orthodoxies, but a set of alternative hypotheses which go back a century or so.


The back story is the idea of the Australoid race, first conceived of by Thomas H. Huxley. To the left is a map which illustrates the original divisions of mankind as inferred by Huxley from his catalog of human characters. I haven’t included the labels because they should be rather intuitive. Observe the similar shading of Australia and a portion of India. This is, as economists might say, a ‘stylized fact’: it captures the basic nugget of truth, but shouldn’t be taken as a strict concrete representation of reality. The fact is that it is obvious that upon visual inspection many South Asians, especially those termed adivasi, the “tribal” population which has customarily existed on the margins or outside of the Hindu caste system, bear some resemblance to Australian Aborigines. Additionally some anatomists adduced that there were similarities in the skeletal morphology and the like. I can’t evaluate that, but there’s a long tradition in biological anthropology which asserts that there is some connection between the peoples of Australia, and a substrate element in South Asia. Many South Asians I know can see this resemblance as well, so it isn’t as if this was “invented” by Thomas H. Huxley from his fertile mind.

More recently there has been the idea that the Out of Africa migration was characterized by a “southern wave” which skirted the coastlines of the Indian Ocean, and pushed all the way to Australia. The reason that this rapid maritime migration has been posited is that the residence of modern humans in Australia is of long standing, on the order of ~50,000 years. In a traditional genetic model of the emergence of modern humanity that left barely any time between the rise of modern humans in Africa and their arrival in Australia (in contrast, anatomically modern humans didn’t arrive in Europe until after 40,000 years before the present, and perhaps a bit later). Obviously any migration of humans from Africa to Australia would have had to touch base in India. Therefore genetic anthropologists went looking; in particular, they focused on the mitochondrial and Y chromosomal lineages. Eventually they found what they were looking for. At low frequencies in India they detected possible connections to Australian haplogroups. In other words, the ancestors of Australian Aborigines who had no doubt touched down in India left some descendants in India.

The idea of a southern migration of neo-Africans ~50,000 years ago naturally allowed one to bridge Huxley’s model of an “Australoid race” derived from pre-cladistic taxonomy to the methods of modern genetics. And conveniently for the purposes of time depth the features of the Australoid race are more clearly represented amongst the tribal and low caste populations which are also presumed to have deeper roots in South Asia.

There are two major problems which jump out at me here though. The first is somewhat theoretical: how exactly does phenotypic continuity get maintained between populations which diverged ~50,000 years ago? According to the older model of modern human origins this isn’t really that much later than the last common divergence between all non-Africans, and perhaps even Africans. Did the Australian Aborigines and Indian tribal populations enter into a period of phenotypic stasis? There is a rejoinder here: the connections between Indian tribal populations and Australian Aborigines are far more recent. The arguments, theses, and data to support this conjecture are all laid out in the paper. The most extreme adherents have suggested that in fact a migration occurred to Australia within the last ~5,000 years, which brought the dingo, and that that migration is the common source population of Australian Aborigines and Indian tribes. Both the genetic and the archaeological data which might support this model are tendentious. The discussion in the text of the paper doesn’t go into the contention and frank politicization which occurred in regards to these theories in Australia. And why should it? It’s a journal of human genetics, not one of the social construction of science. But it’s important to keep in mind.

But the big issue is that, as they note, surveys of hundreds of thousands of SNPs don’t really show a connection between Aborigines and South Asians which is particularly supportive of any strong affinity between the two groups. Projects such as the Harappa Ancestry Project have huge data sets of South Asians, including tribal Indians. At low K’s there is some affinity between Papuans and South Asians, but this tends to go away at higher K’s. I do think there is some continuity and relationship between Oceanians (Australian Aborigines & Melanesians) and the genetic substrate of South and Southeast Asia, but it is far too attenuated to substantiate the persistence of an Australoid race.

So what’s going on with the results in this paper? As I note in the title, the methods are in my opinion kosher from what I can tell. But the conclusion just doesn’t seem credible. How to explain the failure of valid methods? First, they use 15 loci. Granted, these are hypervariable regions of the genome which should be ancestrally informative. But it’s still 15 markers! Very importantly the authors note in regards to the Australian Aborigine affiliated Indian tribe:

For example, they possess the lowest number of alleles (115) of all the reference worldwide populations examined…They also display the lowest average observed heterozygosity (0.75643)…The high degree of genetic homogeneity observed could also have been caused, in part, by their low status in the social hierarchy.

I think a plausible explanation for their genetic homogeneity is that like many Indian tribes they have low effective population sizes, and so lost most of their genetic variation because of drift. Take 15 markers, crank them through drift, and I don’t think it is implausible that you could random walk a population far away from its neighbors. Indian tribal populations in other analyses seem to exhibit a repeated pattern of strange results because of excessive inbreeding or some sort of population bottleneck in the recent past (think about how the Kalash of Pakistan often break out in their own genetic cluster).
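
A minimal sketch of this drift argument, written as a toy Wright-Fisher simulation in Python. It is not the paper's analysis. Only the count of 15 loci comes from the study; the number of alleles per locus, the population sizes, the number of generations, and the squared-difference distance are all illustrative assumptions. Two daughter populations start from the same ancestral allele frequencies, one small and one large, and drift independently.

```python
import random
from collections import Counter

N_LOCI = 15        # marker count from the paper; everything else is assumed
N_ALLELES = 8      # hypothetical number of alleles per STR locus
GENERATIONS = 50   # hypothetical number of generations since the split


def random_frequencies(rng, n_alleles):
    """Draw arbitrary starting allele frequencies that sum to 1."""
    draws = [rng.expovariate(1.0) for _ in range(n_alleles)]
    total = sum(draws)
    return [d / total for d in draws]


def drift(freqs, n_copies, generations, rng):
    """Wright-Fisher drift: each generation resamples n_copies gene copies."""
    alleles = list(range(len(freqs)))
    for _ in range(generations):
        sample = rng.choices(alleles, weights=freqs, k=n_copies)
        counts = Counter(sample)
        freqs = [counts[a] / n_copies for a in alleles]
    return freqs


def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def distance(pop_a, pop_b):
    """Mean over loci of summed squared frequency differences (a toy distance)."""
    per_locus = [
        sum((p - q) ** 2 for p, q in zip(fa, fb))
        for fa, fb in zip(pop_a, pop_b)
    ]
    return sum(per_locus) / len(per_locus)


def mean_heterozygosity(pop):
    """Average expected heterozygosity across all loci of one population."""
    return sum(heterozygosity(f) for f in pop) / len(pop)


def simulate(seed=1, small_copies=60, large_copies=2000):
    """Drift a small and a large population away from one shared ancestor."""
    rng = random.Random(seed)
    ancestor = [random_frequencies(rng, N_ALLELES) for _ in range(N_LOCI)]
    small = [drift(f, small_copies, GENERATIONS, rng) for f in ancestor]
    large = [drift(f, large_copies, GENERATIONS, rng) for f in ancestor]
    return {
        "small_distance": distance(ancestor, small),
        "large_distance": distance(ancestor, large),
        "small_het": mean_heterozygosity(small),
        "large_het": mean_heterozygosity(large),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Under these made-up parameters the small population should end up both much farther from the shared ancestor and less heterozygous than the large one, which is the pattern the drift explanation predicts. With only 15 loci, where the small population happens to land relative to any third population is largely a matter of chance.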

This brings me back to my suspicion that this is just a false positive which bubbled up at the confluence of a preconceived model and the noise which is going to be an issue in any of these statistical genetic analyses. The authors know that Indian tribes should cluster with Australian Aborigines in some models. So when they see one of their several Indian tribal populations clustering with Aborigines on their 15 marker diagnostic, naturally this result is slotted into the prefab model. But as I have hinted before if you “mix & match” the populations in your data, modulate the marker thickness, and tweak parameters enough, you can “stumble” upon many explanatory models using these algorithms which infer genetic distance and ancestry. I suspect that other research teams using other tribal populations with other STRs may have stumbled onto weirder results, such as a cluster of Indian tribals with Sami or Greenlanders, which were just assumed to be ridiculous on the face of it. This particular result is obviously not ridiculous on the face of it, but I think looking at the full sweep of other genetic results we can discard it as being a good representation of the total genome affinity between these two populations. A reductio ad absurdum of this emphasis on a small marker set was the old attempts to construct races based on blood group distributions!

Finally, what about old Thomas H. Huxley and his Australoid race? I think that it’s probably convergent evolution. Humans come in a range of colors from pink to very dark brown. They don’t come in red or yellow or green. They’re tall or short. Their hair is curly or straight. And so on. In the finite set of possible variables you’re going to have many human populations which arrive at a convergence of traits, and so resemble each other despite lack of particularly recent common ancestry. The Ainu of Japan were once assumed to be a distant branch of the family of European peoples because of their lack of the distinctive characteristics of their Japanese neighbors. Even the early classical genetic markers disabused scientists of this possibility, and more recent genetic work seems to point to a broad affinity with other Siberian populations. Similarly, despite superficial similarities between Melanesians and Africans, the two groups are not particularly close (in fact, most genetic distance measures seem to place Melanesians as more distant from Africans than West Eurasian populations, probably due to greater long term isolation).

These sorts of complications are why I’m so obsessed with emphasizing a caution about relying on a particular figure or paper as definitive on a given genetic question. In some domains results can be taken out of their proper context, but in the case of a statistical science there’s just a lot of randomness, and our pattern matching intuitions and culturally preconditioned expectations strongly predispose us to anchor onto confirming results. This is a major reason why I’m pretty dismissive and hostile to attempts to “win” arguments by dragging out a few citations. The unfortunate reality is that most results are either trivial or false, and with a search engine you can construct an argument with five supporting facts elementary school style within a few minutes.

This may “win” the argument, but you lose the war to “win” an understanding of reality.

Addendum: The undersampling of Australian Aborigine populations and South Asians in surveys of genetic variation softens the force of my critique here. It may be that the Soliga are a particularly distinctive Dravidian tribe, which preserves a very ancient element in South Asian genetic history. Honestly I kind of doubt it after seeing the rampant admixture results among all South Asians in the most recent waves of SNP-chip studies (including the amateurs who are genome blogging). A bigger issue for me is the undersampling of Australian Aborigines. There may be variation which we’re just not aware of. I doubt that that variation will be too surprising, but who knows?

Citation: Morlote DM, Gayden T, Arvind P, Babu A, & Herrera RJ (2011). The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities. Journal of Human Genetics, 56 (4), 258-69. PMID: 21307856

  • http://www.daktre.com Prashanth

    Very nice analysis. It is amazing how many instances there have been of results being ‘juiced’ up, either to match journal style or to make studies more ‘publishable’. And in any case, under-sampling is a problem with most Dravidian tribes. We hardly know anything at all about so many of the Dravidian tribal people in and around BR Hills, Nilgiris and Anamalais. If you also see the previously published collections in the map, it shows how little evidence the study sits on and how amazingly dramatic the conclusions it draws are. Although I am not well versed enough to understand the details of their analysis, with such little information (even historical) known about the local migrations of the Soligas and related tribes, I find it strange to draw such ‘certain’ conclusions as to imply that they may be the oldest. How about minimal triangulation with recent history and archaeology? To make such conclusions, I wish the authors had tried to be a bit more inter-disciplinary rather than merely relying on phylogenetics.

  • John Emerson

    *Honorable intent and punctilious adherence to proper form and method does not guarantee a set of results which flesh out a genuine phenomenon. Much of science is tragic.*

    Damn, I wish people had figured that out 50 years ago. Especially in the social sciences, we’ve suffered through half a century of methodologism, where correct methodology can validate science all by itself. The separation between vetting methods for correctness and evaluating the value of the results is often not made.

  • Ian

    In one sense, that’s the point of the game: “here’s what I found, prove me wrong”. The problem is that not everyone doing science realised that that’s the point of the game, and people trying to use science to come up with policy are even less prone to realise. And more prone to fixate on the “errors” and use that to discredit whole fields (or publish “Darwin was wrong” cover stories).

    I remember when I was a fairly new grad student, looking over a draft of a lab-mate’s thesis in which he had done a whole set of (Spearman? iirc) correlations between various site factors and tree growth. And found a few significant correlations, but discarded them based on the fact that at a 0.05 p value you’d expect at least that many spurious correlations. It felt so wrong to me at the time. And if it were my data, I’d probably still be bothered (though, of course, I’d hopefully use better statistical tools). It’s one thing for a stats prof to say that “statisticians are people who expect to be wrong 5% of the time”. It’s quite another to actually have to confront the fact that the person who’s wrong is you, not someone else.

  • http://washparkprophet.blogspot.com ohwilleke

    “how exactly does phenotypic continuity get maintained between populations which diverged ~50,000 years ago?”

    The most obvious answer would be low effective population size coupled with endogamy.

    To produce new phenotypes you need to add new alleles to the genetic mix via mutation or admixture. Mutation rates are a function of person-generations. A population with an effective size of 1,000 generates the same number of mutations in 2,000 generations as a population with an effective size of 100,000 does in twenty generations.

    Australian Aborigines have had unparalleled endogamy and pretty low effective population sizes for 50,000 years, give or take. The Soliga people probably also had low effective population sizes for most of the time period and while we don’t know precisely how endogamous they were, it isn’t impossible that they were quite endogamous.

    So, while skepticism is warranted, the conditions are basically almost as good as they can be for phenotype continuity for a long period of time in this situation. Moreover, even if populations do accumulate phenotype changing mutations over time, there are so many places that those mutations could have occurred, that they would not necessarily have been phenotype changing mutations that change the particular phenotypes that people cling to in seeing similarities between two groups of people.

    This said, there could be another reason for the apparent phenotypic similarity even if the genetic closeness is mostly a fluke.

    Intuitive racial classification is based upon the visually obvious features that are ancestry informative. If the Australian Aborigines and the Soliga people were the main, high population clusters in the world, everyone would probably naturally be struck by the ancestry informative visual features that distinguish them that have arisen over time. But, instead, we have different cues that make people from China and Russia seem very different to us, but we have a hard time distinguishing between Russians and Ainu peoples. Thus, even if there is significant phenotypic variation, if it is not of the kind we are tuned to see, we may perceive more similarity than there is at the genetic level.

    Indeed, the same issues come into play with markers chosen to be ancestry informative. If populations with very basal separations from other populations have developed a set of ancestry informative mutations that are not in the set we are used to looking at and have tuned our classification schemes to identify, we are prone to miss the full extent of the genetic distance between two populations.

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    To produce new phenotypes you need to add new alleles to the genetic mix via mutation or admixture. Mutation rates are a function of person-generations.

    you can also just shift extant variation. small effective population results in a lot of genetic drift, and inbreeding usually produces fixation of weird phenotypes.

  • Ian

    If the Australian Aborigines and the Soliga people were the main, high population clusters in the world, everyone would probably naturally be struck by the ancestry informative visual features that distinguish them that have arisen over time

    When I first came across the idea that south Indian tribals were related to Australians, it was an “of course” type of moment – the sort of thing that should have been obvious but you’d just never noticed it. And while I don’t know Australian aboriginal features terribly well, I’m better with Indian faces (of various kinds) than I am with any other kind. And I feel pretty confident that there are elements* shared by the two groups, even if they are not closely related. Rather than attributing the elements to convergence, why not stabilising selection?

    Around the equator (and south of it), throughout the Palaeotropics, you find people who are dark-skinned with broad features and frizzy hair – Indian tribals, Filipino and SE Asian ‘Negritos’, Australians, Melanesians. While that suite of features may be selected for, it may also be that they are the ancestral human type, and that all the other people in the world represent the descendants of small groups of people who evolved in cooler climates, and then ‘reinvaded’ the tropics bringing agriculture and its demographic advantages.

    *Trinidadians, when they see someone new tend to do two things – put that person into a known racial category (which includes several mixed-race types) and then try to figure out what they “have in them”…if they’re mixed, what sort of mixture they are. Indo-Trinis especially will try to determine whether someone who looks mostly Indian is actually Indian or if they are mixed – curly hair and broad features get scrutinised for evidence of being ‘dougla’ (mixed African and Indian descent). With enough familiarity with various types of mixed people, you can see a difference between ‘dougla’ and ‘madrassi’ (south Indian).

    The fun thing, of course, was trying to partition out Indian ancestral types within my own family. My father and his 8 siblings ranged from light skinned “Chinese”-looking to dark skinned with broader features and curly hair. And with a host of cousins (including one set who shared three grandparents with my father) and their children (my second cousins) it’s actually possible to partition out a lot of variation by “type” (these are “family” features, while those come from the spouses).

  • Ian

    None of that is to say that I disagree with Razib’s doubts about the findings of the paper, by the way. Not at all sure that’s clear, given my perambulations.

  • AV

    Darn you, Razib! Way to go ahead and debunk their findings when I thought we might well have found our candidate for a peninsular Indian ASI population. On a more serious note, the tribals of the BR Hills for whatever it’s worth don’t look particularly Australoid, as in they do look quite “aboriginal” but they don’t look more aboriginal than say the Paniyas. This is based on my personal experience of course, as I stayed in a resort in the BR Hills a few years ago.

  • http://www.kinshipstudies.org German Dziebel

    The paper is important as it documents another case of a high interpopulation vs. low intrapopulation diversity combo. In Africa we usually see the opposite: high intrapopulation vs. low/moderate interpopulation diversity. The out of Africa theorists have mistaken the latter for the sign of African antiquity. In fact, it’s likely that Soliga in South Asia join Hadza in Sub-Saharan Africa as populations preserving a truly relic population structure associated with low effective population size and high levels of genetic drift. This population structure, most consistently preserved in the New World, represents the closest approximation to Mid-Pleistocene population realities that preceded the population growth and population amalgamation in Sub-Saharan Africa, Europe and SE Asia after 40K and later, during the Neolithic times (see, most recently, http://johnhawks.net/weblog). It’s therefore not surprising that a long-distance genetic connection between South Asian and Australian aborigines was preserved precisely in a relic, intragroup diversity population such as Soliga. It reminds me of the high frequency of mtDNA X haplotype in Druze, a genetic isolate in West Asia, and its long-distance connection to North America. As it was first pointed out by Edward Sapir, similarities between forms found in areas widely separated geographically are likely due to ancient kinship. The areas in the middle were more susceptible to recent developments.

  • Eze

    Interesting post. I tend to agree that we shouldn’t categorize relatively unrelated peoples like the Indian tribals and aboriginal Australians into one ‘(sub-)race’/‘type’ etc. when they split nearly 40-50 kya.

  • Kosmatka

    So they found the genetic link they were hoping to find, just where they were hoping to find it. I understand the impulse not to let interpretations over-reach the data, but support for the linkage of these two populations now seems fairly compelling to me. Anatomists proposed it. Genetics could have refuted it, but instead found evidence that suggests the same link.

    Convergent evolution might confound anatomists, and a whole assortment of factors might skew genetic results–but the odds that both data sets independently suggest the same wrong answer seem fairly low to me. Occam’s razor now seems to side with the theory that the two populations are linked.

  • http://blogs.discovermagazine.com/gnxp Razib Khan

    Genetics could have refuted it, but instead found evidence that suggests the same link.

    most have refuted it. the only advantage this study has is that it samples a new population. i don’t think that this tribe is any different from other south indian tribes (the others did not cluster with australian aborigines even in this study), so i think the 15 markers cluster because of drift.

  • Kosmatka

    Ah, mea culpa. That little detail flew by me on the first read through.

  • http://www.kinshipstudies.org German Dziebel

    “the only advantage this study has is that it samples a new population. i don’t think that this tribe is any different from other south indian tribes”

    I can see your point, Razib. I checked Soliga kinship system (Morab, The Soliga of Biligiri Rangana Hills. Calcutta, 2010) and it does look like a typical Dravidian kinship system. Dravidian kinship is built around prescriptive bilateral cross-cousin marriage (mother’s brother’s daughter is simultaneously father’s sister’s daughter) because the same marriage rule is applied from one generation to the next. This practice can deplete intragroup genetic diversity over the long haul and cause genetic drift. But this practice is typical for many Dravidian populations, and Soliga apparently even abandoned it at some point recently. Nevertheless, “Dravidian” kinship (quotation marks mean I use it now as a typological label that goes beyond Dravidians proper), just like “Kariera” kinship in Australia, is considered by kinship studies experts to be a basal type of human kinship system.

  • pconroy

    It’s interesting that Papuans and Tasmanians seem to be the same color on the map!

    So if there had been an early dispersal towards Oceania, that populated the entire region, then a later migration that never reached the Papuans or Tasmanians – who remain as relics of the earlier population?

  • Laredo

    They claim to have discovered a connection between Australian Aborigines and southwest Asians! Aborigines share more ancestry with southwest Asians than with southeast Asians, a find which they claim substantiates another study that analysed an Aboriginal tribe’s ancestry and found it to be composed of over 56.4% “India” and 25.2% “Arab”. Aboriginals also have more African ancestry than east Asian ancestry. It just goes to show how you cannot take these programs too seriously.


