1) The “South Asians” in the HGDP data set that’s been used for so long are rather on the inbred side, and relatively genetically distinct as far as South Asian populations go. This illustrates the importance of finely calibrating geographic coverage, and the consequences of the “Permit Raj.”
2) Some of the Gujarati individuals in the HapMap also shake out as a moderately tight group (the square in the middle of the graphic above). Not too surprising, but rather striking. Another illustration of the importance of selecting representative and informative populations for any given region.
3) In the clustering spreadsheet Zack found that my parents aligned with a Brahmin from trans-Himalaya India and some Indians from Singapore, all notable for clear East Asian ancestry. He labelled the cluster “bit-east asian,” which is appropriately descriptive. One thing that seems to recur in these clustering algorithms is that South Asians with elevated East Asian ancestry are often thrown together into one pot, despite very diverse origins. That’s probably because the combination is not too common, and so jumps out as rather distinctive. It goes to show the limitations of summarizing individual elements of genetic variation into one statistic or label. On a personal level, the ultimate questions regarding my family’s background will have to be explored in the future by partitioning up the genome appropriately and then focusing on the history of specific segments. For example, which Southeast Asian groups does this ancestry come from? How much Munda ancestry do my parents have? Is the close relationship between my parents and various caste groups in South India (e.g., the Naidus and Reddys) due to Bengalis being a compound of these groups with an East Asian group?
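To make the point about summary labels concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the segment lengths, the fine-grained source labels, the coarse mapping, and the `fractions` helper are invented, and none of it comes from Zack's spreadsheet or any real genotype data. It only shows how two genomes with different segment-level histories can collapse into the same coarse ancestry summary.

```python
from collections import defaultdict

# Hypothetical mapping from fine-grained source labels to the coarse
# components a clustering run might report. All labels are illustrative.
COARSE = {
    "tibeto_burman": "east_asian",
    "austroasiatic": "east_asian",
    "south_indian": "south_asian",
    "north_indian": "south_asian",
}


def fractions(segments, relabel=None):
    """Return the share of total segment length assigned to each label."""
    totals = defaultdict(float)
    for length, label in segments:
        key = relabel[label] if relabel else label
        totals[key] += length
    genome = sum(totals.values())
    return {label: round(total / genome, 3) for label, total in totals.items()}


# Toy genomes: (segment length in arbitrary units, fine-grained label).
person_a = [(85, "south_indian"), (15, "austroasiatic")]
person_b = [(60, "north_indian"), (25, "south_indian"), (15, "tibeto_burman")]

# Coarse view: what a single summary label or statistic would see.
coarse_a = fractions(person_a, COARSE)
coarse_b = fractions(person_b, COARSE)

# Fine view: what a segment-by-segment analysis would see.
fine_a = fractions(person_a)
fine_b = fractions(person_b)

print("same coarse summary:", coarse_a == coarse_b)
print("same segment-level breakdown:", fine_a == fine_b)
```

In this toy setup the two individuals look identical at the coarse level, and only the segment-level breakdown separates them, which is the sort of question a single cluster label cannot answer.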
Fun times if you have data, a little persistence, and time on your hands.