Zack has been posting his data sources, as well as how he filtered and formatted them, all this week. I assume that the first wave of results will be online soon. As of yesterday, this is what he had (I know he got some more today):
- Punjab 7
- Bengal 1
- Bihar 1
- Tamil 5
- Karnataka 1
- Anglo-Indian 1
- Roma 1
- Iran 3
Whole swaths of north-central India are missing. I am hopeful that more people will join in after the first wave of results is put out there. But from what I have discussed with Zack, it looks plausible that the very first wave will have a richer set of results because of the necessity of preliminary steps. So there’s some benefit in getting in early. It’s really ridiculous to have literally 1 sample representing the 300 million people of Uttar Pradesh and Bihar. That’s 25% of South Asians represented by one person. I’ve gotten a commitment from one friend who was born in U.P. to give his data up once it comes in, but there have to be others out there. (The Bengali N should go up to 2 when I swap my parents in for me.)
The public data sources have Gujaratis, Tamils, Pakistanis (Punjabis, Pathans, Sindhis), and some South Indian groups (Tamil and Telugu). This leaves a blank spot on the North Indian plain.
Here’s the brief for the project again.