Dienekes has a post up, The Bronze Age Indo-European invasion of Europe. The crux of his argument is as follows:
But there is another component present in modern Europe, the West_Asian which is conspicuous in its absence in all the ancient samples so far. This component reaches its highest occurrence in the highlands of West Asia, from Anatolia and the Caucasus all the way to the Indian subcontinent. It is well represented in modern Europeans, reaching its minima in the Iberian peninsula….
Thanks to the public release of genetic data, Dienekes has developed his theories in part from his own analyses of that data. Though I’ve run fewer analyses, with smaller data sets, some of the same patterns jump out at me. In particular, there is a component which is modal in northern West Asia (e.g., the trans-Caucasian region) which seems to drop mysteriously between the French generally and the French Basques, and likewise between the non-Basque and Basque Spanish samples. There are also similar disjunctions in South Asia between geographically close Indian groups, though they are not necessarily easy to map across the two regions.
Ultimately model-based clustering algorithms and PCA are going to get us only so far. Remember that the clusters generated from these methods don’t give us reality as such, but particular patterns which map back to reality. You can’t read from cluster A to population X without a non-genetic interpretative frame. Nevertheless, I do think within the next few years we may solve the “mystery” of the demographic origin of Indo-European languages and culture through genetics.
First, we have to posit a hypothesis in the fashion which Dienekes proposes. That is, Indo-European languages began to spread rapidly ~5,000 years ago from a small core population. This rapidity leads to both cultural integrity, and some genetic signal, which spans Indo-European groups. There are two dimensions, time and space. As Dienekes notes, ancient DNA data points will get thicker. If the West Asian component in European ancestry begins to show up >5,000 years B.P., that’s going to be highly suspicious. Second, our data set of extant populations is going to get thicker. You’ll be able to independently contrast relatively similar Indo-European vs. non-Indo-European populations. For example, Sinhala vs. Tamil and Swede vs. Finn. By and large the two tips of the two clades should exhibit genetic similarity, but if the thesis of Indo-European demographic expansion is correct then you’re going to see a subset of matches which correlate with language family, and not proximity. You can then construct a “synthetic” genome from these matches across independent pairs. Finally, you can compare that genome to present populations and ancient DNA samples.
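To make the pair-contrast idea a bit more concrete, here is a rough sketch in Python. It is only one possible reading of the procedure, not a method Dienekes or anyone else has published. The allele frequencies are invented toy numbers, and the population labels, locus names, the `min_shift` threshold, and the function `shared_shift_loci` are all hypothetical. The sketch keeps the loci where the Indo-European population differs from its non-Indo-European neighbor in the same direction in both independent pairs, then averages the Indo-European frequencies at those loci into a crude "synthetic" profile.

```python
from statistics import mean

# Toy allele frequencies, invented purely for illustration (NOT real data).
# Population labels and locus names are hypothetical placeholders.
freqs = {
    "IE_south_asia":    {"snp1": 0.60, "snp2": 0.30, "snp3": 0.55, "snp4": 0.20},
    "nonIE_south_asia": {"snp1": 0.35, "snp2": 0.32, "snp3": 0.50, "snp4": 0.45},
    "IE_europe":        {"snp1": 0.70, "snp2": 0.80, "snp3": 0.15, "snp4": 0.10},
    "nonIE_europe":     {"snp1": 0.40, "snp2": 0.78, "snp3": 0.40, "snp4": 0.35},
}

# Each pair is (Indo-European population, nearby non-Indo-European population).
pairs = [
    ("IE_south_asia", "nonIE_south_asia"),
    ("IE_europe", "nonIE_europe"),
]


def shared_shift_loci(freqs, pairs, min_shift=0.1):
    """Return {locus: mean IE frequency} for loci whose IE-vs-neighbor
    frequency shift has the same sign, and at least min_shift in size,
    in every pair. The threshold is an arbitrary assumption."""
    loci = freqs[pairs[0][0]].keys()
    selected = {}
    for locus in loci:
        shifts = [freqs[ie][locus] - freqs[non][locus] for ie, non in pairs]
        same_sign = all(s > 0 for s in shifts) or all(s < 0 for s in shifts)
        if same_sign and all(abs(s) >= min_shift for s in shifts):
            selected[locus] = mean(freqs[ie][locus] for ie, _ in pairs)
    return selected


def profile_distance(profile, population):
    """Mean absolute frequency difference over the loci in the profile."""
    return mean(abs(profile[locus] - population[locus]) for locus in profile)


synthetic = shared_shift_loci(freqs, pairs)
for name, pop in freqs.items():
    print(name, round(profile_distance(synthetic, pop), 3))
```

With real data you would need far more loci, more than two pairs, and some correction for shared drift and admixture, so treat this as a statement of the logic and nothing more. The last loop stands in for the final step, comparing the synthetic profile against present-day or ancient samples.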
Addendum: I assume that methods like looking at identity-by-descent tracts are going to be important, even though recombination will have broken apart the regions quite a bit.