Sunday, July 19, 2015
The real thing
A couple of years ago Moorjani et al. concluded that present-day Georgians of the Transcaucasus were the best available proxy for the ancient West Eurasian population that mixed into the South Asian gene pool.
This was a solid statistical fit. And you can see on the TreeMix graph below, featuring a Georgian and a Kalash, why it worked so well.
But it was also a big fat coincidence, because check out what happens when I add another migration edge to the same graph.
Thus, the Indo-Iranian, and hence Indo-European, speaking Kalash no longer looks very similar to the Kartvelian-speaking Georgian. In fact, he appears to be most closely related to the supposedly Indo-European-speaking Afanasievo and Yamnaya nomads of the Early Bronze Age Eurasian steppe. The rest of his ancestry is probably best described as South Central Asian, which is an unknown quantity to me at this stage, but probably in large part of indigenous South Asian origin (see here).
I'm only able to show this thanks to the ancient samples that are on the tree, for which, as far as I know, there aren't any useful substitutes among present-day populations. Obviously, Moorjani et al. didn't have this luxury, so they ended up with a model that was statistically sound, but didn't make much sense otherwise, especially in terms of linguistics.
My TreeMix model is easily reproducible with most of the other South Asian samples from the Human Origins dataset, and it gels nicely with uniparental marker data too. For instance, here's a close-up from a similar graph featuring a Pathan, with a few extra details.
Yep, not only do Pathans cluster among these ancients of the Eurasian steppe, but most of them also carry the same Y-chromosome haplogroup: R1a-Z93, which is derived from R1a-M417, and in all likelihood first expanded in a big way with the Proto-Indo-Iranians of the Trans-Ural steppe.
By the way, the Human Origins dataset has four different sets of Gujarati samples from Houston, USA, marked A, B, C and D, and each one shows a different level of ancient steppe admixture as inferred with my test. GujaratiA scores around 50%, while GujaratiD scores only 40%. Does anyone know why these Gujaratis were grouped in such a way? Was it based on genetic structure or caste origin?
Full output from the analysis above is available in a zip file here. The reference samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015.
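For anyone who wants to try this sort of comparison, the two graphs above differ only in the number of migration edges TreeMix is allowed to fit. Here's a rough sketch of how the paired runs could be set up. It only builds the command lines and doesn't run anything. The flags (-i, -o, -m, -root, -k) are standard TreeMix options, but the file names, the root population, the block size and the edge counts of 1 and 2 are placeholders I've made up for illustration, not the settings used for this post.

```python
import shlex


def treemix_command(input_file, out_stem, migration_edges,
                    root=None, block_size=None):
    """Build the argument list for one TreeMix run (does not execute it)."""
    if migration_edges < 0:
        raise ValueError("migration_edges must be non-negative")
    args = ["treemix", "-i", input_file, "-o", out_stem,
            "-m", str(migration_edges)]
    if root is not None:
        args += ["-root", root]          # population used to root the tree
    if block_size is not None:
        args += ["-k", str(block_size)]  # SNPs per block
    return args


# Hypothetical inputs: same data, same root, one extra migration edge.
runs = [
    treemix_command("kalash_georgian.treemix.gz", f"kalash_m{m}", m,
                    root="Outgroup", block_size=500)
    for m in (1, 2)
]

for args in runs:
    print(shlex.join(args))
```

Keeping everything except -m identical between the runs is the point: any change in where the Kalash or Pathan sample lands can then be put down to the extra edge rather than to a different input or root.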
The Poltavka outlier