I've been playing around with the new qpAdm program and the Haak et al. dataset over the past few days and managed to come up with what I think are some very promising results. For instance, the Yamnaya genomes from the Samara Valley and surrounds fit rather well as 0.514 Samara hunter-gatherer + 0.486 Georgian (std. errors 0.032, chisq 3.890).
This is an interesting outcome, mainly because Georgian is a Kartvelian language, and linguistic data suggest that the early Indo-Europeans, presumably the Yamnaya nomads or their ancestors, were in close contact with Proto-Kartvelian speakers. Moreover, although all of the Yamnaya males tested to date belong to Y-chromosome haplogroup R1b, which they probably inherited from their hunter-gatherer ancestors (the Samara forager also belonged to this haplogroup), some of their mtDNA lineages appear to derive from the Caucasus and/or nearby areas of the Near East.
However, the main problem with this analysis is that it attempts to model an ancient population partly as a modern one. Indeed, my estimate is that present-day Georgians harbor around 20% of the so-called Ancient North Eurasian (ANE) component, which probably arrived in the Caucasus from the Eurasian steppe (see here). If so, some of the steppe-related ancestry in the Yamnaya genomes is being credited to the Georgian proxy rather than to the Samara hunter-gatherer, and the qpAdm run might be overestimating the non-steppe admixture by at least 10%. Nevertheless, I'm quite happy with this result as I await ancient DNA from the Caucasus and Near East.
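To make the size of that correction concrete, here's the back-of-the-envelope arithmetic as a small Python sketch. It assumes, as above, that roughly 20% of the Georgian genome is ANE-related, and, more loosely, that all of it is steppe-derived and should really sit on the Samara hunter-gatherer side of the model. The function and variable names are just my own labels.

```python
# Back-of-the-envelope check of how much steppe-like ancestry hides inside
# the Georgian proxy. Assumption: ~20% of present-day Georgian ancestry is
# ANE-related and entirely steppe-derived (an estimate, not a measurement).
GEORGIAN_WEIGHT = 0.486        # qpAdm weight for Georgian in the Yamnaya model
ANE_SHARE_IN_GEORGIANS = 0.20  # assumed ANE fraction in present-day Georgians


def steppe_inflation(proxy_weight: float, steppe_share_in_proxy: float) -> float:
    """Portion of the proxy's weight that is really steppe-like ancestry."""
    return proxy_weight * steppe_share_in_proxy


overestimate = steppe_inflation(GEORGIAN_WEIGHT, ANE_SHARE_IN_GEORGIANS)
corrected_non_steppe = GEORGIAN_WEIGHT - overestimate

print(f"steppe-like ancestry hidden in the Georgian proxy: {overestimate:.3f}")
print(f"non-steppe ancestry after correction: {corrected_non_steppe:.3f}")
```

That works out to about 0.097, or roughly 10 percentage points, which would leave a non-steppe share of about 0.39 rather than 0.486.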
By the way, I also pretty much nailed the Corded Ware samples: 0.73 Yamnaya + 0.27 Esperstedt_MN (std. errors 0.060, chisq 2.621). Admittedly, an identical result for the same genomes was reported months ago at the ASHG 2014 conference (see here), but that's OK, because it means I'm on the right track.
qpAdm is easy to run, but the quality of its output is heavily reliant on the outgroup, or "right", set of populations picked by the user. As far as I can see, the following ten populations (a subset of the "magic set" of 15 from Haak et al.) produce the most robust outcomes when analyses are limited to West Eurasian groups:
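Why the right set matters so much is easier to see with a toy version of the model. As I understand it, each population is summarized by a vector of f4 statistics computed against the right set, and because f4 is linear in allele frequencies, an admixed target's vector should equal the weighted sum of its sources' vectors, with the weights adding up to 1. The Python sketch below solves that by plain unweighted least squares for two sources. This is a simplification, not qpAdm's actual procedure, which also takes the covariance of the statistics into account (block jackknife). The f4 values are synthetic numbers made up so that the true weight is 0.73, not real data, and `two_source_weight` is a hypothetical helper.

```python
# Toy sketch of the linear model behind a two-source admixture fit.
# Each population X is summarised by f4 statistics of the form
# f4(X, O; R0, Rj) against the right set. For an admixed target:
#     f4(Target) = w * f4(Source1) + (1 - w) * f4(Source2)
# Here w is found by unweighted least squares (a simplification of qpAdm).
from typing import Sequence


def two_source_weight(target: Sequence[float],
                      source1: Sequence[float],
                      source2: Sequence[float]) -> float:
    """Least-squares w minimising ||target - w*source1 - (1-w)*source2||^2."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all f4 vectors must have the same length")
    diff = [a - b for a, b in zip(source1, source2)]   # source1 - source2
    resid = [t - b for t, b in zip(target, source2)]   # target - source2
    denom = sum(d * d for d in diff)
    if denom == 0:
        # The right set cannot tell the two sources apart.
        raise ValueError("sources are indistinguishable with this right set")
    return sum(r * d for r, d in zip(resid, diff)) / denom


# Synthetic, made-up f4 vectors (NOT real data), built so the true weight is 0.73.
src1 = [0.010, -0.004, 0.002, 0.006]
src2 = [-0.006, 0.008, 0.001, -0.003]
tgt = [0.73 * a + 0.27 * b for a, b in zip(src1, src2)]

w = two_source_weight(tgt, src1, src2)
print(f"source1 = {w:.2f}, source2 = {1 - w:.2f}")
```

The zero-denominator guard hints at the practical issue: if the outgroups don't differentiate the sources, the weights simply aren't identifiable, and a weakly informative right set gives noisy, unstable weights.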
Biaka
Why do they work so well? I really have no idea, but through simple trial and error I found that some of the others from the "magic set", in particular the Ami, produced much poorer results.
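One way to judge whether a run is "good" is the tail probability of its chisq. The Python sketch below converts the two chisq values quoted above into p-values. It assumes both runs used the ten-population right set with one target and two sources on the left. It also assumes the rank-1 test has (m - 1)(n - 1) degrees of freedom, where m = left - 1 and n = right - 1, which gives 8. Because that count is even, the chi-square survival function has a simple closed form, so no statistics library is needed. The helper names are hypothetical.

```python
# Tail probability for a qpAdm-style chi-square, using the closed-form
# survival function of a chi-square distribution with even degrees of freedom:
#     P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!,  with dof = 2k
# Assumption: 3 left populations (target + 2 sources) and 10 right populations.
import math


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """P(X > x) for X ~ chi-square(dof); valid for positive even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive, even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def rank1_dof(n_left: int, n_right: int) -> int:
    """Degrees of freedom for the rank-1 test: (m - 1) * (n - 1)."""
    m, n = n_left - 1, n_right - 1
    return (m - 1) * (n - 1)


dof = rank1_dof(n_left=3, n_right=10)
for label, chisq in [("Yamnaya", 3.890), ("Corded Ware", 2.621)]:
    p = chi2_sf_even_dof(chisq, dof)
    print(f"{label}: chisq {chisq} on {dof} dof, tail p = {p:.3f}")
```

Under these assumptions both models come out with tail probabilities of roughly 0.87 and 0.96, well above the usual 0.05 cut-off, so neither is rejected. A badly fitting run would instead show a large chisq and a tail probability close to zero.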
I'll probably end up posting a whole catalog of qpAdm output in the comments section below over the next couple of weeks. I'm open to suggestions about the models to test and how to improve my runs.
Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317
qpAdm tour of Iran
Yamnaya's exotic ancestry: The Kartvelian connection