
Thursday, December 29, 2016

Early Indo-European migrations map


Wikipedia has a new animated gif of early Indo-European migrations (available at various resolutions here). It's pretty good overall, but very speculative and potentially erroneous in parts. For instance, my understanding is that the Vedic Aryans did not emerge from BMAC per se, as the map suggests, but rather from a post-BMAC phenomenon heavily influenced by steppe pastoralists. Hi-res ancient DNA from BMAC and post-BMAC sites should be able to resolve this issue.


As far as I know, BMAC remains were being tested at Harvard earlier this year, but the year is almost out, and nothing has been published. So either David Reich and co. are keeping the results for a new paper on the Indo-European homeland question, or they couldn't get any usable data from the samples. Keep in mind that only 30-40% of the ancient remains that are tested at Harvard are successfully genotyped. I can imagine that the success rate for samples from arid locations, like former BMAC sites in Turkmenistan, is even lower.

Update 31/12/2016: Commenter Tapatuevik Kaarmkyno points us to an article from earlier this year in the NIH Record featuring this quote from David Reich: "We’ve sequenced more than 1,000 samples in our own lab — there’s not enough time to publish". That's probably why the second half of 2016 was so agonizingly slow. Next year should be awesome.

See also...

Maybe first direct hints of Yamnaya-related gene flow into South Central Asia

204 comments:

huijbregts said...

@Matt
My last post was about the paradox of the Sangarius-Eren transformation.
This transformation relies on weird self-invented mathematics. Yet in many cases this transformation results in distances which appear more natural.
As a way out of this unsatisfactory situation I suggested that weighting by eigenvalue might have a side effect of weighting by other variables. As a plausible candidate I mentioned the unbalanced sampling density of the populations.
Stated differently: weighting by eigenvalue might be a proxy for weighting by sampling density.
The PCA algorithm searches for the dimensions which best explain the variance of the populations. Now, if the sampling density of the populations is changed, you will necessarily get a different PCA; in the worst case, the population scores will even be projected onto different dimensions than before.
I think the logic behind this is quite strong. Whether the practical effects are small or large is an empirical matter.
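A minimal numerical sketch of the sampling-density point, using invented 2-D coordinates for three hypothetical populations (pop_a, pop_b and pop_c are illustrative names, not real data). The same populations go through PCA twice: once with three samples each, and once with pop_c's rows repeated ten times to mimic denser sampling. PCA is done here with a plain SVD of the centred data, which is an assumption about the pipeline, not the exact method behind any particular PCA dataset.

```python
import numpy as np


def first_pc(x):
    """Unit-length first principal axis of the rows of x."""
    centred = x - x.mean(axis=0)
    # The rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


# Invented 2-D coordinates for three hypothetical populations.
pop_a = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]])
pop_b = np.array([[4.0, 0.0], [4.1, 0.2], [3.9, -0.1]])
pop_c = np.array([[2.0, 3.0], [2.1, 3.2], [1.9, 2.9]])

balanced = np.vstack([pop_a, pop_b, pop_c])
# Same populations, but pop_c sampled ten times more densely
# (simulated by repeating each of its rows ten times).
oversampled = np.vstack([pop_a, pop_b, np.repeat(pop_c, 10, axis=0)])

pc1_balanced = first_pc(balanced)
pc1_oversampled = first_pc(oversampled)

# Angle between the two first axes; an eigenvector's sign is arbitrary, hence abs().
cos_angle = abs(pc1_balanced @ pc1_oversampled)
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
print(f"PC1 rotates by {angle_deg:.1f} degrees")
```

With balanced sampling, PC1 runs roughly along the pop_a to pop_b axis; with pop_c oversampled, it swings to roughly perpendicular, although no population has moved.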
If I have understood you correctly, you state that the proxy effect of weighting by sampling density will disappear if you replace (the square root of) the eigenvalues with the min-max differences. But the Achilles heel of PCA is its sensitivity to extreme values, and a min-max difference is set entirely by the two most extreme samples, so using the min-max differences is statistically very unfavorable.
But I just don't understand the logic behind the argument.
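The outlier point can be illustrated with a toy comparison of the two candidate per-axis scale factors: the standard deviation of the scores along an axis (the square root of that axis's eigenvalue, using numpy's default 1/n convention) and the min-max difference. The scores are invented and the axis is held fixed, which is a simplification; in practice an extreme sample would also move the axes themselves.

```python
import numpy as np


def scale_factors(scores):
    """Two candidate scale factors for one column of PC scores."""
    sd = scores.std()  # sqrt of the variance along the axis (ddof=0)
    rng = scores.max() - scores.min()  # min-max difference
    return sd, rng


# Invented PC scores for 50 samples, evenly spread on [-1, 1].
scores = np.linspace(-1.0, 1.0, 50)
# The same axis after adding a single extreme sample.
with_outlier = np.append(scores, 5.0)

sd0, rng0 = scale_factors(scores)
sd1, rng1 = scale_factors(with_outlier)

print(f"SD grows by    {sd1 / sd0:.2f}x")
print(f"range grows by {rng1 / rng0:.2f}x")
```

One extra sample triples the range (from 2 to 6), while the standard deviation grows by only about half, because it pools information from all 51 samples instead of just the two extremes.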

Bradley Benz said...

The beginning source is not Yamnaya culture. It is Armenia. Update this and the puzzle will fit nicely. I am happy to expound if asked.

Davidski said...

The beginning source is not Yamnaya culture. It is Armenia.

Only possible if R1b-M269 is native to Armenia. But this looks extremely unlikely.

Unknown said...

@huijbregts @Davidski @FrankN @Matt @Alberto and @everyone.

First, thanks a lot for the blog, for nMonte and for ALL comments.
I'm no mathematician or anything. I carefully read the post http://eurogenes.blogspot.com.es/2016/12/early-indo-european-migrations-map.html
and I have a question. I have run nMonte many times on Global10, and the results are very sensitive to the number of PCs used, the subset of data, etc. I'm not sure, but I believe that if the PCs are not scaled, it may be better to weight them. For example, if we model a Spanish-like population as a mix of Portuguese and French using PC2 and PC3, the result might be (invented data) 50/50, but if we include PC1 too, it might be 75/25 due to the African admixture. A programmer friend of mine helped me add two lines to the nMonte script which plot colMeans(matAdmix), used for the approximation. AFAIK, the weighted form seems less sensitive to the random/sample choice of inputs (number of PCs, etc.).
Here are links to the plot "colMeans(matAdmix) of Non-Weighted vs Weighted. Full/Restricted" for German, and to the cluster plot "Weighted vs Non-weighted Model". As for the latter, the weighted model seems very coherent.
https://drive.google.com/file/d/0ByTmlcptkfxubS01cHl1b25Pak0/view?usp=sharing
https://drive.google.com/file/d/0ByTmlcptkfxuQ0M3U2x3STBVZ3c/view?usp=sharing
What do you think about it?
Thanks and regards.
Note: The weights are from a link on Anthrogenica (I don't remember which one), and are: 38.855594,27.863177,7.971269,4.066185,3.469409,3.150401,2.873287,1.886145,1.832293,1.777605
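To make the weighted-versus-unweighted comparison concrete, here is a sketch of a two-source fit under explicit assumptions. It swaps nMonte's Monte Carlo search for a closed-form least-squares solution for a single mixing proportion. It uses the ten weights quoted above as per-PC multipliers on the squared differences; treating them as eigenvalues is an assumption. The Portuguese, French and Spanish coordinates are invented, with only the first three PCs non-zero. fit_two_way is a hypothetical helper, not part of nMonte.

```python
import numpy as np

# Per-PC weights quoted in the comment above. Treating them as the
# eigenvalues of the first ten Global10 PCs is an assumption.
WEIGHTS = np.array([38.855594, 27.863177, 7.971269, 4.066185, 3.469409,
                    3.150401, 2.873287, 1.886145, 1.832293, 1.777605])


def fit_two_way(target, src1, src2, weights=None):
    """Hypothetical helper: proportion of src1 (the rest being src2) that
    minimises the weighted squared Euclidean distance to target.

    Closed-form least squares, not nMonte's Monte Carlo search.
    Assumes src1 and src2 differ on at least one PC."""
    w = np.ones_like(target) if weights is None else np.asarray(weights)
    diff = src1 - src2
    # Solve d/da sum(w * (target - src2 - a * diff)**2) = 0 for a.
    a = np.sum(w * (target - src2) * diff) / np.sum(w * diff * diff)
    # Mixing proportions must lie in [0, 1].
    return float(np.clip(a, 0.0, 1.0))


# Invented coordinates: only PC1-PC3 are non-zero, the rest are padding.
portuguese = np.array([0.8, 0.2, 0.1] + [0.0] * 7)
french = np.array([0.0, 0.6, 0.5] + [0.0] * 7)
spanish = np.array([0.6, 0.4, 0.3] + [0.0] * 7)

pcs_2_3 = slice(1, 3)
print("PC2-PC3 only:", fit_two_way(spanish[pcs_2_3], portuguese[pcs_2_3], french[pcs_2_3]))
print("all PCs, unweighted:", fit_two_way(spanish, portuguese, french))
print("all PCs, weighted:", fit_two_way(spanish, portuguese, french, WEIGHTS))
```

With these invented numbers, PCs 2 and 3 alone give a 50/50 split. Adding PC1 unweighted moves it to two-thirds Portuguese, and the weights push it further towards Portuguese, because PC1 carries by far the largest weight. A target built as an exact mixture of the two sources is recovered whatever the (positive) weights, so weighting only changes the answer when the target is not a perfect mixture of the sources.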
