Eurogenes Blog: Genetic maps featuring 67 ancient genomes and more than 3,000 present-day individuals

Saturday, January 13, 2018

Genetic maps featuring 67 ancient genomes and more than 3,000 present-day individuals

I've got some eye candy for you guys as we wait for 2018 to really get going. Below are three Principal Component Analysis (PCA) plots, or genetic maps, based on the ancient diploid dataset from Martiniano et al. 2017 (described in more detail here). Click on the images to download hi-res PDFs of each plot. The relevant datasheets are available here.

The important thing about these PCA is that none of the samples in the analyses are missing more than 1% of the ~188K markers used to compute the PCs, which means that I didn't have to resort to any type of projection to get things right. In other words, the relationships between the samples that you see on these plots are direct.
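The missingness cut-off described above can be sketched in a few lines. This is only an illustration, not the pipeline actually used for these plots: the function name `keep_low_missingness`, the NaN encoding of missing calls, and the toy genotype matrix are all assumptions made for the example.

```python
import numpy as np

def keep_low_missingness(genotypes, max_missing=0.01):
    """Return row indices of samples missing at most `max_missing` of their markers.

    genotypes: 2-D array, rows = samples, columns = markers,
               with np.nan marking a missing call (an assumed encoding).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Fraction of missing markers per sample
    missing_rate = np.isnan(genotypes).mean(axis=1)
    return np.flatnonzero(missing_rate <= max_missing)

# Toy data (hypothetical): 3 samples x 200 markers, genotypes coded 0/1/2
rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(3, 200)).astype(float)
toy[1, :1] = np.nan    # sample 1: 0.5% missing, passes the 1% threshold
toy[2, :10] = np.nan   # sample 2: 5% missing, fails the 1% threshold
kept = keep_low_missingness(toy)
```

Samples that pass a filter like this can all be used to compute the PCs directly, which is why no projection step is needed.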

PCA are easy to read. The main thing to keep in mind is that the results are dependent on the samples in the analysis. For instance, note that the Indians (Gujaratis and Brahmins) cluster rather close to some Europeans on the West Eurasian plot, but much further from them on the Eurasian/American plot. Why? Because the addition of hundreds of East Eurasian individuals to the latter plot highlights the significant East Eurasian-related admixture in the Indians, and pulls them away from the Europeans, who generally have much less of this type of ancestry.
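A toy sketch of why the sample set matters, using made-up coordinates rather than real genotypes: each group is a point in a hypothetical three-feature space whose third feature stands in for East Eurasian-related ancestry. The group positions, group sizes, and the helper `principal_axes` are all invented for illustration.

```python
import numpy as np

def principal_axes(points):
    """Return the principal axes (rows, ordered by explained variance) of the points."""
    X = np.asarray(points, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt

# Hypothetical group positions; the third coordinate is the "East Eurasian-like" feature
europeans   = [(0.0, 0.0, 0.0)] * 10
near_east   = [(4.0, 0.0, 0.0)] * 10
caucasus    = [(2.0, 3.0, 0.0)] * 10
indians     = [(0.5, 0.3, 1.0)] * 5
east_asians = [(2.0, 1.0, 10.0)] * 40

west_only = europeans + near_east + caucasus + indians
with_east = west_only + east_asians

# Absolute weight of the East Eurasian-like feature on PC1 in each analysis
east_loading_west_only = abs(principal_axes(west_only)[0][2])
east_loading_with_east = abs(principal_axes(with_east)[0][2])
```

In this toy setup PC1 barely uses the third feature until the East Asian group is added, after which that feature dominates PC1 and the small offset carried by the "Indian" points becomes visible on the leading axis.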

It's interesting, I think, that all of the ancients from burial sites from within the borders of present-day Europe (discussed in an earlier blog post here), cluster with present-day Europeans, or at least closest to us. See anything else interesting? Feel free to share it in the comments below.

If you're having trouble spotting certain individuals and/or populations, type the relevant individual or population ID in the PDF search box and press Enter. The PDF will initially show you a box where the samples of interest are located; click on the box, and the PDF will zoom into the boxed area and highlight these samples, like this:

See also...

Who's your (proto) daddy Western Europeans?

135 comments:

Slumbery said...: It is interesting to see the effect of mixing distant populations on their PCA positions. At least I assume that is the reason behind the weird position of some Ecuadorian samples. Some of them are very close to some Bashkir, Uzbek and Nogai. So mixing Amerindian with West Eurasian can result in the same PCA position as mixing Siberian/East Asian with West Eurasian.
This is also a reminder that very close PCA positions can be the result of vastly different recent ancestry.

One of my pet topics is of course Uralic groups. When I was in high school (so decades ago, before the time of modern population genetics) I learned that the Chuvash are suspected to have substantial Uralic ancestry (based on details of their language). This is not decisive (partly because of what I just said about recent ancestry and PCA), but their PCA position seems to support that. They are closest to the Udmurt and notably they stay close in every PCA, even if East Asian and Siberian groups are removed from the equation. The only Turkic group that comes close to them is the Volga Tatars, but those also have a lot of local ancestry.; January 13, 2018 at 10:50 PM
Unknown said...: Thanks David.; January 13, 2018 at 11:17 PM
Davidski said...: @Slumbery

This is also a reminder that very close PCA positions can be the result of vastly different recent ancestry.

Yes, but different dimensions can sort that out easily, and this is why I posted the PCA datasheets.; January 13, 2018 at 11:30 PM
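A minimal sketch of the point about extra dimensions, assuming each sample is simply a row of PC coordinates like those in the datasheets. The two coordinate vectors and the helper `pc_distance` are invented for illustration.

```python
import numpy as np

def pc_distance(a, b, n_dims):
    """Euclidean distance between two samples using only the first n_dims PCs."""
    a = np.asarray(a[:n_dims], dtype=float)
    b = np.asarray(b[:n_dims], dtype=float)
    return float(np.linalg.norm(a - b))

# Hypothetical samples: identical on PC1 and PC2, different on PC3
sample_a = [0.01, 0.02, 0.05]
sample_b = [0.01, 0.02, -0.03]

dist_2d = pc_distance(sample_a, sample_b, 2)   # what a PC1 vs PC2 plot shows
dist_3d = pc_distance(sample_a, sample_b, 3)   # what PC3 adds
```

Two samples can sit on top of each other on a two-dimensional plot and still be clearly separated once a further PC is taken into account.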
Slumbery said...: Davidski

Yes, I agree and I am also glad for your work. Of course you were not the target audience of that warning.; January 13, 2018 at 11:55 PM
Unknown said...: Also, why do Iranians seem to have a pull towards Arabs, when other plots indicate the contrary?; January 14, 2018 at 12:22 AM
Davidski said...: Some of these Iranians have minor Sub-Saharan ancestry, so some might also have Arabian ancestry. Not sure which part of Iran they were sampled in, though.; January 14, 2018 at 12:31 AM
Slumbery said...: @Shahanshah of Persia

Are they pulled towards Arabs compared to what reference point? There are a few outliers (obviously the Arab invasion had _some_ effects, at least on the South-West), but on the main mass I do not see a specifically Arab pull on the PCA.; January 14, 2018 at 12:43 AM
Onur Dincer said...: Interesting that on the worldwide PCA plot the West Eurasia-East Eurasia axis is represented by eigenvector 1 while the West Eurasia-Africa axis is represented by eigenvector 2. Normally it is the other way round on worldwide PCA and similar worldwide analyses.; January 14, 2018 at 1:42 AM
Eren said...: @Onur: I guess that is due to the predominance of Eurasian samples in the data set.; January 14, 2018 at 4:10 AM
Shaikorth said...: @Eren
That's surely the case; we know the PCA method is affected by sample sizes more than by the genealogical relatedness of said samples. McVean demonstrated this:

http://journals.plos.org/plosgenetics/article/figure?id=10.1371/journal.pgen.1000686.g003; January 14, 2018 at 4:41 AM
Eren said...: @Shaikorth
Yeah, I came to realize that this was the real cause of the problem I had with the Global 10 PCA early last year. As you know I tried to fix this by weighting the PCs... :D

The solution is an equal representation of samples, but that's probably not as straightforward as it sounds.; January 14, 2018 at 5:46 AM
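The equal-representation idea can be sketched as a simple downsampling step. This is one possible reading of it, not a method anyone in the thread confirmed using; the function `balance_populations`, the fixed seed, and the toy labels are assumptions for the example.

```python
import random
from collections import defaultdict

def balance_populations(labels, per_pop, seed=0):
    """Return sorted sample indices with at most `per_pop` samples per population."""
    rng = random.Random(seed)            # fixed seed so the subset is reproducible
    by_pop = defaultdict(list)
    for idx, pop in enumerate(labels):
        by_pop[pop].append(idx)
    chosen = []
    for idxs in by_pop.values():
        if len(idxs) > per_pop:
            idxs = rng.sample(idxs, per_pop)   # random subset of the larger groups
        chosen.extend(idxs)
    return sorted(chosen)

# Hypothetical dataset dominated by one group
labels = ["Eurasian"] * 8 + ["African"] * 2
balanced = balance_populations(labels, per_pop=2)
```

Downsampling like this throws away most samples from the large groups and depends on how the population labels are drawn, which may be part of why it is less straightforward than it sounds.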
Simon_W said...: Interesting how in this West Eurasian PCA South Italians are even more West Asian-shifted than the Sicilians. I suspect these South Italians are from the tip of Calabria. And note how the only samples in between South Italians and Cypriots are Sephardic and Moroccan Jews, while Ashkenazi Jews are about as West Asian-shifted as South Italians and Sicilians. A few Greeks are also in this cluster, even some from Macedonia, but the bulk of the Greeks are more northern. Also noteworthy is how much Italian_Bergamo overlaps with Iberians.; January 14, 2018 at 6:56 AM
Simon_W said...: And Ireland_EBA and one of the two Roman_Britain individuals cluster with Slavic people rather than with modern northwest Europeans. Obviously because they still had a little more steppe ancestry than the modern ones.

And one of the two Hungary_BA individuals, no doubt BR2, is quite close to the French. As was already the case in the Global 10 PCA. In the late Bronze Age and presumably in the Iron Age there must have been a belt of French-like people from (presumably) France over southern Germany to western Hungary.; January 14, 2018 at 7:08 AM
Onur Dincer said...: @Eren

I guess that is due to the predominance of Eurasian samples in the data set.

That is what I thought too. But I wanted to hear David's take on this as his worldwide PCA plots are also usually dominated by the West Eurasia-Africa axis at eigenvector 1.; January 14, 2018 at 9:36 AM
Onur Dincer said...: @Simon_W

Interesting how in this West Eurasian PCA South Italians are even more West Asian-shifted than the Sicilians. I suspect these South Italians are from the tip of Calabria. And note how the only samples in between South Italians and Cypriots are Sephardic and Moroccan Jews, while Ashkenazi Jews are about as West Asian-shifted as South Italians and Sicilians. A few Greeks are also in this cluster, even some from Macedonia, but the bulk of the Greeks are more northern. Also noteworthy is how much Italian_Bergamo overlaps with Iberians.

Those strongly West Asian-leaning or West Asian-like Greek individuals almost certainly have recent ancestry from Anatolia, the nearby Aegean islands, Cyprus, Crimea, the Armenian Highland and/or the Levant (in other words, from the Greek communities with origins outside the Balkans, the nearby islands or southern Italy).; January 14, 2018 at 9:48 AM
Kristiina said...: @Slumbery

If you are interested, you should also check this new paper: Between Lake Baikal and the Baltic Sea: genomic history of the gateway to Europe (https://bmcgenet.biomedcentral.com/articles/10.1186/s12863-017-0578-3)

If you take a look at their Fig. 3 on ancient and recent IBD sharing, you see that Chuvash share ancient ancestry in particular with Komi, Udmurt, Khanty and Tatar. Recent sharing only shows some recent IBD with Tatars.

http://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs12863-017-0578-3/MediaObjects/12863_2017_578_Fig3_HTML.gif

These conclusions are also interesting:

It is noteworthy that the genomes of closest linguistic relatives of Bashkir, Volga Tatar, bears very little traces of East Asian or Central Siberian ancestry. Volga Tatar are a mix between Bulgar who carried a large Finno-Ugric component, Pecheneg, Kuman, Khazar, local Finno-Ugric tribes, and even Alan. Therefore, Volga Tatars are predominantly European ethnicity with a tiny contribution of East-Asian component. As most Tatar’ IBD is shared with various Turkic and Uralic populations from Volga-Ural region, an amalgamation of various cultures is evident. When the original Finno-Ugric speaking people were conquered by Turkic tribes, both Tatar and Chuvash are likely to have experience language replacement, while retaining their genetic core. Most likely, these events took place sometime around VIII century AD, after the relocation of Bulgar tribes to Volga and Kama river basins, and expansion of Turkic people.

We speculate that Bashkir, Tatar, Chuvash and Finno-Ugric speakers from Volga basin has a common Turkic component, which could have been acquired as a result of Turkic expansion to Volga-Urals region. However, the original Finno-Ugric substrate was not homogeneous: Tatar and Chuvash genomes carry mainly “Finno-Permic” component, while Bashkir carry the “Magyar” one. The fraction of the Turkic component in Bashkir is, undoubtedly, quite significant, and larger than that in Tatar and Chuvash. This component reflects the South Siberian influence on Bashkir, which makes them related to Altai, Kyrgyz, Tuvinian, and Kazakh people.

As a standalone approach, an analysis of shared IBD is not sufficient to support the Finno-Ugric hypothesis of Bashkir origin as a sole source, while pointing at temporal separation of genetic components in Bashkir. Hence, we demonstrated that Bashkir genepool is a multifaceted, multicomponent system, lacking the main “core”; it is an amalgamation of Turkic, Ugric, Finnish and Indo-European contributions. In this mosaic, it is impossible to identify the leading element. Therefore, Bashkir are the most genetically diverse ethnic group of the Volga-Urals region.; January 14, 2018 at 10:03 AM
Slumbery said...: @Kristiina

Thank you. Davidski wrote about this article on his other blog and I even commented on the high IBD sharing between Khanti and Bashkir, but I have to admit I have yet to read the original article.

Using the datasheets here I plotted the relevant populations up to PC7-8, and the Mari-Udmurt-Chuvash group is a very persistent cluster that stays together in every dimension even when everything around them moves around. And the Chuvash do not show any Turkic pull. If anything they are a bit pulled toward the more core European populations (Finnish groups and Slavs) compared to the Mari and Udmurt.

As for the Khanti-Mansi vs. Bashkir, they do not really cluster. It seems to me that Khanti-Mansi have considerable recent (later than the Ugric common times) Siberian ancestry, which they picked up as they moved to Siberia and the North from the South Ural homeland and assimilated some local HG groups. This extra Siberian ancestry pulls them away from the Bashkirs on PCA despite the Bashkirs drawing ancestry from the South Ural Ugric population.
Also this PCA data is in agreement with that article about the Bashkirs having actual Turkic ancestry. They form a cline towards the Altai samples in multiple dimensions. It is impossible to tell from PCA, however, whether their Uralic side is Khanti-Mansi related, because the Khanti-Mansi are too strongly affected by their extra Siberian ancestry. In most dimensions they cluster with Ket against other Uralic groups. This reminds me of somebody who claimed on a Hungarian forum that modern Khanti and Mansi are mostly relatively recently assimilated Siberian groups on the periphery.; January 14, 2018 at 10:39 AM
Matt said...: Really nice stuff, and there's a lot of high level structure here. Would it also be possible to get eigenvalues for the PCs in each datasheet?
Also, if it's possible, I would quite like to see if the Fst matrix has changed at all for populations under Martiniano's ancient dataset...

@Simon, yep re:Ireland EBA, in the West Eurasian PCA actually it looks like all the ancient Bronze Age Europeans are distinct from Northwest Europeans on the basic East vs West PC2 - https://imgur.com/a/CMVrs and overlap more with Northeast Europeans

At the same time, all the ancient BA for Northwest and West-Central Europe overlap pretty clearly with Western Europeans on the higher order PCs that contribute further to European structure - https://i.imgur.com/goSAw20.png / https://i.imgur.com/6ZYiR4l.png / https://i.imgur.com/7xIhg0w.png

It will be interesting if it's ever possible to put the big samples of British Bell Beakers on this and really the whole British transect going up to the Iron Age - it seems likely to me that there has to have been a subtle effect of isolation by distance genetic flow probably already by Bronze-Iron to explain why modern Northwest Europeans are ever so slightly different.

On European structure in the West Eurasia plot, I think it's interesting that in terms of general structure (not just specific to a few populations), you've got:

- PC4 that seems to be a slightly "purer" reflection of Anatolian Neolithic ancestry as distinct from the Levant_N ancestry in the Near East (or maybe Arabic specific!), and that seems to have some clear West-East substructure.

- PC5 seems to reflect a distinction between Yamnaya related ancestry and other Volga-Ural ancestry (with Neolithics falling around 0?) and overlap between NW Europeans and non-Russian Northern Baltic-Slavic peoples.

- PC6 seems to be dominated by a West-East European split that doesn't have much to do with Anatolian/Yamnaya related ancestry, with East-Central European samples having their own distinct position (and West Europeans and Volga-Ural, modern and ancient, sitting together).; January 14, 2018 at 10:54 AM
Onur Dincer said...: @Slumbery

As for the Khanti-Mansi vs. Bashkir, they do not really cluster. It seems to me that Khanti-Mansi have considerable recent (later than the Ugric common times) Siberian ancestry, which they picked up as they moved to Siberia and the North from the South Ural homeland and assimilated some local HG groups. This extra Siberian ancestry pulls them away from the Bashkirs on PCA despite the Bashkirs drawing ancestry from the South Ural Ugric population.
Also this PCA data is in agreement with that article about the Bashkirs having actual Turkic ancestry. They form a cline towards the Altai samples in multiple dimensions. It is impossible to tell from PCA, however, whether their Uralic side is Khanti-Mansi related, because the Khanti-Mansi are too strongly affected by their extra Siberian ancestry. In most dimensions they cluster with Ket against other Uralic groups. This reminds me of somebody who claimed on a Hungarian forum that modern Khanti and Mansi are mostly relatively recently assimilated Siberian groups on the periphery.

I always thought that the southern Ural area was home to Indo-European tribes in ancient times and that the more northern Ural areas where the Khanty and Mansi currently live were home to Uralic peoples from time immemorial. Proto-Uralics seem to be a hunter-gatherer population from a northern region around the Ural Mountains (maybe also western Siberia). Proto-Magyars seem to have headed south towards the southern Ural area and come into contact with Indo-European steppe tribes and recently arrived Turkic tribes living there and acquired the steppe culture from them.; January 14, 2018 at 11:33 AM
Onur Dincer said...: The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo peoples and peoples with Sintashta or Andronovo ancestry seems to confirm that they did not come from the southern Ural area.; January 14, 2018 at 11:47 AM
Slumbery said...: @Onur Dincer

Not very likely, but more importantly you seem to conflate multiple time layers. Regardless of where Uralic as a whole formed, the Ugric branch probably comes from the Cherkaskul/Mezhovskaja archaeological cultures and those were in the Western Siberia - South Ural region. Not exactly on the southernmost tip of the Urals, but their name-sites and main territories are way south of the current Khanti-Mansi territory.
And exactly because the Ugric branch shows Indoeuropean-Iranic contact it is likely that they formed in the southern contact region and expanded North later. (That is not to say that the North was not Uralic "since time immemorial", but that is beside the point.)
A migration that placed the ancestral Hungarians more into the Steppe is assumed, but the source region was nowhere near as far North as the NW corner of Siberia.

"The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo..."
Again, there is a huge time span. Sintashta is ancient, it was long gone before the Ugric branch formed. A lot happened everywhere. Also if the Khanti-Mansi are mostly later assimilated HG-s that actually had to dilute any EEF. And then there are sampling density issues.; January 14, 2018 at 12:13 PM
Kristiina said...: Slumbery, I agree with you!

Onur, your idea about hunter-gatherers conquering Indo-Europeans does not make any sense.

How do you fit the findings of the recent paper on Sargat yDNA and mtDNA in your theory? There was 2x R1a1 and 5x N1c1 without any true Siberian mtDNA. Moreover, Sargat samples lack N1b, which accounts for c. 28% in Khanty and c. 63% in Mansi, of which c. 37% is Eastern N1b-VL67.

Sargat samples do not contain Siberian mtDNA. However, 30% of modern Khanty mtDNA and 41.3% of modern Mansi mtDNA is Siberian/Altaian. It is very easy to explain the rise of Siberian ancestry in Ob-Ugrics with N1b-VL67 and mtDNA such as D4e4, D4j2, D4l2, D5a3, C4a1, C4b, C5b, A, G2a, F1c.

https://anthrogenica.com/showthread.php?97-Genetic-Genealogy-and-Ancient-DNA-in-the-News/page170

https://www.researchgate.net/publication/321071660_Kinship_Analysis_of_Human_Remains_from_the_Sargat_Mounds_Baraba_Forest-Steppe_Western_Siberia; January 14, 2018 at 12:37 PM
Onur Dincer said...: @Slumbery

Not very likely, but more importantly you seem to conflate multiple time layers. Regardless of where Uralic as a whole formed, the Ugric branch probably comes from the Cherkaskul/Mezhovskaja archaeological cultures and those were in the Western Siberia - South Ural region. Not exactly on the southernmost tip of the Urals, but their name-sites and main territories are way south of the current Khanti-Mansi territory.
And exactly because the Ugric branch shows Indoeuropean-Iranic contact it is likely that they formed in the southern contact region and expanded North later. (That is not to say that the North was not Uralic "since time immemorial", but that is beside the point.)
A migration that placed the ancestral Hungarians more into the Steppe is assumed, but the source region was nowhere near as far North as the NW corner of Siberia.

Those are fair points but:

"The lack of any Neolithic Anatolian admixture in Khanty and Mansi in contrast to Sintashta and Andronovo..."
Again, there is a huge time span. Sintashta is ancient, it was long gone before the Ugric branch formed. A lot happened everywhere. Also if the Khanti-Mansi are mostly later assimilated HG-s that actually had to dilute any EEF. And then there are sampling density issues.

Even hugely Turkic admixed steppe populations such as Altaians, Kazakhs and Kyrgyz have EEF admixture, obviously due to their Andronovo-related ancestry (including Scythian-Saka ancestry), but Khanty and Mansi conspicuously lack it, which needs an explanation. That is why I look for their origins in more northern regions than the southern Ural contact zone.; January 14, 2018 at 12:38 PM
Rob said...: Onur

“I always thought that the southern Ural area was home to Indo-European tribes in ancient times and”

As in the Kazakh steppe/ Botai people?; January 14, 2018 at 12:41 PM
Anthro Survey said...: @Simon

Lowland Campanians score like this, too. I've seen their results in other PCAs and they are seemingly more West Asian shifted than Sicilians and definitely more so than any continental European groups. Such a shift is probably a combo of a Roman-age Samaritan-like influx from Syria in addition to an existing Anatolia_BA layer there.

I say *seemingly* because perhaps West Sicilians actually have more post-Neo influence. We simply can't know as we don't have good Roman-age, Fatimid-era DNA from NA or proxies we can be confident in for such populations.; January 14, 2018 at 1:02 PM
Kristiina said...: @Onur ”Even hugely Turkic admixed steppe populations such as Altaians, Kazakhs and Kyrgyz have EEF admixture, obviously due to their Andronovo-related ancestry (including Scythian-Saka ancestry), but Khanty and Mansi conspicuously lack it, which needs an explanation.”

Khanty and Mansi are very much Western Siberian natives and most of their ancestors probably spoke an extinct Siberian language. They carry a high amount of ANE and EHG, as you can see in Fig. 8 (http://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs12863-017-0578-3/MediaObjects/12863_2017_578_Fig8_HTML.gif), which shows f3 values used to estimate (a) Eastern European Hunter-Gatherer, (b) Neolithic Farmer, (c) Caucasus hunter-gatherer, and (d) Mal’ta (Ancient North Eurasian) ancestry in modern humans. According to the same graph, Turkic-speaking Kyrgyz also lack EEF and CHG, just like Khanty and Mansi, and yet circa 63% of modern Kyrgyz carry R1a1. Moreover, I presume that there was not any EEF in proto-Uralics. Andronovans were not Uralics and therefore they are not relevant.; January 14, 2018 at 1:09 PM
Anthro Survey said...: Davidski,

I do see something interesting, as a matter of fact.

See that unusual Bosnian sample occupying the space between North Caucasus and Balkan clusters? Any thoughts on it?

Had this been a sample from the early 1900s, it wouldn't be TOO surprising since we could suspect some Circassian pasha as an immediate relative.; January 14, 2018 at 1:25 PM
Onur Dincer said...: @Kristiina

It is well known that, before the Russian demographic expansion in Siberia during the last couple of centuries, Khanty and Mansi lived in more western areas (including west of the Ural Mountains) than they do today, but I have not seen any reliable evidence of their migration towards the north, at least during historical times.

I do not understand your point by showing those Sargat results, do we have any genomewide autosomal results from them? I see 7 West Eurasian and 1 East Eurasian mtDNA haplogroups among those published Sargat ancient DNA results, but then again, what is your point by showing them?

It is commonly accepted that Proto-Uralics were a hunter-gatherer population based on the reconstructed Proto-Uralic vocabulary and archaeological evidence. Proto-Ugrics were probably reindeer herders as modern Ugric peoples of the Ural area traditionally were, but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.; January 14, 2018 at 1:25 PM
Onur Dincer said...: @Rob

I am not sure about the Botai people, as they are an early people and we have no ancient DNA results from them.; January 14, 2018 at 1:33 PM
Onur Dincer said...: @Kristiina

I mentioned Andronovans in relation to steppe peoples and to show their contrast with non-steppe peoples such as Khanty and Mansi.

Kyrgyz obviously have some Andronovan-like ancestry, however small, but Khanty and Mansi completely lack it. The high R1a in Kyrgyz is due to drift or founder effect.; January 14, 2018 at 1:43 PM
Onur Dincer said...: @Anthro Survey

You can see the ADMIXTURE result of that unusual Bosnian sample here:

https://www.researchgate.net/profile/Kristiina_Tambets/publication/264985653/figure/fig2/AS:296007708495873@1447585145141/Figure-2-ADMIXTURE-analysis-of-autosomal-SNPs-of-the-Western-Balkan-region-in-a-global.png

It is from this study:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105090; January 14, 2018 at 2:16 PM
Kristiina said...: @Onur ”I have not seen any reliable evidence of their migration towards the north, at least during historical times.”

Encyclopaedia Britannica explains that ”Together [Khanty and Mansi] numbered some 30,000 in the late 20th century. They are descended from people from the south Ural steppe who moved into this region about the middle of the 1st millennium ad.” (https://www.britannica.com/topic/Khanty#ref272005)

According to Wikipedia ”In the centuries of the second millennium BC, the territories between Kama and Irtysh rivers were the home of a Proto-Uralic speaking population who had contacts with Proto-Indo-European speakers from the South.[5] The inhabitants of these areas were of Europid stock,[5] although the Khanty are predominantly Uraloid. This woodland population is the ancestor of the modern-day Ugrian inhabitants of Trans-Uralia”.
[5] Wiget, Andrew; Balalaeva, Olga (2011). Khanty, People of the Taiga: Surviving the 20th Century. University of Alaska Press. p. 3.

”but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.”

Yes, and so what? I took a look at the admixture graph of ”Extensive farming in Estonia started through a sex-biased migration from the Steppe”, p. 12 (https://www.biorxiv.org/content/biorxiv/suppl/2017/03/02/112714.DC1/112714-1.pdf)
Mansi lack EEF but also Yamnaya Samara and Kalmykia lack EEF, which means that the ”IE” ancestry in Mansi is comparable to the "IE" ancestry in Indians, i.e. it is of Yamnaya and not of Andronovo or Sintashta type. According to that admixture graph, Mansi may carry a significant amount of EBA steppe ancestry.

”I see 7 West Eurasian and 1 East Eurasian mtDNA haplogroups among those published Sargat ancient DNA results, but then again, what is your point by showing them?”

According to Ian Logan's site, C4a2c and C4a2c1 are typical for Pamir. It is not an ancient Siberian haplogroup.

My question is what language do you think that Sargat people spoke?; January 14, 2018 at 2:29 PM
Matt said...: Looking at the World 20 dimensions here, through neighbour joining and just the West Eurasian populations, actually looks like there's enough structure in it to pick out much more of the structure in West Eurasia than shows up in the Global10: https://imgur.com/a/vnS7q

Should be good for nMonte modeling (scaling might help but even unscaled models seem to work fairly well in nMonte).

Neighbour joining under the West Eurasian tree is qualitatively similar: https://imgur.com/a/XgAFE. Main difference being it captures more of the private drift in some small populations with founder effects or restricted growth - Druze, Ashkenazi, Sardinian, Basque, Roma, etc. That is less important if you're not trying to be 100% sure a sample doesn't share recent ancestry with those groups or to understand their unique recent evolution. Scaling is probably more important on this one as there are so many more dimensions where a single small group is extreme (and so it's more of a problem to treat all dimensions as being as large as the first dimension).

So I'm pretty impressed.; January 14, 2018 at 2:34 PM
Rob said...: @ Kristiina

Mansi:
Mansi
"Itelmen" 55.75
"Yamnaya_Samara" 34.05
"Blatterhole_MN" 10.2
d% = 5

Does that result make sense to you ?; January 14, 2018 at 2:35 PM
Onur Dincer said...: @Kristiina

Encyclopaedia Britannica explains that ”Together [Khanty and Mansi] numbered some 30,000 in the late 20th century. They are descended from people from the south Ural steppe who moved into this region about the middle of the 1st millennium ad.” (https://www.britannica.com/topic/Khanty#ref272005)

According to Wikipedia ”In the centuries of the second millennium BC, the territories between Kama and Irtysh rivers were the home of a Proto-Uralic speaking population who had contacts with Proto-Indo-European speakers from the South.[5] The inhabitants of these areas were of Europid stock,[5] although the Khanty are predominantly Uraloid. This woodland population is the ancestor of the modern-day Ugrian inhabitants of Trans-Uralia”.
[5] Wiget, Andrew; Balalaeva, Olga (2011). Khanty, People of the Taiga: Surviving the 20th Century. University of Alaska Press. p. 3.

Could be. But that is prehistory for that region and there is a lot of room for speculation there. Speculations get stronger when they are backed by genetics.

”but since modern Ugrics of the Ural area do not show any clear evidence of post-EMBA steppe ancestry, Ugrics of the Ural area probably had very little interaction with steppe peoples after their divergence from other Uralic peoples.”

Yes, and so what? I took a look at the admixture graph of ”Extensive farming in Estonia started through a sex-biased migration from the Steppe”, p. 12 (https://www.biorxiv.org/content/biorxiv/suppl/2017/03/02/112714.DC1/112714-1.pdf)
Mansi lack EEF but also Yamnaya Samara and Kalmykia lack EEF, which means that the ”IE” ancestry in Mansi is comparable to the "IE" ancestry in Indians, i.e. it is of Yamnaya and not of Andronovo or Sintashta type. According to that admixture graph, Mansi may carry a significant amount of EBA steppe ancestry.

I was talking about post-EMBA steppe ancestry, I did not say or imply anything about the lack of EMBA steppe ancestry among Ural Ugrics or Proto-Ugrics. Proto-Uralics (including the Proto-Uralic ancestors of Proto-Ugrics) were obviously interacting with early steppe IE peoples as is clear from the reconstructed Proto-Uralic vocabulary and archaeology. My point is that Ugrics formed and evolved away from the IE contact zone in the Ural area.

According to Ian Logan's site, C4a2c and C4a2c1 are typical for Pamir. It is not an ancient Siberian haplogroup.

So what?

My question is what language do you think that Sargat people spoke?

Almost certainly a non-Ugric language.; January 14, 2018 at 2:56 PM
Anthro Survey said...: @Onur Dincer

They didn't try to offer an explanation for the result, it looks like. :-(

By the way, I've also considered Roma ancestry in the individual. It's just that Roma presence is notably rare in territories west of the former First Bulgarian Empire and Bosniaks tend to have strong opinions when it comes to them. Now, the Romanian outliers in the ADMIXTURE run are clearly Roma, though.

I'll take a more in-depth look later. So far, though, I've busted out nMonte and ran the distances. The sample is GSM1424650, it looks like, and is significantly closer to all 3 Roma samples in the dataset than other Bosniaks.

How do you explain it?; January 14, 2018 at 2:58 PM
Onur Dincer said...: @Rob

Mansi:
Mansi
"Itelmen" 55.75
"Yamnaya_Samara" 34.05
"Blatterhole_MN" 10.2
d% = 5

Does that result make sense to you ?

Nein. The distance is too high anyway.; January 14, 2018 at 3:14 PM
Rob said...: @ Onur

I think you’re confusing decimal points
Pretty sure that’s on target. Anything smaller is overfitted; January 14, 2018 at 3:17 PM
Anthro Survey said...: What the heck? I did a preliminary run to test the waters at 20D(lol!).

[1] "distance%=4.1777 / distance=0.041777"

Bosnian_1_GSM1424650

GS000014325 34.45
Bosnian_10_GSM1424651 27.25
Bosnian_11_GSM1424652 13.20
Bosnian_5_GSM1424661 11.75
Bosnian_3_GSM1424659 8.70
Bosnian_7_GSM1424663 1.75
Bosnian_16_GSM1424657 1.70

GS....14325 is one of the Roma samples.
A distance of 0.04 would be terrible on Global10, but it actually indicates a good fit in this case because it's pretty similar to the average inter-sample distance within the Bosniak cluster.; January 14, 2018 at 3:17 PM
Onur Dincer said...: @Anthro Survey

Not sure how to explain it in the absence of proper South Asian or Roma populations in that ADMIXTURE run. I would be interested to see the results of your nMonte analyses of that sample though.; January 14, 2018 at 3:28 PM
Onur Dincer said...: @Rob

0.5% would be a close distance, not 5%.; January 14, 2018 at 3:33 PM
Rob said...: Ah the mistake was mine. Yes it was 0.5% indeed.
So the result makes sense to me, EMBA steppe with minimal EEF + some sort of paleo-Siberian for Mansi; January 14, 2018 at 3:37 PM
Onur Dincer said...: @Rob

Ah the mistake was mine. Yes it was 0.5% indeed.

Then it is a good fit, but:

So the result makes sense to me, EMBA steppe with minimal EEF + some sort of paleo-Siberian for Mansi

Why represent the East Eurasian ancestry of Mansi with a pretty distant population such as Itelmen when there is the far more representative Nganasan population?; January 14, 2018 at 3:47 PM
Rob said...: @ Onur

"Why represent the East Eurasian ancestry of Mansi with a pretty distant population such as Itelmen when there is the far more representative Nganasan population?"

According to who ?
Anyhow, switching to Nganasan made little difference - they both represent the same LNBA radiation from Siberia.

Mansi
"Nganasan" 50.5
"Yamnaya_Samara" 40.25
"Blatterhole_MN" 9.25; January 14, 2018 at 4:02 PM
Matt said...: Few nMonte runs based on the raw Ancient 67 World PCA dimensions with European targets, and ancient samples and outgroups as data:

No distance penalty: https://pastebin.com/zwSsqaCv
Distance penalty: https://pastebin.com/8FF65cJv

Populations in Europe get the regional ancestors that make sense (e.g. modern England is mostly RomanBritain+NordicIA/AngloSaxon and the rest is largely some composite of IberianLNBA populations and Steppe ancestry, which all makes sense, and so on for other pops). Some low levels of extra admixture coming in from world populations, but that may just be because the unscaled PCA is not representing the full population distances.

(Note that in the Finnish example the new distance penalty feature has the side effect that real distant admixtures, like the Finns' Siberian ancestry, seem to be removed.); January 14, 2018 at 4:37 PM
Davidski said...: @Onur, Erin & Shaikorth

Yes, the World PCA isn't showing Eurasian vs Sub-Saharan differentiation in Eigenvector 1 because the dataset has many more Eurasians than Africans.

However, it's still useful to have the Sub-Saharan Africans on the plot, because they help to flesh out the extra substructures in West Eurasia caused by recent Sub-Saharan admixture there.

@Anthro Survey

That outlier Bosnian sample has recent West Asian admixture, but I haven't tried to pinpoint its precise source. You can probably figure that out though, by looking at the datasheet to see which sample it is, and then analyzing his/her recent ancestry with this cM matrix.

http://eurogenes.blogspot.com/2017/09/ancient-ibdcm-matrix-analysis-offer.html; January 14, 2018 at 4:44 PM
Onur Dincer said...: @Rob

Interesting nMonte result.

David, can you test using formal methods whether Khanty and Mansi have actual EEF ancestry and its levels?

@Matt

Excellent work with nMonte!; January 14, 2018 at 5:58 PM
Chad said...: Yamnaya and Mansi have EEF/Anatolian. That is a certainty.; January 14, 2018 at 6:05 PM
Onur Dincer said...: @Chad

Yamnaya and Mansi have EEF/Anatolian. That is a certainty.

Yes, Yamnaya seem to have some EEF admixture according to formal analyses, not sure about Khanty and Mansi though. But since Khanty and Mansi seem to have some Yamnaya-related ancestry, they should be expected to have some EEF ancestry too.; January 14, 2018 at 6:44 PM
Davidski said...: @Matt

I've added a folder with eigenvalues to the datasheets zip file.

https://drive.google.com/file/d/1PYpUj_DHf-lPMJnZGVgzq07ZFcPomOwB/view?usp=sharing; January 14, 2018 at 7:58 PM
Kristiina said...: Thanks Rob! That makes a lot of sense to me! Yamnaya Samara percentage, 34.05, is indeed very high.

@Onur ”It is commonly accepted that Proto-Uralics were a hunter-gatherer population based on the reconstructed Proto-Uralic vocabulary and archaeological evidence.”

Jaakko Häkkinen has reconstructed two words for metals *wäśka and *äsa and some agricultural words, *oxči (sheep), *woxji (butter), *šeŋti (wheat/barley) and *puśnV (flour), to Proto-Uralic (https://tuhat.halvi.helsinki.fi/portal/fi/persons/jaakko-hakkinen%286e21403c-6ff1-4ba4-a0db-d868bf394c97%29/publications.html).

This means that these words show regular sound correspondences in the daughter languages and existed already in Proto-Uralic. There are also words such as sata, ”hundred” that are reconstructed to Proto-Uralic. This shows that Proto-Uralic speakers represented the modern BA culture. Moreover, the Mansi ethnonym 'Mansi' is linked with the word for man in Old Indian ”manuṣya" and Avestan "manuš".

From the archaeological point of view you should read Asko Parpola’s article ”The problem of Samoyed origins in the light of archaeology: On the formation and dispersal of East Uralic (Proto-Ugro-Samoyed)” (http://www.sgr.fi/sust/sust264/sust264_parpola.pdf) to get more perspective.

Parpola has written the article without any genetic data, and therefore, we have to fit his views with the results of ancient DNA.

He writes about Proto-Hungarian that ”the local Gorokhovo people began the practice of mobile pastoral herding and then became part of the multicomponent pastoralist Sargat culture (c. 500 BCE to 300 CE), which in a broader sense comprised all cultural groups between the Tobol and Irtysh rivers, succeeding here the Sargary culture. The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe. A leading Hungarian archaeologist happily supports the following correlation with Proto-Hungarian: “Most scholars of western Siberian archaeology agree that the Sargatka culture can be plausibly identified with the proto-Hungarians”.

This fits perfectly well with the Sargat yDNA which is R1a1 and N1c.

As for Proto-Khanty, he writes that ”Proto-Khanty may have been spoken in the Late Bronze Age and Early Iron Age cultures related to the Gamayunskoe and Itkul’ cultures that extended up to the Ob: the Nosilovo, Baitovo, Late Irmen’, and Krasnoozero cultures (c. 900–500 BCE). Some of these were in contact with the Akhmylovo of the Mid-Volga. All these cultures of the forest steppe were later absorbed into the Sargat culture discussed below (Parzinger 2006: 545–564, 679–681).”

As for Proto-Mansi, he writes that ”The Mezhovka culture was succeeded by the genetically related Gamayunskoe culture (c. 1000–700 BCE) (Parzinger 2006: 446; 542–545). From Gamayunskoe descended the Itkul’ culture (c. 700–200 BCE), which was distributed along the eastern slope of the Ural Mountains (Parzinger 2006: 552–556). Known for its walled forts, it constituted the principal Trans-Uralian centre of metallurgy in the Iron Age, and was in contact with both the Anan’ino and Akhmylovo cultures (the metallurgical centres of the Mid-Volga and Kama-Belaya region) and the neighbouring Gorokhovo culture.”

From the genetic point of view, it is significant that Sargat men were R1a1 and N1c-Z1936 (possibly Ugric-specific L1034). N1c-Z1936 has been considered the main vector of the expansion of Uralic languages as it has the widest distribution of all N yDNA in the Uralic groups. Therefore, this new paper supports this view, and it is important that this paper also suggests that N1c-Z1936 came from the south and was not autosomally Siberian as there is no Siberian mtDNA.

By the way, Mezhovska samples are R1b1a2-PF6494 (RISE524) and R1a1a1b-Z649 (RISE525), and c. 7% of modern Mansi carry R1b and 5% R1a1 and c. 14% Ugric-specific N1c-L1034.; January 14, 2018 at 10:55 PM
Matt said...: @Davidski, thanks for that.

Few basic graphics using the scaled World datasheet over all 20 dimensions:

Neighbour Joining Tree: https://imgur.com/a/PJJz7
Euclidean Distance Comparisons :

Yamnaya_Kalmykia vs Sweden_MN: https://i.imgur.com/xmNDNzx.png. (Note that the Northeast Europeans and even Lezgins and Tajiks come out slightly closer to Yamnaya_Kalmykia than NW Europeans do (as would make sense), despite the NJ tree above placing NW Europeans closer to the phylogeny to Yamnaya. This is because of a bridging effect where NW Europeans are related to Iron and Bronze Age Scandinavian, British Isles and Irish, who are in turn most related to Sintashta and Andronovo, who are in turn related to Yamnaya. Once the Baltic Bronze Age and other samples are available at similar quality, the bridge will likely shift to NE Europe).

Further distances here: https://imgur.com/a/PJJz7. It seems like there is enough structure in these dimensions to find slight disequilibria where a) NW Europeans (English, Scottish, Norwegian) seem as roughly related to Loschbour/Bichon/La Brana as NE Europe (Lithuanians, Latvians) and less related to Hungary_HG / Motala_HG than the Balts/Polish, b) West Europeans relatively more related to Iberia_EN, East to Hungary_CA (at a very subtle level).

Will have a go with nMonte a bit later.; January 15, 2018 at 4:15 AM
huijbregts said...: @Matt
The removal of Finnish/Siberian admixtures in nMonte3 may be unwanted, but it is not a side effect.
In machine learning it is a standard practice to reduce the effects of overfitting by penalizing some feature of the model, preferably in combination with crossvalidation.
Indeed, the Sangarius weighting we have discussed in the past, is a simple scheme to penalize the admixtures of high K dimensions.
The main feature of nMonte3 is the penalizing of reference populations with a great distance to the target population. I think this is useful for modern populations with much variance, like Eurogenes North-European or LukaszM K36.
Penalizing distant populations is risky when you are targeting ancient populations (although I did see a quite acceptable model of Ballynahatty).
It is up to the knowledgeable human user to judge whether a specific variant of penalizing is OK.
If you are interested in Finnish/Siberian admixtures, you should switch the penalty feature off.
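To make the trade-off concrete, here is a minimal sketch of a Monte Carlo mixture fit with an optional distance penalty. It is not the nMonte3 code: the function `fit_mixture`, the `penalty` parameter, the form of the penalty term (penalty times the mean target-to-reference distance of the chosen references) and the toy 2D coordinates are all assumptions made for illustration.

```python
import math
import random

def fit_mixture(target, refs, slots=200, iters=20000, penalty=0.0, seed=0):
    """Hill-climbing fit: each slot holds one reference; weights are slot shares.

    `penalty` is a hypothetical knob, not nMonte3's actual formula: it adds
    penalty * (mean distance from the target to the references in the slots).
    """
    rng = random.Random(seed)
    names = list(refs)
    ref_dist = {n: math.dist(target, refs[n]) for n in names}

    assign = [rng.choice(names) for _ in range(slots)]
    total = [sum(refs[n][d] for n in assign) for d in range(len(target))]
    pen_sum = sum(ref_dist[n] for n in assign)

    def objective(tot, pen):
        mean = [t / slots for t in tot]
        return math.dist(target, mean) + penalty * pen / slots

    best = objective(total, pen_sum)
    for _ in range(iters):
        i = rng.randrange(slots)
        old, new = assign[i], rng.choice(names)
        if new == old:
            continue
        # Swap one slot and keep the change only if the objective improves.
        cand_total = [t - refs[old][d] + refs[new][d] for d, t in enumerate(total)]
        cand_pen = pen_sum - ref_dist[old] + ref_dist[new]
        score = objective(cand_total, cand_pen)
        if score < best:
            assign[i], total, pen_sum, best = new, cand_total, cand_pen, score

    weights = {n: 100.0 * assign.count(n) / slots for n in names}
    fit = math.dist(target, [t / slots for t in total])
    return weights, fit

# Toy data (not from the datasheet): two nearby references and one distant one.
refs = {"near_A": (0.0, 0.0), "near_B": (1.0, 0.0), "distant": (0.0, 10.0)}
target = (0.45, 1.0)  # constructed as 45% near_A + 45% near_B + 10% distant

weights_plain, fit_plain = fit_mixture(target, refs)
weights_pen, fit_pen = fit_mixture(target, refs, penalty=2.0)
print("no penalty:", weights_plain, round(fit_plain, 4))
print("penalty=2 :", weights_pen, round(fit_pen, 4))
```

In this toy setup the penalized run gives up the real 10% distant component and accepts a worse fit in exchange, which is the same kind of behaviour as the Finnish/Siberian case discussed here.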
A few other remarks:
- If the distance to the closest population is exceptionally small, distance penalizing will result in a nearly 100% admixture for this population.
- The other feature of nMonte3 is that it runs individual samples and aggregates afterwards. This is better than using population averages. You can do this yourself manually, but why.
- I do not agree that the PCA should be scaled. You don't calculate the distance from Paris to Moscow, while scaling the distances by the country you are crossing.
- I too have wondered what might be in PC11-PC20.; January 15, 2018 at 4:35 AM
Matt said...: @All, nMonte3 outputs on the scaled Ancient 67 World 20 dimensions: https://pastebin.com/z65skAqP

@huijbregts: Yes sure, perhaps we don't wish to talk about it as a side effect. It was merely something that I hadn't considered as an issue of using distance penalization until I actually used the feature and thought worth mentioning to others in this comment thread.

huijbregts: I do not agree that the PCA should be scaled. You don't calculate the distance from Paris to Moscow, while scaling the distances by the country you are crossing.

Let's say you had a series of cities across the world, and a distance matrix giving distances between them. You wanted to transform that matrix into an abstract set of dimensions describing those distances, so you use Principal Coordinates Analysis to do so. We'd expect the two dimensions output to be equivalent to latitude and longitude.

Now, let's say those cities were much more spatially compressed on latitude than longitude. The algorithm should represent this, and preserve the true distance, by scaling the latitude dimension to a smaller eigenvalue. If it did not eigenvalue scale the dimensions, then when deriving distances back from the output dimensions, you would find distances were relatively inflated for cities which vary in latitude, compared to the ground truth.

In an extreme case, if your cities varied 1 mile in latitude and 100 miles on longitude, you would find that, if the longitude and latitude dimensions were scaled to be the same magnitude, you would predict that very close pairs were considered as distant as very distant pairs. (This could be a problem if you had some algorithm that was built to try and use distance minimization to represent the position of a city as a linear combination of other cities in your data!)

Does this make sense as to why I would find this undesirable? I am not trying to further scale an already scaled output, I am adding eigenvalue scaling into an unscaled output in which, for example, dimension 20 is the exact same size as dimension 1.

If I were given the distance between Moscow and Paris in two abstract dimensions which were scaled such that Moscow=1,1 and Paris=0,0, then of course I would want to scale the dimensions before working out the real distance between them.

PCA software can output dimensions with or without eigenvalue scaling. Whatever Davidski has used here has output the dimensions without that scaling as the default. All the dimensions are the same magnitude.
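The cities argument can be checked with a small sketch. The grid of "cities" below is hypothetical and deliberately symmetric, so the principal axes coincide with longitude and latitude and the eigenvalues are just the per-axis variances; that shortcut is an assumption of this toy case, and real data would need an actual eigen-decomposition.

```python
import math
import statistics

# Hypothetical "cities": 100 miles wide in longitude, 1 mile tall in latitude.
# Both axes are already centred on zero and uncorrelated.
cities = [(lon, lat) for lon in (-50.0, 0.0, 50.0) for lat in (-0.5, 0.0, 0.5)]

lons = [c[0] for c in cities]
lats = [c[1] for c in cities]
eigenvalues = (statistics.pvariance(lons), statistics.pvariance(lats))

# "Unscaled" output: every dimension is given the same (unit) variance.
unscaled = [(lon / math.sqrt(eigenvalues[0]), lat / math.sqrt(eigenvalues[1]))
            for lon, lat in cities]

# Eigenvalue scaling: multiply each dimension by sqrt(eigenvalue).
scaled = [(x * math.sqrt(eigenvalues[0]), y * math.sqrt(eigenvalues[1]))
          for x, y in unscaled]

def ratio(points):
    """Distance of an east-west pair divided by distance of a north-south pair."""
    east_west = math.dist(points[0], points[6])    # (-50, -0.5) vs (50, -0.5)
    north_south = math.dist(points[0], points[2])  # (-50, -0.5) vs (-50, 0.5)
    return east_west / north_south

print("true ratio:    ", ratio(cities))
print("unscaled ratio:", ratio(unscaled))
print("scaled ratio:  ", ratio(scaled))
```

With equal-variance dimensions the pair that is 100 miles apart and the pair that is 1 mile apart come out equally distant; multiplying by the square root of the eigenvalue restores the original ratio.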

The other feature of nMonte3 is that it runs individual samples and aggregates afterwards. This is better than using population averages. You can do this yourself manually, but why.

I can think of some reasons, when using populations dispersed in dimensions (for transparency of what the admixture actually represents), when using overlapping populations, and when using populations with no equal sample size. But I am happy to be using the post-workflow aggregation, which is what I've used in the models here.; January 15, 2018 at 5:31 AM
huijbregts said...: @Matt
I doubt that you are right on scaling. Eigenvalues are heavily dependent on sampling density, so eigenvalue scaling would penalize high frequency populations.; January 15, 2018 at 6:03 AM
Matt said...: Of course, I equally strongly doubt you're right. The eigenvector scaling more closely recapitulates formal population differentiation statistics of one kind of or another (allele sharing, fst, f3, etc.). The unscaled matrix does not.

In my experience, I'd also say you're exactly wrong about which populations are "penalized"; eigenvalue scaling "penalizes" (reduces distance) for low frequency populations that form a distinct dimension of variation more. For example, scaling the Kalash dimension that they alone score highly on such that it is *not* exactly the same size as the African-Eurasian or West-East Eurasian dimension is obviously going to decrease their relative distance from other Eurasian peoples compared to the unscaled scenario in which high dimensions are treated of equal size to the lower. It's not the most high frequency populations which are "penalized" it is those who are less genetically differentiated in reality.; January 15, 2018 at 6:18 AM
Anonymous said...: Dear Davidski,

Interesting PCA results. I have one curious question. Please bear with me as I am pretty much a layman in genetics.

Shouldn't Amerindians be more Western Eurasian-shifted in the first PCA (PCA world) than their positions considering that Amerindians are genetically around 35-45% ANE (please correct me if I am wrong)? Furthermore, I heard ANE is a lot closer to Western Eurasians than to Eastern Eurasians. (Again, correct me if I am wrong) If that's the case, shouldn't Amerindians be located in similar positions to populations like Kirghiz, Khakass, Altaian, some Kazakhs, etc. than their current positions in terms of Western-shifted ancestry in the first PCA/PCA 67 World?

Please kindly answer my question in layman terms regarding this as I am a pretty much beginner in population genetics.

Thank you very much.; January 15, 2018 at 7:19 AM
huijbregts said...: @Matt
I am not a mathematician and I thought neither are you. So let's be careful about our assumptions.
My understanding is that PCA software can present its output in one of two ways.
In the first way (my preference) the columns of the score matrix have a variance which is equal to the eigenvalue.
The second way (your preference) is called 'eigenvector scaling'. Here the columns of the score matrix are divided by the respective eigenvalues, which sets the variance to 1.
Both of these representations are mathematically correct, but in further processing you have to treat them differently.
Now keep in mind that we are handling genealogical data, which are ultimately derived from numbers of haplotype differences and which are mixed with some degree of noise.
Now I have two assertions:
1. When the data are eigenvalue scaled, the highest dimensions (which are the most noisy) are more inflated than the lower dimensions. This inflates the overall noise in the matrix.
2. To be useful, the Euclidean distance should approximate a measure of the number of haplotype differences. In the unscaled way, it does so. But the Euclidean distance of scaled data is an approximation of the scaled number of differences, which is completely useless.
Based on these arguments, I think that I am on solid ground in preferring the unscaled data.; January 15, 2018 at 8:17 AM
Ryan said...: I would love to see this annotated with what each of the clines and vertices represents.; January 15, 2018 at 9:11 AM
Matt said...: Some more nMonte fits using Ancient 67 World 20 scaled:

European population averages using all pre-Bronze Age and outgroups: https://pastebin.com/BMbZLxhY

"Simple" European population averages using all Steppe_EMBA, Anatolia_N, CHG, WHG, SHG and outgroups: https://pastebin.com/bAfFBU0u

Graphics for the simple fits: https://imgur.com/a/Gtryg; January 15, 2018 at 9:35 AM
MomOfZoha said...: @Davidski:
"Some of these Iranians have minor Sub-Saharan ancestry, so some might also have Arabian ancestry. Not sure which part of Iran they were sampled at though?"

Abadani people could be relatively closer to Arabs, with or without elevated Sub-Saharan ancestry (given that Arabs too are not monolithic), just from geography.

Going east from the port Bandar-i Abbas towards Bandar Beheshti one might see elevated Sub-Saharan ancestry too, due to the historic movements -- willing or unwilling -- of Bantu peoples ancestral to the Siddi of Pakistan and India today.

Then again, within the very same country Iran, one may also find descendants of Georgians and Armenians along the Caucasus border, Turkmen along the Turkmenistan border, and Tajiks along the Afghan border too. Not to mention every combination thereof, whether due to admixtures or common-ancestor origination (chicken or egg)...

Hence, it is not surprising that the commonality concerning Iranians throughout all three graphs is that they are very spread out. Also, Iranians seem to sprinkle the vast space between the Pakistan-India-Afghanistan-Tajikistan people east of Iran and the people west of Iran including Caucasus-Turkey-Yemen (though not a cluster). Eh, geography...; January 15, 2018 at 10:30 AM
Onur Dincer said...: @Kristiina

Thank you for the links. I have now read Parpola's article. Not sure which of Hakkinen's articles you meant, so I have not read his articles on that link, most of which are in Finnish anyway. I cannot say I am convinced by your references from Hakkinen about the existence of an agricultural economy among Proto-Uralics. Proto-Uralics were probably Neolithic hunter-gatherers (including fishers).

https://encyclopedia2.thefreedictionary.com/Volosovo+Culture

I have already mentioned the existence of words of IE origin in Proto-Uralic.

I could not find anything contradicting my arguments in Parpola's article. Some Uralic peoples lived in the IE contact zone in the forest steppe areas, I have never disputed this, nor do I dispute Khanty and Mansi peoples having post-Proto-Ugric admixture from more East Eurasian-derived peoples (Uralic or not).

Also, this is what Parpola says in regard to Sargat: "The Sargat intercommunity was dominated by steppe nomads belonging to the Iranian-speaking Saka confederation, who in the summer migrated northwards to the forest steppe."

Still, you jump to quick conclusions about ancient autosomal results based on a few ancient haplogroup results. The Sargat people might well have had some Siberian type East Eurasian admixture (probably derived from the Uralic peoples they absorbed or incorporated).

By the way, I had sent you an email a while ago, but you have not replied to it.; January 15, 2018 at 10:41 AM
Samuel Andrews said...: Spoiler Alert, Northern Bell Beaker's farmer ancestor was Funnel Beaker. No way was it Globular Amphora. No way. That's my opinion based on mtDNA. My blog will be up in a few days.; January 15, 2018 at 2:12 PM
Samuel Andrews said...: Some H2a is from the Steppe, some from EEF. H2a2 is from EEF and was probably popular in Funnel Beaker. H2a1 is from the Steppe.; January 15, 2018 at 2:14 PM
Anthro Survey said...: @Onur Dincer @Davidski

After looking into it, it's pretty clear the Bosnian outlier does have Roma ancestry, after all, not West Asian. (Btw, Onur, that ADMIXTURE graph does have proper South Asian samples and presence of the modal component in the Bosnian is what made me suspect Roma ancestry).

In fact, this sample gets the highest cM sharing with all 3 Roma samples on that datasheet(thanks Dave)----by miles. So much so that the individual must have a Roma parent. Unsurprisingly enough, the Roma sample he shares most cM with is also the one nMonte selected to model him.

The plot thickens, though. The second highest sharing sample (but not quite as high) is a Peloponnesian Greek individual to whom I didn't pay much attention earlier. After a very cursory glance I made a hasty assumption he has ancestry from the 1920s' population exchange.
He is GreecePelop6. Once again, nMonte selected the highest-sharing Roma for him.
Rumors about Georgios Karaiskakis, one of the Greek Revolution's heroes, having Roma ancestry might not be so far-fetched, it seems.

I then examined the two Bulgarian outliers on the PCA: Bulgarian12H and Bulgarian10H. Sure enough, they take a decisive 3rd place on that matrix. Their sharing is significantly higher than other samples, but surely more modest than the Bosnian. Adding the 3 Roma into the input didn't result in a drastic improvement in the fit.

Some of Monte's highlights are shown below. Note again that a distance of 0.05 and below is a pretty decent fit in 20D as it's close to the distance between two samples in an average cluster.
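The baseline being appealed to here, the typical distance between two samples in the same cluster, can be computed directly. This is a minimal sketch: `mean_pairwise_distance` is a hypothetical helper, and the cluster coordinates are random placeholders rather than values from the datasheet.

```python
import itertools
import math
import random

def mean_pairwise_distance(samples):
    """Average Euclidean distance over all unordered pairs of samples."""
    pairs = list(itertools.combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Placeholder 20-dimensional cluster (random numbers, NOT real coordinates).
rng = random.Random(0)
cluster = [[rng.gauss(0.0, 0.007) for _ in range(20)] for _ in range(12)]

baseline = mean_pairwise_distance(cluster)
fit_distance = 0.041777  # the nMonte distance quoted for the Bosnian model
print(f"within-cluster baseline: {baseline:.4f}")
print(f"fit distance / baseline: {fit_distance / baseline:.2f}")
```

To apply it to a real cluster, replace `cluster` with the rows for that population from the 20D datasheet; a fit distance close to the baseline means the model is about as near to the target as one cluster member is to another.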

Bosnian with Roma:
[1] "distance%=4.1777 / distance=0.041777"
Bosnian_1_GSM1424650
GS000014325 34.45
Bosnian_10_GSM1424651 27.25
Bosnian_11_GSM1424652 13.20
Bosnian_5_GSM1424661 11.75
Bosnian_3_GSM1424659 8.70

Bosnian without Roma, but with various North Caucasus samples:
[1] "distance%=10.2317 / distance=0.102317"
Bosnian_1_GSM1424650
Bosnian_6_GSM1424662 63.9
Bosnian_11_GSM1424652 12.2
Bosnian_4_GSM1424660 12.2
HGDP01403 11.8

Greek with Roma inputs(GS14325):
[1] "distance%=2.8508 / distance=0.028508"
GreecePelop6
GreecePelop8 34.70
GreecePelop3 34.10
GS000014325 18.65
GreecePelop5 11.10
GreecePelop7 1.45

Greek without Roma:
[1] "distance%=6.2836 / distance=0.062836"
GreecePelop6
GreecePelop8 77.6
GreecePelop4 17.4
GreecePelop3 3.9
GreecePelop5 1.1

Bulgarian with and without Roma:
[1] "distance%=2.8877 / distance=0.028877"
Bulgarian12H
Bulgaria33 67.25
GS000014325 13.40
Bulgarian17H 7.75
Bulgarian16H 7.40
Bulgarian18H 4.20
--------------------
[1] "distance%=4.882 / distance=0.04882"
Bulgarian12H
Bulgaria33 68.6
Bulgarian6H 15.4
Bulgarian7H 8.0
Bulgarian5H 4.2; January 15, 2018 at 2:16 PM
Rob said...: @ Sam

"Spoiler Alert, Northern Bell Beaker's farmer ancestor was Funnel Beaker. No way was it Globular Amphora. No way. That's my opinion based on mtDNA. My blog will be up in a few days"

That would be a great find if true, and is somewhat expected viz. archaeology. How sure are you though, given that there are only 6 GAC mtDNAs ?; January 15, 2018 at 2:41 PM
Davidski said...: @Qagan

The plot positions of the individuals and populations, and the resulting clusters and clines, that you're seeing on these PCA reflect pairwise genetic relationships between all of the samples, and all of the things that this entails.

So they aren't just the result of certain levels of ancient ancestral components, but also ancient and recent demographic events, like, for example, rapid expansions of small founder populations, and resulting genetic drift.

Such relatively recent genetic drift can be so extreme that it can dominate certain dimensions of the PCA, and completely mask more ancient relationships, especially when some populations are oversampled relative to others.

This is essentially why many Amerindians are being pushed so far to the left in Eigenvector 1 on the Eurasia & Americas PCA, despite their ancient West Eurasian ancestry.

However, looking at more dimensions than just two or three, which is all that we can plot visually, by using them to model ancestry proportions, is likely to reveal the western shift in Amerindians compared to East Eurasians. That's because we'd be using dimensions in which the Amerindian-specific genetic drift has very little or no impact.

But I've done PCA in the past in which Amerindians appear significantly West Eurasian in the first two dimensions, and that's because I used only one Amerindian sample in each run. See here...

http://eurogenes.blogspot.com/2016/09/the-eurasians-idiots-guide.html; January 15, 2018 at 2:43 PM
Matt said...: @huijbregts: In the first way (my preference) the columns of the score matrix have a variance which is equal to the eigenvalue.
The second way (your preference) is called 'eigenvector scaling'. Here the columns of the score matrix are divided by the respective eigenvalues, which sets the variance to 1.

To be clear, the scaling I am using is to multiply the "columns of the score matrix" (which in the raw data each account for an equal amount of the total variance) by the square root of the eigenvalue. This is the same operation that the eigenvalue scaling option performs in PAST3 when set on PCoA (it takes dimensions that each account for an equal amount of the variance and scales each one by multiplying by the square root of its eigenvalue).
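As a rough check on the bookkeeping, the sketch below starts from three unit-variance columns and compares the two operations under discussion: multiplying by the square root of the eigenvalue versus dividing by the eigenvalue. The eigenvalues and scores are made-up illustrative numbers, not taken from the datasheet.

```python
import math
import statistics

# Hypothetical eigenvalues for three dimensions (illustrative only).
eigenvalues = [9.0, 4.0, 1.0]

# Hypothetical raw scores: three mutually orthogonal, zero-mean columns,
# each with variance 1 (the "equal share of variance" state).
raw = [
    [-1.0, 1.0, -1.0, 1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0, 1.0],
]

def column_variances(columns):
    """Population variance of each column."""
    return [statistics.pvariance(col) for col in columns]

# Multiply each column by sqrt(eigenvalue): variance becomes the eigenvalue.
times_sqrt = [[v * math.sqrt(ev) for v in col] for col, ev in zip(raw, eigenvalues)]

# Divide each column by the eigenvalue: the leading dimensions shrink the most.
divided = [[v / ev for v in col] for col, ev in zip(raw, eigenvalues)]

print("raw:       ", column_variances(raw))
print("times sqrt:", column_variances(times_sqrt))
print("divided:   ", column_variances(divided))
```

Multiplying by the square root gives each column a variance equal to its eigenvalue, while dividing by the eigenvalue leaves the first dimension with the smallest variance of all.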

The procedure you describe as "your preference" (dividing each column by the eigenvalue) is not what I'm doing. I'm not sure where you've got this idea from because it's not what I've described ITT or elsewhere (and I'm further not sure why anyone would do it since it would make the lower dimensions account for a systematically smaller amount of the variance in a way that directly distorts the ground truth).; January 15, 2018 at 3:27 PM
Samuel Andrews said...: @Rob,
"That would be a great find if true, and is somewhat expected viz. archaeology. How sure are you though, given that there are only 6 GAC mtDNAs ?"

I'm pretty sure. For two reasons. First, Irish share multiple recent links with Scandinavians including mHGs that have already been found in Funnel Beaker remains. Second, I'm confident Poles & Russians' main farmer ancestor is Globular Amphora and several young EEF-derived mHGs in eastern Europe are not found in Northwestern Europe.

But I'm open to being wrong. Modern mtDNA has been misleading before.; January 15, 2018 at 3:33 PM
Matt said...: @ huijbregts: I mean, if this helps -

https://folk.uio.no/ohammer/past/multivar.html

This is the procedure I'm following A: "Principal Coordinates ... The "Eigenvalue scaling" option scales each axis using the square root of the eigenvalue (recommended)."

I'm not following B: Principal components analysis ... If the "Eigenval scale" is ticked, the data points will be scaled by 1/sqrt(dk), and the biplot eigenvectors by sqrt(dk) - this is the correlation biplot of Legendre & Legendre (1998).. (This is the procedure that puts the data in the state that the datasheet is in, with each dimension accounting for an equal share of variance.); January 15, 2018 at 3:52 PM
Onur Dincer said...: @Anthro Survey

No, that ADMIXTURE graph does not include any proper South Asian population, populations such as the Baloch, Brahui, Pashtuns/Pathans and Kalash are not proper South Asians, they are South Central Asians. By "proper South Asian" I mean populations with medium to high ASI levels such as the Punjabi, Gujarati, Bengali, Tamils and Paniya, no such population exists in that ADMIXTURE analysis, as a result their "South Asian" component has little ASI influence and should be treated as South Central Asian. Non-Iranian Near Eastern and Caucasus populations have no or almost no proper South Asian ancestry, so the component that is modal among the South Central Asian populations is obviously not a proper South Asian component. You should keep these in mind in your future analyses.; January 15, 2018 at 5:05 PM
Kristiina said...: @Onur
I have not received your email, so please send it again.

You are free to keep your opinions, but the difference between us is that I have adduced a lot of evidence in support of my views and you have adduced none.

In any case, I believe in plurality of cultures and languages which gives room to many ethnic identities that have interacted, merged and disappeared in prehistory.

Häkkinen's article is in Finnish: Kantauralin ajoitus ja paikannus: perustelut puntarissa.; January 15, 2018 at 10:06 PM
Anthro Survey said...: @Onur Dincer

At first I was not entirely sure what you meant since many do not make this distinction and it does tend to vary. Your point is valid and I make such a distinction myself(referring to everyone east and inclusive of Punjabis & Sindhis), actually. Nevertheless, it doesn't take away from my suspicion---strengthened by cM sharing data and Monte---being fully justified.

In this case, it's a composite component containing alleles related to Iran_N, ASI and steppe-related ancestries, proportions fluctuating somewhat depending on population(e.g.it would be essentially ASI-free in the context of Armenians or Iranians).

Little ASI, yes, but NOT trivial by any means.
For one, we know the 4 populations of interest contain ASI from other analyses, and it's the only component where ASI would reside at that K in the Makrani and Brahui(if you look at the expanded ADMIXTURE results in S1). Note also how the "yellow" ceases to persist after K=5 in Brahui & Makrani and markedly reduces in the Pathans and Burusho.

It is almost a certainty that had a low-caste North Indian population been included, we'd see overlap with these 4 populations at low Ks as they do in virtually all other ADMIXTURE runs---mainly due to a shared stream of SA-specific Iran_N-like ancestry(but also ASI).

Now, had recent West Asian ancestry been the culprit, as opposed to SA-related, chances of seeing such a big piece in the Bosnian would have been slim-to-none. Instead, we'd expect to get higher-than-normal pink and perhaps 1/6-1/3 as much of the green.
Taken together with those two Romanians in the run behaving similarly and Romania(as well as Balkans at large) harbouring sizeable Roma populations, it was hard not to suspect Roma ancestry.

And after closer inspection summarized in my last comment, it is hard to have serious doubts about the ancestry in question being Roma.; January 15, 2018 at 10:13 PM
Aram said...: Samuel Andrews

H2a2 is from EEF?
Based on modern distribution (yes it looks European) or there was an ancient DNA with H2a2 that I missed?; January 16, 2018 at 1:24 AM
Kristiina said...: H2a2 is CRS which means that HVR1 does not help much and, in particular, many older ancient mtDNA papers are based on HVR1. This means that there are many old samples that are potentially H2a2 but not necessarily.

For example, a sample from Neolithic Mongolia, Bayankhongor, was identified as H2a2a, as well as a sample from LBA Mongolia, Övörkhangai, AT232. However, it is possible that this identification is based only on HVR1.

Similarly, samples from Donau Eneolithic Smyadovo Bulgaria and Dniepr Eneolithic Vinogradnoye Ukraine are potentially H2a2 (https://publications.ub.uni-mainz.de/theses/volltexte/2015/3975/pdf/3975.pdf).; January 16, 2018 at 2:12 AM
huijbregts said...: @Matt
1. Of course the data have to be scaled by the root of the eigenvalues. My blooper, your point.
2. Multivariate data often have variables with different units, as length, mass, time etc. In this case you will have to normalize. This is why in Past3 normalizing is standard.
But in raw genetical data all the variables are of the same datatype (=haplotype).
3. We are interested in different problems. You are interested in chasing new eigenvectors. For this purpose it is not a problem to scale the scores by the root of eigenvalues. Indeed this will increase the visibility of smaller eigenvectors. The only problem is that it also magnifies the noise; the smaller eigenvalues may even entirely be composed of overfitted noise.
4. All this does not imply that scaling is just an innocent reorganization of the data. It is a profound restructuring of the data; it drops all the information in the eigenvalues, making it impossible to transform the scaled data back into the originals (unless you reintroduce the eigenvalues). It also decreases the signal-to-noise ratio.
5. After scaling, calculations on multiple variables have lost meaning. What I am specifically disapproving is calculating distances after having rescaled the scores of the components.
The distance of Paris to Moscow is the sum of the distances in the countries one is crossing. Calculating the sum of normalized distances would be a weird misunderstanding, this would inflate the distance through the small country Luxembourg and shrink the distance through the huge country like Russia. The same applies to higher dimensional Euclidean distances.
I am not the only one saying so. I found an old paper from 2005
http://www.pbarrett.net/techpapers/euclid.pdf
"A further problem is that raw Euclidean distance is sensitive to the scaling of each constituent variable." (p.6).; January 16, 2018 at 5:29 AM
Onur Dincer said...: @Kristiina

I have not received your email, so please send it again.

I have now re-sent my email to you.

You are free to keep your opinions, but the difference between us is that I have adduced a lot of evidence in support of my views and you have adduced none.

What you provided in support of your view on the presence of agricultural vocabulary in Proto-Uralic was weak evidence at best (at least based on the part you translated to English). I did not feel need to provide evidence for the non-existence of agricultural vocabulary (other than some IE loanwords) in Proto-Uralic because there is abundant source material about that and you yourself should already have read some of them as someone who is interested in the subject. Nevertheless, I will provide some evidence as per your request.

https://books.google.com.tr/books?id=yJwkDQAAQBAJ&pg=PT574&lpg=PT574&dq=proto-uralic+agricultural+vocabulary&source=bl&ots=SzVQnLj6ki&sig=nrm3u1aupvISvxT8A-sOPHsRNgo&hl=tr&sa=X&ved=0ahUKEwiGvpfYwdzYAhXMFiwKHdBrD_EQ6AEIMzAB#v=onepage&q=proto-uralic%20agricultural%20vocabulary&f=false

https://books.google.com.tr/books?id=Ubb3DQAAQBAJ&pg=PA48&lpg=PA48&dq=proto-uralic+agricultural+vocabulary&source=bl&ots=U8pl6yCFBQ&sig=h5tYDSIzdFJWJsLsFTEtqW-Cl0g&hl=tr&sa=X&ved=0ahUKEwiGvpfYwdzYAhXMFiwKHdBrD_EQ6AEIQTAD#v=onepage&q=proto-uralic%20agricultural%20vocabulary&f=false

https://edisciplinas.usp.br/pluginfile.php/4104228/mod_resource/content/2/ANTHONY%2C%20D.%20Language%20and%20place.pdf

"Another thirty-six words were borrowed from differentiated Indo-European daughter tongues into early forms of Uralic prior to the emergence of differentiated Indic and Iranian, before 1700-1500 BCE at the latest. These later words included such terms as bread, dough, beer, to winnow, and piglet, which might have been borrowed when the speakers of Uralic languages began to adopt agriculture from neighboring Indo-European-speaking farmers and herders."

In any case, I believe in plurality of cultures and languages which gives room to many ethnic identities that have interacted, merged and disappeared in prehistory.

I agree, but when we go back enough in time we can find the cultures and lifestyles of the peoples speaking proto-languages using archaeology, historical linguistics and ancient DNA data, so we can pinpoint the specific ways of life of the proto-language speakers. This does not take away from the variety of cultures found among the speakers of the daughter languages. You Finns have been an agricultural people for thousands of years irrespective of the hunter-gatherer lifestyle of your Proto-Uralic or your Proto-Finno-Ugric ancestors.

Häkkinen's article is in Finnish: Kantauralin ajoitus ja paikannus: perustelut puntarissa.

Thanks for pointing that out.; January 16, 2018 at 5:57 AM
Onur Dincer said...: @Anthro Survey

I did not mean to say that your assessment of the results of those Balkan outliers was incorrect. All I said was that their results are hard to interpret based on that ADMIXTURE graph alone. That ADMIXTURE analysis does not have a proper South Asian component because no proper South Asian population is included in it. A good methodology would be to include at least one population from southern India. That is what is done in Behar et al. 2010 and as a result the South Asian component only appears in Iranians at higher than noise levels among the Near Eastern and Caucasus populations included in that study:

http://2.bp.blogspot.com/_Ish7688voT0/TA_8VX3jGkI/AAAAAAAACcM/HVkOLdPm94g/s1600/admixture-global.jpg; January 16, 2018 at 6:30 AM
Samuel Andrews said...: @Aram,

No ancient mtDNA for H2a2 other than one from Bell Beaker. Yeah, I am basing my claim on modern distribution. I've found quite a few mHGs, including H2a2a1 & H2a2b, which date to around 6,000 years ago and are mostly found in Northwestern Europe. So, I suspect, based on modern mtDNA, they originated in Funnel Beaker farmers.

@Kristiina,

Some studies make the mistake of labeling all samples without any difference with rCRS in HVR1 as H2a2a. Real H2a2a is defined by mutations in the CR region. My H2a2a(s) are all mitogenome so I know they have H2a2a.; January 16, 2018 at 7:56 AM
Anonymous said...: The plot positions of the individuals and populations, and the resulting clusters and clines, that you're seeing on these PCA reflect pairwise genetic relationships between all of the samples, and all of the things that this entails.

So they aren't just the result of certain levels of ancient ancestral components, but also ancient and recent demographic events, like, for example, rapid expansions of small founder populations, and resulting genetic drift.

Such relatively recent genetic drift can be so extreme that it can dominate certain dimensions of the PCA, and completely mask more ancient relationships, especially when some populations are oversampled relative to others.

This is essentially why many Amerindians are being pushed so far to the left in Eigenvector 1 on the Eurasia & Americas PCA, despite their ancient West Eurasian ancestry.

However, looking at more dimensions than just two or three, which is all that we can plot visually, by using them to model ancestry proportions, is likely to reveal the western shift in Amerindians compared to East Eurasians. That's because we'd be using dimensions in which the Amerindian-specific genetic drift has very little or no impact.

But I've done PCA in the past in which Amerindians appear significantly West Eurasian in the first two dimensions, and that's because I used only one Amerindian sample in each run. See here...

http://eurogenes.blogspot.com/2016/09/the-eurasians-idiots-guide.html

@Davidski,

Thank you very much for your detailed explanations to my question. I just had a look at the PCA in the link you showed me and indeed the Amerindian (Clovis) sample seems to be a mix between ENA and Western Eurasians.

I have several more questions if you don't mind me asking?

Do you have any more PCAs showing the Western-shiftness of Amerindians?

Also can you give estimates of percentage numbers of how much ENA/Eastern Eurasian and Western Eurasian are Amerindians genetically on average? Are Amerindians like Karitiana as Western-shifted autosomally as certain Central Asians and Siberians like Kazakh, Kyrgyz, Khakass?

Thank you again for your kind replies and regards,; January 16, 2018 at 9:01 AM
Rob said...: I think Kristiina is correct to point out that recent trends place Uralic expansion to a more recent period than the Mesolithic rake model previously in current. E.g. Häkkinen, or this article I saw on Academia https://www.academia.edu/34581649/Vectors_of_language_spread_at_the_central_steppe_periphery_Finno-Ugric_as_catalyst_language (but cannot quote it). Also a computational approach: https://www.ncbi.nlm.nih.gov/pubmed/23675756

About agricultural terminology - the forest steppe had poorly developed agriculture even as late as the Iron Age, whether IE or not. So I'm not sure what it adds.; January 16, 2018 at 1:00 PM
Davidski said...: @Qagan

Off the top of my head...

http://eurogenes.blogspot.com/2016/10/a-fresh-look-at-global-genetic-diversity.html

http://eurogenes.blogspot.com/2018/01/a-genome-from-first-founding-population.html

That's probably about it at this blog. I don't really focus on the Americas much.; January 16, 2018 at 1:08 PM
Onur Dincer said...: @Rob

I think Kristiina is correct to point out that recent trends place Uralic expansion to a more recent period than the Mesolithic rake model previously in current.

I too claim that Uralic expanded post-Mesolithic. I never found the Mesolithic expansion theory of Uralic realistic.; January 16, 2018 at 1:15 PM
Matt said...: huijbregts: For this purpose it is not a problem to scale the scores by the root of eigenvalues. Indeed this will increase the visibility of smaller eigenvectors. The only problem is that it also magnifies the noise; the smaller eigenvalues may even entirely be composed of overfitted noise.

Compared to default with this datasheet, where each dimension (column) is of equal magnitude, multiplying each dimension (column) by the root of the eigenvalue will decrease the visibility of the smaller eigenvectors.

If we have dimensions (1,2,3) with eigenvalues (25,16,4) then after applying eigenvalue scaling the size of dimension 1 will be 5x pre-scaling, 2 will be 4x pre-scaling, etc. The lower dimensions will account for less of the overall distance (be less visible).

The distance of Paris to Moscow is the sum of the distances in the countries one is crossing. Calculating the sum of normalized distances would be a weird misunderstanding, this would inflate the distance through the small country Luxembourg and shrink the distance through the huge country like Russia.

Yes, I agree this would not make sense; this is why I am taking the datasheet Davidski has provided, in which each dimension is normalized, and introducing the eigenvalues. This is what I have called eigenvalue scaling (and it is what PAST3 calls eigenvalue scaling in the specification within the Principal Coordinates Analysis Function).

I am emphatically *not* normalizing out the eigenvalues of each dimension. That is the opposite of what I am doing.; January 16, 2018 at 2:13 PM
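The arithmetic in the (25,16,4) example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sample rows are made up, the function names `eigenvalue_scale` and `squared_share` are hypothetical, and the starting scores are assumed to be normalized to equal magnitude per dimension, as described for the datasheet.

```python
import math

def eigenvalue_scale(scores, eigenvalues):
    """Multiply each normalized PC column by the square root of its eigenvalue."""
    roots = [math.sqrt(ev) for ev in eigenvalues]
    return [[x * r for x, r in zip(row, roots)] for row in scores]

def squared_share(p, q):
    """Fraction of the squared Euclidean distance contributed by each dimension."""
    sq = [(x - y) ** 2 for x, y in zip(p, q)]
    total = sum(sq)
    return [s / total for s in sq]

# Eigenvalues from the example in the comment above.
eigenvalues = [25, 16, 4]

# Two hypothetical samples that differ by the same amount on every
# normalized dimension (made-up values, not taken from any datasheet).
scores = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

scaled = eigenvalue_scale(scores, eigenvalues)

before = squared_share(scores[0], scores[1])  # every dimension counts equally
after = squared_share(scaled[0], scaled[1])   # dimension 3 now counts for far less

print("scaled sample:", scaled[1])
print("share of squared distance before:", [round(s, 3) for s in before])
print("share of squared distance after: ", [round(s, 3) for s in after])
```

After scaling, the third dimension's share of the squared distance falls from one third to 4/45, which is the sense in which the lower dimensions become less visible.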
Matt said...: If anyone wants to try and use some datasheets with the data in which I have used the eigenvalue scaling procedure to reintroduce the eigenvalue information (PAST3 format):

West Eurasian PCA - https://pastebin.com/9Hs0545n
West Eurasian PCA Averages - https://pastebin.com/HfkDGe4m
World PCA - https://justpaste.it/1fsja
World PCA Averages - https://pastebin.com/BTqdiKTL

I've colourized the populations on the datasheets, see here for example graphics: https://imgur.com/a/JXxZW

Poss one thing of interest to @Davidski and others in the above set of graphics, on what seems to be the case for total distance on the West Eurasia sheet after scaling (reintroducing of eigenvalue information) which I've been trying today (after trying the World one previous); I've noticed that this seems to reintroduce overall distances to ancient populations which *eerily* match haplotype chunk sharing patterns from Martiniano et al. 2017 that this diploid data is from.

For instance, across the 20 dimensions (including all the eigenvalue information), the Afanasievo and Yamnaya_Kalmykia individuals actually seem to end up slightly closer to some North Caucasus and Tajiks than they are to present day Northern Europeans... This mirrors the finding of Martiniano et al that Lezgins had the highest (or joint highest) chunk donation with Yamnaya, but it is in contrast to general findings with outgroup f3 statistics on lower coverage pseudo-haploid data.

Obviously those f3 are telling us something real that Northern Europeans coalesce with the Steppe_EMBA on a very deep level in their ancestry (more so than NC populations!). However considering the above distance seems like this may potentially offer strong vindication of Kurd's argument that high coverage diploid data will capture patterns of relatedness which are less deep in the phylogeny, and more structured by geography and potentially by actual patterns of descent. (Of course this should be validated with other comparison statistics over the same data).

This could mean lots of interesting things in terms of the potential for future work to pick apart different streams and sources within the steppe wave - immediately to my mind just on the European scene, potentially to pick apart some of these questions people have posed as to whether ancestry in the Corded Ware and Bell Beaker populations came from distinguishable sources or one and the same, and actually move to solve that question. Also the question of how much the early Steppe_EMBA *was* a single rapidly expanding population and so our hypotheses on culturally how the fusion of EHG with Caucasus agro-pastoralists happened.

(Note as well this distance finding is not true for the Andronovo and Sintashta, or Mezhovskaya or Karasuk individuals).

(Similarly you have the total distance across these dimensions being minimal for Hungary_BA with Russia and Ireland_EBA with Britain, despite the overall picture of ancestry tending to suggest that on a very deep level the pattern may be reversed).; January 16, 2018 at 3:49 PM
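A toy Python sketch of why the ranking of closest populations can change once eigenvalue information is reintroduced. Everything here is hypothetical: the coordinates, the eigenvalues and the labels `pop_A` and `pop_B` are invented for illustration and do not come from the datasheets discussed above.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scale(point, eigenvalues):
    """Reintroduce eigenvalue information: multiply each coordinate by sqrt(eigenvalue)."""
    return [x * math.sqrt(ev) for x, ev in zip(point, eigenvalues)]

def ranked(target, candidates):
    """Candidate names sorted from closest to farthest from the target."""
    return sorted(candidates, key=lambda name: euclidean(target, candidates[name]))

# Hypothetical normalized coordinates (every dimension has equal weight).
ancient = [0.0, 0.0]
pops = {
    "pop_A": [1.0, 0.0],  # differs from the ancient sample on dimension 1 only
    "pop_B": [0.0, 2.0],  # differs from the ancient sample on dimension 2 only
}

# Hypothetical eigenvalues: dimension 1 carries much more variance.
eigenvalues = [16.0, 1.0]

order_normalized = ranked(ancient, pops)
order_scaled = ranked(
    scale(ancient, eigenvalues),
    {name: scale(coords, eigenvalues) for name, coords in pops.items()},
)

print("normalized scores:", order_normalized)
print("eigenvalue scaled:", order_scaled)
```

On the normalized scores pop_A is the nearer of the two; after scaling, its difference on the heavily weighted first dimension pushes it behind pop_B.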
Unknown said...: @Davidski Thanks for answering my previous inquiry.

Could you explain what your Tail Diffs entail? Say, if a Tail Diff is 12% is it a reliable model? What about 50%? I am just curious, thanks.; January 16, 2018 at 4:30 PM
Davidski said...: In theory, the higher the tail the more robust the model. If it's below 0.05 or so, then it's a fail.; January 16, 2018 at 4:58 PM
Matt said...: Plotting distances from the 20 dimension eigenvalue scaled World Ancient 67 PCA against 20 dimension West Eurasia equivalent: https://imgur.com/a/FDH8u

Samples which tend to be above trend on distance in West Eurasia seem to be either:

a) the same groups with fairly heavy drift that is not captured in World PCA (except for Kalash who have their own dimension in World PCA), i.e. Basque, Ashkenazi, Roma, Samaritan, Druze

b) populations that look closely related to ancient samples in deep genealogy but are geographically a bit further away (e.g. largely Western Europeans move away from Hungary BA in West Eurasia specific distance, same for comparing Sardinians in Anatolia_N, and the effect is most noticeable on Afanasievo because it causes some groups to switch rank in relatedness at the margin).

So it seems plausible to me that the West Eurasia PCA here is in its 20 dimensions capturing slightly more extra information distinguishing ancients from moderns who have deep ancestry that's more similar (and linking them more to people who have slightly different deep ancestry who live closer today?).

Then again, thinking again, hard to say if this is "more accurate", like you say, "The main thing to keep in mind is that the results are dependent on the samples in the analysis." Each level highlights different things...; January 16, 2018 at 5:05 PM
Seinundzeit said...: David,

Tremendous stuff!

Thank you very much; your work is essential.

All,

With this data, I have yet to undertake an exhaustive exploration. That'll take some time.

Still, I did have some quick fun today. Due to the spare time on my hands, I decided to produce some simulations.

Basically, I created Levant_N, Iran_N, Iran_Chl, and ASI proxies. They've worked out quite well.

So far, I've only worked with the Eurasian/American PCA. I used 13 dimensions (eigenvalue scaled, as described by Matt).

Again, results are very sensible, but my simulations are quite far from perfect. Once I find some time, I'll improve these further.

West Eurasians (of the Caucasus/West Asian/South Central Asian variety):

Chechen

39.45% CHG
27.15% Steppe_EMBA + 0.40% Karasuk_outlier
18.80% Anatolia_N
11.85% Levant_N_Simulation
2.35% Mongola

distance=0.0934

Georgian

59.10% CHG
27.10% Anatolia_N
13.15% Levant_N_Simulation
0.65% Steppe_EMBA

distance=0.1469

Iranian

55.10% Iran_Chal_Simulation
21.60% Levant_N_Simulation
9.45% Karasuk_outlier + 4.45% Steppe_EMBA
9.40% Iran_N_Simulation

distance=0.1472

Kalash

41.30% Steppe_MLBA + 6.05% Karasuk_outlier
33.35% Iran_N_Simulation + 9.40% Iran_Chal_Simulation
9.90% ASI

distance=3.2518

Pashtun_Afghanistan (excluding Afghan Uzbek/Hazara-like outlier)

44.25% Iran_Chal_Simulation + 11.55% Iran_N_Simulation
23.70% Steppe_EMBA + 11.25% Karasuk_outlier
9.25% ASI

distance=0.417

Pashtun_Pakistan (excluding 9 Qasibghar samples)

41.75% Iran_Chal_Simulation + 12.90% Iran_N_Simulation
26.00% Steppe_EMBA + 5.95% Karasuk_outlier
13.40% ASI

distance=0.6092

Tajik_Yagnobi

42.20% Iran_Chal_Simulation + 12.55% Iran_N_Simulation
19.90% Steppe_EMBA + 18.20% Karasuk_outlier + 3.85% WHG + 2.50% Steppe_MLBA + 0.45% Karasuk
0.35% ASI

distance=0.1733

Tajik_Shugnan

37.15% Steppe_EMBA + 13.95% Karasuk_outlier
38.90% Iran_Chal_Simulation + 2.95% Iran_N_Simulation
5.15% ASI
2.95% Mongola

distance=0.3674

Again, very solid.

Although, my Iran_N and Iran_Chl simulations do need more work, and I probably should look into the Kalasha case (Steppe_MLBA shouldn't be at play for them, but that might be fixed by using fewer dimensions).

As the days go by, I'll try some things with World_67, and I'll explore further with Eurasian/American_67. If I see anything of interest, I'll post.; January 16, 2018 at 9:51 PM
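For readers wondering how percentages plus a "distance" of the kind listed above can be obtained from PCA coordinates, here is a generic Python sketch. The comment does not say which tool or algorithm was used, so this is not that method: it is a plain grid search over a two-source mixture, with invented coordinates and hypothetical names (`src_a`, `src_b`, `best_two_way_fit`).

```python
import math

def mix(sources, weights):
    """Weighted average of source coordinate vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_two_way_fit(target, src_a, src_b, steps=1000):
    """Grid-search the share of src_a (0..1) that minimizes distance to target."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        d = distance(target, mix([src_a, src_b], [w, 1 - w]))
        if best is None or d < best[1]:
            best = (w, d)
    return best

# Hypothetical 3D coordinates for two source populations (not real data).
src_a = [0.10, -0.02, 0.03]
src_b = [-0.04, 0.05, 0.01]

# A made-up target that is an exact 70/30 blend of the two sources.
target = mix([src_a, src_b], [0.7, 0.3])

share_a, dist = best_two_way_fit(target, src_a, src_b)
print(f"{share_a:.1%} src_a + {1 - share_a:.1%} src_b, distance={dist:.4f}")
```

With real populations the target is never an exact blend of the sources, so the leftover distance is non-zero and gives a rough sense of how well the chosen sources describe the target.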
Unknown said...: @Sein Nice stuff!

@David Thanks again!; January 16, 2018 at 10:57 PM
Aram said...: Samuel

I see. But I don't think we will see any H2a2 in EEF context. W5 is more rare but we have already 3 EEF W5-s.
I think H2a2 in N Central Europe is a metal ages drift. Founder effect. H2a* from which it started could be steppic (entered with proto BB) but also could be from second wave from Anatolia (Kum6). This latter version is less likely than the first.

Anyway I agree with you that BB got its EEF from Funnel Beaker. Y-DNA structure of L51 also points toward that region.; January 17, 2018 at 1:46 AM
Matt said...: @Sein interesting approach to introduce simulation samples in. Not sure what method you used, I was wondering about the similar thing, using a regression on the World Ancient 67 plot with the Global10 results to "get on" an approximation of the Global10 samples that aren't in Martiniano's dataset.

Prompted by your result I did it just now, and here's the 20D datasheet with all the projected on G10 results using regression:
part 1: https://pastebin.com/zW7wdFkH, Part 2: https://pastebin.com/n06w1uC2, Part 3: https://pastebin.com/8BDexSjY

(split to 3 files due to pastebin limits).

This includes some workable simulated ancient Levant/Iran/Armenia/ANE positions, in case you wanted to try and download and compare how these sims from the regression on G10 behave compared to what you have done.

(Colour scheme follows the above scaled World Ancient 67 datasheet I put in my comment above, only with the projected on Global10 samples in Slategrey).

Couple of graphics that show where the projected samples sit on this: https://imgur.com/a/Q9EQW

I'm not totally 100% sure about this, as in one frame of mind it will just recapitulate the results from Global10, so not necessarily independent value in it, but in case it is of interest anyway.; January 17, 2018 at 2:00 AM
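The general idea of using regression to place samples from one PCA into another can be sketched as follows. The comment only says that a regression was set up on samples with matching labels, so the details here are assumptions: ordinary least squares with an intercept, a toy 2D-to-3D case instead of Global10 to 20 dimensions, made-up coordinates, and hypothetical function names (`solve`, `fit_linear_map`, `project`, `true_map`).

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n and b is n x m (lists of lists); returns x as n x m.
    """
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_linear_map(x_rows, y_rows):
    """Least-squares coefficients (with intercept) mapping x-space to y-space."""
    xs = [[1.0] + list(r) for r in x_rows]
    k, m = len(xs[0]), len(y_rows[0])
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [[sum(r[i] * y[j] for r, y in zip(xs, y_rows)) for j in range(m)]
           for i in range(k)]
    return solve(xtx, xty)

def project(coef, x):
    """Predict y-space coordinates for a sample known only in x-space."""
    xr = [1.0] + list(x)
    return [sum(xr[i] * coef[i][j] for i in range(len(xr)))
            for j in range(len(coef[0]))]

def true_map(x):
    """Hypothetical exact relationship used only to generate toy data."""
    return [1 + 2 * x[0] - x[1], 0.5 * x[0] + x[1], 3 - x[0]]

# Samples present in both analyses (made-up coordinates).
x_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_rows = [true_map(x) for x in x_rows]

coef = fit_linear_map(x_rows, y_rows)

# A sample that exists only in the first analysis gets an estimated position.
pred = project(coef, [3.0, 2.0])
print([round(v, 6) for v in pred])
```

Because the toy relationship is exactly linear, the prediction matches the true position. With real data the fit is only approximate, and any structure in the target dimensions that the source coordinates cannot express is lost.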
Kristiina said...: @Rob
Yes, I think that flour was bought on the Central Asian steppe and it is still bought from elsewhere in Western Siberia. However, animals were kept earlier, and Parpola connects the Ugric word for horse, *lox, to Central Asian cultures such as Botai. In his article, Häkkinen emphasizes the importance of metallurgy, and I think that the Ugric expansion was related to metallurgy and horse breeding.

@Onur
I received your email and will reply to you.

“What you provided in support of your view on the presence of agricultural vocabulary in Proto-Uralic was weak evidence at best (at least based on the part you translated to English). I did not feel need to provide evidence for the non-existence of agricultural vocabulary (other than some IE loanwords) in Proto-Uralic because there is abundant source material about that and you yourself should already have read some of them as someone who is interested in the subject. Nevertheless, I will provide some evidence as per your request.”

Häkkinen published his article in 2007 and your authority, Péter Hajdú, died in 2002, therefore you cannot use Hajdú’s thinking as a tool to refute Häkkinen’s analysis. You have to refute it with linguistic means. I would like to see you show that Häkkinen’s reconstructions are wrong.

Your next authority is David W. Anthony and his article “Archaeology and Language: Why Archaeologists Care about the Indo-European Problem”. As far as I know, he is mostly an archaeologist, and I doubt his competence in Uralic linguistics. When he argues that Proto-Uralic was a language spoken by foragers, he refers to Parpola 2012. After reading Parpola’s article, I see that he means that Proto-Uralic developed among the foragers of the forested northeastern part of European Russia, and before the disintegration of Proto-Uralic they adopted the Bronze Age economy. Parpola argues that “In response to this justified criticism my first revision was to correlate Proto-Uralic with the Volosovo culture (c. 3650–1900 BCE), as its late disintegration would allow the presence of Indo-Iranian loanwords in Proto-Uralic”.

Your third reference is again to Anthony and to Chapter 5 “Language and Place: The Location of the Proto-Indo-European Homeland”. Anthony proposes that the oldest stratum of Uralic languages belongs to Lyalovo foragers in the region between the Oka and the Urals. Anthony writes that “At a conference dedicated to these subjects held at the University of Helsinki in 1999, not one linguist argued for a strong version of the late-loan hypothesis. Recent research on the earliest loans has reinforced the case for an early period of contact at least as early as the level of the proto-languages.”

I cannot see any contradiction between my thinking and Parpola’s and Anthony’s thinking in this respect.; January 17, 2018 at 2:24 AM
Onur Dincer said...: @Kristiina

I do not understand your point. I never contested the idea of bronze technology among early Uralic peoples. Late hunter-gatherers of Eurasia used bronze technology. Parpola also advocates the theory of hunter-gatherer lifestyle of Proto-Uralics and Proto-Finno-Ugrics. Also, I too agree with early contacts between Proto-Uralics and early IE peoples. I have no disagreement with Parpola's views, nor with Anthony's views. I do not know Häkkinen's views enough to comment on them.; January 17, 2018 at 2:48 AM
Onur Dincer said...: @Kristiina

I received your email and will reply to you.

Thanks. It seems I accidentally sent the previous email to an email account that is not used by you.; January 17, 2018 at 2:59 AM
Seinundzeit said...: Matt,

Thanks for the data!

I couldn't resist trying some stuff, right before I hit the sack. The results are quite beautiful.

Kalash:

46.0% Iran_N
38.4% Andronovo + 0.5% AfontovaGora3
15.2% Onge

distance=36.7603

Pashtun10_17Af:

41.6% Iran_N + 9.5% Iran_ChL + 1.9% Satsurblia
25.6% Andronovo + 9.5% LaBrana1
10.9% Onge
1.0% Mongola

distance=5.1818

That's all I've tried so far; but again, very sensible output.; January 17, 2018 at 4:04 AM
Kristiina said...: @Onur "Parpola also advocates the theory of hunter-gatherer lifestyle of Proto-Uralics and Proto-Finno-Ugrics."

Parpola writes that "Kallio (2006, 2007) and Jaakko Häkkinen (2009) not only subscribed to the lower chronology of Proto-Finnic but extended it to Proto-Uralic, pointing out that the disintegration of Proto-Uralic is a relatively late phenomenon: the Uralic protolanguage already certainly had several Indo-Iranian loanwords that reflect both an earlier and a later phase of development. In response to this justified criticism, my first revision was to correlate Proto-Uralic with the Volosovo culture (c. 3650–1900 BCE), as its late disintegration would allow the presence of Indo-Iranian loanwords in Proto-Uralic."

It is true that the Volosovo culture was originally a forager culture, but it adopted new techniques with time.

Use of copper started already c. 3500 BC, and the later Garino-Bor (ca. 2500–1500 BC) became an important centre of copper technology.

Fatyanovo culture (3200–2300 BC) and Abashevo culture (ca. 2500–1900 BCE) brought sheep and pig breeding and small-scale cultivation.

Aryan influence was probably connected with the Abashevo culture which brought new words/concepts such as bread, dough, beer, to winnow, and piglet.

If this is how you understand the forager lifestyle, I agree with you.; January 17, 2018 at 4:29 AM
Ric Hern said...: @ Samuel

It will surely be interesting if the Irish and Most Northwest Europeans only have TRB admixture. I can think of many questions popping up like, How did R1b L51 avoid GAC admixture since GAC replaced TRB in the East and Southern parts of its range and apparently expanded earlier than the proposed Indo-European migration ? This would have created a barrier making Indo-European migration without some GAC admixture virtually impossible. And did R1b L51 migrate using a Northern Route rather than through the Balkans or Hungary ?

I see TRB stretched as far as Northwestern Ukraine. Could this be the Birthplace of R1b L51 ? Etc.; January 17, 2018 at 7:10 AM
huijbregts said...: @All
I have always felt uneasy about scaling PCA scores.
The idea behind PCA is that you first search for the directions that carry most of the variance (the eigenvectors). Next you project the data on these eigenvariables and order them according to the eigenvalues. These PCA scores can be used for dimension reduction, because it is generally agreed that the highest dimensions carry the most noise and can be safely discarded.
Now sometimes the PCA scores are 'scaled' (divided by the root of their eigenvalues) so as to equalize their standard deviation. I have four comments on this scaling:
1. It is odd: first you sort the principal components according to the standard deviation and next you standardize them to equalize the standard deviations.
2. The advantage of this scaling is that in a plot the components appear to be better separated. But remember that the higher dimensions carry more noise and this will be inflated as well. If you want to use the components for further calculations (like multidimensional distances) you will probably have a decreased signal to noise ratio.
3. I have read that the eigenvectors are not invariant to scaling the components. Unfortunately I did not save the reference, but it seems plausible.
4. The Euclidean distance is definitely not invariant to scaling. The Euclidean distance of scaled scores is very different from real distances.
So I have been glad that Davidski's datasheets are not scaled.

That is, until Matt's post above, which told me that the Martiniano datasheet is scaled. I checked and sure enough all the standard deviations are 0.0251.
This implies that Matt is right in scaling this dataset by the root of the eigenvalues. It also implies that Euclidean distances on the original datasheet may be invalid.
Euclidean distances should be calculated on the (re)unscaled data that Matt calls 'including all the eigenvalue information'. I think that it is confusing to call this eigenvalue scaling, because it actually undoes the effect of the previous eigenvalue scaling.

I also had a look at the Global10 data (last update). Here the result was embarrassing: PC1, PC2, PC4, PC6, PC7 had a standard deviation of about 0.3, the other five were lower. I can't explain this, maybe unbalanced sampling may be an issue.

One more remark. In the past we had a discussion about the (modified) Sangarius weighting, which might calculate better distances. This weighting exactly undoes the effect of scaling the PCA scores (up to a constant which is equal to the root of the first eigenvalue).; January 17, 2018 at 7:14 AM
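The check described above (are all the per-dimension standard deviations the same?) is easy to reproduce. Below is a minimal Python sketch under stated assumptions: the tiny `sheet` is invented so that every column has a standard deviation of 0.0251, the eigenvalues are made up, the population standard deviation is used (the comment does not say which variant was computed), and the function names are hypothetical.

```python
import math
import statistics

def column_sds(rows):
    """Population standard deviation of each column (dimension)."""
    return [statistics.pstdev(col) for col in zip(*rows)]

def looks_equalized(rows, rel_tol=0.01):
    """True if all column SDs agree to within rel_tol of the largest SD."""
    sds = column_sds(rows)
    return (max(sds) - min(sds)) <= rel_tol * max(sds)

def reintroduce_eigenvalues(rows, eigenvalues):
    """Multiply each column by the square root of its eigenvalue."""
    roots = [math.sqrt(ev) for ev in eigenvalues]
    return [[x * r for x, r in zip(row, roots)] for row in rows]

# Made-up two-dimensional datasheet in which both columns have SD 0.0251.
unit = 0.0251
sheet = [[unit, -unit], [-unit, unit], [unit, unit], [-unit, -unit]]

# Hypothetical eigenvalues for the two dimensions.
eigenvalues = [9.0, 4.0]

restored = reintroduce_eigenvalues(sheet, eigenvalues)

print("equalized before:", looks_equalized(sheet), column_sds(sheet))
print("equalized after: ", looks_equalized(restored), column_sds(restored))
```

If the first check comes back true for a real datasheet, the scores have had their eigenvalue information removed, and distances are better computed on the restored version.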
Archaelog said...: @Davidski Hey David. How many components do you think are needed to classify all the populations in the world? In your world graph, I saw an East Asia-Europe cline and an Africa-Europe cline. But is that enough? Wouldn't a 3D or 4D plot (if such a thing is possible) do more justice?

Second, I also wanted to ask how well the group average represents individual cases. Is it possible to get individuals from a modern population who are massive outliers wrt their respective groups?; January 17, 2018 at 8:39 AM
Simon_W said...: @ Anthro Survey

"Lowland Campanians score like this, too. I've seen their results in other PCAs and they are seemingly more West Asian shifted than Sicilians"

Thanks for the hint. A pity they haven't been sampled in any scientific paper. So this would mean that the Italian cline isn't simply north-south.; January 17, 2018 at 9:24 AM
Samuel Andrews said...: @Kristiina,

I've been looking at Finnish mtDNA as well. I'll make a post about it on my site. I haven't done enough research to say much. I will say though I have found very real recent links with Northwest Europe as well as with Balto-Slavs.

The mtDNA from Trzciniec culture in Latvia is really similar to modern eastern Europeans. This has nothing to do with Finnish origins. But I do think Finns have a decent chunk of ancestry from the same farmers, probably Globular Amphora. Trzciniec mtDNA/Y DNA wise looks like a Corded Ware male, mostly Globular Amphora female mix.; January 17, 2018 at 11:58 AM
Onur Dincer said...: @Kristiina

I do not see anything particular to disagree with in your last comment. Early Uralics acquired many technologies related to metallurgy and agriculture from IE peoples, that is what I have been proposing for a long time.; January 17, 2018 at 12:21 PM
Anonymous said...: @Davidski

Thank you for the links. Will have a look at it.

Pardon me for asking these questions again but are there any calculation methods to find out how much ENA and Western Eurasian are Amerindians genetically on average?

And are Amerindians as Western-shifted genetically as certain populations such as Kirghiz, Kazakh, Shor, etc.?

Thank you again; January 17, 2018 at 12:27 PM
Onur Dincer said...: @David

In case you missed my question to you in this thread, I repeat:

David, can you test using formal methods whether Khanty and Mansi have actual EEF ancestry and its levels?

By the way, there was an email I sent to you a few weeks ago, can you also reply to it? Thanks in advance.; January 17, 2018 at 12:41 PM
Matt said...: @Sein, yeah, the projection from G10 seems to have worked fairly well.

Here's a neighbour joining tree using the full 20 dimensions where the pop averages of the projected samples are present in Slategrey along with the coloured averages for other populations: https://imgur.com/a/HLIqI

The projected populations seem to fit roughly where they should in the overall structure, which is reassuring. Where there are both projected samples and real samples from the same population, you'll see two different versions of the same population label, and they *tend* to be close together.

I used matching sample labels to set up the regression, so there are also some instances where the same sample had slightly different labels between the datasets, and so you have a projected and a non-projected sample and they tend to be close (e.g. Roman_Britain_outlier).

Where the projection falls down is where there are dimensions present in Ancient67 World20 that describe fine variation that just isn't present in the Global10, or isn't present in a way that translates, or vice versa.

This mostly affects the position of some samples like Onge or Africans (who both, I think, tend to have a fair bit of differentiation in their own dimensions in G10), or the very fine-scale differentiation between modern Europeans and Siberians, ancient EEF and populations like these (maybe NE Asia), where there just isn't enough information in G10 to place, say, the Dutch in the right position in these high-order dimensions in the 20. See https://i.imgur.com/kpgnn62.png, where all the other coloured samples have distinct positions in the high-order dimension 18, while the slategrey projected samples are compressed to the centre.

Ultimately there's no way to get round that projection issue other than to try to make sure all samples are at least equally projected (as I believe many of Davidski's normal PCA are), or hopefully in the future to extend this dataset to a more diverse set of ancient samples (GD13, etc.). But these projected samples may not be too bad for now for quick toy models (maybe especially when we're projecting populations that are fairly far away on well-established clines, like Iran_N).; January 17, 2018 at 1:16 PM
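The regression-based projection described in the comment above can be sketched roughly as follows. This is only a minimal illustration on synthetic data, assuming an ordinary least-squares fit with an intercept on the samples shared between the two datasheets; the function names `fit_projection` and `project` are hypothetical, and the actual procedure used in the thread may differ.

```python
import numpy as np

def fit_projection(src_shared, dst_shared):
    """Fit a least-squares linear map (with intercept) from source PCA
    coordinates to target PCA coordinates, using samples present in both."""
    design = np.hstack([src_shared, np.ones((src_shared.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, dst_shared, rcond=None)
    return coef

def project(src_new, coef):
    """Place samples that exist only in the source space into the target space."""
    design = np.hstack([src_new, np.ones((src_new.shape[0], 1))])
    return design @ coef

# Synthetic stand-ins (assumption): a 10-dimensional source sheet and a
# 20-dimensional target sheet related by an exact linear map plus an offset.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(10, 20))
src_shared = rng.normal(size=(200, 10))
dst_shared = src_shared @ true_map + 0.5

coef = fit_projection(src_shared, dst_shared)
src_new = rng.normal(size=(5, 10))
projected = project(src_new, coef)
print(projected.shape)
```

A target dimension that the source coordinates cannot predict is fitted close to its mean for every projected sample, which would be consistent with the compression towards the centre noted for dimension 18.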
Matt said...: @huijbregts: I think that it is confusing to call this eigenvalue scaling, because it actually undoes the effect of the previous eigenvalue scaling.

I think this is at the heart of the confusion in our discussion in this thread; I can readily see how you could read the term "eigenvalue scaling" as referring to the normalization process of scaling out eigenvalue information (equal shares of variance on each eigenvector), rather than the process of scaling eigenvalue information back into eigenvectors which have been normalized.

The software PAST3, after all, refers to it in both ways in two different modular functions within the same software (PCA and PCoA)!

I take no strong stance on which is more intuitive or grammatical or appropriate - I've simply been referring to it in the way of scaling back in because this is how it is used in the PCoA function where I've been using it frequently, and may have assumed that my meaning in using the term in this way was very clear.

Adding to the confusion are the ghosts of past discussions, in which I believe at least Alberto was initially interested in trying to normalize eigenvectors (scaling out) before settling on this being a bad idea (for reasons similar to those you describe in your last post, though more focused on representing total distance than on the overfitting and noise aspects).

And... as you mention, there is the previously observed, unexplained pattern of variance in the eigenvectors of Global10. I believe Alberto settled on scaling in eigenvalue information, and this is why we were so keen to get the eigenvalue information for Global10 and for other datasheets which have been scaled before. Note that this is not the first datasheet which has been scaled (and so I tend to check every datasheet after downloading it). But it was not unambiguous whether this was the correct course of action (due to that pattern, which baffled us both when we discussed it).

(I have no issues with trying to develop a single agreed terminology on this, though I think it would be hard to establish wider than a few commentators in this thread!); January 17, 2018 at 1:18 PM
huijbregts said...: @Matt
I found some useful information in the book "Statistical Methods in the Atmospheric Sciences" (not a joke).
Mathematically one is free to choose any scaling one wants, but this choice does have consequences for the type of transformation that is performed by the PCA.
For the purpose of Euclidean distances I need just the simplest transformation, which is a rigid rotation, without distortion. This demands that the variances of the PCA scores be the eigenvalues, or the eigenvalues all multiplied by the same constant. This is not the case in the Martiniano datasheet.
In every other transformation the eigenvectors are of course orthogonal, but the scaling of the dimensions is unequal, which shifts the eigenvectors into new directions (the transformation is not isometric).
This is also what happens when the variances of the PCA scores are equal. In this case the higher PCs are more elongated than the lower PCs. This may be useful for finding new eigenvectors, but it is not a rigid rotation. It should not be used for calculating Euclidean distances.
As to the terminology, I prefer to avoid the term 'eigenvalue scaling' altogether; it is too ambiguous. We might be a bit more verbose, and mention whether eigenvalue information is included.

I do not understand what has happened with the variances of the Global10 PCA scores. They look very irregular.; January 17, 2018 at 4:04 PM
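The point about rigid rotation can be checked numerically. The sketch below uses synthetic data (an assumption, not any of the datasheets discussed here). It compares pairwise Euclidean distances for PCA scores whose variances equal the eigenvalues with scores normalized to unit variance, and then scales the eigenvalue information back in.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data (assumption): 50 samples, 6 features with deliberately unequal spread.
data = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])
centered = data - data.mean(axis=0)

# PCA via SVD; eigenvalues of the covariance matrix are S**2 / (n - 1).
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
eigenvalues = S**2 / (centered.shape[0] - 1)

# Scores as a pure rotation: the variance of each PC equals its eigenvalue.
scores_rotation = centered @ Vt.T
# Scores with eigenvalue information scaled out: every PC has unit variance.
scores_unit = scores_rotation / np.sqrt(eigenvalues)
# Scaling eigenvalue information back in restores the rotation scores.
scores_restored = scores_unit * np.sqrt(eigenvalues)

def pairwise_distances(matrix):
    """Euclidean distance between every pair of rows."""
    diff = matrix[:, None, :] - matrix[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

d_original = pairwise_distances(centered)
d_rotation = pairwise_distances(scores_rotation)
d_unit = pairwise_distances(scores_unit)

print(np.allclose(d_original, d_rotation))  # distances preserved by the rotation
print(np.allclose(d_original, d_unit))      # distances distorted by unit-variance scaling
```

When all PCs are kept, the rotation scores reproduce the original distances, while the unit-variance scores inflate the low-variance dimensions relative to the high-variance ones.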
Davidski said...: @Qagan

Pardon me for asking these questions again, but are there any calculation methods to find out how much ENA and West Eurasian ancestry Amerindians have on average?

It depends how one defines ANE, ENA and West Eurasian. If we say that ANE is West Eurasian, then some Amerindians have as much as 40% or maybe more West Eurasian admixture. On the other hand, ANE appears to have minor input from a lineage basal to East Asian, and this influence seems to have spilled over to European Hunter-Gatherers younger than Kostenki14 and Sunghir. So if we consider ANE only largely West Eurasian, then that 40% is cut by as much as 1/3. You should read these papers on the topic...

http://eurogenes.blogspot.com/2017/01/east-and-west-eurasians-separated-at.html

http://eurogenes.blogspot.com/2018/01/a-genome-from-first-founding-population.html

And are Amerindians as Western-shifted genetically as certain populations such as Kirghiz, Kazakh, Shor, etc.?

I really don't know because I haven't looked at this issue, and, as per above, the outcome would depend on the definitions used. It's a very complex question that I can't answer right now.

@Onur Dincer

David, can you test using formal methods whether Khanty and Mansi have actual EEF ancestry and its levels?

In line with Chad's response to your question, yes, but it appears to be minor...

https://drive.google.com/file/d/1_39_1-_mrivhZM7CWAQdaCVw-_rN-nuJ/view?usp=sharing

https://drive.google.com/file/d/1q6_hBvCSbqq5vLODLL_g-PglKddJrSF8/view?usp=sharing

By the way, there was an email I sent to you a few weeks ago, can you also reply to it? Thanks in advance.

That's probably because you sent it to my hotmail account, which I don't use anymore. I'm now only using my gmail account for blog related stuff: eurogenesblog [at] gmail [dot] com; January 17, 2018 at 4:25 PM
Onur Dincer said...: @David

Thanks for the answer.

As for the email issue, yes, I had sent my last two emails to your hotmail account. I did not know your gmail account. As per your request, I forwarded them to your gmail account now.; January 17, 2018 at 5:33 PM
Kristiina said...: @Onur

Yeah, the point of view argued by Parpola and Häkkinen is that when Proto-Uralic started to spread it was a Bronze Age language which spread with the Bronze Age package. Moreover, IMO, it was heavily Indo-Europeanized during its formation.; January 17, 2018 at 11:50 PM
Archaelog said...: @Davidski Could you give your opinion on the outliers question I posed? How likely is it to get individuals from a modern population who are massive outliers wrt their respective groups?; January 18, 2018 at 2:12 AM
Davidski said...: @Chetan

Hey David. How many components do you think are needed to classify all the populations in the world? In your world graph, I saw an East Asia-Europe cline and an Africa-Europe cline. But is that enough? Wouldn't a 3D or 4D plot (if such a thing is possible) do more justice?

It depends on the PCA, but three or more are usually enough to split everyone into fairly neat global clusters, although we've been using 10 on this blog to model both modern and ancient ancestry. See here...

http://eurogenes.blogspot.com/2016/10/a-fresh-look-at-global-genetic-diversity.html

Second, I also wanted to ask how well the group average represents individual cases. Is it possible to get individuals from a modern population who are massive outliers wrt their respective groups?

That depends on the group in question and the number of samples available from each group.

Groups that are recent mixtures of other groups, like, say, Mexicans, usually form long clines in PCA, with many, if not most, being clear outliers from the average. But older, more homogeneous ethnic groups, like, say, Irish or Poles, usually occupy relatively small spaces in PCA, and include very few or no clear outliers.; January 18, 2018 at 2:27 AM
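One rough way to check how well a group average represents its individuals is to measure each sample's distance from the group centroid in PCA space. The sketch below uses synthetic coordinates. The helper names and the cutoff (median distance plus five median absolute deviations) are illustrative assumptions, not a method used on this blog.

```python
import numpy as np

def centroid_distances(coords):
    """Distance of each individual from the group average in PCA space."""
    centroid = coords.mean(axis=0)
    return np.linalg.norm(coords - centroid, axis=1)

def flag_outliers(coords, n_mads=5.0):
    """Flag individuals unusually far from the group average.
    The threshold (median + n_mads * MAD of the distances) is an arbitrary choice."""
    dist = centroid_distances(coords)
    median = np.median(dist)
    mad = np.median(np.abs(dist - median))
    return dist > median + n_mads * mad

rng = np.random.default_rng(2)
# A compact group with one added outlier (synthetic assumption).
tight = np.vstack([rng.normal(scale=0.01, size=(30, 2)), [[0.5, 0.5]]])
# A recently admixed group spread along a cline (synthetic assumption).
t = np.linspace(0.0, 1.0, 30)
cline = np.column_stack([t, 0.5 * t]) + rng.normal(scale=0.01, size=(30, 2))

flags = flag_outliers(tight)
print(flags[-1])
print(np.median(centroid_distances(tight)), np.median(centroid_distances(cline)))
```

For a group stretched along a cline, the typical distance from the average is large for most members, so the average alone says little about any one individual.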
Archaelog said...: @David Thanks man. Really helpful; January 18, 2018 at 2:34 AM
Onur Dincer said...: @Kristiina

Yeah, the point of view argued by Parpola and Häkkinen is that when Proto-Uralic started to spread it was a Bronze Age language which spread with the Bronze Age package. Moreover, IMO, it was heavily Indo-Europeanized during its formation.

That is also what I think, at least for Proto-Finno-Ugric.; January 18, 2018 at 4:20 AM
EastPole said...: @Samuel Andrews

“Some H2a is from the Steppe, some from EEF. H2a2 is from EEF and was probably popular in Funnel Beaker. H2a1 is from the Steppe.”

There is a new paper in ‘nature’ on mtDNA in India:

“Ancient Human Migrations to and through Jammu Kashmir- India were not of Males Exclusively”

https://www.nature.com/articles/s41598-017-18893-8#Sec14

They found some H2a1 there:

https://s10.postimg.org/ykibs047d/screenshot_323.png; January 18, 2018 at 11:17 AM
Anthro Survey said...: @Onur Dincer

Hard(er) to interpret, but things were pointing more towards South Asia (and hence Romany ancestry) on that run than West Asia. But, undoubtedly, had an actual population from India proper been run, it would have been a matter of putting a million dollars on the line as opposed to ten thousand, so to speak. In fact, I could probably just stop at the ADMIXTURE stage in that case.

Yeah, at K=8 and above, the SA-specific component is pretty salient. It's modal in South Indians, but unsurprisingly important in SC Asians as would be expected thanks to a shared stream of Neolithic Iranian and ASI ancestries. What's more, those two Romanian outliers are clearly Roma gypsies if you look at the original, non-blurred version.

Given that it is still low (albeit above noise levels) in Iranians, I wonder if there's actual ASI ancestry embedded there or whether it's due to Iranians sharing Iran_N ancestry that happens to be relevant in the Neolithic colonization of S. Asia.

Perhaps D-stats similar to D(Han, Onge; Iranian (various), Dinka) could yield more definitive clues (or not) until we get our hands on high-quality ASI aDNA. Although, it would be somewhat confounded by some Iranians having Turkic ancestry and some East Asians having ANE-related ancestry. Runs featuring South Asians, Muslim Iranians and Zoroastrians could help, too. Never delved into this, to be honest. Maybe Sein or Matt have. My guess is that limited ASI introgression is possible given medieval slave markets in India.; January 18, 2018 at 7:19 PM
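For readers unfamiliar with D-stats, the sketch below computes a D-statistic of the form D(W, X; Y, Z) from per-SNP allele frequencies, using the usual frequency-based estimator. The frequencies are random placeholders rather than real Han, Onge, Iranian or Dinka data, and the function name is hypothetical. Real analyses also estimate a standard error (typically with a block jackknife), which is omitted here.

```python
import numpy as np

def d_statistic(p_w, p_x, p_y, p_z):
    """D(W, X; Y, Z) from allele frequencies at the same set of SNPs.
    Each numerator term is bounded in magnitude by its denominator term,
    so the result lies between -1 and 1."""
    numerator = (p_w - p_x) * (p_y - p_z)
    denominator = (p_w + p_x - 2 * p_w * p_x) * (p_y + p_z - 2 * p_y * p_z)
    return numerator.sum() / denominator.sum()

# Placeholder allele frequencies for four populations at 1,000 SNPs (assumption).
rng = np.random.default_rng(3)
p_w, p_x, p_y, p_z = rng.uniform(0.05, 0.95, size=(4, 1000))

d_value = d_statistic(p_w, p_x, p_y, p_z)
d_swapped = d_statistic(p_x, p_w, p_y, p_z)  # swapping W and X flips the sign
d_null = d_statistic(p_w, p_x, p_y, p_y)     # identical Y and Z gives exactly zero
print(d_value, d_swapped, d_null)
```

A value significantly different from zero would suggest that Y and Z are not symmetrically related to W and X, subject to the confounders mentioned in the comment above.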
Anthro Survey said...: @Seinundzeit

If you are still following this thread, can you please look into something for me if and/or when time permits?

There's a pattern I've been noticing and it's concerning steppe ancestry in contemporary Indo-Iranian speaking peoples.

If so, I will just make my post in the newest thread as it's actually more pertinent to the discussion there.; January 18, 2018 at 7:23 PM
Seinundzeit said...: Anthro Survey,

Sure thing; if there is a pattern that you find to be of interest, I'll definitely look into it (whenever I have some spare time).; January 18, 2018 at 9:10 PM
Onur Dincer said...: @Anthro Survey

Yeah, those outlier Romanians are certainly Roma (gypsies). The Bosnian outlier is most likely Roma too, but has more indigenous ancestry than the Romanian Roma samples. Due to the lack of proper South Asian populations in the Balkan study, ASI ancestry is largely represented by the East Asian yellow component and the Iranian Neolithic-like ancestry is largely represented by the South Central Asian green component.; January 18, 2018 at 10:31 PM
Vincent said...: @Simon_W , @Onur Dincer

"South Italians and Sicilians. A few Greeks are also in this cluster, even some from Macedonia, but the bulk of the Greeks is more northern."

"Those strongly West Asian-leaning or West Asian-like Greek individuals almost certainly have recent ancestry from Anatolia, the nearby Aegean islands, Cyprus, Crimea, the Armenian Highland and/or the Levant (in other words, from the Greek communities with origins outside the Balkans, the nearby islands or southern Italy)."

I wouldn't trust those PCAs too much. In genetic studies Sicilians cluster with Peloponnese Greeks, or even somewhat "north" of them. And non-European Greeks are WAY more West Asian leaning than even Crete.

https://3.bp.blogspot.com/-DEII2qLmo3U/WNOhkCTD4zI/AAAAAAAAAhM/lD0SjxuhyFcGZJHeJsOeP8CLGjlNykbZwCLcB/s1600/Stamatoyannopoulos2017_Fig2.png

https://3.bp.blogspot.com/-rCYnCrJaAss/WNOhjB2WvgI/AAAAAAAAAhI/ohrul8QnFf0ULFVlK_GAIhlUaOkiiOISgCLcB/s1600/Stamatoyannopoulos2017_Fig3ab.png; January 19, 2018 at 2:49 AM
Vincent said...: @Anthro Survey

"Such a shift is probably a combo of a Roman-age Samaritan-like influx from Syria in addition to an existing Anatolia_BA layer there."

Higher CHG already explains the greater "West Asian" shift of South Italians. There was no Roman-age influx from Syria.; January 19, 2018 at 2:56 AM
Vincent said...: @Simon_W

"So this would mean that the Italian cline isn't simply north-south."

Yes it is (except with Sardinians). Every PCA shows a clear N->S cline in Italy, with not much structure within any area. Sicilians are slightly more "north" because Sicily is geographically more West Med than the mainland south, but the difference is small.

https://4.bp.blogspot.com/-VXi3qHLd7gU/UF7C7F8giCI/AAAAAAAAAt8/1A6xV5e3ATA/s1600/digaetano2012-figS2.png

https://3.bp.blogspot.com/-3OX7IwdpBAU/V3uAay0kUdI/AAAAAAAAA_o/rOP_Nvtps80eOiR54eY4S__E9IsYZDjoQCLcB/s1600/fiorito2016-fig2bc.png

https://3.bp.blogspot.com/-_RMiJ1ToYNM/WFJ6vkjfDmI/AAAAAAAABB8/FYnPWXE3qjorWmRfXAaE9_4PwlpFr7vWgCLcB/s1600/sazzini2016-figS1.png; January 19, 2018 at 3:31 AM
hahaha said...: Where did you get data for Asian populations?; January 19, 2018 at 4:15 AM
Davidski said...: @Hieu Phamnhu

See here...

http://eurogenes.blogspot.com.au/2017/10/genetic-and-linguistic-structure-across.html?showComment=1514271792728#c7354798589542701283; January 19, 2018 at 4:23 AM
Matt said...: As per World Ancient 67 datasheet, here are some extra Global 10 samples projected onto the West Eurasia Ancient 67 using regression - https://pastebin.com/3D7HQ3ZF

Some graphics based on this sheet: https://imgur.com/a/LVbJW; January 20, 2018 at 7:35 AM
Anthro Survey said...: @Seinundzeit

Thank you! It's posted now. Again, no rush.; January 20, 2018 at 11:00 PM
Anthro Survey said...: @Vincent

"Higher CHG already explains the greater "West Asian" shift of South Italians."

And how do you suppose they acquired this "higher CHG" in the first place? Primarily, some Circum-Aegean exchange network emanating from Western Anatolia c. 2200BC is the culprit, hence we should look to Anatolia_BA/Chl-like ancestry. In fact, the Italian cline actually projects towards it and those samples make great fits for them.

Of interest to you:
http://www.ufg.uni-kiel.de/dateien/dateien_studium/Archiv/Kirleis/200100_Kirleis_DalCorso/Oxford%20Handbook%20of%20European%20Bronze%20Age/Chapter%203%20Heyd_Europe_2500_to_2200_BC_Bet.pdf

"There was no Roman-age influx from Syria."
There was. It's just a question of how much. Remember Juvenal's line about "Orontes flowing into the Tiber"?

Now, the reason I said "Samaritan-like" is because the Italian cline doesn't swerve much in the "Natufian" direction. Note that Samaritans and Lebanese are considerably east-shifted compared to Neolithic Levantines. There is reason to think that, by Roman times, Syria obtained a ton of CHG-rich ancestry from a KA expansion (believed to have transformed Anatolia_N into Anatolia_BA/Chl), among later migrations from the north and east.

Hence, Anatolia_BA/Chl and Samaritans likely share an important chunk of ancestry, and them giving somewhat inconsistent ratios in my modeling of S. Italians seems to support this notion. In fact, I took a rain check on separating out Aegean and Syrian ancestry in them.
One thing that is consistent is how Tuscans and Bergamasque take significantly less of this ancestry (in relation to Anatolia_BA) than do S. Italians.

We'll see, though.; January 20, 2018 at 11:52 PM
Vincent said...: @Anthro Survey

"And how do you suppose they acquired this "higher CHG" in the first place? Primarily, some Circum-Aegean exchange network emanating from Western Anatolia c. 2200BC is the culprit"

It's from Indo-Europeans. Sarno_2017 said that some IE languages likely spread through a southern route and came with more CHG than EHG.

"There was. It's just a question of how much. Remember Juvenal's line about "Orontes flowing into the Tiber"?"

Juvenal was exaggerating for political effect, because the answer is: almost none

https://www.gnxp.com/WordPress/2017/05/17/the-orantes-has-not-mixed-much-with-the-tiber/

https://italianthro.blogspot.com/2011/01/tenney-franks-orientalization-refuted.html; January 22, 2018 at 3:23 AM
Davidski said...: @Vincent

It's from Indo-Europeans. Sarno_2017 said that some IE languages likely spread through a southern route and came with more CHG than EHG.

Which languages? Italo-Celtic?

Haha.; January 22, 2018 at 4:12 AM
Alogo said...: @Vincent,

Since you brought up the Sarno study, compare what the Sicilian and the Arcadian samples look like there too.

Sicily seems to have acquired a decent amount of Levantine/North African ancestry, separate from the extra stream of non-IE Caucasus/Iran type of ancestry we already find in the Bronze Age Aegean-Balkans and so likely Italy, at some point (or various points). Actually, that seems to be the case for Southern Europe in general (Spaniards, Sardinians etc. seem to show a similar pattern), and likely via somewhat different areas of the Near East in each case, just with a peak in the South Italy-Sicily-Malta area.

Here's a quick one with some relevant southeast European populations from Chad's Africa+Eurasia K10 that captures differences between Anatolia (FEF, peaks in Anatolia_N), Levant (N Africa-Levant, peaks in Natufians) and Iran-Caucasus (Iran-Caucasus, peaks in Iran_N and CHG): https://i.imgur.com/PkKlXbN.png

The Sarno paper was overall pretty good imo but that was certainly a weak point of theirs. Italic itself just couldn't be from a "southern route" considering its obvious ties to Celtic; only Greek could have potentially come from a southern route, via Anatolia (less likely than via the Balkans as it is), due to its ties to Armenian and Indo-Iranian and we know that it doesn't appear in Italy until later on. So either way, you'd be getting the extra Caucasus-Iran to Italy in the same indirect way.

Ditto on their thinking that the contemporary differences between the south Balkans and south Italy would be solely due to a greater impact of subsequent northern migrations in the former. It's clear that south Italy was also disproportionately impacted from the Near East and likely in more recent times than the Bronze Age.; January 22, 2018 at 9:29 AM
Vincent said...: @Alogo

"Sicily seems to have acquired a decent amount of Levantine/North African ancestry"

Sicilians have 3-5% Moorish admixture. This is already well known.

"Italic itself just couldn't be from a "southern route" considering its obvious ties to Celtic; only Greek could have potentially come from a southern route, via Anatolia"

Greek and Illyrian too were both spoken in southern Italy, but not in the north, where Celtic was spoken. That can explain the clinal CHG/EHG difference in Italy. I don't know about Italic.

"It's clear that south Italy was also disproportionately impacted from the Near East and likely in more recent times than the Bronze Age."

No it wasn't. Sazzini_2016 says that all peninsular Italians can be modeled as "Sardinian + Caucasus/Iran + Russian" (Sicilians too except for that little MENA admix).

https://italianthro.blogspot.com/2017/04/much-better-population-structure.html; January 24, 2018 at 1:35 AM
Alogo said...: Vincent,

I'm not particularly wed to any specific scenario in this case, but the relevant Bronze Age ancient samples (Iberian and Mycenaean right now) don't seem to show any "Natufian" type of ancestry, while pretty much all of modern Southern Europe seems to show it in small amounts in some analyses (just with a peak in Sicily and Calabria). That currently points to the region being affected post-Bronze Age by new movements from the Middle East at some point. At the same time, the Near East has acquired steppe ancestry, part of which might very well have arrived during Hellenistic-Roman times, so it seems that admixture went both ways. Also, yes, clearly the more recent movements don't seem to be as important as the Bronze Age ones, but they're there.

It's a bit futile to attempt to calculate exact percentages (though the relative ratios give you an idea of where it's highest and all analyses seem to agree on that, the study that you posted as well) since we don't know the exact admixing populations involved and for most areas it doesn't seem to be that big. The main point is that something does seem to have arrived later to Europe and as your study shows too it did affect South Italy disproportionately (the blue slowly keeps increasing as you go south after all).

As for the other and more important clinal difference of CHG/EHG within Italy, that's most likely ancient indeed -no dispute there- but the point was more about the apparent implications of the extra CHG and the origins of IE in the Sarno paper, not the spread of IE languages to Italy later on, on which we're in agreement. Clearly a spread from the Aegean and the Balkans (South Italy) would have given you different results than one from Central Europe (the rest). I guess we misunderstood each other there.; January 24, 2018 at 3:40 PM
Vincent said...: @ Alogo

We're mainly in agreement. Just one thing about the blue in the Sazzini study, that's likely the small Moorish admixture I was talking about because this is how it's explained:

"A similar, but even more extreme south-north gradient was observed also for the blue component highly representative of Northern African groups that was additionally detected in Middle East and, to a significant lower extent, in Southern Italy (4.6%, mainly in Sicily)."; January 26, 2018 at 12:24 AM

135 comments: