Monday, August 15, 2016

Basal-rich K7 vs D-stats: the puzzle


It's interesting that, as per the graphs below, the K7 Villabruna cluster shows an awesome correlation with Villabruna affinity. At the same time, the K7 Basal-rich cluster shows an awesome inverse correlation with AG3 (aka AfontovaGora3) affinity.

Conversely, the K7 Basal-rich cluster shows a much poorer inverse correlation with Villabruna affinity, and the K7 AG3-MA1 cluster shows a much poorer correlation with AG3 (aka AfontovaGora3) affinity.

Why is that? Anyone know? If you think you do, please post your answer in the comments. I already know why and shall reveal the answers shortly. The relevant datasheet is available here.






Update 16/08/2016: Without further ado, here are the answers (and feel free to disagree with me, but please make sure you have some convincing arguments if you do)...

- the Villabruna cluster shows a strong correlation with Villabruna affinity simply because the K7 is pretty good at estimating proportions of Villabruna-related admixture and, at the same time, Villabruna is an excellent reference sample for Villabruna-related affinity in Western Eurasians

- the Basal-rich cluster shows a strong inverse correlation with AG3 affinity because, paradoxically, AG3 is a fairly poor reference for AG3-related admixture in most Western Eurasians, thereby acting as a pretty good reference for overall Basal Eurasian-free forager ancestry (note that the correlation only breaks down somewhat for samples unusually rich in AG3-related ancestry compared to their neighbors, like those from the Bronze Age steppe and Caucasus)

- the Basal-rich cluster shows a fairly poor inverse correlation with Villabruna affinity because, again, as per above, and also somewhat paradoxically, Villabruna is an excellent proxy for Villabruna-related admixture in Western Eurasians, thereby forcing groups with unusually high Villabruna-related admixture and affinity well above the diagonal line

- the AG3-MA1 cluster shows a fairly poor correlation with AG3 affinity simply because, as per above, AG3 is a fairly poor reference for AG3-related admixture in Western Eurasians, and especially West Central Asians, thereby forcing them well below the diagonal line
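
For anyone who wants to check these correlations against the datasheet, here's a minimal sketch of the calculation behind each graph: a Pearson correlation between a K7 cluster proportion and a D-stat affinity across populations. The population labels and numbers below are invented placeholders, not values from the datasheet, and pearson_r is just a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: (population label, K7 cluster proportion, D-stat affinity).
# Invented for illustration only; swap in the real datasheet columns.
rows = [
    ("pop_A", 0.05, 0.010),
    ("pop_B", 0.20, 0.022),
    ("pop_C", 0.35, 0.031),
    ("pop_D", 0.50, 0.045),
    ("pop_E", 0.65, 0.055),
]

proportions = [row[1] for row in rows]
affinities = [row[2] for row in rows]
r = pearson_r(proportions, affinities)
print(f"r = {r:.3f}")
```

A value of r near +1 corresponds to a tight positive trend like the Villabruna graph, and a value near -1 to a tight inverse trend like the Basal-rich vs AG3 graph.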

Update 17/08/2016: And Matt's explanation is in the comments here, along with the graphs below.


See also...

The Basal-rich K7

23 comments:

Rob said...

Because Villabruna is that "WHG-like" pop which mixed into the Levant, but AFG-3 is not a great proxy for that ANE-like group which mixed into Neolithic West Asia?

Davidski said...

OK, that's part of the answer. But there's a second part.

Samuel Andrews said...

Because non-AG3 scores have variable relations to AG3. Yamnaya and Caucasus_HG score about the same AG3, but Yamnaya's non-AG3 is more related to AG3 than Caucasus_HG's non-AG3.

George Okromchedlishvili said...

It shows that Villabruna for sure came from a Balkan-Anatolian refugium, and that's why folks that have passed it or lived close to it - basically all Neolithic Euros, their closest descendants and some modern Balkan pops - show uber-high affinity to Villabruna.
MA guys on the other hand did not live anywhere in SC-Asia, cause otherwise MA-component rich folks with a high BE proportion like Neolithic Iranians would demonstrate much higher affinity to MA.

George Okromchedlishvili said...

Ok, I think I've nailed it.

Everyone, take a look at lower left corner of the last graph: see anything special?

Yep, our "Western" Basal-rich guys show an unexpected pattern of MA affinity!

Basically Natufians and Neolithic Levantines rightfully appear under-related to MA compared to their AG score.

However, Barcin and especially Iberian Neolithic guys are much more closely related to MA! More so than Neolithic Iranians!

It means there was steady influx of MA-related ancestry (EHG or SHG or whatever else) into Europe and European hunter-gatherers even before the IE expansion!
These guys began mixing with the fringe ME pops in Anatolia through the Balkan corridor!

Rob said...

By George !

George Okromchedlishvili said...

Ok, on the other hand we could try to explain this through the higher HG ancestry of Neolithic Europeans, so disregard the previous comment.
On the other hand, the 4th plot shows super-well the degree of actual IE input into populations. Those that are well above the line clearly have some Steppe-infused ancestry and those well below it obviously lack it (like Iran Neolithic).
Thus we get a nice estimate of IE/non-IE MA ancestry proportions.

George Okromchedlishvili said...

Another logical conclusion: Villabruna and MA are two very diverged HG branches of ancestry, so increased ancestry from MA actually reduces relatedness to Villabruna!
Just compare Basques and Iberia Neolithic. It is clear that indirect MA ancestry from IE Steppe admixture made Basques more distant from Villabruna than more BE rich MN-Iberians.

I suspect that this is due to the fact that those Villabruna folks carry a line that diverged from the rest of Crown Eurasians very early on and is the closest to BE out of all Crown Eurasian groups!
This would make sense given its hypothetical "homeland" in Balkan-Anatolian refugium.

Davidski said...

I got a headache thinking about this, but in the end it's pretty straightforward. You just gotta tackle one graph at a time.

Certainly, the fact that AG3 is a fairly poor reference for the ANE that survives in Eurasia is an important factor for graphs 2 and 4.

Alberto said...

For correlating AG3-MA1 cluster in K7 with D-stats involving AG3, probably an F4 ratio would be preferable to a single D-stat. Or at least some form of subtraction and normalization between the AG3 stat and the Villabruna stat.

And for the K7 Basal-rich cluster, we know that it isn't 100% Basal Eurasian, but rather closer to 50%. So that should be taken into account too (the other 50% being some kind of West Eurasian, probably related to Villabruna).

So it would just require a few tweaks to make things match better, I think.
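
One possible reading of these tweaks, sketched very loosely: put the AG3 and Villabruna stats on a common scale, subtract one from the other, and scale the Basal-rich share by roughly one half to approximate actual Basal Eurasian. The z-score approach, the helper names and all the numbers below are assumptions made for illustration; this is not an F4 ratio and not something anyone in the thread actually ran.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be identical")
    return [(v - mu) / sigma for v in values]

# Invented per-population stats, for illustration only (not from the datasheet).
ag3_stat = [0.020, 0.035, 0.050, 0.030]
villabruna_stat = [0.040, 0.045, 0.050, 0.025]

# Standardise each stat, then subtract: positive means relatively more
# AG3-like than Villabruna-like compared with the other populations.
relative_ag3 = [a - v for a, v in zip(zscores(ag3_stat), zscores(villabruna_stat))]

# "Closer to 50%" is taken from the comment; the exact value is an assumption.
BASAL_FRACTION_OF_CLUSTER = 0.5

def basal_eurasian_estimate(basal_rich_share):
    """Rough Basal Eurasian share implied by a K7 Basal-rich share."""
    return BASAL_FRACTION_OF_CLUSTER * basal_rich_share

print(relative_ag3)
print(basal_eurasian_estimate(0.6))
```

The adjusted values could then be plotted against the K7 AG3-MA1 and Basal-rich proportions in place of the raw stats.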

More intrigued about the possibility of "something" missing related to CHG/Iran_N, though.

George Okromchedlishvili said...

Ok, looking at Graph 4:

- We see a very poor fit. Why is that? Well, it may be cause of BE's impact. And indeed those samples that have lower affinity to AG than predicted by their AG ancestry carry a lot of BE, which we see from other graphs (like Iran Neolithic or Natufians)

Then looking at Graph 3:

- It's obvious that BE can not be the source of the difference in results.
- Let's suppose that my initial theory about the relatedness of Villabruna to BE is not correct and it's as far from Basal as AG
- Then what's causing the errors? East Eurasian affinity! Just look at the fact that all South Asians are south of the plot. And it also implies that Iran Neolithic has a lot of EA-like ancestry

Why don't we see similar stuff in Graph 2? Cause East Asian-like ancestry is more related to AG, either due to AG contributing to it or vice versa

Davidski said...

OK, enough with the fun and games. I posted the answers.

Samuel Andrews said...

@David,
" the AG3-MA1 cluster shows a fairly poor correlation with AG3 affinity simply because, as per above, AG3 is a fairly poor reference for AG3-related admixture in Western Eurasians, and especially West Central Asians, thereby forcing them well below the diagonal line"

I'm surprised you don't think the same about Villabruna and WHG-related ancestry in the Middle East.

Davidski said...

The Villabruna cluster/Villabruna affinity correlation is great, even for the Middle East.

https://1.bp.blogspot.com/-id3v4ahFGNs/V7GQIgSMxLI/AAAAAAAAEwk/e5sTBVD8CB8F6gQApQrSB_mcKiiVwdvMQCLcB/s1600/Villabruna_vs_Villabruna_cluster.png

Where are you seeing the same problems as with AG3?

Matt said...

I think the explanation given sounds a bit over complicated and, I don't really think this is necessary though. I'll have a go.

If you look at the graph for AG3 proportion vs stat, you've got three groups of high ANE ancestry:

- The steppe-Europe group; they mostly trade off ANE for WHG (relatively high affinity to ANE)

- The Basal_Rich group; these trade off ANE for Basal_Rich (relatively low affinity to ANE)

- The South Central Asian / Caucasus group; these trade off ANE for Basal Rich and WHG or ASI (intermediate affinity to ANE).

So you've got three different tradeoffs that are all very differently related to ANE, which will produce different slopes against the stat.

For Villabruna by contrast, you're either trading off Villabruna for ANE or for Basal_Rich, and *both* of those groups were just about equally related to WHG (Basal_Rich by sharing recent ancestry with admixture, ANE by ancient ancestry). So you produce only a single meaningful slope.

Any good?

Crucially for me, when I attempt to produce a set of 3 components using only the D-stats and PCA (using 4mix plus PCA), the results are roughly the same for the correlation between my estimated proportions and the stat (and are fairly well correlated with the K7). Which makes me think that enough information is present in the stats to get close to what is in the K7, so extra populations with different shared drift, although I don't doubt they will be of value, are not strictly necessary to explain the K7 pattern.

I think Sam has the right quick way of explaining all this; it's variable relationships in the non-ANE fraction, while the non-WHG fractions have equal relationships to WHG.

Davidski said...

Any good?

Sounds more complicated than mine.

I think that once a more recent ANE sample than AG3 is available, the AG3-MA1 cluster vs AG3 affinity graph will look pretty much like the Villabruna cluster vs Villabruna affinity graph.

But the Basal-rich cluster vs the new and better ANE genome affinity graph will look much worse than the Basal-rich cluster vs AG3 affinity.

At the same time, if we ever get a genome from the immediate ancestor of both ANE and WHG, the Basal-rich vs this new sample graph should look even better than the Basal-rich vs AG3 affinity graph.

Matt said...

Sounds more complicated than mine.

Well, if you wanted, you could test it by setting up a set of virtual populations with D-stats, based on the real, known populations' D-stats and their proportions. If you get a single cline whether you're trading off WHG for AG3 or for a Basal Rich proxy, and multiple clines from trading off AG3 for alternatively WHG, a Basal Rich proxy, ENA or complicated combinations of them, then Sam's / mine would be the correct one (I think).
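
A toy version of that virtual-population test might look like the sketch below. It is not what anyone in the thread actually ran: the affinity values are invented to match the verbal argument, the function names are hypothetical, and it assumes a mixed population's stat is the proportion-weighted average of its sources' stats (f4 statistics are linear in admixture proportions in expectation; D is normalised, so this is only approximate).

```python
# Assumed affinity of each pure source to the two reference genomes.
# Invented values: the sources differ in ANE affinity, while the two
# non-WHG sources are equally related to WHG.
AFFINITY_TO_ANE = {"ANE": 0.10, "WHG": 0.06, "Basal_Rich": 0.00}
AFFINITY_TO_WHG = {"WHG": 0.10, "ANE": 0.05, "Basal_Rich": 0.05}

def mixed_stat(proportions, source_stats):
    """Proportion-weighted average of source stats (linearity is an assumption)."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(share * source_stats[src] for src, share in proportions.items())

def cline_slope(focal, partner, source_stats, low=0.2, high=0.8):
    """Change in stat per unit of focal ancestry when the rest comes from partner."""
    stat_low = mixed_stat({focal: low, partner: 1 - low}, source_stats)
    stat_high = mixed_stat({focal: high, partner: 1 - high}, source_stats)
    return (stat_high - stat_low) / (high - low)

# Trade ANE off against each partner, then WHG off against each partner.
ane_slopes = {p: cline_slope("ANE", p, AFFINITY_TO_ANE) for p in ("WHG", "Basal_Rich")}
whg_slopes = {p: cline_slope("WHG", p, AFFINITY_TO_WHG) for p in ("ANE", "Basal_Rich")}

print("ANE proportion vs ANE affinity, slope per partner:", ane_slopes)
print("WHG proportion vs WHG affinity, slope per partner:", whg_slopes)
```

With these made-up inputs the ANE graph gets two different slopes (two clines), while the WHG graph gets the same slope whichever partner is traded in (one cline). Whether the real populations behave this way depends on their actual stats, which this sketch does not use.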

Another prediction of your theory (though I don't think it would be watertight either way) would be to graph Villabruna proportion vs D (Mbuti, El_Miron)(Mota,X). El_Miron would have the same relationship to Villabruna that you'd postulate AG3 to have to a newer and more relevant ANE sample, so that could provide a comparison.

Matt said...

For more of an example, when I run an example with the D-stats I estimate for the ANE, Basal_Rich and WHG I used for my PCA+4mix+estimated ANE, Basal_Rich and WHG positions, it seems pretty clear:

http://imgur.com/Qhee3FJ
http://imgur.com/KVvCC6Z

There could be some difference there between the K7 models for proportions and D-stats with better real proxies for ANE and WHG, enough to give a different result, but I would doubt it since the correlations between the PCA+4mix+estimated and K7 are pretty good.

Davidski said...

Matt,

I've added ElMiron and Kostenki14 to the datasheet. But I'm not sure what running them against Villabruna can really say about AG3?

https://drive.google.com/file/d/0B9o3EYTdM8lQZGFGbmFWNUVDcnc/view?usp=sharing

Interestingly, running ElMiron against the Basal-rich cluster seems to show that Iberia MN and Chalcolithic have inflated ElMiron-specific ancestry.

human443 said...

Another thing to consider is how bad of a reference can AG3 really be for West Eurasians?

Mbuti GoyetQ116-1 MA1 AfontovaGora3 -0.0243 -2.587 144086
Mbuti Kostenki14 MA1 AfontovaGora3 -0.0211 -2.421 159767
Mbuti Karitiana MA1 AfontovaGora3 0.0241 3.553 170325
Considering these stats are all about the same D value in magnitude, it leads me to believe it could be a simple case of paleoeuropean input into MA1 to the exclusion of AG3 (theories of Gravettian input into Mal'ta-Buret' come to mind).

However, when ANE enters Europe in the late paleolithic, this happens...
Mbuti Villabruna MA1 AfontovaGora3 -0.0009 -0.095 161042
The difference is nullified. The ANE that entered Europe must have been more related to AG3 than to MA1 to overturn the affinity to the paleoeuropean portion of MA1's ancestry.

In populations with higher amounts of ANE the trend is outright reversed.
Mbuti Satsurblia MA1 AfontovaGora3 0.0149 1.603 117071
I don't have the direct stat for Karelia_HG, but I expect it to be similar.
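
To make the rows above easier to compare, here is a small sketch that parses them and flags which ones pass a Z-score cutoff. The column layout (four population labels, then D, Z and SNP count) is an assumption about the output format, and the |Z| >= 3 cutoff is a common rule of thumb rather than anything stated in this comment.

```python
# Rows copied from the comment above. Assumed columns:
# W X Y Z D Z-score SNP-count (the layout is an assumption).
RAW = """\
Mbuti GoyetQ116-1 MA1 AfontovaGora3 -0.0243 -2.587 144086
Mbuti Kostenki14 MA1 AfontovaGora3 -0.0211 -2.421 159767
Mbuti Karitiana MA1 AfontovaGora3 0.0241 3.553 170325
Mbuti Villabruna MA1 AfontovaGora3 -0.0009 -0.095 161042
Mbuti Satsurblia MA1 AfontovaGora3 0.0149 1.603 117071
"""

def parse_row(line):
    """Split one whitespace-separated row into labelled fields."""
    w, x, y, z, d, zscore, snps = line.split()
    return {"pops": (w, x, y, z), "D": float(d), "Z": float(zscore), "snps": int(snps)}

rows = [parse_row(line) for line in RAW.splitlines() if line.strip()]

Z_THRESHOLD = 3.0  # rule-of-thumb cutoff, chosen here as an assumption

# Second population label of every row whose |Z| passes the cutoff.
significant = [row["pops"][1] for row in rows if abs(row["Z"]) >= Z_THRESHOLD]

for row in rows:
    flag = "passes" if abs(row["Z"]) >= Z_THRESHOLD else "below"
    print(f"{row['pops'][1]:<12} D={row['D']:+.4f} Z={row['Z']:+.3f} ({flag} |Z|>={Z_THRESHOLD})")
```

How the sign of D is read depends on the convention of the program that produced the rows, so the sketch only looks at the size of Z.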






Matt said...

@ Davidski, El_Miron is meant as a comparative test of the idea that an older reference, of about the same age difference as AG3, would be linked to the lessened correlation.

If the K7 AG3-MA1 component breaks apart from correlation with D(Mbuti,AG3)(Mota,X) because AG3-MA1 is an imperfect, older reference for the K7 AG3-MA1 component, then wouldn't you expect to see the same breakdown of correlation between the D(Mbuti,El_Miron)(Mota,X) and K7 Villabruna, as El_Miron is an imperfect, older reference for Villabruna?

El_Miron stat graphed against Villabruna cluster: http://imgur.com/a/82W7P

Though, more importantly, did it make any sense what I was doing in the graphs in my last post? Not sure if it was clearly explained in that.

Does this make it any clearer?:

http://i.imgur.com/3xqCJdF.png

(Two clines away from ANE_Model to Basal_Rich_Model and WHG_Model, as each of those have different relatedness to ANE. Single cline away from WHG_Model to Basal_Rich_Model and ANE_Model, as they're both roughly equally related to WHG.

Moderns intermediate, as they're never mixing 100% WHG or 100% Basal_Rich in and some have ASI.

I've overlaid with the K7 proportions for Villabruna and AG3-MA1 in the last set of graphs here

Purely a decreased correlation because of the differences between the two admixing groups in the case of the AG3-MA1 graphs, which doesn't exist in the WHG graphs).

Davidski said...

I need to chew this over. But yeah, that last graph makes sense.

Matt said...

Davidski: Interestingly, running ElMiron against the Basal-rich cluster seems to show that Iberia MN and Chalcolithic have inflated ElMiron-specific ancestry.

Yes, all the ancient Iberians seem a bit outlying on the El_Miron stat relative to Villabruna. To some extent present day Iberia seems so too, but the amount of extra El_Miron ancestry in present day Europeans, even Basques, is probably very low.

Graph comparing both K14 and El_Miron to Villabruna stat:

http://imgur.com/al2EDbS / http://imgur.com/DUCTcAV

Iran_Neolithic almost alone seemingly has no preference in these stats for any of Villabruna, K14 or El_Miron (as if they were a clade). Iran_Chal seemingly shows no real preference for El_Miron or Villabruna, though prefers both to K14.