
Tuesday, August 20, 2013

Principal component analysis (PCA) of West Eurasia


In the past I've done MDS and SPA analyses of West Eurasia, but below is a PCA. Another version with individual IDs is available here.

The first eigenvector is a reflection of the genetic cline that runs from Northern Europe to the Middle East, with Finns being the most Northern European and Saudis and some Bedouin the most Middle Eastern. Mediterranean ancestry defines the second eigenvector, with Sardinians being the most Mediterranean, and the Mari of the Volga-Ural region the least.
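For readers unfamiliar with how a plot like this is produced, here is a minimal sketch. It is not the pipeline used for this post. It assumes a hypothetical genotype matrix (samples by SNPs, coded 0/1/2), centres each SNP, and takes the top two components via SVD. The toy data and the function name `pca_scores` are invented for illustration.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Return per-sample PC scores and the variance share of each PC."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # centre each SNP column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)       # share of total variance per PC
    return scores, explained[:n_components]

# Toy data (hypothetical): two groups whose allele frequencies differ
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.25, size=200), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, 200))
group_b = rng.binomial(2, freq_b, size=(20, 200))

scores, explained = pca_scores(np.vstack([group_a, group_b]))
```

Each row of `scores` is one sample's position on the plot (eigenvector 1 on one axis, eigenvector 2 on the other), and `explained` shows how much of the total variance each axis carries.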



See also...

Cluster analysis of West Eurasia: 13 clusters from 18 dimensions

PCA of the world

A multidimensional view of Europe + West Asia


40 comments:

Maju said...

PCA is always limited in its analytical power, for example in this case it only determines "Europeanness" (or "Eastern Balticness" being more precise, as the extremes are Finns and Lithuanians) in PC1 and "Caucasianness" in PC2. All the rest are just negative (or intermediate) polarities.

Still, the clusterings are interesting to watch and they are suggestive of real distinctions: 8-9 European clusters, four West Asian ones and one African. Of course, sampling figures affect the result: different figures for each of the populations or clusters would surely alter the overall result somewhat.

andrew said...

Is there a way to get a less blurry picture?

Davidski said...

It's not blurry. You're doing something wrong.

Eduardo Pinto said...

Hello David,

I've been paying attention to one of your old ADMIXTURE runs, WE12, where a North Atlantic and a Baltic cluster seem to juggle for dominion in Central Europe. Using this PCA of yours I've tried to sketch a very basic diagram of early/late Neolithic movements between the Middle East and Europe.

http://tinypic.com/view.php?pic=35ap2sl&s=5


My theory is that the North Atlantic cluster represents mixed western cultures, the product of Danubian farmers and western European hunter-gatherers, while the Baltic cluster represents Danubian farmers as well as Caucasus farmers intermingling with hunter-gatherers living in the Volga basin.

I could have added subsequent movements such as the BB intrusion in Central Europe or the BB/CW backflow into Iberia but it would have only served to confuse people.

sds said...

David, did you include US265 in this run?

Davidski said...

No, I can't run relatives.

Fanty said...

I can imagine what you are doing wrong.
If using Firefox, for example, you get a somewhat "blurry" looking image if you click.

This is because it's a line-art kind of picture resized to screen resolution.

If you click on it once more, it's shown at 100% size and won't be blurry anymore, for sure. Try that.

Gui S said...

David, is it possible to get PCA coordinates of the participants?

Davidski said...

Not for this round, I deleted them. But you're between FR10 and FR9.

manfromat said...

David, can you update the PCA with recent participants? I wasn't in Eurogenes at that time. My ID is Kurd13.

Davidski said...

That map doesn't work for me because it ignores the clines (or lack of) on the PCA.

The PCA shows that most of the traffic from West Asia to Europe went via the Mediterranean and the Balkans. There was very little contact between populations of the Caucasus and the East European Plain.

Also, I think you'll find that the hunter-gatherers of North Central Europe were very Eastern European-like (in other words, uber East Baltic). This has already been shown by the results from Neolithic Gotland. I'd say that more Western European-like hunter-gatherers were confined to the Atlantic fringe. Volga-Ural hunter-gatherers were probably very similar to those in North Central Europe, but possibly with significant Siberian admix.

Essentially, I think the first eigenvector is a function of isolation-by-distance between the Middle East and Europe, and reflects multiple ancient and recent contacts. I'd say the second eigenvector is mostly a reflection of the level of western Mediterranean ancestry (and relative lack of West Asian and Siberian ancestry).

Davidski said...

As per my reply to Eduardo, I think the first eigenvector is essentially a function of isolation-by-distance between the Middle East and Europe, and reflects multiple ancient and recent contacts. I'd say the second eigenvector is mostly a reflection of the level of western Mediterranean ancestry (and relative lack of West Asian and Siberian ancestry).

Maju said...

"the first eigenvector is a function of isolation-by-distance between the Middle East and Europe"...

By definition, and correct me if I'm wrong (I don't think I am), the first eigenvector only describes the similarity to those showing the highest values (in this case some NE Europeans), while the negative pole (left) says nothing at all except the absence of this "NE Europeanness".

That A is different from B and A is different from C does not mean that B is identical (or even similar) to C. A dog and a horse are different from an elephant but that does not make them identical in any way. Careful with that, please.

This graph is only indicating to what degree each population is similar to Finns-Latvians (PC1) and to Caucasian peoples (PC2). The clines you see are largely illusory therefore: Finns and Caucasians/Iranians/Turks do cluster in PC2 (just fold the graph along the vertical axis), Basques, French and Hungarians cluster as well in the PC1 meaning only that they are similar to Finns in the same degree.

In the negative zone: Moroccans and Arabians also cluster in the PC1 but again it means that they are nearly identical in one thing: their (very low) level of "Finnishness". That's also how Greeks, Italians and Chechens cluster: same level of "Finnishness".

The "clines" only make sense once you have mapped enough PCs to cover the majority of the genetic diversity. Usually each PC (at least in European/West Eurasian analyses) covers under 20% of the variation, often under 10%, so you will need many PCs to get a comprehensive picture of how different populations are actually related to each other. PCAs are like a black and white film compared with the true color and depth of reality.
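How much of the variation each PC covers can be checked directly: it is the squared singular value of that component divided by the sum over all components. A minimal sketch on random placeholder data (the matrix is hypothetical, not the dataset discussed here, and `variance_explained` is an illustrative name):

```python
import numpy as np

def variance_explained(X):
    """Fraction of total variance carried by each principal component."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                       # centre each column
    s = np.linalg.svd(X, compute_uv=False)       # singular values, descending
    return s ** 2 / np.sum(s ** 2)

# Placeholder genotype-like matrix: 50 samples, 500 SNPs, no real structure
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(50, 500))

ratios = variance_explained(X)
cumulative = np.cumsum(ratios)
# Number of leading PCs needed before half of the variance is covered
n_for_half = int(np.searchsorted(cumulative, 0.5) + 1)
```

On real data the first two entries of `ratios` show how much of the picture a two-axis plot captures, and `n_for_half` gives a rough idea of how many axes would be needed for a majority of it.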

Davidski said...

You certainly can't say that "the graph is only indicating to what degree each population is similar to Finns-Latvians (PC1) and to Caucasian peoples (PC2)". That really makes no sense at all.

Consider that eigenvector 1 shows more variation than eigenvector 2. Now, Finns are absolutely not anywhere near the Caucasians across e1, and don't even quite line up with them across e2. So in fact, Italians are way more Caucasus-like than Finns, and this is easily verified by other types of autosomal analyses and uniparental markers.

Not only that, but this PCA won't look very different even if I take out all the Finns and Balts. I can show you that later.

Therefore, clines are important, not only because the PCA isn't as limited as you describe, but also because well documented major population movements and often used migration routes always leave traces in modern genomes.

Maju said...

Finns at e2 cluster (or "line up", if you prefer) with Turks, Armenians, Iranians, and some Caucasians (Georgians), which means that they have a similar level of "Caucasianness" to those peoples of West Asia. Nothing more, nothing less.

"eigenvector 1 shows more variation than eigenvector 2"

It's standard. I don't see how that would matter.

"... this PCA won't look very different even if I take out all the Finns and Balts".

Probably. I'm not saying that Finns and Balts cause the effects on their own. It was not my intention to suggest that. All I say is that they are the extreme of a polarity (e1) so, in essence, they define it. If you take them out, surely Russians, Swedes and others will take their place, of course, unless you radically alter the sampling strategy (which is not necessarily better or worse, it would just offer different viewpoints).

What I mean is just that there is no one absolute truth that this or any other graph shows but that there are many nuances that can't be ignored or dismissed as irrelevant. That the graph must be understood for what it is. For example: what in that graph separates French and Basques? The degree of "Caucasianness" (greater among French), what separates Spaniards and Basques? Both the degree of "Caucasianness" (greater among Spaniards) and the degree of "Finnishness" (greater among Basques), etc. How reliable is a cluster in the negative area? Not much (at least not necessarily): all the graph shows is that they are neither this nor that but it's unclear if they are something else that links them together. How useful are "Finnishness" and "Caucasianness" to understand properly West Eurasian genetics, not much: it's just very preliminary assessment.

That's what I mean.

"Therefore, clines are important"...

While I do agree that the bulk of the genetic flow between West Asia and Europe happened through the Balkans and nearby areas, I wouldn't put my hand in the fire for your reading of this mere PCA, nor for your rejection of other flows via the Caucasus, maybe at older times or whatever. In that very graph there is clear affinity between Eastern Europe and Highland West Asia in PC2, which may weigh a bit less than PC1 but should not be deemed irrelevant either.

"I can show you that later".

You know what I would like you to show me (asking is for free, up to you to do it or not). Some sort of analysis (another PCA maybe, why not?) in which the samples are roughly apportioned to actual population sizes. You know: France has 70 million people, Finland not sure if 8, so 10x more French samples than Finns, etc. I often feel that small oversampled populations cause some notable distortions, especially the very different ones like Finns or Sardinians.
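One way to approximate that request is to resample individuals so that each population's share of the panel matches its share of the combined population, and then run the PCA on the resampled panel. A rough sketch follows; the sample IDs, the population names and the round population figures are placeholders, not real values, and `proportional_resample` is a hypothetical helper.

```python
import random

def proportional_resample(samples_by_pop, pop_sizes, total, seed=0):
    """Draw about `total` sample IDs, apportioned by population size."""
    rng = random.Random(seed)
    whole = sum(pop_sizes[p] for p in samples_by_pop)
    panel = []
    for pop, ids in samples_by_pop.items():
        # Every population keeps at least one sample
        k = max(1, round(total * pop_sizes[pop] / whole))
        # Drawn with replacement, because a small panel may hold fewer IDs than k
        panel.extend(rng.choices(ids, k=k))
    return panel

# Placeholder panel: PopA is nine times larger than PopB
samples_by_pop = {
    "PopA": ["A1", "A2", "A3"],
    "PopB": ["B1", "B2", "B3", "B4"],
}
pop_sizes = {"PopA": 9_000_000, "PopB": 1_000_000}

panel = proportional_resample(samples_by_pop, pop_sizes, total=20)
```

Drawing with replacement duplicates individuals, which inflates apparent similarity within the upweighted populations, so the result is best read as a sensitivity check rather than a better estimate.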

Davidski said...

A single outlier Georgian lines up with samples that I labeled East Finns.

Also, let me just stress that taking out single populations from this dataset won't change the results. I can take out all the Finns and nothing fundamental will change. Same with the Balts, Sardinians, etc.
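A claim like this can be tested by running the PCA twice, once on the full panel and once with a population removed, and comparing where the remaining samples fall. Below is a minimal sketch on simulated data, assuming three hypothetical groups placed along a single cline; it is not the dataset or software used here. The absolute correlation is used because the sign of an eigenvector is arbitrary.

```python
import numpy as np

def pc1(X):
    """PC1 scores of the rows of X after centring each column."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * S[0]

def pc1_stability(X, labels, drop):
    """|correlation| of PC1 for the kept samples, with and without `drop`."""
    X = np.asarray(X)
    keep = np.asarray(labels) != drop
    full = pc1(X)[keep]          # PCA on everyone, then look at kept samples
    reduced = pc1(X[keep])       # PCA recomputed without the dropped group
    return abs(np.corrcoef(full, reduced)[0, 1])

# Simulated cline (hypothetical): allele frequencies shift from group A to C
rng = np.random.default_rng(2)
base = rng.uniform(0.3, 0.7, size=400)
step = rng.normal(0, 0.2, size=400)
blocks, labels = [], []
for name, k in [("A", -1), ("B", 0), ("C", 1)]:
    freq = np.clip(base + k * step, 0.05, 0.95)
    blocks.append(rng.binomial(2, freq, size=(30, 400)))
    labels += [name] * 30
X = np.vstack(blocks)

stability = pc1_stability(X, labels, drop="A")   # remove one extreme group
```

A value near 1 means the remaining samples keep roughly the same ordering on eigenvector 1 after the extreme group is removed.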

Matt said...

It looks to me like PC1 reflects a North-South West Eurasian vector, and is most different between the most Northern population, Finns, and the most Southern population, Saudis (both populations at roughly the same longitude).

PC2 reflects a West-East vector, and is most different between the most Eastern populations, Lezgins and Georgians, and the most Western population, Moroccans (both at, again, *roughly* the same latitude).

The shape of the graph reflects both that North-South population flow across history has tended to be less than East-West population flow (which is why East-West is PC2 rather than PC1) and that North-South population flow is higher in the West than the East.

One fun thing to do with this PCA is to save it to disk, then do the Novembre et al. trick: rotate it 280 degrees clockwise and then flip it horizontally. That aligns the PCs better with the geography that they map to. See example here http://oi44.tinypic.com/50rtxl.jpg (obviously not as readable as if Davidski had flipped the axes on the output graph).
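If you have the PC coordinates rather than just the image, the same rotate-and-flip can be done numerically. A small sketch with made-up coordinates (the points and function names are illustrative only, and the angle is left as a parameter):

```python
import math

def rotate_clockwise(points, degrees):
    """Rotate (x, y) points clockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    # Clockwise rotation: (x, y) -> (x cos t + y sin t, -x sin t + y cos t)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]

def flip_horizontal(points):
    """Mirror points left to right (negate the x coordinate)."""
    return [(-x, y) for x, y in points]

# Made-up (PC1, PC2) coordinates for two samples
pcs = [(1.0, 0.0), (0.0, 1.0)]
rotated = rotate_clockwise(pcs, 90)
aligned = flip_horizontal(rotated)
```

Rotation and reflection leave the distances between samples unchanged, so this only changes how the plot is oriented, not what it shows.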

The positions of the samples map reasonably well to geography and isolation when this happens (and remember the surface of the earth is curved), if you allow for an elevated European-non European gap, with the main exception seeming to me that the Greeks are positioned south of where they should be.

mikej2 said...

Maju, I think that you are on the right track when asking about the extremes on eigenvectors. Although PCA does this quite well and finds the biggest differences and the most meaningful PCs, it could cause nonlinear results for just that reason, because the biggest difference doesn't correspond to the genetic history. I mean there is not full equivalence between genetic difference and genetic history, only partial equivalence. I am sure that we should pay special attention to the genetic history of populations and select the basic sample set of our tests to ensure the right measures between population histories: not only the quantity of samples but also their quality in this sense.

Onur said...

If PC2 is really a reflector of West-East differentiation, why, for instance, are Georgians positioned so much to the "east" of Arabians, even to the "east" of Iranians?

Davidski said...

It's because they lack Mediterranean ancestry which peaks in North Africa and Sardinia.

Onur said...

"It's because they lack Mediterranean ancestry which peaks in North Africa and Sardinia."

I know. But is it enough to label them as genetically so "eastern"?

Davidski said...

Mediterranean Oetzi the Iceman-like ancestry is one of the main causes of the West vs. East dichotomy within West Eurasia. Georgians lack this influence so they come out very "eastern" in terms of West Eurasian genetic diversity in many analyses. They look much more western in analyses where the Mediterranean component is less important, and where west/east differentiation is dictated by East Eurasian affinity, or lack thereof.

Onur said...

Another point: According to Matt's hypothesis, Georgians are genetically more "eastern" than Finns as well. But Finns lack Mediterranean-like ancestry too. So what makes the difference?

Maju said...

David: There is NO "Mediterranean ancestry which peaks in North Africa and Sardinia", at least not deducible from this analysis: your "Mediterranean ancestry" is NOTHING but the lack of Caucasian affinity (as you are measuring it from PC2). What on Earth would bring Sardinians and Moroccans so close anyhow? Just a negative comparison: they are not "Fords" but one may well be a "Toyota" and the other a "Renault". They only share their lack of identity with the brand "Ford", or actually the Caucasus or Highland West Asia, which is exactly the only thing that e2 measures: Caucasus or Highland West Asia affinity.

I'm surprised that you are not struck by the fact that the two vectors correspond grosso modo to two commonly found components in West Eurasia: North-Central European and Highland West Asian. But, regardless, negative values for these components do not make the affected populations any more similar: they may be or they may not be. That would only be apparent in an analysis where at least one of them is in the positive zone.

Davidski said...

Sardinians share a lot of deep ancestry with Moroccans and Berbers, which is essentially the Mediterranean ancestry I'm talking about.

This ancestry is synonymous with what I call "Southwest Eurasian" ancestry, and it showed up clearly even when I only used a single Sardinian and six Mozabite Berbers in an ADMIXTURE run.

http://bga101.blogspot.com.au/2012/03/admixture-analysis-of-west-eurasia-k13.html

http://eurogenes.blogspot.com.au/2012/03/northwest-eurasians-southwest-eurasians.html

Onur said...

David, you still did not respond to my question. What makes Georgians genetically more "eastern" than Finns?

"This ancestry is synonymous with what I call "Southwest Eurasian" ancestry, and it showed up clearly even when I only used a single Sardinian and six Mozabite Berbers in an ADMIXTURE run."

http://bga101.blogspot.com.au/2012/03/admixture-analysis-of-west-eurasia-k13.html

http://eurogenes.blogspot.com.au/2012/03/northwest-eurasians-southwest-eurasians.html


But according to your ADMIXTURE analysis Georgians too are in the "Southwest Eurasian" zone.

Davidski said...

Finns are Europeans with significant Northwest European ancestry, while Georgians are West Asians. That's basically why Finns are more western than Georgians. Who do you think is genetically closer to an Irishman, Frenchman, or even a Spaniard; a Finn or a Georgian?

Also, Georgians aren't Southwest Eurasians. That map doesn't take into account the Caucasus cluster, which is closely related to the Northwest Eurasian cluster and distant from the Southwest Eurasian one. So that result on the map is a quirk of the methodology, with Georgians mostly belonging to the Caucasus cluster, but also having a relatively greater membership in the Southwest Eurasian cluster than the Northwest Eurasian cluster. However, in absolute terms they have a very low membership in the Southwest Eurasian cluster, and are more closely related to populations with a high membership in the Northwest Eurasian cluster.

Onur said...

David, rather than putting forward hypothetical (i.e., unproven) proposals, I think we should limit ourselves as much as we can to what the data already tell us, because we can only measure the accuracy of proposals such as Matt's with more mathematical methods. At present, it is not clear how much these data can inform us about the level of "easternness" or "northernness" of a population. My objection was essentially to Matt, not you, as it is he who put forward a hypothesis about the "northernness" or "easternness" of the populations.

Davidski said...

It's a proven fact that Finns are Europeans and Georgians are West Asians.

Onur said...

"It's a proven fact that Finns are Europeans and Georgians are West Asians."

Who objected to that? What is unclear is the influence of the level of "Europeanness" and the level of "West Asianness" on the level of "easternness".

Maju said...

"Sardinians share a lot of deep ancestry with Moroccans and Berbers, which is essentially the Mediterranean ancestry I'm talking about."

Not sure if that's real (the ADMIXTURE graph is too heavy for my cheap PC and crashes it) but, whatever the case, it is NOT what the PCA indicates. It simply cannot indicate that. I understand that the component you mention is quite a bit stronger in Spaniards than in Basques, but Basques appear closer to Moroccans and Sardinians in e2 anyhow. The only thing that negative e2 indicates is lack of affinity to the Highland West Asian (or Caucasian) component, and it probably correlates with having low frequencies of Y-DNA J2 and G2, which are the most characteristic patrilineages of that demographic pole.

The problem is that there are more than just two poles (you explored up to 13, for example) and the PCA can't show but two (three if tridimensional).

Davidski said...

I updated the PCA with some new samples. If you rotate this new version clockwise you'll get a map of West Eurasia, more or less.

Eduardo Pinto said...

David,

Most farmers who entered Europe (north of the Alps, of course) during the Neolithic came through the Danube basin and were of West Asian ancestry, which is why the Baltic and Northwest clusters in ADMIXTURE are always closer, Fst-wise, to the West Asian cluster than to the Mediterranean one. The Baltic cluster, the Northwest cluster and even Dienekes' Atlantic_Baltic are most likely 1/2 farmer, 1/2 hunter-gatherer. If you have the time, play around with VV's calculator MDLP 22, remove the northeastern cluster and see where the admixture goes...

idurar said...

Is it possible to include the Tunisian Jew sample in the next update?

Davidski said...

I've updated the PCA with that sample.

EDP said...

And in the W. Eurasian PCA, I see a Cuban between Tuscan and Spanish Murcia. Is that me by any chance? I'm Cu6

idurar said...

Yes, it's CU6. Looks like you deviate towards Berbers as do the Spanish Murcian and Canary Islanders. The three Tuscans above you deviate most likely towards Sardinians, which explains why all these individuals who are North African and Sardinian admixed respectively are close to each other here.


@Davidski, thanks for adding the Tunisian Jewish individual. Will there be an update in a few weeks/months? The real question here is whether you will accept new submissions from North African Jews and Berbers (and other poorly sampled groups, for that matter), or is it over?
Thanks.

EDP said...

Thanks Idurar, I do have Canarian ancestry.

Davidski said...

I've updated the PCA.
