Saturday, April 21, 2012

So who's the most (indigenous) European of us all?

Basically, the first map below reveals the answer. It shows the spread of a European-specific cluster from a worldwide ADMIXTURE analysis at K=8 (eight ancestral populations assumed), which I call "North European". Thus, genetically, the most European populations are found around the Baltic Sea, and in particular in the East Baltic region. In my genome collection, samples from Lithuania clearly and consistently score the highest percentages in ADMIXTURE clusters specific to Europe. However, I suspect that if I had Latvians with no known foreign ancestry going back more than four generations, they'd come out the "most European". Hopefully we can test that in the near future.

Below are the fifteen Eurogenes samples that scored the highest percentage levels of membership in the North European cluster. The list only includes groups with five or more individuals present in the analysis, so some populations, like Estonians or Danes, weren't included, even though they easily made the cut. The spreadsheet with all the results from this run can be seen here. A table of Fst (genetic) distances between the eight clusters is available here.

Lithuanians 77%
Finns 74%
Belorussians 70%
Swedes 69%
Norwegians 68%
Kargopol Russians 68%
Russians 68%
Poles 68%
Erzya 66%
Ukrainians 66%
Moksha 66%
Orcadians 63%
HapMap Utah Americans (CEU) 63%
Irish 63%
British 62%
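
For anyone wanting to build a similar ranking from their own run, here is a minimal sketch of the filtering and averaging step. It assumes the ADMIXTURE Q matrix (one row per individual, one column per cluster) has already been loaded alongside a population label for each individual. The function name, the `min_n` parameter and the toy numbers are made up for illustration, and this isn't necessarily how the list above was produced.

```python
from collections import defaultdict

def rank_populations(labels, q_rows, cluster, min_n=5):
    """Rank populations by mean membership in one cluster, highest first.

    labels  : population label for each individual
    q_rows  : per-individual membership fractions (one list per individual)
    cluster : column index of the cluster of interest
    min_n   : populations with fewer individuals than this are skipped
    """
    by_pop = defaultdict(list)
    for pop, row in zip(labels, q_rows):
        by_pop[pop].append(row[cluster])
    means = {
        pop: sum(values) / len(values)
        for pop, values in by_pop.items()
        if len(values) >= min_n
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data (K=2), not real results.
labels = ["PopA"] * 5 + ["PopB"] * 5 + ["PopC"] * 2
q_rows = [[0.8, 0.2]] * 5 + [[0.6, 0.4]] * 5 + [[0.9, 0.1]] * 2

ranking = rank_populations(labels, q_rows, cluster=0)
for pop, mean in ranking:
    print(f"{pop} {mean:.0%}")
```

In the toy data PopC has the highest membership but only two individuals, so it is left out, much like the Estonians and Danes above.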

So why did I pick the results from K=8, and not some other K, like 2, 10, or 25? Well, it's not possible to evaluate who is more European without a European-specific cluster (i.e. modal in Europeans, with a low frequency outside of Europe). Provided that a decent number and range of global and West Eurasian samples are used in the analysis, such clusters begin appearing at around K=5 or K=6, and start breaking up into local clusters from about K=9. I found that runs below K=8 produced European clusters that spilled too generously outside of the borders of Europe. On the other hand, runs above K=8 produced European clusters that weren't representative of enough European groups (i.e. too localized). But the European cluster from K=8 was pretty much perfect, and I think that's obvious from the map. In fact, I can hardly believe how well it fits the modern geographic concept of Europe - north of the Mediterranean and west of the Urals. Amazing stuff.
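
The trade-off between "spilling outside Europe" and "too localized" could be put into rough numbers along the following lines. This is only an illustrative sketch, not the procedure used to choose K=8 here (that was judged from the maps). The function name, the two measures and the toy values are all assumptions.

```python
def cluster_specificity(pop_means, european, cluster):
    """Return (spill, coverage) for one cluster.

    pop_means : {population: [mean membership in each cluster]}
    european  : set of population names treated as European
    cluster   : index of the candidate European cluster

    spill    = mean membership of the cluster in non-European populations
    coverage = share of European populations where the cluster is modal
    """
    inside = [m for pop, m in pop_means.items() if pop in european]
    outside = [m for pop, m in pop_means.items() if pop not in european]
    spill = sum(m[cluster] for m in outside) / len(outside)
    modal_count = sum(
        1 for m in inside
        if max(range(len(m)), key=m.__getitem__) == cluster
    )
    coverage = modal_count / len(inside)
    return spill, coverage

# Hypothetical toy values (K=2), not real results.
pop_means = {
    "EurA": [0.7, 0.3],
    "EurB": [0.4, 0.6],
    "OutA": [0.1, 0.9],
    "OutB": [0.3, 0.7],
}
spill, coverage = cluster_specificity(pop_means, {"EurA", "EurB"}, cluster=0)
```

A useful European cluster would have low spill and high coverage. Comparing the two numbers across runs at different K is one way to see where the cluster stops leaking out of Europe without yet breaking up into local clusters.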

There are two other clusters that show up across Europe in non-trivial amounts - Mediterranean and Caucasus (see maps below). These can also be thought of as native European clusters, since they've been on the continent for thousands of years. However, their peak frequencies are found in West Asia, so they're not particularly useful signals of European-specific ancestry.

So what do these three clusters show exactly? They represent certain allele frequencies in modern populations, and in fact, these can change fairly rapidly due to admixture, selection, and genetic drift. So the claim that such clusters represent pure ancient populations is unlikely to be true in most cases, if ever. However, I don't think there's anything wrong in saying that, when robust enough, they can be thought of as signals of ancestry from relatively distinct ancestral groups.

Indeed, anyone who's read up on the prehistory of Europe knows that there are three general Neolithic archeological waves to consider when trying to untangle the story of the peopling of Europe. These are Mediterranean Neolithic, Anatolian Neolithic and Forest Neolithic (for example, see here).

Mediterranean Neolithic refers to a series of migrations from West Asia via the Mediterranean and its coasts. The areas most profoundly affected by these movements include the islands of Sardinia and Corsica, and the Southwest European mainland. Anatolian Neolithic describes migrations into Europe from modern-day Turkey, mostly into the Balkans, but also as far as Germany and France. At the moment, the Forest Neolithic of Northeastern Europe is something of a mystery. However, the general opinion is that it was largely the result of native Mesolithic hunter-gatherers adopting agriculture.

Obviously, it's very difficult to dismiss the correlations between these three broad archeological groups and the European and two European/West Asian clusters produced in my K=8 ADMIXTURE analysis. Is it a coincidence that the Mediterranean cluster today peaks in Sardinia, which has been largely shielded from foreign admixture since the Neolithic, and today forms a very distinct Southern European isolate? Why does the North European cluster show the highest peaks in classic Forest Neolithic territory? And why does the Caucasus cluster radiate in Europe from the southeast, which is where Anatolian farmers had the greatest impact? These can't all be coincidences, and I'm willing to bet that none of them are. I'm convinced that the three clusters from my K=8 run are strong signals from the Neolithic, and the North European cluster also from the Mesolithic.

Eventually, these issues will be settled with ancient DNA data, in a much more comprehensive way than is ever possible using modern genomes. We've already seen some preliminary results, mostly from Mesolithic, Neolithic and Bronze Age sites around Europe, so perhaps it's useful to ask whether my ADMIXTURE analysis and commentary here mirror these early findings. I think they do. For instance, here's an interesting conclusion regarding the East Baltic area from a study on ancient Scandinavian mtDNA by Malmström et al.

Through analysis of DNA extracted from ancient Scandinavian human remains, we show that people of the Pitted Ware culture were not the direct ancestors of modern Scandinavians (including the Saami people of northern Scandinavia) but are more closely related to contemporary populations of the eastern Baltic region. Our findings support hypotheses arising from archaeological analyses that propose a Neolithic or post-Neolithic population replacement in Scandinavia [7]. Furthermore, our data are consistent with the view that the eastern Baltic represents a genetic refugia for some of the European hunter-gatherer populations.

I suppose there will be people wondering why I didn't take Sub-Saharan African, East Asian, and South Asian admixtures into account in my analysis. The reason is that I wasn't looking at which group was most West Eurasian, or Caucasoid. Based on everything I've seen to date, in my own work as well as elsewhere, the most West Eurasian group would probably be the French Basques from the HGDP. However, the differences between them and certain groups from Northeastern Europe, like Northern Poles and Lithuanians, really wouldn't be that great anyway. I might do a write-up about that at some point.


- Maps by Eurogenes project member FR7

- Additional stats by Eurogenes project member DESEUK1


Helena Malmström et al., Ancient DNA Reveals Lack of Continuity between Neolithic Hunter-Gatherers and Contemporary Scandinavians, Current Biology, 24 September 2009, doi:10.1016/j.cub.2009.09.017

Noreen von Cramon-Taubadel and Ron Pinhasi, Craniometric data support a mosaic model of demic and cultural Neolithic diffusion to outlying regions of Europe, Proc. R. Soc. B published online 23 February 2011, doi: 10.1098/rspb.2010.2678


EliasAlucard said...

You've basically replicated Dodecad K12a which means we now have empiric evidence of Lithuanians or Baltics in general being most "north European", and as you know, in my opinion, this is the main proto-Indo-European component. It's no coincidence it peaks in Lithuanians especially as Lithuanian is often considered the most conservative extant Indo-European language.

If we only look at the north European component, Finns will be placed second, but if we consider also their Siberian component (and this is also true for Russians), then I'm not sure Finns can be placed second. You did however get a bit higher north European component in the Swedes than Dienekes got; are you using the same reference individuals/populations? Because most people who sent you their files also participated in Dodecad and vice versa. Did you use any different settings/software?

Nonetheless, the north European component could be derived from indigenous hunter-gatherer populations, but it was definitely there when these ancestral hunter-gatherer populations went from this subsistence/economy to that of the proto-Indo-Europeans. I'd imagine that the Yamnaya remains would score something like 80-85% "north European" and the rest something like Caucasus/Mediterranean/West Asian or something along those lines.

By the way, could you put together an average of this analysis for all the ethnic/racial groups like Dienekes did with K12a? It would be so much easier to view the average for all ethnic groups.

Davidski said...

I've got more Swedes than Dodecad, including a few from Northern and Eastern Sweden, and with ancestry from Aland.

EliasAlucard said...

^^ I see, that explains it. But even so, your Norwegian cluster also got higher than Dodecad K12a (they got something like 55% there and Swedes have 57% if I remember correctly), which means that your entire Scandinavian cluster somehow got a higher north European score here than they did in Dodecad K12a. Perhaps you have different K settings then.

In any case, I think this correlates fairly well with the proto-Indo-European component. If they release the autosomal DNA from the upcoming Yamnaya study, you should definitely do another analysis with Yamnaya included, that would be cool.

By the way, you don't have any Danes? And Utah Americans are basically British colonials, right?

Antonio Pedro said...

One alternative way to look at this question is to ask which one of these groups has the highest rate of non-European admixture. I would be curious about that. Cheers,