
Sunday, April 22, 2018

Likely Yamnaya incursion(s) into Northwestern Iran

Despite being stratigraphically dated to 5900-5500 BCE (ie. the Chalcolithic period), ancient sample Hajji_Firuz I2327 from Narasimhan et al. 2018, belongs to Y-haplogroup R1b-Z2103 and shows minor, but unambiguous, Yamnaya-related ancestry on the autosomes. Why is this a problem? Because both R1b-Z2103 and the Yamnaya culture are dated to the Bronze Age, and Yamnaya samples from Kalmykia and Samara are exceptionally rich in R1b-Z2103.

Hence, pending a successful radiocarbon (C14) dating analysis, it seems rather unlikely that Hajji_Firuz I2327 was alive during the Chalcolithic. Rather, it appears that he's partly of Yamnaya origin and has been wrongly dated. His remains are likely to be from a secondary burial from the Bronze Age that collapsed into the layer below, right into a Chalcolithic bin ossuary burial full of much older bones.

This scenario is strongly corroborated by data from two other ancient individuals from what is now Northwestern Iran:

- Hajji_Firuz_BA I4243 (also from Narasimhan et al. 2018 and from the same site as Hajji_Firuz I2327) was initially also stratigraphically dated to the Chalcolithic, but is now labeled as a Bronze Age sample after a radiocarbon (C14) analysis of the remains revealed a date of 2465-2286 calBCE. Moreover, this individual packs around 50% Yamnaya-related ancestry.

- Iran_IA F38 (from Broushaki et al. 2016) from an Iron Age burial at Tepe Hasanlu, which is just a few miles from Hajji Firuz, also belongs to Y-haplogroup R1b-Z2103 and harbors some sort of steppe ancestry on the autosomes (see here).

Below is a Principal Component Analysis (PCA) showing how this trio compare in terms of genome-wide ancestry to C14-dated Chalcolithic samples from Hajji Firuz and the nearby Seh Gabi. The relevant datasheet is available here.

Clearly, they're shifted "north" relative to the Chalcolithic group and thus closer to the Eneolithic/Bronze Age steppe cluster, suggesting that they carry steppe ancestry that was missing, or at least much less pronounced, in the region before the Bronze Age. I can use qpAdm and Global25/nMonte to double check this and also estimate more precisely their levels of Yamnaya-related admixture.

Afanasievo 0.172±0.033
Hajji_Firuz_ChL 0.313±0.156
Seh_Gabi_ChL 0.515±0.158
tail: 0.668410201 (full output)

Hajji_Firuz_ChL 0.484±0.033
Yamnaya_Samara 0.516±0.033
tail: 0.26511852 (full output)
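To make the standard errors and tail probabilities above a little easier to read, here's a minimal sketch that parses the coefficient lines exactly as printed and derives a rough z-score and 95% interval for each source. The normal approximation and the 0.05 cutoff for the tail probability are my assumptions for illustration, not something taken from the qpAdm output itself, and the function names are made up.

```python
import re

# Coefficient lines and tail probabilities copied from the output above.
MODEL_A = ["Afanasievo 0.172±0.033",
           "Hajji_Firuz_ChL 0.313±0.156",
           "Seh_Gabi_ChL 0.515±0.158"]
TAIL_A = 0.668410201
MODEL_B = ["Hajji_Firuz_ChL 0.484±0.033",
           "Yamnaya_Samara 0.516±0.033"]
TAIL_B = 0.26511852


def parse_line(line):
    """Split 'Label 0.172±0.033' into (label, coefficient, standard error)."""
    m = re.fullmatch(r"(\S+)\s+(\d*\.?\d+)±(\d*\.?\d+)", line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return m.group(1), float(m.group(2)), float(m.group(3))


def summarize(lines, tail, alpha=0.05):
    """Return per-source z-scores and intervals plus a simple fit flag.

    alpha=0.05 is an assumed, conventional threshold: a tail probability
    at or above it means the model is not rejected.
    """
    rows = []
    for line in lines:
        label, coef, se = parse_line(line)
        rows.append({
            "source": label,
            "coef": coef,
            "z": coef / se,  # how many standard errors from zero
            "ci95": (coef - 1.96 * se, coef + 1.96 * se),  # normal approx.
        })
    return {"rows": rows, "fit_ok": tail >= alpha}


for name, lines, tail in (("model A", MODEL_A, TAIL_A),
                          ("model B", MODEL_B, TAIL_B)):
    summary = summarize(lines, tail)
    print(name, "fit_ok =", summary["fit_ok"])
    for row in summary["rows"]:
        low, high = row["ci95"]
        print(f"  {row['source']}: {row['coef']:.3f} "
              f"(z={row['z']:.1f}, 95% CI {low:.3f} to {high:.3f})")
```

A coefficient that sits several standard errors away from zero is hard to dismiss as noise, whereas the large standard errors on the two Chalcolithic sources in the first model mean that their relative split is only loosely constrained.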




Considering the standard errors and statistical fits, qpAdm and Global25/nMonte have produced very similar results for both samples, which cannot be explained away as coincidental outcomes. I think these are signals of a population movement or movements from the Pontic-Caspian steppe into the South Caspian region, probably across the Caucasus, and most likely during the Bronze Age rather than the Chalcolithic.
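For anyone curious about how an admixture proportion can be read off PCA coordinates at all, here's a rough sketch of the simplest possible case: a target modeled as a mix of two sources, fitted by least squares. To be clear, this is not nMonte's actual procedure, just a closed-form two-source projection, and the coordinates and labels in it are hypothetical toy values rather than real Global25 data.

```python
def two_source_fit(target, source_a, source_b):
    """Least-squares weight of source_a when target = w*a + (1-w)*b.

    Projects the target onto the line between the two sources and clamps
    the weight to the range 0..1. Returns (weight_a, residual_distance).
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("coordinate vectors must have the same length")
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("sources are identical")
    w = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    w = min(1.0, max(0.0, w))
    fitted = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return w, residual


# Hypothetical two-dimensional toy coordinates, for illustration only.
steppe_like = [0.10, 0.00]
local_like = [0.00, 0.10]
target = [0.03, 0.07]

weight, residual = two_source_fit(target, steppe_like, local_like)
print(f"steppe-like weight: {weight:.2f}, residual: {residual:.4f}")
```

The residual plays a role loosely similar to a fit statistic: if it's large relative to the distance between the sources, the two-source model is a poor description of the target no matter what weight comes out.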

I don't have a clue who these people were. It's rather unlikely that they were the early Iranians, who probably arrived in the region from Central Asia during the Late Bronze Age or even Iron Age (for instance, see here). Perhaps they were the Hittites? Indeed, in his book In Search of the Indo-Europeans, archaeologist James Mallory suggested that the ancestors of the Hittites and other Anatolian-speakers entered the Near East via the Caucasus route:

Most arguments for an Indo-European invasion from the northeast concern the appearance of a new burial rite at the end of the fourth and through the third millennium BC. At that time, both north of the Black Sea and the Caucasus, burials on the Russian-Ukrainian steppe were typically placed in an underground shaft and covered with a mound (kurgan in Russian). Before 3000 BC there begin to appear in the territory of the indigenous Transcaucasian (Kuro-Araxes) culture somewhat similar burials such as the royal tomb of Uch-Tepe on the Milska steppe. As tumulus burials are previously unknown in this region, some would explain their appearance by an intrusion of steppe pastoralists who migrated through the Caucasus and subjugated the local Early Bronze Age culture. More importantly, a status burial inserted into a mound at the site of Korucu Tepe in eastern Anatolia has been compared with somewhat similar burials both in the Caucasus and the Russian steppe. The discovery of horse bones on several sites of east Anatolia such as Norsun Tepe and Tepecik are seen to confirm a steppe intrusion since, as mentioned earlier, the horse, long known in the Ukraine and south Russia, is not attested in Anatolia prior to the Bronze Age.

Another option, however, is that they belonged to some other extinct Indo-European group, such as the Gutians (see here). In any case, keep an eye out for more Bronze Age samples from this part of the world. I have a strong feeling that, unlike their Neolithic and Chalcolithic predecessors, they will be rich in steppe ancestry and R1b-Z2103.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, April 18, 2018

Protohistoric Swat Valley peoples in qpGraph

If I was to add one thing to the Narasimhan et al. 2018 preprint, it'd be a series of uncomplicated qpGraph trees that back up, very simply and directly, the main conclusions in the manuscript. Such as this:

If some of you think that it's possible to show pretty much anything in these sorts of graphs, then you're wrong. For instance, it's not possible to swap West_Siberia_N for Sintashta, because the highest Z score usually blows out from almost nothing to well over five. And it's not possible to push Sintashta-related ancestry into Dravidian-speakers from South India. But if you think it is, then, by all means, have a go. The graph file is here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, April 13, 2018

On the doorstep of India

One of the most remarkable discoveries in the recent Narasimhan et al. 2018 preprint has to be the presence of what are essentially Eastern European migrant populations within the Inner Asian Mountain Corridor (IAMC) during the Middle to Late Bronze Age (MLBA). Remarkable for so many reasons, but seemingly under-appreciated by a lot of people, judging by the online discussions that I've seen on the preprint, and even, I'd say, the authors themselves.

Narasimhan et al. labeled these groups as belonging to the "forest/steppe MLBA" complex (for instance, see the main figure from the preprint here). This is indeed what they are in terms of their genetic structure, but certainly not geography, because the IAMC is well south of the steppe. Thus, in my Principal Component Analysis (PCA) I'm going to label them as part of the "post-steppe herder expansion Turan" complex.

Strikingly, most of these people cluster with Bronze Age Eastern Europeans, and even some Bronze Age Central Europeans. They're also sitting very close to the more easterly present-day Slavic-speakers from Russia and Ukraine, and indeed closer to the bulk of the European cluster than some present-day Turkic and Uralic groups from the Volga-Ural region. Even I never predicted such an outcome. Sure, I was expecting to see ancient genomes from South Central Asia with some very heavy steppe influence, but not this. The relevant datasheet is available here.

Two of the MLBA IAMC individuals are from Kashkarchi in the Ferghana Valley, in what is now Uzbekistan, and basically on the doorstep of the Indian subcontinent. I've made special mention of them on the plot, and I've also highlighted a pair of individuals from the Bronze Age Central Asian sites of Gonur Tepe and Shahr-i Sokhta, who are, in all likelihood, unadmixed migrants from the Indus Valley (for more on that, see here).

It's surely not a coincidence that the ancient and present-day South Asians on the plot (including those from Pakistan's Swat Valley dated to the Iron Age) form an almost perfect cline between these two pairs of individuals. It's also surely not a coincidence that the MLBA IAMC groups are rich in Y-haplogroup R1a-M417, and in particular its R1a-Z93 subclade, which is today an especially frequent marker in Indo-European-speaking South Asians.

Forget about the pre-MLBA populations from the forests, steppe, or IAMC, like those represented by Dali_EBA; they're practically irrelevant to this story. How do I know? Because they have little to no impact on the above mentioned cline. And this can be easily verified with mixture models based on multiple Principal Components (PCs) and formal statistics (for instance, see here).

Clearly, many populations in South Asia, particularly those speaking Indo-European languages, derive the bulk of their steppe-related ancestry from the peoples of the MLBA IAMC, and/or their very close relatives. And if you do believe that this inference is just based on coincidences, then I'm sorry to say this, but obviously a new, much less mentally challenging, hobby or profession beckons. All the best with that.

Just to help put all of this in a geographic perspective, here's a topographical map of Eurasia. I've marked the location of the Ferghana Valley. The close relatives of Kashkarchi_BA most likely skirted their way around those winding high mountains and slipped into India via the Khyber Pass, which I've also marked on the map.

And the rest, as they say, is history, including the history described in the ancient Indo-Aryan Sanskrit texts known as the Vedas. I'm sure we'll soon be learning about these events in great detail when many more ancient samples from Pakistan and, hopefully, the first ancient samples from India, are published.


Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi:

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, April 11, 2018

Bronze Age Central Asia: terra incognita no longer

I've updated my Global25 datasheets with the samples from the Narasimhan et al. 2018 preprint (look for these labels). Feel free to use this output for anything you like, and please show us the results in the comments below.

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)
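If you're not sure where to start with the datasheets, here's a minimal sketch of one basic thing you can do: rank reference populations by Euclidean distance to a target. I'm assuming a comma-separated layout with a label in the first column followed by the coordinates, so check that against the actual files. The three-column rows and labels below are hypothetical toy values, not real Global25 coordinates.

```python
import csv
import io
import math


def load_coords(text):
    """Read rows of 'label,PC1,...,PCn' into a {label: [floats]} dict."""
    coords = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0]:
            continue  # skip blank rows and rows without a label
        try:
            coords[row[0]] = [float(x) for x in row[1:]]
        except ValueError:
            continue  # skip a header row such as 'label,PC1,PC2,...'
    return coords


def rank_by_distance(target, references):
    """Return [(distance, label), ...] sorted from nearest to farthest."""
    ranked = []
    for label, vec in references.items():
        if len(vec) != len(target):
            raise ValueError(f"dimension mismatch for {label}")
        ranked.append((math.dist(target, vec), label))
    return sorted(ranked)


# Hypothetical toy rows standing in for a pop averages datasheet.
TOY_AVERAGES = """label,PC1,PC2,PC3
Pop_A,0.10,0.00,0.00
Pop_B,0.00,0.10,0.00
Pop_C,0.05,0.05,0.02
"""
toy_target = [0.09, 0.01, 0.00]  # hypothetical individual

for distance, label in rank_by_distance(toy_target, load_coords(TOY_AVERAGES)):
    print(f"{label}: {distance:.4f}")
```

With the real files you'd read the text from disk instead of a string, and the scaled datasheets are the natural choice here, since distances in unscaled PCA space give every dimension equal weight regardless of how much variation it explains.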

Also, here's my Principal Component Analysis (PCA) of ancient West Eurasia featuring most of the new samples. Note the cline made up of ancient and present-day South Asians running from the likely Indus Valley diaspora individuals (from the Gonur Tepe and Shahr-i Sokhta archaeological sites, in present-day Turkmenistan and Iran, respectively) towards the Bronze Age steppe. The relevant datasheet is available here.

I have little doubt that these are indeed migrants from the Indus Valley Civilization (IVC). Their relatively unusual genetic structure - which includes ancestry from a West Eurasian ghost population that is inferred to have been exceedingly poor in Anatolian-related ancestry, as well as significant indigenous South Asian ancestry - leaves little scope for plausible alternatives. If you're wondering what they may have been doing so far north of the IVC, Frenez 2018 has a detailed discussion on the topic. From the paper:

An alternative and intriguing hypothesis is instead supported by significant archaeological and textual data from comparable socio-economic or geographical contexts, which suggest that the likely high commercial and ideological value of ivory and of the expertise required to carve it made also possible and economically profitable the presence in Central Asia of independent itinerant ivory carvers native to or trained in the Indus Valley. These itinerant artisans might have provided at the same time both the raw material and the unique skills to transform it into finished objects.


Moreover, the existence of itinerant ivory workers in ancient South Asia is also described in a few literary sources. The Guttila Jātaka mentions a group of ivory carvers who traveled from Benares to Ujjain to offer their products and skills to the local elites (Pal, 1978: 46), while a Buddhist Sanskrit Vinaya tells the story of an Indian master ivory carver who traveled “up to the land of the Yavanas”, most likely the Hellenistic Bactria, to put his superior expertise at the service of a renown local artist (Dwivedi, 1976: 19).

Citation: Frenez, D., Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia. Evidence from Gonur Depe (Margiana, Turkmenistan), Archaeological Research in Asia (2017),

See also...

On the doorstep of India

Saturday, March 31, 2018

Andronovo pastoralists brought steppe ancestry to South Asia (Narasimhan et al. 2018 preprint)

Over at bioRxiv at this LINK. Note that the Andronovo samples that are shown to be the best fit for the steppe ancestry in South Asians are labeled Steppe_MLBA_East (ie. Middle to Late Bronze Age eastern steppe). Below is the abstract and a couple of key quotes from the paper and its supp info PDF. Emphasis is mine:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gathers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. 
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.


Third, between 3100-2200 BCE we observe an outlier at the BMAC site of Gonur, as well as two outliers from the eastern Iranian site of Shahr-i-Sokhta, all with an ancestry profile similar to 41 ancient individuals from northern Pakistan who lived approximately a millennium later in the isolated Swat region of the northern Indus Valley (1200-800 BCE). These individuals had between 14-42% of their ancestry related to the AASI and the rest related to early Iranian agriculturalists and West_Siberian_HG. Like contemporary and earlier samples from Iran/Turan we find no evidence of Steppe-pastoralist-related ancestry in these samples. In contrast to all other Iran/Turan samples, we find that these individuals also had negligible Anatolian agriculturalist-related admixture, suggesting that they might be migrants from a population further east along the cline of decreasing Anatolian agriculturalist ancestry. While we do not have access to any DNA directly sampled from the Indus Valley Civilization (IVC), based on (a) archaeological evidence of material culture exchange between the IVC and both BMAC to its north and Shahr-i-Sokhta to its east (27), (b) the similarity of these outlier individuals to post-IVC Swat Valley individuals described in the next section (27), (c) the presence of substantial AASI admixture in these samples suggesting that they are migrants from South Asia, and (d) the fact that these individuals fit as ancestral populations for present-day Indian groups in qpAdm modeling, we hypothesize that these outliers were recent migrants from the IVC. Without ancient DNA from individuals buried in IVC cultural contexts, we cannot rule out the possibility that the group represented by these outlier individuals, which we call Indus_Periphery, was limited to the northern fringe and not representative of the ancestry of the entire Indus Valley Civilization population. 
In fact, it was certainly the case that the peoples of the Indus Valley were genetically heterogeneous as we observe one of the Indus_Periphery individuals having ~42% AASI ancestry and the other two individuals having ~14-18% AASI ancestry (but always mixes of the same two proximal sources of AASI and Iranian agriculturalist-related ancestry). Nevertheless, these results show that Indus_Periphery were part of an important ancestry cline in the wider Indus region in the 3rd millennium and early 2nd millennium BCE. As we show in what follows, peoples related to this group had a pivotal role in the formation of subsequent populations in South Asia.


These results—leveraging our rich data from ancient samples closer in time to the Bronze Age—show that the group(s) that contributed Iranian agriculturalist-related ancestry to South Asia shared more genetic drift with the Iranian agriculturalist-related groups in our dataset that are temporally and geographically closest, compared to Caucasus HGs (CHG) or early Zagros related agriculturalists previously shown to be related to source populations for South Asians (11, 81). We are not only able to exclude these early farming and hunter-gathering groups, but also Copper and Bronze Age groups in western Iran (Seh_Gabi_C and Hajji_Firuz_C), and even in eastern Iran and Turan (Tepe_Hissar_C, Gioksiur_EN, and BMAC). Our detailed analyses in Text S3 indicate that what is driving the failure of these models is an excess of Anatolian agriculturalist-related ancestry in all of these groups, suggesting that the Iranian agriculturalist-related population that mixed into South Asia had less Anatolian agriculturalist-related ancestry than all of these. However, we find that mixtures using the Indus_Periphery sample (a pool of three outlier individuals from the BMAC site of Gonur and from Shahr-i-Sokhta), provides an excellent source population for the Iranian agriculturalist-related ancestry in South Asia when combined with any individuals in the Steppe_MLBA cluster (Srubnaya, Sintashta_MLBA, Steppe_MLBA_West or Steppe_MLBA_East).

Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi:

Update 12/04/2018: The dataset from the preprint has been made available early at the Reich Lab website here. I've already started analyzing it. You can see the results in the new threads here and here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Central Asia as the PIE urheimat? Forget it

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Sunday, March 25, 2018

Central Asia as the PIE urheimat? Forget it

Right or wrong, the main contenders for the title of the Proto-Indo-European (PIE) homeland, or urheimat, are Eastern Europe, Anatolia and Transcaucasia, in that order. Central Asia is, at best, one of the also-rans in this tussle, much like India and the Arctic Circle.

However, if you've been following the discussions on the topic in the comments at this blog over the last couple of years, you might be excused for thinking that Central Asia was in fact a natural choice for the PIE homeland, and thanks to new insights from ancient DNA, on the cusp of being proven to be the only choice.

Well, it's already been a very busy year for insights from ancient DNA, including in regards to Central Asia.

For instance, back in February a paper in Science by Gaunitz et al. revealed that the Botai people of Eneolithic Central Asia kept a breed of horse that was ancestral to the Przewalski's horse (see here). This is potentially a crucial fact in the PIE homeland debate, because the horse is the most important animal in early Indo-European religion. However, the Przewalski's horse is a significantly different clade of horse from the modern-day domestic horse. Hence, even if the Botai people were the first humans to domesticate the horse, then so what, because they didn't domesticate the right type of horse.

It remains to be seen who domesticated the right type of horse, and apparently there's at least one major ancient DNA paper on the way that will try to solve this problem. But we already know that the Middle Bronze Age Sintashta people, who lived in the southern Urals, just east of the current border between Europe and Asia, but were the descendants of Eastern European migrants to the region, did keep the right type of horse, one that was also phylogenetically somewhat more basal, and thus ancestral, to most modern-day horse breeds.

Interestingly, by far the most basal horse genome within the domestic horse clade is Duk2, from an Early Bronze Age archaeological site near the city of Dunaujvaros in Hungary. But it's not certain who this horse belonged to exactly or where it really came from, because the site in question was probably a major trading post, where livestock and crops were exchanged for bronze articles. In other words, Duk2 may have been imported from somewhere nearby or afar. My bet is that it came from the Pontic-Caspian steppe. Let's wait and see.

Moreover, earlier this week the New York Times ran a feature on the work that David Reich and his colleagues at Broad MIT/Harvard are doing with ancient DNA. The article included an image of Reich standing in front of a whiteboard, and this whiteboard just happened to have on it a migration and mixture model based on ancient human DNA for Central Asia focusing on the period 2200-1500 BCE (scroll down the page here).

I've already analyzed this model in as much detail as I could in an earlier blog entry (see here). However, in the context of this blog entry, it's important to note that the model clearly shows major population movements from Europe and West Asia into Central Asia, rather than the other way around (ie. all of the really big arrows are pointing east). The paper with the final version of this model is apparently coming soon, and after it does come, we'll probably be having our last ever discussion here about Central Asia as a potential PIE homeland. I can't wait.

Update 01/04/2018: The preprint of the paper on ancient Central Asia that I mentioned above is now available at bioRxiv. See here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, March 22, 2018

Siberian ancestry and Y-haplogroup N1c spread across Northern Europe rather late in prehistory (Lamnidis et al. 2018 preprint)

A claim often made in popular culture is that the Saami people of Fennoscandia and Northern Russia are the last indigenous Europeans. I saw some guy blurt this out on a random cooking show the other day. But it's been obvious for a while now, thanks to analyses of modern-day DNA, that the Saami, and indeed almost all other Uralic-speaking groups in Europe, have a somewhat more complex population history than the majority of non-Uralic-speaking Europeans.

Now, ancient DNA is helping to cement these findings. The quotes and figure below are from a new preprint at bioRxiv by Lamnidis et al. [LINK] focusing on the spread of Siberian ancestry across Northeastern Europe from the late stone age onwards. It's a phenomenon that had the biggest impact on the Uralic-speaking populations of Fennoscandia, and is, in all likelihood, related in a profound, albeit complex, way to the ethnogenesis and expansion of the proto-Uralic people. Emphasis is mine:

The six ancient individuals from Bolshoy show substantially higher proportions of the Siberian component, which comprises about half of their ancestry (49.4-65.3%), whereas the older Mesolithic individuals from Motala do not share this Siberian ancestry. The Siberian ancestry seen in EHG probably corresponds to a previously reported affinity towards Ancient North Eurasians (ANE) [2,24], which also comprises part of the ancestry of Nganasans. Interestingly, results from uniparentally-inherited markers (mtDNA and Y chromosome) as well as certain phenotypic SNPs also show Siberian signals in Bolshoy: mtDNA haplogroups Z1, C4 and D4, common in modern Siberia [18,25,26], in individuals BOO002, BOO004 and BOO006, respectively (confirming previous findings [18]), as well as Y-chromosomal haplotype N1c1a1a (N-L392) in individuals BOO002 and BOO004. Haplogroup N1c, to which this haplotype belongs, is the major Y chromosomal lineage in modern North-East Europe and European Russia, especially in Uralic speakers, for example comprising as much as 54% of Eastern Finnish male lineages today [27]. Notably, this is the earliest known occurrence of Y-haplogroup N1c in Fennoscandia.


We formally tested for admixture in north-eastern Europe by calculating f3(Test; Siberian source, European source) using Uralic-speaking populations - Estonians, Saami, Finnish, Mordovians and Hungarians - and Russians as Test populations. Significantly negative f3 values correspond to the Test population being admixed between populations related to the two source populations [34]. Additionally, the magnitude of the statistic is directly related to the ancestry composition of the tested source populations and how closely those ancestries are related to the actual source populations. We used multiple European and Siberian sources, to capture differences in ancestral composition among proxy populations. As proxies for the Siberian source we used Bolshoy, Mansi and Nganasan, and for the European source modern Icelandic, Norwegian, Lithuanian and French. Our results show that all of the test populations are indeed admixed, with the most negative values arising when Nganasan are used as the Siberian source (Supplementary Table 3).


Consistent with f3-statistics above, all the ancient individuals and modern Finns, Saami, Mordovians and Russians show excess allele sharing with Nganasan when used as Test populations. Of all Uralic speakers in Europe, Hungarians are the only population that shows no evidence of excess allele sharing with Nganasan, consistent with their distinct population history as evidenced by historical sources (see ref 35 and references therein).


While the Siberian genetic component described here was previously described in modern-day populations from the region [1,3,9,10], we gain further insights into its temporal depth. Our data suggest that this fourth genetic component found in modern-day north-eastern Europeans arrived in the area around 4,000 years ago at the latest, as illustrated by ALDER dating using the ancient genome-wide data from Bolshoy Oleni Ostrov. The upper bound for the introduction of this component is harder to estimate. The component is absent in the Karelian hunter-gatherers (EHG) [3] dated to 8,300-7,200 yBP as well as Mesolithic and Neolithic populations from the Baltics from 8,300 yBP and 7,100-5,000 yBP respectively [8]. While this suggests an upper bound of 5,000 yBP for the arrival of Siberian ancestry, we cannot exclude the possibility of its presence even earlier, yet restricted to more northern regions, as suggested by its absence in populations in the Baltic during the Bronze Age.


The large Siberian component in the Bolshoy individuals from the Kola Peninsula provides the earliest direct genetic evidence for an eastern migration into this region. Such contact is well documented in archaeology, with the introduction of asbestos-mixed Lovozero ceramics during the second millenium BC [47], and the spread of even-based arrowheads in Lapland from 1,900 BCE [48,49]. Additionally, the nearest counterparts of Vardøy ceramics, appearing in the area around 1,600-1,300 BCE, can be found on the Taymyr peninsula, much further to the east [48,49]. Finally, the Imiyakhtakhskaya culture from Yakutia spread to the Kola Peninsula during the same period [18,50]. Contacts between Siberia and Europe are also recognised in linguistics. The fact that the Siberian genetic component is consistently shared among Uralic-speaking populations, with the exceptions of Hungarians and the non-Uralic speaking Russians, would make it tempting to equate this component with the spread of Uralic languages in the area. However, such a model may be overly simplistic. First, the presence of the Siberian component on the Kola Peninsula at ca. 4000 yBP predates most linguistic estimates of the spread of Uralic languages to the area [51]. Second, as shown in our analyses, the admixture patterns found in historic and modern Uralic speakers are complex and in fact inconsistent with a single admixture event. Therefore, even if the Siberian genetic component partly spread alongside Uralic languages, it likely presented only an addition to populations carrying this component from earlier.
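As an aside, the logic of the f3 admixture test quoted above is easy to show with a toy example. The sketch below uses the naive estimator, the average over SNPs of (test - A) * (test - B) on allele frequencies, and leaves out the sample-size correction and the block jackknife standard errors that a real analysis would need. The frequencies are hypothetical and there only to show why a mixed population produces a negative value.

```python
def f3_naive(test, source_a, source_b):
    """Naive f3(Test; A, B): mean of (t - a) * (t - b) across SNPs.

    Inputs are per-SNP allele frequencies in the same SNP order. This
    omits the finite-sample correction applied in practice.
    """
    if not test or not (len(test) == len(source_a) == len(source_b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    products = [(t - a) * (t - b)
                for t, a, b in zip(test, source_a, source_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs, for illustration only.
siberian_like = [0.1, 0.8, 0.5, 0.3]
european_like = [0.7, 0.2, 0.5, 0.9]

# A population built as an even mix of the two sources.
mixed = [0.5 * s + 0.5 * e for s, e in zip(siberian_like, european_like)]

print("f3(mixed; A, B) =", f3_naive(mixed, siberian_like, european_like))
print("f3(A; mixed, B) =", f3_naive(siberian_like, mixed, european_like))
```

At every SNP the mixed population's frequency falls between the two sources, so each product is zero or negative and the average comes out below zero. Swapping an unmixed source into the test slot gives a positive value instead, which is why only a significantly negative f3 is taken as evidence of admixture.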

This generally looks like a very solid preprint, so I don't expect any major changes between now and formal publication. I have to be honest though, the qpAdm analysis looks like crap. Also, the authors are using the Russian sample set from the Human Origins dataset, which comes from the Kargopol district in Northern Russia. This was actually a Uralic-speaking region until not long ago. No wonder then, that they're inferring that Russians are very similar to Uralic-speaking populations.

But I know from my own analyses that there's quite a bit of genetic substructure within European Russia. For instance, Russians from southwest of Moscow are much less Uralic-like than the Kargopol Russians, and indeed very difficult to distinguish from other East Slavs, and even West Slavs. Hence, it might be useful to sample and run a couple more regional ethnic Russian groups for comparison. This might help to strengthen the argument that Siberian ancestry is somehow intimately intertwined with the expansion of Uralic languages in Europe.


Lamnidis et al., Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe, bioRxiv, Posted March 22, 2018, doi:

The whiteboard

David Reich's book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, is coming out next Tuesday (see here). Chapter 6 has the potentially controversial title The Collision that Formed India, and indeed I know for a fact that Bronze Age steppe pastoralists, who seem to induce panic attacks amongst a lot of people, and especially Out-of-India proponents, get a big hat tip in this chapter.

But I can't really say more than that until after the book launch. So in the meantime, let's focus on this intriguing photo of a messy whiteboard that was published in the New York Times this week along with a feature on the Reich Lab's work with ancient DNA. The version below was edited by me to highlight and fill in a few details. The original can be viewed by scrolling down here.

Clearly, this is a mixture and migration model for Central Asia and surrounds covering the crucial period 2200-1500 BCE, when, according to a consensus amongst historical linguists, waves of Indo-European speakers moved into the region from the steppes. It's probably from a jam session about an upcoming ancient DNA paper. Here's my interpretation of the model:

- nodes 1, 2, 3 and 4 track the migration of Bronze Age pastoralists from the Pontic-Caspian steppe deep into Central Asia, while nodes B and C follow the expansion of Neolithic farmers from east of Anatolia (probably from somewhere in present-day Iran) into Central and South Asia (nodes 1 and B aren't actually visible in the original pic, but must be there, and more or less where I marked them)

- node 2 probably represents the formation of late Corded Ware Culture (CWC) populations across Northern Europe around 2900 BCE, via the mixture of Yamnaya or Yamnaya-related steppe pastoralists (node 1) with European farmers, who were themselves a mixture of Anatolian farmers and Western European Hunter-Gatherers (WHG)

- Sintashta and Andronovo_NW at node 3 derive directly from the mixture event at node 2, so either they're offshoots of late CWC or a closely related population

- intriguingly, and perhaps crucially, nodes 2 and 3 only take one pulse of admixture from node 1 (red X), while the branch leading to Andronovo_SE at node 4 takes two such pulses, with one apparently later than 1900 BCE, possibly suggesting that Andronovo_SE was more Yamnaya-like compared to late CWC, Sintashta and Andronovo_NW

- moreover, the branch leading to Andronovo_SE absorbs significant admixture from Western Siberian Hunter-Gatherers (West_Siberian_HG) and possibly a Central Asian ghost population, no doubt resulting in a further reduction of Anatolian farmer and WHG ancestry ratios in Andronovo_SE compared to Sintashta and Andronovo_NW

- thus, Andronovo_SE, unlike Sintashta, might fit the bill statistically as Yamnaya-like enough to be the Yamnaya-related steppe pastoralists who "crashed" into India during the Bronze Age (see here), although, admittedly, this isn't actually shown on the whiteboard

- on the other hand, if, perhaps, the model includes a migration edge from node 1 to B, then this would suggest that Yamnaya-related ancestry arrived in South Asia with a very different population than Andronovo_SE, and possibly much earlier than 1500 BCE, but we don't know because David Reich is (strategically?) blocking that part of the whiteboard.

Also worth noting is that there's actually nothing about India in the model. The most proximate region that gets a mention is "Turan/Northern South Asia". So should we be concerned that the supposedly imminent publication of ancient DNA from Rakhigarhi and other Indian prehistoric sites has been pushed back indefinitely, perhaps for political reasons? Normally I'd say no, but in recent weeks I've been hearing rumors that this is indeed the case.

Update 01/04/2018: The preprint of the paper on ancient Central Asia that I mentioned above is now available at bioRxiv. See here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, March 20, 2018

The Iberomaurusians

I can honestly say that I've suddenly become a more open-minded individual after running the five Iberomaurusian samples from M. van de Loosdrecht et al. 2018 in my Global25 Principal Component Analysis (PCA).

They're certainly a curious bunch. In many pairs of the 25 PCs, they sit alone, in parts of the plots that I never expected to see populated. Interestingly though, modern-day North Africans often "pull" towards them, suggesting moderate to strong genetic continuity in North Africa since the Pleistocene. The PAST datasheet used to produce the plots below is available here.

To analyze this in more detail, I ran a series of nMonte mixture models for seven North African populations using Global25 scaled data. The models show the Iberomaurusians as one of the two best reference options for all of these North African groups except the Egyptians, which, at the very least, is an outcome that fits nicely with geography.

[1] distance%=2.5772 / distance=0.025772


Levant_BA 30.9
Iberomaurusian 24.1
Iberia_EN 17.9
Iberia_BA 14.45
Yoruba 11.85
Ethiopia_4500BP 0.8
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=2.7927 / distance=0.027927


Levant_BA 73
Iberia_BA 7.7
Ethiopia_4500BP 7.55
Yoruba 5.3
Iberomaurusian 4.45
Iberia_EN 2
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=1.6931 / distance=0.016931


Levant_BA 56.8
Iberomaurusian 11.75
Iberia_BA 10.05
Yoruba 8.55
Natufian 6.55
Ethiopia_4500BP 3.4
Levant_N 2.9
Iberia_ChL 0
Iberia_EN 0
Iberia_MN 0
Iberia_Southwest_CA 0


[1] distance%=1.7158 / distance=0.017158


Levant_BA 35.3
Iberomaurusian 25.85
Yoruba 14.6
Iberia_EN 13.35
Iberia_BA 10.9
Ethiopia_4500BP 0
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0

[1] distance%=2.4367 / distance=0.024367


Iberomaurusian 29.6
Levant_BA 25.9
Iberia_EN 21.7
Iberia_BA 11.55
Yoruba 11.25
Ethiopia_4500BP 0
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=2.3656 / distance=0.023656


Iberomaurusian 36.5
Levant_BA 17.15
Levant_N 13.7
Iberia_EN 12.85
Iberia_BA 9.95
Yoruba 9.55
Ethiopia_4500BP 0.3
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Natufian 0


[1] distance%=2.0838 / distance=0.020838


Levant_BA 41.85
Iberomaurusian 20.85
Iberia_BA 13.9
Iberia_EN 11.45
Yoruba 9.4
Ethiopia_4500BP 2.55
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0
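For anyone wondering what the distance figure in these outputs measures, here's a minimal sketch in Python. It is not nMonte itself and doesn't claim to reproduce its algorithm: it uses a crude random search, toy 2D coordinates instead of the 25 scaled Global25 dimensions, and hypothetical source names (Source_A, Source_B, Source_C). The idea is simply that the target is compared to a weighted blend of the reference coordinates, and the weights that bring the blend closest to the target are reported as ancestry percentages.

```python
import math
import random


def blend(sources, weights):
    """Weighted average of the source coordinates (weights sum to 1)."""
    dims = len(next(iter(sources.values())))
    return [
        sum(weights[name] * coords[d] for name, coords in sources.items())
        for d in range(dims)
    ]


def fit_mixture(target, sources, slots=200, steps=20000, seed=1):
    """Crude random search: move one slot (1/slots of ancestry) at a time
    between sources and keep the move only if the distance shrinks."""
    rng = random.Random(seed)
    names = list(sources)
    counts = {name: 0 for name in names}
    for i in range(slots):  # start from a roughly even split
        counts[names[i % len(names)]] += 1

    def current_distance():
        weights = {name: counts[name] / slots for name in names}
        return math.dist(target, blend(sources, weights))

    best = current_distance()
    for _ in range(steps):
        giver, taker = rng.sample(names, 2)
        if counts[giver] == 0:
            continue
        counts[giver] -= 1
        counts[taker] += 1
        trial = current_distance()
        if trial < best:
            best = trial
        else:  # undo a move that did not help
            counts[giver] += 1
            counts[taker] -= 1
    percentages = {name: 100 * counts[name] / slots for name in names}
    return percentages, best


# Toy 2D coordinates with hypothetical labels, not real Global25 data.
toy_sources = {
    "Source_A": (0.0, 0.0),
    "Source_B": (1.0, 0.0),
    "Source_C": (0.0, 1.0),
}
toy_target = (0.3, 0.0)  # built as 70% Source_A + 30% Source_B

percentages, dist = fit_mixture(toy_target, toy_sources)
print(f"distance%={100 * dist:.4f} / distance={dist:.6f}")
for name, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(name, pct)
```

The first printed line mimics the header of the outputs above, where distance% is just the distance multiplied by 100. A low distance means the blend lands close to the target; a high one means that no combination of the chosen references gets near it.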

Using the same methods, I also basically reproduced the ancestry proportions from the main mixture model for the Iberomaurusians in M. van de Loosdrecht et al. (~60/40% Natufian-like/Sub-Saharan African-related). But clearly, the very poor statistical fits suggest that, much like for the model in the paper, something is way off.

[1] distance%=25.4991 / distance=0.254991


Natufian 55.85
Tanzania_Luxmanda_3000BP 21.5
Ethiopia_4500BP 21
Tianyuan 1.65
ElMiron 0
GoyetQ116-1 0
Levant_N 0
Malawi_Hora_Holocene 0
South_Africa_2000BP 0
Ust_Ishim 0
Vestonice16 0


[1] distance%=24.6253 / distance=0.246253


Natufian 65.45
Dinka 22.9
Yoruba 9.45
Tianyuan 2.2
ElMiron 0
Ethiopia_4500BP 0
GoyetQ116-1 0
Levant_N 0
Malawi_Hora_Holocene 0
South_Africa_2000BP 0
Tanzania_Luxmanda_3000BP 0
Ust_Ishim 0
Vestonice16 0
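To make the comparison with the ~60/40% split easier to follow, the non-zero references in the two outputs above can be pooled into broad groups. The short Python sketch below does just that. The percentages are copied from the outputs, but the assignment of each reference to a group is an assumption made here for illustration, based on where the samples come from.

```python
# Which broad group each reference belongs to is an assumption made for
# illustration (based on sample geography), not something taken from the paper.
GROUPS = {
    "Natufian": "Natufian-like",
    "Tanzania_Luxmanda_3000BP": "Sub-Saharan African-related",
    "Ethiopia_4500BP": "Sub-Saharan African-related",
    "Dinka": "Sub-Saharan African-related",
    "Yoruba": "Sub-Saharan African-related",
    "Tianyuan": "other",
}

# Non-zero percentages copied from the two model outputs.
model_ancient_refs = {
    "Natufian": 55.85,
    "Tanzania_Luxmanda_3000BP": 21.5,
    "Ethiopia_4500BP": 21.0,
    "Tianyuan": 1.65,
}
model_modern_refs = {
    "Natufian": 65.45,
    "Dinka": 22.9,
    "Yoruba": 9.45,
    "Tianyuan": 2.2,
}


def pool(model):
    """Sum the percentages of all references that fall in the same group."""
    totals = {}
    for reference, pct in model.items():
        group = GROUPS[reference]
        totals[group] = totals.get(group, 0.0) + pct
    return {group: round(total, 2) for group, total in totals.items()}


print(pool(model_ancient_refs))
print(pool(model_modern_refs))
```

Pooled this way, the two models come out at roughly 56/42.5 and 65/32 Natufian-like to Sub-Saharan African-related, which brackets the ~60/40 split, while the distances stay far too high for either model to be taken at face value.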

The updated Global25 datasheets are available at the links below. Here's a challenge for the people in the comments: try to come up with a coherent, chronologically sound, mixture model for the Iberomaurusians that shows a distance of less than 15%. I don't think that this is doable just yet, and won't be until we have at least a few more ancient forager samples from Africa and the Near East, but let's see what happens anyway.

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)


M. van de Loosdrecht et al., Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations, Science 10.1126/science.aar8380 (2018)

See also...

Unleash the power: Global 25 test drive thread

Sunday, March 18, 2018

Max Planck scientists: on a mission against geography

I was just reading the new Marieke van de Loosdrecht et al. 2018 paper [LINK] about the Pleistocene North African hunter-gatherers, and really enjoying it, until I saw this strange map. Please note that I edited the image for the purpose of review and to highlight an error (red pointer and arrow).

This is either a stupid oversight, or the authors of the paper, mainly from the Max Planck Institute for the Science of Human History, and also the scientists who peer reviewed it, don't know where the steppe is located in Eastern Europe. It's certainly not located anywhere near Karelia, Northern Russia, as the map suggests.

Now, you might say that I'm being nitpicky. Well, I'm not, because I can see an alarming trend emerging. Here's a quote from Aida Andrades Valtueña et al. 2017 [LINK], another paper authored mainly by scientists from the Max Planck Institute for the Science of Human History.

The Baltic Late Neolithic Y. pestis genomes (Gyvakarai1 and KunilaII) were reconstructed from individuals associated with the Corded Ware complex. Along with the Croatian Y. pestis genome (Vucedol complex) these are derived from a common ancestor shared with the Yamnaya-derived RK1001 and Afanasievo-derived RISE509. This supports the notion of the pathogen spreading in the context of the large-scale expansion of steppe peoples from Central Eurasia to Eastern and Central Europe.

Thus, what the authors are claiming is that the Pontic-Caspian steppe, which is where the Yamnaya culture was located, is in Central Eurasia rather than West Eurasia.

Obviously, Eurasia is a landmass made up of two continents: Europe and Asia. Try putting your finger in the middle of a map of Europe and Asia and see whether it lands anywhere near the Pontic-Caspian steppe. It won't, unless you've got the shakes or something, because Central Eurasia is more or less located around the Altai Mountains, between the Kazakh and Mongolian-Manchurian steppes, several thousand miles east of the Pontic-Caspian steppe.

Just another oversight, you might say? I doubt it, because here's a very similar case from Alissa Mittnik et al. 2018 [LINK], yet another paper authored mainly by scientists from the Max Planck Institute for the Science of Human History.

Studies of ancient genomes have shown that those associated with the CWC were closely related to the pastoralists of the Yamnaya Culture from the Pontic-Caspian steppe, introducing a genetic component that was not present in Europe previously [2, 3].

Nope, sorry, that doesn't make any sense whatsoever. Why? Because the Pontic-Caspian steppe is west of the Ural Mountains, therefore it's in Europe. You see, according to current geographic conventions, Eurasia west of the Urals and north of the Caucasus is Europe. Right or wrong, as things stand, that's just how it is. And if you happen to be a Max Planck scientist and adamant that I'm wrong, then Google it. I dare you to.

If anyone's still confused, then here's a simple guide, in point form, with a very basic, hopefully easy to grasp map:

- the Eurasian steppe is not a continent nor a country, but a geographical and topographical feature, and, indeed, it's called the Eurasian steppe because it's located on two continents known separately as Europe and Asia, and together as Eurasia

- the western part of the Eurasian steppe is called the Pontic-Caspian steppe, and it's firmly located in Eastern Europe

- the central part of the Eurasian steppe is called the Kazakh steppe, and it's located in Western and Central Asia, while the eastern part of the Eurasian steppe is called the Mongolian-Manchurian steppe, and it's located in East Central Asia

- the Yamnaya culture or horizon was entirely located within the Pontic-Caspian steppe, and therefore in Europe, and more precisely, in Eastern Europe.

See also...

Matters of geography

The Iberomaurusians

Tuesday, March 13, 2018

First real foray into Migration Period Europe: the Gepid, Roman, Ostrogoth and others...

This is going to be our first meaningful look at the all-important Migration Period, thanks to the recently published Veeramah et al. 2018 paper and accompanying dataset (see here). The Migration Period is generally regarded to have been the time when present-day Europe first began to take shape, in a rather sudden and violent way, with, you guessed it, a lot of migrations taking place.

Here's where most of the ancients from Veeramah et al. 2018 cluster in my Principal Component Analysis (PCA) of ancient West Eurasian genetic variation. Those East Germanics (the Gepid and Ostrogoth) are certainly very eastern, and indeed more exotic than I would've ever expected them to be. But I do love surprises like this. The relevant datasheet is available here.

Obviously, as per the paper, the ACD in about half of the labels stands for Artificial Cranial Deformation. I've also updated my Global25 datasheets with many of the same ancients. You can use these datasheets to plot them on 2D or 3D "genetic maps", and model their ancestry proportions. Feel free to share your findings in the comments below.

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)
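As a rough guide to working with datasheets like these, here's a minimal Python sketch that reads per-individual rows and computes population averages. It assumes a plain comma-separated layout in which the first field is a "Population:SampleID" label and the remaining fields are coordinates. That layout, the labels and the three-column toy data are all assumptions made for illustration, so check the real files before relying on it.

```python
import csv
import io
from collections import defaultdict

# Toy rows with hypothetical labels and only three coordinates per sample.
SAMPLE = """\
Pop_A:ind1,0.10,0.20,-0.05
Pop_A:ind2,0.12,0.18,-0.03
Pop_B:ind3,-0.30,0.05,0.40
"""


def population_averages(text):
    """Average the coordinates of all individuals sharing a population label."""
    sums = {}
    counts = defaultdict(int)
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or label-only lines
        try:
            coords = [float(value) for value in row[1:]]
        except ValueError:
            continue  # skip a header row, if the file has one
        population = row[0].split(":", 1)[0]
        if population not in sums:
            sums[population] = [0.0] * len(coords)
        sums[population] = [s + c for s, c in zip(sums[population], coords)]
        counts[population] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


for population, average in population_averages(SAMPLE).items():
    print(population, [round(value, 4) for value in average])
```

Any two or three of the averaged columns can then be handed to a plotting tool to draw a 2D or 3D "genetic map", and the full rows can be used as target and reference coordinates for mixture modeling.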

Here are a few of my own models for some of the more interesting of these individuals, using nMonte3 and based mainly on Iron Age (IA) reference samples. I used the same data file for all of the models; it includes scaled coordinates and is available for download here.

[1] distance%=3.7819



[1] distance%=3.6339




[1] distance%=2.5535




[1] distance%=2.9444



The Gepid and Ostrogoth show significant Scythian- and Armenian-related ancestry proportions, respectively. Should that be taken literally? Or do we have to wait for, say, Avar and Hunnic genomes to expect more realistic models?

Update 15/03/2018: This is where many of the Medieval German samples cluster in my PCA of modern-day Northern European genetic variation (see here). Obviously, I could only run the individuals with wholly or overwhelmingly North European genomes, and most of these turned out to be the males without any signs of ACD. They look very West Germanic. The relevant datasheet is available here.

See also...

Modeling genetic ancestry with Davidski: step by step

Monday, March 12, 2018

Exotic female migrants in Early Medieval Bavaria (Veeramah et al. 2018)

PNAS has a new open access paper on the genomics of Early Medieval Bavarians, with a special focus on women with artificial skull deformation [LINK]. The data also include two very interesting Medieval samples from Crimea and Serbia, associated with the East Germanic Ostrogoths and Gepids, respectively. Both show significant Asian admixture. I'll try to get my hands on the dataset ASAP. Here's the abstract and a couple of quotes from the paper. Emphasis is mine:

Modern European genetic structure demonstrates strong correlations with geography, while genetic analysis of prehistoric humans has indicated at least two major waves of immigration from outside the continent during periods of cultural change. However, population-level genome data that could shed light on the demographic processes occurring during the intervening periods have been absent. Therefore, we generated genomic data from 41 individuals dating mostly to the late 5th/early 6th century AD from present-day Bavaria in southern Germany, including 11 whole genomes (mean depth 5.56×). In addition we developed a capture array to sequence neutral regions spanning a total of 5 Mb and 486 functional polymorphic sites to high depth (mean 72×) in all individuals. Our data indicate that while men generally had ancestry that closely resembles modern northern and central Europeans, women exhibit a very high genetic heterogeneity; this includes signals of genetic ancestry ranging from western Europe to East Asia. Particularly striking are women with artificial skull deformations; the analysis of their collective genetic ancestry suggests an origin in southeastern Europe. In addition, functional variants indicate that they also differed in visible characteristics. This example of female-biased migration indicates that complex demographic processes during the Early Medieval period may have contributed in an unexpected way to shape the modern European genetic landscape. Examination of the panel of functional loci also revealed that many alleles associated with recent positive selection were already at modern-like frequencies in European populations ∼1,500 years ago.


A much more diverse ancestry was observed among the females with elongated skulls, as demonstrated by a significantly greater group-based FIS (SI Appendix, Fig. S35). All these females had varying amounts of genetic ancestry found today predominantly in southern European countries [as seen by the varying amounts of ancestry inferred by model-based clustering that is representative of a sample from modern Tuscany, Italy (TSI), Fig. 3], and while the majority of samples were found to be closest to modern southeastern Europeans (Bulgaria and Romania, Fig. 4C), at least one individual, AED_1108, appeared to possess ∼20% East Asian ancestry (Fig. 3), which was also evident from the high number of haplotypes within the 5-Mb neutralome that were private to modern East Asian 1000 Genomes individuals (EAS), while also demonstrating an overall ancestry profile consistent with Central Asian populations (SI Appendix, Fig. S33). No modern European individual from the Simons Genome Diversity Panel (SGDP) (11) showed any evidence of significant East Asian ancestry except one Hungarian individual with less than 5%. A higher amount of East Asian ancestry was inferred for AED_1108 than all modern Caucasus and Middle Eastern individuals, and 28 of 33 South Asian individuals.


A diverse ancestry was also inferred for the two non-Bavarian samples with elongated heads. KER_1 from Ukraine possessed significant southern European ancestry as well as South Asian ancestry, with an overall profile that best matched modern Turkish individuals. The Gepid VIM_2 from Serbia demonstrated a similar Central Asian-like genetic profile to the Medieval Bavarian AED_1108 with an even larger East Asian component and number of private haplotypes but with less southern European/Middle Eastern ancestry (SI Appendix, Figs. S31 and S33).

Veeramah et al., Population genomic analysis of elongated skulls reveals extensive female-biased immigration in Early Medieval Bavaria, PNAS 2018; published ahead of print March 12, 2018,

See also...

First real foray into Migration Period Europe: the Gepid, Roman, Ostrogoth and others...

Saturday, March 10, 2018

Was Ukraine_Eneolithic I6561 a Proto-Indo-European?

It's certainly a valid question, simply because the remains of this individual (sampled by Mathieson et al. 2018, see here) are from a cemetery of the Sredny Stog culture, which, based on historical linguistics and archaeological data, has already been posited to have been a Proto-Indo-European (PIE) culture that gave rise to the supposedly Late Proto-Indo-European (LPIE) Yamnaya culture, which in turn swept into Central Europe from the Pontic-Caspian steppe during the 3rd millennium BC. Moreover, consider the following points:

- whatever you might say about calling Y-Chromosome haplogroups "Proto-Indo-European", the fact is that Ukraine_Eneolithic I6561 is the oldest recorded individual belonging to Y-haplogroup R1a-M417, which is not a marker that can be reasonably linked to human expansions dating to the Paleolithic or even Neolithic, and yet today it peaks in frequency in modern-day Indo-European-speaking Eastern and Northern Europeans and South Asians, and is also recorded as the main Y-haplogroup amongst the ancient Scythians, who also were, in all likelihood, Indo-European-speakers, which strongly suggests that it was initially spread far and wide across Eurasia by the early Indo-Europeans

- following on from the last point, R1a-M417 can be divided into three main subclades: R1a-L664, R1a-Z93 and R1a-Z282, the first of which is almost exclusively confined to Northwestern Europe, while the latter two peak in frequency in South Central Asia and Eastern Europe, respectively, and the really interesting and important thing is that R1a-Z93 and R1a-Z282 are more closely related to each other than either is to R1a-L664, which mirrors the relatively close linguistic relationship between Balto-Slavs, who are rich in R1a-Z282, and Indo-Aryans, who are rich in R1a-Z93 (for instance, see here), and renders any arguments in this case based on isolation-by-distance practically useless

- Ukraine_Eneolithic I6561 is the oldest sample with UDG-treated genome-wide data to carry the 13910*T lactase persistence allele, which reaches its maximum frequency in Northwestern Europe, and is also relatively common amongst Indo-European-speaking South Asians, but not Middle Easterners (see here), suggesting that it spread from the Eastern European steppes both into Northwestern Europe and South Asia along with such ancient steppe markers as R1b-M269 and R1a-M417, and Indo-European speech

- based on historical linguistics data, the Proto-Indo-Europeans are generally regarded to have been foragers turned pastoralists, rather than farmers, but nevertheless, pastoralists familiar with farming, and indeed Ukraine_Eneolithic I6561 appears to be mostly a mixture of Eastern European and Caucasus Hunter-Gatherers (EHG and CHG, respectively), but with around 30% input from early European farmers.

Of course, we'll need many more ancient samples from Ukraine and surrounds to cement these findings, and prove, beyond any reasonable doubt, that the Sredny Stog people were indeed the Proto-Indo-Europeans, and that the Yamnaya people were the Late Proto-Indo-Europeans. It might also be necessary to develop new scientific methods that take into account multidisciplinary data to achieve this.

On a related note, the University of Leiden is currently seeking four historical linguists and one bioarchaeologist to take part in a new project titled The Linguistic Roots of Europe's Agricultural Transition. The principal investigator on the project is Guus Kroonen, whom I mentioned in a couple of recent blog posts (see here, here and here). This is the project objective:

Today, Europe’s linguistic landscape is shaped almost entirely by a single language family: Indo-European. Even by the dawn of history, a patchwork of Indo-European subgroups, Germanic, Celtic, Italic, Baltic, Slavic and Greek, was covering the continent, and over the centuries, these subgroups evolved into the modern European languages, among which Russian, Italian, German, Lithuanian and Swedish, as well as the global lingua francas French, Spanish, and English.

The Indo-Europeanization of Europe was probably one of the most profound linguistic shifts ever to have taken place in the prehistory of Europe. The origin of the European languages, unsurprisingly, is therefore a matter of intense academic debate. There are currently only two prehistoric events that in the present academic debate are considered as likely driving factors behind the spread of Indo-European speech.

One the one hand, there are those historical linguists who by meticulous comparison of the different Indo-European languages have reconstructed a language and culture that is typical of the early Bronze Age. Terminology for horse-riding and wagon technology provides a possible link to the expansion of the Yamnaya culture on the Pontic-Caspian steppes, which was fueled by the invention of the wheel and the domestication of the horse. Others have suggested that the Indo-European languages diffused from Anatolia together with another major prehistoric event, the spread of agriculture to Europe between the 8th and 5th millennium.

The debate has remained unresolved for over two decades, but a new approach produces potentially decisive results. By studying prehistoric loanwords absorbed by the speakers of Indo-European when they entered Europe, and test the resulting cultural implications against the available archaeological record, new light can be shed on the language of Europe’s first farmers, and whether or not they spoke a form of Indo-European.

If you have the necessary passion and qualifications to apply for these positions, then please do so ASAP via these links:

PhD Candidate or Postdoctoral Researcher in the field of linguistics

Postdoctoral Researcher in the field of archaeology (specialization: bioarchaeology)

Friday, March 9, 2018

Ancient genomes from Southeast Asia (McColl et al. 2018 preprint)

Over at bioRxiv at this LINK. I'm still reading and trying to figure out what the 25 ancient genomes from this preprint say about the peopling of Eurasia and, in particular, South Asian population structure, including the so called Ancestral South Indian (ASI) genetic component. Any ideas? Below are the abstract and Figure 4 from the preprint.

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

McColl et al., Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia, bioRxiv, Posted March 8, 2018, doi:

Update 10/03/2018: Harvard and friends strike back with their own preprint on the same topic (LINK). Here's the abstract:

Southeast Asia is home to rich human genetic and linguistic diversity, but the details of past population movements in the region are not well known. Here, we report genome-wide ancient DNA data from thirteen Southeast Asian individuals spanning from the Neolithic period through the Iron Age (4100-1700 years ago). Early agriculturalists from Man Bac in Vietnam possessed a mixture of East Asian (southern Chinese farmer) and deeply diverged eastern Eurasian (hunter-gatherer) ancestry characteristic of Austroasiatic speakers, with similar ancestry as far south as Indonesia providing evidence for an expansive initial spread of Austroasiatic languages. In a striking parallel with Europe, later sites from across the region show closer connections to present-day majority groups, reflecting a second major influx of migrants by the time of the Bronze Age.

Lipson et al., Ancient genomes document multiple waves of migration in Southeast Asian prehistory, bioRxiv, Posted March 10, 2018, doi:

Thursday, March 8, 2018

Beakers vs modern-day Northern Europeans

Here are most of the Beakers from Olalde et al. 2018 in my Principal Component Analysis (PCA) of modern-day Northern European genetic variation. They look rather Celtic or perhaps Celto-Germanic, don't they? The relevant datasheet is available here.

If you're wondering why the Yamnaya and early Baltic Corded Ware individuals are sitting in the middle of the plot, I'd say it's because they don't share enough genetic drift with any specific sub-set of modern-day Northern Europeans to cluster with them. This might also be why the Ukraine Neolithic samples are so dispersed around the middle of the plot. In other words, they're possibly too old to feature in this PCA, unlike the Beakers and Bronze Age descendants of the Baltic Corded Ware people, who are clustering fairly deliberately with their likely closest modern-day relatives.
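The logic behind samples ending up in the middle of a plot like this can be shown with a toy projection in Python. This is only a sketch under loud assumptions: three hypothetical modern groups, each defined by one coordinate of private drift, and two hand-picked axes that separate them, standing in for the components that a real PCA would estimate from the data.

```python
import math

# Toy setup (hypothetical): each coordinate is drift private to one modern group.
modern_groups = {
    "Group_A": (1.0, 0.0, 0.0),
    "Group_B": (0.0, 1.0, 0.0),
    "Group_C": (0.0, 0.0, 1.0),
}

# Two hand-picked orthonormal axes spanning the differences between the groups.
# In a real PCA these would be estimated from the modern samples.
r2 = 1 / math.sqrt(2)
r6 = 1 / math.sqrt(6)
AXES = [(r2, -r2, 0.0), (r6, r6, -2 * r6)]

# Center on the mean of the modern groups, as a PCA would.
CENTER = [
    sum(group[d] for group in modern_groups.values()) / len(modern_groups)
    for d in range(3)
]


def project(sample):
    """Project a sample onto the axes of modern variation."""
    centered = [x - c for x, c in zip(sample, CENTER)]
    return [sum(a * x for a, x in zip(axis, centered)) for axis in AXES]


old_sample = (0.0, 0.0, 0.0)    # predates all of the group-specific drift
later_sample = (0.8, 0.0, 0.0)  # shares most of Group_A's drift

for label, sample in [("old", old_sample), ("later", later_sample),
                      ("Group_A", modern_groups["Group_A"])]:
    print(label, [round(value, 3) for value in project(sample)])
```

In this toy, the old sample differs from the modern average only in a direction that the plot's axes don't capture, so it projects onto the origin, while the later sample lands most of the way towards Group_A.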

See also...

Genetic and linguistic structure across space and time in Northern Europe

Tuesday, March 6, 2018

Main candidates for the precursors of the proto-Greeks in the ancient DNA record to date

Thanks to the recent release of the Mathieson et al. 2018 dataset (see here), I've been able to spot a very interesting northwest to southeast genetic cline running from the oldest Peloponnese Neolithic (Peloponnese_N) individuals to the Bronze Age Anatolians (Anatolia_BA). Here it is, highlighted in my Principal Component Analysis (PCA) of ancient West Eurasian variation. The relevant datasheet is available here.

I don't think it's a stretch to assume that this cline represents, more or less, the genetic diversity that existed in the Aegean region during the early Helladic period, just prior to the incursions of Bronze Age steppe or steppe-derived peoples who, according to the current academic consensus, probably gave rise to the proto-Greeks and Mycenaeans (see here).

There are three main reasons for this: 1) the Peloponnese_N samples show a very deliberate "pull" towards Anatolia_BA, suggesting that the Peloponnese population experienced admixture from a source similar to Anatolia_BA prior to the Bronze Age, 2) the cline cuts right through the middle of an "Old European" cluster made up of Minoans, who lived on Crete and other Aegean islands on the eve of the aforementioned steppe-derived incursions, and 3) both the Mycenaeans and Minoans can be modeled in large part as Anatolia_BA and Peloponnese_N.

The identification of this genetic cline, and what it likely stands for, is important, because it should allow us to plausibly point to the source of foreign input that created the Mycenaeans, and thus the Proto-Greeks. And clearly, the trajectory of the Mycenaean "pull" away from this cline is towards most of the samples marked as "Eneolithic and Bronze Age steppe".

However, this doesn't mean that it's necessary, or even sensible, to look for the precursors of the Proto-Greeks amongst these samples. That's because there might be much more proximate options based on, say, geography, archeology, chronology and mixture modeling. Indeed, using various criteria, I've chosen three individuals who sit along the Mycenaean to Eneolithic/Bronze Age steppe cline in the above PCA and might plausibly represent the precursors of the Proto-Greeks, or close relatives thereof. The first two are from Mathieson et al. 2018 and the third from Olalde et al. 2018.

- if, as most academics posit, the people who were to become the Proto-Greeks came from the Early Bronze Age (EBA) Yamnaya horizon on the Pontic-Caspian steppe, then it's possible that they were similar in terms of genome-wide genetic structure to the only Bulgarian Yamnaya sampled to date: Yamnaya_Bulgaria Bul4

- on the other hand, if, as has also been postulated in academic literature, they derived from the Middle Bronze Age (MBA) chariot warrior groups of the post-Yamnaya Pontic-Caspian steppe, then they may have been similar to Balkans_BA I2163, who is also from Bulgaria, but dated to more than a thousand years later than Bul4, and clusters strongly with the said chariot warriors, such as the Sintashta people, and even belongs to the same Y-haplogroup: R1a-Z93

- but if they came from the Yamnaya horizon via the Carpathian Basin, which, I'm told in the comments here, is also a serious option, although admittedly I've missed it in my reading, then they may have been similar to Proto-Nagyrév individual Hungary_BA I7043, who belongs to Western European-specific Y-haplogroup R1b-L51, a marker fairly common amongst modern-day Greeks.

And here's a mixture model for the Mycenaeans, using the Global25/nMonte method (see here and here), and the above trio as potential reference samples, alongside Anatolia_BA and Peloponnese_N.

[1] distance%=1.9802



Thus, it seems that the precursors of the Proto-Greeks came from Bulgarian Yamnaya. However, they, or the Mycenaeans, may also have had minor ancestry from the chariot warriors of the MBA Pontic-Caspian steppe. Yes, I'm probably reading far too much into these results, but I can't help it, because they appear so logical. Indeed, check this out:

[1] distance%=4.209

Mycenaean:I9033 (elite burial)


If this is just an artifact of the method, then it's a really nice one. But who are your main candidates for the precursors of the Proto-Greeks in the ancient DNA record to date? Feel free to let me know in the comments.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Sunday, March 4, 2018

On the origin of steppe ancestry in Beaker people (work in progress)

One of the major themes in the recent Bell Beaker Behemoth (ie. Olalde et al. 2018) is the presence of Yamnaya- or steppe-related ancestry in most of the Beaker individuals, reaching a whopping 75% in one guy from what is now Hungary. However, as far as I can see, the authors don't go into any specifics about the origin of this admixture. This is about as close as they come. Emphasis is mine:

However, migration had a key role in the further dissemination of the Beaker complex. We document this phenomenon most clearly in Britain, where the spread of the Beaker complex introduced high levels of steppe-related ancestry and was associated with the replacement of approximately 90% of Britain’s gene pool within a few hundred years, continuing the east-to-west expansion that had brought steppe-related ancestry into central and northern Europe over the previous centuries.

During the third millennium bc, two new archaeological pottery styles expanded across Europe and replaced many of the more localized styles that had preceded them [1]. The expansion of the ‘Corded Ware complex’ in north-central and northeastern Europe was associated with people who derived most of their ancestry from populations related to Early Bronze Age Yamnaya pastoralists from the Eurasian steppe [2–4] (henceforth referred to as ‘steppe’).

To be honest, I'm not quite sure what they're saying there. Is it that the steppe ancestry in the Beakers comes from Corded Ware people, one way or another, or that it derives from a later, closely related but separate, population wave from the steppe? Or are they leaving the question wide open for now?

If they are leaving it open, then I'm not surprised. That's because the only way to solve this mystery is to genotype at least a few hundred Eneolithic and Bronze Age skeletons from the Pontic-Caspian steppe in order to pinpoint the shared steppe homeland, or separate steppe homelands, of the Corded Ware and Beaker peoples. No doubt this will happen eventually, but it might take a few years for us to see the results. In the meantime, we can mess around with the data already available to see what it might reveal in regards to this topic.

Of course, I'm well aware that the Y-haplogroup most closely associated with the Corded Ware expansion is R1a, and in particular its R1a-M417 subclade, and that Beaker males with steppe ancestry almost exclusively belong to Y-haplogroup R1b, especially its R1b-P312 subclade. But this means very little for now, because considering the patchy sampling of ancient remains from Eneolithic/Bronze Age Europe, it's still possible that, for instance, these Beakers descend from an as yet unsampled subset of the Corded Ware population rich in R1b.

So for now, as we wait for more ancient data, the pertinent question is: are there any genome-wide genetic signals specific to Corded Ware people that are missing in the Beaker people, and vice versa?

One possible way to catch something like this might be to focus on differences in hunter-gatherer (HG) ancestry. That's because European hunter-gatherers are known to have had low effective population sizes and, as a result, a lot of population-specific genetic drift. I can try to test this idea using the Global25/nMonte method (see here and here) and the following plausible, at least according to me, reference groups and individuals.

Barcin_N (Neolithic farmers from western Anatolia)
Blatterhole_HG (HG-like Middle Neolithic sample from Germany)
Koros_HG (HG-like Early Neolithic sample from Hungary)
Narva_Lithuania (late HGs from the southern Baltic)
Ukraine_Mesolithic (HGs from the North Pontic steppe)
Yamnaya_Samara (Bronze Age herders from the eastern end of the Pontic-Caspian steppe)
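
For readers unfamiliar with this kind of mixture modeling, here's a minimal Python sketch of the general idea. It is not nMonte itself, just a toy random-search fit under stated assumptions: each population is a point in a coordinate space (Global25 uses 25 dimensions, this toy uses 3), and the goal is to find non-negative proportions summing to 1 whose weighted average of the reference points lies as close as possible to the target. The reference names and coordinates below are hypothetical and invented purely for illustration, and reporting the fit as 100 times the Euclidean distance is my assumption about what a "distance%" figure means.

```python
import random

# Hypothetical toy coordinates (NOT real Global25 data); three dimensions
# instead of 25 to keep the example readable.
refs = {
    "farmer_like": [0.10, 0.02, -0.03],
    "steppe_like": [-0.05, 0.08, 0.01],
    "hg_like": [-0.08, -0.06, 0.04],
}


def mix(weights, refs):
    """Weighted average of the reference coordinates."""
    dims = len(next(iter(refs.values())))
    return [sum(weights[n] * refs[n][d] for n in refs) for d in range(dims)]


def distance(a, b):
    """Plain Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit_mixture(target, refs, steps=20000, max_shift=0.05, seed=0):
    """Random-search fit: repeatedly shift a small random share of ancestry
    from one reference to another and keep the change only if the mixture
    moves closer to the target. Weights stay non-negative and sum to 1."""
    rng = random.Random(seed)
    names = list(refs)
    weights = {n: 1.0 / len(names) for n in names}  # start from equal shares
    best = distance(mix(weights, refs), target)
    for _ in range(steps):
        src, dst = rng.sample(names, 2)
        delta = min(max_shift * rng.random(), weights[src])
        if delta == 0.0:
            continue
        trial = dict(weights)
        trial[src] -= delta
        trial[dst] += delta
        d = distance(mix(trial, refs), target)
        if d < best:
            weights, best = trial, d
    return weights, best


# Build a synthetic target with known proportions, then try to recover them.
true_weights = {"farmer_like": 0.5, "steppe_like": 0.4, "hg_like": 0.1}
target = mix(true_weights, refs)

fitted, dist = fit_mixture(target, refs)
print(f"distance%={100 * dist:.4f}")  # assumed convention: 100 x distance
for name, w in fitted.items():
    print(f"{name}: {100 * w:.1f}%")
```

Because the target here is an exact mixture of the references, the fitted proportions should land close to the ones used to build it, with a distance near zero. With real samples the fit is never perfect, which is why a lower distance is usually read as a better model, and why a reference dropping to zero can reflect the method or the choice of references as much as real population history.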

First up, the Corded Ware Culture (CWC) people, grouped into five sub-populations, based on geography and chronology:

[1] distance%=2.7491



[1] distance%=2.815



[1] distance%=1.9983



[1] distance%=2.9738



[1] distance%=3.2783



I'm pretty happy with these results. They make a lot of sense considering everything that we've seen about these samples to date. For instance, CWC_Baltic_early looks like it might have arrived in the Baltic region straight from the North Pontic steppe, which agrees with the scientific literature and my earlier analyses (for instance, see here). Note also the exceptionally high Baltic HG signal in CWC_Baltic, which is missing in CWC_Baltic_early, and was no doubt caused by increasing gene flow from the indigenous Baltic population into the Corded Ware people. Now the Beakers:

[1] distance%=3.0892



[1] distance%=2.3366



[1] distance%=3.0011



Again, these clearly are very solid outcomes. But what do they tell us about the relationship between these Beakers and the Corded Ware people? To be honest, I'm not sure. The Narva_Lithuania signal is missing, which might be important, but then again, it's also missing in CWC_Czech. And now onto the Hungarian Beakers, grouped into three categories:

[1] distance%=1.9191



[1] distance%=4.9659



[1] distance%=2.4992



Check out the imposing level of Narva_Lithuania ancestry in Beaker_Hungary. Admittedly, I wasn't expecting this. Is there a chance that it's real? I honestly don't know, but we've certainly seen similar signals from Northeastern Europe in later Bronze Age samples from Hungary. On the other hand, Beaker_Hungary_outlier is the guy estimated by Olalde et al. to be as much as 75% steppe-derived. Here he gets a very similar figure of 76% of Yamnaya-like ancestry. Very nice! Finally, here are the Southern European Beakers:

[1] distance%=3.818



[1] distance%=5.4342



[1] distance%=2.992



[1] distance%=4.8488



[1] distance%=4.7903



[1] distance%=3.7945



It might be worth noting the lack of Narva_Lithuania and almost complete lack of Ukraine_Mesolithic ancestry in these models. If this is not an artifact of the method, and please note that it very well might be, then it perhaps suggests that the steppe ancestors of the Beakers were basically like Samara Yamnaya, and that the northern and eastern Beakers picked up their Narva_Lithuania and/or Ukraine_Mesolithic-related ancestry by mixing with the descendants of the Corded Ware people.

Or not? At the very least, am I on the right track? How can I improve this analysis? Feel free to let me know in the comments.

Also, I should mention that I had to add a sample from Chalcolithic Anatolia (Anatolia_ChL) to the model for Beaker_Sicily_no_steppe to obtain more plausible ancestry proportions and a better statistical fit. It's intriguing that this type of ancestry is present in this southern Beaker, and missing in all the rest, but we've discussed this issue at length already in an earlier thread (see here).

On a related note, Dutch linguist Guus Kroonen has a new article with his interpretations of the main findings by Olalde et al., freely available at his page at the link below.

Comments to Olalde et al. 2018 on the Bell Beaker phenomenon

It's interesting, I think, that he sees two distinct, and indeed "potentially competing", Indo-European migrations from the steppe, represented by the R1a-rich Corded Ware people and the R1b-P312-rich Beakers.

The identification of two different Y-chromosomal haplogroups deriving from the Steppe/Caucasus area is relevant for the prehistoric formation of the European linguistic landscape. What it implies is that Europe may have been confronted with originally separated networks of different, potentially competing, steppe-derived groups. It is through these cultural networks that Indo-European dialects may have diffused, probably existing alongside now extinct, non-Indo-European languages (cf. Iversen & Kroonen 2017).

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...