
Wednesday, October 31, 2012

First official attempt to divide R1a1 into multiple subclades since the discovery of R-M458

Unfortunately, this paper has already become outdated since being submitted for peer review at the AJPA, largely thanks to work by R1a hobbyists (see here). For instance, the authors claim that the overlap zone between R-Z280 and R-Z93 is Inner and Central Asia. In fact, these two subclades overlap in Europe, which is where most of the basal R-Z93 lineages have been located to date. Hopefully a major paper on R1a is on the way that will clear this up at "scientific" level, because it's a strong hint that R-Z93 might have expanded deep into Asia from Europe.

Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.


Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Pamjav, H., Fehér, T., Németh, E. and Pádár, Z. (2012), Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. Am. J. Phys. Anthropol.. doi: 10.1002/ajpa.22167

Monday, September 3, 2012

Next-generation sequence data suggests a "rapid" and "extreme" expansion of R1b across Europe during the Neolithic

Note the words used in these abstracts, referring to the spread of R1b as "rapid" and "extreme". This is important, because the fact that this was an explosive event probably explains why R1b hasn't yet been found in any ancient DNA samples from Europe until the late Neolithic. In other words, to find it in Europe before or even during its early expansion, we need to test very specific remains, belonging to cultures that facilitated this expansion.

Y-chromosomal insights from large-scale resequencing

Tyler-Smith C, Wei W, Ayub Q, Chen Y, Jostins L, McCarthy S, Hou Y, Carbone I, Durbin R, Xue Y

Next-generation sequencing technology now makes it possible to resequence whole genomes or targeted regions on a population scale, providing extensive sequence data from the Y chromosome. Coverage of the Y chromosome is lower than that of autosomes, and repeated sequences complicate mapping of reads to their correct location, but about 10 Mb of unique Y sequence is accessible to current technologies. We have explored the insights that can be obtained from two such datasets. Complete Genomics have released high coverage sequences of 35 diverse males, which we supplemented by sequencing an additional male belonging to haplogroup A. From these sequences, we identified about 6.6 thousand Y variants, which showed high validation rates. These variants were used to construct a maximum parsimony phylogenetic tree that recapitulated the known phylogeny and distinguished all individuals. Using a measured SNP mutation rate of 1x10^-9 per bp per year, the ages of nodes of interest could be estimated. The TMRCA of the entire tree was ~115 KYA (thousand years ago), and of the lineages outside Africa ~60 KYA, both as expected. Additional insights included a rapid expansion of hg F ~40 KYA, and of R1b in Europe ~5-10 KYA. The archaeological counterpart of the former is unclear, but the latter is likely to represent a Neolithic expansion of this lineage. The second dataset consisted of low-coverage (~2x) sequence of 525 diverse males from the 1000 Genomes Project. About 18.7 thousand Y-SNPs were called, >98% of which validated, but the callset missed ~17% of SNPs because of the low coverage. A maximum likelihood tree was constructed that again recapitulated and refined the known phylogeny and distinguished all individuals. The expansions noted above were also seen, although estimating times was more complex because of the missing variants.
These explorations of large-scale Y resequencing illustrate the power and limitations of current technologies and also the need for the community to develop efficient ways to use such large datasets, including a nomenclature compatible with complete lineage resolution.
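To make the dating logic in this abstract concrete, here is a rough sketch of how a per-base mutation rate turns a mutation count into an age. It uses the two figures quoted above (1x10^-9 per bp per year and about 10 Mb of accessible sequence); the function names and the example mutation counts are my own illustrative assumptions, not values from the abstract, and the authors' actual methods are more involved than this back-of-the-envelope division.

```python
# Back-of-the-envelope dating of a Y-chromosome node.
# Assumptions (not from the abstract): function names, and the idea of
# using the average number of mutations on a single lineage from the
# node down to a present-day tip.

RATE_PER_BP_PER_YEAR = 1e-9   # SNP mutation rate quoted in the abstract
CALLABLE_BP = 10_000_000      # ~10 Mb of unique Y sequence quoted in the abstract


def node_age_years(mutations_per_lineage,
                   rate=RATE_PER_BP_PER_YEAR,
                   callable_bp=CALLABLE_BP):
    """Age of a node given the mean mutations separating it from a tip."""
    mutations_per_year = rate * callable_bp  # about 0.01, i.e. one per century
    return mutations_per_lineage / mutations_per_year


def expected_mutations(age_years,
                       rate=RATE_PER_BP_PER_YEAR,
                       callable_bp=CALLABLE_BP):
    """Inverse of node_age_years: mutations expected to accumulate over age_years."""
    return age_years * rate * callable_bp


if __name__ == "__main__":
    # Illustrative inputs only: these counts are hypothetical.
    print(round(node_age_years(1150)))      # years implied by 1150 mutations
    print(round(expected_mutations(7000)))  # mutations expected over 7,000 years
```

With these figures a lineage picks up roughly one mutation per century across the accessible region, which is why an expansion as recent as 5-10 KYA is resolvable at all: it still corresponds to dozens of mutations per lineage.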

A calibrated human Y-chromosomal phylogeny based on resequencing

Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y, Tyler-Smith C

We have analysed a dataset of 36 complete Y-chromosomal sequences, 35 released by Complete Genomics and an additional sequence from a haplogroup A3b individual, in order to explore how effectively complete sequence data from the Y chromosome can be used to construct and calibrate a phylogeny. We identified unique-sequence regions of the chromosome where we expected variant identification from next-generation sequence data to be reliable, and developed additional filtering steps for the data. Validation rates of the resulting filtered genotype calls were >99%. In total, we identified 5,865 SNPs, 741 indels and 56 MNPs. 4,861 of the variants are new and 262 of them are recurrent even in this small sample. We constructed parsimony-based phylogenetic trees using PHYLIP incorporating all or different subsets of the variants, and estimated times for the entire tree and different clades of interest using GENETREE or the rho measure. The tree structure was consistent with literature data. The GENETREE TMRCA for the complete set of chromosomes examined was 105-125 KYA; times for the out-of-Africa movement were 62-79 KYA, a Paleolithic expansion 37-48 KYA, and the expansion of R1b in Europe 7-10 KYA; rho times were broadly similar. Our study identifies vast numbers of new variants, and explores the methodological steps necessary to obtain reliable biological insights from current next-generation sequence data. It also poses challenges such as how to develop a nomenclature system that can accommodate such extensive sequence information, or how to identify the archaeological counterparts of the male expansions detected.

Insight into human Y chromosome variation from low-coverage whole-genome resequencing data

Xue Y, Chen Y, McCarthy S, Ayub Q, Jostins L, Durbin R, Tyler-Smith C

Phase 1 of the 1000 Genomes Project has generated low-coverage whole-genome sequence data from 1,094 individuals from worldwide populations, including 528 males. SNP calls on the Y chromosome were made using SAMtools. In low coverage data, there are errors and uncertainty in the genotype calls. We developed a filtering strategy to reduce these, including restricting the analysis to 8.9 Mb of Y unique regions. We called a total of 18,692 Y-SNPs, 16,679 with the ancestral allele known. The false negative rate and false positive variant site identification rates were measured at 14% and 1.72% respectively by comparison with Complete Genomics calls on an overlapping subset of samples. The genotype accuracy was 97.4% compared with HapMap3 chip genotypes and 96.6% compared with Complete Genomics sequences. Using known literature variants, we assigned each sample to a haplogroup and these samples covered most of the major lineages except F, K, L, and M. A phylogenetic tree was constructed based on all the sites with known ancestral states using the RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analysis. The tree was consistent with the established structure. It confirmed Hg E (Bantu), O (China) and R1b (Europe) expansions associated with the Neolithic transitions in different parts of the world, and revealed that the expansion in Europe was the most extreme. One novel finding was a striking expansion of lineages F to R ~20 thousand years after the out-of-Africa movement, suggesting a previously unknown event of importance to male demography at this time.


DNA in Forensics 2012, Final Program & Abstracts

Friday, August 10, 2012

Reconstructing the origins of Eurasian populations using dental markers

There are some awesome PCA maps in this preprint about dental traits among Europeans and Asians. Below you can see the map based on the first PC, which mainly shows West vs. East Eurasian influence, and closely correlates with results obtained with high density, genome-wide genetic markers. The other maps are more cryptic in what they show (and unfortunately so is the text in this paper which attempts to explain them).

On the base of advantages in gene geography and anthropophenetics the phenogeographical method for anthropological research is initiated and experienced using dental data. Statistical and cartographical analyses are provided for 498 living Eurasian populations. Mapping principal components supplied evidence for the phene pool structure in Eurasian populations and for reconstructions of our species history on the continent. The longitudinal variability seems to be the most important regularity revealed by principal components analysis (PCA) and mapping proving the division of the whole area into western and eastern main provinces. So, the most ancient scenario in the history of Eurasian populations was developing from two perspective different groups: western group related to ancient populations of West Asia and the eastern one rooted by ancestry in South and/or East Asia. In spite of the enormous territory and the revealed divergence the populations of the continent have undergone wide scale and intensive time-space interaction. Many details in the revealed landscapes could be backgrounded to different historical events. The most amazing results are obtained for proving migrations and assimilation as two essential phenomena in Eurasian history: the wide spread of the western combination through the whole continent till the Pacific coastline and the envision of the movement of the paradox combinations of eastern and western markers from South or Central Asia to the east and to the west. Taking into account that no additional eastern combinations in the total variation in Asian groups have been found but mixed or western markers’ sets and that eastern dental characteristics are traced in Asia since Homo erectus, the assumption is made in favour of the hetero-level assimilation in the Eastern province and of net-like evolution of our species.

V. F. Kashibadze et al., Reconstructions in human history by mapping dental markers in living Eurasian populations, Submitted on 17 Jul 2011, arXiv:1107.3319v1 [q-bio.PE]

Additional citation...

David Reich et al., Reconstructing Native American population history, Nature, Year published: (2012), DOI: doi:10.1038/nature11258

Thursday, July 19, 2012

Ancient mtDNA from Western Siberia (aka. Kurgan and Scythian country)

Here's a new paper that describes the genetic shifts that took place on the Baraba Steppe of the West Siberian Plain from the Neolithic to the Iron Age. It's part of an e-book with the latest stable isotope and ancient DNA data from across Eurasia, available free of charge here.

The authors argue that ancient mtDNA and cranial results show at least four different populations making their mark on the Baraba Steppe. These apparently include the aboriginal Western Siberians (carrying mtDNA haplogroups A,C,D and Z), Mesolithic Northeast Europeans from just across the Urals (U2e, U4 and U5), Bronze Age Andronovo nomads from what is now Central Kazakhstan (T, U5 and C), and late Bronze Age/early Iron Age Barabans from Chicha, possibly originally from West Central Asia (showing a wide variety of West Eurasian mtDNA haplogroups).

The analysis of mtDNA samples from the Chicha-1 population revealed some interesting patterns. Crucial changes in the composition of mtDNA haplogroups in the gene pool were observed as compared to the earlier Baraba groups studied (Fig. 3). Dominance of Western Eurasian haplogroups and the near absence of East Eurasian were observed. Additionally, several new West Eurasian haplogroups appeared in the region, including Haplogroups U1a, U3, U5b, K, H, J and W.

The phylogeographic analysis suggests that the distribution and diversification centres of several of these mtDNA haplogroups and specific lineages are located on the west and south west of the Baraba forest steppe region, on the territory corresponding to modern-day Kazakhstan and Western Central Asia (Fig. 10). Apparently, the migration wave from the south strongly influenced the gene pool of the Baraba population in the transitional period from the Bronze to the Early Iron Age.


Subsequently, in the Scythian-Sarmatian period, a large cultural group, called the Sargat culture, developed in the region. Its representatives were widespread across the region, from the Ob River to the Urals. Their development represented one of the most significant cultural events in North Asia.

Unlike the authors, however, I don't see any evidence in the paper that points to a southern origin of the Chicha group. In other words, I don't think there's any reason to believe that this population migrated to the Baraba Steppe from West Central Asia across the deserts near the Aral Sea.

In my opinion a more plausible explanation is that this was another wave of settlers from the western steppe of present-day Southern Russia and Ukraine. I suspect they basically followed in the footsteps of the earlier Andronovo groups. Such a scenario would match archaeological evidence, and also various ancient DNA results from Neolithic sites in Ukraine, which have shown most of the mtDNA haplogroups found in the Chicha individuals, like H, U1 and U3 (see, for instance my previous blog entry covering another article from the same e-book).

Indeed, it's interesting that haplogroup T is singled out in this study as a potential maternal marker of the Andronovo nomads from the Baraba Steppe. That's because this haplogroup has already been found among multiple Neolithic remains from Ukraine, and is fairly common today among populations from between the Baltic and Black seas.

The genetic influence of migrants can be detected by the appearance of a new mtDNA haplogroup that was absent in the populations preceding the migration wave. This new mtDNA haplogroup, a West Eurasian T haplogroup, was detected in the Late Krotovo population. The T haplogroup appears simultaneously (with a 15 % frequency) in the Krotovo and Andronovo groups, but was completely absent in all preceding Baraba populations. We therefore consider the appearance of the Haplogroup T-lineage as the most likely genetic marker of the Andronovo migration wave to the region.

This assumption is confirmed by mtDNA studies of Andronovo groups from other West Siberian areas. Haplogroup T lineages were found, with a frequency of 25 %, in the samples (n=16) taken from two Andronovo groups from the Krasnoyarsk and upper Ob River areas.


Molodin et al., Human migrations in the southern region of the West Siberian Plain during the Bronze Age: Archaeological, palaeogenetic and anthropological data, Population Dynamics in Prehistory and Early History (2012), Publication Date: July 2012, ISBN: 978-3-11-026630-6, DOI: 10.1515/9783110266306.93

Ed. by Kaiser, Elke / Burger, Joachim / Schier, Wolfram, Population Dynamics in Prehistory and Early History (2012), Publication Date: July 2012, ISBN: 978-3-11-026630-6, DOI: 10.1515/9783110266306.93

See also...

Ancient mtDNA from the Dnieper-Donets cultural complex

Surprising aDNA results from Neolithic and Bronze Age Ukraine

Ancient Siberians carrying R1a1 had light eyes - take 2

Wednesday, July 18, 2012

Another batch of ancient mtDNA from the Dnieper Basin

This set of results is from a multidisciplinary study on the Mesolithic to Neolithic transition in Ukraine (see here). The mtDNA haplogroups include two C, two T, one U3 and one probable U1.

The paper is part of an open access e-book which features many other articles on prehistoric Europe and Asia: Population Dynamics in Prehistory and Early History (2012).

Despite the small sample and lack of Y-DNA data, I'd say that this is a fairly useful effort. That's because it again shows the presence of South Siberian-specific maternal lineages on the North Pontic steppe during the Neolithic, and gives weight to the scenario that there was a movement of people from the east of the Urals to Europe at a very early timeframe (for more on that, see here and here).

East Eurasian lineages were represented by the C clade (Ya34 and Ya45), which is uncommon in ancient or present-day European populations, but is found in Neolithic populations, as well as contemporary populations from South Siberia, where this lineage is most likely originated (Starikovskaya et al., 2005; Mooder et al., 2006).

Of interest in this context is the fact that the analysis of Neolithic cemeteries of the Baikal region has suggested that a depopulation event occurred in that region during the 6th millennium BP (Mooder et al., 2006). As such, the dating of Yasinovatka (at ca. 6440–6080 [Hedges et al., 1995]) suggests that there is a possible link between the Baikal depopulation event and the appearance of the C lineage of mtDNA in the North Pontic region.


Lillie, Malcolm C et al., Prehistoric populations of Ukraine: Migration at the later Mesolithic to Neolithic transition, Population Dynamics in Prehistory and Early History (2012), Publication Date: July 2012, ISBN: 978-3-11-026630-6, DOI: 10.1515/9783110266306.93

Ed. by Kaiser, Elke / Burger, Joachim / Schier, Wolfram, Population Dynamics in Prehistory and Early History (2012), Publication Date: July 2012, ISBN: 978-3-11-026630-6, DOI: 10.1515/9783110266306.93

See also...

Ancient mtDNA from Western Siberia (aka. Kurgan and Scythian country)

Friday, June 29, 2012

Ancient DNA from Iberian Mesolithic hunter-gatherers

A paper in Current Biology reports on the partial genome sequences of two 7,000-year-old Mesolithic skeletons from a cave in northwestern Spain. It shows that these hunter-gatherer samples fall outside the range of contemporary European genetic variation, but are much more similar to present-day Northern Europeans than Iberians.

They also seem to be closely related to prehistoric hunter-gatherers from as far away as the Baltic region, because like them they belong to mtDNA haplogroup U. That's basically the angle that Science Now has taken in covering the story:

Although the first farmers spread quickly across Europe, trading and exchanging culture across thousands of kilometres, many researchers had assumed that Mesolithic nomadic hunter-gatherers lived in small, isolated bands with little contact over long distances. But the genetic picture, Lalueza-Fox says, suggests "highly mobile" groups that kept in touch and interbred continent-wide.

These are interesting outcomes, because modern-day Northern Europeans, all the way from the Atlantic to the Volga, commonly share a very robust "ancestral" cluster when analyzed with the ADMIXTURE program. This cluster usually peaks in Lithuanians and other Baltic groups, and is difficult to break down (see here). Also, it correlates very well with clusters that peak in Swedish hunter-gatherers analyzed recently by Skoglund et al. (see here). As a result, I have no doubt now that this modern ADMIXTURE cluster is largely of Mesolithic hunter-gatherer origin, and its widespread range in Europe today is at least partly due to the fact that hunter-gatherers from across Europe were very similar genetically.

Unfortunately, the Iberian hunter-gatherers weren't compared to modern-day Lithuanians. Instead, the authors used the samples from the 1000 Genomes Project as references. However, it seems they oversampled the Finns when running their intra-European PCA. This showed clearly that these Finns were different from other Europeans, largely due to fairly recent founder effect and genetic drift, but provided very little information about the hunter-gatherers (marked "Brana" below).

The global PCA was more informative, because it wasn't skewed by the Finns, who were still there, but didn't have enough influence to dominate things at global level. Remarkably, this analysis showed that the prehistoric Iberians were shifted towards both East Asia and Sub-Saharan Africa relative to modern-day Europeans.

However, I suspect that if many more Mesolithic samples were present on that plot, things would look somewhat different. It’s difficult to say how different though. We’ll have to wait and see when more ancient samples come in.

The genetic background of the European Mesolithic and the extent of population replacement during the Neolithic [1,2,3,4,5,6,7,8,9,10] is poorly understood, both due to the scarcity of human remains from that period [11,12,13,14,15,16,17,18] and the inherent methodological difficulties of ancient DNA research. However, advances in sequencing technologies are both increasing data yields and providing supporting evidence for data authenticity, such as nucleotide misincorporation patterns [19,20,21,22]. We use these methods to characterize both the mitochondrial DNA genome and generate shotgun genomic data from two exceptionally well-preserved 7,000-year-old Mesolithic individuals from La Braña-Arintero site in León (Northwestern Spain) [23]. The mitochondria of both individuals are assigned to U5b2c1, a haplotype common among the small number of other previously studied Mesolithic individuals from Northern and Central Europe. This suggests a remarkable genetic uniformity and little phylogeographic structure over a large geographic area of the pre-Neolithic populations. Using Approximate Bayesian Computation, a model of genetic continuity from Mesolithic to Neolithic populations is poorly supported. Furthermore, analyses of 1.34% and 0.53% of their nuclear genomes, containing about 50,000 and 20,000 ancestry informative SNPs, respectively, show that these two Mesolithic individuals are not related to current populations from either the Iberian Peninsula or Southern Europe.

Sánchez-Quinto et al., Genomic Affinities of Two 7,000-Year-Old Iberian Hunter-Gatherers, Current Biology, 28 June 2012 doi: 10.1016/j.cub.2012.06.005

Monday, June 25, 2012

Long IBD gives clues to migrations across Europe from the Iron Age to the present (aka. SMBE 2012 abstracts)

The Society for Molecular Biology and Evolution (SMBE) is holding its annual conference this week, and has released a PDF of abstracts of the presentations at the meeting. Most of these presentations are yet to be published as articles in journals, but after a bit of Googling, I think I located one of them online. Luckily, it just happens to be the one I’m most interested in…

Long IBD in Europeans and recent population history

Peter Ralph, Graham Coop
UC Davis, Davis, CA, USA

Numbers of common ancestors shared at various points in time across populations can tell us about recent demography, migration, and population movements. These rates of shared ancestry over tens of generations can be inferred from genomic data, thereby dramatically increasing our ability to infer population history much more recent than was previously possible with population genetic techniques. We have analyzed patterns of IBD in a dataset of thousands of Europeans from across the continent, which provide a window into recent European geographic structure and migration.

Unfortunately, the link doesn’t include much data, but has lots of impressive graphics. I’ve put together a small selection of these, focusing on…surprise, surprise…Poland. Basically, the larger the circle, the more Identity-by-descent (IBD) shared:

I think it’s very clear from the results that the Polish sample shares a lot of fairly recent IBD with many groups from across Europe, and especially those from north and east of the Alps. Most of these segments were certainly spread by various Indo-European groups, including the Slavs.

The authors have attempted to estimate the ages of the admixtures, and divided the results into three periods. The outcomes for Poland appear very accurate based on what we know from history and archaeology, although keep in mind that East Slavic individuals are missing from this part of the analysis. I’ve also included the graphics for Italy (IT) and Iberia (Iber), for comparison. The results for these two Southern European regions look much more conservative, and I suspect that’s due to their larger effective population sizes, plus the Alps and Pyrenees acting as strong barriers to gene flow from the north.
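As a rough guide to why segment length carries age information, here is a small sketch of the standard relationship between the age of a shared ancestor and the expected length of the IBD segment inherited from it. A segment passed down from a common ancestor g generations back has gone through about 2g meioses, so its mean length is about 100/(2g) centimorgans. The 30-year generation time and the function names are my own assumptions for illustration, and this isn't necessarily how the authors binned their results.

```python
# Expected IBD segment length as a function of common-ancestor age.
# Assumptions (not from the abstract): a 30-year generation time and
# the helper names used here.

YEARS_PER_GENERATION = 30.0


def generations_ago(years_ago, years_per_generation=YEARS_PER_GENERATION):
    """Convert an age in years to an approximate number of generations."""
    return years_ago / years_per_generation


def mean_ibd_length_cm(generations):
    """Mean length (cM) of a segment shared via an ancestor `generations` back.

    The two lineages are separated by about 2 * generations meioses, and
    recombination breaks a segment at a rate of one per Morgan (100 cM)
    per meiosis, giving a mean of 100 / (2 * generations) cM.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


if __name__ == "__main__":
    # Upper bounds of the three periods discussed in this post.
    for years in (540, 1500, 2353):
        g = generations_ago(years)
        print(f"{years} ya ~ {g:.0f} generations, "
              f"mean segment ~ {mean_ibd_length_cm(g):.2f} cM")
```

The upshot is that long shared segments point to recent common ancestors, while the shorter ones are dominated by much older ancestry, which is how a single dataset can be split into time slices.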

At the 0-540 ya period, Poles don’t share much with anyone except with each other and Germans. This makes sense, considering, for instance, the heavy migration of Poles from regions under Prussian occupation to the German industrial areas of the Ruhr and Saxony. These people were quickly Germanised and absorbed by the locals. Today, only their Polish sounding names and diluted genes remain.

I think the 555-1500 ya graphic very clearly shows the effects of the Slavic expansion, probably at least partly from the territory of modern Poland. I suspect the same expansion can also be seen on the 1515-2353 ya graphic. But here we can also likely see the effects of several other major population movements, including migrations of the Celts and Germanics. In any case, looking at all those large “Slavic” bubbles in the Balkans, I’m reminded of this quote from Procopius.

Illyria and all of Thrace, that is, from the Ionian Gulf to the suburbs of Constantinople, including Greece and the Chersonese, were overrun by the Huns, Slavs and Antes, almost every year, from the time when Justinian took over the Roman Empire; and intolerable things they did to the inhabitants. For in each of these incursions, I should say, more than two hundred thousand Romans were slain or enslaved, so that all this country became a desert like that of Scythia.

Eventually, the Slavs stopped raiding the Balkans and settled there permanently. Many became subjects of the Roman Empire.

It’d be fascinating if an IBD analysis like this was carried out on an expanded dataset, including many more samples from Northern and Eastern Europe, as well as West and Central Asia. We know there were movements of people from Europe deep into Asia during the metal ages, and learning more about these events could help us unravel the origins of such enigmatic groups as the early Indo-Europeans.

Actually, there’s another abstract in that SMBE selection, and this one deals with Identical by State (IBS) tracts in Europeans. It claims there’s been “no significant gene flow between Europeans and Asians within the past few hundred generations”. That sounds like a reasonable statement, but only in the context of the 1000 Genomes samples these scientists compared, which I assume included Europeans vs. South and East Asians. So like I say, what we really need is a study of IBD or IBS, or both, that looks at a wider variety of groups from West and Central Asia, because that’s where most of the relatively recent mixing took place.

Reconstructing demographic histories from long tracts of DNA sequence identity

Kelley Harris1, Rasmus Nielsen1,2

1UC Berkeley, Berkeley, CA, USA, 2University of Copenhagen, Copenhagen, Denmark

There has been recent excitement and debate about the details of human demographic history, involving gene flow that has occurred between populations as well as the extent and timing of bottlenecks and periods of population growth. Much of the debate concerns the timing of past admixture events; for example, whether Neanderthals exchanged genetic material with the ancestors of non-Africans before or after they left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past genetic exchanges, along with population divergence times and changes in effective population size. To achieve this, we look at the length distribution of regions that are shared identical by state (IBS) and maximize an analytic composite likelihood that we derive from the sequentially Markov coalescent (SMC). Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method its power by influencing the distribution of shared IBS tracts. However, since IBS tracts are directly observable, we do not need to infer the precise locations of IBD tracts. In this way, we can accurately estimate admixture times for relatively ancient events where admixture mapping is not possible, and in simulated data we show excellent power to characterize admixture pulses that occurred 100 to several hundred generations ago. When we study the IBS tracts shared between and within the populations sequenced by the 1000 Genomes consortium, we find evidence that there was no significant gene flow between Europeans and Asians within the past few hundred generations. It also looks unlikely that the Yorubans of Nigeria interbred with Europeans or Asians in a population-specific way, though there may have been admixture between Africans and an ancestral non-African population.

See also...

Long IBD gives clues to migrations across Europe from the Iron Age to the present - take 2

Wednesday, June 20, 2012

First direct evidence of genetic continuity in West and Central Poland from the Iron Age to the present

I've just been sent a fascinating thesis on the mtDNA of Iron Age and Medieval samples from Poland. It suggests direct genetic continuity between Iron Age samples belonging to the Przeworsk and Wielbark Cultures, of what is now West and Central Poland, and present-day Poles. Here's the English summary, and a map of the sites under study:

For many years the origin of the Slavs has been the subject-matter in archaeology, anthropology, history, linguistics and recently also modern human population genetics. By now there is no unambiguous answer to a question where, when and in what way the Slavs originated. For the purposes of this dissertation, the analysis of ancient human mitochondrial DNA was applied. The ancient DNA was isolated from 72 specimens which came from Iron-Age and medieval graveyards from the area of current Poland. Ancient mtDNA was extracted from two teeth from each individual and reproducible sequence results were obtained for 20 medieval and 23 Iron-Age specimens. On the basis of HVR I mtDNA mutation motifs and coding region SNPs each specimen was assigned to a mitochondrial haplogroup. The obtained results were used together with other ancient and modern populations to analyse shared haplotypes and population genetic distances illustrated by multidimensional scaling plots (MDS). The differences on genetic level and quite high genetic distances (FST) between medieval and Iron-Age populations as well as significant number of shared informative haplotypes with Belarus, Ukraine and Bulgaria may evidence genetic discontinuity between medieval and Iron Ages. From the other side, the highest number of shared informative haplotypes between Iron-Age and extant Polish population as well as the presence of subhaplogroup N1a1a2, can confirm that some genetic lines show continuity at least from Iron Age or even Neolithic in the areas of present day Poland. The results obtained in this work are considered to be the first ancient contribution in genetic history of the Slavs.

Below is an MDS from the thesis, based on data corrected for the effects of potential relatives in the Iron Age sample. I don't think it's a particularly useful way of judging the intra-European affinity of the two ancient Polish groups, mostly because the samples are small, and contemporary North, Central and East Europeans don't differ very much in terms of mtDNA. Nevertheless, we can see that both the Iron Age (Okres Rzymski) and Medieval (Sredniowiecze) samples fall within the range of modern European mtDNA diversity. On the other hand, the German Neolithic LBK sample (Neolit LBK Niemcy) clearly does not, because it's sitting at the far right of the plot, away from the main European cluster. This dichotomy between the genetic structure of the LBK farmers and modern Europeans has been demonstrated in previous studies, but the reasons for it are still a mystery.

Interestingly, modern Poles are closer to an Iron Age sample from Denmark (Okres Zelaza Dania) than to the Polish Iron Age set. However, as per the summary above, the author also compared the frequencies of the most informative haplotypes among the modern and ancient samples, and found that extant Poles are the closest group to the Polish Iron Age remains, followed by Balts, Swedes and Baltic Finns. Below is a table showing those results.

According to the author, these matches might hint at Baltic, Germanic and Finno-Ugric influences in the Polish Iron Age population. Perhaps, but in my opinion, they're simply in line with geography, and reflect the general North European character of maternal lineages shared by populations from around the Baltic, both today and during the Iron Age.

The results for the Medieval Polish sample are more intriguing, because they're somewhat out of whack with geography. Its best matching modern groups are Belorussians, Ukrainians and Bulgarians. This might suggest that, during the Early Middle Ages, the territory of present day Poland experienced an influx of groups from what are now Belarus and Ukraine, who then melted into the gene pool of the natives of Polish Iron Age descent. Conversely, it might mean that Belorussians, Ukrainians and Bulgarians descend in large part from fairly specific medieval groups from the area of modern Poland.

In any case, whether present day Polish territory saw some migrations from the immediate east during the Medieval period or not, this preliminary look at ancient Polish mtDNA suggests long-standing genetic continuity in the region. What it clearly doesn't show is a complete, or almost complete, population replacement in the areas between the Oder and Bug rivers during the migration period.

Indeed, the thesis results put into doubt past notions that the Przeworsk and Wielbark cultures were of Germanic origin.

The (mtDNA) haplogroup missing from both the Iron Age and medieval samples from the territory of modern Poland was haplogroup I. In contemporary Slavic populations, this haplogroup is found at levels ranging from 1.2% in Bulgarians to 4.8% in Slovaks. It was also recorded at high levels in ancient remains from Denmark, with a frequency of 12.5% in an Iron Age sample and 13.8% in a medieval sample. Melchior et al. 2008 suggest that haplogroup I might have been more common in Denmark and Northern Europe during that period. Therefore, the lack of this haplogroup in ancient DNA from the territory of modern Poland might mean that the Przeworsk and Wielbark cultures should not be identified with Germanic populations.
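
The argument in that passage depends on how surprising it is to find no haplogroup I at all in a small sample. As a rough sanity check, and assuming each sampled individual is an independent draw from a population with a fixed haplogroup I frequency (my assumption, not something the thesis states), the chance of seeing zero carriers in n samples is (1 - p)^n. The sketch below plugs in the frequencies quoted above and the 23 Iron Age and 20 medieval sequences mentioned in the summary. The function name is just illustrative.

```python
def prob_zero_carriers(freq: float, n_samples: int) -> float:
    """Chance of seeing no carriers in n independent draws at the given frequency."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    return (1.0 - freq) ** n_samples

# Haplogroup I frequencies quoted in the passage above.
reference_freqs = {
    "Bulgarians (modern)": 0.012,
    "Slovaks (modern)": 0.048,
    "Denmark (Iron Age)": 0.125,
    "Denmark (medieval)": 0.138,
}

# Sample sizes from the thesis summary: 23 Iron Age and 20 medieval sequences.
sample_sizes = {"Polish Iron Age": 23, "Polish medieval": 20}

for sample, n in sample_sizes.items():
    for source, freq in reference_freqs.items():
        p0 = prob_zero_carriers(freq, n)
        print(f"{sample} (n={n}) vs {source} ({freq:.1%}): P(no I) = {p0:.2f}")
```

Under these assumptions, a complete absence of haplogroup I would be unremarkable at modern Slavic frequencies and fairly unlikely at the ancient Danish ones. With samples this small, and possible relatives among them, that's a hint rather than proof.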

I'm sure more ancient DNA studies are on the way looking at the origins of Slavs and Poles. Indeed, if the Y-chromosomes of Przeworsk and Wielbark remains are successfully tested, I won't be surprised if they look fairly typical of modern Poles, with a decent representation of R1a1a-M458, which is the most common Y-chromosome haplogroup in Poland today.

Anna Juras, Etnogeneza Słowian w świetle badań kopalnego DNA, Praca doktorska wykonana w Zakładzie Biologii Ewolucyjnej Człowieka Instytutu Antropologii UAM w Poznaniu pod kierunkiem Prof. dr hab. Janusza Piontka

Friday, June 15, 2012

Ancient mtDNA from the Dnieper-Donets cultural complex

A new paper at the Journal of Human Genetics reports on the mtDNA gene pool of the Dnieper-Donets (DD) cultural complex of Neolithic Ukraine. The authors were able to confirm the presence of the following haplogroups in the ancient remains from the Mariupol-type sites along the Dnieper: two H, two C, one C4a2, one U5a1a and one U3. So, three out of the seven samples belonged to haplogroup C, which is a Siberian-specific marker.
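
For anyone who wants to check the arithmetic, here's a minimal sketch that tallies the seven calls listed above by macro-haplogroup. Collapsing each call to its first letter is a crude shortcut of mine that happens to work for this list, not a general rule for mtDNA nomenclature.

```python
from collections import Counter

# Haplogroup calls for the seven Dnieper-Donets samples, as listed above.
calls = ["H", "H", "C", "C", "C4a2", "U5a1a", "U3"]

def macro_haplogroup(call: str) -> str:
    """Collapse a call to its leading letter, e.g. 'C4a2' -> 'C' (a crude simplification)."""
    return call[0]

counts = Counter(macro_haplogroup(c) for c in calls)
total = sum(counts.values())

for hg, n in counts.most_common():
    print(f"{hg}: {n}/{total} ({n / total:.0%})")
```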

We've already had a sneak peek at these results thanks to a thesis abstract published last year by the Grand Valley State University (see here). The presence of mtDNA C among the DD remains suggests that the North Pontic steppe was formerly inhabited by genetically heterogeneous groups, and part of their ancestry came from Siberia.

The fact that Siberian-specific mtDNA lineages are very rare in Ukraine today means that something must have happened there since the Bronze Age that basically wiped them out.

Alexey G Nikitin et al., Mitochondrial haplogroup C in ancient mitochondrial DNA from Ukraine extends the presence of East Eurasian genetic lineages in Neolithic Central and Eastern Europe, Journal of Human Genetics, advance online publication, 7 June 2012; doi:10.1038/jhg.2012.69

Thursday, May 3, 2012

First R1b from Neolithic Europe...and it ain't from the steppe

Y-chromosome haplogroup R1b has turned up in two late Neolithic skeletons from a Bell Beaker burial site at Kromsdorf, eastern Germany, with one of the lineages further defined as R1b1b2 (M269+). This is a breakthrough, because for years, geneticists and genetic genealogists have been wondering which archeological culture to credit for the massive expansion of this haplogroup in Europe.

For a long time it was thought R1b was a Cro-Magnon marker native to Western Europe, but that theory fell by the wayside when its ancestor R1 was found to be only 18,500 years old (see here). There was then some talk about R1b being a proto-Indo-European lineage, which expanded with Yamnaya pastoralists from the Eastern European steppe. This was a notion mostly entertained by hobby genetic genealogists from Western Europe and the US, but it never really made any sense, due to the paucity of R1b in modern-day Ukraine and Southern Russia.

However, many others, including myself, always had a suspicion that the Bell Beaker folk played an important role in the spread of R1b across Western Europe. Indeed, I mentioned them last week in my blog post about ancient DNA from the Swedish Neolithic, saying they probably had an impact on the genetics of Scandinavians during the Copper Age (see here).

A couple of mtDNA sequences from the samples in this study - those belonging to haplogroups K1 and I1 - are apparently showing haplotype hits in Portugal, as reported here. That's important, because the Bell Beaker "phenomenon" is thought to have originated in present-day Portugal, and then expanded into other areas of Western Europe via maritime routes, before moving on to Central Europe. However, both K1 and I1 are native to the Near East, and most likely arrived in Iberia after the Ice Age. Indeed, the same can probably be said about R1b. What this suggests is that the elements that crystallized into the Bell Beaker Culture in Iberia during the late Neolithic came from the Near East during the Neolithic.

Below is a map showing the approximate extent of Bell Beaker usage (source: Wikipedia).

By the way, it's useful to note that back in 2008 R1a was found in ancient skeletons at a burial site in Eulau, not far from the Kromsdorf Bell Beaker site, and from roughly the same period (see here). However, these skeletons belonged to individuals from the materially and anthropologically very different Corded Ware culture. Thus, it appears as if the two main haplogroups of present-day Europeans, R1a and R1b, can be associated with two major European archeological cultures of the late Neolithic: Corded Ware in the east, and Bell Beaker in the west, respectively. Amazing stuff.

The transition from hunting and gathering to agriculture in Europe is associated with demographic changes that may have shifted the human gene pool of the region as a result of an influx of Neolithic farmers from the Near East. However, the genetic composition of populations after the earliest Neolithic, when a diverse mosaic of societies that had been fully engaged in agriculture for some time appeared in central Europe, is poorly known. At this period during the Late Neolithic (ca. 2,800–2,000 BC), regionally distinctive burial patterns associated with two different cultural groups emerge, Bell Beaker and Corded Ware, and may reflect differences in how these societies were organized. Ancient DNA analyses of human remains from the Late Neolithic Bell Beaker site of Kromsdorf, Germany showed distinct mitochondrial haplotypes for six individuals, which were classified under the haplogroups I1, K1, T1, U2, U5, and W5, and two males were identified as belonging to the Y haplogroup R1b. In contrast to other Late Neolithic societies in Europe emphasizing maintenance of biological relatedness in mortuary contexts, the diversity of maternal haplotypes evident at Kromsdorf suggests that burial practices of Bell Beaker communities operated outside of social norms based on shared maternal lineages. Furthermore, our data, along with those from previous studies, indicate that modern U5-lineages may have received little, if any, contribution from the Mesolithic or Neolithic mitochondrial gene pool.

Update 29/04/2013: There's a new website up called Haplogroup R1b. It's raising funds to study ancient DNA remains in the hope of finally solving the mystery of where European R1b came from (see here). I don't know who's behind the effort, and whether it's legitimate, but their articles are informative and well balanced. Here are some quotes from a piece about the Kromsdorf samples titled R1b and the Bell Beaker Phenomenon.

Busby et al. stated that an east to west migration could not be inferred based on STR variance of R1b1a2’s largest European subclade, R-L11 (aka S127) which makes sense in a rapid spread scenario.


Unfortunately the distribution of L11(xP312,U106) is too fragmented and samples too few to draw any conclusion about its origin. The lack of a variance cline is probably the result of a rapid expansion and is attested by similar STR modal values for the three major subclades of R-P312 (aka S116), which are U152, DF27 and L21. In fact, it is at this level that a slight difference in variance (Table 1) and GD (Table 2) can be observed. [Walsh, B. Computing Genetic Distances, (24 Nov. 2003),, (visited 27 Jun 2012)] The increased variance and decreased genetic distance from P312 makes a case for U152 being the oldest subclade of P312. As such, it is also the likeliest to have occurred near the P312 origin point (Figure 2).


DF27 = Maritime Beaker expansion out of Iberia

U152 & L21 = Reflux expansion from the Alps which would give rise to Italo-Celtic


If we take radio carbon data into account which tells us the earliest Bell Beakers occurred in Iberia, an out-of-Iberia DF27 expansion becomes even more intriguing. Finally, if we take the definition of the North South cluster as one with a Northern and Southern coastal distribution, it aligns with the expansion of Maritime Bell Beakers along the Atlantic and Mediterranean coasts. Maritime Bell Beaker cluster seems more appropriate at this time than North-South cluster. There has finally been an Iberian L165 sample found, which would mean the 10x Isles sample bias may be in play.

If we look at the variance of DF27′s siblings, we see that they are younger than DF27 and may have been involved in the later Bell Beaker reflux expansions. The ordering of variance (1. DF27 2. U152 3. L21) is also a good match for radio carbon dating that shows Bell Beaker age as oldest in Iberia/S. France/N. Italy and then progressively younger as one goes north and east.
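
The quoted argument leans on two statistics: the STR variance within each subclade, and the genetic distance between modal haplotypes. Here's a minimal sketch of how such numbers are typically computed. The repeat counts and subclade names are invented for illustration and are not the values in the article's Table 1 or Table 2. The stepwise distance used here is one common choice, not necessarily the article's.

```python
from statistics import mean, mode, pvariance

# Hypothetical repeat counts at three STR loci; each row is one man's haplotype.
# These numbers are invented for illustration and are not from the article's tables.
haplotypes = {
    "subclade_A": [(13, 24, 14), (13, 23, 14), (14, 24, 15), (13, 25, 14)],
    "subclade_B": [(13, 24, 15), (13, 24, 15), (13, 24, 14), (13, 24, 15)],
}

def mean_str_variance(rows):
    """Average, over loci, of the population variance in repeat counts."""
    loci = list(zip(*rows))  # transpose: one tuple of repeat counts per locus
    return mean(pvariance(locus) for locus in loci)

def modal_haplotype(rows):
    """Most common repeat count at each locus."""
    return tuple(mode(locus) for locus in zip(*rows))

def genetic_distance(hap_a, hap_b):
    """Stepwise distance: sum of absolute repeat differences across loci."""
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))

for name, rows in haplotypes.items():
    print(name, round(mean_str_variance(rows), 3), modal_haplotype(rows))

modal_a = modal_haplotype(haplotypes["subclade_A"])
modal_b = modal_haplotype(haplotypes["subclade_B"])
print("distance between modals:", genetic_distance(modal_a, modal_b))
```

A higher variance is usually read as a sign of greater age, but that inference depends heavily on mutation rates and on how the samples were collected.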


Lee et al., Emerging genetic patterns of the European Neolithic: Perspectives from a late Neolithic Bell Beaker burial site in Germany, American Journal of Physical Anthropology, Article first published online: 3 MAY 2012, DOI: 10.1002/ajpa.22074

Karafet et al., New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree, Genome Research, Published in Advance April 2, 2008, doi: 10.1101/gr.7172008

Friday, April 27, 2012

Prehistoric Scandinavians genetically most similar to present-day Poles

Scientists from Uppsala University have managed to extract genome-wide markers from the early Neolithic remains of three hunter-gatherers and one farmer from southern Sweden. They only pulled a few thousand SNPs from each sample, but that was enough to successfully compare the ancient remains to modern Europeans. The results of their study, published in Science Magazine today, reveal that Poles top the allele sharing list with the hunter-gatherers. Interestingly, Poles also show higher allele-sharing with the farmer than Swedes do, but not as high as Cypriots and Greeks. The figure below illustrates this clearly.

Now, if we look at the ADMIXTURE analysis from the study, it suggests that the farmer was also very similar to Sardinians, albeit with more North European admixture.

So I think it's pretty clear we're dealing here with an individual of mostly deep East Mediterranean origin, whose ancestors made their way from West Asia to Western Europe, probably via maritime routes, and settled islands like Sardinia in the process. They possibly moved into what is now Sweden via Western and Central Europe, but then again, they might have gone straight from the Mediterranean to Scandinavia by boat.

But why is it that Poles show higher similarity to these Neolithic Scandinavians than Swedes do? Firstly, it's important to realize that the differences aren't that great. Note, for instance, that Swedes are the second most similar population to the hunter-gatherers after Poles. However, clearly, the data suggests that there had to be other population movements into Scandinavia after the late Neolithic. These also likely affected Poland, but to a lesser degree.

No one yet knows what these were exactly, but if I had to guess, I'd say the Bell Beaker folk of the Copper Age represented one of the major waves (see figure below from "Europe during the third millennium BC and Bell Beaker culture phenomenon: peopling history through dental non-metric traits study" by Jocelyne Desideri). Also, another factor might be that the hunter-gatherers tested by Skoglund et al. belonged to the Pitted Ware culture, which arrived in Scandinavia from the Eastern Baltic.

Anyway, I'm absolutely delighted with the results from this study. The reason is that they correlate very closely with the experiments I've been running with ADMIXTURE, aimed at untangling the story of the peopling of Europe. Note, for instance, the close correlation between the STRUCTURE plot above, and the results from my Hunter-Gatherer vs. Farmer analysis (see here). All you have to do is add up the blue and purple components from the STRUCTURE graph, and you'll basically get my "Baltic hunter-gatherer" cluster. Also, the orange component is very similar to my "Mediterranean farmer" cluster.

If Skoglund et al. had access to more prehistoric samples, then it's likely these would create their own clusters. That's because the four Neolithic individuals they tested, especially the hunter-gatherers, seem to fall outside the range of modern European genetic variation, like on some of the PCAs below. The appearance of ancient clusters wouldn't invalidate the current results, because such clusters would no doubt show a close relationship to those created by modern samples. However, I’m pretty sure they'd give us a better idea of how much hunter-gatherer ancestry survives in modern Europeans, because they wouldn't be affected by such factors as genetic drift since the Neolithic. So that’s something to look forward to in the future.


Skoglund et al., Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe, Science 27 April 2012: Vol. 336 no. 6080 pp. 466-469 DOI: 10.1126/science.1216304

Saturday, April 21, 2012

So who's the most (indigenous) European of us all?

Basically, the first map below reveals the answer. It shows the spread of a European-specific cluster from a worldwide ADMIXTURE analysis at K=8 (eight ancestral populations assumed), which I call "North European". Thus, genetically, the most European populations are found around the Baltic Sea, and in particular in the East Baltic region. In my genome collection, samples from Lithuania clearly and consistently score the highest percentages in ADMIXTURE clusters specific to Europe. However, I suspect that if I had Latvians with no known foreign ancestry going back more than four generations, they'd come out the "most European". Hopefully we can test that in the near future.

Below are the fifteen Eurogenes samples that scored the highest percentage levels of membership in the North European cluster. The list only includes groups with five or more individuals present in the analysis, so some populations, like Estonians or Danes, weren't included, even though they easily made the cut. The spreadsheet with all the results from this run can be seen here. A table of Fst (genetic) distances between the eight clusters is available here.

Lithuanians 77%
Finns 74%
Belorussians 70%
Swedes 69%
Norwegians 68%
Kargopol Russians 68%
Russians 68%
Poles 68%
Erzya 66%
Ukrainians 66%
Moksha 66%
Orcadians 63%
HapMap Utah Americans (CEU) 63%
Irish 63%
British 62%
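
A list like the one above can be put together along the following lines. This is only a sketch: I'm assuming each population's score is the plain average of its members' North European percentages, and the individual values below are made up (the real figures are in the spreadsheet). The only detail taken from the post is the cutoff of five individuals per group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-individual results: (population, North European membership).
# Values are made up for illustration; the real figures are in the spreadsheet.
individuals = [
    ("Lithuanians", 0.78), ("Lithuanians", 0.76), ("Lithuanians", 0.77),
    ("Lithuanians", 0.79), ("Lithuanians", 0.75),
    ("Finns", 0.74), ("Finns", 0.73), ("Finns", 0.75), ("Finns", 0.74), ("Finns", 0.74),
    ("Estonians", 0.76), ("Estonians", 0.77),  # fewer than five, so excluded
]

MIN_GROUP_SIZE = 5  # cutoff stated in the post

def rank_populations(rows, min_size=MIN_GROUP_SIZE, top_n=15):
    """Average membership per population, keeping only groups with enough samples."""
    by_pop = defaultdict(list)
    for pop, share in rows:
        by_pop[pop].append(share)
    averages = {p: mean(v) for p, v in by_pop.items() if len(v) >= min_size}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

for pop, share in rank_populations(individuals):
    print(f"{pop} {share:.0%}")
```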

So why did I pick the results from K=8, and not some other K, like 2, 10, or 25? Well, it's not possible to evaluate who is more European without a European-specific cluster (i.e. modal in Europeans, with a low frequency outside of Europe). Provided that a decent number and range of global and West Eurasian samples are used in the analysis, such clusters begin appearing at around K=5 or K=6, and start breaking up into local clusters from about K=9. I found that runs below K=8 produced European clusters that spilled too generously outside of the borders of Europe. On the other hand, runs above K=8 produced European clusters that weren't representative of enough European groups (i.e. too localized). But the European cluster from K=8 was pretty much perfect, and I think that's obvious from the map. In fact, I can hardly believe how well it fits the modern geographic concept of Europe - north of the Mediterranean and west of the Urals. Amazing stuff.
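
The two failure modes described above (spilling outside Europe at low K, being too localized at high K) can be boiled down to a few rough numbers per run. The sketch below is only an illustration of that idea, not the procedure used to settle on K=8. The Lithuanian and Irish K=8 values echo the list above. Every other number, the region labels, and the choice of simple means are invented.

```python
from statistics import mean

# Hypothetical population averages for the candidate "European" cluster at two values of K.
# Apart from the Lithuanian and Irish K=8 values, all numbers here are invented.
runs = {
    6: {"Lithuanians": 0.90, "Irish": 0.85, "Sardinians": 0.60, "Turks": 0.35, "Iranians": 0.25},
    8: {"Lithuanians": 0.77, "Irish": 0.63, "Sardinians": 0.25, "Turks": 0.10, "Iranians": 0.05},
}
EUROPEAN = {"Lithuanians", "Irish", "Sardinians"}  # illustrative region labels

def summarize(cluster_by_pop, european=EUROPEAN):
    """Return (mean inside Europe, mean outside Europe, is the modal population European?)."""
    inside = mean(v for p, v in cluster_by_pop.items() if p in european)
    outside = mean(v for p, v in cluster_by_pop.items() if p not in european)
    modal_pop = max(cluster_by_pop, key=cluster_by_pop.get)
    return inside, outside, modal_pop in european

for k, result in runs.items():
    inside, outside, modal_ok = summarize(result)
    print(f"K={k}: inside={inside:.2f} outside={outside:.2f} modal in Europe={modal_ok}")
```

A cluster that stays high inside Europe while dropping close to zero outside it is the kind of European-specific signal the comparison needs.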

There are two other clusters that show up across Europe in non-trivial amounts - Mediterranean and Caucasus (see maps below). These can also be thought of as native European clusters, since they've been on the continent for thousands of years. However, their peak frequencies are found in West Asia, so they're not particularly useful signals of European-specific ancestry.

So what do these three clusters show exactly? They represent certain allele frequencies in modern populations, and in fact, these can change fairly rapidly due to admixture, selection, and genetic drift. So claiming that such clusters represent pure ancient populations is unlikely to be true in most cases, if ever. However, I don't think there's anything wrong in saying that, when robust enough, they can be thought of as signals of ancestry from relatively distinct ancestral groups.

Indeed, anyone who's read up on the prehistory of Europe knows that there are three general Neolithic archeological waves to consider when trying to untangle the story of the peopling of Europe. These are Mediterranean Neolithic, Anatolian Neolithic and Forest Neolithic (for example, see here).

Mediterranean Neolithic refers to a series of migrations from West Asia via the Mediterranean and its coasts. The areas most profoundly affected by these movements include the islands of Sardinia and Corsica, and the Southwest European mainland. Anatolian Neolithic describes migrations into Europe from modern day Turkey, mostly into the Balkans, but also as far as Germany and France. At the moment, Forest Neolithic of Northeastern Europe is something of a mystery. However, the general opinion is that it was largely the result of native Mesolithic hunter-gatherers adopting agriculture.

Obviously, it's very difficult to dismiss the correlations between these three broad archeological groups and the European and two European/West Asian clusters produced in my K=8 ADMIXTURE analysis. Is it a coincidence that the Mediterranean cluster today peaks in Sardinia, which has been largely shielded from foreign admixture since the Neolithic, and today forms a very distinct Southern European isolate? Why does the North European cluster show the highest peaks in classic Forest Neolithic territory? And why does the Caucasus cluster radiate in Europe from the southeast, which is where Anatolian farmers had the greatest impact? These can't all be coincidences, and I'm willing to bet that none of them are. I'm convinced that the three clusters from my K=8 run are strong signals from the Neolithic, and the North European cluster also from the Mesolithic.

Eventually, these issues will be settled with ancient DNA data, in a much more comprehensive way than ever possible using modern genomes. We've already seen some preliminary results, mostly from Mesolithic, Neolithic and Bronze Age sites around Europe, so perhaps it's useful to ask whether my ADMIXTURE analysis and commentary here mirror these early findings? I think they do. For instance, here's an interesting conclusion regarding the East Baltic area from a study on ancient Scandinavian mtDNA by Malmström et al.

Through analysis of DNA extracted from ancient Scandinavian human remains, we show that people of the Pitted Ware culture were not the direct ancestors of modern Scandinavians (including the Saami people of northern Scandinavia) but are more closely related to contemporary populations of the eastern Baltic region. Our findings support hypotheses arising from archaeological analyses that propose a Neolithic or post-Neolithic population replacement in Scandinavia [7]. Furthermore, our data are consistent with the view that the eastern Baltic represents a genetic refugia for some of the European hunter-gatherer populations.

I suppose there will be people wondering why I didn't take Sub-Saharan African, East Asian, and South Asian admixtures into account in my analysis. The reason is that I wasn't looking at which group was most West Eurasian, or Caucasoid. Based on everything I've seen to date, in my own work as well as elsewhere, the most West Eurasian group would probably be the French Basques from the HGDP. However, the differences between them, and certain groups from Northeastern Europe, like Northern Poles and Lithuanians, really wouldn't be that great anyway. I might do a write up about that at some point.


- Maps by Eurogenes project member FR7

- Additional stats by Eurogenes project member DESEUK1


Helena Malmström et al., Ancient DNA Reveals Lack of Continuity between Neolithic Hunter-Gatherers and Contemporary Scandinavians, Current Biology, 24 September 2009, doi:10.1016/j.cub.2009.09.017

Noreen von Cramon-Taubadel and Ron Pinhasi, Craniometric data support a mosaic model of demic and cultural Neolithic diffusion to outlying regions of Europe, Proc. R. Soc. B published online 23 February 2011, doi: 10.1098/rspb.2010.2678

Wednesday, April 11, 2012

Prehistoric and "recent" Sub-Saharan African admixture in Europe

We recently learned that many of the typically East Eurasian mtDNA lineages present in Europe today arrived there during the Neolithic, and perhaps in some cases even the Mesolithic (see here and here). It now seems that a large part of the Sub-Saharan African mtDNA lineages found in Europe are also of Neolithic origin. However, most appear to have come "rather recently", as a result of contacts between Europe and Africa during the Roman Empire, the Trans-Atlantic slave trade, and so on.

Mitochondrial DNA (mtDNA) lineages of macro-haplogroup L (excluding the derived L3 branches M and N) represent the majority of the typical sub-Saharan mtDNA variability. In Europe, these mtDNAs account for <1% of the total but, when analyzed at the level of control region, they show no signals of having evolved within the European continent, an observation that is compatible with a recent arrival from the African continent. To further evaluate this issue, we analyzed 69 mitochondrial genomes belonging to various L sublineages from a wide range of European populations. Phylogeographic analyses showed that ∼65% of the European L lineages most likely arrived in rather recent historical times, including the Romanization period, the Arab conquest of the Iberian Peninsula and Sicily, and during the period of the Atlantic slave trade. However, the remaining 35% of L mtDNAs form European-specific subclades, revealing that there was gene flow from sub-Saharan Africa toward Europe as early as 11,000 yr ago.

Maria Cerezo et al., Reconstructing ancient mitochondrial DNA links between Africa and Europe, Genome Research, Published in Advance March 27, 2012, doi: 10.1101/gr.134452.111

Thursday, March 29, 2012

What's the point of peer review (in reference to Marc Haber et al. 2012)?

I was just reading a new study in PLoS One about Y-DNA in Afghanistan, and stopped when I saw this idiotic paragraph...

R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where the mid-Holocene R1a1a7-M458 sublineage is dominant [46]. R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought [47], expansions from the Pontic Steppe [3], bringing the Indo-European languages to Central Asia and India.

OK, first of all, the authors made a glaring error in claiming that R1a1a7-M458 dominates the Pontic-Caspian Steppe. It most certainly does not, and this information is available in the report they referenced (Underhill et al. 2010). Secondly, who cares about R1a1a STR diversity? It's not relevant, because it tells us nothing about the origins of R1a1a. In fact, there's no way anyone can accurately estimate the ages/expansion times of Y-chromosome haplogroups. Scientists have attempted such feats on many occasions in recent years, and often came up with ridiculous results. So perhaps it's now time to admit there's a problem and move on?

I'd say there's no reason why R1a1a7-M458 should be present in Central Asia and India. Simply by looking at its modern distribution and frequencies, without even attempting any complex calculations, it seems to have expanded around East Central Europe well after the early Indo-European dispersals (see here). The most sensible claim anyone can make about R1a1a7-M458 is that it's a Slavic, or even West Slavic marker, that probably originated in the people who would become Slavs in or near modern-day Poland.

The European-specific R1a1a SNP that scientists should be looking for in Central Asia, in order to track the movements of the early Indo-Europeans, is Z280. This marker has a much wider distribution in Central and Eastern Europe than M458, and perhaps that was also the case during the relevant time frames - the Chalcolithic and early Bronze Age? Indeed, it has already been found in native Central Asian samples, both in private and academic tests, and the latter results will hopefully be published soon.

To widen the net, they should also test for Z283, which is upstream of both M458 and Z280. Its present distribution hints that it might have been a common marker within the late Neolithic Corded Ware cultural horizon of the North European Plain, which is usually thought of as an early Indo-European culture. If that was the case, then in theory, based on archeological data, it might have traveled with representatives of the Eastern Corded Ware, the Abashevo culture, past the Urals and as far as East Central Asia.

Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al., (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3):e34288. doi:10.1371/journal.pone.0034288

See also...

First official attempt to divide R1a1 into multiple subclades since the discovery of R-M458

Wednesday, March 7, 2012

Southwest Eurasians + Northwest Eurasians + Mesolithic survivors = modern Europeans

Update 23/12/2013: Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans


For a long time, it was generally accepted that Europeans were direct descendants of Palaeolithic settlers of the continent, with some Middle Eastern ancestry in the Mediterranean regions, courtesy of Neolithic farmers. However, in the last few years, largely thanks to ancient DNA, it dawned on most people that such a scenario was unrealistic. It now seems that Europe was populated after the Ice Age in a big way, by multiple waves of migrants from almost all directions, but especially from the southeast.

Getting to grips with the finer details of the peopling of Europe is going to be a difficult and painstaking process, and will require ancient DNA technology that probably isn’t even available at the moment. However, the mystery about the basic origins and genetic structure of Europeans was solved for me this week, after I completed a series of ADMIXTURE runs focusing on West Eurasia (see K=10, K=11, K=12, and K=13). The map below, produced by one of my project members, summarizes very nicely the most pertinent information from those runs (thanks FR7!). It shows the relative spread of three key genetic clusters, from the K=13, in a wide range of populations from Europe, North Africa, and West, Central and South Asia (i.e. the data represents the nature of West Eurasian alleles in the sampled groups, with only three clusters considered). The yellow cluster is best described as Mediterranean or Southwest Eurasian, while the cyan and magenta, which are sister clades, as Northwest Eurasian.

Thus, it appears as if modern Europeans are made up of two major Neolithic groups, which are related, but at some point became distinct enough to leave persistent signals of that split. They spread into different parts of Western Asia before moving into Europe. The Southwest Eurasians, possibly from the southern Levant, dominated the Mediterranean Basin, including North Africa, Southern Europe, and the Arabian Peninsula. I’m pretty sure that Otzi the Iceman is the best known representative of the ancient Southwest Eurasians (see here).

The Northwest Eurasians might have originated in the northern Levant, but that’s a pure guess. In fact, judging by the map above, their influence isn’t particularly strong in that part of the world today, and only becomes noticeable several hundred kilometers to the north and east, in the North Caucasus and Iran respectively. However, the northern Levant is actually dominated by a fourth West Eurasian cluster, tagged by me as "Caucasus" in the K=13 run, and not shown on the map above. Various calculations show that this can also be assigned to the Northwest Eurasian group, except that it seems to have split from the other Northwest Eurasian components at an early stage (see comments section here).

After their initial spread, it appears as if the Northwest Eurasians absorbed varying amounts of native Mesolithic groups in their newly acquired territories west, north and east of the Levant. This is being strongly suggested by the aforementioned ancient DNA results, at least as far as Europe is concerned. They also mixed heavily with Southwest Eurasians in Europe and nearby. That’s why, for instance, you’ll never find an Irishman who clusters closer genetically to an Indian than to other Europeans. However, even a basic analysis of their DNA, like my own ADMIXTURE runs, shows that a large subset of their genomes comes from the same, relatively recent, “Northwest Eurasian” source.

We can follow the same logic when talking about the differentiation between modern descendants of Southwest Eurasians. For instance, those in Iberia have significant admixture from Northwest Eurasians, while those in North Africa carry appreciable amounts of West and East African influence.

I’m convinced that the scenario of the peopling of Europe outlined above, by two basic stocks of migrants from Neolithic West Asia, is the only plausible one, because the signals from the data are too strong to argue against it. I’m sure you’ll be seeing the same story told by scientists over the next few years in peer-reviewed papers. They’ll probably come up with different monikers for the Southwest and Northwest Eurasians, but the general concepts will be the same.

However, that was the easy part. The hard part is linking the myriad of movements of these Southwest and Northwest Eurasians with archaeological and linguistic groups. Perhaps the earliest Southwest Eurasians into Europe were Afro-Asiatic speakers? To be honest, I have no idea, because that’s not an area I’ve studied closely. But I would say that it’s almost certain that the proto-Indo-Europeans were of Northwest Eurasian stock. It’s an obvious conclusion, due to the trivial to non-existent amounts of Southwest Eurasian influence in regions associated with the early Indo-Europeans, like Eastern Europe and Central Asia.

Perhaps the simplest and most diplomatic thing to do for the time being, would be to associate the entire Northwest Eurasian group with an early (Neolithic) spread of Indo-European languages from somewhere on the border between West Asia and Europe? I know that would work for a lot of people, specifically those who’d like to see an Indo-European urheimat in Asia, as opposed to Europe. But it wouldn’t work for me, especially not after taking a closer look at that map above.

As already mentioned, the Northwest Eurasians can be reliably split into two clusters, marked on the map in cyan and magenta. I call the cyan cluster North Atlantic, because it peaks among the Irish and other Atlantic fringe groups, and the magenta Baltic, because it shows the highest frequencies among Lithuanians and nearby populations. The story suggested by the map is pretty awesome, with the Baltic cluster seemingly exploding from somewhere in the middle of the Northwest Eurasian range, and pushing its close relatives to the peripheries of that range. Thus, under such a dramatic model, the North Atlantic is essentially the remnant of the pre-Baltic Northwest Eurasians, and appears to have found refuge in Western and Northwestern Europe, in the valleys of the Caucasus Mountains, and in South Asia.

Indeed, there seems to be a correlation between the highest relative frequencies of the North Atlantic and regions that are still home to non-Indo-European speakers, or were known to have been home to such groups in historic times. For instance, France has the Basques, while the British Isles had the Picts, who are hypothesized to be of non-Indo-European stock. Note also the native, non-Indo-European speakers in the Caucasus, like the Chechens, who show extreme relative frequencies of the North Atlantic component. Moreover, at the south-eastern end of the Northwest Eurasian range, in India, there are still many groups of Dravidian speakers.

Below are two maps that isolate the relative frequencies of the North Atlantic (cyan) and Baltic (magenta) components, versus each other and the Southwest Eurasian cluster, to better show the hole in the distribution of the North Atlantic. To be sure, this North Atlantic can be broken down further, but only with a more comprehensive sampling strategy, especially of Northern and Western Europe.

That’s my take on what the data is showing, and other explanations are possible. But I don’t really know what they might be. I should also mention that the potentially proto-Indo-European Baltic cluster shows a remarkable correlation with the spread of Y-chromosome haplogroup R1a, and ancient DNA rich in this haplogroup from supposed early Indo-Europeans. For more info on that, see the links below:

Best of 2008: Corded Ware DNA from Germany

Ancient Siberians carrying R1a1 had light eyes

Ancient Siberians carrying R1a1 had light eyes - take 2

Bronze Age Tarim Basin "Caucasoids" carried R1a1 (and European mtDNA lineages too)

European admixture among ancient East Asians (aka. two-rooted canines carried by early Indo-Europeans to China)

Tuesday, February 28, 2012

Oetzi the Iceman: more Middle Eastern than the average modern Euro

So, Oetzi the Iceman from the Copper Age Tyrolean Alps has turned out more Middle Eastern than the majority of present-day Europeans. You can see that result on the first PCA below (a), where Oetzi (black dot) is closer to the Middle Eastern samples than even most modern Italians (orange dots). Unfortunately, the article doesn't resolve why this is so. But one possibility is that almost all Europeans today, except those from the Mediterranean coastline, have more North European or North European-like ancestry than Oetzi, pushing them up and right on that PCA, away from the Middle East. In any case, this result makes it tough to argue that the ancestors of most modern Europeans (the Y-chromosome R1a and R1b crowd) arrived on the continent after Oetzi's kind (the Neolithic Y-chromosome G crowd). It appears as if they were already there, at the same time as the Iceman, and probably earlier, and then expanded down into South Europe later, leaving only more isolated areas, like Sardinia and Corsica, relatively untouched.

The image above of the figures + tables was edited by me to make it a little more informative than the original. Below is the abstract from the study, and here is the Iceman genome browser. Can anyone tell me where & how I can download this guy's SNPs, so I can make him a Eurogenes project member?

The Tyrolean Iceman, a 5,300-year-old Copper age individual, was discovered in 1991 on the Tisenjoch Pass in the Italian part of the Ötztal Alps. Here we report the complete genome sequence of the Iceman and show 100% concordance between the previously reported mitochondrial genome sequence and the consensus sequence generated from our genomic data. We present indications for recent common ancestry between the Iceman and present-day inhabitants of the Tyrrhenian Sea, that the Iceman probably had brown eyes, belonged to blood group O and was lactose intolerant. His genetic predisposition shows an increased risk for coronary heart disease and may have contributed to the development of previously reported vascular calcifications. Sequences corresponding to ~60% of the genome of Borrelia burgdorferi are indicative of the earliest human case of infection with the pathogen for Lyme borreliosis.

Keller et al., New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing, Nature Communications, Volume 3, Article number 698, doi:10.1038/ncomms1701

Sunday, February 26, 2012

Genetic substructures within the HapMap CEU sample (and Eurogenes' Northwest Europeans)

In this experiment I attempt to characterize more precisely the origins of some of the individuals from the HapMap CEU cohort. These samples are described by the HapMap project as Utah Americans of Western and Northern European descent. But this doesn't seem to be exactly true for at least two of them, who actually come out very Central European in all my tests. Moreover, it's obvious that some of the samples fit nicely into very specific areas of Western and Northern Europe. For instance, at this level of resolution, a few could pass as Irish, and others for Danes or even Swedes. Below is a quick and dirty ADMIXTURE analysis designed specifically for this experiment.

Key: Red = Sub-Saharan African, Yellow = Southern European, Green = North-Central European, Aqua = North Atlantic, Blue = Baltic, Pink = East Asian. See spreadsheet for details.

Based on the K=6 results it's fair to say that at least six of the CEU samples might pass for unmixed Scandinavians, most likely Danes or southern Swedes (NA12003, NA12057, NA12248, NA12249, NA12776 and NA12875). At least five could be confused for Irish or western British samples (NA10850, NA12005, NA12006, NA12386 and NA12812). The two Central European-like Utahns stick out from the CEU set due to their unusually high Baltic scores (NA11917 and NA12286). From the little I know about the CEU samples, I'd say that these two were of eastern or southeastern German origin. But they might have fairly recent ancestry from further east than that. My own MDS analysis (first image below) and a PCA plot from Lao et al. 2008 (second image, slightly edited by me to remove article text) confirm that such Scandinavian-like, German-like and Irish-like individuals do exist in the CEU set.
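If you'd like to try this sort of screening on your own run, here's a rough sketch. It relies on the fact that ADMIXTURE writes a plain-text .Q file with one row per individual and one column per cluster, in the same order as the individuals in the input file. Everything else is an assumption for illustration: the column labels (ADMIXTURE doesn't name its clusters, and their order varies from run to run), the sample IDs, the proportions, and the cut-offs are all hypothetical and not taken from my spreadsheet.

```python
# Assumed labeling of the six columns. ADMIXTURE itself outputs unlabeled
# columns in arbitrary order, so this has to be worked out per run.
COMPONENTS = [
    "Sub-Saharan African",
    "Southern European",
    "North-Central European",
    "North Atlantic",
    "Baltic",
    "East Asian",
]


def parse_q(q_text, sample_ids):
    """Pair each row of a .Q file with its sample ID and component labels."""
    rows = [line.split() for line in q_text.strip().splitlines()]
    if len(rows) != len(sample_ids):
        raise ValueError("number of Q rows does not match number of samples")
    table = {}
    for sample_id, row in zip(sample_ids, rows):
        if len(row) != len(COMPONENTS):
            raise ValueError(f"unexpected number of columns for {sample_id}")
        table[sample_id] = dict(zip(COMPONENTS, (float(x) for x in row)))
    return table


def flag_high(table, component, threshold):
    """Return the sample IDs whose score for a component meets the cut-off."""
    return sorted(
        sample_id
        for sample_id, scores in table.items()
        if scores[component] >= threshold
    )


# Made-up Q rows for three hypothetical samples.
q_text = """
0.00 0.10 0.45 0.35 0.10 0.00
0.00 0.15 0.30 0.25 0.30 0.00
0.00 0.05 0.30 0.60 0.05 0.00
"""
table = parse_q(q_text, ["S1", "S2", "S3"])

print(flag_high(table, "Baltic", 0.25))
print(flag_high(table, "North Atlantic", 0.50))
```

A single cut-off like this is crude. In practice it makes more sense to compare each sample against reference individuals of known origin from the same run.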

I think this experiment is very useful for a number of reasons. Firstly, it shows that the CEU set is not a homogeneous one, and carries clear substructures that can be picked up via fairly basic means. However, this doesn't make the CEU samples less valuable, but more so, due to the lack of public access to continental Northwestern European samples. Secondly, the test reveals some interesting information about the genetic substructures within Northwestern Europe. Here are some of my observations:

- Scandinavians often show very high levels of the North-Central European component, and moderately high levels of the North Atlantic component. Many also carry clear amounts of the Baltic component, but, as a rule, lower levels of the Southern European component.

- Germans mainly differ from the Scandinavians in that they carry the Southern European component at appreciable amounts. They show variable amounts of the Baltic component, with those from eastern Germany carrying the highest levels.

- Irish project members, especially those from western Ireland, show very high levels of the North Atlantic component, but low levels of the Southern European component.

- Western British samples, like those from Cornwall or western Scotland, are generally very similar to the Irish, mainly in that they carry the North Atlantic component at high levels. However, they often show somewhat higher levels of the Southern European component.

I'm eventually going to test these classifications of the CEU samples with ChromoPainter, which is by far the most accurate tool for such things at the moment. Unfortunately, it's also a lot of hard work and computationally intensive, so it might take a few weeks. I do have the allele frequencies from the above ADMIXTURE run, and it is possible to make a standalone test from them. However, I'm not certain that's a good idea at present, due to the small number of samples involved. It might be worth doing when the right samples swell in number, so I can run a more robust analysis. In particular, I need more people from Ireland, Scotland and Scandinavia.


Oscar Lao et al, Correlation between Genetic and Geographic Structure in Europe, Current Biology, Volume 18, Issue 16, 1241-1248, 26 August 2008, doi:10.1016/j.cub.2008.07.049

Sunday, January 22, 2012

Eurogenes' North Euro clusters - phase 2, final results

This is a continuation of my ChromoPainter analysis of Europeans from north of the Pyrenees, Alps and Balkans (see here). To obtain the most accurate results possible on my laptop, I increased the burn-ins and iterations in fineSTRUCTURE to 500K each (5 hour run in all, which is all I'm willing to put this machine through). The end product looks very similar to my initial analysis, in which I explored the data at 200K burn-ins and iterations. What I think this shows is that the results are robust, and I doubt they'd change much even after a couple of days of running fineSTRUCTURE.

Indeed, as mentioned in my previous blog entry, this appears to be the most detailed and accurate cluster analysis of this part of Europe produced anywhere to date. There are 21 clusters in all, with at least 20 looking like strong signals of genetic substructures across North, West, Central and East Europe (see spreadsheet for individual classifications). They include:

pop0 - West Finnish1: This is a pair of reference individuals, most likely from Western Finland, judging by their PCA and ADMIXTURE results. They are either from the same community, or have a very similar mix of very specific ancestries.

pop1 - Erzya + Moksha: This includes all of the Erzya and Moksha in the project, plus a Russian with recent Erzya ancestry. It's closely related to ethnic Russian clusters that stretch from Northwest Russia to near the Volga, and also to the Estonian cluster.

pop2 - South/Central Finnish: This is the largest Finnish cluster, and that's probably more than just the result of sampling bias. I would say that the greater part of the Finnish population would belong to this type of cluster, which occupies regions of highest population density within the country.

pop3 - Fenno-Scandian: This cluster includes a Northern Swede, a Swede with probable recent Finnish ancestry, and Finns with probable recent Swedish influence. I have a feeling that Finland Swedes and Aland Islanders would also be placed here more often than not.

pop4 - Northwest Russian/Southeast Finnish: Although this cluster includes only two individuals, it's definitely much more than just the result of two relatively closely related samples being in the same run. I'd hazard a guess that Northwest Russians with, say, significant Ingrian ancestry, would land here, and so would Finns with recent Russian ancestry.

pop5 - West Finnish2: Based on PCA and ADMIXTURE results, most of these Finns likely come from Western Finland, probably from places like Southern Ostrobothnia. They possibly also have some Swedish influence.

pop6 - West German: This cluster is based on individuals from Western and Northwestern Germany. It also includes a Dutchman, Austrian and people of mixed origin, like a Dane with French and German ancestry, and Americans with British, German, Scandinavian and/or Polish ancestry. In other words, this is where Northwestern Europe meets Central Europe.

pop7 - Vologda Russian: Most of the Vologda Russians from the HGDP land here, so this appears to be a local cluster. Judging from its phylogeny, it looks like a mix of North Slavic, Baltic and Finnic influences.

pop8 - East Finnish: All the project and reference Finns with substantial ancestry from new settlement areas of Eastern Finland appear in this cluster. No wonder then, that this is the cluster with the highest chunk count in this analysis.

pop9 - Estonian: This is a mixed cluster, including individuals from Estonia, and, as far as I know, Russians with substantial ancestry from near Estonia. As mentioned above, it's closely related to the Erzya + Moksha, Northwest Russian and Vologda clusters. However, it's clearly much more western than any of these clusters (for instance, see the PCA below), which suggests Germanic influence in its makeup.

pop10 - Cornish: Almost all of my Cornish samples from the 1000 Genomes Project feature in this very local cluster, which shows the highest chunk count among the Western European samples. The overall results suggest a lack of outbreeding in recent times.

pop11 - French/Belgian: Interestingly, this cluster includes the bulk of the French samples, a French Canadian, and two Belgians. On the other hand, the most northerly French are placed in the more cosmopolitan Northwest European cluster (see below).

pop12 - Lithuanian: All of the more or less pure Lithuanians fall in this cluster. Those that don't are a reference sample from Behar et al. 2009, who always appears very Belorussian-like in other analyses, and here sits in the East Slavic cluster, and a project member with recent German ancestry (LIT3). The Western European influence carried by the latter pushes him into the Polish/West Ukrainian cluster, despite not having any documented Polish or Ukrainian ancestry.

pop13 - Northwest Russian: This cluster appears to be made up of Russians who have more Finnic, and/or perhaps Eastern Baltic, ancestry than the individuals in the East Slavic cluster. In other words, it's more northerly, less westerly, and more closely related to the Finnic-speaking Erzya, Moksha and Estonians.

pop14 - Irish + West British: Most Irish individuals fall in this cluster, as well as British samples from Western Scotland and Wales. It's tempting to correlate this cluster with Celtic genetic ancestry in the Isles.

pop15 - South/West Scandinavian: This is basically a Norwegian and Southern Swedish cluster. It also features Swedes from other parts of the country who most likely have some German, Walloon and/or French influence.

pop16 - East German: This cluster includes individuals with significant or even overwhelming Germanic ancestry, but also with very clear Western Slavic input. One of the individuals here is of mixed Polish, German and Swedish ancestry, which pretty much sums up the character of this cluster in a modern context. The presence of two Hungarians from Behar et al. 2009 isn't surprising, because Hungary was settled by both Germanic and Western Slavic groups from the early Middle Ages until modern times.

pop17 - Northwest European: I had reasonable hopes of breaking up this large cluster into a couple of units at least. However, that did not happen, and I don't think it will unless I obtain more samples from the relevant areas of Europe, like Holland and specific parts of the UK. I think the main reason this cluster failed to budge was because of its cosmopolitan nature. In other words, the samples here include some of the most outbred in the analysis, and this, coupled with the fact that they carry very similar ancestral components, means that fineSTRUCTURE doesn't have anything to latch onto to create divisions.

pop18 - East Scandinavian: This could also be called a Swedish cluster. It's almost entirely made up of Swedes, usually from Eastern or Southeastern Sweden, and/or occasionally with recent Finnish influence.

pop19 - Polish/West Ukrainian: The vast majority of the Poles fall in this cluster, and about half of the Ukrainians from Yunusbayev et al. 2011. Most of these Ukrainians appear to be from the Lviv district in the west, and some might even have fairly recent Polish and/or German ancestry. In fact, I would say the latter is a good bet for UkrLv240Y, who shows large Western European segments on several chromosomes.

pop20 - East Slavic: All of the Belorussians cluster here, and so do Russians from near Belorussia and Ukraine, and almost half of the Ukrainians from Yunusbayev et al. 2011 (those who show more easterly genetic characteristics). An individual of mixed Polish and Lithuanian ancestry also makes an appearance here, suggesting that one of the main factors differentiating this cluster from the Polish/West Ukrainian group is a higher level of Baltic admixture in the former.

pop21 - East Central European: This cluster is based on most of the Hungarians in my dataset, but it also includes a number of Western and Southern Slavs, often with significant German ancestry. Not surprisingly, this cluster shows very high affinity with both the East German and Polish/West Ukrainian clusters.

Let's now move on to some graphics. Below, in order of appearance, are the following: raw data coancestry matrix, showing the placement of individual samples; aggregate coancestry matrix, showing the populations (or clusters) described above; pairwise coincidence matrix, which is useful for spotting very recent ancestral ties; a PCA plot of the 21 clusters. More detailed ChromoPainter/fineSTRUCTURE PCAs of Western Europe can be found at this link.
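To give a feel for how the raw and aggregate matrices relate to each other, here's a small sketch. It assumes that the aggregate value for a pair of clusters is just the average of the individual-level values over all pairs of samples drawn from those two clusters, with self-pairs on the diagonal skipped. That's a simplification for illustration, and not necessarily what fineSTRUCTURE does internally. The matrix, the labels and the function name are made up.

```python
from collections import defaultdict


def aggregate_coancestry(matrix, labels):
    """Average an individual-level coancestry matrix down to cluster level.

    matrix: square list of lists; matrix[i][j] is the value for the
            pair made up of individual i (row) and individual j (column)
    labels: cluster label for each individual, in matrix order
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i == j:
                continue  # skip the diagonal (an individual paired with itself)
            key = (labels[i], labels[j])
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up values for four individuals in two hypothetical clusters.
labels = ["A", "A", "B", "B"]
matrix = [
    [0, 10, 2, 4],
    [12, 0, 3, 3],
    [2, 2, 0, 20],
    [4, 4, 18, 0],
]

aggregate = aggregate_coancestry(matrix, labels)
for (row_cluster, col_cluster), mean in sorted(aggregate.items()):
    print(row_cluster, col_cluster, mean)
```

In this toy example the within-cluster averages are much higher than the between-cluster ones, which is the kind of block pattern that stands out along the diagonal of a heat map.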

Finally, those of you who wish to run your own experiments with the ChromoPainter datasheets from this analysis can download them here. Please note, the sheets don't reveal any raw or traits/disease data.