Friday, July 3, 2015
ADMIXTURE analysis of Allentoft et al. and Haak et al. ancient genomes
I haven't had a chance to study the output in detail yet, and I don't know what the cross-validation errors are for each of these unsupervised runs, but I'd say they all look pretty good. A Principal Component Analysis (PCA) of some of the K=10 data, showing how present-day Armenians compare to two Bronze Age Armenians, can be seen here.
By the way, the analysis is based on the Human Origins fully public dataset available at the Reich lab website here.
To reduce errors, I limited the markers to transversion SNPs, and only kept samples with a call rate of at least 20%. This left 113K SNPs and 101 ancient genomes: 47 from Allentoft et al., 36 from Haak et al., and 18 from other recent papers. I didn't thin the markers to correct for LD, because in my experience this often results in less accurate outcomes.
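For anyone wanting to try a similar filter, here is a minimal Python sketch of the two steps described above: keeping only transversion SNPs, then dropping samples below a 20% call rate. It is an illustration, not the pipeline actually used for this analysis. The function names and the in-memory data layout (a list of SNP tuples and a dict of per-sample genotype lists, with None for a missing call) are hypothetical, and computing the call rate over the retained transversion markers is an assumption.

```python
# Hypothetical helpers; the data layout below is assumed for illustration only.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(allele1, allele2):
    """Return True when one allele is a purine and the other a pyrimidine."""
    a1, a2 = allele1.upper(), allele2.upper()
    return (a1 in PURINES and a2 in PYRIMIDINES) or (
        a1 in PYRIMIDINES and a2 in PURINES
    )


def call_rate(genotypes):
    """Fraction of genotypes that are not missing (None marks a missing call)."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def filter_dataset(snps, samples, min_call_rate=0.20):
    """Keep transversion SNPs, then keep samples meeting the call-rate threshold.

    snps:    list of (snp_id, allele1, allele2) tuples
    samples: dict mapping sample name to a genotype list aligned with snps
    """
    # Indices of the markers that survive the transversion filter.
    keep = [i for i, (_, a1, a2) in enumerate(snps) if is_transversion(a1, a2)]
    kept_snps = [snps[i] for i in keep]

    kept_samples = {}
    for name, genotypes in samples.items():
        subset = [genotypes[i] for i in keep]
        # Assumption: call rate is measured on the retained markers only.
        if call_rate(subset) >= min_call_rate:
            kept_samples[name] = subset
    return kept_snps, kept_samples
```

In practice the same filtering would normally be done on the genotype files directly with dedicated tools rather than in pure Python; the sketch is only meant to make the logic explicit.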