HHS Public Access
Author manuscript
Science. Author manuscript; available in PMC 2019 October 31.
Published in final edited form as:
Science. 2019 September 06; 365(6457): . doi:10.1126/science.aat7487.
The Formation of Human Populations in South and Central Asia
A full list of authors and affiliations appears at the end of the article.
Abstract
By sequencing 523 ancient humans, we show that the primary source of ancestry in modern South
Asians is a prehistoric genetic gradient between people related to early hunter-gatherers of Iran
and southeast Asia. Following the Indus Valley Civilization’s decline, they mixed with people in
the southeast to form one of the two main ancestral populations of South Asia whose direct
descendants live in southern India. Simultaneously, they mixed with descendants of Steppe
pastoralists who spread via Central Asia after 4000 years ago to form the other main ancestral
population. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern
Europe, tracking a movement of people that affected both regions and that likely spread the unique
features shared between Indo-Iranian and Balto-Slavic languages.
Graphical Abstract
Correspondence to: V.N. (vagheesh@mail.harvard.edu), N.P. (nickp@broadinstitute.org), or D.Re. (reich@genetics.med.harvard.edu).
*These authors contributed equally to this work
+Co-directed this work
‡Present addresses: Department of Anthropology, University of California, Santa Cruz, CA 95064, USA (N.Br.); Department of
Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA (J.O.); Department of Human Evolutionary Biology,
Harvard University, Cambridge MA, 02138, USA and Max Planck-Harvard Research Center for the Archaeoscience of the Ancient
Mediterranean, Cambridge, MA 02138, USA (M.Mi.)
Author contributions: N.P., P.M., N.Ro., M.Me., N.Bo., K.Th., D.Ken., M.Fr., R.Pi. and D.Re. supervised the study. A.Ki., L.O.,
A.C., M.V., J.Ma., V.M., E.Ki., J.Mo., G.A., A.Baga., A.Bagn., B.B., J.B., A.Biss., G. B., T.Cha, T.Chi, P.D., A.D., M.Do., K.D., N.D.,
M.Du., D.E., A.E., S.F., A.F., D.Fu., A.Go., A.Gr., S.G., B.H., M.J., E.Ka., A.Kh., A.Kr., E.Ku., P.K., D.L., F.M., A. M., T.M., C.M.,
D.M., R.M., O.M., S.Mu., A.N., D.P., R.Po., D.Ra., M.R., S.Sa., T.S., K.Sik., S.Sl., O. S., N.S., S.Sv., K.Ta., M.T., A.T., V.T., S.V.,
P.V., D.V., A.Y., M.Z., V. Z, A.Z., V.Sh., C.L., D.A., N.Bo., M.Fr., and R.Pi. provided samples and assembled archaeological and
anthropological information. V.N., N.P., P.M., N.Ro., R.B., S.Ma., I.L., N.N., I.O., M.L., N.Ad., N.A., N.Br., F.C., O.C., B.C., M.Fe.,
D.Fe., B.Ga., D.G., M.H., E.H., T.H., D.Kea., A. L., M.Ma., K.M., M.Mi., M.N., J.O., N.Ra., K.Sir., V.Sl., K.Ste., Z.Z., M.Me., and
D.Re. performed ancient DNA laboratory work or radiocarbon laboratory work or data processing work. V.N., N.P., P.M., I.O., N.Al.,
S.M. and D.R. analyzed genetic data. V.N., N.P., and D.R. wrote the manuscript with input from all coauthors.
Competing interests: The authors declare no competing interests.
Data and materials availability: All newly reported sequencing data are available from the European Nucleotide Archive, accession
number PRJEB32466, and the software for dating admixture events in ancient samples is available at
https://zenodo.org/record/3263997#.XRnebJNKj6A (DOI: 10.5281/zenodo.3263997). The Online Data Visualizer is available at:
https://public.tableau.com/views/TheGenomicFormationofSouthandCentralAsia/Fig_1.
Supplementary Materials:
Materials and Methods
Online Data Visualizer for exploring data: https://public.tableau.com/views/TheGenomicFormationofSouthandCentralAsia/Fig_1.
Tables S1-S93
Figures S1-S61
Genotypes for newly reported individuals
Narasimhan et al.
The Bronze Age spread of Yamnaya steppe pastoralist ancestry into two subcontinents, Europe
and South Asia. Pie charts reflect the proportion of Yamnaya ancestry, and dates reflect the earliest
available ancient DNA with Yamnaya ancestry in each region. There is no ancient DNA yet for the
ANI and ASI, so for these the range is inferred statistically.
One Sentence Summary:
Genome-wide ancient DNA from 523 ancient individuals sheds light on genetic exchanges
between the Steppe, Iran and South Asia, and highlights the parallel demographic histories of two
subcontinents: Europe and South Asia.
One Page Summary
Introduction and Rationale: To elucidate the extent to which the major cultural
transformations of farming, pastoralism and shifts in the distribution of languages in Eurasia were
accompanied by movement of people, we report genome-wide ancient DNA data from 523
individuals spanning the last 8000 years mostly from Central Asia and northernmost South Asia.
Results: Movements of people following the advent of farming resulted in genetic gradients
across Eurasia that can be modeled as mixtures of seven deeply divergent populations. A key
gradient formed in southwestern Asia beginning in the Neolithic and continuing into the Bronze
Age, with more Anatolian farmer-related ancestry in the west and more Iranian farmer-related
ancestry in the east. This cline extended to the desert oases of Central Asia and was the primary
source of ancestry in peoples of the Bronze Age Bactria Margiana Archaeological Complex
(BMAC). This supports the idea that the archaeologically documented dispersal of domesticates
was accompanied by the spread of people from multiple centers of domestication.
The main population of the BMAC carried no ancestry from Steppe pastoralists and did not
contribute substantially to later South Asians. However, Steppe pastoralist ancestry appeared in
outlier individuals at BMAC sites by the turn of the second millennium BCE around the same time
as it appeared on the southern Steppe. Using data from ancient individuals from the Swat Valley of
northernmost South Asia, we show that Steppe ancestry then integrated further south in the first
half of the second millennium BCE, contributing up to 30% of the ancestry of modern groups in
South Asia. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern
Europe, tracking a movement of people that affected both regions and that likely spread the unique
features shared between Indo-Iranian and Balto-Slavic languages.
The primary ancestral population of modern South Asians is a mixture of people related to early
Holocene populations of Iran and South Asia that we detect in outlier individuals from two sites in
cultural contact with the Indus Valley Civilization (IVC), making it plausible that it was
characteristic of the IVC. After the IVC’s decline, this population mixed with northwestern groups
with Steppe ancestry to form the “Ancestral North Indians” (ANI) and with southeastern groups to
form the “Ancestral South Indians” (ASI) whose direct descendants live today in tribal groups in
southern India. Mixtures of these two post-IVC groups (the ANI and ASI) drive the main gradient
of genetic variation in South Asia today.
Conclusion: Earlier work recorded massive population movement from the Steppe into Europe
early in the 3rd millennium BCE, likely spreading Indo-European languages. We reveal a parallel
series of events leading to the spread of Steppe ancestry to South Asia, thereby documenting
movements of people that were likely conduits for the spread of Indo-European languages.
Introduction
The past ten thousand years have witnessed profound economic changes driven by the
transition from foraging to food production, and have also witnessed dramatic changes in
cultural practice evident from archaeology, the distribution of languages, and the written
record. The extent to which these changes were associated with movements of people has
been a mystery in Central Asia and South Asia in part because of a paucity of ancient DNA.
We report genome-wide data from 523 individuals from Central Asia and northernmost
South Asia from the Mesolithic period onward (1), and co-analyze them with previously
published ancient DNA from across Eurasia and with data from diverse present-day people.
In Central Asia, we studied the extent to which the spread of farming and herding practices
from the Iranian plateau to the desert oases south of the Steppe was accompanied by
movements of people or adoption of ideas from neighboring groups (2–4). For the urban
communities of the Bactria Margiana Archaeological Complex (BMAC) in the Bronze Age,
we assessed whether the people buried in its cemeteries descended directly from earlier
smaller scale food producers, and also documented their genetic heterogeneity (5). Further
to the north and east, we showed that the Early Bronze Age spread of crops and
domesticated animals between southwest Asia and eastern Eurasia along the Inner Asian
Mountain Corridor (6) was accompanied by movements of people. Finally, we examined
when descendants of the Yamnaya, who spread across the Eurasian Steppe beginning around
3300 BCE (7–9), began to appear in Central Asia south of the Steppe.
In northernmost South Asia, we report a time transect of more than one hundred individuals
beginning ~1200 BCE, which we co-analyze along with modern data from hundreds of
present-day South Asian groups, as well as ancient DNA from neighboring regions (10).
Previous analyses place the majority of present-day South Asians along a genetic cline (11)
that can be modeled as having arisen from mixture of two highly divergent populations after
4000 years ago: the Ancestral North Indians (ANI) who harbor large proportions of ancestry
related to West Eurasians, and the Ancestral South Indians (ASI) who are much less closely
related to West Eurasians (12). We leveraged ancient DNA to place constraints on the
genetic structure of the ANI and ASI and, in conjunction with other lines of evidence, to
make inferences about when and where they formed. By modeling modern South Asians
along with ancient individuals from sites in cultural contact with the IVC, we inferred a
likely genetic signature for people of the Indus Valley Civilization (IVC) which reached its
maturity in northwestern South Asia 2600–1900 BCE. We also examined when Steppe
pastoralist-derived ancestry (9) mixed with groups in South Asia, and placed constraints on
whether Steppe-related ancestry or Iranian-related ancestry is more plausibly associated with
the spread of Indo-European languages in South Asia.
Dataset and Analysis Strategy
We generated whole-genome ancient DNA data from 523 previously unsampled ancient
individuals and increased the quality of data from 19 previously sequenced individuals. The
individuals derive from three broad geographical regions: 182 from Iran and the southern
part of Central Asia that we call Turan (present-day Turkmenistan, Uzbekistan, Tajikistan,
Afghanistan and Kyrgyzstan), 209 from the Steppe and northern forest zone mostly in
present-day Kazakhstan and Russia, and 132 from northern Pakistan. The ancient
individuals are from 1) Mesolithic, Copper, Bronze and Iron Age Iran and Turan (12000–1
BCE from 19 sites) including the Bactria Margiana Archaeological Complex (BMAC); 2)
early ceramic-using hunter-gatherers from the western Siberian forest zone who we show
represent a point along an early Holocene cline of North Eurasians and who emerge as a
valuable source population for modeling the ancestry of Central and South Asians (6400–
3900 BCE from 2 sites); 3) Copper Age and Bronze Age pastoralists from the central
Steppe, including from Bronze Age Kazakhstan (3400–800 BCE from 56 sites); and 4)
northernmost South Asia, specifically Late Bronze Age, Iron Age and historical settlements
in the Swat and Chitral districts of present-day Pakistan (~1200 BCE–1700 CE from 12
sites) (Fig. 1, Table S1, (1, 13)). We prepared samples in dedicated clean rooms, extracted
DNA (14, 15), and constructed libraries for Illumina sequencing (16, 17). We enriched the
libraries for DNA overlapping around 1.2 million single nucleotide polymorphisms (SNPs)
(7, 18, 19), sequenced the products on Illumina instruments, and performed quality control
(Table S2) (7, 19, 20). Our final dataset after merging with previously reported data (7–9, 16,
18, 19, 21–31) spans 837 ancient individuals that passed all our analysis filters, which
included removing individuals determined genetically to be first-degree relatives of other
higher coverage individuals (Table S3), and restricting to the 92% of individuals (Table S1)
that were represented by at least 15,000 of the targeted SNPs which we found was the
number at which we began to be able to reliably estimate proportions of the deeply divergent
ancestry sources. The median number of SNPs analyzed per individual was 617,000. We
also merged with previously reported whole genome sequencing data from 686 present-day
individuals (Table S1), and co-analyzed with 1,789 present-day people from 246
ethnographically-distinct groups in South Asia genotyped at ~600,000 SNPs (Table S5; (13))
(10, 32, 33).
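The per-individual filters described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the individual IDs, and the toy SNP counts are hypothetical, and only the 15,000-SNP cutoff and the exclusion of lower-coverage first-degree relatives come from the text.

```python
from statistics import median

MIN_SNPS = 15_000  # per-individual coverage cutoff stated in the text


def filter_individuals(snp_counts, relatives_to_drop=frozenset(), min_snps=MIN_SNPS):
    """Return {id: count} for individuals that pass both filters.

    snp_counts: dict mapping a (hypothetical) individual ID to the number
        of targeted SNPs covered for that individual.
    relatives_to_drop: IDs flagged as the lower-coverage member of a
        first-degree relative pair.
    """
    return {
        ind: n
        for ind, n in snp_counts.items()
        if n >= min_snps and ind not in relatives_to_drop
    }


# Toy data, invented for illustration only.
counts = {"I0001": 617_000, "I0002": 9_500, "I0003": 240_000, "I0004": 15_000}
kept = filter_individuals(counts, relatives_to_drop={"I0003"})

print(sorted(kept))           # IDs that survive both filters
print(median(kept.values()))  # median SNP count among retained individuals
```

The cutoff is inclusive here (an individual with exactly 15,000 SNPs is kept), which matches the phrase "at least 15,000" in the text.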
We grouped individuals based on archaeological and chronological information, taking
advantage of 269 direct radiocarbon dates generated on skeletal material from the
individuals from whom we report DNA (Table S4). We further clustered individuals that
were genetically indistinguishable within these groupings, and labeled outliers with ancestry
that was significantly different from others at the same site and time period (13). For our
primary analyses, we did not include individuals that were the sole representatives of their
ancestry profiles, thereby reducing the chance that our conclusions were being driven by
single individuals with contaminated DNA or misattributed archaeological context. This also
ensured that each major analysis grouping was represented by many more SNPs than our
minimum cutoff of 15,000 per individual. Thus, all but one analysis cluster included at least
one individual covered by >200,000 SNPs, sufficient to support high resolution analysis of
population history (19) (the exception is a pair of genetically similar outliers from the site of
Gonur who are not the focus of any main analyses). We use italic font to refer to genetic
groupings and plain font to indicate archaeological cultures or sites.
To make inferences about population structure, we began by carrying out principal
component analysis (PCA) projecting ancient individuals onto the patterns of genetic
variation in present-day Eurasians, a procedure that allowed us to obtain meaningful
constraints on ancestry even of ancient individuals with limited coverage because each SNP
from each individual can be compared to a large reference data set (34–36). This revealed
three major clusters strongly correlating to the geographic regions of the Forest Zone/
Steppe, Iran/Turan, and South Asia (Fig. 1), a pattern we replicate in ADMIXTURE
unsupervised clustering (37). To test if groups of ancient individuals were heterogeneous in
their ancestry, we used f4-statistics to measure whether different partitions of these groups
into two subgroups differed in their degree of allele sharing to a third group (using a
distantly related outgroup as a baseline). We also used f3-statistics to test for admixture (33).
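The two statistics just mentioned can be illustrated from per-SNP allele frequencies. The sketch below is a simplified version under stated assumptions: it omits the finite-sample bias correction that f3 normally requires and the block-jackknife standard errors used to judge significance, the argument ordering is one common convention rather than necessarily the one used in this study, and all frequencies are invented toy values.

```python
def f4(a, b, c, d):
    """Mean over SNPs of (a - b) * (c - d) for four allele-frequency lists."""
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)


def f3(target, src1, src2):
    """Mean over SNPs of (target - src1) * (target - src2).

    A clearly negative value is evidence that the target is admixed
    between populations related to src1 and src2.
    """
    return sum((t - s1) * (t - s2) for t, s1, s2 in zip(target, src1, src2)) / len(target)


# Toy allele frequencies at four SNPs (invented for illustration).
src1 = [0.1, 0.9, 0.2, 0.8]
src2 = [0.9, 0.1, 0.8, 0.2]
outgroup = [0.5, 0.5, 0.5, 0.5]
target = [0.5 * (x + y) for x, y in zip(src1, src2)]  # 50/50 mixture of the sources

# Admixture test: negative because the target lies between the two sources.
print(f3(target, src1, src2))

# Heterogeneity test: two subgroups with identical frequencies share alleles
# equally with a third group, so the statistic is zero.
print(f4(src1, src1, src2, outgroup))
```

In practice a group would be judged heterogeneous only if the f4 value differed from zero by several standard errors, which this sketch does not compute.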
To model the ancestry of each group, we used qpAdm, which evaluates whether a tested
group is consistent with deriving from a pre-specified number of source populations (relative
to a set of outgroups), and if so estimates proportions of ancestry (7). We first used qpAdm
to attempt to model groups from the Copper Age and afterward as a mixture of seven
“distal” sources related deeply to pre-Copper Age or distantly related modern populations
for which we have data (Box 1). In this paper we use the term ‘farmers’ to refer to people
who either cultivated crops, or herded animals, or both; this definition covers not only large
settled communities but also smaller and probably less sedentary communities like the early
herders of the Zagros Mountains of western Iran from the site of Ganj Dareh. The latter kept
domesticated animals but did not cultivate crops, and are a key reference population for this
study as they had a distinctive ancestry profile that spread widely after the Neolithic (9, 24,
38). We also identified “proximal” models for each group as mixtures of temporally
preceding groups (10). We implemented an algorithm, DATES, for estimating the age of
population mixtures by measuring the average size of segments of ancestry derived from the
admixing populations, an approach whose reliability we verified by computer simulation
(10) and that is an improvement relative to methods not optimized for analysis of ancient
DNA (33, 39) (the approach’s robustness reflects the fact that it relies for its molecular clock
on the accurately measured rate of meiotic recombination in humans (40)). In Box 2, we
summarize the findings of these analyses (we use the same headings in Box 2 and the main
text to allow cross-referencing), while the Online Data Visualizer (1) allows an interactive
exploration of the data.
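To illustrate why recombination can act as a clock for admixture dating, the sketch below fits an exponential decay to a curve of ancestry covariance versus genetic distance. It is not the DATES implementation: it assumes a single pulse of admixture, a covariance that decays as A·exp(−t·d) with d in Morgans and t in generations (the model commonly used by methods of this family), and noiseless synthetic data. The function name and all numbers are hypothetical.

```python
import math


def fit_admixture_generations(distances_morgans, covariances):
    """Least-squares fit of log(cov) = log(A) - t * d; returns t in generations."""
    xs = list(distances_morgans)
    ys = [math.log(c) for c in covariances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the decay rate equals generations since admixture


# Synthetic decay curve: amplitude 0.05, admixture 71 generations ago,
# sampled every 0.1 cM out to 20 cM (values chosen only for illustration).
true_generations = 71
dists = [0.001 * k for k in range(1, 201)]
curve = [0.05 * math.exp(-true_generations * d) for d in dists]

estimate = fit_admixture_generations(dists, curve)
print(round(estimate, 3))
```

Older admixture produces shorter ancestry segments and therefore a faster decay, which is why the fitted rate translates directly into a number of generations. Real data are noisy and can yield negative covariance values, so a production method would fit the exponential directly rather than taking logarithms.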
Iran and Turan
A West-to-East Cline of Decreasing Anatolian Farmer-Related Ancestry
We studied the genetic transformations accompanying the spread of agriculture eastward
from Iran beginning in the 7th millennium BCE (41, 42). We replicate previous findings that
9th to 8th millennium BCE herders from the Zagros Mountains of western Iran harbored a
distinctive West Eurasian-related ancestry profile (9, 27), while later groups across a broad
region were admixed between this ancestry and that related to early Anatolian farmers. Our
analysis reveals a west-to-east cline of decreasing Anatolian farmer-related admixture in the
Copper Age and Bronze Age ranging from ~70% in Anatolia to ~31% in eastern Iran, to
~7% in far eastern Turan (Fig. 1; (13), Fig. S10, Table S8–S16). This suggests that the
archaeologically documented spread of a shared package of plants and animal domesticates
from diverse locations across this region was accompanied by bi-directional spread of
people and mixture with the local groups they encountered (3, 41, 43, 44). We call this the
Southwest Asian Cline. In the far east of the Southwest Asian Cline (eastern Iran and Turan)
in individuals from the 3rd millennium BCE, we detect not only the smallest proportions of
Anatolian farmer-related admixture but also admixture related to West Siberian Hunter
Gatherers (WSHG) (plausibly reflecting admixture from unsampled hunter-gatherer groups
who inhabited this region prior to the spread of Iranian farmer-related ancestry into it). This
shows that North Eurasian-related ancestry impacted Turan well before the spread of
descendants of Yamnaya Steppe pastoralists into the region. We can exclude the possibility
that the Yamnaya were the source of this North Eurasian-related ancestry, as they had more
EEHG- than WSHG-related ancestry, and also carried high frequencies of mitochondrial
DNA haplogroup type U5a as well as Y chromosome haplogroup types R1b or R1a not
represented in Iran and Turan in this period ((13), Table S93-S94).
People of the BMAC Were Not a Major Source of Ancestry for South Asians
From Bronze Age Iran and Turan, we obtained genome-wide data for 84 ancient individuals
(3000–1400 BCE) who lived in four urban sites of the Bactria Margiana Archaeological
Complex (BMAC) and its immediate successors. The great majority of these individuals fall
in a cluster genetically similar to the preceding groups in Turan, consistent with the
hypothesis that the BMAC coalesced from preceding pre-urban populations (5). We infer
three primary genetic sources: early Iranian farmer-related ancestry (~60–65%), and smaller
proportions of Anatolian farmer- (~20–25%) and WSHG-related ancestry (~10%). Unlike
preceding Copper Age individuals from Turan, people of the BMAC cluster also harbored an
additional 2–5% ancestry related (deeply in time) to Andamanese Hunter-Gatherers (AHG).
This evidence of south-to-north gene flow from South Asia is consistent with the
archaeological evidence of cultural contacts between the Indus Valley Civilization and the
BMAC and the existence of an IVC trading colony in northern Afghanistan (although we
lack ancient DNA from that site) (45), and stands in contrast to our qpAdm analyses
showing that a reciprocal north-to-south spread is undetectable. Specifically, our analyses
reject the BMAC and the people who lived before them in Turan as plausible major sources
of ancestry for diverse ancient and modern South Asians by showing that their ratio of
Anatolian farmer-related to Iranian farmer-related ancestry is too high for them to be a
plausible source for South Asians (p<0.0001, χ² test; (13), Fig S50–S51). A previous study
(26) fit a model in which a population from Copper Age Turan was used as a source of the
Iranian farmer-related ancestry in present-day South Asians, thus raising the possibility that
the people of the BMAC (whom the authors correctly hypothesized were primarily derived
from the groups that preceded them in Turan) were a major source population for South
Asians. However, that study only had access to 2 samples from this period compared to the
36 we report with this study, and it lacked ancient DNA from individuals from the BMAC
period or from any ancient South Asians. With additional samples, we have the resolution to
show that none of the large number of Bronze and Copper Age populations from Turan for
which we have ancient DNA fit as a source for the Iranian farmer-related ancestry in South
Asia.
Steppe Pastoralist-Derived Ancestry Arrived in Turan by 2100 BCE
Our large sample sizes from Central Asia, including individuals from BMAC sites, are a
particular strength of this study, allowing us to detect outlier individuals with ancestry
different from those living at the same time and place, and revealing cultural contacts that
would be otherwise difficult to appreciate (Fig. 2). Around 2300 BCE, we observe three
outliers in BMAC-associated sites carrying WSHG-related ancestry and we report data from
the third millennium BCE from three sites in Kazakhstan and one in Kyrgyzstan that fit as
sources for them (related ancestry has been found in ~3500 BCE Botai culture individuals
(26)). Yamnaya-derived ancestry arrived by 2100 BCE, since from 2100–1700 BCE we
observe outliers from three BMAC-associated sites carrying ancestry ultimately derived
from Western_Steppe_EMBA pastoralists, in the distinctive admixed form typically carried
by many Middle to Late Bronze Age Steppe groups (with roughly two thirds of the ancestry
being of Western_Steppe_EMBA origin, and the rest consistent with deriving from
European farmers). Thus, our data document a southward movement of ancestry ultimately
descended from Yamnaya Steppe pastoralists that spread into Central Asia by the turn of the
2nd millennium BCE.
An Ancestry Profile Widespread During the Indus Valley Civilization
We document 11 outliers—3 with radiocarbon dates between 2500–2000 BCE from the
BMAC site of Gonur, and 8 with radiocarbon dates or archaeological context dates between
3300 BCE to 2000 BCE from the eastern Iranian site of Shahr-i-Sokhta—that harbored
elevated proportions of AHG-related ancestry (range of 11–50%) and the remainder from a
distinctive mixture of Iranian farmer- and WSHG-related ancestry (~50–89%). These
outliers had no detectable Anatolian farmer-related ancestry, in contrast with the main
BMAC (~20–25% Anatolian-related) and Shahr-i-Sokhta (~16–21%) clusters, allowing us to
reject both the BMAC and Shahr-i-Sokhta main clusters as sources for them (p<10⁻⁷, χ²
test; (13), Table S83). Without ancient DNA from individuals buried in IVC cultural
contexts, we cannot make a definitive statement that the genetic gradient represented by
these 11 outlier individuals, which we call the Indus Periphery Cline, was also an ancestry
profile common in the IVC. Nevertheless, our result provides six circumstantial lines of
evidence for this hypothesis. (i) These individuals had no detectable Anatolian farmer-related ancestry, suggesting they descend from groups further east along the Anatolia-to-Iran
cline of decreasing Anatolian farmer-related ancestry than any individuals we sampled from
this period. (ii) All 11 outliers had elevated proportions of AHG-related ancestry, and two
carried Y chromosome haplogroup H1a1d2 which today is primarily found in southern
India. (iii) At both Gonur and Shahr-i-Sokhta there is archaeological evidence of exchange
with the IVC (46, 47), and all the outlier individuals we directly dated fall within the time
frame of the mature IVC. (iv) Several outliers at Shahr-i-Sokhta were buried with artifacts
stylistically linked to Baluchistan in South Asia whereas burials associated with the other
ancestries did not have these linkages (13). (v) In our modeling, the 11 outliers fit as a
primary source of ancestry for 86 ancient individuals from post-IVC cultures living near the
headwaters of the Indus River ~1200–800 BCE as well as diverse present-day South Asians,
whereas no other ancient genetic clusters from Turan fit as sources for all these groups ((13),
Fig S50). (vi) The estimated date of admixture between Iranian farmer-related and AHG-related ancestry in the outliers is several millennia before the time they lived (71 ± 15
generations, corresponding to a 95% confidence interval of ~5400–3700 BCE assuming 28
years per generation) (13, 48). Thus, AHG- and Iranian farmer-related groups were in contact
well before the time of the mature IVC at ~2600–1900 BCE as might be expected if the
ancestry gradient was a major feature of a group that was living in the Indus Valley during
the IVC.
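The conversion from an admixture estimate in generations to a calendar interval, as in point (vi), can be sketched as below. The sketch treats ±15 as one standard error and uses a normal 95% interval (z = 1.96); both are assumptions. The reference date of 2600 BCE for when the outlier individuals lived is a hypothetical placeholder chosen because it approximately reproduces the interval quoted in the text; the actual calculation may use individual-specific dates.

```python
def admixture_date_interval(generations, se_generations,
                            years_per_generation=28.0, z=1.96):
    """Return (point, lower, upper) in years before the sampled individuals lived."""
    point = generations * years_per_generation
    half_width = z * se_generations * years_per_generation
    return point, point - half_width, point + half_width


def years_before_to_bce(years_before, sample_date_bce):
    """Shift an offset (years before the sample) onto the BCE scale."""
    return sample_date_bce + years_before


point, lower, upper = admixture_date_interval(71, 15)

SAMPLE_DATE_BCE = 2600  # ASSUMPTION: placeholder date for the outlier individuals

oldest = years_before_to_bce(upper, SAMPLE_DATE_BCE)
youngest = years_before_to_bce(lower, SAMPLE_DATE_BCE)

print(f"{point:.0f} years before the individuals lived")
print(f"~{oldest:.0f} to {youngest:.0f} BCE")
```

Under these assumptions the point estimate falls roughly two millennia before the individuals lived, so the mixture predates the mature IVC even at the young end of the interval.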
The Steppe and Forest Zone
Ancestry Clines in Eurasia Established After the Advent of Farming
The late hunter-gatherer individuals from northern Eurasia lie along a west-to-east hunter-gatherer gradient of increasing relatedness to East Asians (Fig. 3). In the Neolithic and
Copper Ages, hunter-gatherers at different points along this cline mixed with people with
ancestry at different points along a southern cline to form five later clines, two of which
were in the south (the Southwest Asian Cline and the Indus Periphery Cline which are
described in the previous section), and three of which were in northern Eurasia (Fig. 3).
Furthest to the west in the Steppe and Forest Zone there was the European Cline, established
by the spread of farmers from Anatolia after ~7000 BCE and mixture with Western
European Hunter-Gatherers (19). In far eastern Europe at latitudes spanning the Black and
Caspian Seas there was the Caucasus Cline, consisting of a mixture of Eastern European
Hunter-Gatherers and Iranian farmer-related ancestry with additional Anatolian farmer-related ancestry in some groups (49). East of the Urals we detect a Central Asian Cline, with
WSHG individuals at one extreme and Copper Age and Early Bronze Age individuals from
Turan at the other.
A Distinctive Ancestry Profile Stretching from Eastern Europe to Kazakhstan in the Bronze
Age
Beginning around 3000 BCE, the ancestry profiles of many groups in Eurasia were
transformed by the spread of Yamnaya Steppe pastoralist ancestry (Western_Steppe_EMBA)
from its source in the Caucasus Cline (9, 49) to a vast region stretching from Hungary in the
west to the Altai mountains in the east (7, 8) (Fig. 3). Over the next two millennia this
ancestry spread further while admixing with local groups, eventually reaching the Atlantic
shores of Europe in the west and South Asia in the southeast. The source of the
Western_Steppe_EMBA ancestry that eventually reached Central Asia and South Asia was
not the initial eastward expansion but instead a secondary expansion, which involved a group
that had ~67% Western_Steppe_EMBA ancestry and ~33% ancestry from a point on the
European Cline (8) (Fig. 3). We replicate previous findings that this group included people
of the Corded Ware, Srubnaya, Petrovka, and Sintashta archaeological complexes spreading
over a vast region from the border of eastern Europe to northwestern Kazakhstan (8, 19, 21),
and our dataset adds more than one hundred individuals from this Western_Steppe_MLBA
cluster. We also detect a further cluster, Central_Steppe_MLBA, which is differentiated from
Western_Steppe_MLBA (p=7×10⁻⁶ by qpAdm), due to carrying ~9% additional ancestry
derived from Bronze Age pastoralists of the central Steppe of primarily WSHG-related
ancestry (Central_Steppe_EMBA). Thus, individuals with Western_Steppe_MLBA ancestry
admixed with local populations as they integrated eastward and southward.
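The idea of expressing a group as a weighted mixture of two sources, as in the ~67%/~33% model above, can be sketched with a simple least-squares estimate. This is not qpAdm, which also tests whether the model fits at all and reports jackknife standard errors. It assumes each population is summarized by a vector of statistics relating it to a set of outgroups, and that the target's vector is a linear mix of the two source vectors. The vectors below are invented toy values, and `european_cline_point` is a hypothetical label.

```python
def two_source_weight(target, source_a, source_b):
    """Least-squares weight w such that target ~ w * source_a + (1 - w) * source_b."""
    numerator = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    denominator = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if denominator == 0:
        raise ValueError("sources are indistinguishable on these statistics")
    return numerator / denominator


# Toy statistic vectors (invented): one entry per outgroup comparison.
western_steppe_emba = [0.010, -0.004, 0.007, 0.002]
european_cline_point = [-0.003, 0.006, 0.001, -0.005]

# Construct a target that is exactly 67% of the first source and 33% of the second.
target = [0.67 * a + 0.33 * b
          for a, b in zip(western_steppe_emba, european_cline_point)]

weight = two_source_weight(target, western_steppe_emba, european_cline_point)
print(f"{weight:.2f} from source A, {1 - weight:.2f} from source B")
```

An estimate outside the range 0 to 1, or a poor residual fit, would suggest that the two proposed sources are inadequate, which is the kind of evidence used to reject a candidate source population.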
Bidirectional Mobility Along the Inner Asian Mountain Corridor
As in Iran/Turan, the outlier individuals provide crucial information about human
interaction.
First, our analysis of 50 individuals from the Sintashta culture cemetery of Kamennyi Ambar
V reveals multiple groups of outliers whom we directly radiocarbon dated to be
contemporaries of the main cluster but were genetically distinctive, indicating that this was a
cosmopolitan site (Fig. 2). One set of outliers had elevated proportions of
Central_Steppe_EMBA (largely WSHG-related) ancestry, another had elevated
Western_Steppe_EMBA (Yamnaya-related), and a third had elevated EEHG-related
ancestry.
Second, in the central Steppe (present-day Kazakhstan), an individual from one site dated to
2800–2500 BCE, and individuals from three sites dated to ~1600–1500 BCE, show
significant admixture from Iranian farmer-related populations that is well-fit by the main
BMAC cluster, demonstrating northward gene flow from Turan into the Steppe at the same
time as there was southward movement of Central_Steppe_MLBA-related ancestry through
Turan to South Asia. Thus, the archaeologically documented spreads of material culture and
technology both north and south along the Inner Asian Mountain Corridor (50, 51), which
began as early as the middle 3rd millennium BCE, were associated with substantial
movements of people (Fig. 2).
Third, we observe individuals from Steppe sites (Krasnoyarsk) dated to ~1700–1500 BCE
that derive up to ~25% ancestry from a source related to East Asians (well-modeled as
ESHG), with the remainder best modeled as Western_Steppe_MLBA. By the Late Bronze
Age, ESHG-related admixture became ubiquitous as documented by our time transect from
Kazakhstan, and ancient DNA data from the Iron Age and from later periods in Turan and
the central Steppe including Scythians, Sarmatians, Kushans, and Huns (25, 52). Thus, these
1st millennium BCE to 1st millennium CE archaeological cultures with documented cultural
and political impacts on South Asia cannot be important sources for the Steppe pastoralist-related ancestry widespread in South Asia today (since present-day South Asians have too
little East Asian-related ancestry to be consistent with deriving from these groups),
providing an example of how genetic data can rule out scenarios that are plausible based on
the archaeological and historical evidence alone ((13), Fig S52). Instead, our analysis shows
that the only plausible source for the Steppe ancestry is Steppe Middle to Late Bronze Age
groups, who not only fit as a source for South Asia but who we also document as having
spread into Turan and mixed with BMAC-related individuals at sites in Kazakhstan in this
period. Taken together, these results identify a narrow time window (first half of the second
millennium BCE) when the Steppe ancestry that is widespread today in South Asia must
have arrived.
The Genomic Formation of Human Populations in South Asia
Three Ancestry Clines That Succeeded Each Other in Time in South Asia
Previous work has shown that South Asians harbor ancestry from peoples related to ancient
groups in northern Eurasia and Iran, East Asians, and Australasians (9). Here we document
the process through which these deep sources of ancestry mixed to form later groups.
We begin with the pre-2000 BCE Indus Periphery Cline, described in an earlier section and
detected in 11 outliers from two sites in cultural contact with the Indus Valley Civilization
(Fig. 4). We can jointly model all individuals in this cline as a mixture of two source
populations: one end of the cline is consistent with being entirely AHG-related, and the
other is consistent with being 90% Iranian farmer-related and 10% WSHG-related (Fig. 4,
(13)). People fitting on the Indus Periphery Cline form the majority of the ancestors of
present-day South Asians. Through formal modeling, we demonstrate that it is this
contribution of Indus Periphery Cline people to later South Asians, rather than westward
gene flow bringing an ancestry unique to South Asia onto the Iranian plateau, that explains
the high degree of shared ancestry between present-day South Asians and early Holocene
Iranians (9, 13, 27).
We next characterized the post-2000 BCE Steppe Cline, represented in our analysis by 117
individuals dating to 1400 BCE - 1700 CE from the Swat and Chitral districts of
northernmost South Asia (Fig. 2, Fig. 4). We found that we could jointly model all
individuals on the Steppe Cline as a mixture of two sources, albeit different from the two
sources in the earlier cline. One end is consistent with a point along the Indus Periphery
Cline. The other end is consistent with a mixture of about 41% Central_Steppe_MLBA
ancestry and 59% from a subgroup of the Indus Periphery Cline with relatively high Iranian
farmer-related ancestry ((13), Fig S50).
To understand the formation of the Modern Indian Cline, we searched for triples of
populations that could fit as sources for diverse present-day South Asian groups as well as
peoples of the Steppe Cline. All fitting models include as sources Central_Steppe_MLBA
(or a group with a similar ancestry profile), a group of Indus Periphery Cline individuals,
and either AHG or a subgroup of Indus Periphery Cline individuals with relatively high
AHG-related ancestry ((13), Fig S51). Co-analyzing 140 diverse South Asian groups (10)
that fall on a gradient in PCA (13), we show that while there are three deep sources, just as
in the case of the earlier two clines, the great majority of groups on the Modern Indian Cline
can be jointly modeled as a mixture of two populations that are mixed from the earlier three.
While we do not have ancient DNA data from either of the two statistically reconstructed
source populations for the Modern Indian Cline, the ASI or the ANI, in what follows we
co-analyze our ancient DNA data in conjunction with modern data to characterize the exact
ancestry of the ASI, and to provide constraints on the ANI.
The ASI and ANI Arose as Indus Periphery Cline People Mixed with Groups to the North & East
To gain insight into the formation of the ASI, we extrapolated to the least West Eurasian-related
theoretical extreme of the Modern Indian Cline by setting the
Central_Steppe_MLBA ancestry proportion to zero in our model. We estimate a minimum
of 55% ancestry from people on the Indus Periphery Cline (representing the Indus
Periphery Cline by the individual on it with the most Iranian farmer-related ancestry, which
we call Indus_Periphery_West), and model the remainder of the ancestry as deriving from
an AHG-related group (13). We find that several tribal groups from southern India are
consistent with ~0% Central_Steppe_MLBA ancestry (13). The fact that these individuals
match the most extreme possible position for the ASI not only reveals that nearly direct
descendants of the ASI live today in South Asia, but also allows us to make a precise
statement about the ancestry profile of the ASI. In particular, the fact that they harbor
substantial Iranian farmer-related ancestry (via the Indus Periphery Cline), disproves earlier
suggestions that the ASI might not have any ancestry related to West Eurasians (11). Using
the DATES software, we estimate an average of 107 ± 11 generations since admixture of the
Iranian farmer-related and AHG-related groups in one of these groups: Palliyar. This
corresponds to a 95% confidence interval of 1700–400 BCE assuming 28 years per
generation (53). Thus, the ASI were not fully formed at the time of the IVC, and instead
must have continued to form through mixture after its decline as material culture typical of
the IVC spread eastward (54) and Indus Periphery Cline ancestry mixed with people of less
West Eurasian relatedness.
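The conversion from a DATES estimate in generations to a calendar interval can be illustrated with a minimal sketch. It assumes a normal approximation (±1.96 standard errors), the 28 years per generation stated above, and a reference year of 1950 CE for "present"; the reference year, the function names, and the rounding are assumptions of this sketch rather than details given in the text. Under those assumptions the Palliyar estimate lands close to the quoted 1700–400 BCE range.

```python
def admixture_interval_years(generations, se_generations,
                             years_per_generation=28.0, z=1.96):
    """Return (older, younger) 95% bounds in years before the sampling date."""
    half_width = z * se_generations
    older = (generations + half_width) * years_per_generation
    younger = (generations - half_width) * years_per_generation
    return older, younger


def years_before_to_bce(years_before, reference_year_ce=1950):
    """Convert years before a reference CE year to a BCE year.

    The reference year is an assumption of this sketch; the absence of a
    year zero is ignored because it is far below the estimate's precision.
    """
    return years_before - reference_year_ce


# Palliyar estimate from the text: 107 +/- 11 generations.
older, younger = admixture_interval_years(107, 11)
print(f"{years_before_to_bce(older):.0f} to {years_before_to_bce(younger):.0f} BCE")
```

The same two functions apply to any other generation-based estimate for which a standard error is reported.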
We also obtained additional evidence for a late (Bronze Age) formation of the ASI by
building an admixture graph using qpGraph, co-modeling Palliyar and Juang (an
Austroasiatic-speaking group in India with low West Eurasian-relatedness) (Fig. 5). The
graph fits the component of South Asian ancestry with no West Eurasian relatedness (AASI
- “Ancestral Ancient South Asians”) as an Asian lineage that split off around the time that
East Asian, Andaman Islander, and Australian aboriginal ancestors separated from each
other, consistent with the hypothesis that eastern and southern Asian lineages derive from
an eastward spread that in a short span gave rise to lineages leading to AASI, East Asians,
Andamanese Hunter Gatherers, and Australians (55) (Fig. 5). The Juang cannot be fit
through a mixture of ASI ancestry and ancestry related to Austroasiatic language speakers,
and instead can only be fit by modeling additional ancestry from AASI, showing that at the
time Austroasiatic groups formed in South Asia, groups with less Iranian farming-related
ancestry than in the ASI were also present. Austroasiatic languages are hypothesized to have
spread into South Asia in the 3rd millennium BCE (based on hill cultivation systems
hypothesized to be associated with the spread of Austroasiatic languages (42)), and thus the
ancestry profile of the Juang provides an independent line of evidence for a late (Bronze Age
and plausibly post-IVC) formation of the ASI.
To shed light on the formation of the statistically reconstructed ANI, we return to the Swat
Valley time transect that formed the Steppe Cline after 2000 BCE. The Modern Indian Cline
intersects the Steppe Cline at a position close to the position of the Kalash, the group in
northwest South Asia with the highest ANI ancestry proportion (56) (Fig. 4). The DATES-based
estimate of admixture in the Kalash is 110 ± 12 generations (56), suggesting a post-IVC
date of formation of the ANI paralleling the post-IVC date of formation of the ASI.
Further evidence for a post-IVC integration of Steppe ancestry into South Asia comes from
ancient individuals on the Steppe Cline (along which the ANI could theoretically have
formed) whose admixture date for Steppe ancestry is also post-IVC. Specifically, we
estimate the date of admixture into the Late Bronze Age and Iron Age individuals from the
Swat District of northernmost South Asia to be on average 26 generations before the date
that they lived, corresponding to a 95% confidence interval of ~1900–1500 BCE. This time
scale for the arrival of Steppe ancestry in the region is consistent with our observation of 6
outlier individuals in Turan who lived between ~2000–1500 BCE and who carry this
ancestry in mixed form (Fig. 2), and with our finding that the R1a Y chromosome associated
with Central_Steppe_MLBA ancestry in South Asia is also present in the Swat District Late
Bronze and Iron Age individuals (2 copies).
Taken together, these results show that neither of the two primary source populations of the
Modern Indian Cline, the ANI and ASI, was fully formed before the turn of the 2nd
millennium BCE.
Steppe Ancestry in South Asia is Primarily from Males and Disproportionately High in Brahmins
In the Late Bronze Age and Iron Age individuals of the Swat Valley, we detect a
significantly lower proportion of Steppe admixture on the Y chromosome (only 5% of the 44
Y chromosomes of the R1a-Z93 subtype that occurs at 100% frequency in the
Central_Steppe_MLBA males) compared to 20% on the autosomes (Z = −3.9 for a
deficiency from males under the simplifying assumption that all the Y chromosomes are
unrelated to each other since admixture and thus statistically independent), documenting
how Steppe ancestry was incorporated into these groups largely through females (Fig. 4).
However, sex bias varied in different parts of South Asia, as in present-day South Asians we
observe a reverse pattern of excess Central_Steppe_MLBA-related ancestry on the Y
chromosome compared to the autosomes (Z = 2.7 for an excess from males) (13, 57) (Fig.
4). Thus, the introduction of lineages from Steppe pastoralists into the ancestors of present-day
South Asians was mediated mostly by males. This bias is similar in direction to what
has been documented for the introduction of Steppe ancestry into Iberia in far western
Europe, although the bias is less extreme than reported in that case (58).
Our analysis of Steppe ancestry also identified 6 groups with a highly significantly elevated
ratio of Central_Steppe_MLBA-to-Indus_Periphery_West-related ancestry compared to the
expectation for the model at the Z < −4.5 level. The strongest two signals were in
Brahmin_Tiwari (Z = −7.9) and Bhumihar_Bihar (Z = −7.0). More generally, there is a
notable enrichment in groups that consider themselves to be of traditionally priestly status: 5
of the 6 groups with Z < −4.5 were Brahmins or Bhumihars even though they comprise only
7–11% of the 140 groups analyzed (p < 10⁻¹² by a χ² test assuming all the groups evolved
independently). We caution that this is not a formal test as there is an unknown degree of
shared ancestry among groups since they formed by mixture, and because our decisions
about which groups to include in the analysis were not made in a blinded way; for example,
we excluded four “Catholic Brahmin” groups with strong evidence of substantial shared
ancestry in the last millennium (10) which makes them not statistically independent (Table
S5, Fig. 4 (13)). Nevertheless, the fact that traditional custodians of liturgy in Sanskrit
(Brahmins) tend to have more Steppe ancestry than is predicted by a simple ASI-ANI
mixture model provides an independent line of evidence, beyond the distinctive ancestry
profile shared between South Asia and Bronze Age Eastern Europe mirroring the shared features
of Balto-Slavic and Indo-Iranian languages (59), for a Steppe origin for South Asia’s Indo-European
languages prior to ~2000 BCE.
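To give a feel for the scale of the enrichment statistic, the following is a rough sketch of a Pearson χ² test on a 2×2 table. The text reports only that priestly-status groups make up 7–11% of the 140 groups, so the count of 10 such groups used here (about 7%) is an assumption of this sketch, as are the choice of an uncorrected Pearson test and all names in the code; it is not necessarily the exact calculation performed in the study, and the caveat about non-independence of groups applies equally here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


n_groups = 140
n_priestly = 10            # ASSUMPTION: about 7% of 140; the exact count is not given
n_enriched = 6             # groups with Z < -4.5
n_enriched_priestly = 5    # Brahmin or Bhumihar groups among those six

a = n_enriched_priestly                     # enriched, priestly
b = n_enriched - n_enriched_priestly        # enriched, other
c = n_priestly - n_enriched_priestly        # not enriched, priestly
d = n_groups - n_enriched - c               # not enriched, other

statistic, p_value = chi_square_2x2(a, b, c, d)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2e}")
```

With a larger assumed number of priestly groups (toward the 11% end of the range) the same table gives a weaker signal, which is one reason to read the result as indicative rather than formal.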
Discussion
Our analysis reveals that the ancestry of the greater South Asian region in the Holocene was
characterized by at least three genetic gradients. Prior to ~2000 BCE, there was the Indus
Periphery Cline consisting of people with different proportions of Iranian farmer- and AASI-related
ancestry, which we hypothesize was a characteristic feature of many IVC people. The
ASI formed after 2000 BCE as a mixture of a point along this cline with South Asians with
higher proportions of AASI-related ancestry. Between about 2000 and 1000 BCE, people of
largely Central_Steppe_MLBA ancestry expanded toward South Asia, mixing with people
along the Indus Periphery Cline to form the Steppe Cline. Multiple points along the Steppe
Cline are represented by individuals of the Swat Valley time transect and statistically we find
that the ANI, one of the two primary source populations of South Asia, can fit along the
Steppe Cline. After 2000 BCE, mixtures of mixed populations—the ASI and ANI—mixed
themselves to form the Modern Indian Cline, which is represented today in diverse groups in
South Asia (Fig. 4).
Our finding based on admixture linkage disequilibrium ((13), Fig S59) that the mixture that
formed the Indus Periphery Cline occurred by ~5400–3700 BCE, at least a millennium
before the formation of the mature IVC, raises two possibilities. One is that Iranian farmer-related
ancestry in this group was characteristic of the Indus Valley hunter-gatherers in the
same way as it was characteristic of northern Caucasus and Iranian plateau hunter-gatherers.
The presence of such ancestry in hunter-gatherers from Belt and Hotu Caves in northeastern
Iran increases the plausibility that this ancestry could have existed in hunter-gatherers further
east. An alternative is that this ancestry reflects movement into South Asia from the Iranian
plateau of people accompanying the eastward spread of wheat and barley agriculture and
goat and sheep herding as early as the 7th millennium BCE and forming early farmer
settlements such as those at Mehrgarh in the hills flanking the Indus Valley (60, 61).
However, this is in tension with the observation that the Indus Periphery Cline people had
little if any Anatolian farmer-related ancestry, which is strongly correlated with the eastward
spread of crop-based agriculture in our dataset. Thus, while our analysis supports the idea
that eastward spread of Anatolian farmer-related ancestry was associated with the spread of
farming to the Iranian plateau and Turan, our results do not support large-scale movements
of ancestry from the Near East into South Asia following ~6000 BCE (the time after which
all ancient individuals from Iran in our data have Anatolian farmer-related ancestry even
though South Asians have very little). Languages in pre-state societies usually spread
through movements of people (62), and thus the absence of much Anatolian farmer-related
ancestry in the Indus Periphery Cline suggests that the Indo-European languages spoken in
South Asia today are unlikely to owe their origin to the spread of farming from West Asia.
Our results not only provide negative evidence against an Iranian plateau origin for Indo-European
languages in South Asia, but also positive evidence for the theory that these
languages spread from the Steppe. While ancient DNA has documented westward
movements of Steppe pastoralist ancestry providing a likely conduit for the spread of many
Indo-European languages to Europe (7, 8), the chain-of-transmission into South Asia has
been unclear because of a lack of relevant ancient DNA. Our observation of the spread of
Central_Steppe_MLBA ancestry into South Asia in the first half of the 2nd millennium BCE
provides this evidence, and is particularly striking as it provides a plausible genetic
explanation for the linguistic similarities between the Balto-Slavic and Indo-Iranian subfamilies of Indo-European, which despite their vast geographic separation, share the Satem
innovation and Ruki sound laws (63). If the spread of people from the Steppe in this period
was a conduit for the spread of South Asian Indo-European languages, then it is striking that
there are so few material culture similarities between the central Steppe and South Asia in
the Middle to Late Bronze Age (i.e. after the middle of the 2nd millennium BCE). Indeed,
the material culture differences are so substantial that some archaeologists recognize no
evidence of a connection. However, lack of material culture connections does not provide
evidence against spread of genes, as has been demonstrated in the case of the Beaker
Complex, which originated largely in western Europe, but in Central Europe was associated
with skeletons that harbored ~50% ancestry related to Yamnaya Steppe pastoralists (18).
Thus, in Europe we have an unambiguous example of people with ancestry from the Steppe
making profound demographic impacts on the regions into which they spread while adopting
important aspects of local material culture. Our findings document a similar phenomenon in
South Asia, with the locally acculturated population harboring up to ~20%
Western_Steppe_EMBA-derived ancestry according to our modeling (via
Central_Steppe_MLBA groups) (Fig. 3). Our analysis also provides a second line of
evidence for a linkage between Steppe ancestry and Indo-European languages. Steppe
ancestry enrichment in groups that view themselves as being of traditionally priestly status is
striking as some of these groups including Brahmins are traditional custodians of literature
composed in early Sanskrit. A possible explanation is that the influx of
Central_Steppe_MLBA ancestry into South Asia in the mid-2nd millennium BCE created a
meta-population with varied proportions of Steppe ancestry, with people of more Steppe
ancestry (or admixing less with Indus Periphery Cline groups) tending to be more strongly
associated with Indo-European culture. Due to strong endogamy, which kept groups
generally isolated from neighbors for thousands of years (7), some of this population
substructure persists in South Asia among present-day custodians of Indo-European texts.
Our findings also shed light on the origin of the second-largest language group in South Asia,
Dravidian. The strong correlation between ASI ancestry and present-day Dravidian
languages suggests that the ASI, which we have shown formed as groups with ancestry
typical of the Indus Periphery Cline moved south and east after the decline of the IVC to
mix with groups with more AASI ancestry, most likely spoke an early Dravidian language.
A possible scenario combining genetic data with archaeology and linguistics is that proto-Dravidian
was spread by peoples of the IVC along with the Indus Periphery Cline ancestry
component of the ASI. Non-genetic support for an IVC origin for Dravidian languages
includes the present-day geographic distribution of these languages (in southern India and
southwestern Pakistan), and a suggestion that some symbols on ancient Indus Valley seals
denote Dravidian words or names (64, 65). An alternative possibility is that proto-Dravidian
was spread by the approximately half of the ASI’s ancestry that was not from the Indus
Periphery Cline, and instead derived from the south and the east (peninsular South Asia).
The southern scenario is consistent with reconstructions of Proto-Dravidian terms for flora
and fauna unique to peninsular India (66, 67).
We finally highlight a remarkable parallel between the prehistory of South Asia and Europe.
In both regions there were exchanges between people related to Southwest Asian people and
local people; mixtures of these groups led to the Indus Periphery Cline in South Asia and the
European Cline in Europe. In both subcontinents, people arriving in the 3rd and 2nd
millennia BCE who descended from mixtures of people related to Yamnaya Steppe
pastoralists and European farmers mixed further with local populations: in South Asia
forming the ANI, and in Europe forming groups like that of the Beaker Complex. In both
cases, mixtures of these mixed populations—those with Steppe pastoralist-related admixture
and those without—drive the modern ancestry clines in both regions (Fig. 3). However, there
are also profound differences between the Bronze Age and Neolithic spreads of ancestry
across the two subcontinents. One is that the maximum proportion of local ancestry is higher
in South Asia (AASI ancestry of up to ~60%) than Europe (WEHG ancestry of up to ~30%)
(7), which could reflect stronger ecological or cultural barriers to the spreads of people in
South Asia than in Europe, allowing the previously established groups more time to adapt
and mix with incoming groups. A second difference is the smaller proportion of Steppe
pastoralist-related ancestry in South Asia than in Europe, its later arrival by ~500–1000
years, and a lower male sex bias in the admixture, factors that help to explain the continued
persistence of a large fraction of non-Indo-European speakers amongst people of present-day
South Asia. The situation in South Asia is somewhat reminiscent of
Mediterranean Europe where the proportion of Steppe ancestry is considerably lower than
that of northern and central Europe (Fig. 3), and where many non-Indo-European languages
are attested in classical times (68). Further studies of ancient DNA from South Asia and the
linguistically related Iranian world will extend and add nuance to the model presented here.
Materials and Methods
Ancient DNA Laboratory Work.
For the skeletal elements that we were not able to transport from field sites, we drilled
directly into bone, for the most part focusing on inner ear portions of petrous bones using a
method for sampling from the cranial base (CBD) (70). The great majority of skeletal
elements were prepared in dedicated ancient DNA clean rooms at Harvard Medical School,
University College Dublin, the University of Vienna, or the Max Planck Institute for
Evolutionary Anthropology in Leipzig, Germany, either by drilling, or by sandblasting to
isolate a bone piece followed by milling (Table S1, Table S2).
All the molecular work except for that on a single individual (Darra-i-Kur) was carried out at
Harvard Medical School (HMS). At HMS, we extracted DNA using a method that is
optimized to retain small DNA fragments. We implemented this method both manually,
using silica spin columns (565 libraries) (14, 15), and with the assistance of
robotic liquid handlers using silica-coated magnetic beads and Buffer D (149 libraries) (71).
We converted the DNA into a form that could be sequenced using a double-stranded library
preparation protocol (711 libraries) and a single stranded library preparation protocol (3
libraries) (72). For all but four of the double stranded libraries, we pre-treated with a mixture
of the enzymes Uracil-DNA Glycosylase (UDG) and Endo VIII (USER, New England
Biolabs) to reduce characteristic cytosine-to-thymine errors in ancient DNA except in the
terminal base (17). The remaining four libraries were not pre-treated with USER (73). The
three single-stranded libraries were also pre-treated with USER in a way that results in a
similar damage pattern of inefficient uracil removal in the terminal base (72). We prepared
most double stranded libraries (n=524) with the assistance of a robotic liquid handler,
substituting the MinElute columns used for cleaning up reactions in manual processing with
silica coated magnetic beads in robotic processing, and the MinElute column-based PCR
cleanup at the end of library preparation with SPRI beads (74, 75). We enriched all libraries
both for sequences overlapping mitochondrial DNA (76), and for sequences overlapping
about 1.24 million nuclear targets. We carried out two rounds of enrichment for these targets
(7, 19, 20) either in two independent capture experiments or together. After indexing the
enrichment products in a way that assigned a unique index combination to each library
(77), we sequenced the enriched products on an Illumina NextSeq500 instrument using v.2
150 cycle kits for 2×76 cycles and 2×7 cycles (2×8 for single-stranded libraries), and
sequenced up to the point that the expected number of additional SNPs covered per 100
additional read pairs sequenced was less than about 1. We also shotgun sequenced double-stranded
libraries to assess the fraction of sequences that mapped to the human genome.
To analyze the data, we began by sorting the read pairs by searching for the expected
identification indices and barcodes for each library, allowing up to one mismatch from the
expected sequence in each case. We removed adapters and merged together sequences
requiring a 15 base pair overlap (allowing up to one mismatch), taking the highest quality
base in the merged segment to represent the allele. We mapped the resulting sequences to the
hg19 human reference (GRCh37, the version used for the 1000 Genomes project (78)) using
the samse command of BWA (79) (version 0.6.1). We removed duplicate sequences
(mapping to the same position in the genome and having the same barcode pair), and merged
libraries corresponding to the same sample (merging across samples when the genetic data
revealed multiple samples from the same individual). For each individual, we restricted to
sequences passing filters (not overlapping known insertion/deletion polymorphisms, and
having a minimum mapping quality 10), and trimmed two nucleotides from the end of each
sequence to reduce deamination artifacts. We also further restricted to sequence data with a
minimum base quality of 20. To represent each individual at each SNP position, we
randomly selected a single sequence (if one was available).
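The per-SNP filtering and random allele sampling described above can be sketched as follows. This is a toy model, not the pipeline used in the study: the `Read` class and function names are hypothetical, reads are assumed to align without insertions or deletions, and the two-base trim is assumed to apply to both ends of each sequence. The thresholds (mapping quality 10, base quality 20, two trimmed bases) are the ones stated in the text.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Read:
    start: int         # 0-based reference position of the first aligned base
    bases: str         # aligned bases (toy model: no insertions or deletions)
    quals: List[int]   # per-base Phred qualities
    mapq: int          # mapping quality


def usable_allele(read: Read, snp_pos: int, min_mapq: int = 10,
                  min_baseq: int = 20, trim: int = 2) -> Optional[str]:
    """Return the read's allele at snp_pos, or None if it fails any filter."""
    if read.mapq < min_mapq:
        return None
    offset = snp_pos - read.start
    # Skip positions within `trim` bases of either end, where deamination
    # damage concentrates; this also rejects reads that miss the SNP.
    if offset < trim or offset >= len(read.bases) - trim:
        return None
    if read.quals[offset] < min_baseq:
        return None
    return read.bases[offset]


def pseudo_haploid_call(reads: List[Read], snp_pos: int,
                        rng: random.Random) -> Optional[str]:
    """Pick one passing sequence at random to represent the individual."""
    alleles = [a for a in (usable_allele(r, snp_pos) for r in reads) if a is not None]
    return rng.choice(alleles) if alleles else None
```

Sampling a single sequence per site, rather than calling diploid genotypes, avoids biasing low-coverage individuals toward the reference allele; a position with no passing sequence is simply left missing.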
For Darra-i-Kur, we analyzed a single-stranded DNA library (L5082) at the Max Planck
Institute for Evolutionary Anthropology (MPI-EVA) in Leipzig, Germany, generated as part
of a previous study (80). The previous study only analyzed mitochondrial DNA, and for the
current study, we enriched the library for sequences overlapping the same panel of about 1.2
million nuclear targets using two rounds of hybridization capture (7, 19, 20). We sequenced
the enriched libraries on 2 lanes of an Illumina HiSeq2500 platform in a double index
configuration (2×76 cycles) (77), and we determined alleles using FreeIbis (81). We merged
overlapping paired-end reads and trimmed adapters using leeHom (82). We used BWA to align the
sequences to the human reference genome hg19 (GRCh37) (79). We retained sequences
showing a perfect match to the expected index combination for downstream analyses.
We assessed evidence for ancient DNA authenticity by measuring the rate of damage in the
first nucleotide, flagging individuals as potentially contaminated if they had less than a 3%
cytosine-to-thymine substitution rate in the first nucleotide for a UDG-treated library or
less than a 10% substitution rate for a non-UDG-treated library. We used contamMix to test
for contamination based on polymorphism in mitochondrial DNA (83), and ANGSD to test
for contamination based on polymorphism on the X chromosome in males (84).
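The damage-based authenticity screen amounts to a threshold that depends on library treatment. A minimal sketch, using the 3% and 10% cutoffs stated above (the function name and argument names are hypothetical):

```python
def flag_possible_contamination(first_base_c_to_t_rate: float, udg_treated: bool) -> bool:
    """Flag a library whose first-nucleotide C-to-T rate is below the cutoff.

    Cutoffs follow the text: 3% for UDG-treated libraries, 10% otherwise.
    """
    threshold = 0.03 if udg_treated else 0.10
    return first_base_c_to_t_rate < threshold
```

A flag here marks an individual for closer inspection alongside the mitochondrial and X-chromosome estimates; it is not by itself a contamination estimate.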
Radiocarbon dating.
We generated 269 radiocarbon (14C) dates on bone using accelerator mass spectrometry
(AMS) (Table S3). Most of these (n=242) were generated at the Pennsylvania State
University (PSU) Radiocarbon Laboratory, and here we excerpt a description of the sample
preparation methodology at PSU (the methods used at the other laboratories are publicly
available and we refer readers to the literature for those methodologies). Possible
contaminants (conservants and adhesives) were removed by sonicating all bone samples in
successive washes of ACS grade methanol, acetone, and dichloromethane for 30 minutes
each at room temperature, followed by three washes in Nanopure water to rinse. Bone
collagen for 14C was extracted and purified using a modified Longin method with
ultrafiltration (>30kDa gelatin; (85)). If collagen yields were low and amino acids poorly
preserved we used a modified XAD process (XAD Amino Acids; (86)). For quality
assurance we measured carbon and nitrogen concentrations and C/N ratios of all extracted
and purified collagen/amino acid samples with a Costech elemental analyzer (ECS 4010).
We evaluated quality based on % crude gelatin yield, %C, %N and C/N ratios before AMS
14C dating. C/N ratios for all directly radiocarbon-dated samples fell between 2.9 and 3.6,
indicating excellent preservation (87). Collagen/amino acid samples (~2.1 mg) were then
combusted for 3 h at 900°C in vacuum-sealed quartz tubes with CuO and Ag wires. Sample
CO2 was reduced to graphite at 550°C using H2 and a Fe catalyst, with reaction water drawn
off with Mg(ClO4)2 (88). Graphite samples were pressed into targets in Al boats and loaded
on a target wheel with OX-1 (oxalic acid) standards, known-age bone secondaries, and a
14C-free Pleistocene whale blank. All 14C measurements were made on a modified National
Electronics Corporation compact spectrometer with a 0.5 MV accelerator (NEC 1.5SDH-1).
The 14C ages were corrected for mass-dependent fractionation with measured δ13C values
(89) and compared with samples of Pleistocene whale bone (backgrounds, 48,000 14C BP),
late Holocene bison bone (~1,850 14C BP), late 1800s CE cow bone, and OX-2 oxalic acid
standards. All calibrated 14C ages were calculated using OxCal version 4.3 (Ramsey and Lee
2013) using the IntCal13 northern hemisphere curve (90), and we quote 95% confidence
intervals (2-sigma ranges).
Principal component analysis (PCA)
We carried out PCA using the smartpca package of EIGENSOFT 7.2.1 (36). We used default
parameters and added two options (lsqproject:YES and numoutlieriter:0) to project the
ancient individuals onto the PCA space. We used two basis sets for the projection: the first
based on 1,340 present-day Eurasians genotyped on the Affymetrix Human Origins array,
and the second based on a subset of 991 present-day West Eurasians (7, 32, 33). These
projections are shown repeatedly in (13) and are used in the Online Data Visualizer. We also
computed FST between groups using the parameters inbreed:YES and fstonly:YES. We
restricted these analyses to the dataset obtained by merging our ancient DNA data with the
modern DNA data on the Human Origins array and restricting to 597,573 SNPs. We treated
positions where we did not have sequence data as missing genotypes.
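For readers reproducing the projection, the options named above are set in a smartpca parameter file. The sketch below writes such a file from Python. The options `lsqproject`, `numoutlieriter`, `inbreed`, and `fstonly` are taken from the text; the file-path keys follow the usual EIGENSOFT parameter-file conventions as we understand them, and all file names and the helper function are hypothetical placeholders.

```python
def smartpca_parfile(data_prefix: str, poplist_path: str, out_prefix: str,
                     fst_mode: bool = False) -> str:
    """Build the text of a smartpca parameter file.

    poplist_path names the populations (here, present-day groups) used to
    compute the axes; all other individuals are projected onto them.
    """
    lines = [
        f"genotypename: {data_prefix}.geno",
        f"snpname: {data_prefix}.snp",
        f"indivname: {data_prefix}.ind",
        f"evecoutname: {out_prefix}.evec",
        f"evaloutname: {out_prefix}.eval",
        f"poplistname: {poplist_path}",
    ]
    if fst_mode:
        # Options stated in the text for the FST computation.
        lines += ["inbreed: YES", "fstonly: YES"]
    else:
        # Options stated in the text for projecting ancient individuals.
        lines += ["lsqproject: YES", "numoutlieriter: 0"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(smartpca_parfile("merged_ho", "present_day_eurasians.txt", "pca_eurasia"))
```

Setting `numoutlieriter: 0` disables automatic outlier removal, so no individuals are dropped from the analysis, and `lsqproject: YES` makes the projection robust to the missing genotypes typical of ancient individuals.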
ADMIXTURE clustering
Using PLINK2 (91), we first pruned our dataset using the --geno 0.7 option to ensure that
we only performed our analysis on sites where at least 70% of individuals were covered by
at least one sequence. This resulted in 892,613 SNPs. Individuals without coverage on
specific SNPs were assigned missing data at those sites. We ran ADMIXTURE (37) with 10
replicates, reporting the replicate with the highest likelihood. We show results for K=5 in
(13), as we found that this provides good resolution for disambiguating the sources of pre-Copper
Age ancestry in the ancient individuals.
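Selecting the best of several ADMIXTURE replicates can be sketched as below. The sketch assumes that each replicate's log contains lines beginning with `Loglikelihood:` followed by a number, with the last such line giving the final value; that log format, the replicate labels, and the function names are assumptions of this example and should be checked against the logs actually produced.

```python
import re
from typing import Dict

# ASSUMED log format: a line starting with "Loglikelihood:" then a number.
_LOGLIK = re.compile(r"^Loglikelihood:\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)",
                     re.MULTILINE)


def final_loglikelihood(log_text: str) -> float:
    """Return the last log-likelihood reported in one replicate's log."""
    matches = _LOGLIK.findall(log_text)
    if not matches:
        raise ValueError("no log-likelihood line found")
    return float(matches[-1])


def best_replicate(logs: Dict[str, str]) -> str:
    """Return the label of the replicate with the highest final log-likelihood."""
    return max(logs, key=lambda label: final_loglikelihood(logs[label]))
```

Because ADMIXTURE optimizes from a random starting point, replicates can converge to different local optima; keeping the highest-likelihood run is a common way to reduce that dependence.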
f-statistics
We used the qp3pop and qpDstat packages in ADMIXTOOLS to compute f3-statistics and
f4-statistics. We used the inbreed:YES parameter to compute f3-statistics as a test for
admixture with an ancient population as a target, with all ancient genomes as sources. Using
the f4Mode:YES parameter in qpDstat, we also computed two sets of f4-symmetry statistics
to evaluate if pairs of populations are consistent with forming a clade relative to a
comparison population. The first is a “Two-population comparison” statistic where we
compare all possible pairs of ancient groups (the Test populations) to a panel of populations
that encompasses diverse pre-Copper Age and more widespread genetic variation. Thus, we
compute a statistic of the form f4(Test 1, Test 2; Pre-Copper Age, Mbuti). The second is a
“Pre-Copper Age affinity” statistic that compares each ancient group in turn against all
possible pairs of Pre-Copper Age populations, using statistics of the form f4(Pre-Copper
Age 1, Pre-Copper Age 2; Test, Mbuti).
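For readers unfamiliar with the notation, the sketch below computes an f4-statistic from per-SNP allele frequencies using its basic definition, the average over SNPs of (a - b)(c - d). It is not the qpDstat implementation (which also handles sample-size corrections and standard errors), and the function and argument names are hypothetical.

```python
import numpy as np

def f4(freq_a, freq_b, freq_c, freq_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    freq_a, freq_b, freq_c, freq_d = map(
        np.asarray, (freq_a, freq_b, freq_c, freq_d)
    )
    return float(np.mean((freq_a - freq_b) * (freq_c - freq_d)))
```

With this convention, a "Two-population comparison" would be evaluated as `f4(test1, test2, pre_copper_age, mbuti)`. A value statistically indistinguishable from zero is what is expected when Test 1 and Test 2 form a clade relative to the other two populations.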
Modeling admixture history
We used qpAdm (33) in the ADMIXTOOLS software package to estimate the proportions of
ancestry in a Test population deriving from a mixture of N ‘reference’ populations by
leveraging (but not explicitly modeling) shared genetic drift with a set of ‘Outgroup’
populations. We set the details:YES parameter, which reports a normally distributed Z-score
for the goodness of fit of the model (estimated with a Block Jackknife).
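The idea behind a block jackknife Z-score can be sketched as below. This is a simplified, hypothetical example rather than the ADMIXTOOLS procedure: it treats the statistic as a mean of per-SNP values, assigns SNPs to blocks through a user-supplied `block_ids` array, and weights all blocks equally, whereas a weighted jackknife accounts for unequal block sizes.

```python
import numpy as np

def block_jackknife_z(per_snp_values, block_ids):
    """Delete-one-block jackknife for the mean of per-SNP values.

    Returns (estimate, standard_error, z_score). Blocks are weighted
    equally, which is a simplification of the weighted jackknife.
    """
    values = np.asarray(per_snp_values, dtype=float)
    block_ids = np.asarray(block_ids)
    blocks = np.unique(block_ids)
    estimate = values.mean()
    # Recompute the mean with each block left out in turn.
    leave_one_out = np.array([values[block_ids != b].mean() for b in blocks])
    n = len(blocks)
    variance = (n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    std_err = np.sqrt(variance)
    return estimate, std_err, estimate / std_err
```

Dropping whole blocks of neighboring SNPs, rather than single SNPs, is what lets the standard error reflect correlation between linked sites.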
Hierarchical modeling.
For each group on a proposed cline, we used qpAdm to obtain estimates for the proportion
of ancestry from hypothesized source populations, along with the covariance matrix across
groups. We jointly modeled these estimates using a bivariate normal model (forcing the three
proportions to sum to 100%) and estimated the mean and covariance of the two parameters
using maximum likelihood. With this inferred matrix, we tested whether the cline could be
modeled by a mixture of two primary source populations. First, we tested if the covariance
matrix is consistent with being singular, implying that knowledge of the proportion of
ancestry from one of the mixing components was consistent with being fully predictive of
the other two, as expected for two-way mixture. Second, if we were able to establish that this
was the case, we examined the difference between the expected and observed ratios of the
ancestry proportions of the analyzed groups within this generative model by fitting all the
groups simultaneously. This resulted in a handful of groups deviating from expectation.
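The singularity argument can be made concrete with a small sketch. It is not the maximum-likelihood procedure described above: it ignores the per-group estimation error that the qpAdm covariance matrices capture, and the function name and input layout are hypothetical. It only shows why a two-way mixture implies a (near-)singular covariance of the two free ancestry proportions.

```python
import numpy as np

def cline_covariance_spectrum(proportions):
    """Eigenvalues (ascending) of the covariance of two free proportions.

    proportions : (n_groups, 3) array whose rows sum to 1
    """
    proportions = np.asarray(proportions, dtype=float)
    if not np.allclose(proportions.sum(axis=1), 1.0):
        raise ValueError("each row must sum to 1")
    free = proportions[:, :2]          # third proportion is determined
    cov = np.cov(free, rowvar=False)   # 2 x 2 sample covariance
    return np.linalg.eigvalsh(cov)
```

If every group is a mixture of the same two sources, the points formed by the two free proportions fall on a line, so the smaller eigenvalue is close to zero and one proportion predicts the others. A smaller eigenvalue clearly above zero, relative to estimation error, would argue against a simple two-way cline.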
Method for dating admixture events.
We implemented an algorithm, DATES, which leverages ancestry covariance patterns that
can be measured in a single individual (instead of admixture LD that requires multiple
individuals). Full details of the approach and simulations documenting its efficacy in modern
as well as ancient data are presented in (13). The software implementing DATES is available
at https://zenodo.org/record/3263997#.XRnebJNKj6A (DOI: 10.5281/zenodo.3263997).
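As a rough illustration of how a date can be read from a covariance curve, the sketch below assumes that ancestry covariance decays approximately exponentially with genetic distance, with a rate equal to the number of generations since admixture. That assumption, the function name, and the log-linear fitting choice are ours for the example; this is not the DATES implementation, whose details are in (13).

```python
import numpy as np

def fit_decay_rate(distance_morgans, covariance):
    """Fit covariance ~ A * exp(-t * d) by log-linear least squares.

    Returns t, interpreted here as generations since admixture.
    Requires strictly positive covariance values.
    """
    d = np.asarray(distance_morgans, dtype=float)
    y = np.asarray(covariance, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit needs positive covariances")
    slope, _intercept = np.polyfit(d, np.log(y), 1)
    return -slope
```

Converting the fitted number of generations into calendar years requires an assumed generation interval, which this sketch deliberately leaves to the caller.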
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Authors
Vagheesh M. Narasimhan1,*, Nick Patterson2,3,*, Priya Moorjani4,5,+, Nadin
Rohland1,2, Rebecca Bernardos1, Swapan Mallick1,2,6, Iosif Lazaridis1, Nathan
Nakatsuka1,7, Iñigo Olalde1, Mark Lipson1, Alexander M. Kim1,8, Luca M. Olivieri9,
Alfredo Coppa10, Massimo Vidale9,11, James Mallory12, Vyacheslav Moiseyev13,
Egor Kitov14,15,16, Janet Monge17, Nicole Adamski1,6, Neel Alex18, Nasreen
Broomandkhoshbacht1,6,‡, Francesca Candilio19,20, Kimberly Callan1,6, Olivia
Cheronet19,21,22, Brendan J. Culleton23,24, Matthew Ferry1,6, Daniel
Fernandes19,21,22,25, Beatriz Gamarra19,21,26, Daniel Gaudio19,21, Mateja
Hajdinjak27, Éadaoin Harney1,6,28, Thomas K. Harper23,24, Denise Keating19, Ann
Marie Lawson1,6, Matthew Mah1,2,6, Kirsten Mandl22, Megan Michel1,6,‡, Mario
Novak19,29, Jonas Oppenheimer1,6,‡, Niraj Rai30,31, Kendra Sirak1,19,32, Viviane
Slon27, Kristin Stewardson1,6, Fatma Zalzala1,6, Zhao Zhang1, Gaziz Akhatov15,
Anatoly N. Bagashev33, Alessandra Bagnera9, Bauryzhan Baitanayev15, Julio
Bendezu-Sarmiento34, Arman A. Bissembaev15,35, Gian Luca Bonora36, Temirlan T.
Chargynov37, Tatiana Chikisheva38, Petr K. Dashovskiy39, Anatoly Derevianko38,
Miroslav Dobeš40, Katerina Douka41,42, Nadezhda Dubova14, Meiram N.
Duisengali35, Dmitry Enshin33, Andrey Epimakhov43,44, Suzanne Freilich22, Alexey
V. Fribus45, Dorian Fuller46, Alexander Goryachev33, Andrey Gromov13, Sergey P.
Grushin47, Bryan Hanks48, Margaret Judd48, Erlan Kazizov15, Aleksander
Khokhlov49, Aleksander P. Krygin50, Elena Kupriyanova51, Pavel Kuznetsov49,
Donata Luiselli52, Farhod Maksudov53, Aslan M. Mamedov54, Talgat B. Mamirov15,
Christopher Meiklejohn55, Deborah C. Merrett56, Roberto Micheli9,57, Oleg
Mochalov49, Samariddin Mustafokulov53,58, Ayushi Nayak41, Davide Pettener59,
Richard Potts60, Dmitry Razhev33, Marina Rykun61, Stefania Sarno59, Tatyana M.
Savenkova62, Kulyan Sikhymbaeva63, Sergey M. Slepchenko33, Oroz A.
Soltobaev37, Nadezhda Stepanova38, Svetlana Svyatko13,64, Kubatbek Tabaldiev65,
Maria Teschler-Nicola22,66, Alexey A. Tishkin67, Vitaly V. Tkachev68, Sergey
Vasilyev14,69, Petr Velemínský70, Dmitriy Voyakin15,71, Antonina Yermolayeva15,
Muhammad Zahir41,72, Valery S. Zubkov73, Alisa Zubova13, Vasant S. Shinde74,
Carles Lalueza-Fox75, Matthias Meyer27, David Anthony76, Nicole Boivin41,+,
Kumarasamy Thangaraj30,+, Douglas J. Kennett23,24,77,‡,+, Michael Frachetti78,79,+,
Ron Pinhasi19,22,+, David Reich1,2,6,80,+
Affiliations
1Department
of Genetics, Harvard Medical School, Boston, MA 02115, USA 2Broad
Institute of Harvard and MIT, Cambridge, MA 02142, USA 3Radcliffe Institute for
Advanced Study, Harvard University, Cambridge, MA 02138, USA 4Department of
Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
5Center for Computational Biology, University of California, Berkeley, CA 94720,
USA 6Howard Hughes Medical Institute, Harvard Medical School, Boston, MA
02115, USA 7Harvard-MIT Division of Health Sciences and Technology, Harvard
Medical School, Boston, MA 02115, USA 8Department of Anthropology, Harvard
University, Cambridge, MA 02138, USA 9ISMEO - International Association of
Mediterranean and Oriental Studies, Italian Archaeological Mission in Pakistan,
19200 Saidu Sharif (Swat), Pakistan 10Dipartimento di Biologia Ambientale,
Sapienza Università di Roma, Rome 00185, Italy 11Department of Cultural Heritage:
Archaeology and History of Art, Cinema and Music, University of Padua, Padua
35139, Italy 12School of Natural and Built Environment, Queen’s University Belfast,
Belfast BT7 1NN, Northern Ireland, UK 13Peter the Great Museum of Anthropology
and Ethnography (Kunstkamera), Russian Academy of Science, St. Petersburg
199034, Russia 14Center of Physical Anthropology, Institute of Ethnology and
Anthropology, Russian Academy of Sciences, Moscow 119991, Russia 15A.Kh.
Margulan Institute of Archaeology, Almaty 050010, Kazakhstan 16Al-Farabi Kazakh
National University, Almaty 050040, Kazakhstan 17University of Pennsylvania
Museum of Archaeology and Anthropology, Philadelphia, PA 19104, USA
18Electrical Engineering and Computer Science, University of California, Berkeley,
CA 94720, USA 19Earth Institute, University College Dublin, Dublin 4, Ireland
20Soprintendenza Archeologia, Belle Arti e Paesaggio per la Città Metropolitana di
Cagliari e le Province di Oristano e Sud Sardegna, Cagliari 09124, Italy 21School of
Archaeology, University College Dublin, Dublin 4, Ireland 22Department of
Evolutionary Anthropology, University of Vienna, 1090 Vienna, Austria 23Department
of Anthropology, Pennsylvania State University, University Park, PA 16802, USA
24Institutes for Energy and the Environment, Pennsylvania State University,
University Park, PA 16802, USA 25CIAS, Department of Life Sciences, University of
Coimbra, Coimbra 3000-456, Portugal 26Catalan Institute of Human Paleoecology
and Social Evolution (IPHES), Tarragona 43007, Spain. 27Max Planck Institute for
Evolutionary Anthropology, Leipzig 04103, Germany 28Department of Organismic
and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
29Institute for Anthropological Research, Zagreb 10000, Croatia 30CSIR-Centre for
Cellular and Molecular Biology, Hyderabad 500 007, India 31Birbal Sahni Institute of
Palaeosciences, Lucknow 226007, India 32Department of Anthropology, Emory
University, Atlanta, GA 30322, USA 33Tyumen Scientific Centre SB RAS, Institute of
the Problems of Northern Development, Tyumen 625003, Russia 34CNRS-EXT500,
Directeur de la Délégation Archéologique Française en Afghanistan (DAFA),
Embassy of France in Kabul, Afghanistan. 35Aktobe Regional Historical Museum,
Aktobe 030006, Kazakhstan 36Archaeology of Asia Department, ISMEO - International
Association of Mediterranean and Oriental Studies, Rome RM00186,
Italy 37Kyrgyz National University, 720033 Bishkek, Kyrgyzstan 38Institute of
Archaeology and Ethnography, Siberian Branch, Russian Academy of Sciences,
Novosibirsk 630090, Russia 39Department of Political History, National and
State-Confessional Relations, Altai State University, Barnaul 656049, Russia 40Institute of
Archaeology, Czech Academy of Sciences, Prague 118 01, Czech Republic
41Department of Archaeology, Max Planck Institute for the Science of Human
History, Jena 07745, Germany 42Oxford Radiocarbon Accelerator Unit, Research
Laboratory for Archaeology and the History of Art, University of Oxford, Oxford OX1
3QY, UK 43Institute of History and Archaeology, Ural Branch RAS, Yekaterinburg
620990, Russia 44South Ural State University, Chelyabinsk 454080, Russia
45Department of Archaeology, Kemerovo State University, Kemerovo 650043,
Russia 46Institute of Archaeology, University College London, London WC1H 0PY,
UK 47Department of Archaeology, Ethnography and Museology, Altai State
University, Barnaul, 656049, Russia 48University of Pittsburgh, Department of
Anthropology, Pittsburgh, PA 15260, USA 49Samara State University of Social
Sciences and Education, Samara 443099, Russia 50West Kazakhstan Regional
Center for History and Archaeology, Uralsk 090000, Kazakhstan 51Scientific and
Educational Center of Study on the Problem of Nature and Man, Chelyabinsk State
University, Chelyabinsk 454021, Russia 52Department of Cultural Heritage,
University of Bologna, 48121 Ravenna, Italy 53Institute for Archaeological Research,
Uzbekistan Academy of Sciences, Samarkand 140151, Uzbekistan 54Center for
Research, Restoration and Protection of Historical and Cultural Heritage of Aktobe
Region, Aktobe 030007, Kazakhstan. 55Department of Anthropology, University of
Winnipeg, Winnipeg, MB, R3B 2E9, Canada 56Department of Archaeology, Simon
Fraser University, Burnaby, BC, V5A 1S6, Canada 57MiBAC – Ministero per i Beni e
le Attività Culturali - Soprintendenza Archeologia, belle arti e paesaggio del Friuli
Venezia Giulia, 34135 Trieste, Italy 58Afrosiab Museum, Samarkand 140151,
Uzbekistan 59Department of Biological, Geological and Environmental Sciences,
Alma Mater Studiorum – University of Bologna, Bologna 40126, Italy 60Human
Origins Program, National Museum of Natural History, Smithsonian Institution,
Washington, DC 20013, USA 61National Research Tomsk State University, Tomsk
634050, Russia 62F. Voino-Yasenetsky Krasnoyarsk State Medical University,
Krasnoyarsk, 660022, Russia 63Central State Museum Republic of Kazakhstan,
Samal-1 Microdistrict, Almaty 050010, Kazakhstan 64CHRONO Centre for Climate,
the Environment, and Chronology, Queen’s University of Belfast, Belfast BT7 1NN,
Northern Ireland, UK 65Kyrgyz-Turkish Manas University, Bishkek, Kyrgyzstan.
66Department of Anthropology, Natural History Museum Vienna, 1010 Vienna,
Austria 67Department of Archaeology, Ethnography and Museology, The Laboratory
of Interdisciplinary Studies in Archaeology of Western Siberia and Altai, Altai State
University, Barnaul, 656049, Russia 68Institute of Steppe, Ural Branch RAS,
Orenburg 460000, Russia 69Center for Egyptological Studies RAS, Moscow 119991,
Russia 70Department of Anthropology, National Museum, Prague 115 79, Czech
Republic 71Archaeological Expertise LLP, Almaty 050060, Kazakhstan 72Department
of Archaeology, Hazara University, Mansehra 21300, Pakistan 73N.F. Katanov
Khakassia State University, Abakan, 655017, Russia 74Department of Archaeology,
Deccan College Post-Graduate and Research Institute, Pune 411006, India
75Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona
08003, Spain 76Anthropology Department, Hartwick College, Oneonta, New York
13820, USA 77Department of Anthropology, University of California, Santa Barbara,
CA 93106, USA 78Department of Anthropology, Washington University in St. Louis,
St. Louis, MO 63112, USA 79Spatial Analysis, Interpretation, and Exploration
Laboratory, Washington University in St. Louis, St. Louis, MO 63112, USA 80Max
Planck-Harvard Research Center for the Archaeoscience of the Ancient
Mediterranean, Cambridge, MA 02138, USA
Acknowledgments:
We acknowledge the people past and present whose samples we analyzed in this study. We thank Oliver Uberti for
the design of Figure 3 and the figure associated with the One Page Summary. We thank the Minusinsk Regional
Museum of N. M. Martyanov for sharing skeletal samples. We thank Orazak Ismagulov, Ainagul Ismagulova, and
Akmatov Kunbolot Toktonsunovich for facilitating access to skeletal material. We thank the
Department of Archaeology and Museums, Government of Pakistan, the Directorate of Archaeology and Museums,
Government of Khyber-Pakhtunkhwa Province (Pakistan), and the Dipartimento di Biologia Ambientale, Sapienza
Università (Rome) for facilitating access to the materials from Swat excavated by the Italian Archaeological
Mission (now ISMEO).
Funding: N.P. carried out this work while a fellow at the Radcliffe Institute for Advanced Study at Harvard
University. P.M. was supported by a Burroughs Wellcome Fund CASI award. N.N. is supported by a NIGMS
(GM007753) fellowship. T.C. and A.D. were supported by the Russian Science Foundation (project no.
14–50-00036). T.S. was supported by the Russian Foundation for Basic Research (project no. grant 8–09-00779)
“Anthropological and archaeological aspects of ethnogenesis of the population of the southern part of Western and
Central Siberia in the Neolithic and Early Bronze Age.” D.P., S.S. and D.L. were supported by European Research
Council ERC-2011-AdG 295733 grant (Langelin). O.M. was supported by a grant from the Ministry of Education
and Sciences of the Russian Federation No 33. 1907, 2017/ П4 “Traditional and innovational models of a
development of ancient Volga population”. A.E. was supported by a grant from the Ministry of Education and
Sciences of the Russian Federation No 33.5494, 2017/BP “Borderlands of cultural worlds (Southern Urals from
Antiquity to Early Modern period)”. Radiocarbon dating work was supported by the NSF Archaeometry program
BCS-1460369 to D.Ken. and B.C. and by the NSF Archaeology program BCS-1725067 to D.Ken. K.Th. was
supported by NCP fund (MLP0117) of the Council of Scientific and Industrial Research (CSIR), Government of
India, New Delhi. N.Bo., A.N., and M.Z. were supported by the Max Planck Society. D.Re. is an Investigator of the
Howard Hughes Medical Institute and his ancient DNA laboratory work was supported by National Science
Foundation HOMINID grant BCS-1032255, by National Institutes of Health grant GM100233, by an Allen
Discovery Center grant, and by grant 61220 from the John Templeton Foundation.
References and Notes
1. Online Data Visualizer (available at https://public.tableau.com/views/TheGenomicFormationofSouthandCentralAsia/Fig_1).
2. Fuller DQ, Lucas L, in Human Dispersal and Species Movement, Boivin N, Petraglia M, Crassard
R, Eds. (Cambridge University Press, Cambridge, 2017), pp. 304–331.
3. Stevens CJ et al., Between China and South Asia: A Middle Asian corridor of crop dispersal and
agricultural innovation in the Bronze Age. The Holocene. 26, 1541–1555 (2016). [PubMed:
27942165]
4. Allaby RG, Stevens C, Lucas L, Maeda O, Fuller DQ, Geographic mosaics and changing rates of
cereal domestication. Philos. Trans. R. Soc. B Biol. Sci 372, 20160429 (2017).
5. Dani AH et al., History of Civilizations of Central Asia: The Development of Sedentary and
Nomadic Civilizations, 700 B. C. to A (UNESCO Publishing, 1994).
6. Frachetti MD, Smith CE, Traub CM, Williams T, Nomadic ecology shaped the highland geography
of Asia’s Silk Roads. Nature. 543, 193–198 (2017). [PubMed: 28277506]
7. Haak W et al., Massive migration from the steppe was a source for Indo-European languages in
Europe. Nature. 522, 207–211 (2015). [PubMed: 25731166]
8. Allentoft ME et al., Population genomics of Bronze Age Eurasia. Nature. 522, 167–172 (2015).
[PubMed: 26062507]
9. Lazaridis I et al., Genomic insights into the origin of farming in the ancient Near East. Nature. 536,
419–424 (2016). [PubMed: 27459054]
10. Nakatsuka N et al., The promise of discovering population-specific disease-associated genes in
South Asia. Nat. Genet 49, 1403–1407 (2017). [PubMed: 28714977]
11. Reich D, Thangaraj K, Patterson N, Price AL, Singh L, Reconstructing Indian population history.
Nature. 461, 489–494 (2009). [PubMed: 19779445]
12. Moorjani P et al., Genetic evidence for recent population mixture in India. Am. J. Hum. Genet 93,
422–438 (2013). [PubMed: 23932107]
13. Supplementary Materials.
14. Dabney J et al., Complete mitochondrial genome sequence of a Middle Pleistocene cave bear
reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. U. S. A 110, 15758–15763
(2013). [PubMed: 24019490]
15. Korlević P et al., Reducing microbial and human contamination in DNA extractions from ancient
bones and teeth. Biotechniques. 59, 87–93 (2015). [PubMed: 26260087]
16. Meyer M et al., A high-coverage genome sequence from an archaic Denisovan individual. Science.
338, 222–226 (2012). [PubMed: 22936568]
17. Rohland N, Harney E, Mallick S, Nordenfelt S, Reich D, Partial uracil-DNA-glycosylase treatment
for screening of ancient DNA. Philos. Trans. R. Soc. B Biol. Sci 370, 20130624 (2015).
18. Olalde I et al., The Beaker phenomenon and the genomic transformation of northwest Europe.
Nature. 555, 190–196 (2018). [PubMed: 29466337]
19. Mathieson I et al., Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 528, 499–
503 (2015). [PubMed: 26595274]
20. Fu Q et al., The genetic history of Ice Age Europe. Nature. 534, 200–205 (2016). [PubMed:
27135931]
21. Yang MA et al., 40,000-Year-Old Individual from Asia Provides Insight into Early Population
Structure in Eurasia. Curr. Biol 27, 3202–3208 (2017). [PubMed: 29033327]
22. Saag L et al., Extensive Farming in Estonia Started through a Sex-Biased Migration from the
Steppe. Curr. Biol 27, 2185–2193 (2017). [PubMed: 28712569]
23. Raghavan M et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans.
Nature. 505, 87–91 (2014). [PubMed: 24256729]
24. Mittnik A et al., The genetic prehistory of the Baltic Sea region. Nat. Commun 9, 442 (2018).
[PubMed: 29382937]
25. Mathieson I et al., The genomic history of southeastern Europe. Nature. 555, 197–203 (2018).
[PubMed: 29466330]
26. Lipson M et al., Parallel palaeogenomic transects reveal complex genetic history of early European
farmers. Nature. 551, 368–372 (2017). [PubMed: 29144465]
27. Lazaridis I et al., Ancient human genomes suggest three ancestral populations for present-day
Europeans. Nature. 513, 409–13 (2014). [PubMed: 25230663]
28. Gallego-Llorente M et al., The genetics of an early Neolithic pastoralist from the Zagros, Iran. Sci.
Rep 6, 31326 (2016). [PubMed: 27502179]
29. de Barros Damgaard P et al., 137 ancient human genomes from across the Eurasian steppes. Nature.
557, 369–374 (2018). [PubMed: 29743675]
30. de Barros Damgaard P et al., The first horse herders and the impact of early Bronze Age steppe
expansions into Asia. Science. 360, eaar7711 (2018).
31. Broushaki F et al., Early Neolithic genomes from the eastern Fertile Crescent. Science. 353, 499–
503 (2016). [PubMed: 27417496]
32. Lazaridis I et al., Ancient human genomes suggest three ancestral populations for present-day
Europeans. Nature. 513, 409–413 (2014). [PubMed: 25230663]
33. Patterson N et al., Ancient admixture in human history. Genetics. 192, 1065–93 (2012). [PubMed:
22960212]
34. Skoglund P et al., Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe.
Science. 336, 466–469 (2012). [PubMed: 22539720]
35. Patterson N, Price AL, Reich D, Population Structure and Eigenanalysis. PLoS Genet 2, e190
(2006). [PubMed: 17194218]
36. Galinsky KJ et al., Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B
in Europe and East Asia. Am. J. Hum. Genet 98, 456–472 (2016). [PubMed: 26924531]
37. Alexander DH, Novembre J, Lange K, Fast model-based estimation of ancestry in unrelated
individuals. Genome Res 19, 1655–1664 (2009). [PubMed: 19648217]
38. McNeill J, Pomeranz K, Eds., The Cambridge World History (Cambridge University Press,
Cambridge, 2015).
39. Loh P-R et al., Inferring admixture histories of human populations using linkage disequilibrium.
Genetics. 193, 1233–54 (2013). [PubMed: 23410830]
40. Hinch AG et al., The landscape of recombination in African Americans. Nature. 476, 170–175
(2011). [PubMed: 21775986]
41. Petraglia MD, Allchin B, in The Evolution and History of Human Populations in South Asia
(Springer Netherlands, Dordrecht, 2007), pp. 1–20.
42. Bellwood PS, Renfrew C, M. I. for Archaeological Research, Examining the Farming/language
Dispersal Hypothesis (University of Cambridge, 2002), McDonald Institute monographs.
43. Daly KG et al., Ancient goat genomes reveal mosaic domestication in the Fertile Crescent.
Science. 361, 85–88 (2018). [PubMed: 29976826]
44. Ammerman A, Cavalli-Sforza LL, The Neolithic Transition and the Genetics of Populations in
Europe (Princeton University Press, 1984).
45. Dupree L, Notes on Shortugai: an Harappan site in northern Afghanistan (Centre for the Study of
the Civilization of Central Asia, Quaid-i-Azam University, Islamabad, Pakistan, 1981).
46. Minniti C, Sajjadi SMS, New data on non-human primates from the ancient Near East: The
recent discovery of a rhesus macaque burial at Shahr-i Sokhta (Iran). Int. J. Osteoarchaeol, 1–11
(2019).
47. Vidale M, A “Priest King” at Shahr-i Sokhta? Archaeol. Res. Asia 15, 110–115 (2018).
48. Moorjani P et al., A genetic method for dating ancient genomes provides a direct estimate of
human generation interval in the last 45,000 years. Proc. Natl. Acad. Sci. U. S. A 113, 5652–7
(2016). [PubMed: 27140627]
49. Wang C-C et al., Ancient human genome-wide data from a 3000-year interval in the Caucasus
corresponds with eco-geographic regions. Nat. Commun 10, 590 (2019). [PubMed: 30713341]
50. Frachetti MD, Multiregional Emergence of Mobile Pastoralism and Nonuniform Institutional
Complexity across Eurasia. Curr. Anthropol 53, 2–38 (2012).
51. Frachetti MD, Pastoralist landscapes and social interaction in bronze age Eurasia (University of
California Press, 2008).
52. Unterländer M et al., Ancestry and demography and descendants of Iron Age nomads of the
Eurasian Steppe. Nat. Commun 8, 14615 (2017). [PubMed: 28256537]
53. Moorjani P et al., A genetic method for dating ancient genomes provides a direct estimate of
human generation interval in the last 45,000 years. Proc. Natl. Acad. Sci. U. S. A 113, 5652–7
(2016). [PubMed: 27140627]
54. Giosan L et al., Fluvial landscapes of the Harappan civilization. Proc. Natl. Acad. Sci. U. S. A 109,
1688–1694 (2012).
55. Mallick S et al., The Simons Genome Diversity Project: 300 genomes from 142 diverse
populations. Nature. 538, 201–206 (2016). [PubMed: 27654912]
56. Hellenthal G et al., The Kalash Genetic Isolate? The Evidence for Recent Admixture. Am. J. Hum.
Genet 98, 396–397 (2016). [PubMed: 26849116]
57. Silva M et al., A genetic chronology for the Indian Subcontinent points to heavily sex-biased
dispersals. BMC Evol. Biol 17, 88 (2017). [PubMed: 28335724]
58. Olalde I et al., The genomic history of the Iberian Peninsula over the past 8000 years. Science.
363, 1230–1234 (2019).
59. Ringe D, Warnow T, Taylor A, Indo‐European and Computational Cladistics. Trans. Philol. Soc
100, 59–129 (2002).
60. Lister DL et al., Barley heads east: Genetic analyses reveal routes of spread through diverse
Eurasian landscapes. PLoS One. 13, e0196652 (2018). [PubMed: 30020920]
61. Costantini L, The first farmers in Western Pakistan: the evidence of the Neolithic agropastoral
settlement of Mehrgarh. Pragdhara. 18, 167–178 (2008).
62. Bellwood P, in The Encyclopedia of Global Human Migration (Blackwell Publishing Ltd, Oxford,
UK, 2013).
63. Fortson BW, Indo-European Language and Culture: An Introduction (Wiley, 2011), Blackwell
Textbooks in Linguistics.
64. Parpola A, The Roots of Hinduism: the Early Aryans and the Indus Civilization (Oxford University
Press, 2015).
65. Mahadevan I, Bhaskar MV, in Walking with the Unicorn: Social Organization and Material Culture
in Ancient South Asia, Frenez D, Jamison G, Law R, Vidale M, Meadow RH, Eds. (Archeopress,
Oxford, 2018), pp. 359–376.
66. Southworth F, in 7th ESCA Harvard-Kyoto Roundtable (2005).
67. Krishnamurti B, The Dravidian Languages (Cambridge University Press, Cambridge, 2003).
68. Anthony DW, Ringe D, The Indo-European Homeland from Linguistic and Archaeological
Perspectives. Annu. Rev. Linguist 1, 199–219 (2015).
69. Olalde I et al., Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic
European. Nature. 507, 225–228 (2014). [PubMed: 24463515]
70. Sirak KA et al., A minimally-invasive method for sampling human petrous bones from the cranial
base for ancient DNA analysis. Biotechniques. 62, 283–289 (2017). [PubMed: 28625158]
71. Rohland N, Glocke I, Aximu-Petri A, Meyer M, Extraction of highly degraded DNA from ancient
bones, teeth and sediments for high-throughput sequencing. Nat. Protoc 13, 2447–2461 (2018).
[PubMed: 30323185]
72. Gansauge M-T et al., Single-stranded DNA library preparation from highly degraded DNA using
T4 DNA ligase. Nucleic Acids Res 45, e79 (2017). [PubMed: 28119419]
73. Briggs AW, Heyn P, in Methods in molecular biology (Clifton, N.J.) (2012), vol. 840, pp. 143–154.
[PubMed: 22237532]
74. Rohland N, Reich D, Cost-effective, high-throughput DNA sequencing libraries for multiplexed
target capture. Genome Res 22, 939–946 (2012). [PubMed: 22267522]
75. DeAngelis MM, Wang DG, Hawkins TL, Solid-phase reversible immobilization for the isolation of
PCR products. Nucleic Acids Res 23, 4742–4743 (1995). [PubMed: 8524672]
76. Maricic T, Whitten M, Pääbo S, Multiplexed DNA Sequence Capture of Mitochondrial Genomes
Using PCR Products. PLoS One. 5, e14004 (2010). [PubMed: 21103372]
77. Kircher M et al., A general framework for estimating the relative pathogenicity of human genetic
variants. Nat. Genet 46, 310–5 (2014). [PubMed: 24487276]
78. Auton A et al., A global reference for human genetic variation. Nature. 526, 68–74 (2015).
[PubMed: 26432245]
79. Li H, Durbin R, Fast and accurate long-read alignment with Burrows–Wheeler transform.
Bioinformatics. 26, 589–595 (2010). [PubMed: 20080505]
80. Douka K et al., Direct radiocarbon dating and DNA analysis of the Darra-i-Kur (Afghanistan)
human temporal bone. J. Hum. Evol 107, 86–93 (2017). [PubMed: 28526291]
81. Renaud G, Kircher M, Stenzel U, Kelso J, freeIbis: an efficient basecaller with calibrated quality
scores for Illumina sequencers. Bioinformatics. 29, 1208–1209 (2013). [PubMed: 23471300]
82. Renaud G, Stenzel U, Kelso J, leeHom: adaptor trimming and merging for Illumina sequencing
reads. Nucleic Acids Res 42, e141 (2014). [PubMed: 25100869]
83. Fu Q et al., A Revised Timescale for Human Evolution Based on Ancient Mitochondrial Genomes.
Curr. Biol 23, 553–559 (2013). [PubMed: 23523248]
84. Korneliussen TS, Albrechtsen A, Nielsen R, ANGSD: Analysis of Next Generation Sequencing
Data. BMC Bioinformatics. 15, 356 (2014). [PubMed: 25420514]
85. McClure SB, Puchol OG, Culleton BJ, AMS Dating of Human Bone from Cova De La Pastora:
New Evidence of Ritual Continuity in the Prehistory of Eastern Spain. Radiocarbon. 52, 25–32
(2010).
86. Lohse JC, Culleton BJ, Black SL, Kennett DJ, A Precise Chronology of Middle to Late Holocene
Bison Exploitation in the Far Southern Great Plains. J. Texas Archeol. Hist 1, 94–126 (2014).
87. van Klinken GJ, Bone Collagen Quality Indicators for Palaeodietary and Radiocarbon
Measurements. J. Archaeol. Sci 26, 687–695 (1999).
88. Santos GM, Southon JR, Druffel-Rodriguez KC, Griffin S, Mazon M, Magnesium Perchlorate as
an Alternative Water Trap in AMS Graphite Sample Preparation: A Report On Sample Preparation
at KCCAMS at the University of California, Irvine. Radiocarbon. 46, 165–173 (2004).
89. Stuiver M, Polach HA, Discussion Reporting of 14C Data. Radiocarbon. 19, 355–363 (1977).
90. Reimer PJ et al., IntCal13 and Marine13 Radiocarbon Age Calibration Curves 0–50,000 Years cal
BP. Radiocarbon. 55, 1869–1887 (2013).
91. Chang CC et al., Second-generation PLINK: rising to the challenge of larger and richer datasets.
Gigascience. 4, 7 (2015). [PubMed: 25722852]
92. Ramsey C. Bronk, Bayesian Analysis of Radiocarbon Dates. Radiocarbon. 51, 337–360 (2009).
93. Enshin DN, Skochina SN, Zakh VA, On settling rites in the Neolithic of the Low Ishim basin
(basing on materials of Mergen 6 settlement). Bull. Archeol. Anthropol. Ethnogr 4, 43–52 (2012).
94. Enshin DN, Neolithic dwellings from settlements of the Mergen lake. Bull. Archeol. Anthropol.
Ethnogr 1, 14–23 (2014).
95. Kosintsev PA, Nekrasov AE, in Ecology of ancient and modern societies. (Publishing House IPOS
SB RAS, Tyumen, 1999), pp. 100–104.
96. Enshin DN, A pottery complex from the settlement of Mergen’ 7 (Low Ishim basin): description
and interpretation. Bull. Archeol. Anthropol. Ethnogr 29, 15–27 (2015).
97. Skochina SN, Enshin DN, Bone inventory of the Neolithic burial of the settlement of Mergen 7. Br.
reports Inst. Archeol, 242–251 (2017).
98. Larin OV, Afanasievo culture of Altai Mountains: burial ground Saldyar-1. Barnaul (2005).
99. Vadetskaya EB, Polyakov AV, Stepanova NF, Corpus of the Afanasyevo Culture sites. Azbuka,
Barnaul, 301 (2014).
100. Gryaznov MP, Afanasievo Culture in the Yenisei Basin. Nauk. Saint-petersbg (1999).
101. Teploukhov SA, Classification of the ancient metal cultures of the Minusinsk region. (1929), vol.
IV (2).
102. Stepanova NF, On the question of the chronology of the Afanasievo culture of Gorny Altai.
Archaeol. Stud. steppe Eurasia, 61–70 (2013).
103. Stepanova NF, The burial grounds of the Afanasievo culture of the surroundings of the village of
Elo in the Altai Mountains (materials to the arch of monuments). Antiq. Sib. Cent. Asia. Collect.
Sci. Pap 16, 8–26 (2012).
104. Abdulganeev MT, Posrednikov VA, Stepanova N, in Sources on the history of the Altai Republic
(Gorno Altaisk, 1997), pp. 69–90.
105. Parzinger H, Nagler A, Leont’ev N, Zubkov V, The multi-period burial ground of Suchanicha near
Minusinsk. On the position of the Tagar culture in the context of the Bronze Age and Iron Age
culture in the Minusinsk Basin. Eurasia Antiq. 15, 67–208 (2009).
106. Dobeš M, Budinský P, Buchvaldek M, Muška J, The catalog of the Corded Ware in Bohemia V.
Bílina region. Praehistorica. 17, 75–146 (1991).
107. Velemínský P, The skeleton from the grave 4/81 from Radovesice. Arch. Anthropol. Dep. Natl.
Museum, Prague (1997).
108. Stloukal M, Eneolithic skeletons from Radovesice. Arch. Anthropol. Dep. Natl. Museum, Prague
(1981).
109. Černý V, Anthropology of the Chalcolithic in Central Europe: variability of chronological,
geographical and sexual dimorphism. PhD Dissertation. University of Bordeaux I (1999).
110. Hanáková H, Stloukal M, Muška J, Anthropological findings from northwest Bohemia taken from
older finds of the Regional Museum in Teplice. Proc. Natl. Museum Prague B33/3–4, 159–264
(1977).
111. Zápotocký M, Muška J, in Archaeological research in Northwest Bohemia in 1993–1997. (1999),
pp. 7–43.
112. von Weinzierl R, The Neolithic settlement near Velké Žernoseky on the Elbe. Mitth. der
Anthropol. Gesellschaft Wien 25, 29–49 (1895).
113. Epimakhov A, Early complex societies in the North of Central Eurasia (based on the materials of
the burial ground Kamennyi Ambar-5) (Chelyabinsk Publishing House, Chelyabinsk, 2005), vol.
1.
114. Krause R, Koriakova LN, Multidisciplinary investigations of the Bronze Age
settlements in the Southern Trans-Urals (Russia) (Habelt, 2013).
115. Evdokimov VV, Works of the Karagands Detachment (Nauka, Moscow, 1980).
116. Tkachev AA, Central Kazakhstan in the Bronze Age (Publishing House of the Tyumen Oil and
Gas Institute, Tyumen, 2002).
117. Tkachev AA, Central Kazakhstan in the Bronze Age (Publishing House of the Tyumen Oil and
Gas Institute, 2002).
118. Doumani PN et al., Burial ritual, agriculture, and craft production among Bronze Age pastoralists
at Tasbas (Kazakhstan). Archaeol. Res. Asia 1–2, 17–32 (2015).
119. Mar’Yashev AN, New Materials on the Settlements of the Bronze Epoch in the Bayan Zurek
Mountains. Izvestiya. 1, 23–30 (2002).
120. Usmanova ER, Burial ground Lisakovsky-1: facts and parallels (TENGRI Ltd. Publ., Karaganda,
Lisakovsk, 2005).
121. Epimakhov AV, Revisiting radiocarbon argumentation of the early dating of Alakul’ antiquities.
Bull. Archeol. Anthropol. Ethnogr 34, 60–67 (2016).
122. Panyushkina IP, Mills BJ, Usmanova ER, Cheng L, Calendar Age of Lisakovsky Timbers
Attributed to Andronovo Community of Bronze Age in Eurasia. Radiocarbon. 50, 459–469
(2008).
123. Kuzmina EE, Aryans – way to the South. (Moscow - St.Petersburg, Letniy sad, 2008).
124. Molodin VI, Epimakhov AV, Marchenko Zh V, Radiocarbon chronology of the Bronze Age of the
Urals and south of the Western Siberia: Principles and Approaches, achievements and challenges.
Bull. Novosib. State Univ. Ser. Hist. Philol, 136–167 (2014).
125. Kozintsev AG, Craniometric Evidence of the Early Caucasoid Migrations to Siberia and Eastern
Central Asia, with Reference to the Indo-European Problem. Archaeol. Ethnol. Anthropol.
Eurasia 37, 125–136 (2009).
126. Zubova AV, Chikisheva TA, Pozdnyakov DV, in The Aryans in the Eurasian steppes and
neighboring territories: the Bronze and Early Iron Ages (Altai State University Press, Barnaul,
2014).
127. Zubova AV, Populations of Western Siberia of the II millennium BC according to anthropological
data. (IAET SO RAN, Novosibirsk, 2014).
128. Leontyev NV, Archive of the N. M. Martyanov Minusinsk Regional Museum. Minusinsk. 1
(1989).
129. Fribus AV, Grushin SP, in Proceedings of the V (XXI) All-Russian Archaeological Congress in
Barnaul (Barnaul, 2017), p. 370.
130. Kiselev SV, Ancient history of Southern Siberia (Nauka, Moscow, 1951).
131. Molodin VI, Baraba in the Bronze Age (Novosibirsk: Nauka, 1985).
132. Kushaev G, Unpublished Archive Report. Inst. Archaeol. Almaty, Kazakhstan (1983).
133. Isakov AI, Potemkina TM, The Cemetery of the Bronze Age Tribes in Tajikistan. Sov.
Arkheologija 1 1, 145–167 (1989).
134. Potemkina TM, The Archeoastronomical Aspect in the Reconstruction of the Worldviews of
Ancient Peoples. Anthropol. Archeol. Eurasia 45, 7–28 (2006).
135. Mar’yashev AN, Goryachev AA, About the Question of the Typology and Chronology of the
Bronze Age Sites of Semirech’e. Russ. Archaeol, 5–20 (1993).
136. Mar’yashev AN, Goryachev AA, Questions of periodization and chronology of the Bronze Age
monuments in Semirechye. Ross. Arkheologiya 1, 5–19 (1993).
137. Mar’yashev AN, “Unpublished Archive Report number 2172” (Almaty, Kazakhstan, 1988).
138. Mar’yashev AN, Karabaspakova KM, in Ancient Memorials of North Asia and their Guard
excavations Novosibirsk, Nauka, Medvedev VE, Khudyakov YS, Eds. (Nauka, Novosibirsk,
1988), pp. 24–39.
139. Mar’yashev AN, “Unpublished Archive Report Number 2044” (Almaty, Kazakhstan, 1984).
140. Ivanov GP, Kashkarchinsky burial ground - a new monument of the late Bronze Age in Fergana.
Soc. Sci. Uzb. Tashkent 10, 44–47 (1988).
141. Varfolomeev V, in Questions of archeology of Central and Northern Kazakhstan (Karaganda,
1989), pp. 76–84.
142. Tkachev AA, Chansharsky archaeological microdistrict in the Late Bronze Age of the Aktobe
Urals. Ufa Archeol. Her 9, 72–83 (2009).
143. Arslanova FK, Essays in Medieval Archeology of the Upper Irtysh region. Arch. A.Kh. Margulan
Inst. Archeol (2013).
144. Arslanova FK, in “In the depths of centuries”. Archeological collection, Akishev KA, Ed. (Nauka,
Alma-Ata, 1974), pp. 46–60.
145. Arslanova FK, “Unpublished Archive Report 1175” (Almaty, Kazakhstan, 1969).
146. Doumani P, Bronze Age Potters in Regional Context: Long-Term Development of Ceramic
Technology in the Eastern Eurasian Steppe Zone. Arts Sci. Electron. Theses Diss (2014), doi:
10.7936/K79S1P69.
147. Spengler RN, Frachetti MD, Doumani PN, Late Bronze Age agriculture at Tasbas in the
Dzhungar Mountains of eastern Kazakhstan. Quat. Int 348, 147–157 (2014).
148. Coon CS, The seven caves: archaeological explorations in the Middle East / by Carleton S. Coon
(Knopf New York, [1st ed.]., 1957).
149. Coon CS, Cave explorations in Iran, 1949. (University Museum, University of Pennsylvania,
Philadelphia, 1951), Museum monographs.
150. Gregg M, Thornton CP, A Preliminary Analysis of Prehistoric Pottery from Carleton Coon’s
Excavations of Hotu and Belt Caves in Northern Iran: Implications for Future Research into the
Emergence of Village Life in Western Central Asia. International Journal of the Humanities. 19,
56–94 (2012).
151. Libby WF, Radiocarbon dating. Univ. Chicago Press Chicago, (1951).
152. Ralph EK, University of Pennsylvania Radiocarbon Dates I. Science. 121, 149–151 (1955).
153. Zeder MA, Hesse B, The initial domestication of goats (Capra hircus) in the Zagros mountains
10,000 years ago. Science. 287, 2254–7 (2000). [PubMed: 10731145]
154. Meiklejohn C, Merrett DC, Reich D, Pinhasi R, Direct dating of human skeletal material from
Ganj Dareh, Early Neolithic of the Iranian Zagros. J. Archaeol. Sci. Reports 12, 165–172 (2017).
155. Merrett DC, Bioarchaeology in Early Neolithic Iran: assessment of health status and subsistence
strategy (University of Manitoba, 2004).
156. Meiklejohn C, Agelarakis A, Akkermans PMMG, Smith PEL, Solecki R, Artificial cranial
deformation in the Proto-neolithic and Neolithic Near East and its possible origin: Evidence from
four sites. Paléorient. 18, 83–97 (1992).
157. Smith PEL, Architectural Innovation and Experimentation at Ganj Dareh, Iran. World Archaeol.
21 (1990), pp. 323–335.
158. Voigt MM, Meadow RH, Hasanlu, Volume I: Hajji Firuz Tepe, Iran--The Neolithic Settlement
(University of Pennsylvania Press, 1983), Hasanlu excavation reports.
159. Spengler RN, Willcox G, Archaeobotanical results from Sarazm, Tajikistan, an Early Bronze Age
Settlement on the edge: Agriculture and exchange. Environ. Archaeol 18, 211–221 (2013).
160. Lyonnet B, Isakov AE, Ceramics, (Chalcolithic and Bronze Age Ancient). Mem. French
Archaeol. Mission Cent. Asia 7 (1996).
161. Avanessova NA, Dzhurakulova DM, in Proceedings of the International Conference on The
Culture of Nomads of Central Asia (2007), pp. 13–33.
162. Rice DT, Excavations at Tepe Hissar, Damghan. By Erich F. Schmidt. With an additional chapter
on the Sasanian building by Fiske Kimball. Philadelphia: University of Pennsylvania Press, 1937.
pp. XXI, 478, 177 figs. and 79 plates. £3 7s 6d. Antiquity. 12, 494–495 (1938).
163. Gursan-Salzmann A, The new chronology of the Bronze Age settlement of Tepe Hissar, Iran
(University of Pennsylvania Press, 2016).
164. Bobomulloyev S, Vinogradova NM, Bobomulloyev B, Excavations of the Middle Bronze Age
burial ground of Farkhor in the south of Tajikistan. Zap. IIMK 11, 47–66 (2015).
165. Bobomulloyev S, Vinogradova NM, Bobomulloyev B, Monuments of the Middle Bronze Age in
Southwest Tajikistan. Reports Acad. Sci. Repub. Tajikistan 3, 39–47 (2014).
166. Vinogradova NM, Monuments of the Middle Bronze Age in Southwest Tajikistan. VDI. 4, 84–98
(2011).
167. Vinogradova NM, Preliminary Results of the Excavations at the Necropolis of Farkhor, a Middle
Bronze Age Site in Southern Tajikistan. Archeol. Rev. Anc. east. News Archaeol. Res 2, 21–27
(2015).
168. Tosi M, Excavations at Shahr-i Sokhta, a Chalcolithic Settlement in the Iranian Sīstān.
Preliminary Report on the Second Campaign. East West, 283–386 (1969).
169. Tosi M, Excavations at Shahr-i Sokhta, a Chalcolithic Settlement in the Iranian Sīstān.
Preliminary Report on the First Campaign, October-December. East West. 18 (1967), pp. 9–66.
170. Lamberg-Karlovsky CC, Tosi M, Shahr-i Sokhta and Tepe Yahya: Tracks on the Earliest History
of the Iranian Plateau. East West. 23 (1973), pp. 21–57.
171. S. V.I, The Pottery of Shahr-i Sokhta I and its Southern Turkmenian Connections. In: Tosi M
(ed.). Prehist. Sistan 1. IsMEO Reports Mem. 19.1. IsMEO, Rome 19.1, 183–198 (1983).
172. Bonora GL, Domani C, Salvatori S, Soldini A, The oldest graves of the Shahr-i Sokhta graveyard.
South Asian Archeol. Isiao, Ser. Orient. Rome, 485–94 (1997).
173. Cortesi E, Tosi M, Lazzari A, Vidale M, Cultural Relationships beyond the Iranian Plateau: The
Helmand Civilization, Baluchistan and the Indus Valley in the 3rd Millennium BCE. Paléorient.
34, 5–35 (2008).
174. Salvatori S, Tosi M, Shahr-i-Sokhta Revised Sequence. South Asian Archeol. 2001, Paris, ERC,
281–292.
175. Jarrige J-F, Didier A, Quivron G, Shahr-i Sokhta and the chronology of the Indo-Iranian regions.
Paléorient. 37, 7–34 (2011).
176. Salvatori S, Vidale M, Shahr-i Sokhta 1975 – 1978: central quarters excavations; preliminary
report (IsIAo, 1997), Reports and memoirs. Series minor.
177. Hiebert FT, Kurbansakhatov K, Schmidt H, A Central Asian village
at the dawn of civilization, excavations at Anau, Turkmenistan (University of Pennsylvania
Museum of Archaeology and Anthropology, 2003).
178. Denisov EP, The Ksirov burial ground in the context of the funeral monuments of the nomads of
North Bactria-Tokharistan (On the ethnic history of the North Bactria of the Kushan time).
(Stavropol, 1999).
179. Vidale M, Micheli R, Protohistoric graveyards of the Swat Valley, Pakistan: new light on funerary
practices and absolute chronology. Antiquity. 91, 389–405 (2017).
180. Vidale M, Micheli R, Olivieri LM, Excavations at the protohistoric graveyards of Gogdara and
Udegram (ACT Field School Reports and Memoirs, III, Sang-e-Meel, Lahore, 2016).
181. Silvi Antonini C, Stacul G, The proto-historic graveyards of Swāt (Pakistan). 1, Description of
graves and finds (IsMEO Reports and Memoirs (Rome) VII, 1972).
182. Stacul G, Preliminary Report on the Pre-Buddhist Necropolises in Swat (W. Pakistan). East West.
16 (1966), pp. 37–79.
183. Antonini CS, Preliminary Notes on the Excavation of the Necropolises found in Western Pakistan.
East West. 14 (1964), pp. 13–26.
184. Tucci G, The Tombs of the Asvakayana-Assakenoi. East West. 14 (1964), pp. 27–28.
185. Vinogradova N, Towards the Question of the Relative Chronology for Protohistoric Swat
Sequence (on the Basis of the Swat Graveyards). East West. 51, 9–36 (2001).
186. Salvatori S, Analysis of the Association of Types in the Protohistoric Graveyards of the Swāt
Valley (Loebanr I, Kātelai I, Butkara II). East West. 25, 333–351 (1975).
187. Castaldi E, The graveyard of Katelai I in the Swat. Excavation report of the tombs 46–80. Accad.
Naz. dei Lincei, Anno CCCLXV, Roma (1968).
188. Passarello P, Macchiarelli R, The proto-historic graveyard of Katelai (Swat, Pakistan):
Anthropological analyses of skeletal material with reference to the paleobiological human
context of the Middle East. Riv. di Antropol. LXV, 5–104 (1984).
189. Tucci G, Preliminary report on an archaeological survey in Swat. East West. 9 (1958), pp. 279–
328.
190. Tusa S, Notes on Some Protohistoric Finds in the Swāt Valley (Pakistan). East West. 31 (1981),
pp. 99–120.
191. Stacul G, Bīr-kōṭ-ghuṇḍai (Swāt, Pakistan) 1978 Excavation Report. East West. 30 (1980), pp.
55–65.
192. Stacul G, Excavation at Bīr-kōṭ-ghuṇḍai (Swāt, Pakistan). East West. 28 (1978), pp. 137–150.
193. Olivieri LM et al., The last phases of the urban site of Bir-Kot-Ghwandai (Barikot): The Buddhist
sites of Gumbat and Amluk-Dara (Barikot) (ACT Reports and Memoirs, II, Sang-e-Meel, Lahore,
2014).
194. Olivieri LM, Iori E, Early-historic Data from the 2016 Excavation Campaigns at the Urban Site of
Barikot, Swat (Pakistan): A Shifting Perspective. Proc. 29th South Asian Art Archeol. Conf.
(2016).
195. Olivieri LM, Urban Defenses at Bīr-koṭ-ghwaṇḍai, Swat (Pakistan). Anc. Civilizations from
Scythia to Sib 21, 183–199 (2015).
196. Olivieri LM, Marzaioli F, A new revised chronology and cultural sequence of the Swat valley,
Khyber Pakhtunkhwa (Pakistan) in the light of current excavations at Barikot (Bir-kot-gwandai).
Proc. 14th Int. Conf. AMS, Nucl. Instruments Methods Phys. Res. Sect. B Beam Interact. with
Mater. Atoms (2017).
197. Stacul G, Tusa S, Report on the Excavations at Aligrāma (Swāt, Pakistan) 1966, 1972. East West.
25 (1975), pp. 291–321.
198. Stacul G, Tusa S, Report on the Excavations at Aligrāma (Swāt, Pakistan) 1974. East West. 27
(1977), pp. 151–205.
199. Zahir M, in A Companion to South Asia in the Past (John Wiley & Sons, Inc, Hoboken, NJ,
2016), pp. 274–293.
200. Alciati G, Human skeletal remains of the necropoli of the Swat (W. Pakistan). 1. Butkara II
(IsMEO Reports and Memoirs (Rome), VIII, 1, 1967).
201. Genna G, First Anthropological Investigations of the Skeletal Remains of the Necropolis of
Butkara II (Swat, West Pakistan). East West. 15 (1965), pp. 161–167.
202. Fritsch CC, Butkara II Revisited. East West. 47, 41–66 (1997).
203. Pardini E, The Human Remains from Aligrāma Settlement (Swāt, Pakistan). East West. 27
(1977), pp. 207–226.
204. Callieri P, Saidu Sharif I (Swat, Pakistan).: The Buddhist sacred area: the Monastery (IsMEO
Reports and Memoirs (Rome), XXIII,1, Rome, 1989).
205. Noci F, Faccenna D, Macchiarelli R, Saidu Sharif I (Swat, Pakistan): The Graveyard, IsIao
Reports and Memoirs (Rome) XXIII, 3, 1997.
206. Faccenna D, Saidu Sharif I (Swat, Pakistan). The Buddhist sacred area: the Stupa terrace (1995),
IsMEO Reports and memoirs (Rome)XXIII, 2, 1995.
207. Olivieri LM et al., The Graveyard and the Buddhist shrine at Saidu Sharif I (Swat, Pakistan): fresh
chronological and stratigraphic evidence. J. Anc. Hist 76, 559–578 (2016).
208. Tucci G, On Swāt. The Dards and Connected Problems. East West. 27 (1977), pp. 9–103.
209. Olivieri LM, Afridi A, Iori E, Urban defenses at Bīr-koṭ-ghwaṇḍai, Swat (Pakistan). Data from
the 2015 excavation campaign. Pakistan Herit. 7, 73–94 (2015).
210. Callieri P, Filigenzi A, Stacul G, Bir-Kot-Ghwandai, Swat: 1987 Excavation Campaign. Pakistan
Archeol. 25, 183–192 (1990).
211. Callieri P et al., Bīr-koṭ-ghwaṇḍai, Swat, Pakistan. 1998–1999 Excavation Report. East West. 50
(2000), pp. 191–226.
212. Bagnera A, The Ghaznavid Mosque and the Islamic Settlement at Mt. Rāja Gīrā, Udegram (ACT
Reports and Memoirs, V, Sang-e-Meel, Lahore, 2015).
213. Chaubey G et al., Population genetic structure in Indian Austroasiatic speakers: the role of
landscape barriers and sex-specific admixture. Mol. Biol. Evol 28, 1013–24 (2011). [PubMed:
20978040]
214. Basu A, Sarkar-Roy N, Majumder PP, Genomic reconstruction of the history of extant
populations of India reveals five distinct ancestral components and a complex structure. Proc.
Natl. Acad. Sci. U. S. A 113, 1594–9 (2016). [PubMed: 26811443]
215. Narasimhan VM et al., The Genomic Formation of South and Central Asia. bioRxiv, 292581
(2018).
216. Reich D et al., Reconstructing Native American population history. Nature. 488, 370–374 (2012).
[PubMed: 22801491]
217. Lazaridis I et al., Genetic origins of the Minoans and Mycenaeans. Nature. 548, 214 (2017).
[PubMed: 28783727]
218. Frenez D, Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia. Evidence
from Gonur Depe (Margiana, Turkmenistan). Archaeol. Res. Asia 15, 13–33 (2018).
219. Jarrige J-F, Didier A, Quivron G, Shahr-i Sokhta and the chronology of the Indo-Iranian regions.
Paléorient. 37, 7–34 (2011).
220. Kunsch HR, The Jackknife and the Bootstrap for General Stationary Observations. Ann. Stat 17,
1217–1241 (1989).
221. Seguin-Orlando A et al., Paleogenomics. Genomic structure in Europeans dating back at least
36,200 years. Science. 346, 1113–8 (2014). [PubMed: 25378462]
222. Metspalu M et al., Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia. Am. J. Hum. Genet 89, 731–44 (2011).
[PubMed: 22152676]
223. Chaubey G, Kadian A, Bala S, Rao VR, Genetic Affinity of the Bhil, Kol and Gond Mentioned in
Epic Ramayana. PLoS One. 10, e0127655 (2015). [PubMed: 26061398]
224. O’Connell JF et al., When did Homo sapiens first reach Southeast Asia and Sahul? Proc. Natl.
Acad. Sci. U. S. A 115, 8482–8490 (2018). [PubMed: 30082377]
225. Poznik GD et al., Punctuated bursts in human male demography inferred from 1,244 worldwide
Y-chromosome sequences. Nat. Genet 48, 593–599 (2016). [PubMed: 27111036]
226. Moorjani P et al., The History of African Gene Flow into Southern Europeans, Levantines, and
Jews. PLoS Genet. 7, e1001373 (2011). [PubMed: 21533020]
227. Hellenthal G et al., A genetic atlas of human admixture history. Science. 343, 747–751 (2014).
[PubMed: 24531965]
228. 1000 Genomes Project Consortium et al., An integrated map of genetic variation from 1,092
human genomes. Nature. 491, 56–65 (2012). [PubMed: 23128226]
Box 1.
Seven Source Populations Used for Distal Ancestry Modeling
Anatolia_N
Anatolian farmer-related: Represented by 7th millennium BCE western Anatolian
farmers (19)
Ganj_Dareh_N
Iranian early farmer-related: Represented by 8th millennium BCE early goat herders
from the Zagros Mountains of Iran (9, 24)
WEHG
Western European Hunter-Gatherer-related: represented by 9th millennium BCE Western
Europeans (7, 19, 32, 69). (WEHG and EEHG discussed below were denoted WHG and
EHG in previous studies, but as we co-analyze them with hunter-gatherers from Asia we
modify the names to specify a European origin.)
EEHG
Eastern European Hunter-Gatherer-related: represented by 6th millennium BCE hunter-gatherers from Eastern Europe (19, 32)
WSHG
West Siberian Hunter-Gatherer-related: a previously undescribed deep source of
Eurasian ancestry represented in this study by three individuals from the Forest Zone of
Central Russia dated to the 6th millennium BCE.
ESHG
East Siberian Hunter-Gatherer-related: represented by 6th millennium BCE hunter-gatherers from the Lake Baikal region with ancestry deeply related to East Asians (26)
AHG
Andamanese Hunter-Gatherer-related: represented by present day indigenous Andaman
Islanders (55) who we hypothesize are related to unsampled indigenous South Asians
(Ancient Ancestral South Indians - AASI)
Box 2.
Summary of Key Findings
Iran and Turan
1. A West-to-East Cline of Decreasing Anatolian Farmer-Related Ancestry.
There was a west-to-east gradient of ancestry across Eurasia in the Copper
and Bronze Ages (the Southwest Asian Cline), with more Anatolian
farmer-related ancestry in the west and more WSHG- or AASI-related
ancestry in the east, superimposed on primary ancestry related to early Iranian
farmers. The establishment of this gradient correlates in time to the spread of
plant-based agriculture across this region, raising the possibility that people of
Anatolian ancestry spread this technology east just as they helped spread it
west into Europe.
2. People of the BMAC Were Not a Major Source of Ancestry for South
Asians. The primary BMAC population largely derived from preceding local
Copper Age peoples who were in turn closely related to people from the
Iranian plateau, and had little of the Steppe ancestry that is ubiquitous in
South Asia today.
3. Steppe Pastoralist-Derived Ancestry Arrived in Turan by 2100 BCE. We find
no evidence of Steppe pastoralist-derived ancestry in groups at BMAC sites
prior to 2100 BCE, but multiple outlier individuals buried at these sites show
that by ~2100–1700 BCE, BMAC communities regularly interacted with
peoples carrying such ancestry.
4. An Ancestry Profile Widespread During the Indus Valley Civilization. We
document a distinctive ancestry profile—45–82% Iranian farmer-related and
11–50% AASI (with negligible Anatolian farmer-related admixture)—that
was present at two sites in cultural contact with the Indus Valley Culture
(IVC). Combined with our detection of this same ancestry profile (in mixed
form) about a millennium later in the post-IVC Swat Valley, this documents
an Indus Periphery Cline during the flourishing of the IVC. Ancestors of this
group formed by admixture ~5400–3700 BCE. There is little if any Anatolian
farmer-related ancestry in the Indus Periphery Cline.
The Steppe and Forest Zone
1. Ancestry Clines in North Eurasia Established After the Advent of Farming.
Prior to the spread of farmers and herders, northern Eurasia was characterized
by a west-to-east gradient of very divergent hunter-gatherer populations with
increasing proportions of relatedness to present-day East Asians: from
Western European Hunter-Gatherers (WEHG), to Eastern European Hunter-Gatherers (EEHG), to West Siberian Hunter-Gatherers (WSHG), to Eastern
Siberian Hunter-Gatherers (ESHG). Mixture of people along this ancestry
gradient and its counterpart to the south formed five later clines following the
advent of farming, the three northern ones of which are the European Cline,
the Caucasus Cline, and the Central Asian Cline.
2. A Distinctive Ancestry Profile Stretching from Eastern Europe to
Kazakhstan in the Bronze Age. We add more than one hundred samples from
the previously described Western_Steppe_MLBA genetic cluster, including
individuals associated with the Corded Ware, Srubnaya, Petrovka, and
Sintashta archaeological complexes, and characterized by a mixture of about
two-thirds ancestry related to Yamnaya Steppe pastoralists (from the
Caucasus Cline) and the remainder related to European farmers (from the European Cline), suggesting
that this population formed at the geographic interface of these two groups in
Eastern Europe. Our analysis suggests that in the central Steppe and
Minusinsk Basin in the Middle to Late Bronze Age, Western_Steppe_MLBA
ancestry mixed with about 9% ancestry from previously established people
from the region carrying WSHG-related ancestry to form a distinctive
Central_Steppe_MLBA cluster that was the primary conduit for spreading
Yamnaya Steppe pastoralist-derived ancestry to South Asia.
3. Bidirectional Mobility Along the Inner Asian Mountain Corridor.
Beginning in the 3rd millennium BCE and intensifying in the 2nd millennium
BCE, we observe multiple individuals in the Central Steppe who lived along
the Inner Asian Mountain Corridor and who harbored admixture from Turan,
documenting northward movement into the Steppe in this period. By the end
of the 2nd millennium BCE, these people were joined by numerous
outlier individuals with East Asian-related admixture, which became
ubiquitous in the region by the Iron Age (25, 52). This ancestry is also seen in
later groups with known cultural impacts on South Asia including Huns,
Kushans and Sakas and is hardly present in the two primary ancestral
populations of South Asia, suggesting that the Steppe ancestry widespread in
South Asia derived from pre-Iron Age Central Asians.
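The nested mixture described in item 2 of this list can be worked through numerically. The sketch below is illustrative arithmetic only, not the authors' estimation procedure. It takes the approximate proportions quoted above (about two-thirds Yamnaya-related ancestry in Western_Steppe_MLBA, about 9% WSHG-related ancestry in Central_Steppe_MLBA) and assumes, for illustration, that the remaining third of Western_Steppe_MLBA ancestry is entirely European farmer-related. The function name `compose` and the component labels are hypothetical.

```python
from fractions import Fraction


def compose(outer, inner_by_source):
    """Expand a mixture whose sources may themselves be mixtures.

    outer: dict mapping source name -> proportion (should sum to 1).
    inner_by_source: dict mapping a source name -> its own mixture dict.
    Sources without an entry in inner_by_source are kept as they are.
    """
    result = {}
    for source, weight in outer.items():
        inner = inner_by_source.get(source, {source: Fraction(1)})
        for component, share in inner.items():
            result[component] = result.get(component, Fraction(0)) + weight * share
    return result


# Approximate values from Box 2; the one-third remainder is an assumption.
western_steppe_mlba = {
    "Yamnaya_related": Fraction(2, 3),
    "European_farmer_related": Fraction(1, 3),
}
central_steppe_mlba = {
    "Western_Steppe_MLBA": Fraction(91, 100),
    "WSHG_related": Fraction(9, 100),
}

expanded = compose(central_steppe_mlba, {"Western_Steppe_MLBA": western_steppe_mlba})
for name, share in expanded.items():
    print(f"{name}: {float(share):.3f}")
```

Under these stated assumptions, roughly 61% of Central_Steppe_MLBA ancestry would trace to Yamnaya-related sources; the figure is only as precise as the rounded inputs.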
South Asia
1. Three Ancestry Clines That Succeeded Each Other in Time in South Asia.
We identify a unique trio of source populations that fits geographically and
temporally diverse South Asians since the Bronze Age: a mixture of AASI, an
Indus Periphery Cline group with predominantly Iranian farmer-related
ancestry, and Central_Steppe_MLBA. Two-way clines driven by
populations that were mixtures of these three sources succeeded each other in
time: the Indus Periphery Cline, which prior to 2000 BCE had no detectable
Steppe ancestry; the Steppe Cline, beginning after 2000 BCE; and finally the
Modern Indian Cline.
2. The ASI and ANI Arose as Indus Periphery Cline People Mixed with
Groups to the North and East. An ancestry gradient of which the Indus
Periphery Cline individuals were a part played a pivotal role in the formation
of both proximal sources of ancestry in South Asia: a minimum of
~55% Indus Periphery Cline ancestry for the ASI and ~70% for the ANI.
Today there are groups in South Asia with very similar ancestry to the
statistically reconstructed ASI, suggesting that the ASI have essentially direct
descendants today. Much of the formation of both the ASI and ANI occurred
in the 2nd millennium BCE. Thus, the events that formed both the ASI and
ANI overlapped the time of the decline of the IVC.
3. Steppe Ancestry in South Asia is Primarily from Males and
Disproportionately High in Brahmins. Most of the Steppe ancestry in South
Asia derives from males, pointing to asymmetric social interaction between
descendants of Steppe pastoralists and peoples of the Indus Periphery Cline.
Groups that view themselves as being of traditionally priestly status,
including traditional custodians of liturgical texts in the early Indo-European
language Sanskrit, tend (with exceptions) to have more Steppe ancestry than
expected based on ANI-ASI mixture, providing an independent line of
evidence for a Steppe origin for South Asia’s Indo-European languages.
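The minimum proportions quoted in item 2 of this list imply a simple lower bound for any group modeled as an ANI-ASI mixture. The following is a minimal sketch, assuming a group is a linear two-way mixture of ANI and ASI and using only the approximate minima given above (~70% for the ANI, ~55% for the ASI). The function name is hypothetical and this is not the authors' method.

```python
def min_indus_periphery_fraction(ani_fraction, ani_min=0.70, asi_min=0.55):
    """Lower bound on Indus Periphery Cline-related ancestry in an ANI-ASI mix.

    ani_fraction: share of the group's ancestry attributed to ANI (0 to 1);
    the rest is attributed to ASI. The defaults are the approximate minima
    quoted in Box 2.
    """
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    asi_fraction = 1.0 - ani_fraction
    return ani_fraction * ani_min + asi_fraction * asi_min


for ani in (0.0, 0.5, 1.0):
    bound = min_indus_periphery_fraction(ani)
    print(f"ANI fraction {ani:.1f}: at least {bound:.3f}")
```

Because both inputs are minima, the result is a floor rather than an estimate: any ANI-ASI mixture would carry at least about 55% Indus Periphery Cline-related ancestry under these assumptions.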
Fig. 1. Overview of ancient DNA data.
(A) Distribution of sites and associated archeological or radiocarbon dates along with the
number of individuals meeting our analysis thresholds from each site. (B) Locations of
ancient individuals for whom we generated ancient DNA that passed our analysis thresholds
along with the locations of individuals from 140 groups from present-day South Asia that we
analyzed as forming the Modern Indian Cline. Shapes distinguish the individuals from
different sites. Data from 106 South Asian groups that do not fit along the Modern Indian
Cline as well as AHG are not shown. (C) PCA of ancient and modern individuals
projected onto a basis formed by 1,340 present-day Eurasians reflects clustering of
individuals that mirrors their geographical relationships. An interactive version of this figure
is presented in the Online Data Visualizer.
Fig. 2. Outlier analysis reveals ancient contacts between sites.
We plot the average of Principal Component 1 (x-axis) and Principal Component 2 (y-axis)
for the West Eurasian and All Eurasian PCA plots, as we found that this aids visual
separation of the ancestry profiles. (A) In the Middle to Late Bronze Age Steppe, we
observe in addition to the Western_Steppe_MLBA and Central_Steppe_MLBA clusters
(indistinguishable in this projection), outliers admixed with other ancestries. The BMAC-related admixture in Kazakhstan documents northward gene flow onto the Steppe and
confirms the Inner Asian Mountain Corridor as a conduit for movement of people. (B) At
Shahr-i-Sokhta in eastern Iran, there are two primary groupings: one with ~20% Anatolian
farmer-related ancestry and no detectable AHG-related ancestry, and the other with ~0%
Anatolian farmer-related ancestry and substantial AHG-related ancestry (Indus Periphery
Cline). (C) In individuals of the BMAC and successor sites, we observe a main cluster as
well as numerous outliers: outliers before 2000 BCE with admixture related to WSHG, outliers
before 2000 BCE on the Indus Periphery Cline (with ancestry similar to that of the outliers at
Shahr-i-Sokhta), and outliers after 2000 BCE that reveal how Central_Steppe_MLBA
ancestry had arrived. (D) In the Late Bronze Age and Iron Age of northernmost South Asia,
we observe a main cluster consistent with admixture between peoples of the Indus Periphery
Cline and Central_Steppe_MLBA, and variable Steppe pastoralist-related admixture.
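One reading of the plotting convention in this caption is that each individual's PC1 and PC2 coordinates are averaged across the West Eurasian and All Eurasian projections. The sketch below illustrates that reading only. The coordinates, sample labels, and function name are hypothetical, and the caption does not say whether any scaling is applied before averaging.

```python
def average_projections(west_eurasian, all_eurasian):
    """Average (PC1, PC2) per individual across two PCA projections.

    Each argument maps a sample ID to a (PC1, PC2) tuple. Only individuals
    present in both projections are returned.
    """
    averaged = {}
    for sample_id, (pc1_w, pc2_w) in west_eurasian.items():
        if sample_id not in all_eurasian:
            continue
        pc1_a, pc2_a = all_eurasian[sample_id]
        averaged[sample_id] = ((pc1_w + pc1_a) / 2, (pc2_w + pc2_a) / 2)
    return averaged


# Hypothetical coordinates for placeholder individuals.
west = {"sample_1": (0.02, -0.04), "sample_2": (0.10, 0.06), "sample_3": (0.0, 0.0)}
both = {"sample_1": (0.04, -0.02), "sample_2": (0.20, 0.02)}

plot_coords = average_projections(west, both)
print(plot_coords)
```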
Fig. 3. Ancestry Transformations in Holocene Eurasia.
(A) Ancestry clines before and after the advent of farming. We document a South Eurasian
Early Holocene Cline of increasing Iranian farmer- and West Siberian hunter-gatherer-related
ancestry moving west-to-east from Anatolia to Iran, and a North Eurasian Early
Holocene Cline of increasing relatedness to East Asians moving west-to-east from Europe to
Siberia. Mixtures of peoples along these two clines following the spread of farming formed
five later gradients (shaded). Moving west-to-east, these are the European Cline, the Caucasus Cline
from which the Yamnaya formed, the Central Asian Cline which characterized much of
Central Asia in the Copper and Bronze Ages, the Southwest Asian Cline established by
spreads of farmers in multiple directions from several loci of domestication, and the Indus
Periphery Cline. (B) Following the appearance of the Yamnaya Steppe pastoralists,
Western_Steppe_EMBA (Yamnaya-like) ancestry then spread across this vast region. We use
arrows to show plausible directions of spread of increasingly diluted ancestry (the arrows are
not meant as exact routes, which we do not have enough sampling to determine at present).
Rough estimates of the timing of the arrival of this ancestry and estimated ancestry
proportions are shown.
Fig. 4. The Genomic Formation of South Asia.
(A) The degree of allele sharing with southern Asian hunter-gatherers (AASI) measured by
f4(Ethiopia_4500BP, X; Ganj_Dareh_N, AHG) and with Steppe pastoralists measured by
f4(Ethiopia_4500BP, X; Central_Steppe_MLBA, Ganj_Dareh_N) reveals three ancestry
clines that succeeded each other in time: the Indus Periphery Cline prior to ~2000 BCE, the
Steppe Cline represented by northern South Asian individuals after ~2000 BCE, and the
Modern Indian Cline. (B) Modeling South Asians as a mixture of Central_Steppe_MLBA,
AHG (as a proxy for AASI), and Indus_Periphery_West (the individual from the Indus
Periphery Cline with the least AASI ancestry). Groups along the edges of the triangle fit a
two-way model, and groups in the interior fit only a three-way model. The 140 present-day South
Asian groups on the Modern Indian Cline are shown as small dots. (C) In this and the
preceding panel, groups that traditionally view themselves as being of priestly status
(“Brahmin,” “Pandit,” and “Bhumihar,” but excluding “Catholic Brahmins”) are shown in red;
they tend to have a significantly higher ratio of Central_Steppe_MLBA to
Indus_Periphery_Cline ancestry than other groups. (D) Plot of the proportion of
Central_Steppe_MLBA ancestry on the autosomes (x-axis) and the Y chromosome (y-axis)
shows that this ancestry derives primarily from females in Late Bronze Age and Iron
Age individuals from the Swat District of northernmost South Asia, and primarily from
males in most present-day South Asians.
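As a rough illustration of the f4 statistics used in panel (A): in its standard form, f4(A, B; C, D) is the average over SNPs of the product of allele-frequency differences, (a - b)(c - d). The sketch below is a minimal version under that definition. The function name and the toy frequencies are hypothetical, the code omits the standard errors and missing-data handling that a real analysis needs, and it is not the pipeline used to produce the figure.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Return the mean over SNPs of (a - b) * (c - d).

    Each argument is a list of per-SNP allele frequencies (values in [0, 1])
    for one population, with all four lists covering the same SNPs in the
    same order.
    """
    n = len(freq_a)
    if n == 0:
        raise ValueError("need at least one SNP")
    if not (len(freq_b) == len(freq_c) == len(freq_d) == n):
        raise ValueError("all populations must cover the same SNPs")
    total = sum(
        (a - b) * (c - d)
        for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)
    )
    return total / n


# Toy (invented) frequencies at four SNPs; the labels only mirror the roles
# in f4(Outgroup, X; C, D) and are not real data.
outgroup = [0.0, 0.0, 1.0, 1.0]
test_x = [0.0, 1.0, 1.0, 0.0]
pop_c = [0.0, 0.0, 1.0, 1.0]
pop_d = [0.0, 1.0, 1.0, 0.0]

value = f4(outgroup, test_x, pop_c, pop_d)
print(value)
```

A value near zero would indicate that X shares alleles about equally with C and D relative to the outgroup; a value far from zero indicates excess sharing with one of them, with the direction depending on the sign convention used.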
Fig. 5. Admixture Graph Model.
The largest deviation between empirical and theoretical f-statistics is |Z|=2.9, indicating a
good fit considering the large number of f-statistics analyzed. Admixture events are shown
as dotted lines labeled by proportions, with the minor ancestry in gray. The present-day
groups are shown in orange ovals, the ancient ones in blue, and unsampled groups in white.
(The ovals and admixture events are positioned according to guesses about their relative
dates to help in visualization, although the dates are in no way meant to be exact.) In this
graph we do not attempt to model the contribution of WSHG- and Anatolian farmer-related
ancestry, and thus cannot model Central_Steppe_MLBA, the proximal source of Steppe
ancestry in South Asia (instead we model the Steppe ancestry in South Asia through the
more distally related Yamnaya). However, the admixture graph does highlight several key
findings of the study, including the deep separation of the AASI from other Eurasian
lineages, and the fact that some Austroasiatic-speaking groups in South Asia (e.g. Juang)
harbor ancestry from a South Asian group with a higher ratio of AASI-related to Iranian
farmer-related ancestry than any groups on the Modern Indian Cline, thus revealing that
groups with substantial Iranian farmer-related ancestry were not ubiquitous in peninsular
South Asia in the 3rd millennium BCE when Austroasiatic languages likely spread across the
subcontinent.
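The fit criterion quoted in this caption can be sketched as follows, assuming each Z-score is the difference between an observed f-statistic and the value predicted by the graph, divided by the standard error of the observed statistic. The function names and the numbers are hypothetical and are not values from the study.

```python
def z_score(observed, fitted, std_error):
    """Deviation between observed and fitted f-statistic, in standard errors."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (observed - fitted) / std_error


def worst_abs_z(rows):
    """Return the largest |Z| over (observed, fitted, std_error) triples."""
    return max(abs(z_score(obs, fit, se)) for obs, fit, se in rows)


# Invented example: two f-statistics with their fitted values and errors.
example_rows = [
    (0.5, 0.25, 0.125),  # Z = 2.0
    (0.0, 0.5, 0.5),     # Z = -1.0
]
print(worst_abs_z(example_rows))
```

Reporting only the single worst |Z| is a conservative summary: when many f-statistics are checked at once, a few moderately large deviations are expected by chance even for a well-fitting graph.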