The present paper provides two case studies of the basic vocabulary of the Turkic languages spoken on the Crimean Peninsula. Its aim is to illuminate the issues that a historical linguist, and in particular a phylogeneticist, faces when analyzing the basic vocabulary of closely related languages in a situation of intensive contact. The first case study is dedicated to the onomasiological reconstruction of the Proto-Karaim Swadesh list. The main problem here is the detection of West Oghuz loans and especially of contact-induced archaization (fake archaisms) in Crimean Karaim. The objective of the second case study is to identify the genealogical affiliation of the Crimean Tatar dialects. Both the manual analysis of innovations in the basic vocabulary and computational lexicostatistics (Bayesian approach, Neighbor-joining, Maximum Parsimony Analysis) confirm the traditional view that the Coastal dialect belongs to the Oghuz subgroup, the Orta dialect to the West Kipchak subgroup, and the Steppe dialect to the Nogai Kipchak subgroup. These affiliations fully fit the documented ethnic history. Correct identification of the genealogical affiliation of the dialects in question became possible only after the exclusion of all loans, which had not been done in previous lexicostatistical studies of Crimean Tatar. Both cases show that careful elimination of areal influences is crucial for semantic (onomasiological) reconstruction and phylogenetic studies.
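As a rough illustration of why excluding loans matters for lexicostatistics, the sketch below computes the share of matching cognate classes between two Swadesh-style wordlists while skipping any meaning in which either word is marked as a loan. The data layout, the function name `shared_cognate_ratio` and the placeholder labels are assumptions made for this example; they are not taken from the paper.

```python
def shared_cognate_ratio(list_a, list_b):
    """Share of meanings whose words fall in the same cognate class.

    Each wordlist maps a meaning to a pair (cognate_class, is_loan).
    Meanings missing from either list, or filled by a loan in either
    list, are excluded before counting (hypothetical convention).
    """
    compared = 0
    matched = 0
    for meaning, (class_a, loan_a) in list_a.items():
        if meaning not in list_b:
            continue
        class_b, loan_b = list_b[meaning]
        if loan_a or loan_b:
            continue  # loans carry areal, not genealogical, signal
        compared += 1
        if class_a == class_b:
            matched += 1
    if compared == 0:
        raise ValueError("no comparable meanings left after filtering")
    return matched / compared


# Placeholder data: labels only, not real forms from any language.
dialect_x = {"m1": ("c1", False), "m2": ("c2", False), "m3": ("c3", True)}
dialect_y = {"m1": ("c1", False), "m2": ("c7", False), "m3": ("c3", False)}
ratio = shared_cognate_ratio(dialect_x, dialect_y)
```

In the placeholder data the apparent match on `m3` is discarded because one side is a loan, so only `m1` and `m2` are compared.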
This article surveys various long-standing ambiguities and confusions that continue to dog lexicostatistics and glottochronology. I aim to offer some novel perspectives and clarifications, which also help map out how we might devise new, alternative methods to build upon the good in Swadesh's troubled legacy. I challenge the recent trend towards honing down Swadesh's original list to a minimal core. A richer signal on language relationships is to be had not by discarding the data in meanings considered 'unstable', but by exploring the revealing patterns that emerge only when those meanings are kept, and contrasted against their 'core' counterparts.
The Swadesh list of stable Basque words resistant to linguistic borrowing, and what its comparison with the world's other language families reveals. In university textbooks and in the most authoritative encyclopedias we still find the old paradigm that Basque, as it is still called internationally and by Vasconists themselves, is a language isolate with no relatives whatsoever, and that for this very reason we know nothing about its origin. Is it really true that the Basque language has no relatives, not even elsewhere in Eurasia? Did it arise spontaneously out of nothing? Was it perhaps invented by someone living in one of the Palaeolithic caves of the Basque Country, and has the language remained isolated ever since, with hardly any evolution? We will try to answer these questions with the tools offered by comparative Historical Linguistics and Lexicostatistics.
This article discusses a problem in integrating archaeology and philology. For most of the twentieth century, archaeologists associated the spread of the Celtic languages with the supposed westward spread of the ‘eastern Hallstatt culture’ in the first millennium BC. More recently, some have discarded ‘Celtic from the East’ in favour of ‘Celtic from the West’, according to which Celtic was a much older lingua franca which evolved from a hypothetical Neolithic Proto-Indo-European language in the Atlantic zone and then spread eastwards in the third millennium BC. This article (1) criticizes the assumptions and misinterpretations of classical texts and onomastics that led to ‘Celtic from the East’ in the first place; (2) notes the unreliability of the linguistic evidence for ‘Celtic from the West’, namely (i) ‘glottochronology’ (which assumes that languages change at a steady rate), (ii) misunderstood place-name distribution maps and (iii) the undeciphered inscriptions in southwest Iberia; and (3) proposes that Celtic radiating from France during the first millennium BC would be a more economical explanation of the known facts.
The present book is an etymological dictionary of the basic vocabulary of the Turkic languages. By basic vocabulary we here mean the so-called Swadesh lists collected for most of the Turkic languages and dialects. The Swadesh list includes 100-110 English words with relatively simple meanings, presumably belonging to a pre-cultural vocabulary. On the presumption that words are often borrowed together with the realia they denote (the objects and concepts related to the word's meaning), scholars believe that the words translating Swadesh-list items in the various languages of the world are borrowed significantly more rarely than other vocabulary. Here we try to reconstruct, as far as possible for each item of the Swadesh list, which word could have filled it in Proto-Turkic, and how forms and meanings changed to yield the words recorded in the Swadesh lists of the modern languages and dialects. On the one hand, this work helps to clarify the methods of comparative-historical lexical reconstruction; on the other hand, the reconstruction of the Proto-Turkic basic wordlist provides a more accurate estimate of the external (Altaic) relations of the Turkic languages.
What modern formal comparative linguistics can say about the genetic affiliation of the Hurro-Urartian languages. Attribution to the Sino-Caucasian (Dene-Caucasian) macro-family is discussed as the most likely solution. A specific closeness to the Yeniseian language family is also suspected.
The use of the sewing needle in Western Europe dates back to the Late Upper Palaeolithic. The terms denoting this instrument in the older PIE languages are highly divergent. The present article discusses their etymologies and the conclusions to be drawn from the combination of linguistic and archaeological data.
The paper is a thematic follow-up to the refinements of the lexicostatistical method suggested in [Starostin G. 2010]. It discusses the issue of synonymy/polysemy, a well-known obstacle in the compilation of Swadesh wordlists for various languages, and presents a list of both syntactic/semantic contexts and explanatory notes that could help reduce ambiguity in the creation and quantitative analysis of such wordlists. The notes and contexts are based partly on linguistic tradition and partly on theoretical and/or pragmatic considerations, some of which are stated explicitly.
Swadesh list, revisited. Leipzig-Jakarta list.
We offer a list of the 100 words best preserved across many linguistic families. We propose that the reconstruction of further linguistic families should begin with these words, which form the core of the vocabulary.
Glottochronology was a tentative method for estimating when in the past two 'cognate languages' had split. It appeared in the wake of the discovery that carbon-14 could be used for dating archaeological objects. It assumed that languages change at a fixed rate, as carbon-14 decays. It is a fascinating illustration of the difficulty we have in admitting that cultural facts are not natural ones. This is a slightly simplified version of a talk given in Leipzig in 2005.
The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe in early medieval times. This expansion, mainly to East Europe and the northern Balkans, resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: 'central-east European' for West and East Slavs, and 'south-east European' for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.
A number of recent papers have sought to apply to language data various phylogenetic ‘tree-drawing’ techniques initially developed for uses outside linguistics. The reaction from many historical linguists, however, has typically been critical, if not outright hostile. This paper explores, and aims to explain, why it is that there has been such a long-running failure to reach a consensus between linguists and specialists from other disciplines, notably genetics and archaeology.
We consider linguists’ fundamental concerns as to how non-linguists go about using language data; especially whether (and if so, how) one can meaningfully use such phylogenetic analyses on language data, interpret their results, and attempt to put dates on particular nodes in the trees. We look into certain aspects of the very nature of language that it is crucial to bear in mind in order to handle language data appropriately for these purposes, but which many linguists feel are not truly appreciated by non-linguists. These aspects include: language’s inherent susceptibility to powerful external forces which vary tremendously through history; the nature of language data and what this means for how they can meaningfully be compared and measured; and the nature of language change and historical development, with important consequences for the interpretation of those data, not least for dating.
It emerges, moreover, that these same characteristics of language change also challenge linguists’ own ‘established’ dating of Proto-Indo-European by so-called ‘linguistic palaeontology’, and that this question is in truth much more open than Indo-Europeanist linguists generally admit.
The World Language Tree graphically illustrates relative degrees of lexical similarity holding among 3384 of the world's languages and dialects (henceforth, languages) currently found in the ASJP database (ASJP stands for Automated Similarity Judgment Program). ...
The Lezgian database, which consists of high-quality 110-item wordlists of 20 Lezgian lects plus one Proto-Lezgian list, is presented. Various issues of phylogenetic tree building and the methods of lexical reconstruction are discussed.
The resulting trees (neighbor-joining method) conform with the traditional Lezgian classification: two outliers (Udi & Archi) and a large group of nuclear a.k.a. Samur lects with three branches: (1) Proto-West Lezgian [Tsakhur, Rutul], (2) Proto-South Lezgian [Kryts, Budukh], and (3) Proto-East Lezgian [Aghul, Tabasaran, Lezgi].
The paper is a sequel to an earlier study by the authors, in which they discussed the accuracy of linguistic datings arrived at by the glottochronological method on the basis of data from 110-item wordlists for Romance languages. The object of this second part of the study is the dating of linguistic divergence, i.e. determining the separation dates for two or more modern languages. In this paper, we compare several traditional as well as newly proposed models of the glottochronological process, with special attention paid to the margin of error and the reliability of glottochronological calculations at different time depths. The results of the study allow for a realistic assessment of the degree of accuracy in the glottochronological dating of the divergence of Romance languages and lead to a number of practical conclusions that will be useful for the application of glottochronology to any other linguistic material.
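For orientation, the classical Swadesh-style divergence formula, one of the traditional models in this field, can be sketched as follows. The abstract does not state which formulas the authors compare, so this is only the textbook constant-rate version, t = ln(c) / (2 ln(r)), with c the share of cognates retained in common and r the assumed retention rate per millennium. The default r = 0.86 is the value conventionally quoted for the 100-item list, and the function name is hypothetical.

```python
import math


def swadesh_divergence_time(shared, retention=0.86):
    """Millennia since two languages split, under a constant-rate model.

    shared    -- proportion of list items still cognate in both languages
    retention -- assumed share of items one language keeps per millennium
                 (0.86 is the conventional figure for the 100-item list)
    """
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared must lie in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must lie in (0, 1)")
    # Both lineages lose vocabulary independently, hence the factor 2.
    return math.log(shared) / (2.0 * math.log(retention))


# Two languages that each kept 86% over one millennium share about 0.86**2.
millennia = swadesh_divergence_time(0.86 ** 2)
```

Because the estimate depends on the logarithm of `shared`, small errors in cognate counts translate into large errors at greater time depths, which is one reason the margin of error matters.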
A systematic, computer-automated tool for narrowing down the homelands of linguistic families is presented and applied to 82 of the world’s larger families. The approach is inspired by the well-known idea that the geographical area of maximal diversity within a language family corresponds to the original homeland. This is implemented in an algorithm which takes a lexicostatistically derived distance measure and a geographical distance measure and computes a lexical diversity measure for each language in the family relative to all the other related languages. The location of the language with the highest diversity measure is heuristically identified with the homeland.
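The procedure described above can be sketched roughly as follows. The abstract does not give the exact formula for the diversity measure, so this example assumes one plausible reading: for each language, the mean ratio of lexical distance to geographical distance over all related languages. The function names, the great-circle distance and the toy coordinates are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def diversity_scores(coords, lex_dist):
    """Mean lexical-to-geographic distance ratio for each language.

    coords   -- language name -> (lat, lon)
    lex_dist -- frozenset({lang1, lang2}) -> lexical distance
    """
    scores = {}
    for lang in coords:
        ratios = []
        for other in coords:
            if other == lang:
                continue
            geo = haversine_km(coords[lang], coords[other])
            if geo == 0:
                continue  # co-located languages give no usable ratio
            ratios.append(lex_dist[frozenset((lang, other))] / geo)
        scores[lang] = sum(ratios) / len(ratios) if ratios else 0.0
    return scores


def guess_homeland(coords, lex_dist):
    """Return the language whose location has the highest diversity score."""
    scores = diversity_scores(coords, lex_dist)
    return max(scores, key=scores.get)


# Toy family of three languages with invented positions and distances.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 10.0)}
lex_dist = {
    frozenset(("A", "B")): 0.5,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.6,
}
homeland = guess_homeland(coords, lex_dist)
```

Under this reading, a language scores highly when lexically distant relatives are spoken close by, which matches the idea that the area of maximal diversity marks the homeland.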