The aim of the study is a multidisciplinary investigation of polysemy (the multiple meanings of linguistic units) using theoretical, experimental, and statistical methods. Although a large body of work is devoted to polysemy, the phenomenon has not previously been studied comprehensively. The team of authors carried out a study that combined elements of dictionary description, statistical analysis, and surveys with the analysis of electroencephalograms and eye movements. The study showed that the development of polysemy draws on a larger number of distinct semantic shifts than the well-known metaphor and metonymy alone. These shifts form a complex hierarchical system and often combine with one another when new senses are formed. Our perception of a sense as new is linked to a cognitive "distance" that differs across types of shift: for example, a metaphorically derived sense is perceived as farther from the original than a metonymic one. The dictionary representation of senses correlates only partly with the organization of the mental lexicon and with the frequency of the different senses. A lexicographic representation based on semantic principles is easier to take in than one based on frequency of use. The study raised new questions, in particular whether distant and close metonymy are represented differently in our mental lexicon.
Afanas'ev, A. N. The Complete Folktales of A. N. Afanas'ev, Vol. 1. Ed. Jack V. Haney. Jackson: University Press of Mississippi, 2014. xvi, 514 pp. $90.00, hard bound. Andrei, Jean, and Gheorghe H. Popescu. Economy in Romania and the Need for Optimization of Agricultural Production Structures. Frankfurt am Main: Peter Lang, 2014. 170 pp. Appendix. Notes. Bibliography. Index. Figures. Tables. $54.95, paper. Apresjan, Valentina, and Boris Iomdin, eds. Meaning-Text Theory: Current Developments. Wiener Slawistischer Almanach, no. 85. Munich: Verlag Otto Sagner, 2013. 302 pp. Appendix. Bibliography. Figures. Tables. €25.80, paper. Baysha, Olga. The Mythologies of Capitalism and the End of the Soviet Project. Lanham: Lexington Books, 2014. xii, 171 pp. Appendixes. Bibliography. Index. Tables. $79.99, hard bound. Bazarnik, Katarzyna, and Izabela Curyllo-Klag, eds. Incarnations of Material Textuality: From Modernism to Liberature. Newcastle upon Tyne: Cambridge Scholars Publishing, 2014. x, 154 pp. Appendix. Index. Illustrations. Photographs. $93.60, hard bound. Beger, Kathleen. Untersuchungen zur Kodifizierung des Ukrainischen: Rechtschreibreformen und ihre Umsetzung in Galizien zwischen 1919 und 1938. Slavische Sprachgeschichte, no. 8. Berlin: LIT Verlag, 2014. 208 pp. Appendix. Notes. Bibliography. Tables. €29.90, paper. Birchall, Christopher. Embassy, Emigrants, and Englishmen: The Three Hundred Year History of a Russian Orthodox Church in London. Jordanville: Holy Trinity Seminary Press, 2014. xx, 712 pp. Appendixes. Notes. Bibliography. Index. Photographs. Maps. $69.00, hard bound. $37.95, paper. Blaha, Filip. Frauenkörper im Fokus: Wahrnehmung zwischen Straße und Turnplatz in Prag und Dresden vor dem Ersten Weltkrieg. Welt – Körper – Sprache: Perspektiven kultureller Wahrnehmungs- und Darstellungsformen, no. 11. Frankfurt am Main: Peter Lang, 2013. 282 pp. Appendix. Notes. Bibliography. Illustrations. Photographs. $60.95, hard bound.
Briesewitz, Gernot. Raum und Nation in der polnischen Westforschung 1918-1948: Wissenschaftsdiskurse, Raumdeutungen und geopolitische Visionen im Kontext der deutsch-polnischen Beziehungsgeschichte. Einzelveröffentlichungen des Deutschen Historischen Instituts Warschau 32. Osnabrück: fibre Verlag, 2014. 526 pp. Appendix. Notes. Bibliography. Index. Illustrations. Maps. €39.80, paper. Coleman, Heather, ed. Orthodox Christianity in Imperial Russia: A Sourcebook on Lived Religion. Bloomington: Indiana University Press, 2014. xiv, 338 pp. Appendix. Notes. Bibliography. Glossary. Index. Illustrations. Photographs. Maps. $35.00, paper. Fuks, Ladislav. Of Mice and Mooshaber. Trans. Mark Corner. Prague: Karolinum, the Charles University Press, 2014. Distributed by University of Chicago Press. 512 pp. Illustrations. $20.00, hard bound. Gibson, James, and Alexei A. Istomin, eds. Russian California, 1806-1860: A History in Documents, Vols. 1-2. With Valery A. Tishkov. Trans. James R. Gibson. Third Series, nos. 26-27. London: Published by Ashgate for the Hakluyt Society, 2014. lxii + xii, 547 + 640 pp. Appendix. Notes. Bibliography. Glossary. Index. Illustrations. Plates. Figures. Tables. Maps. $225.00 + $225.00, hard bound. Ginsborg, Paul. Family Politics: Domestic Life, Devastation and Survival, 1900-1950. New Haven: Yale University Press, 2014. xviii, 520 pp. Appendix. Notes. Index. Illustrations. Plates. Photographs. $35.00, hard bound. Hlasko, Marek. All Backs Were Turned. Trans. Tomasz Mirkowicz. Introduction, George Z. Gasyna. New York: New Vessel Press, 2014. 144 pp. $15.99, paper. $9.99, e-book. Jangfeldt, Bengt. Mayakovsky: A Biography. Trans. Harry D. Watson. Chicago: University of Chicago Press, 2015. xii, 612 pp. Bibliography. Chronology. Index. Illustrations. Photographs. Figures. $35.00, hard bound.
Text complexity assessment is a challenging task that requires various linguistic aspects to be taken into consideration. The complexity level of a text should correspond to the reader’s competence: an overly complicated text may be incomprehensible, whereas an overly simple one may be boring. For many years, simple features were used to assess readability, e.g. average word and sentence length or vocabulary variety. Thanks to the development of natural language processing methods, the set of text parameters used for evaluating readability has expanded significantly. In recent years, many articles have investigated the contribution of various lexical, morphological, and syntactic features to the readability level. Nevertheless, because the methods and corpora are quite diverse, it may be hard to draw general conclusions about the effectiveness of linguistic information for evaluating text complexity. Moreover,...
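The simple surface features named in this abstract (average word length, average sentence length, vocabulary variety) can be sketched in a few lines. This is a minimal illustration, not a method taken from the paper: the function name `simple_readability_features`, the regex-based tokenization, and the use of a type-token ratio as the measure of vocabulary variety are all assumptions made for the example.

```python
import re


def simple_readability_features(text):
    """Compute three classic surface features of a text.

    Hypothetical helper for illustration only; real readability
    studies use far more careful tokenization and sentence splitting.
    """
    # Naive sentence split on runs of ., ! or ? (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens; \w also matches Cyrillic letters in Python 3.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Vocabulary variety approximated as distinct words / all words.
        "type_token_ratio": len(set(words)) / len(words),
    }


features = simple_readability_features("The cat sat. The cat ran away!")
print(features)
```

The type-token ratio depends strongly on text length, so values are only comparable across texts of similar size.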
Educational texts for children have two distinct purposes: their readers must understand them and at the same time learn new words from them. It seems important and useful to be able to automatically detect words that may be unfamiliar to children of different ages. A challenging task is to identify words that readers perceive as familiar and understandable but in fact understand incorrectly. We propose a metric, called word deceptiveness, which is based on surveys and calculated as the product of the number of respondents who mark the word as familiar and the number of those who correctly determine its meaning. We conducted a series of experiments and discovered several deceptive words in Russian. Several hypothetical mechanisms for the emergence of such words have been identified. In general, these involve closeness to other, more familiar linguistic units: words, morphemes and word formation models. Future work will include an endeavor to learn to identify de...
Proceedings of the 6th International Conference on
Many words that according to the dictionaries have just one meaning are in fact understood in different ways by different speakers. In this article we deal with Russian nouns denoting everyday life objects which are subject to much variation by age, gender, and region and are poorly described by the existing dictionaries. We report the results of a multilevel survey, propose some possible metrics of word knowledge and show to what extent the words we studied are known among a certain population. We also claim that different speakers possess different sets of meanings for each word, propose ways to discover the distribution patterns for these sets and introduce the notion of disperse polysemy. We believe that our findings may be useful in lexicography (providing detailed information on current word usage in different social groups), lexical semantics (researching meaning shifts and patterns of its distribution among speakers), and language testing (more precise detection of the vocab...
The paper discusses valency frames of a number of Russian verbal predicates whose semantics includes speech acts and, at a certain step of semantic decomposition, negation, like vozražat’ ‘object, retort’, vozmuščat’sja ‘resent, be indignant’ or izvinjat’sja ‘apologize’. It is hypothesized that the frames of such predicates include a pair of propositional valencies distinctly opposed to each other: (1) the valency of stimulus that expresses the state of events and (2) the valency of response that introduces a speech act performed by the subject as a reaction to this state of events and offering an explanation. For example, in the sentence Ivan izvinilsja, čto ne prišel na moj den’ roždenija ‘Ivan apologized that he did not come to my birthday party’ the clause starting with čto ‘that’ represents the state of events, whilst in the sentence Ivan izvinilsja, čto ploxo sebja čuvstvoval ‘Ivan apologized that he was not feeling well’ the čto-clause introduces Ivan’s response to the sti...
Our study tackles Russian interrogative-relative pronouns (wh-words) as a lexicographic type which requires a unified treatment. Our objective is to give a systematic description and explanation of the numerous collocational and constructional properties of the Russian wh-words using lexicographic and corpus methods. The dataset and statistics were extracted from the Russian National Corpus; at least 100 examples were analysed for each of the pronouns. Methodologically, the study is based on the principles of the Moscow School of Semantics (namely, integral description of language and systematic lexicography), which are to a large extent rooted in the “Meaning⇔Text” theory. They include analysis of linguistic items on all levels of language; a focus on their semantic and combinatorial properties; and the acknowledged validity of the dictionary as an instrument of linguistic research. The paper considers semantic, syntactic and co-occurrence properties shared by many Russian interrogative prono...
Classifications of everyday items (category words for clothing, stationery, personal hygiene, beauty products etc.) are studied. A survey of 40 languages was performed. Several results are reported. Speakers of some languages provide generic terms relatively easily, while speakers of other languages often find this task difficult. Some items (such as keys, ear plugs, umbrellas) are virtually unclassifiable in all languages. All languages have covert classes without well-established names (such as personal hygiene or data storage), and people resort either to awkward official phrases like Russian предметы личной гигиены (‘personal hygiene items’) or to highly colloquial occasional words like Russian умывалки. For items belonging to such classes, high variation of category words was observed. Classes existing in several languages often overlap and include different items. For example, посуда in Russian corresponds to dishes, cookware and cutlery in English. Possible areas of further research are discussed, in...
The object of the paper is the class of Russian sentences that have more than one probability qualifier (PQ) with intersecting scopes. As it appears, modern Russian texts abound with such phenomena. Our goal is to identify the meanings and uses of such utterances. We analyze the most typical cases and proceed to more controversial issues. The analysis shows that even though the language has plenty of tools for fine differentiation of probability estimates, speakers often avoid straightforward statements, resorting to a variety of means to emphasize the approximate and subjective character of their estimates and thus disclaiming responsibility for the statements they make.
We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete dependency tree, and semantic tagging within a restricted semantic dictionary. Syntactic tagging uses about 80 dependency relations. The syntactically annotated corpus contains more than 28,000 sentences and forms an autonomous part of the Russian National Corpus (www.ruscorpora.ru). Semantic tagging is based on an inventory of semantic features (descriptors) and a dictionary comprising about 3,000 entries, with a set of tags assigned to each lexeme and its argument slots. The set of descriptors assigned to words has been designed in such a way as to construct a linguistically relevant classification for the whole Russian vocabulary. This classification serves for discovering laws according to which the elements of various lexical and semantic classes interact in the texts. The inventory of s...
The paper deals with the Russian verb "spoxvatit'sja", which is hard to translate. The verb is a unique example of three different mental activities (memory, perception, and comprehension) fused in a single Russian word. It has a peculiar set of seemingly quite different and even opposite meanings which turn out to be organised in a logical polysemy structure. It also has a variety of interesting syntactic features, partly shared by other Russian verbs denoting mental acts.
The paper is focused on self-contained linguistic problems based on text corpora. We argue that corpus-based problems differ from traditional linguistic problems because they make it possible to represent language variation. Furthermore, they often require basic statistical thinking from the students. The practical value of using data obtained from text corpora for teaching linguistics through linguistic problems is shown.
Russian constructions that involve valency ambiguity are considered with regard to the extent to which the ambiguity can be successfully resolved by man or machine. The material includes two types of phenomena: 1) Russian counterparts of noun phrases like (a) the phases of sleep vs. (b) the phase of active sleep: in (a), sleep instantiates the subject valency of phase, whereas in (b) sleep is the content of the phase; 2) subject and object infinitives with the verbs prosit' "ask" and predlagat' "suggest/offer": Rebenok prosit est', lit. "The child asks to eat" vs. Rebenok prosit podojti, lit. "The child asks (for someone) to come up"; On predložil vstretit' menja "he offered to meet me" vs. On predložil prijti k nemu "he suggested that (I) should come round to him".
The need to assess how difficult a text is for its reader can arise in various situations: drafting contracts and laws, writing instructions for devices, writing textbooks for a native or foreign language, selecting literature for extracurricular reading. Assessing the complexity of educational texts for children is particularly interesting, because such texts must meet several requirements at once, some of which partly contradict one another. Children must understand these texts well; the texts must be relevant and interesting and at the same time teach readers new concepts as well as new words and constructions. At present, age labeling of children's texts is done manually by experts, which makes the process long and laborious and the result subjective. The article proposes a method for automatically classifying texts by complexity using a neural network model. The method is intended for building a corpus of children's literature with age annotation (within the Russian National Corpus). The prediction quality of our model reached 0.92; it accounts reasonably well for lexical diversity and the range of topics. An automatic mechanism that assesses text complexity with acceptable accuracy will make it possible to create, within a short time, a representative corpus of texts written for children, with the option of selecting texts that are certain to be understandable to children of a given age. Such a corpus will be in demand among teachers, parents, translators of fiction, linguists, and everyone who needs to select literary texts that children can understand.
... Вряд ли... Сомнительно что-то... (Стругацкие) '“What are you saying!” said Edik with diffidence. “This is impossible... Hardly so... ... This will hardly happen, of course. They'll either pay nothing, or impose such a tax…' (Ivanov). 2.4. ...
Using Russian nouns with concrete-object meanings as material, the paper argues for the need to create a frequency dictionary of word senses. It proposes methods for approximate estimation of sense frequencies, based on the analysis of informant survey data and on the annotation of the most frequent collocations in a large text corpus (this work used RuTenTen11, the largest corpus to date, integrated into the Sketch Engine system). Such a dictionary could be in demand in various computational-linguistic applications (in particular, for probabilistic word sense disambiguation in the absence of context), in the creation of learning resources, and in traditional explanatory lexicography. Studies of the sense inventories of polysemous words and of their relative frequencies are also of theoretical interest for the study of the evolution of the lexical system of a language.
The article examines the valency frames of a number of Russian verbal predicates whose meaning includes a speech act and, at some stage of semantic decomposition, negation, such as возражать ‘object, retort’, возмущаться ‘resent, be indignant’, извиняться ‘apologize’, and others. It is hypothesized that the valency frames of such predicates include a pair of propositional valencies distinctly opposed to each other: (1) the valency of stimulus, which expresses a state of affairs, and (2) the valency of response, which introduces a speech act performed by the subject as a reaction to this state of affairs and offering an explanation for it. For example, in the sentence Иван извинился, что не пришел на мой день рождения ‘Ivan apologized that he did not come to my birthday party’, the clause introduced by the conjunction что ‘that’ expresses the state of affairs, whereas in the sentence Иван извинился, что плохо себя чувствовал ‘Ivan apologized that he was not feeling well’, such a clause conveys Ivan's verbal response to a state of affairs (for example, his absence from my birthday party) that prompts him to explain this absence. It is shown that these valencies cannot be adequately described within a single semantic role of content. The authors also propose a generalization of this phenomenon, comparing it with other types of valency pairs, and put forward the hypothesis that there exist predicates with two valency centers.
Educational texts for children are meant to serve two opposing purposes: children must understand them well, yet the texts must also teach readers new words. It seems important to be able to automatically detect words that may be unfamiliar to children of different ages. A difficult task is to identify words that readers perceive as familiar and understandable but in fact understand incorrectly. We propose a word deceptiveness metric, calculated as the product of the share of respondents who mark the word as familiar and the share of those among them who correctly determine its meaning. We conducted a series of experiments and found several deceptive words in Russian. We identified several hypothetical mechanisms for the emergence of such words, reflecting closeness to other, more common linguistic units: words, morphemes, and word-formation models. The next task is to learn to identify deceptive words on the basis of various linguistic factors.
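The product of two shares described in this abstract can be written out as a short sketch. It follows the abstract's wording literally (share of respondents who mark the word as familiar, times the share of those respondents who determine the meaning correctly); the full paper may define or normalize the metric differently. The `Response` record and the function name `familiarity_correctness_product` are hypothetical names introduced for the example, and the survey data below is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One respondent's answer for one word (hypothetical record layout)."""
    marked_familiar: bool      # respondent said they know the word
    answered_correctly: bool   # respondent chose the right meaning


def familiarity_correctness_product(responses):
    """Product of two shares, exactly as worded in the abstract.

    share_familiar: respondents marking the word as familiar / all respondents
    share_correct:  correct answers among those who marked it as familiar
    """
    if not responses:
        raise ValueError("at least one response is required")
    familiar = [r for r in responses if r.marked_familiar]
    if not familiar:
        return 0.0  # nobody claimed to know the word
    share_familiar = len(familiar) / len(responses)
    share_correct = sum(r.answered_correctly for r in familiar) / len(familiar)
    return share_familiar * share_correct


# Invented toy survey: 10 respondents, 8 say "familiar", 2 of those are correct.
survey = (
    [Response(True, True)] * 2
    + [Response(True, False)] * 6
    + [Response(False, False)] * 2
)
print(familiarity_correctness_product(survey))
```

Keeping the two shares as separate intermediate values makes it easy to inspect which factor drives the score for a given word.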
The article contains 36 linguistic problems, represented as multiple-choice questions, that were used at the Russkij Medvezhonok linguistic competition for school students. The problems deal with ambiguity at all levels of the language: polysemy, various types of lexical homonymy (full and partial homonyms, homographs, homophones), syntactic ambiguity, etc. In some of the problems, ambiguity is explicitly named, but in most cases it must be discovered by the solvers. It is the surprise factor that makes such problems useful, since the ability to find an unexpected interpretation of a text is a very useful skill. Many problems are based on quotations from prose and poetry and often contain wordplay of various kinds, ambiguity being one of the main sources of humour. Polysemy constitutes one of the greatest challenges for natural language processing, too, and this is demonstrated in the article by several examples of machine translation system failure. The article has three sections with problems for elementary, secondary and high school students. Each problem is supplied with the answer, its detailed explanation, and sometimes additional linguistically relevant information that can be used in lessons, elective courses, or linguistic skill workshops.
In elementary school, words with the same root are defined as having “a common part with a common meaning.” At the same time, what exactly is meant by this common meaning is not usually specified. In practice, whether words have the same root is most often determined in Russian schools by the so-called Vinokur criterion, which requires reconstructing a derivational chain. As a result of using this criterion, cognate words of the same origin, in contradiction to what the intuition of native speakers suggests, are not recognized as having the same root, like poezd and ezdit’ (‘train’ and ‘to ride’), prestol and stolitsa (‘throne’ and ‘capital’), zapasnoj and pripasy (‘spare’ and ‘supplies’). The article analyses information from dictionaries and also from survey findings with data gathered from native speakers. It suggests introducing a more flexible approach in elementary schools, which allows reasoning based not only on a strictly synchronic approach, but also on a diachronic approach, provided that semantic similarity between the words in question is evident. The first approach shows the student’s ability to identify language structure, while the second one demonstrates their developed language intuition, which often makes it easier to learn the correct spelling of words.
Polysemy is a key issue in theoretical semantics and lexicography as well as in computational linguistics. When words have several senses, it is important to describe them properly in the dictionary (a lexicographic task) and to be able to distinguish between them in a given context (a computational linguistics task, known as WSD). Recently attention has been drawn to the fact that different senses normally have different frequencies in corpora. Elsewhere we reported on our research into that issue and introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency may enrich language learning resources and help lexicographers order senses within a word according to frequency, if needed. When learning a foreign language, a student may encounter a word that exists in his/her native language (as a borrowing or an international word), and is tempted to assume that the foreign word and its equivalent have the same meaning structure. However, sometimes this is not the case, and the most frequent sense of a word in one language may be much less frequent for its cognate. We propose a method for detecting such cases. For that purpose, we selected a set of Russian words included in the Active Dictionary of Russian which have more than two dictionary senses and have cognates in English. We estimated frequencies for English and Russian senses using SemCor and the Russian National Corpus respectively, matched senses in each pair of words and compared their frequencies. In this way, we revealed cases in which the most frequent senses and the whole meaning structures are, cross-linguistically, substantially different and studied them in more detail. As a result, we obtained information that may prove useful for learners of Russian or English as well as for lexicographers and computational linguists dealing with machine translation.
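The comparison step described above (match senses across a cognate pair, then compare their frequencies) can be illustrated with a small sketch. It is not the authors' implementation: the function `compare_cognate_senses`, the sense labels, the sense mapping, and all frequency values are hypothetical and invented for illustration, and the sketch assumes that sense frequencies have already been estimated and that senses have been matched by hand.

```python
def compare_cognate_senses(ru_freqs, en_freqs, ru_to_en):
    """Compare sense frequency profiles of a Russian word and its English cognate.

    ru_freqs / en_freqs: sense label -> relative frequency (each sums to ~1).
    ru_to_en: hand-made mapping from Russian sense labels to matched English ones.
    All inputs are assumed to be prepared beforehand.
    """
    ru_top = max(ru_freqs, key=ru_freqs.get)
    en_top = max(en_freqs, key=en_freqs.get)
    matched_en = ru_to_en.get(ru_top)  # None if the sense has no counterpart
    matched_freq = en_freqs.get(matched_en, 0.0)
    return {
        "ru_top_sense": ru_top,
        "en_top_sense": en_top,
        # True when the dominant Russian sense is not the dominant English one.
        "top_sense_differs": matched_en != en_top,
        # How much rarer (or more common) the matched sense is in English.
        "gap": ru_freqs[ru_top] - matched_freq,
    }


# Invented toy numbers for one cognate pair.
ru = {"ru_sense_1": 0.7, "ru_sense_2": 0.3}
en = {"en_sense_1": 0.2, "en_sense_2": 0.8}
mapping = {"ru_sense_1": "en_sense_1", "ru_sense_2": "en_sense_2"}

print(compare_cognate_senses(ru, en, mapping))
```

A pair would be flagged for closer study when `top_sense_differs` is true or when the gap is large; any threshold for "large" would have to be chosen empirically.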
Although word sense frequency information is important for the theoretical study of polysemy and for the practical purposes of lexicography, the problem of sense frequency distribution is a neglected area in linguistics, probably because sense frequency is not easy to estimate. In this paper we deal with the problem of automated word sense frequency estimation for Russian nouns. We developed and tested an automated system based on semantic context vectors, supplied with contexts and collocations from the Active Dictionary of Russian, a full-fledged production dictionary that reflects contemporary Russian. The study was performed on the RuTenTen11 web corpus. The system reaches a frequency estimation error of 11% without any additional labeled data. We compared the sense frequencies obtained automatically with the sense ordering in different dictionaries for several words. The method presented in this paper can be applied to any language with a sufficiently large corpus and a good dictionary that provides examples for each sense. The results may enrich language learning resources and help lexicographers order senses within an entry according to frequency, if needed.
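The general idea of estimating sense frequencies from dictionary examples can be sketched as follows. This is a simplified illustration, not the system described in the abstract: it replaces semantic context vectors with plain bag-of-words counts, uses an English toy word instead of Russian data, and all names, example sentences, and contexts are invented. Each sense is represented by a vector built from its dictionary examples, every corpus context is assigned to the most similar sense, and the share of contexts per sense serves as the frequency estimate.

```python
import math
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_sense_frequencies(
    sense_examples: Dict[str, List[str]], contexts: List[str]
) -> Dict[str, float]:
    """Assign each context to its nearest sense and return the sense shares."""
    if not contexts:
        raise ValueError("no corpus contexts given")
    # One vector per sense, built from all dictionary examples of that sense.
    sense_vectors = {
        sense: bow(" ".join(examples)) for sense, examples in sense_examples.items()
    }
    assigned = Counter()
    for context in contexts:
        vec = bow(context)
        best = max(sense_vectors, key=lambda s: cosine(vec, sense_vectors[s]))
        assigned[best] += 1
    return {sense: assigned[sense] / len(contexts) for sense in sense_vectors}


# Invented dictionary examples and corpus contexts for the toy word "bank".
sense_examples = {
    "finance": ["deposit money in the bank", "the bank approved the loan"],
    "river": ["the river bank was muddy", "fishing on the bank of the river"],
}
contexts = [
    "she took a loan from the bank",
    "the bank raised interest on the deposit",
    "we walked along the river bank",
    "the bank lent money",
]
freqs = estimate_sense_frequencies(sense_examples, contexts)
print(freqs)
```

In this toy version function words contribute to the similarity scores; a more careful setup would filter them out or weight words, and would use dense vectors trained on a large corpus.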
Words denoting numbers (cardinal and ordinal numerals, or adjectives) represent a small (although potentially infinite) lexicographic type. In this article we deal with the polysemy structure of these two lexical classes. We propose a lexicographic pattern and study standard types of semantic shifts, including regular metaphors and metonymies. The words of both classes normally develop special senses with conversion into other parts of speech. Additional senses, different for different words, can appear due to cultural conventions.
Many words that according to the dictionaries have just one meaning are in fact understood in different ways by different speakers. In this article we deal with Russian nouns denoting everyday life objects which are subject to much variation by age, gender, and region and are poorly described by the existing dictionaries. We report the results of a multilevel survey, propose some possible metrics of word knowledge and show to what extent the words we studied are known among a certain population. We also claim that different speakers possess different sets of meanings for each word, propose ways to discover the distribution patterns for these sets and introduce the notion of disperse polysemy. We believe that our findings may be useful in lexicography (providing detailed information on current word usage in different social groups), lexical semantics (researching meaning shifts and patterns of its distribution among speakers), and language testing (more precise detection of the vocabulary sizes both in native speakers and in language learners).
The assumption that senses are mutually disjoint and have clear boundaries has been called into doubt by several linguists and psychologists. The problem of word sense granularity is widely discussed both in lexicographic and in NLP studies. We aim to study word senses in the wild, that is, in raw corpora, by performing word sense induction (WSI). WSI is the task of automatically inducing the different senses of a given word, framed as an unsupervised learning task in which senses are represented as clusters of token instances. In this paper, we compared four WSI techniques: Adaptive Skip-gram (AdaGram), Latent Dirichlet Allocation (LDA), clustering of contexts, and clustering of synonyms. We evaluated them quantitatively and qualitatively and performed an in-depth study of the AdaGram method, comparing AdaGram clusters for 126 words (nouns, adjectives, and verbs) with their senses in published dictionaries. We found that AdaGram is quite good at distinguishing homonyms and metaphoric meanings. It ignores disappearing and obsolete senses, but induces new and domain-specific senses which are sometimes absent from dictionaries. However, it works better for nouns than for verbs, ignoring structural differences (e.g. causative meanings or different government patterns). The AdaGram database is available online: http://adagram.ll-cl.org/.
This research was supported by RSF (project No. 16-18-02054: Semantic, statistic and psycholinguistic analysis of lexical polysemy as a component of Russian linguistic worldview). The authors would also like to thank students of the Higher School of Economics and Yandex School of Data Analysis for their help in annotating dictionary senses.
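To make the notion of senses as clusters of token instances concrete, here is a minimal sketch of clustering contexts. It is not AdaGram or LDA and does not reproduce any of the four techniques compared in the paper: it uses a simple greedy threshold clustering over bag-of-words vectors, and the stopword list, the threshold value, and the toy contexts are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List

# A tiny, hypothetical stopword list; a real experiment would need a fuller one.
STOPWORDS = {"the", "a", "of", "on", "in", "to", "and", "was", "is"}


def context_vector(text: str, target: str) -> Counter:
    """Bag of context words, excluding stopwords and the target word itself."""
    return Counter(
        w for w in text.lower().split() if w not in STOPWORDS and w != target
    )


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def induce_senses(
    contexts: List[str], target: str, threshold: float = 0.2
) -> List[List[int]]:
    """Greedy clustering: join the most similar cluster or open a new one.

    Returns clusters as lists of context indices; each cluster stands for
    one induced sense of the target word.
    """
    clusters = []  # each item: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(contexts):
        vec = context_vector(text, target)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # accumulate counts in the centroid
            best["members"].append(i)
    return [c["members"] for c in clusters]


# Invented contexts for the toy target word "bank".
contexts = [
    "the bank approved the loan",
    "a loan from the bank",
    "the muddy river bank",
    "fishing on the river bank",
    "the bank approved credit",
]
clusters = induce_senses(contexts, target="bank")
print(clusters)
```

The result of greedy clustering depends on the order of the contexts and on the threshold, which is one reason why induced clusters need to be checked against dictionary senses, as the study does for AdaGram.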
... State of the Art and Prospects 1 ... For generic descriptors (genus proximum), nouns are used ('animal', 'vegetable', 'state', 'action', etc), whereas specific descriptors (differentia ... Both have grown out of independent research in the domain of systemic lexicography based on the idea ...
A brief review of the academic work of Juri Apresjan
When words have several senses, it is important to describe them properly in a dictionary (a lexicographic task) and to be able to distinguish them in a given context (a computational linguistics task, WSD). Different senses normally have different frequencies in corpora. We introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency is not only useful for explanatory lexicography and WSD; it may also enrich language learning resources. Learners of a foreign language who encounter a word similar to one in their native language are often tempted to assume that the foreign word and its equivalent have the same meaning structure. Sometimes, however, this is not the case, and the most frequent sense of a word in one language may be much less frequent for its cognate. We proposed a method for detecting such cases. Having selected a set of Russian words from the Active Dictionary of Russian that have more than two dictionary senses and have cognates in English, we estimated the frequencies of the English and Russian senses using SemCor and the Russian National Corpus respectively, matched the senses in each pair of words, and compared their frequencies. In this way, we revealed cases in which the most frequent senses and whole meaning structures are substantially different cross-linguistically, and we studied them in more detail. This technique can be applied not only to cognates, but also to pairs of words that dictionaries usually offer as translation equivalents of each other.
The object of the paper is the class of Russian sentences that have more than one probability qualifier (PQ) with intersecting scopes. As it turns out, modern Russian texts abound with such phenomena. Our goal is to identify the meanings and uses of such utterances. We analyze the most typical cases and proceed to more controversial issues. The analysis shows that even though the language has plenty of tools for fine differentiation of probability estimates, speakers often avoid straightforward statements, resorting to a variety of means to emphasize the approximate and subjective character of their estimates and thus declining responsibility for the statements they make.
When describing words which denote real life objects, dictionaries tend to use scientific terms and classifications, even when dealing with natural language. This approach may lead to misunderstanding, especially in cases when the scientific classification (e.g. in biology) differs from what is found in natural language data. One such case is discussed here, namely the small but rather interesting class of nuts (Russian orexi). In the botanical world view nuts usually include hazelnuts and chestnuts, but do not include walnuts or almonds (which are considered stone fruits), pine nuts (seeds), peanuts (legumes), pistachios (kernels), etc. The Russian orex, English nut, and Latin nux exhibit similar behaviour here. Explanatory dictionaries of Russian more or less follow the botanical definitions, even though in many fields (such as cooking, food industry, medicine, etc.) nuts are classified differently. In order to establish the boundaries of nuts in Russian, more than 1000 native speakers were questioned and multiple texts of different periods were studied. The result is a peculiar class which cannot be identified with any of the natural language supercategories described by Anna Wierzbicka. A new lexicographic description is proposed for some words included in this class.
The paper discusses various techniques of discovering and describing lexical ambiguity. This is one of the top issues in computational linguistics. A variety of techniques are used for word sense disambiguation, but all of them are based on context. Yet, studying how word senses work without context and what patterns of polysemous words could be found in speakers’ minds also seems an interesting and important issue. The main approaches to WSD with or without context (in narrow and broad sense, including the situational context) are evaluated. The importance of corpora in discovering word senses is substantiated. New experimental data are presented, which allow defining subsets of senses for polysemous words for different speakers and rating the senses in the dictionary. Finally, the paper proposes to distinguish between absolute and relative polysemy and to search for ways of their adequate lexicographic description.
The paper deals with metalanguage lexical units that convey certain relations between the names of different objects: the Russian units одноимённый ‘of the same name, cognominal’ (and its derivatives) and так и называется ≈ ‘called exactly this way’. Such items are difficult to interpret in NLP applications. Lexicographic definitions are proposed based on a number of key senses identified by the author: ideas of coincidence, correspondence, and simplicity.
Russian constructions that involve the ambiguity of valencies are considered with regard to the extent to which it can be successfully resolved by man or machine. The material includes two types of phenomena: 1) Russian counterparts of noun phrases like (a) the phases of sleep vs. (b) the phase of active sleep: in (a), sleep instantiates the subject valency of phase, whereas in (b) sleep is the content of the phase; 2) subject and object infinitives with the verbs prosit’ ‘ask’ and predlagat’ ‘suggest/offer’: Rebënok prosit est’ lit. ‘The child asks to eat’ vs. Rebënok prosit podojti lit. ‘The child asks (for someone) to come up’; On predložil vstretit’ menja ‘he offered to meet me’ vs. On predložil prijti k nemu ‘he suggested that (I) should come round to him’.
Classifications of everyday items (category words for clothing, stationery, personal hygiene, beauty products, etc.) are studied. A survey of 40 languages was performed, and several results are reported. Speakers of some languages provide generic terms relatively easily, while speakers of other languages often find this task difficult. Some items (such as keys, ear plugs, umbrellas) are virtually unclassifiable in all languages. All languages have covert classes without well-established names (such as personal hygiene or data storage), and people resort either to awkward official phrases like Russian предметы личной гигиены or to highly colloquial occasional words like Russian умывалки. For items belonging to such classes, high variation of category words was observed. Classes existing in several languages often overlap and include different items. For example, посуда in Russian corresponds to dishes, cookware and cutlery in English. Possible areas of further research are discussed, including studies of language acquisition and bilingualism and comparisons with folk biology and folksonomies.
Analyzing several Russian nouns denoting everyday life objects, we explain why a word sense frequency dictionary is necessary. Techniques for calculating approximate frequencies are proposed, based on the analysis of native speaker surveys and the annotation of the most frequent collocations in a large text corpus (we used the RuTenTen11 corpus integrated into the Sketch Engine system). A word sense frequency dictionary could be used in a variety of NLP tasks, in particular for probabilistic word sense disambiguation when no context is available, in creating second language learning resources, and in academic lexicography. In addition, studies of the sense sets of polysemous words and their comparative frequencies are important for linguistic theory, because they shed light on the evolution of the lexical system.


The second, updated and enlarged edition of the Dictionary contains 354 entries representing the basic groups of the anthropocentric lexicon of Russian and, less systematically, some other layers of the lexicon.
The New Explanatory Dictionary of Russian Synonyms is a production dictionary coordinated with a certain grammatical description of Russian. It implements the principles of systematic lexicography and aims to reflect the language-specific ("naive") picture of the world. It also seeks to combine detailed linguistic portraits of individual lexical items with a unified description of all lexical items belonging to a single lexicographic type. Every dictionary entry records the semantic, referential, pragmatic, connotative, communicative, syntactic, selectional, morphological, and prosodic similarities and differences between synonyms, as well as the conditions under which the differences are neutralized. All dictionary entries include large supplementary zones listing phraseological synonyms, analogues, exact and inexact converse terms, converse terms of analogues, exact and inexact antonyms, and derivatives (including semantic ones) of the items comprising the synonym series. In a number of cases the entry supplies references to specialized linguistic literature devoted to one or several of the items making up the series.
The book is addressed to a wide audience of philologists with an interest in lexicology, lexicography and theoretical semantics; to teachers and students of Russian as a foreign language or their mother tongue; and also to writers, reporters, editors and other professionals who handle Russian as the object of study or the tool of their work.