Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2019
This article provides a consistent formal grammatical and ontological description of the model of the Tibetan compound system, developed and used for automatic syntactic and semantic analysis of Tibetan texts on the material of a hand-verified corpus. The model covers all types of Tibetan compounds previously introduced by other authors and introduces a number of new classes of compounds, taking into account their derivation, structure, and semantics. The article describes the tools used for ontological modeling of Tibetan compounds, with special attention to the problem of modeling the semantics of verbs and verbal compounds. Nominal and verbal compounds are considered separately, and it is noted that verbal compounds are no less important to the Tibetan language system than nominal compounds. Statistical data are given on the absolute frequency distribution of compounds of different types in the current version of the corpus annotation and on the number of ontology concepts associated with each class of compounds.
Digital Transformation and Global Society. DTGS 2018. Communications in Computer and Information Science, vol 859, 2018
The article presents the experience of developing a computer ontology as one of the tools for automatic natural language processing. A computer ontology that contains a consistent specification of the meanings of lexical units and the various relations between them represents a model of lexical semantics and of both syntactic and semantic valencies, reflecting the Tibetan linguistic picture of the world. The article describes an approach that uses the computer ontology to introduce semantic restrictions for morphosyntactic disambiguation, based on the corpus of indigenous grammatical treatises.
The paper presents a study of noun compounds from a corpus of modern Tibetan using a relational lexical database. The lexical database represents a consistent classification of the meanings of Tibetan lexical units and the various relations between them. The paper describes the structure of the database, the principles for processing Tibetan compounds, and the recognized types of semantic structure of compounds. Most Tibetan compounds are grammatical, religious-philosophical, or general scientific terms; the paper therefore specifies the principles for processing subject-area compounds, including the subject areas identified by the Tibetan linguistic picture of the world.
The project aims at developing a model of a corpus of Tibetan traditional grammar treatises, which are believed to date back to the 7th–8th centuries C.E. The corpus will be useful to scholars focusing on Tibetan traditional grammar treatises, as well as for linguistic research on classical and modern Tibetan, its description, and its teaching.
This paper describes the creation of a parallel Tibetan–Russian corpus of works of the Tibetan grammatical tradition that formed in the 7th–8th centuries AD. On the basis of the corpus, a special lexical database of grammatical terminology is being formed that could be of interest to Tibetologists and specialists in general linguistics. The corpus can be used for linguistic research, teaching, and the study of the classical and modern Tibetan language, as well as the Tibetan grammatical tradition.
The paper describes the creation of a parallel Tibetan–Russian corpus of texts of the Tibetan grammatical tradition, which took shape in the 7th–8th centuries AD. On the basis of the corpus, a specialized lexical database of grammatical terminology is being compiled, which is of interest both to Tibetologists and to specialists in general linguistics. The corpus can be used for linguistic research, teaching, and the study of classical and modern Tibetan, as well as of the Tibetan grammatical tradition.
* The authors acknowledge Saint Petersburg State University for research grant 2.38.293.2014 "Modernizing the Tibetan Literary Tradition" for a study of the content of Tibetan grammar treatises. The model of linguistic data presentation in the parallel corpus and the lexical database was developed with financial support from the Russian Foundation for Basic Research as part of research project 13–06–00621 "The Pilot Version of Tibetan Grammar Texts' Electronic Corpus". Abstract: The paper is devoted to Tibetan grammatical terminology. For this purpose, a corpus of Tibetan grammatical works was created. Russian translations of the works were added to the corpus, so it is in effect a parallel Tibetan–Russian corpus. The corpus is a collection of treatises of the Tibetan grammatical tradition that formed in the 7th–8th centuries. It is useful to researchers of the Tibetan linguistic tradition as well as to those specializing in the linguistic study of classical and modern Tibetan and its teaching. On the basis of the corpus, a specialized grammatical lexical database is being created, which will be useful both to Tibetologists and to specialists in general linguistics.
The paper is devoted to Tibetan grammatical terminology. For this purpose, a corpus of Tibetan grammatical works was created. Russian translations of the works were added to the corpus, so it is in effect a parallel Tibetan–Russian corpus. The corpus is a collection of treatises of the Tibetan grammatical tradition that formed in the 7th–8th centuries. It is useful to researchers of the Tibetan linguistic tradition as well as to those specializing in the linguistic study of classical and modern Tibetan and its teaching. On the basis of the corpus, a specialized grammatical lexical database is being created, which will be useful both to Tibetologists and to specialists in general linguistics.
The article presents the experience of developing a computer ontology as one of the tools for processing Tibetan idioms. A computer ontology that contains a consistent specification of the meanings of lexical units and the various relations between them represents a model of lexical semantics and of both syntactic and semantic valencies, reflecting the Tibetan linguistic picture of the world. The article presents an attempt to classify Tibetan idioms, including compounds, which are idiomatized clippings of syntactic groups that have frozen internal syntactic relations and are often characterized by the omission of grammatical morphemes, and it describes the application of this classification to idiom processing in the computer ontology. The article also proposes methods of using the computer ontology to avoid ambiguity in idiom processing.
Tibetan traditional linguistics (Skt. śabdavidyā, Tib. sgra'i rig pa), which Tibetans count among the five great sciences, took shape under the influence of Indian linguistics through translations of Indian Buddhist texts collected in the canonical compilations Kangyur and Tengyur, the latter of which contains more than 40 linguistic works translated from Sanskrit. Unlike the Indian linguistic tradition, which originally existed in oral form, Tibetan linguistics began to develop after the appearance of writing in the 7th century. Later, Tibetan authors wrote commentaries on the translated grammatical works, as well as their own works on the phonology, grammar, and semantics of Sanskrit and on the rules of translation into Tibetan. The authorship of the first treatises whose subject was the Tibetan language, the Sumchupa (Tib. Sum cu pa) and the Tagkyi Jugpa (Tib. Rtags kyi 'jug pa), is attributed to Thonmi Sambhota, adviser to King Songtsen Gampo (7th century). An important feature of how the two foundational treatises of the Tibetan grammatical tradition describe the elements of the script, which is not separated from phonology (Tib. yi ge 'phoneme', 'grapheme'), the combinatory rules of syllable formation, and the meanings of function words and morphemes is their active use of the techniques of Indian linguists (for example, anuvṛtti), together with calquing and phonetic borrowing of Sanskrit grammatical terms.
Traditional Tibetan linguistics, like the other traditional Buddhist sciences, is characterized by a large number of detailed prose commentaries on the short foundational treatises in verse. The number of commentaries continues to grow today, and modern Tibetan grammarians strictly follow the tradition of grammatical description established by the Sumchupa and the Tagkyi Jugpa and their commentaries, in some cases allowing themselves to supplement and refine the classical formulations.
Papers by Maria Smirnova