
text collections
Recently Published Documents





2022 ◽  
Vol 3 (1) ◽  
pp. 1-16
Haoran Ding ◽  
Xiao Luo

Searching, reading, and finding information in massive medical text collections is challenging. With a typical biomedical search engine, it is not feasible to navigate each article to find critical information or keyphrases. Moreover, few tools visualize the phrases relevant to a query. Yet keyphrases need to be extracted from each document for indexing and efficient search. The transformer-based neural network BERT has been used for various natural language processing tasks, and its built-in self-attention mechanism can capture associations between words and phrases in a sentence. This research investigates whether self-attention can be used to extract keyphrases from a document in an unsupervised manner, and to identify relevance between phrases in order to construct a query relevancy phrase graph that visualizes the corpus phrases by their relevance and importance. A comparison with six baseline methods shows that the self-attention-based unsupervised keyphrase extraction works well on a medical literature dataset, and the model can also be applied to other text data. The query relevancy graph model is applied to the COVID-19 literature dataset and demonstrates that the attention-based phrase graph can successfully identify the medical phrases relevant to the query terms.
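The abstract does not say how attention weights become phrase scores. As an illustration only, the sketch below assumes one simple scheme: a token's importance is the total attention it receives (the column sum of a head's attention matrix), averaged over heads, and a candidate phrase scores the sum over its tokens. The function names, the toy attention matrix, and the candidate spans are hypothetical; a real pipeline would take the matrices from a BERT model and the candidates from a phrase chunker.

```python
"""Sketch: rank candidate phrases by the self-attention they receive.

Assumption (not the authors' published method): a phrase's importance is the
total attention its tokens receive from all tokens, averaged over heads.
"""
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def attention_received(heads: Sequence[Matrix]) -> List[float]:
    """Column sums of each head's attention matrix, averaged over heads.

    heads[h][i][j] is the attention that token i pays to token j in head h.
    """
    n = len(heads[0])
    totals = [0.0] * n
    for head in heads:
        for row in head:
            for j, weight in enumerate(row):
                totals[j] += weight
    return [t / len(heads) for t in totals]


def rank_phrases(
    heads: Sequence[Matrix],
    candidates: Dict[str, Tuple[int, int]],
) -> List[Tuple[str, float]]:
    """Score each candidate token span [start, end) and sort by descending score."""
    received = attention_received(heads)
    scores = {
        phrase: sum(received[start:end])
        for phrase, (start, end) in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 4-token input "viral load in patients"; one head, rows sum to 1.
    head = [
        [0.1, 0.6, 0.1, 0.2],
        [0.3, 0.3, 0.1, 0.3],
        [0.2, 0.5, 0.1, 0.2],
        [0.1, 0.4, 0.1, 0.4],
    ]
    candidates = {"viral load": (0, 2), "patients": (3, 4)}
    for phrase, score in rank_phrases([head], candidates):
        print(f"{phrase}: {score:.2f}")
```

Summing over tokens favours longer phrases; dividing each score by the span length is an equally plausible choice, and which one works better is an empirical question.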

Miina Norvik ◽  
Uldis Balodis ◽  
Valts Ernštreits ◽  
Gunta Kļava ◽  
Helle Metslang ◽  

This article offers a comparative analysis of several morphosyntactic and phonological features in the South Estonian language islands: Leivu, Lutsi, and Kraasna. The objective is to give an overview of the distribution of selected features and their (in)stability over time, and to discuss their form and use in a broader areal context. To achieve this goal, comparative information was also included from the closest cognate varieties (Estonian and the South Estonian varieties, Courland Livonian and Salaca Livonian) and the main contact varieties (Latgalian, Latvian, and Russian). The data analysed in this study come from various sources: text collections, dictionaries, and language corpora. The results reveal a multitude of linguistic and distributional patterns: the studied varieties resemble and differ from one another in various ways, which points to multifaceted contact situations and outcomes in this area. Summary. Miina Norvik, Uldis Balodis, Valts Ernštreits, Gunta Kļava, Helle Metslang, Karl Pajusalu, Eva Saar: South Estonian language islands in the Central Baltic sphere of influence (Lõunaeesti keelesaared Kesk-Balti mõjuväljas). The article presents a comparative analysis of several morphosyntactic and phonological features of the South Estonian language islands (Leivu, Lutsi, and Kraasna). The aim of the study is to give an overview of the distribution of the selected features and their stability over time, and to discuss their forms and use in a broader areal context. To this end, occurrences in the closest related languages and dialects (Estonian and South Estonian, Courland and Salaca Livonian) and in the most important contact languages (Latgalian, Latvian, Russian) are taken into account. Material from various sources, including text collections, dictionaries, and language corpora, is analysed. The results bring out various formal connections and patterns in the spread of changes, pointing to the diversity of contacts among the studied languages and dialects and to the resulting differences in how their language systems have developed.

2021 ◽  
Vol 40 ◽  
pp. 7-19
Olga Barabasz-Rewak

This article is part of a study on the integral linguistic image of God in Ivan Ohienko's Ukrainian translation of the Psalter. Ohienko's texts are important because of the scholarly nature of the translation and their influence on the formation of the literary language. The author is interested in the ways and means by which the concept of the RIGHTEOUS, one of the concepts most frequently associated with God in these text collections, is verbally expressed. The study therefore attempts an ethnolinguistic analysis, based on J. Bartmiński's conception of profiling, of how such biblical concepts as ‘righteous person’, ‘the main signs of a righteous person associated with God’, and ‘the actions of a righteous person towards a) God, b) sinners’ are realized in Ukrainian. This makes it possible to trace the richness and diversity of the linguistic image of the ‘righteous’ created by Ivan Ohienko, and to bring readers closer to understanding how Old Testament texts are rendered with the means of the Ukrainian language.

2021 ◽  
Vol 19 (3) ◽  
pp. 61-69
N. I. Tikhonov

Visualizations are used to better understand collections of scientific publications, and various methods of analyzing text collections can be used to build them. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. To demonstrate how these techniques work and how they can be applied, visualizations were developed, which are described in this paper.
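The abstract does not detail how either method uses citations. As a hedged illustration of one common way citation structure enters document vectors, the sketch below generates DeepWalk-style random walks over a citation graph; each walk is a "sentence" of document IDs that a skip-gram model could then embed. This is not a reimplementation of Paper2vec or Cite2vec, and the toy citation data and function names are hypothetical.

```python
"""Sketch: DeepWalk-style random walks over a citation graph.

Assumption: illustrates one generic way of turning citations into sequences
for skip-gram training; it is NOT Paper2vec or Cite2vec as published.
"""
import random
from typing import Dict, List


def undirected(citations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Treat 'A cites B' as a link that a walk may follow in either direction."""
    graph: Dict[str, List[str]] = {}
    for paper, refs in citations.items():
        graph.setdefault(paper, [])
        for ref in refs:
            graph[paper].append(ref)
            graph.setdefault(ref, []).append(paper)
    return graph


def random_walks(
    graph: Dict[str, List[str]],
    walks_per_node: int,
    walk_length: int,
    seed: int = 0,
) -> List[List[str]]:
    """Start walks_per_node walks from every node; stop early at isolated nodes."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Hypothetical citation lists: P1 cites P2 and P3, P2 cites P3, P4 is isolated.
    citations = {"P1": ["P2", "P3"], "P2": ["P3"], "P4": []}
    for walk in random_walks(undirected(citations), walks_per_node=2, walk_length=5):
        print(" ".join(walk))
```

The resulting walks could be passed as sentences to a skip-gram implementation such as gensim's `Word2Vec` to obtain one vector per document ID.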

N. I. Tikhonov

Collections of scientific publications are growing rapidly, and scientists have access to portals containing a large number of documents. Such a large amount of data is difficult to investigate. Document visualization methods are used to reduce the effort of exploring it, find relevant and similar documents, evaluate the scientific contribution of particular publications, and reveal hidden links between documents. These methods can be based on various models of document representation. In recent years, word embedding methods for natural language processing have become extremely popular, and following them, methods for obtaining vector representations of documents in text collections began to appear. Although there are many document analysis systems, new methods can give new insights into collections, perform better on large collections of documents, or find new relationships between documents. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. It briefly describes the methods, reports experiments with them, including visualizations of their results, and describes the problems that arise.
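Once each document has a vector, finding similar documents reduces to a nearest-neighbour search over those vectors. The sketch below assumes cosine similarity as the measure (the abstract does not name one), and the three two-dimensional vectors are made-up placeholders rather than Paper2vec or Cite2vec output.

```python
"""Sketch: find the documents most similar to a query document.

Assumption: cosine similarity over precomputed document vectors; the vectors
used in the example are hypothetical.
"""
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(
    query: str, vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to k (document, similarity) pairs, most similar first."""
    q = vectors[query]
    scored = [(doc, cosine(q, vec)) for doc, vec in vectors.items() if doc != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
    for doc, sim in most_similar("A", vectors, k=2):
        print(f"{doc}: {sim:.3f}")
```

For large collections, a linear scan like this is usually replaced by an approximate nearest-neighbour index, but the similarity measure stays the same.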

Religions ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 817
Maroussia Bednarkiewicz

For more than two centuries, Muslims have been retelling different stories about the origin of their call to prayer. While the converging details of these narratives offer a glimpse of Muslim cultural memory and its preservation, the diverging elements reflect different mechanisms that facilitate the adaptation of this cultural memory to new contexts and concerns. Drawing on the work of Jan Assmann, the present study explores how Muslims conserved and adapted their cultural memory to keep their common identity and expand their diversity according to distinctive religious, political, or personal forms of belonging. The narratives about the origin of the Islamic call to prayer, preserved in various written text collections, offer fertile ground for analyzing how this part of Muslim cultural memory became the vehicle of a permanent but adaptable Muslim identity.

Daniel Maier ◽  
Christian Baden ◽  
Daniela Stoltenberg ◽  
Maya De Vries-Kedem ◽  
Annie Waldherr

2021 ◽  
Vol 11 (1) ◽  
Guanghao You ◽  
Balthasar Bickel ◽  
Moritz M. Daum ◽  
Sabine Stoll

The way infants learn language is a highly complex adaptive behavior. This behavior chiefly relies on the ability to extract information from the speech they hear and combine it with information from the external environment. Most theories assume that this ability critically hinges on the recognition of at least some syntactic structure. Here, we show that child-directed speech allows for semantic inference without relying on explicit structural information. We simulate the process of semantic inference with machine learning applied to large text collections of two different types of speech, child-directed speech versus adult-directed speech. Taking the core meaning of causality as a test case, we find that in child-directed speech causal meaning can be successfully inferred from simple co-occurrences of neighboring words. By contrast, semantic inference in adult-directed speech fundamentally requires additional access to syntactic structure. These results suggest that child-directed speech is ideally shaped for a learner who has not yet mastered syntactic structure.
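The abstract does not name the model used for semantic inference. As an assumption-labelled illustration of what "simple co-occurrences of neighboring words" can mean computationally, the sketch below counts ordered word pairs within a small window and scores them with pointwise mutual information (PMI). PMI here stands in for whatever model the authors actually trained, and the toy sentences are invented.

```python
"""Sketch: score neighboring-word co-occurrences with PMI.

Assumption: PMI over ordered pairs within a fixed window is used purely as an
illustration; it is not the model reported in the study.
"""
import math
from collections import Counter
from typing import Dict, List, Tuple


def neighbor_pmi(
    sentences: List[List[str]], window: int = 1
) -> Dict[Tuple[str, str], float]:
    """PMI (log base 2) for ordered pairs (left, right) at most `window` apart."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pair_counts[(left, tokens[j])] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    pmi: Dict[Tuple[str, str], float] = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


if __name__ == "__main__":
    # Hypothetical mini-corpus of short utterances.
    corpus = [["push", "fall"], ["push", "fall"], ["cat", "sleep"]]
    for pair, score in sorted(neighbor_pmi(corpus).items()):
        print(pair, round(score, 3))
```

PMI inflates scores for rare pairs, so real analyses typically add a minimum-frequency cutoff or smoothing before comparing corpora such as child-directed and adult-directed speech.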
