
Modern Linguistic Technologies: Strategy for Teaching Translation Studies

Author(s):  
Bilous O ◽  
Mishchenko A ◽  
Datska T ◽  
Ivanenko N ◽  
...  

How often students use IT resources is a key factor in acquiring the skills associated with new technologies. Strategies aimed at increasing student autonomy need to be developed and should offer resources that encourage students to use computing tools during class hours. The paper analyses modern linguistic technologies for intelligent language processing, which are necessary for the creation and functioning of highly effective technologies of knowledge operation. Computerization of the information sphere has triggered an extensive search for ways to use natural language mechanisms in automated systems of various types. One result was the creation of controlled languages, based on a set of features that make machine translation more refined. Driven by economic demand, they are not artificial languages like Esperanto but natural languages simplified in vocabulary, grammar and syntax. More than ever, the tasks of modern computational linguistics include creating software for natural language processing, information retrieval in large data sets, and support for technical authors creating professional texts and for users of computer technology, and hence creating new translation tools. Such powerful linguistic resources as text corpora, terminology databases and ontologies may facilitate more efficient use of modern multilingual information technology. Creating and improving all the methods considered will help make the translator's job more efficient. One of the programs, CLAT, does not aim at producing machine translation but allows technical editors to create flawless, sequential professional texts through integrated punctuation and spelling modules. Other programs under consideration are to be implemented in Ukrainian translation departments.
Moreover, the databases considered in the paper make it possible to study the dynamics of the linguistic system and to develop areas of applied research such as terminography, terminology and automated data processing. Effective cooperation of developers, translators and declarative institutes in the creation of innovative linguistic technologies will promote further development of translation and applied linguistics.

2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable and effective, and that continuously learn the new patterns of misinformation. OBJECTIVE: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS: We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. RESULTS: A total of 5026 people downloaded the app during the study window, of whom 1545 were active users. Our study shows that 3.4 times more females than males engaged with the app in Hindi, that the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and that the prudence of the integrated AI chatbot “Satya” increased, thus proving the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL: Not Applicable


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Fridah Katushemererwe ◽  
Andrew Caines ◽  
Paula Buttery

Abstract. This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, we outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11 ◽  
Author(s):  
Wenbin Xu ◽  
Chengbo Yin

With the continuous advancement of technology, the amount of information and knowledge disseminated on the Internet every day has multiplied several times. At the same time, a large amount of bilingual data has also been produced in the real world. These data are undoubtedly a great asset for statistical machine translation research. First, two strategies for screening a corpus by sentence-pair quality are proposed: one based on the sentence-pair length ratio and one based on word alignment information. The innovation of these two methods is that no additional linguistic resources, such as a bilingual dictionary or a syntactic analyzer, are needed as auxiliaries. No manual intervention is required, poor-quality sentence pairs are selected automatically, and the methods can be applied to any language pair. Secondly, a domain adaptive method based on massive corpus is proposed. This method utilizes a massive corpus mechanism to carry out multidomain automatic model migration. In this method, each domain learns its intradomain model independently, and different domains share the same general model. Through the method of massive corpus, these models can be combined and adjusted to make the model learning more accurate. Finally, the adaptive method of massive corpus filtering and statistical machine translation based on a cloud platform is verified. Experiments show that both methods have good effects and can effectively improve the quality of statistical machine translation.
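
The length-ratio screening mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_by_length_ratio`, the whitespace tokenization and the threshold `max_ratio=2.0` are all assumptions chosen for illustration, and the abstract does not state which values or tokenizer the paper uses.

```python
def filter_by_length_ratio(pairs, max_ratio=2.0):
    """Keep sentence pairs whose token-length ratio is at most max_ratio.

    pairs: iterable of (source, target) strings.
    max_ratio: hypothetical threshold; the paper's value is not given here.
    """
    kept = []
    for src, tgt in pairs:
        # Whitespace tokenization is an assumption made for this sketch.
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        # Pairs with an empty side cannot be aligned, so drop them.
        if src_len == 0 or tgt_len == 0:
            continue
        # Ratio of the longer side to the shorter side, always >= 1.
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio <= max_ratio:
            kept.append((src, tgt))
    return kept


pairs = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("hello", "ceci est une phrase beaucoup trop longue pour ce mot"),
    ("", "phrase sans source"),
]
clean = filter_by_length_ratio(pairs)
```

Only the first pair survives in this example: the second has a length ratio of 10, and the third has an empty source side. A filter of this kind needs no dictionary or parser, which matches the abstract's claim that the screening works without additional linguistic resources.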


2019 ◽  
Author(s):  
Negacy D. Hailu ◽  
Michael Bada ◽  
Asmelash Teka Hadgu ◽  
Lawrence E. Hunter

Abstract. Background: The automated identification of mentions of ontological concepts in natural language texts is a central task in biomedical information extraction. Despite more than a decade of effort, performance in this task remains below the level necessary for many applications. Results: Recently, applications of deep learning in natural language processing have demonstrated striking improvements over previously state-of-the-art performance in many related natural language processing tasks. Here we demonstrate similarly striking performance improvements in recognizing biomedical ontology concepts in full text journal articles using deep learning techniques originally developed for machine translation. For example, our best performing system improves the performance of the previous state-of-the-art in recognizing terms in the Gene Ontology Biological Process hierarchy, from a previous best F1 score of 0.40 to an F1 of 0.70, nearly halving the error rate. Nearly all other ontologies show similar performance improvements. Conclusions: A two-stage concept recognition system, which is a conditional random field model for span detection followed by a deep neural sequence model for normalization, improves the state-of-the-art performance for biomedical concept recognition. Treating the biomedical concept normalization task as a sequence-to-sequence mapping task similar to neural machine translation improves performance.


2017 ◽  
Vol 68 (2) ◽  
pp. 169-178
Author(s):  
Leonid Iomdin

Abstract Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.

