
Introduction to the Special Issue of Recent Advances in Computational Linguistics for Asian Languages

Published: 14 April 2023

Editorial Notes

The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected Version of Record was published on May 05, 2023. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this citation page.
Asia is one of the largest and most populous continents, home to about 60% of the world's population, and has a great diversity of religions, ethnicities, and societies that shape its culture. In recent years, communication between Asian countries has increased dramatically, strengthening economic and social ties between nations. As socio-economic relations between Asian countries become more vital, their interconnection reinforces the need for a medium that facilitates interaction between people who speak different Asian languages. The linguistic barriers between Asian countries remain a major challenge to overcome. Recent advances in computational linguistics encourage Asian countries to promote the socio-economic growth of their nations by enabling communication between Asians from different regions who speak different languages. Technological advances, including cloud computing, Big Data, artificial intelligence (AI), machine learning (ML), and deep learning, call on computational linguistics to understand the structure of human language and its use in social settings. With such technological advances and the availability of enormous linguistic datasets, computational linguistics is attracting considerable attention from researchers. Since AI strongly influences computational linguistics (CL) through computational methods that understand, learn, and generate structured content, its application to Asian languages can reduce the linguistic disadvantages of the Asian population. Computational linguistics supports both human-human and human-machine interaction through machine translation systems and conversational agents. The integration of computational linguistics with the latest intelligent techniques benefits both humans and machines in analyzing and learning from large amounts of linguistic data, enabling smooth interaction.

In This Special Issue

This special issue addresses recent advances in computational linguistics for Asian languages, exploring research methods and mathematical analyses of various aspects of computational linguistics and natural language processing, including morphology, machine translation, computational resources, grammar, syntax, and semantics. Original, technology-oriented research articles involving in-depth analysis of linguistic data for machine translation, automatic speech recognition, and text-to-speech, based on the structural content of Asian languages, are considered in this special issue. The collected contributions can be briefly summarized as follows.
For the article: An Effective Learning Evaluation Method Based on Text Data with Real-time Attribution - A Case Study for Mathematical Class with Students of Junior Middle School in China, the authors first employed perception technology to extract learning text data based on time and operation attributes. Building on these real-time attributes of text data, a learning evaluation method based on real-time text data is proposed. Finally, the proposed method is compared with the traditional evaluation method. The results show that using text data with real-time attributes is more effective in measuring student learning.
For the article: Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs, the authors proposed a new unsupervised similarity computation and dynamic selection metric to obtain parallel sentence pairs in an unsupervised setting. The developed method first maps bilingual word embeddings (BWE) via adversarial training, which rotates the source space to match the target space without parallel data. Then, a new cross-domain similarity adaptation is introduced to obtain parallel sentence pairs. Experimental results on real datasets show that the developed model achieves better accuracy and recall in obtaining parallel sentence pairs.
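As an illustration of how parallel pairs can be mined once the two embedding spaces are aligned, the sketch below keeps mutual nearest neighbours under plain cosine similarity. The function names, the threshold value, and the use of raw cosine (rather than the authors' cross-domain similarity adaptation) are assumptions for illustration only, not the paper's exact metric:

```python
from math import sqrt

def cosine(u, v):
    # Plain cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def select_parallel_pairs(src, tgt, threshold=0.5):
    """Keep (source, target) sentence pairs that are mutual nearest
    neighbours in the shared embedding space and score above a
    confidence threshold."""
    pairs = []
    for i, s in enumerate(src):
        sims = [cosine(s, t) for t in tgt]
        j = max(range(len(tgt)), key=sims.__getitem__)
        back = [cosine(s2, tgt[j]) for s2 in src]
        # The mutual-nearest-neighbour check filters one-directional matches,
        # a common safeguard against hub sentences in unsupervised mining.
        if max(range(len(src)), key=back.__getitem__) == i and sims[j] >= threshold:
            pairs.append((i, j, sims[j]))
    return pairs
```

On toy embeddings where source sentence 0 is close to target sentence 0 and source 1 to target 1, the function returns exactly those two pairs.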
For the article: Effective College English Teaching based on Teacher-Student Interactive Model, the authors developed a model of the hypothesized relationships between college students' perceptions of English learning, their perceptions of the learning-space environment, and their approaches to learning. Students are studied using the Pre-trained Teacher-Student Fixed Interactive Model (PTSFIM). This model proposes a new way of developing the teaching process, which forms the basis for strategic performance monitoring in an institute. In addition, reciprocal instructional analysis optimizes student models over longer durations. The analysis of the results suggests that interactive learning can help students with different expected outcomes to participate in the speech system and gain knowledge most effectively.
For the article: An Intelligent Telugu Handwritten Character Recognition using Multi-Objective Mayfly Optimization with Deep Learning Based DenseNet Model, the authors presented an intelligent Telugu character recognition model based on multi-objective mayfly optimization with deep learning (MOMFO-DL). The proposed MOMFO-DL technique incorporates the DenseNet-169 model as a feature extractor to generate a useful set of feature vectors, and a Functional Link Neural Network (FLNN) is used as the classification model to recognize and classify the characters. The MOMFO technique helps tune the parameters optimally so that the overall performance is improved. The experimental results show the superiority of the MOMFO technique over recent state-of-the-art methods.
For the article: Towards Explainable Dialogue System using Two-Stage Response Generation, the authors proposed a Two-Stage Dialogue Response Generation (TSRG) model that generates diverse and informative responses through an interpretable procedure between stages. TSRG first generates a candidate response and then instantiates it as the final response. Positional information and a resident token are injected into the candidate response to stabilize training and mitigate the shortcomings of the multi-stage framework. Moreover, TSRG enables the adjustment and interpretation of the interaction pattern between the two generation stages, making the generated responses reasonably explainable and controllable. The experimental results show that TSRG produces more diverse and informative responses while remaining fluent and relevant compared to previous multi-stage dialogue generation models.
For the article: Research and Implementation of Automatic Indexing Method of PDF for Digital Publishing, the authors proposed a PDF automatic indexing scheme that identifies all the element information in a PDF, outputs structured data automatically, and then extracts the key information to generate a keyword library with tag weights. The scheme involves two key technical points: parsing PDF based on text features and grammar rules, and extracting keywords based on tag weights. The former visualizes each text block in the PDF as a rectangular area, divides the elements with a clustering algorithm, and finally outputs structured data containing all the information. The latter combines the tags and their weights in the structured data and extracts keywords using an inter-word relation algorithm. The structured data and keyword database produced by this scheme can be used to produce intelligent e-books and build knowledge graphs, helping publishing enterprises transform from content service providers into intelligent knowledge service providers. This transformation can deeply mine the core value of the content held by the publishing industry and advance the digitization and intelligentization of the whole industry.
For the article: Context-Aware Urdu Information Retrieval System, the authors proposed an Urdu-language information retrieval system that uses techniques from the architecture of a semantic web search engine to efficiently retrieve relevant information and address the problem of word sense ambiguity (WSA). For single-word queries, the proposed system has an average accuracy of 96%, compared to an average accuracy of 74% for Bing and 75% for Google. For long text queries, the developed system achieves an accuracy of 92%, outperforming well-known search engines such as Bing and Google, which achieve 16.50% and 16%, respectively. Also, for single-word queries, the retrieval rate of the proposed system is 32.25%, compared to 25% for Bing and 25% for Google. The retrieval rate results for long text queries are also better: 6.38%, compared to 6.20% for Bing and 4.8% for Google. These results show that the proposed system outperforms existing systems for the Urdu language.
For the article: Research on Chinese Audio and Text Alignment Algorithm Based on AIC-FCM and Doc2Vec, the authors proposed an audio-text matching algorithm using deep learning and neural network technology to improve the efficiency and quality of audiobook production. The algorithm first uses dual-threshold endpoint detection to segment long audio into sentence-level short audio and recognize it as short text. The threshold is calculated by Akaike Information Criterion-Fuzzy C-Means (AIC-FCM) and optimized with a simulated annealing genetic algorithm. The algorithm then uses Doc2vec, optimized by a threshold prediction method based on the average length of the short text, to calculate text similarity. Finally, the text sequences and audio segments are checked and output, matched in the time dimension to meet the requirements of audiobook production. Experiments show that, compared to conventional audio-text alignment algorithms, the proposed algorithm comes closer to the ideal segmentation result when segmenting long audio. The alignment effect is essentially the same as that of Doc2vec, while the time complexity is reduced by about 35%.
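The dual-threshold endpoint detection step can be sketched generically as follows. The per-frame energy input and the fixed `high`/`low` values here are illustrative assumptions; the paper instead derives its thresholds via AIC-FCM with simulated annealing optimization:

```python
def detect_segments(energy, high, low):
    """Classic dual-threshold endpoint detection over a per-frame energy
    sequence: a segment is triggered wherever energy exceeds the high
    threshold, then extended outward while it stays above the low one."""
    segments, i, n = [], 0, len(energy)
    while i < n:
        if energy[i] >= high:
            start = i
            # Extend backwards over the low-threshold onset region.
            while start > 0 and energy[start - 1] >= low:
                start -= 1
            end = i
            # Extend forwards until energy falls below the low threshold.
            while end + 1 < n and energy[end + 1] >= low:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments
```

Using two thresholds avoids clipping the quiet beginnings and endings of utterances that a single high threshold would miss.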
For the article: A Weak-Region Enhanced Bayesian Classification for Spam Content-Based Filtering, the authors proposed an improved Bayesian scheme that focuses on the region where a Bayesian classifier may fail to assign labels correctly and improves classification performance by addressing these errors. In a spam detection problem, the prediction of the Bayesian classifier can be expected to be weak if the probabilities determined for the spam and non-spam classes are close to each other. Therefore, the authors defined a threshold to distinguish weak predictions from strong ones. A hybrid strategy using a two-layer Bayesian approach is presented: Basic Bayesian (BBayes) and Corrected Weak Region Bayesian (CWRBayes), which deal with strong and weak predictions, respectively. Both techniques have the same classification mechanism but use different feature selection mechanisms. The results show that the proposed method performs better than the naïve Bayes baseline and several other Bayesian variants.
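A minimal sketch of the weak-region routing idea, assuming a simple probability-margin threshold; the `corrector` callback stands in for the CWRBayes second layer and is a hypothetical placeholder, not the paper's implementation:

```python
def two_layer_predict(p_spam, weak_margin=0.2, corrector=None):
    """Two-layer Bayesian decision: trust the basic classifier (BBayes)
    when the spam/non-spam probabilities are far apart; otherwise the
    prediction falls in the weak region and is deferred to a corrective
    second-layer classifier (CWRBayes)."""
    p_ham = 1.0 - p_spam
    if abs(p_spam - p_ham) >= weak_margin:
        # Strong prediction: the basic classifier decides directly.
        return "spam" if p_spam > p_ham else "ham"
    if corrector is not None:
        # Weak prediction: defer to the corrected weak-region model.
        return corrector()
    return "spam" if p_spam > p_ham else "ham"
```

The margin threshold is the key design knob: widening it routes more borderline messages to the corrective layer at the cost of extra computation.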
For the article: Integrating Heterogeneous Ontologies in Asian Languages Through Compact Genetic Algorithm with Annealing Re-sample Inheritance Mechanism, the authors proposed a Compact Genetic Algorithm with an Annealing Re-sample Inheritance mechanism (CGA-ARI) to efficiently solve the cross-lingual ontology matching (COM) problem. Specifically, a Cross-lingual Similarity Metric (CSM) is presented to distinguish two cross-lingual entities, a discrete optimization model is established to define the COM problem, and the compact encoding and annealing re-sample inheritance mechanisms are introduced to improve the search performance of the CGA. Experimental results show that CGA-ARI significantly improves on the performance of GA and CGA and identifies better matches than state-of-the-art ontology matching systems.
For the article: Deep Learning in Computational Linguistics for Chinese Language Translation, the authors employed a Bidirectional Long Short-Term Memory (BiLSTM) network to extract Chinese text features, addressing the overlapping semantic roles in Chinese language translation and the hard-to-converge training of high-dimensional text word vectors in text classification during translation. Moreover, AlexNet is optimized to extract local features of the text while updating and learning the network parameters in the deep network. An attention mechanism is then introduced to develop a prediction algorithm for Chinese language translation based on BiLSTM and the improved AlexNet. Finally, the prediction algorithm is simulated to validate its performance, with several state-of-the-art algorithms selected for comparison, including Long Short-Term Memory, Regions with Convolutional Neural Network features, AlexNet, and Support Vector Machine. The results show that the prediction algorithm is significantly superior to the other algorithms and can improve the performance of machine translation, improving translation quality while ensuring a high recognition rate. This provides experimental references for the later intelligent development of Chinese language translation in CL.
For the article: A Decision Model for Ranking Asian Higher Education Institutes using an NLP-based Text Analysis Approach, the authors proposed a Natural Language Processing (NLP)-based decision model for identifying the best higher education institution using Multiple Criteria Decision Making (MCDM) methods. The existing decision models for selecting the best higher education institutions consider a limited number of criteria for decision making. In this proposed model, 17 criteria and 15 institution datasets were identified for developing the decision model through extensive research and expert opinions. The NLP-based text analysis approach is applied to extract the relevant information and convert it into a suitable format. Since the relative importance of the criteria plays a crucial role in decision making, the CRITIC and Rank centroid methods are used to calculate the relative weights of the criteria. The TOPSIS method is used to rank the alternatives for each criterion. An objective function is defined to calculate the evaluation scores and select the best institute for higher education. It was found that the ranks obtained by the developed model match fairly well with the ranks obtained by other MCDM methods and the experts.
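The TOPSIS ranking step is a standard MCDM procedure and can be sketched as follows. The toy matrix, equal weights, and all-benefit criteria in the example are illustrative assumptions; the paper derives criterion weights via CRITIC and rank centroid rather than supplying them by hand:

```python
from math import sqrt

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS: vector-normalize the decision
    matrix, apply criterion weights, then score each alternative by its
    relative closeness to the ideal solution (higher is better).
    `benefit[j]` is True for criteria to maximize, False for costs."""
    m, n = len(matrix), len(matrix[0])
    norms = [sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = sqrt(sum((x - b) ** 2 for x, b in zip(row, ideal)))
        d_neg = sqrt(sum((x - w) ** 2 for x, w in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

An alternative that dominates on every criterion scores 1.0, one dominated on every criterion scores 0.0, and the rest fall in between, giving the ranking over institutions.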
For the article: Domain-Invariant Feature Progressive Distillation with Adversarial Adaptive Augmentation for Low-Resource Cross-Domain NER, the authors presented an adversarial adaptive augmentation, integrating an adversarial strategy into a multitask learner to augment and qualify domain-adaptive data. The domain-invariant features of the adaptive data are extracted to bridge the cross-domain gap while mitigating the label-sparsity problem. Another important component of the paper is progressive domain-invariant feature distillation: a multi-grained Maximum Mean Discrepancy (MMD) approach within the framework extracts domain-invariant features at multiple levels and enables knowledge transfer across domains through the adversarial adaptive data. The advanced Knowledge Distillation (KD) scheme enables stepwise domain adaptation through powerful pre-trained language models and the multi-level domain-invariant features. Extensive comparison experiments on four English and two Chinese benchmarks demonstrate the importance of adversarial augmentation and of effective adaptation from resource-rich to resource-poor target domains. The comparison with two vanilla and four current baselines shows that the system achieves state-of-the-art performance in both zero- and minimal-resource scenarios.
Jerry Chun-Wei Lin
Western Norway University of Applied Sciences, Bergen, Norway
Vicente García Díaz
University of Oviedo, Spain
Juan Antonio Morente Molinera
University of Granada, Spain

Supplementary Material

3588316-vor (3588316-vor.pdf)
Version of Record for "Introduction to the Special Issue of Recent Advances in Computational Linguistics for Asian Languages" by Lin et al., ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, No. 3 (TALLIP 22:3).

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 3
March 2023, 570 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3579816

Publisher

Association for Computing Machinery

New York, NY, United States
