Abstract
Cross-lingual question understanding involves identifying named entities and question intent in a target language by transferring knowledge from source-language training data. However, relying solely on bilingual parallel corpora has clear limitations, especially for low-resource languages where such corpora are scarce or unavailable. This paper argues that current cross-lingual techniques also underexploit phrase-level information, particularly noun phrases and interrogative phrases. To address this, a new phrase-based code-switching data augmentation method, PBCS, is introduced for zero-shot cross-lingual training. Unlike recent methods, it requires only small bilingual phrase dictionaries rather than a large bilingual parallel corpus. In addition, a cross-lingual question understanding model, XQUM, is proposed. At the lower layers, the model shares input features and hidden states to mitigate error accumulation; at the top, a bi-directional correlation layer built on an iterative mechanism, tailored to the task, further enhances performance. Experimental results on the MQUC and MTOD datasets demonstrate that XQUM significantly improves accuracy on cross-lingual question understanding tasks.
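To make the augmentation idea concrete, below is a minimal Python sketch of phrase-based code-switching in the spirit of PBCS: source-language phrases found in a small bilingual dictionary are probabilistically swapped for their target-language translations. The dictionary entries, placeholder target tokens, and switch probability here are illustrative assumptions, not the authors' released resources or exact procedure.

```python
import random

# Toy bilingual phrase dictionary: source-language phrases mapped to
# target-language translations. All entries are invented placeholders;
# PBCS relies on small dictionaries of noun phrases and interrogative
# phrases rather than a large bilingual parallel corpus.
PHRASE_DICT = {
    "what time": "<TGT:what-time>",
    "the nearest hospital": "<TGT:nearest-hospital>",
}

def code_switch(question: str, switch_prob: float = 0.7) -> str:
    """Return a code-switched copy of `question`: each dictionary phrase
    is replaced by its target-language translation with probability
    `switch_prob`, yielding augmented data for zero-shot training."""
    out = question.lower()
    for src_phrase, tgt_phrase in PHRASE_DICT.items():
        if src_phrase in out and random.random() < switch_prob:
            out = out.replace(src_phrase, tgt_phrase)
    return out

print(code_switch("What time does the nearest hospital open?"))
# Possible output: "<TGT:what-time> does <TGT:nearest-hospital> open?"
```

Because only phrase dictionaries are consulted, this style of augmentation remains feasible for low-resource language pairs where no sizable parallel corpus exists.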
Funding
This work was supported by the National Natural Science Foundation of China (No. 62062062).
Ethics declarations
Conflicts of interest
All authors declare that they have no competing interests.
Ethical standard
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
About this article
Cite this article
Haisa, G., Altenbek, G. & Li, W. Phrase based code-switching for cross-lingual question understanding. Multimed Tools Appl 83, 32159–32175 (2024). https://doi.org/10.1007/s11042-023-16909-2