MizBERT: A Mizo BERT Model

Published: 26 June 2024

Abstract

This research investigates the use of pre-trained BERT transformers for the Mizo language. BERT (Bidirectional Encoder Representations from Transformers) is Google's transformer-based neural network approach to Natural Language Processing (NLP), renowned for its strong performance across a wide range of NLP tasks; its effectiveness for low-resource languages such as Mizo, however, remains largely unexplored. In this study, we introduce MizBERT, a dedicated Mizo language model. Through extensive pre-training on a corpus collected from diverse online platforms, MizBERT is tailored to the nuances of the Mizo language. We evaluate MizBERT on two intrinsic metrics, masked language modeling (MLM) accuracy and perplexity, obtaining scores of 76.12% and 3.2565, respectively. We also examine its performance on a downstream text classification task, where MizBERT outperforms both the multilingual BERT (mBERT) model and a Support Vector Machine (SVM) baseline, achieving an accuracy of 98.92%. These results underscore MizBERT's ability to capture and process the intricacies of the Mizo language.
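
For readers who want to probe such a model, the sketch below shows how a pre-trained Mizo masked language model could be queried with the Hugging Face Transformers library. This is an illustrative sketch only: the checkpoint path, the example Mizo sentence, and the masked word are placeholder assumptions, not artifacts released with this article.

```python
# Minimal sketch (not the authors' released code): querying a Mizo masked
# language model via Hugging Face Transformers. MODEL_ID is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "path/to/mizbert-checkpoint"  # assumed local path or hub ID, not the official name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

# Mask one word in an illustrative Mizo sentence and let the model fill it in.
sentence = f"Mizoram chu ram {tokenizer.mask_token} tak a ni."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and report the top-5 candidate tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5 = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```

The downstream text classification experiment would follow the same pattern, loading the checkpoint with AutoModelForSequenceClassification and fine-tuning it on labeled Mizo text.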

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 7
July 2024, 254 pages
EISSN: 2375-4702
DOI: 10.1145/3613605
Editor: Imed Zitouni

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 26 June 2024
Online AM: 25 May 2024
Accepted: 21 May 2024
Revised: 13 May 2024
Received: 09 October 2023
Published in TALLIP Volume 23, Issue 7

Author Tags

  1. Mizo
  2. BERT
  3. pre-trained language model

Qualifiers

  • Research-article
