Abstract
For low-resource languages such as Tibetan, pre-trained language models (PLMs) are severely limited in both quantity and performance, so it is crucial to explore how to make the best use of the few PLMs that are available. In this paper, taking the Tibetan news title classification task provided by the Tibetan News Classification Corpus (TNCC) as the downstream task, we conducted optimization experiments on multiple Tibetan PLMs using prompt learning and LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning technique. We conducted two types of experiments: full-shot and few-shot. The full-shot experiments demonstrate that prompt learning improves the classification performance of the PLMs, while LoRA achieves a substantial reduction in the number of trainable parameters with only minor performance degradation. Notably, the few-shot experiments show that prompt learning significantly enhances the classification performance of the PLMs in low-resource scenarios.
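To make the two techniques concrete, the sketch below illustrates how a cloze-style prompt with a verbalizer and LoRA adapters could be combined on a BERT/RoBERTa-style masked-LM Tibetan PLM for news title classification, using the Hugging Face transformers and peft libraries. This is an illustrative sketch under stated assumptions, not the authors' released implementation: the checkpoint path, the English template, the label words, and the LoRA hyperparameters are placeholders (a real setup would use a Tibetan PLM, a Tibetan template, and Tibetan label words, one per TNCC class).

    # Illustrative sketch: cloze-style prompt + LoRA on a masked-LM Tibetan PLM.
    # MODEL_NAME, the template, and LABEL_WORDS are placeholders, not the paper's settings.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    MODEL_NAME = "path/to/tibetan-plm"  # placeholder for a BERT/RoBERTa-style Tibetan PLM
    # Verbalizer words (only a few illustrative classes shown; in practice these would be
    # Tibetan words that tokenize to a single piece in the PLM's vocabulary).
    LABEL_WORDS = ["politics", "economy", "education", "tourism"]

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    backbone = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

    # LoRA: freeze the backbone and inject trainable low-rank matrices into the
    # attention projections, so only a small fraction of parameters is updated.
    lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                          target_modules=["query", "value"])
    model = get_peft_model(backbone, lora_cfg)
    model.print_trainable_parameters()  # typically well under 1% of the full model

    # Token ids of the label words (assumed to be single tokens here).
    label_word_ids = torch.tensor(
        [tokenizer.convert_tokens_to_ids(w) for w in LABEL_WORDS])

    def classify(title: str) -> int:
        """Wrap the title in a cloze template and score label words at the [MASK] slot."""
        prompt = f"{title} This news is about {tokenizer.mask_token}."  # Tibetan in practice
        enc = tokenizer(prompt, return_tensors="pt", truncation=True)
        mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            vocab_logits = model(**enc).logits[0, mask_pos]  # logits over the vocabulary
        return int(vocab_logits[label_word_ids].argmax())    # index of the predicted class

During training, a cross-entropy loss over vocab_logits[label_word_ids] would be back-propagated so that gradients flow only into the LoRA matrices while the rest of the PLM stays frozen, which is what keeps the number of trainable parameters small.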
Acknowledgements
This work was supported by the National Science and Technology Major Project (No. 2022ZD0116100), the Key Project of Natural Science Foundation of Tibet Autonomous Region (No. XZ202401ZR0040) in 2024, and the Graduate “High-level Talents Cultivation Program” Project of Tibet University (No. 2022-GSP-S102).
Cite this paper
Zhou, M., Daiqing, Z., Qun, N., Nyima, T. (2024). Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2024. Lecture Notes in Computer Science, vol. 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_28