Abstract
For low-resource languages such as Tibetan, pre-trained language models (PLMs) are severely limited in both quantity and performance, so it is crucial to explore how to make the best use of the few PLMs that are available. In this paper, taking the Tibetan news title classification task provided by the Tibetan News Classification Corpus (TNCC) as the downstream task, we conducted optimization experiments on multiple Tibetan PLMs using prompt learning and LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning technique. We conducted two types of experiments: full-shot and few-shot. The full-shot experiments demonstrate that prompt learning improves the classification performance of the PLMs, while LoRA achieves a substantial reduction in the number of trainable parameters with only minor performance degradation. Notably, the few-shot experiments show that prompt learning significantly enhances the classification performance of the PLMs in low-resource scenarios.
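To make the two techniques concrete, the sketch below illustrates how a cloze-style prompt with a verbalizer and LoRA adapters could be combined on a BERT/RoBERTa-style masked-LM Tibetan PLM for news title classification, using the Hugging Face transformers and peft libraries. This is an illustrative sketch under stated assumptions, not the authors' released implementation: the checkpoint path, the English template, the label words, and the LoRA hyperparameters are placeholders (a real setup would use a Tibetan PLM, a Tibetan template, and Tibetan label words, one per TNCC class).

    # Illustrative sketch: cloze-style prompt + LoRA on a masked-LM Tibetan PLM.
    # MODEL_NAME, the template, and LABEL_WORDS are placeholders, not the paper's settings.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    MODEL_NAME = "path/to/tibetan-plm"  # placeholder for a BERT/RoBERTa-style Tibetan PLM
    # Verbalizer words (only a few illustrative classes shown; in practice these would be
    # Tibetan words that tokenize to a single piece in the PLM's vocabulary).
    LABEL_WORDS = ["politics", "economy", "education", "tourism"]

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    backbone = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

    # LoRA: freeze the backbone and inject trainable low-rank matrices into the
    # attention projections, so only a small fraction of parameters is updated.
    lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                          target_modules=["query", "value"])
    model = get_peft_model(backbone, lora_cfg)
    model.print_trainable_parameters()  # typically well under 1% of the full model

    # Token ids of the label words (assumed to be single tokens here).
    label_word_ids = torch.tensor(
        [tokenizer.convert_tokens_to_ids(w) for w in LABEL_WORDS])

    def classify(title: str) -> int:
        """Wrap the title in a cloze template and score label words at the [MASK] slot."""
        prompt = f"{title} This news is about {tokenizer.mask_token}."  # Tibetan in practice
        enc = tokenizer(prompt, return_tensors="pt", truncation=True)
        mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            vocab_logits = model(**enc).logits[0, mask_pos]  # logits over the vocabulary
        return int(vocab_logits[label_word_ids].argmax())    # index of the predicted class

During training, a cross-entropy loss over vocab_logits[label_word_ids] would be back-propagated so that gradients flow only into the LoRA matrices while the rest of the PLM stays frozen, which is what keeps the number of trainable parameters small.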
Acknowledgements
This work was supported by the National Science and Technology Major Project (No. 2022ZD0116100), the Key Project of Natural Science Foundation of Tibet Autonomous Region (No. XZ202401ZR0040) in 2024, and the Graduate “High-level Talents Cultivation Program” Project of Tibet University (No. 2022-GSP-S102).
Cite this paper
Zhou, M., Daiqing, Z., Qun, N., Nyima, T. (2024). Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2024. Lecture Notes in Computer Science, vol. 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_28