
Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15022)


Abstract

For low-resource languages like Tibetan, the availability of pre-trained language models (PLMs) is severely limited in both quantity and performance, so it is crucial to explore how to optimize these limited PLMs. In this paper, using the downstream Tibetan news title classification task provided by the Tibetan News Classification Corpus (TNCC), we conducted optimization experiments on multiple Tibetan PLMs with two efficient fine-tuning techniques: prompt learning and LoRA (Low-Rank Adaptation). We carried out two types of experiments: full-shot, and few-shot for prompt learning. The full-shot experiments demonstrate that prompt learning improves the classification performance of PLMs, while LoRA greatly reduces the number of trainable parameters with only minor performance degradation. Notably, the few-shot experiments show that prompt learning significantly enhances the classification performance of PLMs in low-resource scenarios.
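As a rough illustration of the LoRA setup described above (a minimal sketch, not the authors' code), the following Python snippet attaches LoRA adapters to a CINO-style Tibetan PLM for sequence classification. It assumes the HuggingFace transformers and peft libraries and the hfl/cino-base-v2 checkpoint; the number of TNCC categories is an assumption and should be adjusted to the dataset.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

MODEL_NAME = "hfl/cino-base-v2"   # assumed CINO checkpoint; any Tibetan PLM works similarly
NUM_LABELS = 12                   # assumed number of TNCC news-title categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# LoRA freezes the base model and learns low-rank updates on selected weight matrices,
# which is where the large reduction in trainable parameters comes from.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor applied to the update
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in XLM-R/CINO layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # typically well under 1% of all parameters

Training then proceeds with a standard classification loop (e.g. the transformers Trainer) over the TNCC title data; only the LoRA matrices and the classification head are updated.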


Notes

  1. https://github.com/iflytek/cino


Acknowledgements

This work was supported by the National Science and Technology Major Project (No. 2022ZD0116100), the Key Project of Natural Science Foundation of Tibet Autonomous Region (No. XZ202401ZR0040) in 2024, and the Graduate “High-level Talents Cultivation Program” Project of Tibet University (No. 2022-GSP-S102).

Author information


Correspondence to Nuo Qun.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, M., Daiqing, Z., Qun, N., Nyima, T. (2024). Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15022. Springer, Cham. https://doi.org/10.1007/978-3-031-72350-6_28


  • DOI: https://doi.org/10.1007/978-3-031-72350-6_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72349-0

  • Online ISBN: 978-3-031-72350-6

  • eBook Packages: Computer Science (R0)
