Natural Language Processing for Tulu: Challenges, Review and Future Scope

  • Conference paper
Speech and Language Technologies for Low-Resource Languages (SPELLL 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2046)

Abstract

This paper provides a comprehensive analysis of publicly available research to date on Natural Language Processing (NLP) for Tulu, covering its development, challenges, and future scope. Tulu is a low-resource Dravidian language with more than 2.5 million speakers. Existing NLP work for Tulu includes code-mixed corpus generation, optical character recognition of historical manuscripts, machine translation, sentiment analysis, speech recognition, and morphological analysis. However, data scarcity, morphological complexity, and code-mixing pose challenges for NLP practitioners, and more research and innovation are needed. Future work in NLP for Tulu involves expanding code-mixed corpora, improving machine translation and speech recognition, applying cross-lingual transfer learning, developing specialized named entity recognition, and fostering interdisciplinary collaborations. Unlocking the potential of Tulu, a language with a rich cultural heritage, requires addressing these challenges and embracing future opportunities to enhance linguistic diversity and the accessibility of NLP technologies.

Author information

Corresponding author

Correspondence to Poorvi Shetty.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shetty, P. (2024). Natural Language Processing for Tulu: Challenges, Review and Future Scope. In: Chakravarthi, B.R., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2023. Communications in Computer and Information Science, vol 2046. Springer, Cham. https://doi.org/10.1007/978-3-031-58495-4_7

  • DOI: https://doi.org/10.1007/978-3-031-58495-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58494-7

  • Online ISBN: 978-3-031-58495-4

  • eBook Packages: Computer Science, Computer Science (R0)
