Natural Language Processing for Tulu: Challenges, Review and Future Scope

Shetty, Poorvi

doi:10.1007/978-3-031-58495-4_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2046)

Included in the following conference series:

International Conference on Speech and Language Technologies for Low-resource Languages

Abstract

This paper provides a comprehensive analysis of the publicly available research to date on Natural Language Processing (NLP) for Tulu, covering its development, challenges, and future scope. Tulu is a low-resource Dravidian language with more than 2.5 million speakers. Existing NLP work on Tulu includes code-mixed corpus generation, optical character recognition of historical manuscripts, machine translation, sentiment analysis, speech recognition, and morphological analysis. However, data scarcity, morphological complexity, and code-mixing pose challenges for NLP practitioners, and more research and innovation are needed. Future work in NLP for Tulu involves expanding code-mixed corpora, improving machine translation and speech recognition, applying cross-lingual transfer learning, developing specialized named entity recognition, and fostering interdisciplinary collaborations. Unlocking the potential of Tulu, a language with a rich cultural heritage, requires addressing these challenges and embracing future opportunities to enhance linguistic diversity and the accessibility of NLP technologies.

Author information

Authors and Affiliations

JSS Science and Technology University, Mysuru, India
Poorvi Shetty

Corresponding author

Correspondence to Poorvi Shetty.

Editor information

Editors and Affiliations

National University of Ireland, Galway, Ireland
Bharathi Raja Chakravarthi
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Tamil Nadu, India
Bharathi B
University of Jaén, Jaén, Jaén, Spain
Miguel Ángel García Cumbreras
University of Jaén, Jaén, Jaén, Spain
Salud María Jiménez Zafra
Kongu Engineering College, Erode, Tamil Nadu, India
Malliga Subramanian
Kongu Engineering College, Erode, Tamil Nadu, India
Kogilavani Shanmugavadivel
Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, Abu Dhabi, United Arab Emirates
Preslav Nakov

About this paper

Cite this paper

Shetty, P. (2024). Natural Language Processing for Tulu: Challenges, Review and Future Scope. In: Chakravarthi, B.R., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2023. Communications in Computer and Information Science, vol 2046. Springer, Cham. https://doi.org/10.1007/978-3-031-58495-4_7

DOI: https://doi.org/10.1007/978-3-031-58495-4_7
Published: 24 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58494-7
Online ISBN: 978-3-031-58495-4
eBook Packages: Computer Science, Computer Science (R0)
