Abstract
This paper provides a comprehensive analysis of the publicly available research to date on Natural Language Processing (NLP) for Tulu, covering its development, challenges, and future scope. Tulu is a low-resource Dravidian language with more than 2.5 million speakers. Existing NLP work for Tulu includes code-mixed corpus generation, optical character recognition of historical manuscripts, machine translation, sentiment analysis, speech recognition, and morphological analysis. However, data scarcity, morphological complexity, and code-mixing pose challenges for NLP practitioners, and further research and innovation are needed. Future directions include expanding code-mixed corpora, improving machine translation and speech recognition, applying cross-lingual transfer learning, developing specialized named entity recognition, and fostering interdisciplinary collaboration. Unlocking the potential of Tulu, a language with a rich cultural heritage, requires addressing these challenges and embracing future opportunities to enhance linguistic diversity and the accessibility of NLP technologies.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shetty, P. (2024). Natural Language Processing for Tulu: Challenges, Review and Future Scope. In: Chakravarthi, B.R., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2023. Communications in Computer and Information Science, vol 2046. Springer, Cham. https://doi.org/10.1007/978-3-031-58495-4_7
DOI: https://doi.org/10.1007/978-3-031-58495-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58494-7
Online ISBN: 978-3-031-58495-4
eBook Packages: Computer Science (R0)