
Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope

  • Conference paper
Speech and Language Technologies for Low-Resource Languages (SPELLL 2022)

Abstract

This paper summarizes the development of NLP technology for the Tamil language. Tamil is one of the Dravidian languages with a serious commitment to technological development, reflected both in the language technology tools built for it and in the resources created to support them. Tools and systems have been successfully developed for Tamil speech synthesis and recognition; for grammatical, semantic and social media text analysis; and for machine translation. Many lines of research have contributed to this achievement. Likewise, many efforts are developing resources to facilitate further work, including text corpora (monolingual, parallel and lexical), speech corpora, lexical resources and grammars. What is needed now is to take stock of the achievements made so far, find out where Tamil stands in the arena of technological development, and look ahead to its rapid further development. Computational linguistics for Tamil is gaining traction, and this work highlights the various data sets available for research and further exploration.


Notes

  1. http://www.tamilvu.org/en/research-development.

  2. https://tdil.meity.gov.in/Research_Effort.aspx.

  3. https://github.com/nlpc-uom.

  4. http://corpora.ciil.org/wordcorpora.htm.

  5. http://www.emille.lancs.ac.uk/home.htm.

  6. POS tagged corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.

  7. http://www.au-kbc.org/nlp/corpusrelease.html.

  8. http://tdil-dc.in.

  9. Parallel Corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.

  10. DPIL corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.

References

  1. Abinaya, N., John, N., Ganesh, B.H., Kumar, A.M., Soman, K.: Amrita_cen@ fire-2014: Named entity recognition for indian languages using rich features. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 103–111 (2014)

    Google Scholar 

  2. Agalya, T.: Comparative analysis for offensive language identification of Tamil text using SVM and logistic classifier (2021)

    Google Scholar 

  3. Akilandeswari, A., Devi, S.L.: Conditional random fields based pronominal resolution in Tamil. Int. J. Comput. Sci. Eng. 5(6), 567 (2013)

    Google Scholar 

  4. Akilandeswari, A., Lalitha Devi, S.: Anaphora Resolution in Tamil Novels. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds.) MIKE 2014. LNCS (LNAI), vol. 8891, pp. 268–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13817-6_26

    Chapter  Google Scholar 

  5. Akilandeswari, A., Devi, S.L.: Tamil pronominal resolution boosted by sentence transformation. Aust. J. Basic Appl. Sci. 9(23), 566–572 (2015)

    Google Scholar 

  6. Anand Kumar, M., Dhanalakshmi, V., Rekha, R., Soman, K., Rajendran, S.: A novel data driven algorithm for Tamil morphological generator. Int. J. Comput. Appl. 975, 8887 (2010)

    Google Scholar 

  7. Anand Kumar, M., Dhanalakshmi, V., Soman, K., Rajendran, S.: A sequence labeling approach to morphological analyzer for Tamil language. (IJCSE) Int. J. Comput. Sci. Eng. 2(6), 1944–195 (2010)

    Google Scholar 

  8. Anand Kumar, M., Rajendran, S., Soman, K.: Tamil word sense disambiguation using support vector machines with rich features. Int. J. Appl. Eng. Res. 9(20), 7609–20 (2014)

    Google Scholar 

  9. Anand Kumar, M., Singh, S., Ramanan, P., Sinthiya, V., Soman, K., et al.: Creating paraphrase identification corpus for Indian languages: Opensource data set for paraphrase creation. In: Handbook of Research on Emerging Trends and Applications of Machine Learning, pp. 157–170. IGI Global (2020)

    Google Scholar 

  10. Anandan, P., Saravanan, K., Parthasarathi, R., Geetha, T.: Morphological analyzer for Tamil. In: International Conference on Natural language Processing. 3, 12–22 (2002)

    Google Scholar 

  11. Ananth Ramakrishnan, A., Devi, S.L.: An alternate approach towards meaningful lyric generation in Tamil. In: Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, pp. 31–39 (2010)

    Google Scholar 

  12. Ananth Ramakrishnan, A., Kuppan, S., Devi, S.L.: Automatic generation of Tamil lyrics for melodies. In: Proceedings of the workshop on computational approaches to linguistic creativity, pp. 40–46 (2009)

    Google Scholar 

  13. Anbukkarasi, S., Varadhaganapathy, S.: Deep learning based Tamil parts of speech (POS) tagger. Technical Sciences, Bulletin of the Polish Academy of Sciences (2021)

    Google Scholar 

  14. Anbukkarasi, S., Varadhaganapathy, S.: Neural network-based error handler in natural language processing. Neural Comput. Appl., pp. 1–10 (2022)

    Google Scholar 

  15. Aparna, K.G., Ramakrishnan, A.G.: A Complete Tamil Optical Character Recognition System. In: Lopresti, D., Hu, J., Kashi, R. (eds.) DAS 2002. LNCS, vol. 2423, pp. 53–57. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45869-7_6

    Chapter  Google Scholar 

  16. Arulmozhi, P., Sobha, L., Kumara Shanmugam, B.: Parts of speech tagger for Tamil. In: Symposium on Indian Morphology, Phonology Language Engineering, pp. 19–21 (2004)

    Google Scholar 

  17. Arulmozhi, S.: Aspects of inflectional morphophonology - a computational approach. Unpublished Ph.D. Thesis (1998)

    Google Scholar 

  18. Arunselvan, S., Anand Kumar, M., Soman, K.: Sentiment analysis of Tamil movie reviews via feature frequency count. Int. J. Appl. Eng. Res. 10(20), 17934–17939 (2015)

    Google Scholar 

  19. Bharathi, B., Agnusimmaculate, A.S.: SSNCSE_NLP@DravidianLangTech-EACL2021: Offensive language identification on multilingual code mixing text. In: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pp. 313–318. Assoc. Comput. Linguist., Kyiv (2021), https://aclanthology.org/2021.dravidianlangtech-1.45

  20. Banu, M., Karthika, C., Sudarmani, P., Geetha, T.: Tamil document summarization using semantic graph method. In: International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007) 2, pp. 128–134 IEEE (2007)

    Google Scholar 

  21. Baskaran, S.: Semantic analyser for word sense disambiguation. Unpublished MS Thesis (2002)

    Google Scholar 

  22. Bharathi, B., Samyuktha, G.: Machine learning based approach for sentiment analysis on multilingual code mixing text. In: Working Notes of FIRE 2021-Forum for Information Retrieval Evaluation. CEUR (2021)

    Google Scholar 

  23. Bharathi, B., Varsha, J.: Ssncse nlp@ tamilnlp-acl2022: Transformer based approach for detection of abusive comment for Tamil language. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 158–164 (2022)

    Google Scholar 

  24. Chakravarthi, B.R.: Leveraging orthographic information to improve machine translation of under-resourced languages. Ph.D. thesis, NUI Galway (2020)

    Google Scholar 

  25. Chakravarthi, B.R., Arcan, M., McCrae, J.P.: Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages. In: 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs) 70, pp. 61–614. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/OASIcs.LDK.2019.6,http://drops.dagstuhl.de/opus/volltexte/2019/10370

  26. Chakravarthi, B.R., et al.: Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam. In: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation. CEUR (2021)

    Google Scholar 

  27. Chakravarthi, B.R., Muralidaran, V., Priyadharshini, R., McCrae, J.P.: Corpus creation for sentiment analysis in code-mixed Tamil-English text. CoRR abs/2006.00206 (2020). https://arxiv.org/abs/2006.00206

  28. Chakravarthi, B.R., Priyadharshini, R., Kumar M, A., Krishnamurthy, P., Sherly, E.: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages. Assoc. Comput. Linguist., Kyiv (2021). https://aclanthology.org/2021.dravidianlangtech-1.0

  29. Chakravarthi, B.R., etal.: Findings of the sentiment analysis of dravidian languages in code-mixed text. CoRR abs/2111.09811 (2021), https://arxiv.org/abs/2111.09811

  30. Chakravarthi, B.R., Rani, P., Arcan, M., McCrae, J.P.: A survey of orthographic information in machine translation. arXiv e-prints pp. arXiv-2008 (2020)

    Google Scholar 

  31. Chandrakanth, D., Anand Kumar, M., Gunasekaran, S.: Part-of-speech tagging for Tamil language. Proc. Int. J. Commun. Eng. 6(6), 1 (2012)

    Google Scholar 

  32. Chellamuthu, K.: Russian to Tamil machine translation system at Tamil university. In: Proceedings of Tamil Internet 2002 Conference. http://infitt.org/ti2002/papers/16CHELLA. pdf) (2002)

  33. Chinnuswamy, P., Krishnamoorthy, S.G.: Recognition of handprinted Tamil characters. Pattern Recogn. 12(3), 141–152 (1980)

    Article  Google Scholar 

  34. Cruz, W.: Parsing and generation of Tamil verbs in GSMORPH. Unpublished M.Phil. Dissertation (2002)

    Google Scholar 

  35. Darbari, H., et al.: Enabling linguistic idiosyncrasy in anuvadaksh. Vishwabharat, July-Dec (2013)

    Google Scholar 

  36. Deepa, R.A., Rao, R.R.: A novel nearest interest point classifier for offline Tamil handwritten character recognition. Pattern Anal. Appl. 23(1), 199–212 (2020)

    Article  Google Scholar 

  37. Deivasundaram, N., Gopal, A.: Computational morphology of Tamil. Word Structure in Dravidian, Kuppam: Dravidian University, pp. 406–410 (2003)

    Google Scholar 

  38. Devi, G.R., Kumar, M.A., Soman, K.: Extraction of named entities from social media text in Tamil language using n-gram embedding for disaster management. In: Studies in Computational Intelligence, pp. 207–223 (2020)

    Google Scholar 

  39. Devi, S.L., Pralayankar, P., Menaka, S., Bakiyavathi, T., Ram, R.V.S., Kavitha, V.: Verb transfer in a Tamil to Hindi machine translation system. In: 2010 International Conference on Asian Language Processing, pp. 261–264. IEEE (2010)

    Google Scholar 

  40. Devi, S.L., Ram, V.S., Rao, P.R.: Anaphora resolution system for Indian languages. In: Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation (WILDRE). LREC2014, Reykjavik, Iceland (2014)

    Google Scholar 

  41. Devi, S.L., Ram, V.S., Rao, P.R.: A generic anaphora resolution engine for Indian languages. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1824–1833 (2014)

    Google Scholar 

  42. Dhanalakshmi, V., Kumar, A.M., Rajendran, S., Soman, K.: POS tagger and chunker for Tamil language. In: Proceedings of the 8th Tamil Internet Conference. Cologne, Germany (2009)

    Google Scholar 

  43. Dhanalakshmi, V., Kumar, A.M., Soman, K., Rajendran, S.: Chunker for Tamil using machine learning. In: 7th International Conference on Natural Language Processing 2009 (ICON 2009), IIIT Hyderabad, India (2009)

    Google Scholar 

  44. Dhanalakshmi, V., Padmavathy, P., Soman, K., Rajendran, S.: Chunker for Tamil. In: 2009 International Conference on Advances in Recent Technologies in Communication and Computing, pp. 436–438. IEEE (2009)

    Google Scholar 

  45. Dhanalakshmi V, Anand Kumar M, Murugesan, C.: Dependency parser for Tamil classical literature: kurunthokai. In: Proceedings of Tamil Internet Conference, pp. 147–152 (2012)

    Google Scholar 

  46. Dhivya, R., Dhanalakshmi, V., Anand Kumar, M., Soman, K.P.: Clause Boundary Identification for Tamil Language Using Dependency Parsing. In: Das, V.V., Ariwa, E., Rahayu, S.B. (eds.) SPIT 2011. LNICST, vol. 62, pp. 195–197. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32573-1_32

    Chapter  Google Scholar 

  47. Dhivyaa, C., Nithya, K., Janani, T., Kumar, K.S., Prashanth, N.: Transliteration based generative pre-trained transformer 2 model for Tamil text summarization. In: 2022 International Conference on Computer Communication and Informatics (ICCCI), p. 1–6. IEEE (2022)

    Google Scholar 

  48. Evangeline, M.M., Shyamala, K., Barathi, L., Sandhya, R.: Frequency Based Feature Extraction Technique for Text Documents in Tamil Language. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds.) ICACDS 2021. CCIS, vol. 1441, pp. 76–84. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88244-0_8

    Chapter  Google Scholar 

  49. Ezhilarasi, S., Maheswari, P.U.: Depicting a neural model for lemmatization and POS tagging of words from PALAEO graphic stone inscriptions. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1879–1884. IEEE (2021)

    Google Scholar 

  50. Fernando, A., Ranathunga, S., Dias, G.: Data augmentation and terminology integration for domain-specific Sinhala-English-Tamil statistical machine translation. (2020) arXiv preprint arXiv:2011.02821

  51. Ganesan, M.: Functions of the morphological analyser developed at CIIL, Mysore. In: Automatic Automatic Translation (seminar proceedings), Thiruvananthapuram: ISDL (1994)

    Google Scholar 

  52. Ganesan, M.: Computational morphology of Tamil. Word Structure in Dravidian, Kuppam: Dravidian University, pp. 399–405 (2003)

    Google Scholar 

  53. Ganesan, M., Ekka, F.: Morphological analyzer for Indian languages. Information Technology Applications in Language, Script and Speech, New Delhi: BPB Publication (1994)

    Google Scholar 

  54. Ganesh, J., Parthasarathi, R., Geetha, T.V., Balaji, J.: Pattern Based Bootstrapping Technique for Tamil POS Tagging. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds.) MIKE 2014. LNCS (LNAI), vol. 8891, pp. 256–267. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13817-6_25

    Chapter  Google Scholar 

  55. Ganganwar, V., Rajalakshmi, R.: MTDOT: A multilingual translation-based data augmentation technique for offensive content identification in Tamil text data. Electronics 11(21), 3574 (2022)

    Article  Google Scholar 

  56. HandWiki: Tamil_all_character_encoding (2020)

    Google Scholar 

  57. Hariharan, V., Kumar, M.A., Soman, K.: Named entity recognition in Tamil language using recurrent based sequence model. In: Lecture Notes in Networks and Systems, 74 (2019)

    Google Scholar 

  58. Jain, M., Punia, R., Hooda, I.: Neural machine translation for Tamil to English. J. Stat. Manage. Syst. 23(7), 1251–1264 (2020)

    Google Scholar 

  59. Kalamani, M., Krishnamoorthi, M., Valarmathi, R.: Continuous Tamil speech recognition technique under non stationary noisy environments. Int. J. Speech Technol. 22(1), 47–58 (2019)

    Article  Google Scholar 

  60. Kamakshi, S., Rajendren, S.: Preliminaries to the preparation of a machine aid to translate linguistics texts written in English into Tamil. Language in India 3 (2004)

    Google Scholar 

  61. Kannan, R.R., Rajalakshmi, R., Kumar, L.: Indic-BERT based approach for sentiment analysis on code-mixed Tamil tweets (2021)

    Google Scholar 

  62. Kausikaa, N., Uma, V.: Sentiment analysis of English and Tamil tweets using path length similarity based word sense disambiguation. Int. Organ. Sci. Res. J. 1, 82–89 (2016)

    Google Scholar 

  63. Kavirajan, B., Kumar, M.A., Soman, K., Rajendran, S., Vaithehi, S.: Improving the rule based machine translation system using sentence simplification (English to Tamil). In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 957–963. IEEE (2017)

    Google Scholar 

  64. Kohilavani, S., Mala, T., Geetha, T.: Automatic Tamil content generation. In: 2009 International Conference on Intelligent Agent Multi-Agent Systems, p. 1–6. IEEE (2009)

    Google Scholar 

  65. Krishnamurthy, P.: Development of Telugu-Tamil transfer-based machine translation system: an improvisation using divergence index. J. Intell. Syst. 28(3), 493–504 (2019)

    Google Scholar 

  66. Krishnamurthy, P., Sarveswaran, K.: Towards building a modern written tamil treebank. In: Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), pp. 61–68 (2021)

    Google Scholar 

  67. Krishnan, A.S., Ragavan, S.: Morphology-aware meta-embeddings for Tamil. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 94–111 (2021)

    Google Scholar 

  68. Krishnan, K.G., Pooja, A., Kumar, M.A., Soman, K.: Character based bidirectional LSTM for disambiguating Tamil part-of-speech categories. Int. J. Control Theory Appl 10, 229–235 (2017)

    Google Scholar 

  69. kumar, A.M., Soman, K.: Amrita_cen@ fire-2014: morpheme extraction and lemmatization for Tamil using machine learning. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 112–120 (2014)

    Google Scholar 

  70. Kumar, M.A., Dhanalakshmi, V., Soman, K., Rajendran, S.: Factored statistical machine translation system for English to Tamil language. Pertanika J. Soc. Sci. Humanit. 22(4) (2014)

    Google Scholar 

  71. Kumar, M.A., Premjith, B., Singh, S., Rajendran, S., Soman, K.P.: An overview of the shared task on machine translation in Indian languages (MTIL)–2017. Journal of Intelligent Systems 28(3), 455–464 (2019). https://doi.org/10.1515/jisys-2018-0024https://doi.org/10.1515/jisys-2018-0024

  72. Kumar, M.A., Premjith, B., Singh, S., Rajendran, S., Soman, K.: An overview of the shared task on machine translation in Indian languages (MTIL)-2017. J. Intell. Syst. 28(3), 455–464 (2019)

    Google Scholar 

  73. Kumar, M.A., Rajendran, S., Soman, K.: Cross-lingual preposition disambiguation for machine translation. Procedia Comput. Sci. 54, 291–300 (2015)

    Article  Google Scholar 

  74. Kumar, M.A., Rajendran, S., Soman, K.: Cross-lingual preposition disambiguation for machine translation. Procedia Comput. Sci. 54, 291–300 (2015)

    Article  Google Scholar 

  75. Anand Kumar, M., Singh, S., Kavirajan, B., Soman, K.P.: Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds.) FIRE 2016. LNCS, vol. 10478, pp. 128–140. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73606-8_10

    Chapter  Google Scholar 

  76. LekshmiAmmal, H., Ravikiran, M., et al.: Nitk-it_nlp@ tamilnlp-acl2022: Transformer based model for toxic span identification in Tamil. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 75–78 (2022)

    Google Scholar 

  77. Lokesh, S., Kumar, P.M., Devi, M.R., Parthasarathy, P., Gokulnath, C.: An automatic Tamil speech recognition system by using bidirectional recurrent neural network with self-organizing map. Neural Comput. Appl. 31(5), 1521–1531 (2019)

    Article  Google Scholar 

  78. Lushanthan, S., Weerasinghe, A., Herath, D.: Morphological analyzer and generator for tamil language. In: 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTER), pp. 190–196. IEEE (2014)

    Google Scholar 

  79. Anandkumar, M.: Morphology based prototype statistical machine translation system for English to Tamil language. Unpublished PhD Thesis (2013)

    Google Scholar 

  80. Malarkodi, C., Lex, E., Devi, S.L.: Named entity recognition for the agricultural domain. Res. Comput. Sci. 117, 121–132 (2016)

    Article  Google Scholar 

  81. Malarkodi, C., Sobha, L.: Twitter named entity recognition for Indian languages. In: Proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing (2018)

    Google Scholar 

  82. Manone, V., Soman, K., Rajendran, S.: A synchronous syntax for English-Tamil language pair for machine translation. In: 4th International Symposium on Natural Language Processing (NLP’15), Kochi, Kerala, Co-affiliated with 4th International Conference in Computing, Communications and Informatics (ICACCI-2015) (2015)

    Google Scholar 

  83. Marimuthu, K., Amudha, K., Bakiyavathi, T., Devi, S.L.: Word boundary identifier as a catalyzer and performance booster for Tamil morphological analyzer. In: Proceedings of 6th Language and Technology Conference, Human Language Technologies as a challenge for Computer Science and Linguistics, Poznan, Poland. (2013)

    Google Scholar 

  84. Menaka, S., Malarkodi, C., Devi, S.L.: A deep study on causal relations and its automatic identification in tamil. In: Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation. LREC2014, Reykjavik, Iceland (2014)

    Google Scholar 

  85. Menaka, S., Ram, V.S., Devi, S.L.: Morphological generator for Tamil. Proceedings of the Knowledge Sharing event on Morphological Analysers and Generators, LDC-IL, Mysore, India, pp. 82–96 (2010)

    Google Scholar 

  86. Menon, D.A., Saravanan, S., Loganathan, R., Soman, D.K.: Amrita morph analyzer and generator for Tamil: a rule based approach. In: Proceedings of Tamil Internet Conference, pp. 239–243 (2009)

    Google Scholar 

  87. Mokanarangan, T., et al.: Tamil Morphological Analyzer Using Support Vector Machines. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds.) NLDB 2016. LNCS, vol. 9612, pp. 15–23. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41754-7_2

    Chapter  Google Scholar 

  88. Mrinalini, K., Nagarajan, T., Vijayalakshmi, P.: Pause-based phrase extraction and effective OOV handling for low-resource machine translation systems. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18(2), 1–22 (2018)

    Google Scholar 

  89. Padmamala, R., Prema, V.: Sentiment analysis of online Tamil contents using recursive neural network models approach for Tamil language. In: 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 28–31. IEEE (2017)

    Google Scholar 

  90. Pandian, S.L., Geetha, T. V.: CRF Models for Tamil Part of Speech Tagging and Chunking. In: Li, W., Mollá-Aliod, D. (eds.) ICCPOL 2009. LNCS (LNAI), vol. 5459, pp. 11–22. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00831-3_2

    Chapter  Google Scholar 

  91. Pattabhi, R., Rao, T., Ram, R.V.S., Vijayakrishna, R., Sobha, L.: A text chunker and hybrid pos tagger for indian languages. In: Proceedings of International Joint Conference on Artificial Intelligence Workshop on Shallow Parsing for South Asian Languages, IIIT Hyderabad, Hyderabad, India (2007)

    Google Scholar 

  92. Pattabhi, R., Sobha, L.: Identifying similar and co-referring documents across languages. In: Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies, pp. 10–17 (2008)

    Google Scholar 

  93. Pilar, B., et al.: Subword dictionary learning and segmentation techniques for automatic speech recognition in Tamil and Kannada. (2022) arXiv preprint arXiv:2207.13331

  94. Premjith, B., Soman, K.: Deep learning approach for the morphological synthesis in Malayalam and Tamil at the character level. Trans. Asian Low-Resource Lang. Inf. Proc. 20(6), 1–17 (2021)

    Article  Google Scholar 

  95. Priyadharshini, R., et al.: Overview of abusive comment detection in Tamil-ACL 2022. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 292–298 (2022)

    Google Scholar 

  96. Raj, M.A.R., Abirami, S.: Junction point elimination based Tamil handwritten character recognition: An experimental analysis. J. Syst. Sci. Syst. Eng. 29(1), 100–123 (2020)

    Article  Google Scholar 

  97. Raj, M.A.R., Abirami, S.: Structural representation-based off-line Tamil handwritten character recognition. Soft. Comput. 24(2), 1447–1472 (2020)

    Article  Google Scholar 

  98. Rajalakshmi, R., Duraphe, A., Shibani, A.: Dlrg@ dravidianlangtech-acl2022: Abusive comment detection in Tamil using multilingual transformer models. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 207–213 (2022)

    Google Scholar 

  99. Rajalakshmi, R., Reddy, Y., Kumar, L.: Dlrg@ dravidianlangtech-eacl2021: Transformer based approachfor offensive language identification on code-mixed Tamil. In: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pp. 357–362 (2021)

    Google Scholar 

  100. Rajalakshmi, R., Selvaraj, S., Vasudevan, P., et al.: Hottest: Hate and offensive content identification in Tamil using transformers and enhanced stemming. Computer Speech Language, p. 101464 (2022)

    Google Scholar 

  101. Rajasekar, M., Geetha, A.: Comparison of Machine Learning Methods for Tamil Morphological Analyzer. In: Raj, J.S., Palanisamy, R., Perikos, I., Shi, Y. (eds.) Intelligent Sustainable Systems. LNNS, vol. 213, pp. 385–399. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-2422-3_31

    Chapter  Google Scholar 

  102. Rajendran, S.: Spell and grammar checker for tamil. In: Paper read in 27th All India Conference of Dravidian Linguists held in ISDL, Thiruvananthapuram. 17 (1999)

    Google Scholar 

  103. Rajendran, S.: Preliminaries to the preparation of a word net for Tamil. Lang. India 2(1), 467–497 (2002)

    Google Scholar 

  104. Rajendran, S.: Parsing in Tamil: Present state of art. Lang. India 6, 8 (2006)

    Google Scholar 

  105. Rajendran, S.: Complexity of Tamil in POS tagging. Lang. India 7(1) (2007)

    Google Scholar 

  106. Rajendran, S.: Resolution of lexical ambiguity in Tamil. Lang. India 14(1) (2014)

    Google Scholar 

  107. Rajendran, S., Kumar, M.A.: Computing tools for Tamil language teaching and learning. In: 17th Tamil Internet Conference. Tamil Agricultural University, Coimbatore (2018)

    Google Scholar 

  108. Rajendran, S., Viswanathan, S., Kumar, R.: Computational morphology of Tamil verbal complex. Lang. India 3(4) (2003)

    Google Scholar 

  109. Rajkumar, N., Subashini, T., Rajan, K., Ramalingam, V.: An efficient feature extraction with bidirectional long short term memory based deep learning model for Tamil document classification. J. Comput. Theor. Nanosci. 18(3), 568–585 (2021)

    Google Scholar 

  110. Ram, R.V.S., Lalitha Devi, S.: Clause Boundary Identification Using Conditional Random Fields. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 140–150. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78135-6_13

    Chapter  Google Scholar 

  111. Ram, R.V.S., Devi, S.L.: Coreference resolution using tree-CRF. A. Gelbukh (ed), Comput. Linguist. Intell. Text Proc. 7181, 285–296 (2012)

    Google Scholar 

  112. Ram, R.V.S., Devi, S.L.: Pronominal resolution in Tamil using tree CRFS. In: 2013 International Conference on Asian Language Processing, pp. 197–200. IEEE (2013)

    Google Scholar 

  113. Ram, R.V.S., Devi, S.L.: Two layer machine learning approach for mining referential entities for a morphologically rich language. Asian J. Inf. Technol. 15, 2831–2838 (2016)

    Google Scholar 

  114. Ram, R.V.S., Sobha, L.D.: Tamil clause boundary identification: Annotation and evaluation. In: Workshop on Indian Language and Data: Resources and Evaluation. p. 122. LREC, Istanbul (2012)

    Google Scholar 

  115. Ram, R., Devi, S.L.: Noun phrase chunker using finite state automata for an agglutinative language. In: Proceedings of the Tamil Internet-2010 at Coimbatore, India, pp. 23–27 (2010)

    Google Scholar 

  116. Ram, V.S., Menaka, S., Devi, S.L.: Tamil morphological analyser. In: Proceedings of the Knowledge Sharing event on Morphological Analysers and Generators, Mona Parakh, LDC-IL, Mysore, India, pp. 1–18 (2010)

    Google Scholar 

  117. Ramakrishnan, A., Kaushik, L.N., Narayana, L.: Natural language processing for Tamil TTS. In: Proc. 3rd Language and Technology Conference, Poznan, Poland, pp. 192–196 (2007)

    Google Scholar 

  118. Ramanathan, V., Meyyappan, T., Thamarai, S.: Predicting Tamil movies sentimental reviews using Tamil tweets. J. Comput. Sci. 15(11), 1638–1647 (2019)

    Article  Google Scholar 

  119. Ramanathan, V., Meyyappan, T., Thamarai, S.: Sentiment analysis: an approach for analysing tamil movie reviews using Tamil tweets. Recent Adv. Mathe. Res. Comput. Sci. 3, 28–39 (2021)

    Article  Google Scholar 

  120. Ramasamy, L., Bojar, O., Žabokrtskỳ, Z.: Morphological processing for English-Tamil statistical machine translation. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages, pp. 113–122 (2012)

    Google Scholar 

  121. Ramasamy, L., Bojar, O., Žabokrtskỳ, Z.: ENTAM: An English-Tamil parallel corpus (ENTAM v2. 0) (2014)

    Google Scholar 

  122. Ramaswamy, V.: A morphological generator for Tamil. Unpublished Ph.D. Dissertation (2000)

    Google Scholar 

  123. Ramaswamy, V.: A morphological analyzer for Tamil. Unpublished Ph.D. Dissertation (2003)

    Google Scholar 

  124. Ranganathan, V.: A lexical phonological approach to Tamil word by computer. Int. J. Dravidian Linguist. 26(1), 57–70 (1997)

    Google Scholar 

  125. Ranganathan, V.: Computational Approaches To Tamil Linguistics, chap. 3. CRE-A Publications (2016)

    Google Scholar 

  126. Ravikiran, M., Annamalai, S.: DOSA: dravidian code-mixed offensive span identification dataset. In: Proceedings of the 1st Workshop on Speech and Language Technologies for Dravidian Languages, pp. 10–17. Assoc. Comput. Linguist., Kyiv (2021). https://aclanthology.org/2021.dravidianlangtech-1.2

  127. Ravikiran, M., et al.: Findings of the shared task on toxic span identification in Tamil. In: Proceedings of the 2nd Workshop on Speech and Language Technologies for Dravidian Languages. Assoc. Comput. Linguist. (2022)

    Google Scholar 

  128. Remmiya Devi, G., Anand Kumar, M., Soman, K.: Co-occurrence based word representation for extracting named entities in Tamil tweets. J. Intell. Fuzzy Syst. 34(3), 1435–1442 (2018)

    Article  Google Scholar 

  129. Rethanya. V, Dhanalakshmi, V., Soman, M., Rajendran, S.: Morphological stemmer and LEMMATIZER for Tamil. In: Proceedings of 18th Tamil Internet Conference. International Forum for Information Technology in Tamil (INFITT) (2019)

    Google Scholar 

  130. RK Rao, P., Devi, S.L.: Patent document summarization using conceptual graphs. Int. J. Nat. Lang. Comput. (IJNLC) 6 (2017)

  131. Sakuntharaj, R., Mahesan, S.: Missing word detection and correction based on context of Tamil sentences using n-grams. In: 2021 10th International Conference on Information and Automation for Sustainability (ICIAfS), pp. 42–47. IEEE (2021)

  132. Samuel Manoharan, J.: A novel text-to-speech synthesis system using syllable-based HMM for Tamil language. In: Shakya, S., Du, K.L., Haoxiang, W. (eds.) Proceedings of Second International Conference on Sustainable Expert Systems, pp. 305–314. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7657-4_26

  133. Sankaralingam, C., Rajendran, S., Kavirajan, B., Kumar, M.A., Soman, K.: Onto-thesaurus for Tamil language: Ontology based intelligent system for information retrieval. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2396–2396. IEEE (2017)

  134. Santosh Kumar, T.: Word sense disambiguation using semantic web for Tamil to English statistical machine translation. IRA-Int. J. Technol. Eng. 5(2), 22–31 (2016)

  135. Sarika, M., et al.: Comparative analysis of Tamil and English news text summarization using text rank algorithm. Turkish J. Comput. Mathe. Educ. (TURCOMAT) 12(9), 2385–2391 (2021)

  136. Sarveswaran, K., Dias, G.: ThamizhiUDP: A dependency parser for Tamil. arXiv preprint arXiv:2012.13436 (2020)

  137. Sarveswaran, K., Dias, G.: Building a part of speech tagger for the Tamil language. In: 2021 International Conference on Asian Language Processing (IALP), pp. 286–291. IEEE (2021)

  138. Sarveswaran, K., Dias, G., Butt, M.: ThamizhiFST: A morphological analyser and generator for Tamil verbs. In: 2018 3rd International Conference on Information Technology Research (ICITR), pp. 1–6. IEEE (2018)

  139. Sarveswaran, K., Dias, G., Butt, M.: ThamizhiMorph: a morphological parser for the Tamil language. Mach. Transl. 35(1), 37–70 (2021)

  140. Selvi, S.S., Anitha, R.: Bilingual corpus-based hybrid POS tagger for low resource Tamil language: A statistical approach. J. Intell. Fuzzy Syst., 1–20 (2022)

  141. Sivasankar, E., Krishnakumari, K., Balasubramanian, P.: An enhanced sentiment dictionary for domain adaptation with multi-domain dataset in Tamil language (ESD-DA). Soft. Comput. 25(5), 3697–3711 (2021)

  142. Sobha, L.: Pronominal resolution in South Dravidian languages. 23rd South Asian Language Analysis, University of Texas, Austin 446 (2003)

  143. Sridhar, R., Janani, V., Gowrisankar, R., Monica, G.: Language relationship model for automatic generation of Tamil stories from hints. Int. J. Intell. Inf. Technol. (IJIIT) 13(2), 21–40 (2017)

  144. Subramoniam, V., Bhattacharya, M., Lohy, A., Tarai, S.: Speech synthesis (Tamil oriya): an application for the blind. Department of Science and Technology, Govt. of India III.5(35) 2001-ET (2001)

  145. Suriyah, M., Anandan, A., Narasimhan, A., Karky, M.: Piripori: morphological analyser for Tamil. In: International Conference on Artificial Intelligence, Smart Grid and Smart City Applications, pp. 801–809. Springer (2019). https://doi.org/10.1007/978-3-030-24051-6_75

  146. Thangarajan, R., Natarajan, A.: Syllable based continuous speech recognition for Tamil. South Asian Lang. Rev. 18(1), 72–85 (2008)

  147. Thangarajan, R., Natarajan, A., Selvam, M.: Word and triphone based approaches in continuous speech recognition for Tamil language. WSEAS Trans. Signal Proc. 4(3), 76–86 (2008)

  148. Thangarasu, M., Manavalan, R.: Stemmers for Tamil language: performance analysis. arXiv preprint arXiv:1310.0754 (2013)

  149. Thenmozhi, D., Aravindan, C.: Ontology-based Tamil-English cross-lingual information retrieval system. Sadhana - Academy Proc. Eng. Sci. 43(10), 1–14 (2018)

  150. Vasantharajan, C., Thayasivam, U.: Towards offensive language identification for Tamil code-mixed YouTube comments and posts. SN Computer Science 3(1), 1–13 (2022)

  151. Vel, S.S.: Pre-processing techniques of text mining using computational linguistics and Python libraries. In: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), pp. 879–884. IEEE (2021)

  152. Vignesh, N., Sowmya, S.: Automatic question generator in Tamil. Int. J. Eng. Res. Technol. (IJERT) 2 (2013)

  153. Vijayakrishna, R., Sobha, L.: Domain focused named entity recognizer for Tamil using conditional random fields. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages (2008)

  154. Visuwalingam, H., Sakuntharaj, R., Ragel, R.G.: Part of speech tagging for Tamil language using deep learning. In: 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), pp. 157–161. IEEE (2021)

  155. Viswanathan, S.: Tamil morphological analyser. Unpublished MS Thesis (2000)

  156. Viswanathan, S., Ramesh Kumar, S., Kumara Shanmugam, B., Arulmozi, S., Vijay Shanker, K.: A Tamil morphological analyser. In: Proceedings of the International Conference on Natural Language Processing (ICON), CIIL, Mysore, India (2003)

  157. Zhang, H., Shi, K., Chen, N.F.: Multilingual speech evaluation: Case studies on English, Malay and Tamil. arXiv preprint arXiv:2107.03675 (2021)

Acknowledgments

This paper is based on the white paper designed on META-NET. We express our gratitude to Dr. Georg Rehm (Head of META-NET), Prof. Joseph Mariani, and Prof. Girish Nath Jha of the School of Sanskrit and Indic Studies for their close interactions during the development of the white paper. The complete resource links are available on the Tamil data portal, https://sites.google.com/view/tamilnlp.

Author information

Corresponding author

Correspondence to M. Anand Kumar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rajendran, S., Anand Kumar, M., Rajalakshmi, R., Dhanalakshmi, V., Balasubramanian, P., Soman, K.P. (2023). Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_6

  • DOI: https://doi.org/10.1007/978-3-031-33231-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33230-2

  • Online ISBN: 978-3-031-33231-9

  • eBook Packages: Computer Science (R0)
