Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope

Rajendran, S.; Anand Kumar, M.; Rajalakshmi, Ratnavel; Dhanalakshmi, V.; Balasubramanian, P.; Soman, K P

doi:10.1007/978-3-031-33231-9_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1802)

Included in the following conference series: International Conference on Speech and Language Technologies for Low-resource Languages

Abstract

This paper summarizes the NLP-based technological development of the Tamil language. Tamil is one of the Dravidian languages actively pursuing technological development, as reflected both in the language technology tools built for it and in the resources created to support them. Tools and systems have been developed for Tamil speech synthesis and recognition, for the analysis of grammar, semantics and social media text, and for machine translation. Much research has been undertaken towards these achievements. Similarly, many efforts are under way to build resources that facilitate technological development, including text corpora (monolingual, parallel and lexical), speech corpora, lexical resources and grammars. What is needed now is to take stock of the achievements made so far, establish where Tamil stands in the arena of technological development, and look ahead to its further rapid development. Computational linguistics research on Tamil is gaining traction, and this work highlights various data sets available for further exploration.

Notes

1. http://www.tamilvu.org/en/research-development.
2. https://tdil.meity.gov.in/Research_Effort.aspx.
3. https://github.com/nlpc-uom.
4. http://corpora.ciil.org/wordcorpora.htm.
5. http://www.emille.lancs.ac.uk/home.htm.
6. POS tagged corpus is available on request: http://www.nlp.amrita.edu/nlpcorpus.html.
7. http://www.au-kbc.org/nlp/corpusrelease.html.
8. http://tdil-dc.in.
9. Parallel Corpus is available on request: http://www.nlp.amrita.edu/nlpcorpus.html.
10. DPIL corpus is available on request: http://www.nlp.amrita.edu/nlpcorpus.html.

References

Abinaya, N., John, N., Ganesh, B.H., Kumar, A.M., Soman, K.: Amrita_cen@ fire-2014: Named entity recognition for indian languages using rich features. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 103–111 (2014)
Google Scholar
Agalya, T.: Comparative analysis for offensive language identification of Tamil text using SVM and logistic classifier (2021)
Google Scholar
Akilandeswari, A., Devi, S.L.: Conditional random fields based pronominal resolution in Tamil. Int. J. Comput. Sci. Eng. 5(6), 567 (2013)
Google Scholar
Akilandeswari, A., Lalitha Devi, S.: Anaphora Resolution in Tamil Novels. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds.) MIKE 2014. LNCS (LNAI), vol. 8891, pp. 268–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13817-6_26
Chapter Google Scholar
Akilandeswari, A., Devi, S.L.: Tamil pronominal resolution boosted by sentence transformation. Aust. J. Basic Appl. Sci. 9(23), 566–572 (2015)
Google Scholar
Anand Kumar, M., Dhanalakshmi, V., Rekha, R., Soman, K., Rajendran, S.: A novel data driven algorithm for Tamil morphological generator. Int. J. Comput. Appl. 975, 8887 (2010)
Google Scholar
Anand Kumar, M., Dhanalakshmi, V., Soman, K., Rajendran, S.: A sequence labeling approach to morphological analyzer for Tamil language. (IJCSE) Int. J. Comput. Sci. Eng. 2(6), 1944–195 (2010)
Google Scholar
Anand Kumar, M., Rajendran, S., Soman, K.: Tamil word sense disambiguation using support vector machines with rich features. Int. J. Appl. Eng. Res. 9(20), 7609–20 (2014)
Google Scholar
Anand Kumar, M., Singh, S., Ramanan, P., Sinthiya, V., Soman, K., et al.: Creating paraphrase identification corpus for Indian languages: Opensource data set for paraphrase creation. In: Handbook of Research on Emerging Trends and Applications of Machine Learning, pp. 157–170. IGI Global (2020)
Google Scholar
Anandan, P., Saravanan, K., Parthasarathi, R., Geetha, T.: Morphological analyzer for Tamil. In: International Conference on Natural language Processing. 3, 12–22 (2002)
Google Scholar
Ananth Ramakrishnan, A., Devi, S.L.: An alternate approach towards meaningful lyric generation in Tamil. In: Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, pp. 31–39 (2010)
Google Scholar
Ananth Ramakrishnan, A., Kuppan, S., Devi, S.L.: Automatic generation of Tamil lyrics for melodies. In: Proceedings of the workshop on computational approaches to linguistic creativity, pp. 40–46 (2009)
Google Scholar
Anbukkarasi, S., Varadhaganapathy, S.: Deep learning based Tamil parts of speech (POS) tagger. Technical Sciences, Bulletin of the Polish Academy of Sciences (2021)
Google Scholar
Anbukkarasi, S., Varadhaganapathy, S.: Neural network-based error handler in natural language processing. Neural Comput. Appl., pp. 1–10 (2022)
Google Scholar
Aparna, K.G., Ramakrishnan, A.G.: A Complete Tamil Optical Character Recognition System. In: Lopresti, D., Hu, J., Kashi, R. (eds.) DAS 2002. LNCS, vol. 2423, pp. 53–57. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45869-7_6
Chapter Google Scholar
Arulmozhi, P., Sobha, L., Kumara Shanmugam, B.: Parts of speech tagger for Tamil. In: Symposium on Indian Morphology, Phonology Language Engineering, pp. 19–21 (2004)
Google Scholar
Arulmozhi, S.: Aspects of inflectional morphophonology - a computational approach. Unpublished Ph.D. Thesis (1998)
Google Scholar
Arunselvan, S., Anand Kumar, M., Soman, K.: Sentiment analysis of Tamil movie reviews via feature frequency count. Int. J. Appl. Eng. Res. 10(20), 17934–17939 (2015)
Google Scholar
Bharathi, B., Agnusimmaculate, A.S.: SSNCSE_NLP@DravidianLangTech-EACL2021: Offensive language identification on multilingual code mixing text. In: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pp. 313–318. Assoc. Comput. Linguist., Kyiv (2021), https://aclanthology.org/2021.dravidianlangtech-1.45
Banu, M., Karthika, C., Sudarmani, P., Geetha, T.: Tamil document summarization using semantic graph method. In: International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007) 2, pp. 128–134 IEEE (2007)
Google Scholar
Baskaran, S.: Semantic analyser for word sense disambiguation. Unpublished MS Thesis (2002)
Google Scholar
Bharathi, B., Samyuktha, G.: Machine learning based approach for sentiment analysis on multilingual code mixing text. In: Working Notes of FIRE 2021-Forum for Information Retrieval Evaluation. CEUR (2021)
Google Scholar
Bharathi, B., Varsha, J.: Ssncse nlp@ tamilnlp-acl2022: Transformer based approach for detection of abusive comment for Tamil language. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 158–164 (2022)
Google Scholar
Chakravarthi, B.R.: Leveraging orthographic information to improve machine translation of under-resourced languages. Ph.D. thesis, NUI Galway (2020)
Google Scholar
Chakravarthi, B.R., Arcan, M., McCrae, J.P.: Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages. In: 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs) 70, pp. 61–614. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/OASIcs.LDK.2019.6,http://drops.dagstuhl.de/opus/volltexte/2019/10370
Chakravarthi, B.R., et al.: Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam. In: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation. CEUR (2021)
Google Scholar
Chakravarthi, B.R., Muralidaran, V., Priyadharshini, R., McCrae, J.P.: Corpus creation for sentiment analysis in code-mixed Tamil-English text. CoRR abs/2006.00206 (2020). https://arxiv.org/abs/2006.00206
Chakravarthi, B.R., Priyadharshini, R., Kumar M, A., Krishnamurthy, P., Sherly, E.: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages. Assoc. Comput. Linguist., Kyiv (2021). https://aclanthology.org/2021.dravidianlangtech-1.0
Chakravarthi, B.R., etal.: Findings of the sentiment analysis of dravidian languages in code-mixed text. CoRR abs/2111.09811 (2021), https://arxiv.org/abs/2111.09811
Chakravarthi, B.R., Rani, P., Arcan, M., McCrae, J.P.: A survey of orthographic information in machine translation. arXiv e-prints pp. arXiv-2008 (2020)
Google Scholar
Chandrakanth, D., Anand Kumar, M., Gunasekaran, S.: Part-of-speech tagging for Tamil language. Proc. Int. J. Commun. Eng. 6(6), 1 (2012)
Google Scholar
Chellamuthu, K.: Russian to Tamil machine translation system at Tamil university. In: Proceedings of Tamil Internet 2002 Conference. http://infitt.org/ti2002/papers/16CHELLA. pdf) (2002)
Chinnuswamy, P., Krishnamoorthy, S.G.: Recognition of handprinted Tamil characters. Pattern Recogn. 12(3), 141–152 (1980)
Article Google Scholar
Cruz, W.: Parsing and generation of Tamil verbs in GSMORPH. Unpublished M.Phil. Dissertation (2002)
Google Scholar
Darbari, H., et al.: Enabling linguistic idiosyncrasy in anuvadaksh. Vishwabharat, July-Dec (2013)
Google Scholar
Deepa, R.A., Rao, R.R.: A novel nearest interest point classifier for offline Tamil handwritten character recognition. Pattern Anal. Appl. 23(1), 199–212 (2020)
Article Google Scholar
Deivasundaram, N., Gopal, A.: Computational morphology of Tamil. Word Structure in Dravidian, Kuppam: Dravidian University, pp. 406–410 (2003)
Google Scholar
Devi, G.R., Kumar, M.A., Soman, K.: Extraction of named entities from social media text in Tamil language using n-gram embedding for disaster management. In: Studies in Computational Intelligence, pp. 207–223 (2020)
Google Scholar
Devi, S.L., Pralayankar, P., Menaka, S., Bakiyavathi, T., Ram, R.V.S., Kavitha, V.: Verb transfer in a Tamil to Hindi machine translation system. In: 2010 International Conference on Asian Language Processing, pp. 261–264. IEEE (2010)
Google Scholar
Devi, S.L., Ram, V.S., Rao, P.R.: Anaphora resolution system for Indian languages. In: Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation (WILDRE). LREC2014, Reykjavik, Iceland (2014)
Google Scholar
Devi, S.L., Ram, V.S., Rao, P.R.: A generic anaphora resolution engine for Indian languages. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1824–1833 (2014)
Google Scholar
Dhanalakshmi, V., Kumar, A.M., Rajendran, S., Soman, K.: POS tagger and chunker for Tamil language. In: Proceedings of the 8th Tamil Internet Conference. Cologne, Germany (2009)
Google Scholar
Dhanalakshmi, V., Kumar, A.M., Soman, K., Rajendran, S.: Chunker for Tamil using machine learning. In: 7th International Conference on Natural Language Processing 2009 (ICON 2009), IIIT Hyderabad, India (2009)
Google Scholar
Dhanalakshmi, V., Padmavathy, P., Soman, K., Rajendran, S.: Chunker for Tamil. In: 2009 International Conference on Advances in Recent Technologies in Communication and Computing, pp. 436–438. IEEE (2009)
Google Scholar
Dhanalakshmi V, Anand Kumar M, Murugesan, C.: Dependency parser for Tamil classical literature: kurunthokai. In: Proceedings of Tamil Internet Conference, pp. 147–152 (2012)
Google Scholar
Dhivya, R., Dhanalakshmi, V., Anand Kumar, M., Soman, K.P.: Clause Boundary Identification for Tamil Language Using Dependency Parsing. In: Das, V.V., Ariwa, E., Rahayu, S.B. (eds.) SPIT 2011. LNICST, vol. 62, pp. 195–197. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32573-1_32
Chapter Google Scholar
Dhivyaa, C., Nithya, K., Janani, T., Kumar, K.S., Prashanth, N.: Transliteration based generative pre-trained transformer 2 model for Tamil text summarization. In: 2022 International Conference on Computer Communication and Informatics (ICCCI), p. 1–6. IEEE (2022)
Google Scholar
Evangeline, M.M., Shyamala, K., Barathi, L., Sandhya, R.: Frequency Based Feature Extraction Technique for Text Documents in Tamil Language. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds.) ICACDS 2021. CCIS, vol. 1441, pp. 76–84. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88244-0_8
Chapter Google Scholar
Ezhilarasi, S., Maheswari, P.U.: Depicting a neural model for lemmatization and POS tagging of words from PALAEO graphic stone inscriptions. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1879–1884. IEEE (2021)
Google Scholar
Fernando, A., Ranathunga, S., Dias, G.: Data augmentation and terminology integration for domain-specific Sinhala-English-Tamil statistical machine translation. (2020) arXiv preprint arXiv:2011.02821
Ganesan, M.: Functions of the morphological analyser developed at CIIL, Mysore. In: Automatic Automatic Translation (seminar proceedings), Thiruvananthapuram: ISDL (1994)
Google Scholar
Ganesan, M.: Computational morphology of Tamil. Word Structure in Dravidian, Kuppam: Dravidian University, pp. 399–405 (2003)
Google Scholar
Ganesan, M., Ekka, F.: Morphological analyzer for Indian languages. Information Technology Applications in Language, Script and Speech, New Delhi: BPB Publication (1994)
Google Scholar
Ganesh, J., Parthasarathi, R., Geetha, T.V., Balaji, J.: Pattern Based Bootstrapping Technique for Tamil POS Tagging. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds.) MIKE 2014. LNCS (LNAI), vol. 8891, pp. 256–267. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13817-6_25
Chapter Google Scholar
Ganganwar, V., Rajalakshmi, R.: MTDOT: A multilingual translation-based data augmentation technique for offensive content identification in Tamil text data. Electronics 11(21), 3574 (2022)
Article Google Scholar
HandWiki: Tamil_all_character_encoding (2020)
Google Scholar
Hariharan, V., Kumar, M.A., Soman, K.: Named entity recognition in Tamil language using recurrent based sequence model. In: Lecture Notes in Networks and Systems, 74 (2019)
Google Scholar
Jain, M., Punia, R., Hooda, I.: Neural machine translation for Tamil to English. J. Stat. Manage. Syst. 23(7), 1251–1264 (2020)
Google Scholar
Kalamani, M., Krishnamoorthi, M., Valarmathi, R.: Continuous Tamil speech recognition technique under non stationary noisy environments. Int. J. Speech Technol. 22(1), 47–58 (2019)
Article Google Scholar
Kamakshi, S., Rajendren, S.: Preliminaries to the preparation of a machine aid to translate linguistics texts written in English into Tamil. Language in India 3 (2004)
Google Scholar
Kannan, R.R., Rajalakshmi, R., Kumar, L.: Indic-BERT based approach for sentiment analysis on code-mixed Tamil tweets (2021)
Google Scholar
Kausikaa, N., Uma, V.: Sentiment analysis of English and Tamil tweets using path length similarity based word sense disambiguation. Int. Organ. Sci. Res. J. 1, 82–89 (2016)
Google Scholar
Kavirajan, B., Kumar, M.A., Soman, K., Rajendran, S., Vaithehi, S.: Improving the rule based machine translation system using sentence simplification (English to Tamil). In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 957–963. IEEE (2017)
Google Scholar
Kohilavani, S., Mala, T., Geetha, T.: Automatic Tamil content generation. In: 2009 International Conference on Intelligent Agent Multi-Agent Systems, p. 1–6. IEEE (2009)
Google Scholar
Krishnamurthy, P.: Development of Telugu-Tamil transfer-based machine translation system: an improvisation using divergence index. J. Intell. Syst. 28(3), 493–504 (2019)
Google Scholar
Krishnamurthy, P., Sarveswaran, K.: Towards building a modern written tamil treebank. In: Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), pp. 61–68 (2021)
Google Scholar
Krishnan, A.S., Ragavan, S.: Morphology-aware meta-embeddings for Tamil. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 94–111 (2021)
Google Scholar
Krishnan, K.G., Pooja, A., Kumar, M.A., Soman, K.: Character based bidirectional LSTM for disambiguating Tamil part-of-speech categories. Int. J. Control Theory Appl 10, 229–235 (2017)
Google Scholar
kumar, A.M., Soman, K.: Amrita_cen@ fire-2014: morpheme extraction and lemmatization for Tamil using machine learning. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 112–120 (2014)
Google Scholar
Kumar, M.A., Dhanalakshmi, V., Soman, K., Rajendran, S.: Factored statistical machine translation system for English to Tamil language. Pertanika J. Soc. Sci. Humanit. 22(4) (2014)
Google Scholar
Kumar, M.A., Premjith, B., Singh, S., Rajendran, S., Soman, K.P.: An overview of the shared task on machine translation in Indian languages (MTIL)–2017. Journal of Intelligent Systems 28(3), 455–464 (2019). https://doi.org/10.1515/jisys-2018-0024 https://doi.org/10.1515/jisys-2018-0024
Kumar, M.A., Premjith, B., Singh, S., Rajendran, S., Soman, K.: An overview of the shared task on machine translation in Indian languages (MTIL)-2017. J. Intell. Syst. 28(3), 455–464 (2019)
Google Scholar
Kumar, M.A., Rajendran, S., Soman, K.: Cross-lingual preposition disambiguation for machine translation. Procedia Comput. Sci. 54, 291–300 (2015)
Article Google Scholar
Kumar, M.A., Rajendran, S., Soman, K.: Cross-lingual preposition disambiguation for machine translation. Procedia Comput. Sci. 54, 291–300 (2015)
Article Google Scholar
Anand Kumar, M., Singh, S., Kavirajan, B., Soman, K.P.: Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds.) FIRE 2016. LNCS, vol. 10478, pp. 128–140. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73606-8_10
Chapter Google Scholar
LekshmiAmmal, H., Ravikiran, M., et al.: Nitk-it_nlp@ tamilnlp-acl2022: Transformer based model for toxic span identification in Tamil. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 75–78 (2022)
Google Scholar
Lokesh, S., Kumar, P.M., Devi, M.R., Parthasarathy, P., Gokulnath, C.: An automatic Tamil speech recognition system by using bidirectional recurrent neural network with self-organizing map. Neural Comput. Appl. 31(5), 1521–1531 (2019)
Article Google Scholar
Lushanthan, S., Weerasinghe, A., Herath, D.: Morphological analyzer and generator for tamil language. In: 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTER), pp. 190–196. IEEE (2014)
Google Scholar
Anandkumar, M.: Morphology based prototype statistical machine translation system for English to Tamil language. Unpublished PhD Thesis (2013)
Google Scholar
Malarkodi, C., Lex, E., Devi, S.L.: Named entity recognition for the agricultural domain. Res. Comput. Sci. 117, 121–132 (2016)
Article Google Scholar
Malarkodi, C., Sobha, L.: Twitter named entity recognition for Indian languages. In: Proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing (2018)
Google Scholar
Manone, V., Soman, K., Rajendran, S.: A synchronous syntax for English-Tamil language pair for machine translation. In: 4th International Symposium on Natural Language Processing (NLP’15), Kochi, Kerala, Co-affiliated with 4th International Conference in Computing, Communications and Informatics (ICACCI-2015) (2015)
Google Scholar
Marimuthu, K., Amudha, K., Bakiyavathi, T., Devi, S.L.: Word boundary identifier as a catalyzer and performance booster for Tamil morphological analyzer. In: Proceedings of 6th Language and Technology Conference, Human Language Technologies as a challenge for Computer Science and Linguistics, Poznan, Poland. (2013)
Google Scholar
Menaka, S., Malarkodi, C., Devi, S.L.: A deep study on causal relations and its automatic identification in tamil. In: Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation. LREC2014, Reykjavik, Iceland (2014)
Google Scholar
Menaka, S., Ram, V.S., Devi, S.L.: Morphological generator for Tamil. Proceedings of the Knowledge Sharing event on Morphological Analysers and Generators, LDC-IL, Mysore, India, pp. 82–96 (2010)
Google Scholar
Menon, D.A., Saravanan, S., Loganathan, R., Soman, D.K.: Amrita morph analyzer and generator for Tamil: a rule based approach. In: Proceedings of Tamil Internet Conference, pp. 239–243 (2009)
Google Scholar
Mokanarangan, T., et al.: Tamil Morphological Analyzer Using Support Vector Machines. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds.) NLDB 2016. LNCS, vol. 9612, pp. 15–23. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41754-7_2
Chapter Google Scholar
Mrinalini, K., Nagarajan, T., Vijayalakshmi, P.: Pause-based phrase extraction and effective OOV handling for low-resource machine translation systems. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18(2), 1–22 (2018)
Google Scholar
Padmamala, R., Prema, V.: Sentiment analysis of online Tamil contents using recursive neural network models approach for Tamil language. In: 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 28–31. IEEE (2017)
Google Scholar
Pandian, S.L., Geetha, T. V.: CRF Models for Tamil Part of Speech Tagging and Chunking. In: Li, W., Mollá-Aliod, D. (eds.) ICCPOL 2009. LNCS (LNAI), vol. 5459, pp. 11–22. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00831-3_2
Chapter Google Scholar
Pattabhi, R., Rao, T., Ram, R.V.S., Vijayakrishna, R., Sobha, L.: A text chunker and hybrid pos tagger for indian languages. In: Proceedings of International Joint Conference on Artificial Intelligence Workshop on Shallow Parsing for South Asian Languages, IIIT Hyderabad, Hyderabad, India (2007)
Google Scholar
Pattabhi, R., Sobha, L.: Identifying similar and co-referring documents across languages. In: Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies, pp. 10–17 (2008)
Google Scholar
Pilar, B., et al.: Subword dictionary learning and segmentation techniques for automatic speech recognition in Tamil and Kannada. (2022) arXiv preprint arXiv:2207.13331
Premjith, B., Soman, K.: Deep learning approach for the morphological synthesis in Malayalam and Tamil at the character level. Trans. Asian Low-Resource Lang. Inf. Proc. 20(6), 1–17 (2021)
Article Google Scholar
Priyadharshini, R., et al.: Overview of abusive comment detection in Tamil-ACL 2022. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 292–298 (2022)
Google Scholar
Raj, M.A.R., Abirami, S.: Junction point elimination based Tamil handwritten character recognition: An experimental analysis. J. Syst. Sci. Syst. Eng. 29(1), 100–123 (2020)
Article Google Scholar
Raj, M.A.R., Abirami, S.: Structural representation-based off-line Tamil handwritten character recognition. Soft. Comput. 24(2), 1447–1472 (2020)
Article Google Scholar
Rajalakshmi, R., Duraphe, A., Shibani, A.: Dlrg@ dravidianlangtech-acl2022: Abusive comment detection in Tamil using multilingual transformer models. In: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pp. 207–213 (2022)
Google Scholar
Rajalakshmi, R., Reddy, Y., Kumar, L.: Dlrg@ dravidianlangtech-eacl2021: Transformer based approachfor offensive language identification on code-mixed Tamil. In: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pp. 357–362 (2021)
Google Scholar
Rajalakshmi, R., Selvaraj, S., Vasudevan, P., et al.: Hottest: Hate and offensive content identification in Tamil using transformers and enhanced stemming. Computer Speech Language, p. 101464 (2022)
Google Scholar
Rajasekar, M., Geetha, A.: Comparison of Machine Learning Methods for Tamil Morphological Analyzer. In: Raj, J.S., Palanisamy, R., Perikos, I., Shi, Y. (eds.) Intelligent Sustainable Systems. LNNS, vol. 213, pp. 385–399. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-2422-3_31
Chapter Google Scholar
Rajendran, S.: Spell and grammar checker for tamil. In: Paper read in 27th All India Conference of Dravidian Linguists held in ISDL, Thiruvananthapuram. 17 (1999)
Google Scholar
Rajendran, S.: Preliminaries to the preparation of a word net for Tamil. Lang. India 2(1), 467–497 (2002)
Google Scholar
Rajendran, S.: Parsing in Tamil: Present state of art. Lang. India 6, 8 (2006)
Google Scholar
Rajendran, S.: Complexity of Tamil in POS tagging. Lang. India 7(1) (2007)
Google Scholar
Rajendran, S.: Resolution of lexical ambiguity in Tamil. Lang. India 14(1) (2014)
Google Scholar
Rajendran, S., Kumar, M.A.: Computing tools for Tamil language teaching and learning. In: 17th Tamil Internet Conference. Tamil Agricultural University, Coimbatore (2018)
Google Scholar
Rajendran, S., Viswanathan, S., Kumar, R.: Computational morphology of Tamil verbal complex. Lang. India 3(4) (2003)
Google Scholar
Rajkumar, N., Subashini, T., Rajan, K., Ramalingam, V.: An efficient feature extraction with bidirectional long short term memory based deep learning model for Tamil document classification. J. Comput. Theor. Nanosci. 18(3), 568–585 (2021)
Google Scholar
Ram, R.V.S., Lalitha Devi, S.: Clause Boundary Identification Using Conditional Random Fields. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 140–150. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78135-6_13
Chapter Google Scholar
Ram, R.V.S., Devi, S.L.: Coreference resolution using tree-CRF. A. Gelbukh (ed), Comput. Linguist. Intell. Text Proc. 7181, 285–296 (2012)
Google Scholar
Ram, R.V.S., Devi, S.L.: Pronominal resolution in Tamil using tree CRFS. In: 2013 International Conference on Asian Language Processing, pp. 197–200. IEEE (2013)
Google Scholar
Ram, R.V.S., Devi, S.L.: Two layer machine learning approach for mining referential entities for a morphologically rich language. Asian J. Inf. Technol. 15, 2831–2838 (2016)
Google Scholar
Ram, R.V.S., Sobha, L.D.: Tamil clause boundary identification: Annotation and evaluation. In: Workshop on Indian Language and Data: Resources and Evaluation. p. 122. LREC, Istanbul (2012)
Google Scholar
Ram, R., Devi, S.L.: Noun phrase chunker using finite state automata for an agglutinative language. In: Proceedings of the Tamil Internet-2010 at Coimbatore, India, pp. 23–27 (2010)
Google Scholar
Ram, V.S., Menaka, S., Devi, S.L.: Tamil morphological analyser. In: Proceedings of the Knowledge Sharing event on Morphological Analysers and Generators, Mona Parakh, LDC-IL, Mysore, India, pp. 1–18 (2010)
Google Scholar
Ramakrishnan, A., Kaushik, L.N., Narayana, L.: Natural language processing for Tamil TTS. In: Proc. 3rd Language and Technology Conference, Poznan, Poland, pp. 192–196 (2007)
Google Scholar
Ramanathan, V., Meyyappan, T., Thamarai, S.: Predicting Tamil movies sentimental reviews using Tamil tweets. J. Comput. Sci. 15(11), 1638–1647 (2019)
Article Google Scholar
Ramanathan, V., Meyyappan, T., Thamarai, S.: Sentiment analysis: an approach for analysing tamil movie reviews using Tamil tweets. Recent Adv. Mathe. Res. Comput. Sci. 3, 28–39 (2021)
Article Google Scholar
Ramasamy, L., Bojar, O., Žabokrtskỳ, Z.: Morphological processing for English-Tamil statistical machine translation. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages, pp. 113–122 (2012)
Google Scholar
Ramasamy, L., Bojar, O., Žabokrtskỳ, Z.: ENTAM: An English-Tamil parallel corpus (ENTAM v2. 0) (2014)
Google Scholar
Ramaswamy, V.: A morphological generator for Tamil. Unpublished Ph.D. Dissertation (2000)
Google Scholar
Ramaswamy, V.: A morphological analyzer for Tamil. Unpublished Ph.D. Dissertation (2003)
Google Scholar
Ranganathan, V.: A lexical phonological approach to Tamil word by computer. Int. J. Dravidian Linguist. 26(1), 57–70 (1997)
Google Scholar
Ranganathan, V.: Computational Approaches To Tamil Linguistics, chap. 3. CRE-A Publications (2016)
Google Scholar
Ravikiran, M., Annamalai, S.: DOSA: dravidian code-mixed offensive span identification dataset. In: Proceedings of the 1st Workshop on Speech and Language Technologies for Dravidian Languages, pp. 10–17. Assoc. Comput. Linguist., Kyiv (2021). https://aclanthology.org/2021.dravidianlangtech-1.2
Ravikiran, M., et al.: Findings of the shared task on toxic span identification in Tamil. In: Proceedings of the 2nd Workshop on Speech and Language Technologies for Dravidian Languages. Assoc. Comput. Linguist. (2022)
Google Scholar
Remmiya Devi, G., Anand Kumar, M., Soman, K.: Co-occurrence based word representation for extracting named entities in Tamil tweets. J. Intell. Fuzzy Syst. 34(3), 1435–1442 (2018)
Article Google Scholar
Rethanya. V, Dhanalakshmi, V., Soman, M., Rajendran, S.: Morphological stemmer and LEMMATIZER for Tamil. In: Proceedings of 18th Tamil Internet Conference. International Forum for Information Technology in Tamil (INFITT) (2019)
Google Scholar
RK Rao, P., Devi, S.L.: Patent document summarization using conceptual graphs. Int. J. Nat. Lang. Comput. (IJNLC) 6 (2017)
Google Scholar
Sakuntharaj, R., Mahesan, S.: Missing word detection and correction based on context of tamil sentences using n-grams. In: 2021 10th International Conference on Information and Automation for Sustainability (ICIAfS), pp. 42–47. IEEE (2021)
Google Scholar
Samuel Manoharan, J.: A novel text-to-speech synthesis system using syllable-based hmm for Tamil language. In: Shakya, S., Du, K.L., Haoxiang, W. (eds.) Proceedings of Second International Conference on Sustainable Expert Systems, pp. 305–314. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7657-4_26
Sankaralingam, C., Rajendran, S., Kavirajan, B., Kumar, M.A., Soman, K.: Onto-thesaurus for Tamil language: Ontology based intelligent system for information retrieval. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2396–2396. IEEE (2017)
Google Scholar
Santosh Kumar, T.: Word sense disambiguation using semantic web for Tamil to English statistical machine translation. IRA-Int. J. Technol. Eng. 5(2), 22–31 (2016)
Google Scholar
Sarika, M., et al.: Comparative analysis of Tamil and English news text summarization using text rank algorithm. Turkish J. Comput. Mathe. Educ. (TURCOMAT) 12(9), 2385–2391 (2021)
Google Scholar
Sarveswaran, K., Dias, G.: ThamizhiUDP: A dependency parser for Tamil. (2020) arXiv preprint arXiv:2012.13436
Sarveswaran, K., Dias, G.: Building a part of speech tagger for the Tamil language. In: 2021 International Conference on Asian Language Processing (IALP), pp. 286–291. IEEE (2021)
Sarveswaran, K., Dias, G., Butt, M.: ThamizhiFST: A morphological analyser and generator for Tamil verbs. In: 2018 3rd International Conference on Information Technology Research (ICITR), pp. 1–6. IEEE (2018)
Sarveswaran, K., Dias, G., Butt, M.: ThamizhiMorph: a morphological parser for the Tamil language. Mach. Transl. 35(1), 37–70 (2021)
Selvi, S.S., Anitha, R.: Bilingual corpus-based hybrid POS tagger for low resource Tamil language: A statistical approach. J. Intell. Fuzzy Syst., 1–20 (2022)
Sivasankar, E., Krishnakumari, K., Balasubramanian, P.: An enhanced sentiment dictionary for domain adaptation with multi-domain dataset in Tamil language (ESD-da). Soft. Comput. 25(5), 3697–3711 (2021)
Sobha, L.: Pronominal resolution in South Dravidian languages. 23rd South Asian Language Analysis, University of Texas, Austin 446 (2003)
Sridhar, R., Janani, V., Gowrisankar, R., Monica, G.: Language relationship model for automatic generation of Tamil stories from hints. Int. J. Intell. Inf. Technol. (IJIIT) 13(2), 21–40 (2017)
Subramoniam, V., Bhattacharya, M., Lohy, A., Tarai, S.: Speech synthesis (Tamil oriya): an application for the blind. Department of Science and Technology, Govt. of India III.5(35) 2001-ET (2001)
Suriyah, M., Anandan, A., Narasimhan, A., Karky, M.: Piripori: morphological analyser for Tamil. In: International Conference on Artificial Intelligence, Smart Grid and Smart City Applications, pp. 801–809. Springer (2019). https://doi.org/10.1007/978-3-030-24051-6_75
Thangarajan, R., Natarajan, A.: Syllable based continuous speech recognition for Tamil. South Asian Lang. Rev. 18(1), 72–85 (2008)
Thangarajan, R., Natarajan, A., Selvam, M.: Word and triphone based approaches in continuous speech recognition for Tamil language. WSEAS Trans. Signal Proc. 4(3), 76–86 (2008)
Thangarasu, M., Manavalan, R.: Stemmers for Tamil language: performance analysis. (2013) arXiv preprint arXiv:1310.0754
Thenmozhi, D., Aravindan, C.: Ontology-based Tamil-English cross-lingual information retrieval system. Sadhana - Academy Proc. Eng. Sci. 43(10), 1–14 (2018)
Vasantharajan, C., Thayasivam, U.: Towards offensive language identification for Tamil code-mixed YouTube comments and posts. SN Computer Science 3(1), 1–13 (2022)
Vel, S.S.: Pre-processing techniques of text mining using computational linguistics and python libraries. In: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS). pp. 879–884. IEEE (2021)
Vignesh, N., Sowmya, S.: Automatic question generator in Tamil. Int. J. Eng. Res. Technol. (IJERT) 2 (2013)
Vijayakrishna, R., Sobha, L.: Domain focused named entity recognizer for Tamil using conditional random fields. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages (2008)
Visuwalingam, H., Sakuntharaj, R., Ragel, R.G.: Part of speech tagging for Tamil language using deep learning. In: 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), pp. 157–161. IEEE (2021)
Viswanathan, S.: Tamil morphological analyser. Unpublished MS Thesis (2000)
Viswanathan, S., Ramesh Kumar, S., Kumara Shanmugam, B., Arulmozi, S., Vijay Shanker, K.: A Tamil morphological analyser. In: Proceedings of the International Conference on Natural Language Processing (ICON), CIIL, Mysore, India (2003)
Zhang, H., Shi, K., Chen, N.F.: Multilingual speech evaluation: Case studies on English, Malay and Tamil. (2021) arXiv preprint arXiv:2107.03675

Acknowledgments

This paper is based on the white paper designed on META-NET. We express our gratitude to Dr. George Rehm (Head of META-NET), Prof. Joseph Mariani and Prof. Girish Nath Jha from the School of Sanskrit and Indic Studies for their support in developing the white paper through close interactions. The complete resource links are available in the Tamil data portal, https://sites.google.com/view/tamilnlp.

Author information

Authors and Affiliations

CEN, Amrita Vishwa Vidyapeetham, Coimbatore, India
S. Rajendran & K P Soman
National Institute of Technology Karnataka, Surathkal, India
M. Anand Kumar
Vellore Institute of Technology, Chennai, India
Ratnavel Rajalakshmi
Pondicherry University, Pondicherry, India
V. Dhanalakshmi
National Institute of Technology, Tiruchirappalli, India
P. Balasubramanian

Corresponding author

Correspondence to M. Anand Kumar.

Editor information

Editors and Affiliations

National Institute of Technology Karnataka, Mangalore, India
Anand Kumar M
National University of Ireland, Galway, Ireland
Bharathi Raja Chakravarthi
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
Bharathi B
National University of Ireland, Galway, Ireland
Colm O’Riordan
Indian Institute of Technology Madras, Chennai, India
Hema Murthy
Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, India
Thenmozhi Durairaj
University of Hildesheim, Hildesheim, Germany
Thomas Mandl

About this paper

Cite this paper

Rajendran, S., Anand Kumar, M., Rajalakshmi, R., Dhanalakshmi, V., Balasubramanian, P., Soman, K.P. (2023). Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_6

DOI: https://doi.org/10.1007/978-3-031-33231-9_6
Published: 29 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33230-2
Online ISBN: 978-3-031-33231-9
eBook Packages: Computer Science, Computer Science (R0)