Abstract
This paper summarizes the NLP-based technological development of the Tamil language. Tamil is one of the Dravidian languages whose research community has seriously pursued technological development, as reflected in the language technology tools built for it and the resources created to support them. Tools and systems have been successfully developed for Tamil speech synthesis and recognition, for the analysis of grammar, semantics and social media text, and for machine translation. Many kinds of research have contributed to this achievement. Similarly, many activities aim to develop resources that facilitate technological development, including the preparation of text corpora (monolingual, parallel and lexical) along with speech data, lexical resources and grammars. What is needed now is to take stock of the achievements made so far, find out where Tamil stands in the arena of technological development, and look ahead to its rapid further development. Computational linguistics in Tamil NLP is gaining more attention, and this work highlights the data sets available for research to encourage further exploration.
Notes
- 1.
- 2.
- 3.
- 4. http://corpora.ciil.org/wordcorpora.htm.
- 5. http://www.emille.lancs.ac.uk/home.htm.
- 6. POS tagged corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.
- 7. http://www.au-kbc.org/nlp/corpusrelease.html.
- 8. http://tdil-dc.in.
- 9. Parallel Corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.
- 10. DPIL corpus is available on request http://www.nlp.amrita.edu/nlpcorpus.html.
Acknowledgments
This paper is based on the white paper designed on the META-NET. We express our gratitude to Dr. George Rehm (Head of the META-NET), Prof. Joseph Mariani, and Prof. Girish Nath Jha from the School of Sanskrit and Indic Studies for their help in developing the white paper through close interactions. The complete resource links are available in the Tamil data portal, https://sites.google.com/view/tamilnlp.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rajendran, S., Anand Kumar, M., Rajalakshmi, R., Dhanalakshmi, V., Balasubramanian, P., Soman, K.P. (2023). Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope. In: M, A.K., et al. Speech and Language Technologies for Low-Resource Languages. SPELLL 2022. Communications in Computer and Information Science, vol 1802. Springer, Cham. https://doi.org/10.1007/978-3-031-33231-9_6
DOI: https://doi.org/10.1007/978-3-031-33231-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33230-2
Online ISBN: 978-3-031-33231-9
eBook Packages: Computer Science, Computer Science (R0)