Comparing Approaches to Dravidian Language Identification

Jauhiainen, Tommi; Ranasinghe, Tharindu; Zampieri, Marcos

arXiv:2103.05552 (cs)

[Submitted on 9 Mar 2021]

Abstract: This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing code-mixed text with English and one of the three South Dravidian languages: Kannada, Malayalam, and Tamil. We submitted results generated using two models: a Naive Bayes classifier with adaptive language models, which has been shown to obtain competitive performance in many language and dialect identification tasks, and a transformer-based model, which is widely regarded as the state of the art in a number of NLP tasks. Our first submission was sent in the closed submission track using only the training set provided by the shared task organisers, whereas the second submission is considered to be open as it used a pretrained model trained with external data. Our team attained shared second position in the shared task with the submission based on Naive Bayes. Our results reinforce the idea that deep learning methods are not as competitive in language identification tasks as they are in many other text classification tasks.
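
For readers unfamiliar with the first kind of model, the following is a minimal sketch of a character n-gram Naive Bayes language identifier. It is not the team's system: the adaptive language model component mentioned in the abstract is not implemented, the class name `NgramNaiveBayes`, the helper `char_ngrams`, and the parameters `n_max` and `alpha` are hypothetical, and the training strings and the labels `lang_a` and `lang_b` are synthetic placeholders rather than DLI data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max from lowercased, space-padded text."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing.

    Illustrative only: no adaptive language models, no tuning.
    """

    def __init__(self, n_max=3, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_max)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return log prior plus summed log likelihoods for each label."""
        grams = char_ngrams(text, self.n_max)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counter in self.counts.items():
            total = sum(counter.values())
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log(
                    (counter[gram] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Synthetic placeholder data (assumption): not real comments from the shared task.
model = NgramNaiveBayes().fit(
    ["aaba abab baab", "xyzz zyxy yzzx"],
    ["lang_a", "lang_b"],
)
print(model.predict("abba"))
print(model.predict("zyx"))
```

Character n-grams are a common choice for this kind of task because Romanized, code-mixed text has no fixed spelling, so subword patterns tend to be more stable than whole words. The n-gram range and the smoothing constant here are arbitrary illustration values.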

Comments:	Accepted to VarDial 2021 @ EACL 2021
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2103.05552 [cs.CL]
	(or arXiv:2103.05552v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2103.05552

Submission history

From: Tharindu Ranasinghe
[v1] Tue, 9 Mar 2021 16:58:55 UTC (44 KB)
