Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

Das, Mithun; Pandey, Saurabh Kumar; Sethi, Shivansh; Saha, Punyajoy; Mukherjee, Animesh

arXiv:2402.07262 (cs)

[Submitted on 11 Feb 2024]

Abstract: With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can "counter" the vicious tone of such abusive speech and dilute/ameliorate its rippling effect over the social network. However, most efforts so far have focused primarily on English. To bridge the gap for low-resource languages such as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs are in Hindi. To set up an effective benchmark, we implement several baseline models that consider various interlingual transfer mechanisms with different configurations to generate suitable counterspeech. We observe that the monolingual setup yields the best performance. Further, using synthetic transfer, language models can generate counterspeech to some extent; specifically, we notice that transferability is better when languages belong to the same language family.

Comments:	Accepted to the Findings of the ACL: EACL 2024
Subjects:	Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2402.07262 [cs.CL]
	(or arXiv:2402.07262v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.07262

Submission history

From: Mithun Das
[v1] Sun, 11 Feb 2024 18:09:50 UTC (4,682 KB)
