Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

Korzeniowski, Renard; Rolczyński, Rafał; Sadownik, Przemysław; Korbak, Tomasz; Możejko, Marcin

arXiv:1906.09325 (cs)

[Submitted on 17 Jun 2019]

Abstract: This paper presents our contribution to PolEval 2019 Task 6: Hate speech and bullying detection. We describe three parallel approaches that we followed: fine-tuning a pre-trained ULMFiT model to our classification task, fine-tuning a pre-trained BERT model to our classification task, and using the TPOT library to find the optimal pipeline. We present results achieved by these three tools and review their advantages and disadvantages in terms of user experience. Our team placed second in subtask 2 with a shallow model found by TPOT: a logistic regression classifier with non-trivial feature engineering.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1906.09325 [cs.CL]
	(or arXiv:1906.09325v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.09325
Journal reference:	Proceedings of the PolEval 2019 Workshop

Submission history

From: Tomek Korbak
[v1] Mon, 17 Jun 2019 13:11:26 UTC (9 KB)
