Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set

Ionescu, Radu Tudor; Butnaru, Andrei M.


arXiv:1808.08409 (cs)

[Submitted on 25 Aug 2018 (v1), last revised 31 Aug 2018 (this version, v2)]

Abstract: Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as Arabic dialect identification or native language identification. In this paper, we apply two simple yet effective transductive learning approaches to further improve the results of string kernels. The first approach interprets the pairwise string kernel similarities between samples in the training set and samples in the test set as features. The second approach is a simple self-training method based on two learning iterations. In the first iteration, a classifier is trained on the training set and tested on the test set, as usual. In the second iteration, a number of test samples to which the classifier assigned higher confidence scores are added to the training set for another round of training. The ground-truth labels of the added test samples are not needed; instead, we use the labels predicted by the classifier in the first training iteration. By adapting string kernels to the test set, we report significantly better accuracy rates in English polarity classification and Arabic dialect identification.
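The two transductive steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character n-gram intersection kernel, the n-gram range, the mean-similarity classifier, the margin used as a confidence score, the toy texts, and every function and parameter name (`normalized_kernel`, `transductive_features`, `self_train`, `n_added`) are assumptions chosen only to keep the example short and dependency-free.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=2, n_max=3):
    """Count character n-grams of lengths n_min..n_max (range is an assumption)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def intersection_kernel(a, b):
    """Sum, over shared n-grams, of the smaller of the two counts."""
    ca, cb = char_ngrams(a), char_ngrams(b)
    return sum(min(c, cb[g]) for g, c in ca.items() if g in cb)

def normalized_kernel(a, b):
    """Kernel value scaled so that a string compared with itself gives 1."""
    denom = math.sqrt(intersection_kernel(a, a) * intersection_kernel(b, b))
    return intersection_kernel(a, b) / denom if denom else 0.0

def transductive_features(samples, all_samples):
    """Approach 1 (sketch): describe each sample by its kernel similarities
    to every training AND test sample."""
    return [[normalized_kernel(s, t) for t in all_samples] for s in samples]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def predict(train_feats, train_labels, test_feats):
    """Stand-in classifier (an assumption): score each class by the mean dot
    product with its training samples; confidence is the top-two margin."""
    labels = sorted(set(train_labels))
    results = []
    for f in test_feats:
        scores = {}
        for lab in labels:
            sims = [dot(f, g) for g, l in zip(train_feats, train_labels) if l == lab]
            scores[lab] = sum(sims) / len(sims)
        ranked = sorted(scores, key=scores.get, reverse=True)
        runner_up = scores[ranked[1]] if len(ranked) > 1 else 0.0
        results.append((ranked[0], scores[ranked[0]] - runner_up))
    return results

def self_train(train_texts, train_labels, test_texts, n_added=1):
    """Approach 2 (sketch): two learning iterations with predicted labels."""
    all_texts = train_texts + test_texts
    train_feats = transductive_features(train_texts, all_texts)
    test_feats = transductive_features(test_texts, all_texts)

    # Iteration 1: train on the training set, predict the test set as usual.
    first = predict(train_feats, train_labels, test_feats)

    # Select the n_added test samples with the highest confidence scores.
    order = sorted(range(len(test_texts)), key=lambda i: first[i][1], reverse=True)
    chosen = order[:n_added]

    # Iteration 2: add them with their PREDICTED labels, then predict again.
    aug_feats = train_feats + [test_feats[i] for i in chosen]
    aug_labels = train_labels + [first[i][0] for i in chosen]
    second = predict(aug_feats, aug_labels, test_feats)
    return [label for label, _ in second]

# Toy data, invented purely for illustration.
train_texts = ["great movie, loved it", "wonderful and great acting",
               "terrible movie, hated it", "awful and terrible plot"]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["loved the great acting", "hated the awful plot"]

print(self_train(train_texts, train_labels, test_texts, n_added=1))
```

Neither step reads the ground-truth test labels: the test texts contribute only through kernel similarities and through labels predicted in the first iteration. In practice a stronger kernel classifier would replace the mean-similarity stand-in, and `n_added` would be tuned on held-out data.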

Comments:	Accepted at EMNLP 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1808.08409 [cs.CL]
	(or arXiv:1808.08409v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1808.08409

Submission history

From: Radu Tudor Ionescu
[v1] Sat, 25 Aug 2018 11:08:28 UTC (1,013 KB)
[v2] Fri, 31 Aug 2018 13:17:00 UTC (1,013 KB)
