Meta-evaluation of comparability metrics using parallel corpora

Babych, Bogdan; Hartley, Anthony

arXiv:1404.3759 (cs)

[Submitted on 14 Apr 2014]

Abstract: Metrics for measuring the comparability of corpora or texts need to be developed and evaluated systematically. Corpus-based applications, such as training statistical MT systems for specialised narrow domains, require a reasonable balance between the size of the corpus and its consistency, with controlled and benchmarked levels of comparability for any newly added sections. In this article we propose a method for meta-evaluating comparability metrics by calculating monolingual comparability scores separately on the 'source' and 'target' sides of parallel corpora. The range of scores on the source side is then correlated (using Pearson's r coefficient) with the range of scores on the target side: the higher the correlation, the more reliable the metric. The intuition is that a good metric should yield the same distances between different domains in different languages. Our method gives consistent results for the same metrics on different data sets, which indicates that it is reliable and can be used for comparing metrics or for optimising the settings of parametrised metrics.
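The meta-evaluation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name a specific comparability metric, so `cosine_comparability` (cosine similarity of word-frequency vectors) is a hypothetical stand-in. It is also assumed that the parallel corpus is split into domain sections, where `source_sections[i]` and `target_sections[i]` are translations of each other, and that the "range of scores" on each side is the list of pairwise scores between sections. The function names `cosine_comparability`, `pearson_r` and `meta_evaluate` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence


def cosine_comparability(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in metric: cosine similarity of word-frequency vectors."""
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson's r is undefined when one side's scores are constant")
    return cov / math.sqrt(var_x * var_y)


def meta_evaluate(
    metric: Callable[[str, str], float],
    source_sections: Sequence[str],
    target_sections: Sequence[str],
) -> float:
    """Correlate monolingual pairwise scores on the source and target sides.

    Assumes source_sections[i] and target_sections[i] are parallel texts.
    A higher r suggests the metric measures inter-domain distances
    consistently across the two languages.
    """
    if len(source_sections) != len(target_sections):
        raise ValueError("source and target must have the same number of sections")
    source_scores, target_scores = [], []
    # Score every pair of sections separately within each language.
    for i, j in combinations(range(len(source_sections)), 2):
        source_scores.append(metric(source_sections[i], source_sections[j]))
        target_scores.append(metric(target_sections[i], target_sections[j]))
    return pearson_r(source_scores, target_scores)
```

With this toy metric, a "target" side obtained by consistently renaming every word preserves each section's frequency profile, so the pairwise scores on both sides are identical and r is 1 (up to floating-point error). Real translations change word distributions, so r would be expected to fall below 1, and metrics can then be compared by how close they stay to it.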

Comments:	10 pages, 3 figures, 12th International Conference on Intelligent Text Processing and Computational Linguistics CICLing 2011. February 20 to 26, 2011, Tokyo, Japan. International Journal of Computational Linguistics and Applications, Proceedings volume of CICLing-2011
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1404.3759 [cs.CL]
	(or arXiv:1404.3759v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1404.3759

Submission history

From: Bogdan Babych
[v1] Mon, 14 Apr 2014 21:33:42 UTC (227 KB)
