Abstract
Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto standard metrics, BLEU and NIST, are known to correlate well with human evaluation at the corpus level, but this is not the case at the segment level. As an attempt to overcome these two limitations (the dependence on references and the poor segment-level correlation), we address the evaluation of MT quality as a prediction task: reference-independent features are extracted from the source sentences and their translations, and a quality score is obtained from models produced from training data. We show that this approach yields better correlation with human evaluation than commonly used metrics, even with models trained on different MT systems, language pairs and text domains.
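The approach sketched in the abstract reduces to ordinary supervised regression: featurize each (source, translation) pair without consulting a reference, fit a model on human quality scores, and predict scores for unseen pairs. The paper itself uses SVM regression (LIBSVM) over a rich feature set; the sketch below substitutes plain least squares and three toy length-based features so it runs with no external libraries. All feature choices, sentences and scores are invented for illustration only.

```python
# Reference-free MT quality estimation as a regression task (illustrative sketch).
# The paper uses SVM regression with many features; here we use least squares
# and three toy features. Data and scores below are invented examples.

def features(source: str, translation: str) -> list:
    """Reference-independent features: token counts and their ratio (+ bias)."""
    src_len = len(source.split())
    tgt_len = len(translation.split())
    return [1.0, float(src_len), float(tgt_len), tgt_len / max(src_len, 1)]

def fit(X, y):
    """Ordinary least squares via normal equations and Gaussian elimination."""
    n = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):                      # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, n):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, n):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    w = [0.0] * n                             # back substitution
    for r in range(n - 1, -1, -1):
        w[r] = (xty[r] - sum(xtx[r][c] * w[c] for c in range(r + 1, n))) / xtx[r][r]
    return w

def predict(w, source: str, translation: str) -> float:
    """Quality score for a new (source, translation) pair, no reference needed."""
    return sum(wi * fi for wi, fi in zip(w, features(source, translation)))

# Toy training data: (source, MT output, human quality score on a 1-5 scale).
train = [
    ("the cat sat on the mat", "le chat était assis sur le tapis", 4.5),
    ("hello world", "bonjour le", 2.0),
    ("a quick test", "un test rapide", 4.0),
    ("this is a longer sentence to translate", "ceci est", 1.5),
]
w = fit([features(s, t) for s, t, _ in train], [q for _, _, q in train])
score = predict(w, "good morning", "bonjour")
```

Because no reference translation enters `features`, the trained model can score output from any MT system at the segment level, which is the key contrast with BLEU and NIST drawn in the abstract.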
References
Albrecht J, Hwa R (2007a) A re-examination of machine learning approaches for sentence-level MT evaluation. In: 45th meeting of the association for computational linguistics, Prague, pp 880–887
Albrecht J, Hwa R (2007b) Regression for sentence-level MT evaluation with pseudo references. In: 45th meeting of the association for computational linguistics, Prague, pp 296–303
Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2003) Confidence estimation for machine translation. Technical report. Johns Hopkins University, Baltimore
Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2004) Confidence estimation for machine translation. In: 20th coling, Geneva, pp 315–321
Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: 3rd workshop on statistical machine translation, Columbus, pp 70–106
Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: 4th workshop on statistical machine translation, Athens, pp 1–28
Chang C, Lin C (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1): 37–46
Doddington G (2002) Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Conference on human language technology, San Diego, pp 138–145
Gamon M, Aue A, Smets M (2005) Sentence-level MT evaluation without reference translations: beyond language modeling. In: 10th meeting of the European association for machine translation, Budapest
Gandrabur S, Foster G (2003) Confidence estimation for translation prediction. In: 7th conference on natural language learning, Edmonton, pp 95–102
Gimenez J, Marquez L (2008) A smorgasbord of features for automatic MT evaluation. In: 3rd workshop on statistical machine translation, Columbus, OH, pp 195–198
Joachims T (1999) Making large-scale SVM learning practical. In: Schoelkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods—support vector learning. MIT Press, Cambridge
Johnson H, Sadat F, Foster G, Kuhn R, Simard M, Joanis E, Larkin S (2006) Portage: with smoothed phrase tables and segment choice models. In: Workshop on statistical machine translation, New York, pp 134–137
Kääriäinen M (2009) Sinuhe—statistical machine translation using a globally trained conditional exponential family translation model. In: Conference on empirical methods in natural language processing, Singapore, pp 1027–1036
Kadri Y, Nie JY (2006) Improving query translation with confidence estimation for cross language information retrieval. In: 15th ACM international conference on information and knowledge management, Arlington, pp 818–819
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Conference on empirical methods in natural language processing, Barcelona
Lavie A, Agarwal A (2007) METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: 2nd workshop on statistical machine translation, Prague, Czech Republic, pp 228–231
Lin CY, Och FJ (2004) ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In: Coling-2004, Geneva, pp 501–507
Pado S, Galley M, Jurafsky D, Manning CD (2009) Textual entailment features for machine translation evaluation. In: 4th workshop on statistical machine translation, Athens, pp 37–41
Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th meeting of the association for computational linguistics, Philadelphia, pp 311–318
Quirk CB (2004) Training a sentence-level machine translation confidence measure. In: 4th language resources and evaluation conference, Lisbon, pp 825–828
Saunders C (2008) Application of Markov approaches to SMT. Technical report. SMART Project Deliverable 2.2
Simard M, Cancedda N, Cavestro B, Dymetman M, Gaussier E, Goutte C, Yamada K (2005) Translating with non-contiguous phrases. In: Conference on empirical methods in natural language processing, Vancouver, pp 755–762
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: 7th conference of the association for machine translation in the Americas, Cambridge, MA, pp 223–231
Specia L, Turchi M, Cancedda N, Dymetman M, Cristianini N (2009) Estimating the sentence-level quality of machine translation systems. In: 13th meeting of the European association for machine translation, Barcelona
Ueffing N, Ney H (2005) Application of word-level confidence measures in interactive statistical machine translation. In: 10th meeting of the European association for machine translation, Budapest, pp 262–270
Additional information
Lucia Specia—Work developed while working at the Xerox Research Centre Europe, France.
Dhwaj Raj—Work developed during an internship at the Xerox Research Centre Europe, France.
Marco Turchi—Work developed while working at the Department of Engineering Mathematics, University of Bristol, UK.
Cite this article
Specia, L., Raj, D. & Turchi, M. Machine translation evaluation versus quality estimation. Machine Translation 24, 39–50 (2010). https://doi.org/10.1007/s10590-010-9077-2