Measuring machine translation quality as semantic equivalence: A metric based on entailment features

Authors:

Sebastian Padó,

Christopher D. Manning

Machine Translation, Volume 23, Issue 2-3

Pages 181 - 193

https://doi.org/10.1007/s10590-009-9060-y

Published: 01 September 2009

Abstract

Current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations. We believe the main problem to be their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that assesses the quality of MT output through its semantic equivalence to the reference translation, based on a rich set of match and mismatch features motivated by textual entailment. We first evaluate this metric in an evaluation setting against a combination metric of four state-of-the-art scores. Our metric predicts human judgments better than the combination metric. Combining the entailment and traditional features yields further improvements. Then, we demonstrate that the entailment metric can also be used as a learning criterion in minimum error rate training (MERT) to improve parameter estimation in MT system training. A manual evaluation of the resulting translations indicates that the new model obtains a significant improvement in translation quality.



Published In

Machine Translation Volume 23, Issue 2-3

September 2009

120 pages

ISSN:0922-6567

Copyright © 2010 Springer Science+Business Media B.V.

Publisher

Kluwer Academic Publishers

United States
