research-article

Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments

Authors: Alon Lavie, Abhaya Agarwal

StatMT '07: Proceedings of the Second Workshop on Statistical Machine Translation

Pages 228 - 231

Published: 23 June 2007

Abstract

Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.
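
The abstract does not spell out the scoring formula. As a rough illustration only, the sketch below follows the commonly described Meteor formulation: a parameterized harmonic mean of unigram precision and recall, discounted by a fragmentation penalty based on the number of contiguous matched chunks. The function name, argument names, and the default values for alpha, beta, and gamma are assumptions chosen for illustration, not the tuned parameters of the release described here, and the word-alignment step that produces the match and chunk counts is not shown.

```python
def meteor_style_score(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """Sketch of a Meteor-style segment score (hypothetical helper).

    matches : number of hypothesis unigrams aligned to reference unigrams
    hyp_len : number of unigrams in the MT hypothesis
    ref_len : number of unigrams in the reference translation
    chunks  : number of contiguous, identically ordered runs of matches
    alpha, beta, gamma : illustrative defaults (assumed), not tuned values
    """
    if hyp_len <= 0 or ref_len <= 0:
        raise ValueError("hypothesis and reference must be non-empty")
    if matches == 0:
        return 0.0

    precision = matches / hyp_len
    recall = matches / ref_len

    # Parameterized harmonic mean; alpha close to 1 weights recall more heavily.
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)

    # Fragmentation penalty: fewer, longer chunks mean better word order.
    fragmentation = chunks / matches
    penalty = gamma * (fragmentation ** beta)

    return (1 - penalty) * fmean


if __name__ == "__main__":
    # Four of five hypothesis words match a four-word reference in one chunk.
    print(meteor_style_score(matches=4, hyp_len=5, ref_len=4, chunks=1))
```

With the same match counts, a hypothesis whose matches are split across more chunks receives a lower score, which is how this style of metric accounts for word order without counting higher-order n-grams.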

Published In: StatMT '07: Proceedings of the Second Workshop on Statistical Machine Translation, June 2007, 281 pages

Program Chairs: Chris Callison-Burch (Johns Hopkins University), Philipp Koehn (University of Edinburgh), Christof Monz (Queen Mary, University of London), Cameron Shaw Fordyce (Center for the Evaluation of Language and Communication Technologies)

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

StatMT '07 Paper Acceptance Rate 12 of 38 submissions, 32%;

Overall Acceptance Rate 24 of 59 submissions, 41%
