GENEVAL: a proposal for shared-task evaluation in NLG

Authors: Ehud Reiter, Anja Belz

INLG '06: Proceedings of the Fourth International Natural Language Generation Conference

Pages 136 - 138

Published: 15 July 2006

Abstract

We propose to organise a series of shared-task NLG events, in which participants are asked to build systems with similar input/output functionalities, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare different evaluation techniques by correlating the results of different evaluations on the systems entered in the events.
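
The comparison described in the abstract, correlating the results of different evaluations across the entered systems, might be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say which correlation statistic is intended, so Pearson's r is an assumption here, and the function names, technique names, system names, and scores are all hypothetical placeholders rather than results from any event.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlate_techniques(scores):
    """Correlate every pair of evaluation techniques over their shared systems.

    `scores` maps technique name -> {system name -> score}.
    Returns {(technique_a, technique_b): r} for each unordered pair.
    """
    techniques = sorted(scores)
    # Only systems scored by every technique can be compared.
    systems = sorted(set.intersection(*(set(scores[t]) for t in techniques)))
    result = {}
    for i, a in enumerate(techniques):
        for b in techniques[i + 1:]:
            result[(a, b)] = pearson(
                [scores[a][s] for s in systems],
                [scores[b][s] for s in systems],
            )
    return result


if __name__ == "__main__":
    # Hypothetical placeholder scores, invented purely for illustration.
    example = {
        "automatic_metric": {"sysA": 0.41, "sysB": 0.55, "sysC": 0.38},
        "human_rating": {"sysA": 3.2, "sysB": 4.1, "sysC": 3.0},
    }
    for pair, r in correlate_techniques(example).items():
        print(pair, round(r, 3))
```

With only a handful of systems, a rank correlation may be a more robust choice than Pearson's r; that substitution is left to the reader.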

Published In

INLG '06: Proceedings of the Fourth International Natural Language Generation Conference

July 2006, 132 pages

ISBN: 1932432728

Program Chairs: Nathalie Colineau (CSIRO - ICT Centre, Australia), Cécile Paris (CSIRO - ICT Centre, Australia), Stephen Wan (CSIRO - ICT Centre and Macquarie University, Australia), Robert Dale (Macquarie University, Australia)

Publisher: Association for Computational Linguistics, United States
