DOI: 10.5555/1706269.1706298

GENEVAL: a proposal for shared-task evaluation in NLG

Published: 15 July 2006

Abstract

We propose to organise a series of shared-task NLG events, in which participants are asked to build systems with similar input/output functionalities, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare different evaluation techniques by correlating the results of the different evaluations on the systems entered in the events.
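
As an illustration of what correlating the results of two evaluations could look like, the following is a minimal sketch and not the procedure the authors specify. It assumes each evaluation technique yields one numeric score per system; the function names and the example scores are hypothetical and invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical scores for four systems (made-up numbers, not real results).
human_ratings = [3.0, 4.0, 2.0, 5.0]         # e.g. mean human judgement per system
automatic_scores = [0.30, 0.45, 0.25, 0.40]  # e.g. an automatic metric per system

if __name__ == "__main__":
    print("Pearson: ", pearson(human_ratings, automatic_scores))
    print("Spearman:", spearman(human_ratings, automatic_scores))
```

A high correlation between two techniques across the entered systems would suggest that they rank systems similarly; with only a handful of systems, any such figure should be read cautiously.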

Cited By

  • (2011) Evaluating sentence compression. Proceedings of the Workshop on Monolingual Text-To-Text Generation, 10.5555/2107679.2107690 (91-97). Online publication date: 24-Jun-2011
  • (2006) Shared-task evaluations in HLT. Proceedings of the Fourth International Natural Language Generation Conference, 10.5555/1706269.1706297 (133-135). Online publication date: 15-Jul-2006

Published In

INLG '06: Proceedings of the Fourth International Natural Language Generation Conference
July 2006
132 pages
ISBN: 1932432728

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
