Abstract
Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time- and cost-intensive to create and maintain. Methods for automating (part of) the system-building process exist, but do such methods risk a loss in output quality? In this paper, we investigate the cost/quality trade-off in generation system building. We compare six data-to-text systems which were created by predominantly automatic techniques against six systems for the same domain which were created by predominantly manual techniques. We evaluate the systems using intrinsic automatic metrics and human quality ratings. We find that there is some correlation between the degree of automation in the system-building process and output quality (more automation tending to mean lower evaluation scores). We also find that there are discrepancies between the results of the automatic evaluation metrics and those of the human-assessed evaluation experiments. We discuss caveats in assessing system-building cost and implications of the discrepancies between automatic and human evaluation.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Belz, A., Kow, E. (2010). Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_10
DOI: https://doi.org/10.1007/978-3-642-15573-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15572-7
Online ISBN: 978-3-642-15573-4
eBook Packages: Computer Science, Computer Science (R0)