From the Publisher:
This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems. It addresses the whole area of NLP system evaluation, including aims and scope, problems and methodology. The authors provide a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations; they relate systems to their environments and develop a framework for proper evaluation. The discussion of principles is completed by a detailed review of practice and strategies in the field, covering both systems for specific tasks, like translation, and core language processors. The methodology lessons drawn from the analysis and review are applied in a series of example cases. The book also relates NLP system evaluation to the neighbouring areas of information and speech processing, and addresses issues of tool and data provision for evaluation. A comprehensive bibliography, a subject index, and a glossary of terms are included. This monograph will be a valuable source of inspiration in research, practice, and teaching.
Cited By
- Boudia M, Hamou R, Amine A and Lokbani A (2020). An adaptation of a F-measure for automatic text summarization by extraction, Cluster Computing, 23:3, (2389-2398), Online publication date: 1-Sep-2020.
- Shackell C and Sitbon L Cognitive Externalities and HCI Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, (1-10)
- Lokbani A (2017). A New Metric of Validation for Automatic Text Summarization by Extraction, International Journal of Strategic Information Technology and Applications, 8:3, (20-40), Online publication date: 1-Jul-2017.
- Ramos‐Soto A, Vazquez‐Barreiros B, Bugarín A, Gewerc A and Barro S (2016). Evaluation of a Data‐To‐Text System for Verbalizing a Learning Analytics Dashboard, International Journal of Intelligent Systems, 32:2, (177-193), Online publication date: 7-Dec-2016.
- Al-Saleh A and Menai M (2016). Automatic Arabic text summarization, Artificial Intelligence Review, 45:2, (203-234), Online publication date: 1-Feb-2016.
- Menéndez H, Plaza L and Camacho D A genetic graph-based clustering approach to biomedical summarization Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, (1-8)
- Méndez-Cruz C, Torres-Moreno J, Medina-Urrea A and Sierra G Extrinsic evaluation on automatic summarization tasks Proceedings of the 11th Mexican international conference on Advances in Computational Intelligence - Volume Part II, (46-57)
- Baroni M and Lenci A How we BLESSed distributional semantic evaluation Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, (1-10)
- Saggion H, Torres-Moreno J, Cunha I and SanJuan E Multilingual summarization evaluation without human models Proceedings of the 23rd International Conference on Computational Linguistics: Posters, (1059-1067)
- Spanger P, Ryu I, Asuka T, Takenobu T and Naoko K Towards an extrinsic evaluation of referring expressions in situated dialogs Proceedings of the 6th International Natural Language Generation Conference, (135-144)
- Gatt A and Portet F Textual properties and task based evaluation Proceedings of the 6th International Natural Language Generation Conference, (57-65)
- Gatt A and Belz A Introducing shared tasks to NLG Empirical methods in natural language generation, (264-293)
- Murray G, Kleinbauer T, Poller P, Becker T, Renals S and Kilgour J (2009). Extrinsic summarization evaluation, ACM Transactions on Speech and Language Processing, 6:2, (1-29), Online publication date: 1-Oct-2009.
- Zhao C, Peng Q, Zhao C and Sun S Chinese text automatic summarization based on affinity propagation cluster Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1, (425-429)
- Saravanan M, Ravindran B and Raman S (2009). Improving legal information retrieval using an ontological framework, Artificial Intelligence and Law, 17:2, (101-124), Online publication date: 1-Jun-2009.
- Liu F and Liu Y Correlation between ROUGE and human evaluation of extractive meeting summaries Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, (201-204)
- Wang C, Jing F, Zhang L and Zhang H Learning query-biased web page summarization Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, (555-562)
- Hirao T, Okumura M, Yasuda N and Isozaki H (2007). Supervised automatic evaluation for summarization with voted regression model, Information Processing and Management: an International Journal, 43:6, (1521-1535), Online publication date: 1-Nov-2007.
- Díaz A and Gervás P (2007). User-model based personalized summarization, Information Processing and Management: an International Journal, 43:6, (1715-1734), Online publication date: 1-Nov-2007.
- Ou S, Khoo C and Goh D (2007). Automatic multidocument summarization of research abstracts: Design and user evaluation, Journal of the American Society for Information Science and Technology, 58:10, (1419-1435), Online publication date: 1-Aug-2007.
- Zhang P, Plettenberg L, Klavans J, Oard D and Soergel D Task-based interaction with an integrated multilingual, multimedia information system Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, (117-126)
- Orasan C and Evans R (2007). NP animacy identification for anaphora resolution, Journal of Artificial Intelligence Research, 29:1, (79-103), Online publication date: 1-May-2007.
- Jones K Information retrieval and digital libraries Proceedings of the 2006 international workshop on Research issues in digital libraries, (1-7)
- Belz A and Kilgarriff A Shared-task evaluations in HLT Proceedings of the Fourth International Natural Language Generation Conference, (133-135)
- Jones K (2006). What's the value of TREC, ACM SIGIR Forum, 40:1, (10-20), Online publication date: 1-Jun-2006.
- Liang S, Devlin S and Tait J Evaluating web search result summaries Proceedings of the 28th European conference on Advances in Information Retrieval, (96-106)
- Mustafa El Hadi W, Dabbadie M, Timimi I, Rajman M, Langlais P, Hartley A and Belis A Work-in-progress project report Proceedings of the Second International Workshop on Language Resources for Translation Work, Research and Training, (16-26)
- Nagel H (2004). Steps toward a cognitive vision system, AI Magazine, 25:2, (31-50), Online publication date: 1-Jun-2004.
- Pouliquen B, Steinberger R, Ignat C and De Groeve T Geographical information recognition and visualization in texts written in various languages Proceedings of the 2004 ACM symposium on Applied computing, (1051-1058)
- Zhang Y, Zincir-Heywood N and Milios E (2004). World wide web site summarization, Web Intelligence and Agent Systems, 2:1, (39-53), Online publication date: 1-Jan-2004.
- Zhang Y, Zincir-Heywood N and Milios E Summarizing web sites automatically Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence, (283-296)
- Lin C and Hovy E Automatic evaluation of summaries using N-gram co-occurrence statistics Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, (71-78)
- King M Living up to standards Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?, (65-72)
- Bontcheva K Reuse and challenges in evaluating language generation systems Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?, (3-9)
- Hui B Measuring user acceptability of machine translations to diagnose system errors Proceedings of the 2002 COLING workshop on Machine translation in Asia - Volume 16, (1-7)
- Hovy E, King M and Popescu-Belis A (2002). Principles of Context-Based Machine Translation Evaluation, Machine Translation, 17:1, (43-75), Online publication date: 28-Aug-2002.
- Saggion H, Teufel S, Radev D and Lam W Meta-evaluation of summaries in a cross-lingual environment using content-based metrics Proceedings of the 19th international conference on Computational linguistics - Volume 1, (1-7)
- Maynard D, Bontcheva K, Saggion H, Cunningham H and Hamza O Using a text engineering framework to build an extendable and portable IE-based summarisation system Proceedings of the ACL-02 Workshop on Automatic Summarization - Volume 4, (19-26)
- Mani I, Klein G, House D, Hirschman L, Firmin T and Sundheim B (2002). SUMMAC: a text summarization evaluation, Natural Language Engineering, 8:1, (43-68), Online publication date: 1-Mar-2002.
- McDonald S, Lai T and Tait J Evaluating a content based image retrieval system Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, (232-240)
- Mustafa El Hadi W, Timimi I, Béguin A and De Brito M The ARC A3 project Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9, (1-11)
- Barr V and Klavans J Verification and validation of language processing systems Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9, (1-7)
- Dybkjær L and Bernsen N Usability evaluation in spoken language dialogue systems Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9, (1-10)
- Schiffman B, Mani I and Concepcion K Producing biographical summaries Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, (458-465)
- Kehler A, Bear J and Appelt D (2001). The need for accurate alignment in natural language system evaluation, Computational Linguistics, 27:2, (247-248), Online publication date: 1-Jun-2001.
- Goldstein J, Mittal V, Carbonell J and Callan J Creating and evaluating multi-document sentence extract summaries Proceedings of the ninth international conference on Information and knowledge management, (165-172)
- Voss C and Van Ess-Dykema C When is an embedded MT system "good enough" for filtering? Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5, (1-8)
- Goldstein J, Kantrowitz M, Mittal V and Carbonell J Summarizing text documents Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, (121-128)
- Goldstein J and Carbonell J Summarization Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998, (181-195)