Information Retrieval Evaluation

  • Book
  • © 2011

About this book

Evaluation has always played a major role in information retrieval, with the early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture has the goal of explaining where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment in the search engine world today. The lecture starts with a discussion of the early evaluation of information retrieval systems, starting with the Cranfield testing in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), INEX (emphasis on semi-structured data), etc. Here again the focus is on the how and why, and in particular on how the older evaluation methodologies evolved to handle new information access techniques. This includes how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies -- the interactive part of information retrieval, including a look at the search log studies mainly done by the commercial search engines. Here the goal is to show, via case studies, how the high-level issues of experimental design affect the final evaluations.
Table of Contents: Introduction and Early History / "Batch" Evaluation Since 1992 / Interactive Evaluation / Conclusion

Table of contents (4 chapters)

Authors and Affiliations

  • National Institute of Standards and Technology, USA

    Donna Harman

About the author

Donna Harman graduated from Cornell University as an Electrical Engineer, and started her career working with Professor Gerard Salton in the design and building of several test collections, including the first MEDLARS one. Later work was concerned with searching large volumes of data on relatively small computers, starting with building the IRX system at the National Library of Medicine in 1987, and then the Citator/PRISE system at the National Institute of Standards and Technology (NIST) in 1988. In 1990 she was asked by DARPA to put together a realistic test collection on the order of 2 gigabytes of text, and this test collection was used in the first Text REtrieval Conference (TREC). TREC is now in its 20th year, and along with its sister evaluations such as CLEF, NTCIR, INEX, and FIRE, serves as a major testing ground for information retrieval algorithms. She received the 1999 Strix Award from the U.K. Institute of Information Scientists for this effort. Starting in 2000 she worked with Paul Over at NIST to form a new effort (DUC) to evaluate text summarization, which has now been folded into the Text Analysis Conference (TAC), providing evaluation for several areas in NLP.
