Measuring Search Engine Quality

Hawking, David; Craswell, Nick; Bailey, Peter; Griffiths, Kathleen

doi:10.1023/A:1011468107287

Published: April 2001

Information Retrieval, Volume 4, pages 33–59 (2001)

David Hawking¹,
Nick Craswell¹,
Peter Bailey² &
Kathleen Griffiths³

Abstract

The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compared on a range of measures derivable from binary relevance judgments of the first seven live results returned. Statistical testing reveals a significant difference between engines and high intercorrelations between measures. Surprisingly, given the dynamic nature of the Web and the time elapsed, there is also a high correlation between results of this study and a previous study by Gordon and Pathak. For nearly all engines, there is a gradual decline in precision at increasing cutoff after some initial fluctuation. Performance of the engines as a group is found to be inferior to the group of participants in the TREC-8 Large Web task, although the best engines approach the median of those systems. Shortcomings of current Web search evaluation methodology are identified and recommendations are made for future improvements. In particular, the present study and its predecessors deal with queries which are assumed to derive from a need to find a selection of documents relevant to a topic. By contrast, real Web search reflects a range of other information need types which require different judging and different measures.
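As an illustration of the kind of measure the abstract refers to, precision at a cutoff k can be derived from binary relevance judgments as the fraction of the first k results judged relevant. The following is a minimal sketch only, not the authors' evaluation code: the function names and the example judgments are hypothetical, and treating missing results as non-relevant is an assumption.

```python
from typing import List, Sequence


def precision_at_k(judgments: Sequence[int], k: int) -> float:
    """Fraction of the first k results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: if fewer than k results were returned, the missing
    # ranks count as non-relevant, so the denominator is always k.
    return sum(judgments[:k]) / k


def precision_curve(judgments: Sequence[int], max_k: int = 7) -> List[float]:
    """Precision at every cutoff from 1 to max_k (seven results per query here)."""
    return [precision_at_k(judgments, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # Hypothetical judgments for one query's first seven live results.
    example = [1, 0, 1, 1, 0, 0, 1]
    for k, p in enumerate(precision_curve(example), start=1):
        print(f"P@{k} = {p:.3f}")
```

Averaging such per-query curves across all queries would give one curve per engine, which is the form in which a decline in precision with increasing cutoff could be observed.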

References

Alta Vista Company. Alta Vista web page. http://www.altavista.com/.
Blair DC and Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28(3): 289-299.
Buckley C and Voorhees E (2000) Evaluating evaluation measure stability. In: Proceedings of SIGIR'00, New York, 2000, pp. 33-40. ACM Press.
Cleverdon C (1997) The Cranfield tests on index language devices. In: Jones KS and Willett P, Eds. Readings in Information Retrieval, Morgan Kaufmann, San Francisco, pp. 47-59. (Reprinted from Aslib Proceedings, 19, 173-192).
Ding W and Marchionini G (1996) Comparative study of web search service performance. In: Proceedings of the ASIS 1996 Annual Conference, Oct. 1996, pp. 136-142.
Eisenberg M and Barry C (1988) Order effects: A study of the possible influence of presentation order on user judgments of document relevance. Journal of the American Society for Information Science, 39(5): 293-300.
Electric Knowledge LLC. Electric Monk home page. http://electricmonk.com/.
Gordon M and Pathak P (1999) Finding information on the world wide web: The retrieval effectiveness of search engines. Information Processing and Management, 35(2): 141-180.
Hawking D and Thistlewaite P (1997) Overview of TREC-6 very large collection track. In: Voorhees EM and Harman DK, Eds. Proceedings of TREC-6. Gaithersburg, MD, Nov. 1997, pp. 93-106. NIST special publication 500-240, http://trec.nist.gov.
Hawking D, Thistlewaite P and Harman D (1999) Scaling up the TREC collection. Information Retrieval, 1(1): 115-137.
Hawking D, Voorhees E, Bailey P and Craswell N (1999) Overview of TREC-8 Web Track. In: Proceedings of TREC-8. Gaithersburg, MD, Nov. 1999, pp. 131-150. NIST special publication 500-246, http://trec.nist.gov.
Koster M. The web robots pages. http://info.webcrawler.com/mak/projects/robots/robots.html.
Lawrence S and Giles CL (1999) Accessibility of information on the web. Nature, 400: 107-109.
Leighton HV and Srivastava J (1999) First 20 precision among world wide web search services (search engines). Journal of the American Society for Information Science, 50(10): 882-889.
Lynx. Lynx browser home page. http://lynx.browser.org.
Salton G and Lesk ME (1997) Computer evaluation of indexing and text processing. In: Jones KS and Willett P, Eds. Readings in Information Retrieval, Morgan Kaufmann, San Francisco, pp. 60-84. (Reprinted from Journal of the ACM, 15, 8-36).
Silverstein C, Henzinger M, Marais H and Moricz M (1999) Analysis of a very large web search engine query log. SIGIR Forum, 33(1): 6-12. Previously available as Digital Systems Research Center TR 1998-014 at http://www.research.digital.com/SRC.
Voorhees EM and Harman DK (1998) Eds. Proceedings of TREC-7, Gaithersburg, MD, Nov. 1998. NIST special publication 500-242, http://trec.nist.gov.
Voorhees EM and Harman DK (1999) Eds. Proceedings of TREC-8, Gaithersburg, MD, Nov. 1999. NIST special publication 500-246, http://trec.nist.gov.
Voorhees EM and Harman DK (1996) Overview of the fifth Text Retrieval Conference (TREC-5). In: Voorhees EM and Harman DK, Eds. Proceedings of TREC-5, Gaithersburg, MD, Nov. 1996, pp. 1-28. NIST special publication 500-238, http://trec.nist.gov.
Voorhees EM (1998) Variations in relevance judgments and the measurement of retrieval effectiveness. In: Croft WB, Moffat A, van Rijsbergen CJ, Wilkinson R and Zobel J, Eds. Proceedings of SIGIR'98, Melbourne, Australia, August 1998. pp. 315-323.

Author information

Authors and Affiliations

CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, Australia, 2601
David Hawking & Nick Craswell
Computer Science Department, Australian National University, Canberra, Australia, 0200
Peter Bailey
Centre for Mental Health Research, Australian National University, Canberra, Australia, 0200
Kathleen Griffiths

About this article

Cite this article

Hawking, D., Craswell, N., Bailey, P. et al. Measuring Search Engine Quality. Information Retrieval 4, 33–59 (2001). https://doi.org/10.1023/A:1011468107287

Issue Date: April 2001
DOI: https://doi.org/10.1023/A:1011468107287
