Abstract
There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two individual words) but try out several queries if needed. Such behavior is fundamentally different from the process modeled in traditional test collection-based IR evaluation, which relies on more verbose queries and only one query per topic. In the present paper, we propose an extension to test collection-based evaluation. We utilize sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and have test persons suggest search words, while simulating the sessions according to the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including one-word query sequences) typically lead to good enough results even in a TREC type test collection. This finding explains the observed real user behavior: since a few very simple attempts normally lead to good enough results, there is no need to expend more effort. We conclude by discussing the consequences of our finding for IR evaluation.
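The abstract describes the simulation setup only at a high level. As a minimal sketch of what simulating one such idealized session strategy might look like, the following Python fragment runs a fixed sequence of short query variants for a single topic and stops at the first attempt whose top-k results contain a relevant document. The retrieval interface (retrieve), the qrels format, and the "good enough" stopping rule (top_k, min_grade) are illustrative assumptions, not the paper's actual protocol or metrics.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces (not from the paper): a retrieval function maps a
# query string to a ranked list of document ids; qrels map document ids to
# graded relevance judgements (0 = not relevant).
RetrievalFn = Callable[[str], List[str]]
Qrels = Dict[str, int]


def simulate_session(queries: Sequence[str],
                     retrieve: RetrievalFn,
                     qrels: Qrels,
                     top_k: int = 10,
                     min_grade: int = 1) -> dict:
    """Try the query variants in the given order and stop as soon as the
    scanned top-k results contain a document judged at least min_grade
    relevant ("good enough"). Reports which attempt, if any, succeeded."""
    for attempt, query in enumerate(queries, start=1):
        ranking = retrieve(query)[:top_k]
        if any(qrels.get(doc_id, 0) >= min_grade for doc_id in ranking):
            return {"success": True, "attempts": attempt, "query": query}
    return {"success": False, "attempts": len(queries), "query": None}


if __name__ == "__main__":
    # Toy stand-ins for a real TREC run: a tiny fake index and judgements.
    fake_index = {
        "wildlife": ["d3", "d7", "d1"],
        "wildlife extinction": ["d7", "d2", "d5"],
    }
    fake_qrels = {"d7": 2, "d2": 1}

    outcome = simulate_session(
        queries=["wildlife", "wildlife extinction"],  # one word first, then longer
        retrieve=lambda q: fake_index.get(q, []),
        qrels=fake_qrels,
        top_k=3,
    )
    print(outcome)  # {'success': True, 'attempts': 1, 'query': 'wildlife'}
```

Averaging the success flag and the number of attempts over many topics would then indicate whether sessions of very short queries are indeed "good enough", in the spirit of the comparison the abstract reports.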
Cite this paper
Keskustalo, H., Järvelin, K., Pirkola, A., Sharma, T., Lykke, M. (2009). Test Collection-Based IR Evaluation Needs Extension toward Sessions – A Case of Extremely Short Queries. In: Lee, G.G., et al. (eds.) Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol. 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_6
DOI: https://doi.org/10.1007/978-3-642-04769-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5