Predicting the relevance of a library catalog search

Published: 13 August 2001

Abstract

Relevance has been a difficult concept to define, let alone measure. In this paper, a simple operational definition of relevance is proposed for a Web-based library catalog: whether or not, during a search session, the user saves, prints, mails, or downloads a citation. If any one of those actions is performed, the session is considered relevant to the user. An analysis is presented illustrating the advantages and disadvantages of this definition. With this definition and good transaction logging, it is possible to ascertain the relevance of a session. This was done for 905,970 sessions conducted with the University of California's Melvyl online catalog. Next, a methodology was developed to try to predict the relevance of a session. A number of variables were defined that characterize a session, none of which used any demographic information about the user. The values of the variables were computed for the sessions. Principal components analysis was used to extract a new set of variables from the original set. A stratified random sampling technique was used to form ten strata such that each stratum of 90,570 sessions contained the same proportion of relevant to nonrelevant sessions. Logistic regression was used to estimate the regression coefficients from nine of the ten strata. The coefficients were then used to predict the relevance of the sessions in the remaining stratum. Overall, 17.85% of the sessions were determined to be relevant. The predicted proportion of relevant sessions across all ten strata was 11%, a difference of 6.85 percentage points. The authors believe that the methodology can be further refined and the prediction improved. This methodology could also have significant application in improving user searching and in predicting electronic commerce buying decisions without the use of personal demographic data.
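
As a rough illustration of two steps described in the abstract, the sketch below labels a session as relevant when it contains a save, print, mail, or download action, and then deals sessions into ten strata so that each keeps about the same proportion of relevant to nonrelevant sessions. The action names, the function names `is_relevant` and `stratified_folds`, and the toy session log are all assumptions made for this example; they are not taken from the paper, and the principal components and logistic regression steps are not shown.

```python
import random
from collections import defaultdict

# Assumed action labels; the paper's transaction-log vocabulary is not given here.
RELEVANT_ACTIONS = {"save", "print", "mail", "download"}


def is_relevant(actions):
    """Return True if the session contains at least one relevance-signalling action."""
    return any(a in RELEVANT_ACTIONS for a in actions)


def stratified_folds(labels, k=10, seed=0):
    """Assign each session index to one of k strata so that every stratum
    holds roughly the same proportion of relevant to nonrelevant sessions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each class is spread evenly across strata.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical toy log: 1,000 sessions, 180 of which include a print action.
sessions = [["search", "display", "print"]] * 180 + [["search", "display"]] * 820
labels = [is_relevant(s) for s in sessions]
folds = stratified_folds(labels, k=10)
# Share of relevant sessions in each stratum.
rates = [sum(labels[i] for i in fold) / len(fold) for fold in folds]
```

In a scheme like the one the abstract outlines, a model would be fitted on nine of these strata and its coefficients applied to the tenth, and the predicted share of relevant sessions could then be compared against the observed share.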

Published In

Journal of the American Society for Information Science and Technology, Volume 52, Issue 10
Visual based retrieval systems and web mining
August 2001
80 pages

Publisher

John Wiley & Sons, Inc.

United States

Cited By

  • (2022) Evaluating the quality of linked open data in digital libraries. Journal of Information Science, 48:1 (21-43). DOI: 10.1177/0165551520930951. Online publication date: 1-Feb-2022
  • (2016) The Influence of Topic Difficulty, Relevance Level, and Document Ordering on Relevance Judging. Proceedings of the 21st Australasian Document Computing Symposium (41-48). DOI: 10.1145/3015022.3015033. Online publication date: 5-Dec-2016
  • (2011) Gotta keep 'em separated. Proceedings of the 12th Annual Conference of the New Zealand Chapter of the ACM Special Interest Group on Computer-Human Interaction (109-112). DOI: 10.1145/2000756.2000772. Online publication date: 4-Jul-2011
  • (2010) Towards a model of implicit feedback for Web search. Journal of the American Society for Information Science and Technology, 61:1 (30-49). DOI: 10.5555/1672957.1672965. Online publication date: 1-Jan-2010
  • (2006) Documents and queries as random variables: History and implications. Journal of the American Society for Information Science and Technology, 57:9 (1138-1154). DOI: 10.5555/1144500.1144503. Online publication date: 1-Jul-2006
  • (2006) Relevance for browsing, relevance for searching. Journal of the American Society for Information Science and Technology, 57:1 (69-86). DOI: 10.5555/1107442.1107450. Online publication date: 1-Jan-2006
  • (2004) Display time as implicit feedback. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (377-384). DOI: 10.1145/1008992.1009057. Online publication date: 25-Jul-2004
  • (2003) Implicit feedback for inferring user preference. ACM SIGIR Forum, 37:2 (18-28). DOI: 10.1145/959258.959260. Online publication date: 1-Sep-2003
  • (2002) Stochastic modeling of usage patterns in a Web-based information system. Journal of the American Society for Information Science and Technology, 53:7 (536-548). DOI: 10.1002/asi.10076. Online publication date: 1-Jul-2002
  • (2001) Using clustering techniques to detect usage patterns in a Web-based information system. Journal of the American Society for Information Science and Technology, 52:11 (888-904). DOI: 10.1002/asi.1159.abs. Online publication date: 1-Sep-2001
