Binary and graded relevance in IR evaluations: comparison of the effects on ranking of IR systems

Published: 01 September 2005

Abstract

In this study, the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed on a four-level scale: non-relevant, marginally relevant, fairly relevant, and highly relevant. The data comprise 21 topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for the relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by the different measures are similar. Weighting schemes ranging from binary to schemes emphasising highly relevant documents form a continuum: the measures correlate strongly at the binary end and less so at the heavily weighted end. The results demonstrate the different character of the measures.
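The measures compared in the abstract can be sketched in a few lines of code. The following is a minimal illustration, not the paper's experimental setup: the judgement list, the two weighting schemes (binary vs. one emphasising highly relevant documents), and the discount base b = 2 are assumptions chosen for the example. The CG/DCG/nDCG definitions follow the cumulated-gain family of measures, and Kendall's tau is the rank correlation used to compare system orderings.

```python
import math
from itertools import combinations

# Hypothetical four-level judgements for one ranked result list:
# 0 = non-relevant, 1 = marginally, 2 = fairly, 3 = highly relevant.
ranked_judgements = [3, 0, 2, 1, 0, 3, 0, 1]

def cumulated_gain(gains):
    """CG at each rank: running sum of the gains down the ranked list."""
    out, total = [], 0
    for g in gains:
        total += g
        out.append(total)
    return out

def discounted_cumulated_gain(gains, b=2):
    """DCG: gains at ranks >= b are divided by log_b(rank), so documents
    found late in the ranking contribute less."""
    out, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        out.append(total)
    return out

def ndcg(gains, b=2):
    """Normalised DCG: actual DCG divided, position by position, by the
    DCG of the ideal (gain-sorted) ranking."""
    actual = discounted_cumulated_gain(gains, b)
    ideal = discounted_cumulated_gain(sorted(gains, reverse=True), b)
    return [a / i if i else 0.0 for a, i in zip(actual, ideal)]

# A weighting scheme maps relevance levels to gain values. Binary
# (0,1,1,1) and a scheme emphasising highly relevant documents (0,1,5,10)
# illustrate the two ends of the continuum discussed in the abstract.
for name, weights in [("binary", {0: 0, 1: 1, 2: 1, 3: 1}),
                      ("weighted", {0: 0, 1: 1, 2: 5, 3: 10})]:
    gains = [weights[j] for j in ranked_judgements]
    print(name, [round(v, 2) for v in ndcg(gains)])

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two orderings of the same systems:
    (concordant pairs - discordant pairs) / total pairs."""
    pos_a = {s: i for i, s in enumerate(rank_a)}
    pos_b = {s: i for i, s in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

With such a sketch, ranking every system by its mean score under each weighting scheme and correlating the resulting orderings with `kendall_tau` reproduces, in miniature, the comparison the study carries out across the TREC 7 and 8 runs.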



Published in: Information Processing and Management: an International Journal, Volume 41, Issue 5 (September 2005), 313 pages. Publisher: Pergamon Press, Inc., United States.
