DOI: 10.1145/3077136.3080674

Sub-corpora Impact on System Effectiveness

Published: 07 August 2017

Abstract

Understanding the factors that make up IR system effectiveness is of primary importance when comparing different IR systems. Effectiveness is traditionally broken down, using ANOVA, into a topic effect and a system effect, but this leaves out a key component of our evaluation paradigm: the collections of documents. We break down effectiveness into topic, system, and sub-corpus effects and compare this decomposition to the traditional one, considering what happens when different evaluation measures come into play. We find that sub-corpora are a significant effect, and accounting for them lets us estimate more accurately which systems are significantly different. We also find that sub-corpora affect different evaluation measures in different ways, which may change which systems are considered significantly different.
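As a concrete illustration of the comparison described above, the two break-downs can be expressed as linear models and fitted with off-the-shelf ANOVA tooling. The sketch below is not the authors' code: the score grid is synthetic, the factor sizes are invented, and the use of Python with the statsmodels library is an assumed choice.

```python
# Minimal sketch of the two ANOVA decompositions compared in the paper,
# on a synthetic topic x system x sub-corpus grid (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(42)

# Hypothetical grid: 50 topics x 5 systems x 4 sub-corpora, one
# effectiveness score (e.g. AP or nDCG) per cell.
rows = [
    (f"t{i}", f"s{j}", f"c{k}", rng.beta(2, 5))
    for i in range(50) for j in range(5) for k in range(4)
]
df = pd.DataFrame(rows, columns=["topic", "system", "subcorpus", "score"])

# Traditional break-down: topic and system effects only.
m2 = ols("score ~ C(topic) + C(system)", data=df).fit()
print(sm.stats.anova_lm(m2, typ=2))

# Proposed break-down: topic, system, and sub-corpus effects.
m3 = ols("score ~ C(topic) + C(system) + C(subcorpus)", data=df).fit()
print(sm.stats.anova_lm(m3, typ=2))
```

On real runs the scores would come from executing systems over each sub-corpus; a significant F-statistic for C(subcorpus) in the second model is what "sub-corpora are a significant effect" refers to, and the reduced residual variance sharpens any subsequent multiple-comparison test between systems.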




Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. anova
  2. effectiveness model
  3. experimental evaluation
  4. glmm
  5. retrieval effectiveness
  6. sub-corpus effect

Qualifiers

  • Short-paper

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions, 22%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Cited By

  • (2023) The Impact of Judgment Variability on the Consistency of Offline Effectiveness Measures. ACM Transactions on Information Systems 42(1), 1–31. DOI: 10.1145/3596511
  • (2021) Evaluating the Predictivity of IR Experiments. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1667–1671. DOI: 10.1145/3404835.3463040
  • (2021) Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 91–102. DOI: 10.1007/978-3-030-85251-1_8
  • (2021) System Effect Estimation by Sharding: A Comparison Between ANOVA Approaches to Detect Significant Differences. Advances in Information Retrieval, 33–46. DOI: 10.1007/978-3-030-72240-1_3
  • (2020) Leveraging Behavioral Heterogeneity Across Markets for Cross-Market Training of Recommender Systems. Companion Proceedings of the Web Conference 2020, 694–702. DOI: 10.1145/3366424.3384362
  • (2019) Improving the Accuracy of System Performance Estimation by Using Shards. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 805–814. DOI: 10.1145/3331184.3338062
  • (2019) On Topic Difficulty in IR Evaluation. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 909–912. DOI: 10.1145/3331184.3331279
  • (2019) Using Collection Shards to Study Retrieval Performance Effect Sizes. ACM Transactions on Information Systems 37(3), 1–40. DOI: 10.1145/3310364
  • (2018) The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction. ACM SIGIR Forum 52(1), 91–101. DOI: 10.1145/3274784.3274789
  • (2017) Toward an anatomy of IR system component performances. Journal of the Association for Information Science and Technology 69(2), 187–200. DOI: 10.1002/asi.23910
