A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation

Published: 12 October 2013
DOI: 10.1145/2532508.2532511

Abstract

Offline evaluations are the most common evaluation method for research paper recommender systems. However, despite some voiced criticism, no thorough discussion of the appropriateness of offline evaluations has taken place. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that the results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.
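
To make the comparison concrete, the following is a minimal sketch, in Python, of how the two kinds of measurements are typically computed: an offline accuracy metric (precision against a held-out ground truth, e.g., citations removed from a document) and an online metric (click-through rate measured with real users). All data, function names, and numbers are hypothetical illustrations, not the authors' implementation.

    # Minimal sketch of the two evaluation styles compared in the paper.
    # All data, names, and numbers below are hypothetical illustrations,
    # not the authors' actual implementation.

    def precision_at_n(recommended, held_out, n=10):
        """Offline metric: fraction of the top-n recommendations found in a
        held-out ground truth (e.g., citations removed from a document)."""
        top_n = recommended[:n]
        return sum(1 for paper in top_n if paper in held_out) / n

    def click_through_rate(clicks, impressions):
        """Online metric: fraction of displayed recommendations that real
        users actually clicked."""
        return clicks / impressions if impressions else 0.0

    # An approach can rank well against historical ground truth yet draw
    # few clicks from live users (or vice versa).
    offline_score = precision_at_n(["p1", "p7", "p3"], held_out={"p1", "p3", "p9"}, n=3)
    online_score = click_through_rate(clicks=42, impressions=1400)
    print(f"offline precision@3 = {offline_score:.2f}")  # 0.67
    print(f"online CTR          = {online_score:.3f}")   # 0.030

Because the two metrics draw on different signals (historical ground truth versus live user behavior), an approach can score well on one and poorly on the other, which is the kind of contradiction the study reports.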





      Published In

      RepSys '13: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation
      October 2013
      34 pages
      ISBN: 9781450324656
      DOI: 10.1145/2532508

      Sponsors

      • CWI: Centrum voor Wiskunde en Informatica - Netherlands

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 October 2013


      Author Tags

      1. click-through rate
      2. comparative study
      3. evaluation
      4. offline evaluation
      5. online evaluation
      6. research paper recommender systems

      Qualifiers

      • Research-article

      Conference

      RepSys '13
      Sponsor:
      • CWI

      Acceptance Rates

      RepSys '13 Paper Acceptance Rate: 4 of 5 submissions, 80%
      Overall Acceptance Rate: 4 of 5 submissions, 80%


      Cited By

      • (2024) Residual Multi-Task Learner for Applied Ranking. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4974-4985. DOI: 10.1145/3637528.3671523. Online publication date: 25-Aug-2024.
      • (2024) Recommender Systems: A Review. Journal of the American Statistical Association, 119(545), pp. 773-785. DOI: 10.1080/01621459.2023.2279695. Online publication date: 4-Jan-2024.
      • (2024) ArZiGo: A recommendation system for scientific articles. Information Systems, 122, 102367. DOI: 10.1016/j.is.2024.102367. Online publication date: May-2024.
      • (2024) Non-binary evaluation of next-basket food recommendation. User Modeling and User-Adapted Interaction, 34(1), pp. 183-227. DOI: 10.1007/s11257-023-09369-8. Online publication date: 1-Mar-2024.
      • (2024) Collaborative Filtering and Content-Based Systems. Recommender Systems: Algorithms and their Applications, pp. 19-30. DOI: 10.1007/978-981-97-0538-2_3. Online publication date: 12-Jun-2024.
      • (2023) A Common Misassumption in Online Experiments with Machine Learning Models. ACM SIGIR Forum, 57(1), pp. 1-9. DOI: 10.1145/3636341.3636358. Online publication date: 1-Jun-2023.
      • (2023) Advancing Automation of Design Decisions in Recommender System Pipelines. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1355-1360. DOI: 10.1145/3604915.3608886. Online publication date: 14-Sep-2023.
      • (2023) Improving Recommender Systems Through the Automation of Design Decisions. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1332-1338. DOI: 10.1145/3604915.3608877. Online publication date: 14-Sep-2023.
      • (2023) A Framework and Toolkit for Testing the Correctness of Recommendation Algorithms. ACM Transactions on Recommender Systems, 2(1), pp. 1-45. DOI: 10.1145/3591109. Online publication date: 20-Apr-2023.
      • (2023) How Important is Periodic Model update in Recommender System? Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2661-2668. DOI: 10.1145/3539618.3591934. Online publication date: 19-Jul-2023.
