A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation

Published: 12 October 2013
DOI: 10.1145/2532508.2532511

Abstract

Offline evaluations are the most common evaluation method for research paper recommender systems. However, despite some voiced criticism, no thorough discussion of the appropriateness of offline evaluations has taken place. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that the results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.
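
To make the comparison concrete, the following is a minimal sketch, in Python, of how the two kinds of measurements are typically computed: an offline accuracy metric (precision against a held-out ground truth, e.g., citations removed from a document) and an online metric (click-through rate measured with real users). All data, function names, and numbers are hypothetical illustrations, not the authors' implementation.

    # Minimal sketch of the two evaluation styles compared in the paper.
    # All data, names, and numbers below are hypothetical illustrations,
    # not the authors' actual implementation.

    def precision_at_n(recommended, held_out, n=10):
        """Offline metric: fraction of the top-n recommendations found in a
        held-out ground truth (e.g., citations removed from a document)."""
        top_n = recommended[:n]
        return sum(1 for paper in top_n if paper in held_out) / n

    def click_through_rate(clicks, impressions):
        """Online metric: fraction of displayed recommendations that real
        users actually clicked."""
        return clicks / impressions if impressions else 0.0

    # An approach can rank well against historical ground truth yet draw
    # few clicks from live users (or vice versa).
    offline_score = precision_at_n(["p1", "p7", "p3"], held_out={"p1", "p3", "p9"}, n=3)
    online_score = click_through_rate(clicks=42, impressions=1400)
    print(f"offline precision@3 = {offline_score:.2f}")  # 0.67
    print(f"online CTR          = {online_score:.3f}")   # 0.030

Because the two metrics draw on different signals (historical ground truth versus live user behavior), an approach can score well on one and poorly on the other, which is the kind of contradiction the study reports.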





      Published In

      RepSys '13: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation
      October 2013
      34 pages
      ISBN: 9781450324656
      DOI: 10.1145/2532508

      Sponsors

      • CWI: Centrum voor Wiskunde en Informatica - Netherlands

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 October 2013


      Author Tags

      1. click-through rate
      2. comparative study
      3. evaluation
      4. offline evaluation
      5. online evaluation
      6. research paper recommender systems

      Qualifiers

      • Research-article

      Conference

      RepSys '13
      Sponsor:
      • CWI

      Acceptance Rates

      RepSys '13 Paper Acceptance Rate: 4 of 5 submissions, 80%
      Overall Acceptance Rate: 4 of 5 submissions, 80%


      Cited By

      • (2024) Residual Multi-Task Learner for Applied Ranking. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4974-4985. DOI: 10.1145/3637528.3671523. Online publication date: 25-Aug-2024.
      • (2024) Recommender Systems: A Review. Journal of the American Statistical Association, 119(545), pp. 773-785. DOI: 10.1080/01621459.2023.2279695. Online publication date: 4-Jan-2024.
      • (2024) ArZiGo: A recommendation system for scientific articles. Information Systems, 122, 102367. DOI: 10.1016/j.is.2024.102367. Online publication date: May-2024.
      • (2024) Non-binary evaluation of next-basket food recommendation. User Modeling and User-Adapted Interaction, 34(1), pp. 183-227. DOI: 10.1007/s11257-023-09369-8. Online publication date: 1-Mar-2024.
      • (2024) Collaborative Filtering and Content-Based Systems. Recommender Systems: Algorithms and their Applications, pp. 19-30. DOI: 10.1007/978-981-97-0538-2_3. Online publication date: 12-Jun-2024.
      • (2023) A Common Misassumption in Online Experiments with Machine Learning Models. ACM SIGIR Forum, 57(1), pp. 1-9. DOI: 10.1145/3636341.3636358. Online publication date: 1-Jun-2023.
      • (2023) Advancing Automation of Design Decisions in Recommender System Pipelines. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1355-1360. DOI: 10.1145/3604915.3608886. Online publication date: 14-Sep-2023.
      • (2023) Improving Recommender Systems Through the Automation of Design Decisions. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1332-1338. DOI: 10.1145/3604915.3608877. Online publication date: 14-Sep-2023.
      • (2023) A Framework and Toolkit for Testing the Correctness of Recommendation Algorithms. ACM Transactions on Recommender Systems, 2(1), pp. 1-45. DOI: 10.1145/3591109. Online publication date: 20-Apr-2023.
      • (2023) How Important is Periodic Model update in Recommender System? Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2661-2668. DOI: 10.1145/3539618.3591934. Online publication date: 19-Jul-2023.
