
Group Validation in Recommender Systems: Framework for Multi-layer Performance Evaluation

Published: 07 March 2024

Abstract

The evaluation of recommender systems continues to evolve, and recent years have seen several attempts to standardize assessment processes and to propose replacement metrics better suited to measuring effective personalization. However, standard evaluation tools provide only a general overview of a system’s performance; as recent studies on the topic show, they are applied inconsistently and with limited effectiveness. Furthermore, traditional evaluation techniques fail to detect potentially harmful data in small subsets, and they generally lack explainable features that could reveal how such minor variations affect the system’s performance. This proposal focuses on data clustering for recommender evaluation and applies a cluster-assessment technique to locate such performance issues. Our new approach, named group validation, helps spot critical performance variability in compact subsets of the system’s data and uncovers hidden weaknesses in predictions where such unfavorable variations typically go unnoticed by standard assessment methods. Group validation for recommenders is a modular evaluation layer that complements regular evaluation and adds a unique perspective to the evaluation process. It also enables several applications within the recommender ecosystem, such as model-evolution tests, fraud/attack detection, and the capacity to host a hybrid model setup.
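Group validation can be read as a thin layer on top of any per-user error metric. The following sketch is a hypothetical illustration of the idea, not the paper’s implementation: it assumes each user already has a prediction error (e.g., RMSE) and a precomputed cluster label, and it flags clusters whose mean error deviates sharply from the global mean, exactly the kind of compact weak subset a single global average would mask.

```python
import numpy as np

def group_validation(errors, labels, threshold=1.5):
    """Flag clusters whose mean error stands out against the global mean.

    errors: per-user prediction error (e.g., per-user RMSE).
    labels: cluster id assigned to each user (same length as errors).
    A cluster is flagged when its mean error exceeds `threshold` times the
    global mean error -- a weak subset hidden inside an acceptable average.
    """
    errors = np.asarray(errors, dtype=float)
    labels = np.asarray(labels)
    global_mean = errors.mean()
    flagged = {}
    for c in np.unique(labels):
        cluster_mean = errors[labels == c].mean()
        if cluster_mean > threshold * global_mean:
            flagged[int(c)] = float(cluster_mean)
    return global_mean, flagged

# Three user clusters; cluster 2 hides a pocket of poor predictions.
errs = [0.8, 0.9, 0.85, 0.9, 0.8, 0.95, 2.9, 3.1, 3.0]
labs = [0, 0, 0, 1, 1, 1, 2, 2, 2]
g, bad = group_validation(errs, labs)
# Cluster 2 is flagged (mean error 3.0 vs. a global mean of about 1.58),
# even though the global average alone looks unremarkable.
```

In practice the labels would come from a clustering step over user features or rating behavior, and the per-cluster comparison would use a significance test rather than a fixed threshold; the fixed `threshold` here is only to keep the sketch self-contained.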


Cited By

View all
  • (2024) Introduction to the Special Issue on Perspectives on Recommender Systems Evaluation. ACM Transactions on Recommender Systems 2, 1 (2024), 1–5. DOI: 10.1145/3648398. Online publication date: 7-Mar-2024.
  • (2022) Strategic Attacks on Recommender Systems: An Obfuscation Scenario. 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), 1–8. DOI: 10.1109/AICCSA56895.2022.10017953. Online publication date: Dec-2022.


      Published In

ACM Transactions on Recommender Systems, Volume 2, Issue 1
March 2024, 346 pages
EISSN: 2770-6699
DOI: 10.1145/3613520

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 07 March 2024
      Online AM: 19 January 2024
      Accepted: 22 December 2023
      Revised: 12 November 2023
      Received: 01 December 2022
      Published in TORS Volume 2, Issue 1


      Author Tags

      1. Recommender systems
      2. offline evaluation
      3. model validation
      4. data clustering

      Qualifiers

      • Research-article

      Funding Sources

      • The Lebanese University Research Program

