
Group Validation in Recommender Systems: Framework for Multi-layer Performance Evaluation

Published: 07 March 2024

Abstract

The evaluation of recommender systems continues to evolve, and recent years have seen several attempts to standardize assessment processes and to propose replacement metrics better suited to measuring effective personalization. However, standard evaluation tools provide only a general overview of a system’s performance; as recent studies on the topic show, they are applied inconsistently and with limited effectiveness. Furthermore, traditional evaluation techniques fail to detect potentially harmful data in small subsets, and they generally lack explainable features that could reveal how such minor variations affect the system’s performance. This proposal focuses on data clustering for recommender evaluation and applies a cluster-assessment technique to locate such performance issues. Our new approach, named group validation, helps spot critical performance variability in compact subsets of the system’s data and uncovers hidden weaknesses in predictions where such unfavorable variations typically go unnoticed by standard assessment methods. Group validation for recommenders is a modular evaluation layer that complements regular evaluation and adds a unique perspective to the evaluation process. It also enables several applications within the recommender ecosystem, such as model-evolution tests, fraud/attack detection, and the capacity to host a hybrid model setup.
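Group validation can be read as a thin layer on top of any per-user error metric. The following sketch is a hypothetical illustration of the idea, not the paper’s implementation: it assumes each user already has a prediction error (e.g., RMSE) and a precomputed cluster label, and it flags clusters whose mean error deviates sharply from the global mean, exactly the kind of compact weak subset a single global average would mask.

```python
import numpy as np

def group_validation(errors, labels, threshold=1.5):
    """Flag clusters whose mean error stands out against the global mean.

    errors: per-user prediction error (e.g., per-user RMSE).
    labels: cluster id assigned to each user (same length as errors).
    A cluster is flagged when its mean error exceeds `threshold` times the
    global mean error -- a weak subset hidden inside an acceptable average.
    """
    errors = np.asarray(errors, dtype=float)
    labels = np.asarray(labels)
    global_mean = errors.mean()
    flagged = {}
    for c in np.unique(labels):
        cluster_mean = errors[labels == c].mean()
        if cluster_mean > threshold * global_mean:
            flagged[int(c)] = float(cluster_mean)
    return global_mean, flagged

# Three user clusters; cluster 2 hides a pocket of poor predictions.
errs = [0.8, 0.9, 0.85, 0.9, 0.8, 0.95, 2.9, 3.1, 3.0]
labs = [0, 0, 0, 1, 1, 1, 2, 2, 2]
g, bad = group_validation(errs, labs)
# Cluster 2 is flagged (mean error 3.0 vs. a global mean of about 1.58),
# even though the global average alone looks unremarkable.
```

In practice the labels would come from a clustering step over user features or rating behavior, and the per-cluster comparison would use a significance test rather than a fixed threshold; the fixed `threshold` here is only to keep the sketch self-contained.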


Cited By

View all
  • (2024) Introduction to the Special Issue on Perspectives on Recommender Systems Evaluation. ACM Transactions on Recommender Systems 2, 1 (2024), 1–5. DOI: 10.1145/3648398. Online publication date: 7-Mar-2024.
  • (2022) Strategic Attacks on Recommender Systems: An Obfuscation Scenario. 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), 1–8. DOI: 10.1109/AICCSA56895.2022.10017953. Online publication date: Dec-2022.


      Published In

ACM Transactions on Recommender Systems, Volume 2, Issue 1
March 2024, 346 pages
EISSN: 2770-6699
DOI: 10.1145/3613520

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 07 March 2024
      Online AM: 19 January 2024
      Accepted: 22 December 2023
      Revised: 12 November 2023
      Received: 01 December 2022
      Published in TORS Volume 2, Issue 1


      Author Tags

      1. Recommender systems
      2. offline evaluation
      3. model validation
      4. data clustering

      Qualifiers

      • Research-article

      Funding Sources

      • The Lebanese University Research Program

