A Method to Anonymize Business Metrics to Publishing Implicit Feedback Datasets

Authors: Yoshifumi Seki, Takanori Maehara

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems

Pages 4 - 12

https://doi.org/10.1145/3383313.3412256

Published: 22 September 2020

Abstract

This paper presents a method for building and publishing datasets from commercial services. Datasets contribute to the development of research in machine learning and recommender systems. In particular, because recommender systems play a central role in many commercial services, datasets published from such services are in great demand in the recommender system community. However, publishing a dataset may expose the company that operates the service to business risks. Publication must therefore be approved by a business manager of the service. Because many business managers are not specialists in machine learning or recommender systems, the researchers are responsible for explaining the risks and benefits to them.

We first summarize three challenges in building datasets from commercial services: (1) anonymizing the business metrics, (2) maintaining fairness, and (3) reducing the popularity bias. Then, we formulate the problem of building and publishing datasets as an optimization problem that seeks the sampling weights of users, where the challenges are encoded as appropriate loss functions. We applied our method to build datasets from the raw data of our real-world mobile news delivery service. The raw data contain more than 1,000,000 users and 100,000,000 interactions. Each dataset was built in less than 10 minutes. We discuss the properties of our method by examining the statistics of the datasets and the performance of typical recommender system algorithms.
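The idea of seeking per-user sampling weights can be illustrated with a much simpler problem than the one the paper solves. The sketch below is not the authors' formulation: it assumes a single business metric (mean interactions per user), a hypothetical target value for it, and replaces the paper's loss functions with one constraint, namely to find the weights closest to uniform in KL divergence whose weighted mean equals the target. That special case has an exponential-tilting solution that can be found by bisection. The names `tilted_weights` and `sample_users` and the toy data are illustrative only. Users are then drawn without replacement using Efraimidis-Spirakis keys.

```python
import math
import random


def tilted_weights(clicks, target_mean, iters=100):
    """Weights closest to uniform (in KL divergence) whose weighted mean
    of `clicks` equals `target_mean`. Illustrative stand-in, not the paper's losses."""
    if not min(clicks) < target_mean < max(clicks):
        raise ValueError("target_mean must lie strictly between min and max")

    def weights(lam):
        z = [-lam * c for c in clicks]
        top = max(z)  # shift exponents for numerical stability
        e = [math.exp(v - top) for v in z]
        total = sum(e)
        return [v / total for v in e]

    def mean_at(lam):
        return sum(w * c for w, c in zip(weights(lam), clicks))

    # mean_at decreases as lam grows: widen the bracket, then bisect.
    lo, hi = -1.0, 1.0
    while mean_at(lo) < target_mean:
        lo *= 2.0
    while mean_at(hi) > target_mean:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > target_mean:
            lo = mid  # mean still too high: tilt further toward light users
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


def sample_users(weights, k, rng):
    """Weighted sampling without replacement via Efraimidis-Spirakis keys."""
    keys = [(math.log(1.0 - rng.random()) / w, u)
            for u, w in enumerate(weights) if w > 0]
    keys.sort(reverse=True)  # the k largest keys form the sample
    return [u for _, u in keys[:k]]


# Toy per-user interaction counts (hypothetical data, true mean 25.5).
clicks = [1 + (7 * i) % 50 for i in range(200)]
w = tilted_weights(clicks, target_mean=15.0)
published = sample_users(w, k=50, rng=random.Random(0))
```

The weighted mean matches the target exactly, while the mean of any one drawn sample only approximates it. Additional objectives such as fairness or popularity bias would need further loss terms and a general-purpose optimizer rather than this one-dimensional search.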

Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems

September 2020

796 pages

ISBN:9781450375832

DOI:10.1145/3383313

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

RecSys '20: Fourteenth ACM Conference on Recommender Systems

September 22 - 26, 2020

Virtual Event, Brazil

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
