DOI: 10.1145/3383313.3412256

A Method to Anonymize Business Metrics to Publishing Implicit Feedback Datasets

Published: 22 September 2020

Abstract

This paper presents a method for building and publishing datasets from commercial services. Datasets contribute to the development of research in machine learning and recommender systems. In particular, because recommender systems play a central role in many commercial services, datasets published from such services are in great demand in the recommender system community. However, publishing a dataset may pose business risks to the company that operates the service. Publication must be approved by a business manager of the service. Because many business managers are not specialists in machine learning or recommender systems, researchers are responsible for explaining the risks and benefits to them.
We first summarize three challenges in building datasets from commercial services: (1) anonymizing the business metrics, (2) maintaining fairness, and (3) reducing the popularity bias. We then formulate the building and publishing of datasets as an optimization problem that seeks the sampling weights of users, where the challenges are encoded as appropriate loss functions. We applied our method to build datasets from the raw data of our real-world mobile news delivery service. The raw data contain more than 1,000,000 users with 100,000,000 interactions. Each dataset was built in less than 10 minutes. We discuss the properties of our method by examining the statistics of the datasets and the performance of typical recommender system algorithms.
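The abstract does not spell out the loss functions, so the following is only a minimal sketch of the general idea of learning per-user sampling weights by gradient descent, not the paper's actual formulation. Everything in it is an assumption made for illustration: the function name `fit_sampling_weights`, the two squared-error losses (one pulls the weighted mean of a business metric toward a chosen public target, the other fixes the average inclusion probability), the synthetic data, and the final independent Bernoulli draw per user. The fairness and popularity-bias challenges are not modeled here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_sampling_weights(metric, target_mean, target_rate, steps=1000, lr=0.5):
    """Hypothetical sketch: fit inclusion probabilities w_i = sigmoid(theta_i).

    Minimizes ((weighted mean of metric - target_mean)^2 / var(metric)
               + (mean(w) - target_rate)^2) by plain gradient descent.
    """
    n = len(metric)
    mu = sum(metric) / n
    var = sum((m - mu) ** 2 for m in metric) / n + 1e-12
    theta = [0.0] * n  # logits; every user starts at w = 0.5
    for _ in range(steps):
        w = [sigmoid(t) for t in theta]
        total = sum(w)
        mean_w = sum(wi * mi for wi, mi in zip(w, metric)) / total
        err_metric = (mean_w - target_mean) / var  # variance-normalized residual
        err_rate = total / n - target_rate
        for i in range(n):
            # d(loss)/d(w_i) for both terms, then the chain rule through the sigmoid.
            grad = 2.0 * err_metric * (metric[i] - mean_w) / total + 2.0 * err_rate / n
            # The factor n rescales the step so the speed does not depend on n.
            theta[i] -= lr * n * grad * w[i] * (1.0 - w[i])
    return [sigmoid(t) for t in theta]


# Synthetic stand-in for a per-user business metric (e.g. interactions per user).
rng = random.Random(0)
metric = [float(rng.randint(1, 40)) for _ in range(300)]
target_mean = 0.9 * sum(metric) / len(metric)  # hide the true mean behind a target
weights = fit_sampling_weights(metric, target_mean, target_rate=0.3)

achieved_mean = sum(w * m for w, m in zip(weights, metric)) / sum(weights)
achieved_rate = sum(weights) / len(weights)
# Assumed sampling step: keep each user independently with probability w_i.
sampled_users = [i for i, w in enumerate(weights) if rng.random() < w]
```

In this sketch the published sample would consist of the interactions of `sampled_users` only, so the metric computed on the sample reflects the chosen target rather than the true value in the raw data.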

Cited By

  • A survey on popularity bias in recommender systems. User Modeling and User-Adapted Interaction (2024). DOI: 10.1007/s11257-024-09406-0. Online publication date: 1 July 2024.
  • Binary Archimedes Optimization Algorithm based Feature Selection for Regression Problem. 2022 4th International Conference on Pattern Analysis and Intelligent Systems (PAIS) (2022), 1–7. DOI: 10.1109/PAIS56586.2022.9946903. Online publication date: 12 October 2022.

    Published In

    RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
    September 2020
    796 pages
    ISBN:9781450375832
    DOI:10.1145/3383313
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. datasets
    2. recommender systems

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    RecSys '20: Fourteenth ACM Conference on Recommender Systems
    September 22 - 26, 2020
    Virtual Event, Brazil

    Acceptance Rates

    Overall Acceptance Rate 254 of 1,295 submissions, 20%
