DOI: 10.1145/3485447.3512076
research-article

Lessons from the AdKDD’21 Privacy-Preserving ML Challenge

Published: 25 April 2022
  Abstract

    Designing data-sharing mechanisms that deliver both performance and strong privacy guarantees is a hot topic for the online advertising industry. In particular, a prominent proposal discussed in the Improving Web Advertising Business Group at the W3C only allows advertising signals to be shared through aggregated, differentially private reports of past displays. To study this proposal in depth, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD’21, a premier workshop on advertising science, using data provided by the advertising company Criteo. In this paper, we describe the challenge tasks and the structure of the available datasets, report the challenge results, and make the challenge fully reproducible. A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap. We also run additional experiments to measure the sensitivity of the winning methods to parameters such as the privacy budget or the quantity of available privileged side information. We conclude that, to keep ad relevance at a reasonable level, the industry needs either alternative designs for private data sharing or a breakthrough in learning from aggregated data alone.
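    The abstract describes sharing advertising signals only through aggregated, differentially private reports of past displays. The sketch below shows what such a report might look like, assuming per-feature-value counts of displays and clicks perturbed with Laplace noise. The mechanism, the sensitivity bound, the 0/1 `click` field, and the helper names (`aggregate_report`, `noisy_ctr`, `laplace_noise`) are illustrative assumptions, not the challenge's actual reporting scheme.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def aggregate_report(displays, feature, domain, epsilon, rng):
    """Hypothetical helper: noisy (display count, click count) per value in `domain`.

    Assumption: each display is a dict with a categorical `feature` and a 0/1
    "click" label. One display changes exactly one bucket, by 1 in the display
    count and by 0 or 1 in the click count, so the L1 sensitivity is 2 and
    Laplace noise of scale 2 / epsilon makes this single report epsilon-DP.
    `domain` must be public: reporting every value in it (rather than only
    the observed ones) keeps the set of keys from revealing which values occurred.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in displays:
        bucket = counts[row[feature]]
        bucket[0] += 1
        bucket[1] += row["click"]
    scale = 2.0 / epsilon
    return {
        value: (counts[value][0] + laplace_noise(scale, rng),
                counts[value][1] + laplace_noise(scale, rng))
        for value in domain
    }


def noisy_ctr(report):
    # Post-processing (clipping, taking a ratio) spends no extra privacy budget.
    ctr = {}
    for value, (n, c) in report.items():
        n = max(n, 1.0)                          # avoid tiny or negative denominators
        ctr[value] = min(max(c, 0.0), n) / n     # clip clicks into [0, n]
    return ctr


# Usage on a toy, made-up log of three displays.
rng = random.Random(0)
logs = [
    {"site": "a", "click": 1},
    {"site": "a", "click": 0},
    {"site": "b", "click": 0},
]
report = aggregate_report(logs, "site", domain=["a", "b", "c"], epsilon=1.0, rng=rng)
print(noisy_ctr(report))  # values depend on the seed and on epsilon
```

    Lowering `epsilon` strengthens the privacy guarantee but widens the noise, so buckets with few displays yield unreliable click-through estimates; the privacy budget is one of the parameters the paper's additional experiments vary.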


    Cited By

    • (2023) Label differential privacy and private training data release. Proceedings of the 40th International Conference on Machine Learning, 3233–3251. DOI: 10.5555/3618408.3618540. Online publication date: 23 Jul 2023.

    Published In

    ACM Conferences
    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN:9781450390965
    DOI:10.1145/3485447
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. aggregated data
    2. differential privacy
    3. machine learning
    4. online advertising

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22
    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 27 Jul 2024
