DOI: 10.1145/3447548.3467180
research-article
Open access

Clustering for Private Interest-based Advertising

Published: 14 August 2021

Abstract

We study the problem of designing privacy-enhanced solutions for interest-based advertising (IBA). IBA is a key component of the online ads ecosystem and provides a better ad experience to users: it enables advertisers to show users impressions that are relevant to them. Nevertheless, ad tech companies currently achieve this by building detailed interest profiles for individual users. In this work, we ask whether such fine-grained personalization is required, and present mechanisms that achieve competitive performance while giving privacy guarantees to end users. More precisely, we present the first detailed exploration of how to implement Chrome's Federated Learning of Cohorts (FLoC) API. We define the privacy properties required for the API and evaluate multiple hashing and clustering algorithms, discussing the trade-offs between utility, privacy, and ease of implementation.
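
As an illustration of the hashing side of this design space, the following is a minimal sketch of cohort assignment with SimHash, a locality-sensitive hashing scheme, followed by a k-anonymity style check on cohort sizes. It is not taken from the paper: the function names `simhash_cohort` and `undersized_cohorts`, the feature representation (a dictionary from domain name to weight), the bit width, and the threshold `k` are all assumptions made for the example.

```python
import random
from collections import Counter


def simhash_cohort(features, num_bits=8, seed=0):
    """Map a sparse feature vector {name: weight} to a num_bits-bit cohort id.

    Each output bit is the sign of the dot product between the feature
    vector and a pseudo-random Gaussian vector. The Gaussian coefficient
    for a (seed, bit, name) triple is derived deterministically, so every
    client computes the same projection without any shared state.
    """
    cohort = 0
    for bit in range(num_bits):
        total = 0.0
        for name, weight in features.items():
            # Hypothetical derivation of the coefficient: a string-seeded RNG.
            rng = random.Random(f"{seed}:{bit}:{name}")
            total += weight * rng.gauss(0.0, 1.0)
        if total > 0:
            cohort |= 1 << bit
    return cohort


def undersized_cohorts(cohort_ids, k):
    """Return {cohort_id: size} for every cohort with fewer than k members."""
    counts = Counter(cohort_ids)
    return {cid: n for cid, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical browsing profiles: domain -> visit count.
    users = {
        "u1": {"news.example": 3.0, "sports.example": 1.0},
        "u2": {"news.example": 2.0, "sports.example": 1.0},
        "u3": {"cooking.example": 5.0},
    }
    ids = {u: simhash_cohort(f, num_bits=4) for u, f in users.items()}
    print(ids)
    print(undersized_cohorts(ids.values(), k=2))
```

Users with similar feature vectors are likely, though not guaranteed, to agree on most bits, which is what makes a hash of this kind usable as a cohort id. Hashing alone does not bound cohort size from below, so a size check such as `undersized_cohorts` (or a clustering step that enforces a minimum size) is needed before any k-anonymity claim can be made.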

Cited By

  • (2024) Re-Identification Attacks against the Topics API. ACM Transactions on the Web 18:3 (1-24). DOI: 10.1145/3675400. Online publication date: 27-Jun-2024
  • (2024) Evaluating Impact of User-Cluster Targeted Attacks in Matrix Factorisation Recommenders. ACM Transactions on Recommender Systems 3:2 (1-34). DOI: 10.1145/3674157. Online publication date: 21-Jun-2024
  • (2024) BB-FLoC: A Blockchain-based Targeted Advertisement Scheme with K-Anonymity. Distributed Ledger Technologies: Research and Practice. DOI: 10.1145/3672404. Online publication date: 13-Jun-2024

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anonymity
  2. clustering
  3. interest-based advertising
  4. privacy

Qualifiers

  • Research-article

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 471
  • Downloads (Last 6 weeks): 48
Reflects downloads up to 23 Dec 2024

Citations

  • (2023) TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs. Proceedings of the ACM on Management of Data 1:3 (1-27). DOI: 10.1145/3617341. Online publication date: 13-Nov-2023
  • (2023) Feynman: Federated Learning-Based Advertising for Ecosystems-Oriented Mobile Apps Recommendation. IEEE Transactions on Services Computing 16:5 (3361-3372). DOI: 10.1109/TSC.2023.3285935. Online publication date: Sep-2023
