research-article

Open access

Clustering for Private Interest-based Advertising

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Pages 2802-2810

https://doi.org/10.1145/3447548.3467180

Published: 14 August 2021

Abstract

We study the problem of designing privacy-enhanced solutions for interest-based advertising (IBA). IBA is a key component of the online ads ecosystem and provides a better ad experience to users, since it enables advertisers to show users impressions that are relevant to them. However, ad tech companies currently achieve this by building detailed interest profiles of individual users. In this work we ask whether such fine-grained personalization is required, and present mechanisms that achieve competitive performance while giving privacy guarantees to end users. More precisely, we present the first detailed exploration of how to implement Chrome's Federated Learning of Cohorts (FLoC) API. We define the privacy properties required for the API and evaluate multiple hashing and clustering algorithms, discussing the trade-offs between utility, privacy, and ease of implementation.
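To make the idea of hashing users into cohorts under a privacy constraint more concrete, here is a minimal Python sketch. It rests on several assumptions: each user is represented by the set of domains they visited; a SimHash-style hash (summing a pseudo-random +1/-1 vector per domain and keeping the signs) maps similar histories to similar values; and the privacy property is a minimum cohort size k, in the spirit of k-anonymity. The helper names `simhash` and `k_anonymous_cohorts`, the 8-bit hash width, the toy domains, and the sort-and-chunk grouping are all hypothetical choices made for illustration. They are not the paper's algorithms or Chrome's FLoC implementation.

```python
import hashlib
from collections import Counter

# Hypothetical helpers for illustration only; not Chrome's FLoC code.


def _feature_signs(feature: str, bits: int) -> list[int]:
    """Deterministic pseudo-random +1/-1 vector for one feature (e.g. a domain)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    # Bit i of the SHA-256 digest decides the sign of component i.
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def simhash(features: set[str], bits: int = 8) -> int:
    """SimHash-style sketch: sum per-feature sign vectors, keep the signs."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256 (SHA-256 digest size)")
    totals = [0] * bits
    for feature in features:
        for i, sign in enumerate(_feature_signs(feature, bits)):
            totals[i] += sign
    # Bit i of the hash is set when the i-th summed component is positive.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def k_anonymous_cohorts(hashes: dict[str, int], k: int) -> dict[str, int]:
    """Group users into cohorts of at least k members each.

    Users are sorted by hash value and cut into consecutive blocks of k;
    the leftover users (fewer than k) are folded into the last block.
    """
    if k < 1 or len(hashes) < k:
        raise ValueError("need k >= 1 and at least k users")
    ordered = sorted(hashes, key=lambda user: (hashes[user], user))
    n_blocks = len(ordered) // k
    return {user: min(idx // k, n_blocks - 1) for idx, user in enumerate(ordered)}


if __name__ == "__main__":
    # Toy browsing histories (made-up domains).
    histories = {
        "u1": {"news.example", "sports.example"},
        "u2": {"news.example", "sports.example", "weather.example"},
        "u3": {"recipes.example", "garden.example"},
        "u4": {"recipes.example", "garden.example", "travel.example"},
        "u5": {"code.example", "docs.example"},
    }
    hashes = {user: simhash(domains) for user, domains in histories.items()}
    cohorts = k_anonymous_cohorts(hashes, k=2)
    print(hashes)
    print(Counter(cohorts.values()))
```

Sorting by hash value places users whose hashes agree on the high-order bits next to each other. Cutting that order into blocks of at least k users therefore tends to keep similar histories together while guaranteeing the size floor. A wider hash separates users more finely, but it also means more dissimilar users end up sharing a block before the k threshold is met, which is one concrete form of the utility/privacy trade-off the abstract mentions.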

Information

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

August 2021

4259 pages

ISBN: 9781450383325

DOI: 10.1145/3447548

General Chairs:
Feida Zhu, Singapore Management University
Beng Chin Ooi, National University of Singapore
Chunyan Miao, Nanyang Technological University

Program Chairs:
Haixun Wang, Iryna Skrypnyk, Wynne Hsu, Sanjay Chawla

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2021

Qualifiers

Research-article

Conference

KDD '21

KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14-18, 2021

Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 2,402
Downloads (last 12 months): 471
Downloads (last 6 weeks): 48

Reflects downloads up to 23 Dec 2024.

Cited By

Jha N, Trevisan M, Leonardi E, Mellia M (2024). Re-Identification Attacks against the Topics API. ACM Transactions on the Web, 18:3 (1-24). Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3675400

Shams S, Leith D (2024). Evaluating Impact of User-Cluster Targeted Attacks in Matrix Factorisation Recommenders. ACM Transactions on Recommender Systems, 3:2 (1-34). Online publication date: 21-Jun-2024.
https://dl.acm.org/doi/10.1145/3674157

Kruminis E, Navaie K, Ascigil O (2024). BB-FLoC: A Blockchain-based Targeted Advertisement Scheme with K-Anonymity. Distributed Ledger Technologies: Research and Practice. Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3672404

Dhulipala L, Łącki J, Lee J, Mirrokni V (2023). TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs. Proceedings of the ACM on Management of Data, 1:3 (1-27). Online publication date: 13-Nov-2023.
https://dl.acm.org/doi/10.1145/3617341

Bian J, Huang J, Ji S, Liao Y, Li X, Wang Q, Zhou J, Dou D, Wang Y, Xiong H (2023). Feynman: Federated Learning-Based Advertising for Ecosystems-Oriented Mobile Apps Recommendation. IEEE Transactions on Services Computing, 16:5 (3361-3372). Online publication date: Sep-2023.
https://doi.org/10.1109/TSC.2023.3285935
