Research article. DOI: 10.1145/3589334.3645655

Cardinality Counting in "Alcatraz": A Privacy-aware Federated Learning Approach

Published: 13 May 2024

Abstract

Cardinality counting, the task of quantifying the unique elements in a dataset, is pivotal for data analysis and has significant applications in sectors such as healthcare, marketing, cybersecurity, and web analytics. Current methods, which fall into deterministic and probabilistic categories, often fail to prioritize data privacy. Because datasets are fragmented across organizations, collaborative data studies that use state-of-the-art cardinality counting techniques carry an elevated risk of inadvertently disclosing sensitive information. This study introduces a privacy-centric solution to the cardinality counting problem that leverages a federated learning framework. Our approach first applies a locally differentially private data encoding, then a privacy-aware federated K-means clustering strategy, so that cardinality counting takes place across distinct datasets without pooling the data. Tests on both real-world and simulated datasets show promising results, pointing toward a transformative approach to privacy-sensitive cardinality counting in contemporary data science.
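
As a rough illustration of the first stage described above, the sketch below encodes a record into a Bloom-filter bit vector and perturbs each bit with randomized response, one common way to obtain a locally differentially private encoding. It is an assumption made for illustration, not the authors' encoding: the function names, the 64-bit vector length, the three hash functions, and the per-bit epsilon are all hypothetical choices, and the federated K-means stage is not shown.

```python
import hashlib
import math
import random
from typing import List


def bloom_encode(value: str, num_bits: int = 64, num_hashes: int = 3) -> List[int]:
    """Encode a string as a Bloom-filter bit vector (hypothetical parameters)."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        # Derive the i-th hash by salting SHA-256 with the hash index.
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % num_bits
        bits[position] = 1
    return bits


def randomized_response(bits: List[int], epsilon: float, rng: random.Random) -> List[int]:
    """Keep each bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    # Written as a logistic function so that large epsilon does not overflow.
    keep_prob = 1.0 / (1.0 + math.exp(-epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


# Each data holder would run this locally before sharing anything.
rng = random.Random(0)  # fixed seed only to make the illustration repeatable
encoded = bloom_encode("alice@example.com")
perturbed = randomized_response(encoded, epsilon=1.0, rng=rng)
```

In a setting like the one the abstract describes, only the perturbed vectors would leave each organization. Note that epsilon here is a per-bit budget, so the guarantee for a whole record is weaker than the per-bit value suggests.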

Published In

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. cardinality counting
2. differential privacy
3. federated learning

Conference

WWW '24: The ACM Web Conference 2024
May 13 - 17, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
