Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation

Published: 30 May 2023 Publication History

Abstract

Counting the number of distinct elements distributed over multiple data holders is a fundamental problem with many real-world applications ranging from crowd counting to network monitoring. Although a number of space and computationally efficient sketch methods (e.g., the Flajolet-Martin sketch and the HyperLogLog sketch) for cardinality estimation have been proposed to solve the above problem, these sketch methods are insecure when considering privacy concerns related to the use of each data holder's personal dataset. Despite a recently proposed protocol that successfully implements the well-known Flajolet-Martin (FM) sketch on a secret-sharing based multiparty computation (MPC) framework for solving the problem of private distributed cardinality estimation (PDCE), we observe that this MPC-FM protocol is not differentially private. In addition, the MPC-FM protocol is computationally expensive, which limits its applications to data holders with limited computation resources. To address the above issues, in this paper we propose a novel protocol DP-DICE, which is computationally efficient and differentially private for solving the problem of PDCE. Experimental results show that our DP-DICE achieves orders of magnitude speedup and reduces the estimation error by several times in comparison with state-of-the-arts under the same security requirements.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

References

[1]
Wang Zhengxia and Xiao Laisheng. Modern logistics monitoring platform based on the internet of things. In 2010 International conference on intelligent computation technology and automation, volume 2, pages 726--731. IEEE, 2010.
[2]
Alex Berke, Michiel A. Bakker, Praneeth Vepakomma, Ramesh Raskar, Kent Larson, and Alex 'Sandy' Pentland. Assessing disease exposure risk with location histories and protecting privacy: A cryptographic approach in response to A global pandemic. CoRR, abs/2003.14412, 2020.
[3]
Andreas Ulbrich, Evgeny Sergeevich Skvortsov, Jeffrey Scott Wilhelm, Josh Bao, Lawrence Tsang, and Will Bradbury. Tracking audience statistics with hyperloglog. 2021.
[4]
Badih Ghazi, Ben Kreuter, Ravi Kumar, Pasin Manurangsi, Jiayu Peng, Evgeny Skvortsov, Yao Wang, and Craig Wright. Multiparty reach and frequency histogram: Private, secure, and practical. PoPETs, 2022(1):373--395, 2022.
[5]
Roger Dingledine, Nick Mathewson, and Paul F. Syverson. Tor: The second-generation onion router. In Matt Blaze, editor, Proceedings of the 13th USENIX Security Symposium, August 9--13, 2004, San Diego, CA, USA, pages 303--320. USENIX, 2004.
[6]
Hazar Harmouch and Felix Naumann. Cardinality estimation: An experimental survey. Proceedings of the VLDB Endowment, 11(4):499--512, 2017.
[7]
Philippe Flajolet and G Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of computer and system sciences, 31(2):182--209, 1985.
[8]
Stefan Heule, Marc Nunkesser, and Alexander Hall. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In EDBT, 2013.
[9]
Adam Smith, Shuang Song, and Abhradeep Guha Thakurta. The flajolet-martin sketch itself preserves differential privacy: Private counting with minimal space. Advances in Neural Information Processing Systems, 33:19561--19572, 2020.
[10]
Charlie Dickens, Justin Thaler, and Daniel Ting. (nearly) all cardinality estimators are differentially private, 2022.
[11]
Changhui Hu, Jin Li, Zheli Liu, Xiaojie Guo, Yu Wei, Xuan Guang, Grigorios Loukides, and Changyu Dong. How to make private distributed cardinality estimation practical, and get differential privacy for free. In USENIX Security, 2021.
[12]
Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137--156. Discrete Mathematics and Theoretical Computer Science, 2007.
[13]
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, 2006.
[14]
Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
[15]
Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635--658. Springer, 2016.
[16]
Peter Kairouz, Ziyu Liu, and Thomas Steinke. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In ICML, 2021.
[17]
Clément L Canonne, Gautam Kamath, and Thomas Steinke. The discrete gaussian for differential privacy. Advances in Neural Information Processing Systems, 33:15676--15688, 2020.
[18]
Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3--4):211--407, 2014.
[19]
Ilya Mironov. On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 650--661, 2012.
[20]
Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In CRYPTO, 2012.
[21]
Bargav Jayaraman, Lingxiao Wang, David Evans, and Quanquan Gu. Distributed learning without distress: Privacy- preserving empirical risk minimization. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, NeurIPS, pages 6346--6357, 2018.
[22]
Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[23]
Patrick Billingsley. Probability and Measure. John Wiley and Sons, second edition, 1986.
[24]
Jay M. Ver Hoef. Who invented the Delta Method? The American Statistician, 66(2):124--127, 2012.
[25]
Marcel Keller, Valerio Pastro, and Dragos Rotaru. Overdrive: Making SPDZ great again. In EUROCRYPT, 2018.
[26]
Jonathan Katz and Moti Yung. Scalable protocols for authenticated group key exchange. In CRYPTO, 2003.
[27]
Helger Lipmaa and Tomas Toft. Secure equality and greater-than tests with sublinear online complexity. In ICALP, 2013.
[28]
Ran Canetti. Universally composable security. J. ACM, 67(5):28:1--28:94, 2020.
[29]
Ellis Fenske, Akshaya Mani, Aaron Johnson, and Micah Sherr. Distributed measurement with private set-union cardinality. In CCS, 2017.
[30]
Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, ESA, 2003.
[31]
Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. Fast and private computation of cardinality of set intersection and union. In CANS, 2012.
[32]
Alex Davidson and Carlos Cid. An efficient toolkit for computing private set operations. In ACISP, 2017.
[33]
Florian Tschorsch and Björn Scheuermann. An algorithm for privacy-preserving distributed user statistics. Comput. Networks, 57(14):2775--2787, 2013.
[34]
Changyu Dong and Grigorios Loukides. Approximating private set union/intersection cardinality with logarithmic complexity. IEEE Trans. Inf. Forensics Secur., 12(11):2792--2806, 2017.
[35]
Lijie Chen, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. On distributed differential privacy and counting distinct elements. In ITCS, volume 185, 2021.
[36]
Yanxue Jia, Shi-Feng Sun, Hong-Sheng Zhou, Jiajun Du, and Dawu Gu. Shuffle-based private set union: Faster and more secure. IACR Cryptol. ePrint Arch., page 157, 2022.
[37]
Vikas G. Ashok and Ravi Mukkamala. A scalable and efficient privacy preserving global itemset support approximation using bloom filters. In DBSec, 2014.
[38]
Rolf Egert, Marc Fischlin, David Gens, Sven Jacob, Matthias Senker, and Jörn Tillmanns. Privately computing set-union and set-intersection cardinality via bloom filters. In ACISP, 2015.
[39]
Rade Stanojevic, Mohamed Nabeel, and Ting Yu. Distributed cardinality estimation of set operations with differential privacy. In PAC, pages 37--48, 2017.
[40]
Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Cryptology ePrint Archive, 2020.
[41]
Ellis Fenske, Akshaya Mani, Aaron Johnson, and Micah Sherr. Accountable private set cardinality for distributed measurement. ACM Transactions on Privacy and Security, 2022.

Cited By

View all
  • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 6-Aug-2024
  • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 6-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Author Tags

  1. cardinality estimation
  2. differential privacy
  3. secure multi-party computation

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • National Natural Science Foundation of China
  • Shenzhen Basic Research Grant

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)203
  • Downloads (Last 6 weeks)13
Reflects downloads up to 01 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 6-Aug-2024
  • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 6-Aug-2024
  • (2024)Visualization-Aware Time Series Min-Max Caching with Error Bound GuaranteesProceedings of the VLDB Endowment10.14778/3659437.365946017:8(2091-2103)Online publication date: 31-May-2024
  • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
  • (2024)Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation RulesACM Transactions on Intelligent Systems and Technology10.1145/3675401Online publication date: 29-Jun-2024
  • (2024)Databases in Edge and Fog Environments: A SurveyACM Computing Surveys10.1145/366600156:11(1-40)Online publication date: 8-Jul-2024
  • (2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
  • (2024)Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join QueriesProceedings of the ACM on Management of Data10.1145/36549322:3(1-26)Online publication date: 30-May-2024
  • (2024)Time Series Representation for Visualization in Apache IoTDBProceedings of the ACM on Management of Data10.1145/36392902:1(1-26)Online publication date: 26-Mar-2024
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media