An LDP Compatible Sketch for Securely Approximating Set Intersection Cardinalities

Published: 26 March 2024

Abstract

Given two sets of elements held separately by two different parties, computing the cardinality (i.e., the number of distinct elements) of their intersection is a fundamental task in applications such as network monitoring and database systems. To handle large sets with limited space, computation, and communication costs, lightweight probabilistic methods (i.e., sketch methods) such as the Flajolet-Martin (FM) sketch and the HyperLogLog (HLL) sketch are widely used. However, when a set's probabilistic data summary and the hash functions used to construct the sketch are disclosed to an untrusted third party, the set's privacy is compromised. Directly applying Local Differential Privacy (LDP) techniques to safeguard the sketch collection results in extremely large estimation errors for set intersection cardinalities. To address this issue, we propose a novel sketch method that makes it easier to incorporate noise into the constructed sketch to achieve differential privacy. More importantly, our sketch method is compatible with the LDP noise: the probabilistic model underlying our LDP-based data summary is simple enough that the estimation error introduced by the noise can be eliminated. We perform extensive experiments on various synthetic and real-world datasets, and the results demonstrate that our method is orders of magnitude more accurate and several times faster than state-of-the-art methods.
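To illustrate what it means for a sketch's probabilistic model to be simple enough to remove noise-induced bias, here is a minimal Python sketch. It is not the method proposed in the paper: it assumes a plain linear-counting bitmap (in the style of Whang et al.), per-bit randomized response as the LDP noise, and inclusion-exclusion for the intersection. All function names, the SHA-256 hashing, and the parameters `m`, `seed`, and `epsilon` are assumptions made for this example.

```python
import hashlib
import math
import random


def bitmap_sketch(items, m, seed=0):
    """Linear-counting bitmap: set bit h(x) mod m for every item x."""
    bits = [0] * m
    for x in items:
        digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
        bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def keep_probability(epsilon):
    """Probability that randomized response reports a bit unchanged."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Flip each bit independently with probability 1 - keep_probability."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def debiased_zero_fraction(noisy_bits, p):
    """Invert E[z_noisy] = q + z * (p - q) to recover the zero fraction z."""
    q = 1.0 - p
    z_noisy = noisy_bits.count(0) / len(noisy_bits)
    return (z_noisy - q) / (p - q)


def debiased_joint_zero_fraction(noisy_a, noisy_b, p):
    """Estimate the fraction of positions that were 0 in both original bitmaps.

    Uses E[n00] = f00 * (p - q)^2 + q * (p - q) * (zA + zB) + q^2,
    which holds when the two bitmaps are perturbed independently.
    """
    q = 1.0 - p
    m = len(noisy_a)
    n00 = sum(1 for a, b in zip(noisy_a, noisy_b) if a == 0 and b == 0) / m
    z_sum = debiased_zero_fraction(noisy_a, p) + debiased_zero_fraction(noisy_b, p)
    return (n00 - q * q - q * (p - q) * z_sum) / ((p - q) ** 2)


def linear_count(zero_fraction, m):
    """Linear-counting estimate n = -m * ln(z), with z clamped to [1/m, 1]."""
    z = min(max(zero_fraction, 1.0 / m), 1.0)
    return -m * math.log(z)


def estimate_intersection(noisy_a, noisy_b, epsilon):
    """Inclusion-exclusion on debiased estimates: |A| + |B| - |A union B|."""
    m = len(noisy_a)
    p = keep_probability(epsilon)
    n_a = linear_count(debiased_zero_fraction(noisy_a, p), m)
    n_b = linear_count(debiased_zero_fraction(noisy_b, p), m)
    n_union = linear_count(debiased_joint_zero_fraction(noisy_a, noisy_b, p), m)
    return max(0.0, n_a + n_b - n_union)


if __name__ == "__main__":
    rng = random.Random(7)
    m, epsilon = 1 << 14, 3.0
    # Both parties must share m and the hash seed; each perturbs locally.
    noisy_a = randomized_response(bitmap_sketch(range(0, 3000), m), epsilon, rng)
    noisy_b = randomized_response(bitmap_sketch(range(2000, 5000), m), epsilon, rng)
    print(round(estimate_intersection(noisy_a, noisy_b, epsilon)))
```

The true intersection in the usage example has 1000 elements. Because the flip probability is known, the expected effect of the noise on each observed fraction can be inverted in closed form, so the estimate is centered near the true value, although its variance grows as `epsilon` shrinks. This example makes no claim about the overall privacy guarantee or accuracy of the paper's own sketch.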

Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 1
SIGMOD
February 2024
1874 pages
EISSN: 2836-6573
DOI: 10.1145/3654807
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Published in PACMMOD Volume 2, Issue 1

Author Tags

  1. cardinality estimation
  2. local differential privacy

Qualifiers

  • Research-article
