research-article
Open access

Bayesian Frequency Estimation under Local Differential Privacy with an Adaptive Randomized Response Mechanism

Published: 11 January 2025

Abstract

Frequency estimation plays a critical role in many applications involving personal and private categorical data. Such data are often collected sequentially over time, making it valuable to estimate their distribution online while preserving privacy. We propose AdOBEst-LDP, a new algorithm for adaptive, online Bayesian estimation of categorical distributions under local differential privacy (LDP). The key idea behind AdOBEst-LDP is to enhance the utility of future privatized categorical data by leveraging inference from previously collected privatized data. To achieve this, AdOBEst-LDP uses a new adaptive LDP mechanism to collect privatized data. This LDP mechanism constrains its output to a subset of categories that “predicts” the next user’s data. By adapting the subset selection process to the past privatized data via Bayesian estimation, the algorithm improves the utility of future privatized data. To quantify utility, we explore various well-known information metrics, including (but not limited to) the Fisher information matrix, total variation distance, and information entropy. For Bayesian estimation, we utilize posterior sampling through stochastic gradient Langevin dynamics, a computationally efficient approximate Markov chain Monte Carlo (MCMC) method.
We provide a theoretical analysis showing that (i) the posterior distribution of the category probabilities targeted with Bayesian estimation converges to the true probabilities even for approximate posterior sampling, and (ii) AdOBEst-LDP eventually selects the optimal subset for its LDP mechanism with high probability if posterior sampling is performed exactly. We also present numerical results to validate the estimation accuracy of AdOBEst-LDP. Our comparisons show its superior performance against non-adaptive and semi-adaptive competitors across different privacy levels and distributional parameters.
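
For orientation, here is a minimal sketch of standard (non-adaptive) k-ary randomized response, the classical LDP mechanism for categorical data, together with its usual unbiased frequency estimator. It is not the adaptive mechanism of AdOBEst-LDP, whose subset-selection rule and Bayesian estimator are not given in the abstract. The function names and the parameters `epsilon` and `num_categories` are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def krr_probs(epsilon, num_categories):
    """Return (p, q) for k-ary randomized response.

    p is the probability of reporting the true category and q is the
    probability of reporting any one specific other category. Their ratio
    is exp(epsilon), which is what gives epsilon-LDP.
    """
    denom = math.exp(epsilon) + num_categories - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def privatize(x, epsilon, num_categories, rng):
    """Privatize one category x in {0, ..., K-1} (illustrative helper)."""
    p, _ = krr_probs(epsilon, num_categories)
    if rng.random() < p:
        return x
    # Otherwise report one of the other K - 1 categories uniformly at random.
    y = rng.randrange(num_categories - 1)
    return y if y < x else y + 1


def estimate_frequencies(reports, epsilon, num_categories):
    """Debias the empirical report frequencies.

    Since E[f_k] = q + (p - q) * theta_k, the estimate is (f_k - q) / (p - q).
    Individual estimates can be negative in small samples; they always sum to 1.
    """
    p, q = krr_probs(epsilon, num_categories)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(k, 0) / n - q) / (p - q) for k in range(num_categories)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5, 0.3, 0.15, 0.05]  # hypothetical true distribution
    data = rng.choices(range(4), weights=theta, k=20000)
    reports = [privatize(x, 1.0, 4, rng) for x in data]
    print(estimate_frequencies(reports, 1.0, 4))
```

This baseline spreads its noise over all K categories regardless of what has been learned so far. The abstract's adaptive mechanism instead restricts the output to a subset of categories chosen from past privatized data.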


Published In

ACM Transactions on Knowledge Discovery from Data, Volume 19, Issue 2
February 2025
153 pages
EISSN: 1556-472X
DOI: 10.1145/3703012
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2025
Online AM: 03 December 2024
Accepted: 21 November 2024
Revised: 08 October 2024
Received: 11 May 2024
Published in TKDD Volume 19, Issue 2

Author Tags

  1. Data privacy
  2. randomized response mechanisms
  3. posterior sampling
  4. stochastic gradient Langevin dynamics

Qualifiers

  • Research-article
