Hierarchical Aggregation for Numerical Data under Local Differential Privacy
Abstract
1. Introduction
- (1) A local hierarchical perturbation method, the LHP (local hierarchical perturbation) algorithm, is proposed for numerical data. It addresses the need to protect the privacy level itself while ensuring that the output value domain is the same regardless of which data are perturbed locally;
- (2) A privacy level conversion method, the PLC (privacy level conversion) algorithm, is proposed to increase the amount of data available at each privacy level and thus improve the accuracy of mean estimation, counteracting the loss of statistical accuracy caused by partitioning the data into levels;
- (3) Based on the LHP and PLC algorithms, a hierarchical aggregation method, the HierA algorithm, is proposed for numerical data under local differential privacy; it collects private data hierarchically and improves data utility while ensuring that users' privacy requirements are satisfied;
- (4) The proposed hierarchical collection method is applied to small-batch stochastic gradient descent to complete a linear regression task and obtain more accurate prediction models while protecting user privacy;
- (5) Experimental comparisons with existing methods on real and synthetic datasets with different distributions demonstrate that the proposed method offers better utility than the existing methods.
2. Related Work
3. Preliminaries and Problem Definition
3.1. Preliminaries
Algorithm 1. Harmony [15]
Input: user's numerical data $v \in [-1, 1]$ and their privacy budget $\varepsilon$
Output: perturbed data $v^*$
1. Discretize: set $v^* = 1$ with probability $\frac{1+v}{2}$ and $v^* = -1$ with probability $\frac{1-v}{2}$;
2. Perturb: keep $v^*$ unchanged with probability $\frac{e^{\varepsilon}}{e^{\varepsilon}+1}$; otherwise set $v^* = -v^*$;
3. Calibrate: $v^* = v^* \cdot \frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}$;
4. Return $v^*$.
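For concreteness, the following is a minimal Python sketch of the Harmony steps above for a single value in $[-1, 1]$; the variable names are ours and the code is illustrative only.

```python
import math
import random

def harmony_perturb(v: float, eps: float) -> float:
    """Perturb a single value v in [-1, 1] following the steps of Algorithm 1."""
    # 1. Discretize: map v to {-1, 1} so that the expectation stays equal to v.
    b = 1.0 if random.random() < (1.0 + v) / 2.0 else -1.0
    # 2. Perturb: keep the sign with probability e^eps / (e^eps + 1), flip otherwise.
    keep_prob = math.exp(eps) / (math.exp(eps) + 1.0)
    if random.random() >= keep_prob:
        b = -b
    # 3. Calibrate: rescale so the reported value is an unbiased estimate of v.
    return b * (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
```

Averaging the calibrated outputs of many users yields an unbiased estimate of the true mean, because the calibration factor $\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}$ exactly cancels the bias introduced by the sign flip.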
Algorithm 2. Piecewise Mechanism [16]
Input: user's numerical data $v \in [-1, 1]$ and their privacy budget $\varepsilon$
Output: perturbed data $v^* \in [-C, C]$, where $C = \frac{e^{\varepsilon/2}+1}{e^{\varepsilon/2}-1}$
1. Sample $x$ uniformly at random from $[0, 1]$;
2. If $x < \frac{e^{\varepsilon/2}}{e^{\varepsilon/2}+1}$: sample $v^*$ uniformly at random from $[\ell(v), r(v)]$, where $\ell(v) = \frac{C+1}{2}v - \frac{C-1}{2}$ and $r(v) = \ell(v) + C - 1$;
3. Otherwise: sample $v^*$ uniformly at random from $[-C, \ell(v)) \cup (r(v), C]$;
4. Return $v^*$.
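A corresponding Python sketch of the Piecewise Mechanism is given below, again illustrative and with our own variable names. Unlike Harmony, its output is a continuous value in $[-C, C]$ and is already unbiased, so no separate calibration step is required.

```python
import math
import random

def piecewise_perturb(v: float, eps: float) -> float:
    """Perturb v in [-1, 1] with the Piecewise Mechanism; the output lies in [-C, C]."""
    z = math.exp(eps / 2.0)
    C = (z + 1.0) / (z - 1.0)
    # The "high-probability" interval [l, r] is centered near v and has length C - 1.
    l = (C + 1.0) / 2.0 * v - (C - 1.0) / 2.0
    r = l + C - 1.0
    x = random.random()  # step 1: uniform sample from [0, 1]
    if x < z / (z + 1.0):
        # step 2: report a value close to v
        return random.uniform(l, r)
    # step 3: report a value from the two side intervals [-C, l) and (r, C]
    left_len = l - (-C)
    right_len = C - r
    u = random.uniform(0.0, left_len + right_len)
    return -C + u if u < left_len else r + (u - left_len)
```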
3.2. Problem Definition
4. Hierarchical Aggregation for Numerical Data
4.1. Local Hierarchical Perturbation Mechanism on the User Side
Algorithm 3. Local hierarchical perturbation (LHP)
Input: the user's private data $v_i$, the set of subintervals of the data value domain, and the set of privacy budgets corresponding to the subintervals
Output: perturbed tuple $\langle t_i^*, v_i^* \rangle$
1. According to the subinterval in which $v_i$ lies, determine its corresponding privacy level $t_i$;
2. Perturb $t_i$ to obtain $t_i^*$;
3. Discretize $v_i$;
4. Perturb the discretized value to obtain $v_i^*$;
5. Return the perturbed tuple $\langle t_i^*, v_i^* \rangle$.
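The perturbation probabilities of Algorithm 3 are not reproduced above, so the following Python sketch only illustrates its overall structure under explicit assumptions: $k$-ary randomized response for the privacy level and a Harmony-style bit perturbation for the value, using the reported level's budget. These choices and all names are ours and may differ from the authors' exact LHP formulas.

```python
import math
import random

def lhp_perturb(v, intervals, budgets):
    """Illustrative sketch of Algorithm 3 (LHP). `intervals` is a list of (low, high)
    subintervals covering the value domain and `budgets` the privacy budget of each
    level; both names are our own notation. v is assumed normalized to [-1, 1]."""
    k = len(intervals)
    # Step 1: the true privacy level is the index of the subinterval containing v.
    t = next(i for i, (lo, hi) in enumerate(intervals) if lo <= v <= hi)
    # Step 2 (assumed k-ary randomized response): report the true level with
    # probability e^eps_t / (e^eps_t + k - 1), otherwise a uniform other level.
    p_level = math.exp(budgets[t]) / (math.exp(budgets[t]) + k - 1)
    others = [j for j in range(k) if j != t]
    t_star = t if (not others or random.random() < p_level) else random.choice(others)
    # Steps 3-4 (assumed Harmony-style, with the reported level's budget):
    # discretize v to {-1, 1}, then flip the sign with probability 1/(e^eps + 1).
    eps_v = budgets[t_star]
    b = 1 if random.random() < (1.0 + v) / 2.0 else -1
    if random.random() >= math.exp(eps_v) / (math.exp(eps_v) + 1.0):
        b = -b
    return t_star, b
```

Because the reported value is always a single bit in $\{-1, 1\}$, the output domain is identical for every privacy level, which is the property stated in contribution (1).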
4.2. Calibration Analysis at the Collection End
Algorithm 4. Privacy level conversion (PLC)
Input: the dataset $V_i$ with privacy level $i$, the number of times of data reuse $\mu$, and the set of privacy budgets
Output: the set of converted datasets
1. For each higher privacy level $j > i$ to which the data are reused:
2. For each $v$ in $V_i$:
3. Re-perturb $v$ so that its distribution matches a direct perturbation under the budget of level $j$, and add the result to the converted dataset for level $j$;
4. Return the set of converted datasets.
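The conversion formula of Algorithm 4 is likewise not reproduced above. The sketch below shows one standard way such a conversion can be realized for Harmony-style bits: a bit kept with probability $p_i = \frac{e^{\varepsilon_i}}{e^{\varepsilon_i}+1}$ is re-flipped with probability $q = \frac{p_i - p_j}{2p_i - 1}$, after which its distribution matches a direct perturbation with the weaker budget $\varepsilon_j$. This derivation is ours and is given for illustration only; it is not necessarily the authors' exact PLC formula.

```python
import math
import random

def keep_prob(eps: float) -> float:
    """Probability that a Harmony-style bit keeps its sign under budget eps."""
    return math.exp(eps) / (math.exp(eps) + 1.0)

def plc_convert(bits, eps_i: float, eps_j: float):
    """Illustrative sketch: convert bits perturbed with eps_i into bits whose
    distribution matches a direct perturbation with the smaller budget eps_j."""
    assert eps_j <= eps_i, "conversion can only add noise (weaker budget)"
    p_i, p_j = keep_prob(eps_i), keep_prob(eps_j)
    # Re-flip probability q chosen so that p_i*(1-q) + (1-p_i)*q == p_j.
    q = (p_i - p_j) / (2.0 * p_i - 1.0)
    return [-b if random.random() < q else b for b in bits]
```

Converting $V_i$ in this way for each reusable higher level increases the number of samples available at that level, which is the purpose stated in contribution (2).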
Algorithm 5. Hierarchical aggregation for numerical data (HierA)
Input: the users' dataset, the set of subintervals of the data value domain, the set of privacy budgets, and the number of data reuses $\mu$
Output: estimated mean $\hat{m}$
User side:
1. Each user perturbs their data locally using the LHP algorithm (Algorithm 3) to obtain the perturbed tuple $\langle t_i^*, v_i^* \rangle$;
Collection side:
2. The received tuples are classified according to their reported privacy level to obtain one dataset per level;
3. Each classified dataset is converted with the PLC algorithm (Algorithm 4) to obtain the converted datasets;
4. The original and converted datasets are arranged into a matrix in which each column corresponds to one privacy level;
5. Within each column, the datasets with the same privacy level are merged and the compensation dataset is added;
6. For each merged dataset: let $n_1$ and $n_2$ denote the numbers of 1 and $-1$, respectively; debias $n_1$ and $n_2$ to correct for the perturbation; clip the corrected counts to $[0, N]$ (set them to $N$ if they exceed $N$ and to 0 if they fall below 0); and compute the mean of the dataset from the corrected counts;
7. Calculate the estimated mean $\hat{m}$ from the per-level results.
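Step 6 debiases the observed counts of $1$ and $-1$ before clipping them to $[0, N]$. Assuming the Harmony-style bit perturbation sketched earlier (keep probability $p = \frac{e^{\varepsilon}}{e^{\varepsilon}+1}$), the standard unbiased correction and the resulting per-level mean estimate look as follows; the compensation-dataset handling of step 5 is omitted from this sketch.

```python
import math

def estimate_mean(bits, eps: float) -> float:
    """Debias the counts of +1/-1 in `bits` and return the estimated mean (step 6)."""
    N = len(bits)
    n1_obs = sum(1 for b in bits if b == 1)
    n2_obs = N - n1_obs
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    # Unbiased correction: E[n1_obs] = n1*p + n2*(1-p), with n1 + n2 = N.
    n1 = (n1_obs - N * (1.0 - p)) / (2.0 * p - 1.0)
    n2 = (n2_obs - N * (1.0 - p)) / (2.0 * p - 1.0)
    # Clip the corrected counts to [0, N], as in step 6 of Algorithm 5.
    n1 = min(max(n1, 0.0), N)
    n2 = min(max(n2, 0.0), N)
    # Each bit is the discretization of a value in [-1, 1], so the mean is (n1 - n2) / N.
    return (n1 - n2) / N
```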
4.3. Privacy and Usability Analysis
1. Perturbation of the user's privacy level $t_i$;
2. Perturbation of the user's private data $v_i$ (a standard ratio bound for this step is sketched below).
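Assuming the Harmony-style value perturbation sketched in Section 4.1, the guarantee for the second step follows from the standard randomized-response argument: for any two inputs $v, v'$ and any output $y \in \{-1, 1\}$,
\[
\frac{\Pr[v_i^* = y \mid v]}{\Pr[v_i^* = y \mid v']} \le \frac{e^{\varepsilon_{t}}/(e^{\varepsilon_{t}}+1)}{1/(e^{\varepsilon_{t}}+1)} = e^{\varepsilon_{t}} .
\]
An analogous argument, with the corresponding budget, applies to the perturbation of $t_i$; the authors' exact analysis may bound the two steps jointly.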
5. Application of Hierarchical Aggregation in Stochastic Gradient Descent
6. Experiments
6.1. Different Data Reuse Times μ
6.2. Comparison of Different Methods
6.3. Small-Batch Stochastic Gradient Descent
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the 24th Annual Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, 28 May–1 June 2006; pp. 486–503. [Google Scholar]
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the 3rd Theory of Cryptography Conference, New York, NY, USA, 4–7 March 2006; pp. 265–284. [Google Scholar]
- Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
- Kasiviswanathan, S.P.; Lee, H.K.; Nissim, K.; Raskhodnikova, S.; Smith, A. What can we learn privately? SIAM J. Comput. 2011, 40, 793–826. [Google Scholar] [CrossRef]
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local privacy and statistical minimax rates. In Proceedings of the IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 26–29 October 2013; pp. 429–438. [Google Scholar]
- Ye, Q.; Meng, X.; Zhu, M.; Huo, Z. Survey on local differential privacy. J. Softw. 2018, 29, 159–183. [Google Scholar]
- Wang, T.; Zhang, X.; Feng, J.; Yang, X. A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis. Sensors 2020, 20, 7030. [Google Scholar] [CrossRef] [PubMed]
- Learning with Privacy at Scale. Available online: https://machinelearning.apple.com/research/learning-with-privacy-at-scale (accessed on 31 December 2017).
- Erlingsson, Ú.; Pihur, V.; Korolova, A. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067. [Google Scholar]
- Ding, B.; Kulkarni, J.; Yekhanin, S. Collecting telemetry data privately. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3571–3580. [Google Scholar]
- Kairouz, P.; Bonawitz, K.; Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2436–2444. [Google Scholar]
- Wang, T.; Blocki, J.; Li, N.; Jha, S. Locally differentially private protocols for frequency estimation. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Baltimore, MD, USA, 15–17 August 2017; pp. 729–745. [Google Scholar]
- Wang, T.; Li, N.; Jha, S. Locally differentially private frequent itemset mining. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–23 May 2018; pp. 127–143. [Google Scholar]
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Minimax optimal procedures for locally private estimation. J. Am. Stat. Assoc. 2018, 113, 182–201. [Google Scholar] [CrossRef]
- Nguyên, T.T.; Xiao, X.; Yang, Y.; Hui, S.C.; Shin, H.; Shin, J. Collecting and analyzing data from smart device users with local differential privacy. arXiv 2016, arXiv:1606.05053. [Google Scholar]
- Wang, N.; Xiao, X.; Yang, Y.; Zhao, J.; Hui, S.C.; Shin, H.; Shin, J.; Yu, G. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, China, 8–11 April 2019; pp. 638–649. [Google Scholar]
- Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69. [Google Scholar] [CrossRef] [PubMed]
- Gu, X.; Li, M.; Xiong, L.; Cao, Y. Providing Input-Discriminative Protection for Local Differential Privacy. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 505–516. [Google Scholar]
- Akter, M.; Hashem, T. Computing aggregates over numeric data with personalized local differential privacy. In Proceedings of the Australasian Conference on Information Security and Privacy, Auckland, New Zealand, 3–5 July 2017; pp. 249–260. [Google Scholar]
- Nie, Y.; Yang, W.; Huang, L.; Xie, X.; Zhao, Z.; Wang, S. A Utility-Optimized Framework for Personalized Private Histogram Estimation. IEEE Trans. Knowl. Data Eng. 2019, 31, 655–669. [Google Scholar] [CrossRef]
- Shen, Z.; Xia, Z.; Yu, P. PLDP: Personalized Local Differential Privacy for Multidimensional Data Aggregation. Secur. Commun. Netw. 2021, 2021, 6684179. [Google Scholar] [CrossRef]
- Ye, Q.; Hu, H.; Meng, X.; Zheng, H. PrivKV: Key-Value Data Collection with Local Differential Privacy. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019. [Google Scholar]
- Gu, X.; Li, M.; Cheng, Y.; Xiong, L.; Cao, Y. PCKV: Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Boston, MA, USA, 12–14 August 2020; pp. 967–984. [Google Scholar]
- Zhang, X.; Xu, Y.; Fu, N.; Meng, X. Towards Private Key-Value Data Collection with Histogram. J. Comput. Res. Dev. 2021, 58, 624–637. [Google Scholar]
- Sun, L.; Ye, X.; Zhao, J.; Lu, C.; Yang, M. BiSample: Bidirectional sampling for handling missing data with local differential privacy. In Proceedings of the International Conference on Database Systems for Advanced Applications, Jeju, Republic of Korea, 24–27 September 2020; pp. 88–104. [Google Scholar]
- IPUMS. Integrated Public use Microdata Series. Available online: https://www.ipums.org (accessed on 28 March 2022).
| Symbol | Description |
|---|---|
| | User set indicating the $i$-th user |
| | Users' dataset, with $v_i$ denoting the data of user $i$ |
| | Subintervals of the data value domain, divided by privacy level |
| | Set of privacy budgets for user data, with $k$ levels |
| $\mu$ | Number of times data are reused |
| $v_i^*$ | User data after perturbation |
| $t_i$ | Privacy level of the user's data |
| $t_i^*$ | Privacy level after perturbation |
| $p$ | Probability of the user's data remaining unchanged |
| $m$ | True mean of the users' dataset |
| $\hat{m}$ | Estimated mean of the users' dataset |