DOI: 10.5555/3618408.3618540

Label differential privacy and private training data release

Published: 23 July 2023

Abstract

We study differentially private mechanisms for sharing training data in machine learning settings. Our goal is to enable learning of an accurate predictive model while protecting the privacy of each user's label. Previous work established privacy guarantees that assumed the features are public and given exogenously, a setting known as label differential privacy. In some scenarios, this can be a strong assumption that removes the interplay between features and labels from the privacy analysis. We relax this approach and instead assume the features are drawn from a distribution that depends on the private labels. We first show that simply adding noise to the label, as in previous work, can lead to an arbitrarily weak privacy guarantee, and also present methods for estimating this privacy loss from data. We then present a new mechanism that replaces some training examples with synthetically generated data, and show that our mechanism has a much better privacy-utility tradeoff if the synthetic data is realistic, in a certain quantifiable sense. Finally, we empirically validate our theoretical analysis.
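
The baseline that the abstract refers to as "simply adding noise to the label" is, in prior label-DP work, usually instantiated as randomized response applied independently to each label: the true label is kept with probability e^ε / (e^ε + K - 1) for K classes and otherwise replaced by a uniformly random other label, so that changing a single example's label shifts the output distribution by at most a factor of e^ε. The sketch below illustrates that standard baseline only, not the mechanism proposed in this paper; the function name and parameters are illustrative.

import math
import random

def randomized_response_label(label: int, num_classes: int, epsilon: float) -> int:
    """Return an epsilon-label-DP noisy copy of `label` in {0, ..., num_classes - 1}."""
    # Keep the true label with probability e^eps / (e^eps + K - 1).
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if random.random() < keep_prob:
        return label
    # Otherwise output one of the other K - 1 labels, chosen uniformly at random.
    other = random.randrange(num_classes - 1)
    return other if other < label else other + 1

# Example: a binary label privatized with epsilon = 1 is kept with
# probability e / (e + 1), roughly 0.73.
noisy_label = randomized_response_label(1, num_classes=2, epsilon=1.0)

The abstract's point is that this per-label guarantee can become arbitrarily weak once the features themselves depend on the private labels, which is what motivates the synthetic-data mechanism described above.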



Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org

Publication History

Published: 23 July 2023

Qualifiers

  • Research-article
  • Research
  • Refereed limited


