DOI: 10.1145/3411501.3419419

Neither Private Nor Fair: Impact of Data Imbalance on Utility and Fairness in Differential Privacy

Published: 09 November 2020

Abstract

Deployment of deep learning across fields and industries is growing rapidly because of its performance, which in turn relies on the availability of data and compute. Data is often crowd-sourced and contains sensitive information about its contributors, and this information leaks into models trained on it. To achieve rigorous privacy guarantees, differentially private training mechanisms are used. However, it has recently been shown that differential privacy can exacerbate existing biases in the data and have disparate impacts on the accuracy of different subgroups. In this paper, we study these effects in differentially private deep learning. Specifically, we examine how different levels of imbalance in the data affect the accuracy and fairness of the model's decisions at different levels of privacy. We demonstrate that even small imbalances and loose privacy guarantees can cause disparate impacts.
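As a concrete illustration of the setup the abstract describes, the sketch below trains with DP-SGD-style updates (per-example gradient clipping plus Gaussian noise) and reports accuracy separately for each subgroup, which is one way to measure disparate impact at a given privacy level. It is a minimal sketch, not the authors' code: the model, optimizer, a loader yielding (input, label, group) triples, and the clipping norm and noise multiplier are all illustrative assumptions.

```python
# Minimal DP-SGD-style sketch (per-example clipping + Gaussian noise).
# `model`, `optimizer`, `loader`, and all hyperparameters are placeholders.
import torch
import torch.nn.functional as F

def dp_sgd_step(model, optimizer, xb, yb, max_grad_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                       # microbatches of size 1
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # Clip the per-example gradient to L2 norm <= max_grad_norm.
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = (max_grad_norm / (norm + 1e-12)).clamp(max=1.0)
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            noise = torch.randn_like(s) * noise_multiplier * max_grad_norm
            p.grad = (s + noise) / len(xb)         # noisy average gradient
    optimizer.step()

def per_group_accuracy(model, loader):
    """Accuracy per sensitive group, to expose disparate impact of clipping/noise."""
    correct, total = {}, {}
    model.eval()
    with torch.no_grad():
        for x, y, g in loader:                     # g: group label (e.g. majority/minority)
            pred = model(x).argmax(dim=1)
            for gi, hit in zip(g.tolist(), (pred == y).tolist()):
                correct[gi] = correct.get(gi, 0) + hit
                total[gi] = total.get(gi, 0) + 1
    return {gi: correct[gi] / total[gi] for gi in total}
```

Comparing the per-group accuracies of a non-private baseline against DP-SGD runs at several noise levels and minority fractions reproduces the kind of experiment the abstract outlines.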


Published In

PPMLP'20: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice
November 2020
75 pages
ISBN: 9781450380881
DOI: 10.1145/3411501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bias
  2. data imbalance
  3. deep learning
  4. differential privacy
  5. fairness

Qualifiers

  • Research-article

Conference

CCS '20


