DOI: 10.1145/3411501.3419419

Neither Private Nor Fair: Impact of Data Imbalance on Utility and Fairness in Differential Privacy

Published: 09 November 2020

Abstract

Deployment of deep learning across fields and industries is growing rapidly because of its performance, which in turn relies on the availability of data and compute. Data is often crowd-sourced and contains sensitive information about its contributors, and this information leaks into models trained on it. To achieve rigorous privacy guarantees, differentially private training mechanisms are used. However, it has recently been shown that differential privacy can exacerbate existing biases in the data and have disparate impacts on the accuracy of different subgroups. In this paper, we study these effects in differentially private deep learning. Specifically, we examine how different levels of imbalance in the data affect the accuracy and fairness of the model's decisions at different levels of privacy. We demonstrate that even small imbalances and loose privacy guarantees can cause disparate impacts.
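As a concrete illustration of the setup the abstract describes, the sketch below trains with DP-SGD-style updates (per-example gradient clipping plus Gaussian noise) and reports accuracy separately for each subgroup, which is one way to measure disparate impact at a given privacy level. It is a minimal sketch, not the authors' code: the model, optimizer, a loader yielding (input, label, group) triples, and the clipping norm and noise multiplier are all illustrative assumptions.

```python
# Minimal DP-SGD-style sketch (per-example clipping + Gaussian noise).
# `model`, `optimizer`, `loader`, and all hyperparameters are placeholders.
import torch
import torch.nn.functional as F

def dp_sgd_step(model, optimizer, xb, yb, max_grad_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                       # microbatches of size 1
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # Clip the per-example gradient to L2 norm <= max_grad_norm.
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = (max_grad_norm / (norm + 1e-12)).clamp(max=1.0)
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            noise = torch.randn_like(s) * noise_multiplier * max_grad_norm
            p.grad = (s + noise) / len(xb)         # noisy average gradient
    optimizer.step()

def per_group_accuracy(model, loader):
    """Accuracy per sensitive group, to expose disparate impact of clipping/noise."""
    correct, total = {}, {}
    model.eval()
    with torch.no_grad():
        for x, y, g in loader:                     # g: group label (e.g. majority/minority)
            pred = model(x).argmax(dim=1)
            for gi, hit in zip(g.tolist(), (pred == y).tolist()):
                correct[gi] = correct.get(gi, 0) + hit
                total[gi] = total.get(gi, 0) + 1
    return {gi: correct[gi] / total[gi] for gi in total}
```

Comparing the per-group accuracies of a non-private baseline against DP-SGD runs at several noise levels and minority fractions reproduces the kind of experiment the abstract outlines.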


Published In

PPMLP'20: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice
November 2020
75 pages
ISBN: 9781450380881
DOI: 10.1145/3411501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bias
  2. data imbalance
  3. deep learning
  4. differential privacy
  5. fairness

Qualifiers

  • Research-article

Conference

CCS '20


