DOI: 10.1145/3375627.3375865
Research Article

Data Augmentation for Discrimination Prevention and Bias Disambiguation

Published: 07 February 2020

Abstract

Machine learning models are prone to biased decisions due to biases in the datasets they are trained on. In this paper, we introduce a novel data augmentation technique that creates a fairer dataset for model training and can also help identify the type of bias present in the dataset, i.e., whether the bias arises from a lack of representation for a particular group (sampling bias) or from human bias reflected in the labels (prejudice-based bias). Given a dataset involving a protected attribute with a privileged and an unprivileged group, we create an "ideal world" dataset: for every data sample, we create a new sample with the same features (except the protected attribute(s)) and the same label as the original, but with the opposite protected-attribute value. The synthetic data points are sorted by their proximity to the original training distribution and added successively to the real dataset to create intermediate datasets. We theoretically show that two different notions of fairness, statistical parity difference (independence) and average odds difference (separation), always change in the same direction under such augmentation. We also show that the proposed fairness-aware augmentation objective is submodular, which enables an efficient greedy algorithm. We empirically study the effect of training models on the intermediate datasets and show, on three datasets, that this technique reduces both bias measures while keeping accuracy nearly constant. We then discuss the implications of this study for disambiguating sampling bias from prejudice-based bias, and for how pre-processing techniques should be evaluated in general. The proposed method can be used by policy makers who want to train machine learning models on unbiased datasets: they can add a subset of synthetic points, to an extent they are comfortable with, to mitigate unwanted bias.
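The core augmentation step described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a binary 0/1-encoded protected attribute and pandas DataFrames, and the function and column names (`ideal_world_augment`, `statistical_parity_difference`, `A`, `x`, `y`) are placeholders chosen for this example. The proximity-based ordering and greedy selection from the paper are omitted for brevity.

```python
import numpy as np
import pandas as pd

def ideal_world_augment(df, protected):
    """Create the 'ideal world' counterpart of each sample: flip the
    binary protected attribute while keeping all other features and
    the label unchanged."""
    synthetic = df.copy()
    synthetic[protected] = 1 - synthetic[protected]  # assumes 0/1 encoding
    return synthetic

def statistical_parity_difference(y_pred, protected):
    """P(yhat=1 | unprivileged) - P(yhat=1 | privileged); 0 means parity."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

# Toy dataset: protected attribute 'A', one feature 'x', label 'y'.
df = pd.DataFrame({"A": [0, 1], "x": [0.5, 0.7], "y": [1, 0]})

# Union of real and synthetic points: every (x, y) pair now appears
# once with A=0 and once with A=1, so labels no longer depend on A.
aug = pd.concat([df, ideal_world_augment(df, "A")], ignore_index=True)
```

In the paper's full procedure, the synthetic rows are not all added at once; they are ranked by proximity to the training distribution and added greedily, producing the intermediate datasets whose fairness/accuracy trade-offs are then studied.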



Published In

AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
February 2020
439 pages
ISBN:9781450371100
DOI:10.1145/3375627

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. discrimination prevention
  2. fairness in machine learning
  3. responsible artificial intelligence

Qualifiers

  • Research-article

Conference

AIES '20

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%


Cited By

  • (2024) Comprehensive Validation on Reweighting Samples for Bias Mitigation via AIF360. Applied Sciences 14:9 (3826). DOI: 10.3390/app14093826. Published: 30-Apr-2024.
  • (2024) Toward Fair Ultrasound Computing Tomography: Challenges, Solutions and Outlook. Proceedings of the Great Lakes Symposium on VLSI 2024, 748-753. DOI: 10.1145/3649476.3660387. Published: 12-Jun-2024.
  • (2024) Enabling An Informed Contextual Multi-Armed Bandit Framework For Stock Trading With Neuroevolution. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 1924-1933. DOI: 10.1145/3638530.3664145. Published: 14-Jul-2024.
  • (2024) Representation Debiasing of Generated Data Involving Domain Experts. Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, 516-522. DOI: 10.1145/3631700.3664910. Published: 27-Jun-2024.
  • (2024) On the relation of causality- versus correlation-based feature selection on model fairness. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 56-64. DOI: 10.1145/3605098.3636018. Published: 8-Apr-2024.
  • (2024) FairBalance: How to Achieve Equalized Odds With Data Pre-Processing. IEEE Transactions on Software Engineering 50:9, 2294-2312. DOI: 10.1109/TSE.2024.3431445. Published: Sep-2024.
  • (2024) Data Augmentation via Subgroup Mixup for Improving Fairness. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7350-7354. DOI: 10.1109/ICASSP48485.2024.10446564. Published: 14-Apr-2024.
  • (2024) Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling. 2024 4th International Conference on Computer Communication and Information Systems (CCCIS), 47-51. DOI: 10.1109/CCCIS63483.2024.00016. Published: 27-Feb-2024.
  • (2024) Big data and deep learning for RNA biology. Experimental & Molecular Medicine 56:6, 1293-1321. DOI: 10.1038/s12276-024-01243-w. Published: 14-Jun-2024.
  • (2024) Mitigating bias in artificial intelligence. Future Generation Computer Systems 155:C, 384-401. DOI: 10.1016/j.future.2024.02.023. Published: 1-Jun-2024.
