DOI: 10.1145/3299869.3319901
Research article · Public Access

Interventional Fairness: Causal Database Repair for Algorithmic Fairness

Published: 25 June 2019

Abstract

Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based definitions of fairness can correctly model some of these situations, but they require specification of the underlying causal models. In this paper, we formalize the situation as a database repair problem, proving sufficient conditions for fair classifiers in terms of admissible variables as opposed to a complete causal model. We show that these conditions correctly capture subtle fairness violations. We then use these conditions as the basis for database repair algorithms that provide provable fairness guarantees about classifiers trained on their training labels. We evaluate our algorithms on real data, demonstrating improvement over the state of the art on multiple fairness metrics proposed in the literature while retaining high utility.
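The repair condition the abstract alludes to can be illustrated with a toy sketch. This is not the paper's algorithm; it only shows the shape of the idea under one simple assumption: training tuples carry a sensitive attribute S, an admissible attribute A, and a binary label Y, and the data is "fair" when Y is conditionally independent of S given A. The function names and the crude rounding-based relabeling are illustrative inventions, not the authors' method.

```python
# Toy sketch of conditional-independence-based repair (illustrative only).
# Tuples are (s, a, y): sensitive attribute, admissible attribute, label.
from collections import defaultdict

def conditional_rates(rows):
    """Empirical P(Y=1 | A=a, S=s) for each (a, s) stratum."""
    counts = defaultdict(lambda: [0, 0])  # (a, s) -> [positives, total]
    for s, a, y in rows:
        counts[(a, s)][0] += y
        counts[(a, s)][1] += 1
    return {k: pos / tot for k, (pos, tot) in counts.items()}

def violates_independence(rows, tol=1e-9):
    """Admissible strata a where P(Y=1|a,s) differs across sensitive groups s."""
    by_a = defaultdict(dict)
    for (a, s), r in conditional_rates(rows).items():
        by_a[a][s] = r
    return {a: grp for a, grp in by_a.items()
            if max(grp.values()) - min(grp.values()) > tol}

def naive_repair(rows):
    """Relabel so every sensitive group within an admissible stratum
    gets the stratum's pooled positive rate (a deliberately crude repair)."""
    by_as = defaultdict(list)          # (a, s) -> row indices
    pooled = defaultdict(lambda: [0, 0])  # a -> [positives, total]
    for i, (s, a, y) in enumerate(rows):
        by_as[(a, s)].append(i)
        pooled[a][0] += y
        pooled[a][1] += 1
    out = list(rows)
    for (a, s), idxs in by_as.items():
        k = round(pooled[a][0] / pooled[a][1] * len(idxs))
        for j, i in enumerate(idxs):
            out[i] = (s, a, 1 if j < k else 0)
    return out
```

A classifier trained on the repaired labels can no longer learn a dependence of Y on S within any admissible stratum, which is the intuition behind the paper's guarantee; the real repair algorithms minimize the number of changed tuples rather than bluntly relabeling as above.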




Published In

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Badges

  • Best Paper

Author Tags

  1. algorithmic bias
  2. algorithmic fairness
  3. algorithmic transparency
  4. causal inference
  5. database dependencies
  6. database repair
  7. fair machine learning

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • NSF III
  • NSF AITF

Conference

SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 paper acceptance rate: 88 of 430 submissions, 20%
Overall acceptance rate: 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 695
  • Downloads (last 6 weeks): 85
Reflects downloads up to 13 Jan 2025

Cited By

  • (2024) Databases Unbound: Querying All of the World's Bytes with AI. Proceedings of the VLDB Endowment 17(12):4546-4554. DOI: 10.14778/3685800.3685916. Online: 8 Nov 2024
  • (2024) Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities. Proceedings of the VLDB Endowment 17(11):3470-3483. DOI: 10.14778/3681954.3682014. Online: 30 Aug 2024
  • (2024) Falcon: Fair Active Learning Using Multi-Armed Bandits. Proceedings of the VLDB Endowment 17(5):952-965. DOI: 10.14778/3641204.3641207. Online: 2 May 2024
  • (2024) OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport. Proceedings of the ACM on Management of Data 2(3):1-26. DOI: 10.1145/3654963. Online: 30 May 2024
  • (2024) FairHash: A Fair and Memory/Time-efficient Hashmap. Proceedings of the ACM on Management of Data 2(3):1-29. DOI: 10.1145/3654939. Online: 30 May 2024
  • (2024) Cleenex: Support for User Involvement during an Iterative Data Cleaning Process. Journal of Data and Information Quality 16(1):1-26. DOI: 10.1145/3648476. Online: 15 Feb 2024
  • (2024) Fair Feature Selection: A Causal Perspective. ACM Transactions on Knowledge Discovery from Data 18(7):1-23. DOI: 10.1145/3643890. Online: 3 Feb 2024
  • (2024) Fairness in Machine Learning: A Survey. ACM Computing Surveys 56(7):1-38. DOI: 10.1145/3616865. Online: 9 Apr 2024
  • (2024) Fair Top-k Query on Alpha-Fairness. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 2338-2350. DOI: 10.1109/ICDE60146.2024.00185. Online: 13 May 2024
  • (2024) Explainable Disparity Compensation for Efficient Fair Ranking. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 2192-2204. DOI: 10.1109/ICDE60146.2024.00174. Online: 13 May 2024
