DOI: 10.5555/3294996.3295155

Optimized pre-processing for discrimination prevention

Published: 04 December 2017

Abstract

Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective. Two instances of the proposed optimization are applied to datasets, including one on real-world criminal recidivism. Results show that discrimination can be greatly reduced at a small cost in classification accuracy.
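The abstract's three-part objective (controlling discrimination, limiting per-sample distortion, preserving utility) can be illustrated with a much smaller stand-in than the paper's formulation. The sketch below is a toy linear program, not the authors' optimization: it learns only a randomized label map p(Ŷ | Y, D) for a binary outcome Y and binary protected attribute D, caps the gap in positive-outcome rates between groups (discrimination control), bounds the label-flip probability in each (Y, D) cell (distortion control), and minimizes the expected number of flips (utility loss). All distributions, thresholds, and variable names here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy joint distribution p(Y=y, D=d): rows y in {0,1}, cols d in {0,1}.
# Group d=1 receives the positive outcome far more often than d=0.
p = np.array([[0.30, 0.10],   # y = 0
              [0.10, 0.50]])  # y = 1
p_d = p.sum(axis=0)           # marginal p(D=d)

eps, delta = 0.05, 0.40       # discrimination cap and per-cell flip cap

# Decision variables t[y, d] = p(Yhat=1 | Y=y, D=d), flattened as
# [t(0,0), t(0,1), t(1,0), t(1,1)].
# Utility loss = expected flips = sum_d p(0,d)*t(0,d) + p(1,d)*(1 - t(1,d));
# dropping the constant term gives the linear objective below.
c = np.array([p[0, 0], p[0, 1], -p[1, 0], -p[1, 1]])

# Discrimination control: |p(Yhat=1|D=0) - p(Yhat=1|D=1)| <= eps,
# where p(Yhat=1|D=d) = sum_y p(y|d) * t(y,d).
w0 = np.array([p[0, 0] / p_d[0], 0.0, p[1, 0] / p_d[0], 0.0])
w1 = np.array([0.0, p[0, 1] / p_d[1], 0.0, p[1, 1] / p_d[1]])
A_ub = np.vstack([w0 - w1, w1 - w0])
b_ub = np.array([eps, eps])

# Distortion control: flip probability in each (y, d) cell at most delta,
# i.e. t(0,d) <= delta and 1 - t(1,d) <= delta.
bounds = [(0.0, delta), (0.0, delta), (1.0 - delta, 1.0), (1.0 - delta, 1.0)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
t = res.x.reshape(2, 2)                      # learned map t[y, d]
rates = np.array([w0 @ res.x, w1 @ res.x])   # p(Yhat=1|D=d) after mapping
print("flip map t(y,d):\n", t)
print("positive rates per group:", rates, "gap:", abs(rates[0] - rates[1]))
```

Tightening `eps` and `delta` simultaneously can make the program infeasible, which mirrors the trade-off the paper manages between discrimination control and distortion; the actual method transforms the full joint distribution of features and outcomes rather than a single binary label.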



    Published In

    NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
    December 2017
    7104 pages

    Publisher

    Curran Associates Inc.

    Red Hook, NY, United States



    Article Metrics

    • Downloads (last 12 months): 135
    • Downloads (last 6 weeks): 44
    Reflects downloads up to 28 Dec 2024

    Cited By
    • (2024) MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions. Proceedings of the ACM on Software Engineering 1(FSE), 2121-2143. DOI: 10.1145/3660801
    • (2024) Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread. ACM Transactions on Spatial Algorithms and Systems 10(2), 1-22. DOI: 10.1145/3660631
    • (2024) OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport. Proceedings of the ACM on Management of Data 2(3), 1-26. DOI: 10.1145/3654963
    • (2024) FairHash: A Fair and Memory/Time-efficient Hashmap. Proceedings of the ACM on Management of Data 2(3), 1-29. DOI: 10.1145/3654939
    • (2024) Integrating Fair Representation Learning with Fairness Regularization for Intersectional Group Fairness. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 560-569. DOI: 10.1145/3627673.3679802
    • (2024) Wise Fusion: Group Fairness Enhanced Rank Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 163-174. DOI: 10.1145/3627673.3679649
    • (2024) FairBalance: How to Achieve Equalized Odds With Data Pre-Processing. IEEE Transactions on Software Engineering 50(9), 2294-2312. DOI: 10.1109/TSE.2024.3431445
    • (2023) Loss balancing for fair supervised learning. Proceedings of the 40th International Conference on Machine Learning, 16271-16290. DOI: 10.5555/3618408.3619075
    • (2023) FEAMOE. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 492-500. DOI: 10.24963/ijcai.2023/55
    • (2023) Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. Proceedings of the VLDB Endowment 16(11), 3279-3292. DOI: 10.14778/3611479.3611525
