
Falcon: Fair Active Learning Using Multi-Armed Bandits

Published: 02 May 2024

Abstract

Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises since these target groups are defined using ground truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method, where we postpone using a sample if the predicted label is different from the expected one and the sample thus falls outside the target group. We also observe a trade-off: selecting more informative samples results in a higher likelihood of postponing due to undesired label predictions, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postpone rate as policies and propose to automatically select the best policy using adversarial multi-armed bandit methods, given their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in terms of fairness and accuracy and is more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness, where its maximum fairness score is 1.8--4.5x higher than the second-best results.
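The policy-selection idea in the abstract can be pictured with a small sketch. The following Python code is not the authors' implementation; it is a minimal illustration assuming that each "policy" (bandit arm) encodes a different informativeness/postpone-rate trade-off, that `select_batch`, `label_oracle`, and `fairness_gain` are hypothetical user-supplied helpers, that samples carry a hypothetical `expected_label` attribute, and that the bandit layer is the standard adversarial EXP3 algorithm with rewards normalized to [0, 1].

```python
# Minimal sketch of adversarial-bandit policy selection with trial-and-error
# postponing, under the assumptions stated above (not the authors' code).
import math
import random

def exp3_select(weights, gamma):
    """Sample an arm from the EXP3 distribution; return (arm_index, probability)."""
    total = sum(weights)
    k = len(weights)
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, p
    return k - 1, probs[-1]

def exp3_update(weights, arm, prob, reward, gamma):
    """Importance-weighted multiplicative update for the chosen arm (reward in [0, 1])."""
    k = len(weights)
    weights[arm] *= math.exp(gamma * (reward / prob) / k)

def run_round(policies, weights, gamma, select_batch, label_oracle, fairness_gain):
    """One labeling round: pick a policy, label a batch with trial-and-error
    postponing, then feed the observed fairness improvement back to the bandit."""
    arm, prob = exp3_select(weights, gamma)
    batch = select_batch(policies[arm])   # unlabeled samples predicted to lie in the target group
    used, postponed = [], []
    for x in batch:
        y = label_oracle(x)               # acquire the ground-truth label
        # Postpone samples whose acquired label differs from the expected one,
        # i.e., samples that turn out to fall outside the target group.
        (used if y == x.expected_label else postponed).append((x, y))
    reward = fairness_gain(used)          # e.g., normalized improvement in the chosen fairness measure
    exp3_update(weights, arm, prob, reward, gamma)
    return used, postponed
```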


Published In
Proceedings of the VLDB Endowment, Volume 17, Issue 5 (January 2024), 233 pages

Publisher
VLDB Endowment

Publication History
Published: 02 May 2024
Published in PVLDB Volume 17, Issue 5
