
Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines

Published: 22 March 2024

Abstract

Fairness measurement is crucial for assessing algorithmic bias in various types of machine learning (ML) models, including ones used for search relevance, recommendation, personalization, talent analytics, and natural language processing. However, the fairness measurement paradigm is currently dominated by fairness metrics that examine disparities in allocation and/or prediction error as univariate key performance indicators (KPIs) for a protected attribute or group. Although important and effective for assessing ML bias in certain contexts such as recidivism, existing metrics do not work well in many real-world applications of ML, which are characterized by imperfect models applied to an array of instances encompassing a multivariate mixture of protected attributes and embedded in a broader process pipeline. Consequently, the upstream representational harm quantified by existing metrics, based on how the model represents protected groups, does not necessarily relate to allocational harm when such models are applied in downstream policy/decision contexts. We propose FAIR-Frame, a model-based framework for parsimoniously modeling fairness across multiple protected attributes with respect to the representational and allocational harm associated with the upstream design/development and downstream usage of ML models. We evaluate the efficacy of our proposed framework on two testbeds pertaining to text classification using pretrained language models. The upstream testbeds encompass over fifty thousand documents associated with twenty-eight thousand users, seven protected attributes, and five different classification tasks. The downstream testbeds span three policy outcomes and over 5.41 million total observations. Results in comparison with several existing metrics show that the upstream representational harm measures produced by FAIR-Frame and other metrics differ significantly from one another, and that FAIR-Frame's representational fairness measures have the highest percentage alignment and lowest error with respect to the allocational harm observed in downstream applications. Our findings have important implications for various ML contexts, including information retrieval, user modeling, digital platforms, and text classification, where responsible and trustworthy AI is becoming an imperative.
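
For readers unfamiliar with the metric-based paradigm the abstract contrasts against, the following minimal Python sketch illustrates how conventional fairness metrics reduce bias to univariate KPIs, such as the demographic parity difference and the equalized-odds gap, computed separately for a single protected attribute. It is an illustrative sketch only, not the paper's FAIR-Frame method; the function names and toy data are hypothetical.

# Illustrative sketch of conventional metric-based fairness KPIs
# (not the paper's FAIR-Frame method).
import numpy as np

def demographic_parity_difference(y_pred, group):
    # Difference in positive-prediction rates between two groups (coded 0/1).
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

def equalized_odds_gap(y_true, y_pred, group):
    # Largest disparity in true-positive and false-positive rates across the two groups.
    gaps = []
    for label in (1, 0):  # TPR when label == 1, FPR when label == 0
        mask = y_true == label
        rate_a = y_pred[mask & (group == 0)].mean()
        rate_b = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(rate_a - rate_b))
    return max(gaps)

# Hypothetical toy data: binary labels, binary predictions, one binary protected attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
print(demographic_parity_difference(y_pred, group))
print(equalized_odds_gap(y_true, y_pred, group))

Each call returns a single scalar per protected attribute, which is precisely the univariate-KPI view the abstract argues breaks down for multivariate mixtures of protected attributes embedded in longer pipelines.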


Cited By

  • (2024) Preparedness and Response in the Century of Disasters: Overview of Information Systems Research Frontiers. Information Systems Research 35, 2 (June 2024), 460–468. DOI: 10.1287/isre.2024.intro.v35.n2
  • (2024) Pathways for Design Research on Artificial Intelligence. Information Systems Research 35, 2 (June 2024), 441–459. DOI: 10.1287/isre.2024.editorial.v35.n2
  • (2024) A Multifaceted Survey on Federated Learning: Fundamentals, Paradigm Shifts, Practical Issues, Recent Developments, Partnerships, Trade-Offs, Trustworthiness, and Ways Forward. IEEE Access 12 (2024), 84643–84679. DOI: 10.1109/ACCESS.2024.3413069

      Published In

      ACM Transactions on Information Systems, Volume 42, Issue 4
      July 2024
      751 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/3613639
      Editor: Min Zhang

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 March 2024
      Online AM: 23 January 2024
      Accepted: 03 January 2024
      Revised: 20 November 2023
      Received: 30 March 2023
      Published in TOIS Volume 42, Issue 4


      Author Tags

      1. Machine learning fairness
      2. algorithmic bias
      3. model framework
      4. prediction and explanation
      5. AI governance
      6. machine learning pipelines

      Qualifiers

      • Research-article

      Funding Sources

      • U.S. NSF
      • Kemper Faculty Award


      Article Metrics

      • Downloads (last 12 months): 808
      • Downloads (last 6 weeks): 170
      Reflects downloads up to 18 Aug 2024
