Research article · Open access
DOI: 10.1145/3658644.3670284

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Published: 09 December 2024

Abstract

Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
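
To make the reduction stage concrete, the following is a minimal, illustrative sketch of the counterfactual data augmentation idea mentioned above. It is not the GenderCARE implementation: the term groups are a tiny hand-written example and replacement is naive word-level substitution (see the linked repository for the actual term lists and fine-tuning setup).

    # A minimal, illustrative sketch of counterfactual data augmentation (CDA)
    # for gender debiasing. Assumptions: the term groups below are a tiny
    # hand-written example and replacement is naive word-level substitution;
    # this is not the GenderCARE implementation.

    GENDER_TERM_GROUPS = [
        # (masculine, feminine, gender-neutral)
        ("he", "she", "they"),
        ("him", "her", "them"),
        ("man", "woman", "person"),
        ("father", "mother", "parent"),
        ("son", "daughter", "child"),
    ]

    def counterfactuals(sentence: str) -> list[str]:
        """Return variants of `sentence` with gendered terms mapped to each group."""
        words = sentence.split()
        variants = []
        for target in range(3):  # 0 = masculine, 1 = feminine, 2 = neutral
            swap = {w: group[target] for group in GENDER_TERM_GROUPS for w in group}
            variant = " ".join(swap.get(w.lower(), w) for w in words)
            if variant.lower() != sentence.lower():
                variants.append(variant)
        return variants

    def augment(corpus: list[str]) -> list[str]:
        """Original sentences plus their counterfactual variants."""
        out = []
        for s in corpus:
            out.append(s)
            out.extend(v for v in counterfactuals(s) if v not in out)
        return out

    if __name__ == "__main__":
        print(augment(["the doctor said he was tired"]))
        # Prints the original plus "the doctor said she was tired" and
        # "the doctor said they was tired"; a real pipeline would also repair
        # grammatical agreement ("they were tired").

Fine-tuning the model on the augmented corpus (for example with low-rank adaptation) would then form the second half of the reduction strategy; this sketch covers only the data-construction step.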


Cited By

  • Security and Privacy on Generative Data in AIGC: A Survey. ACM Computing Surveys 57, 4 (Dec. 2024), 1-34. https://doi.org/10.1145/3703626

Published In

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security
December 2024, 5188 pages
ISBN: 9798400706363
DOI: 10.1145/3658644
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. AI security
      2. algorithmic fairness
      3. gender bias
      4. large language models

      Funding Sources

      • the National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme
      • the Natural Science Foundation of China
      • the National Research Foundation, Singapore and Infocomm Media Development Authority under its Trust Tech Funding Initiative

Conference

CCS '24

Acceptance Rates

Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
