Research article · Open access
DOI: 10.1145/3658644.3670284

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Published: 09 December 2024

Abstract

Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
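
To make the reduction stage concrete, the following is a minimal, illustrative sketch of the counterfactual data augmentation idea mentioned above. It is not the GenderCARE implementation: the term groups are a tiny hand-written example and replacement is naive word-level substitution (see the linked repository for the actual term lists and fine-tuning setup).

    # A minimal, illustrative sketch of counterfactual data augmentation (CDA)
    # for gender debiasing. Assumptions: the term groups below are a tiny
    # hand-written example and replacement is naive word-level substitution;
    # this is not the GenderCARE implementation.

    GENDER_TERM_GROUPS = [
        # (masculine, feminine, gender-neutral)
        ("he", "she", "they"),
        ("him", "her", "them"),
        ("man", "woman", "person"),
        ("father", "mother", "parent"),
        ("son", "daughter", "child"),
    ]

    def counterfactuals(sentence: str) -> list[str]:
        """Return variants of `sentence` with gendered terms mapped to each group."""
        words = sentence.split()
        variants = []
        for target in range(3):  # 0 = masculine, 1 = feminine, 2 = neutral
            swap = {w: group[target] for group in GENDER_TERM_GROUPS for w in group}
            variant = " ".join(swap.get(w.lower(), w) for w in words)
            if variant.lower() != sentence.lower():
                variants.append(variant)
        return variants

    def augment(corpus: list[str]) -> list[str]:
        """Original sentences plus their counterfactual variants."""
        out = []
        for s in corpus:
            out.append(s)
            out.extend(v for v in counterfactuals(s) if v not in out)
        return out

    if __name__ == "__main__":
        print(augment(["the doctor said he was tired"]))
        # Prints the original plus "the doctor said she was tired" and
        # "the doctor said they was tired"; a real pipeline would also repair
        # grammatical agreement ("they were tired").

Fine-tuning the model on the augmented corpus (for example with low-rank adaptation) would then form the second half of the reduction strategy; this sketch covers only the data-construction step.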


Cited By

  • Security and Privacy on Generative Data in AIGC: A Survey. ACM Computing Surveys 57, 4 (Dec. 2024), 1-34. https://doi.org/10.1145/3703626

Published In

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security
December 2024, 5188 pages
ISBN: 9798400706363
DOI: 10.1145/3658644
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. AI security
      2. algorithmic fairness
      3. gender bias
      4. large language models

      Funding Sources

      • the National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme
      • the Natural Science Foundation of China
      • the National Research Foundation, Singapore and Infocomm Media Development Authority under its Trust Tech Funding Initiative

Conference

CCS '24

Acceptance Rates

Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
