Bias Assessment and Mitigation in LLM-based Code Generation

Huang, Dong; Bu, Qingwen; Zhang, Jie; Xie, Xiaofei; Chen, Junjie; Cui, Heming

Computer Science > Software Engineering

arXiv:2309.14345v1 (cs)

[Submitted on 3 Sep 2023 (this version), latest version 24 May 2024 (v3)]

Abstract: Utilizing state-of-the-art Large Language Models (LLMs), automatic code generation models play a pivotal role in enhancing the productivity and efficiency of software development. As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social biases, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models, yet it is under-explored in the literature. This paper presents a novel bias assessment framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive evaluation of the bias of nine state-of-the-art LLM-based code generation models. Our findings reveal that 31.45% to 79.93% of the code functions generated by the evaluated models are biased, and that the functionality of 9.68% to 37.37% of the code functions is affected by the bias. This means that biases not only exist in code generation models but, in some cases, directly affect the functionality of the generated code, posing risks of unintended and possibly harmful software behaviors. To mitigate bias in code generation models, we propose three mitigation strategies, which can decrease the biased code ratio to a very low level of 0.4% to 4.57%.
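To make the reported ratios concrete, the sketch below shows one way to probe whether a generated function's behavior depends on a protected attribute and to compute the share of affected functions. It is a minimal illustration under stated assumptions, not the paper's framework: the counterfactual "flip one protected attribute and compare outputs" check, the attribute values in `PROTECTED_VALUES`, and the functions `generated_eligibility`, `fair_eligibility`, `output_depends_on_protected`, and `affected_ratio` are all hypothetical. A behavioral check like this is closer to the "functionality affected" notion than to the broader "biased" label, but the ratio arithmetic is the same once each function carries a label.

```python
from typing import Any, Callable, Dict, List

Person = Dict[str, Any]

# Hypothetical "generated" function whose threshold depends on gender.
def generated_eligibility(person: Person) -> bool:
    if person["gender"] == "male":
        return person["income"] > 30000
    return person["income"] > 50000

# Hypothetical function that ignores protected attributes entirely.
def fair_eligibility(person: Person) -> bool:
    return person["income"] > 40000

# Assumed protected attributes and the values swapped in for each.
PROTECTED_VALUES: Dict[str, List[Any]] = {
    "gender": ["male", "female"],
    "age": [25, 70],
}

# Assumed base inputs; each is varied one protected attribute at a time.
BASE_INPUTS: List[Person] = [
    {"gender": "male", "age": 30, "income": 40000},
    {"gender": "female", "age": 45, "income": 60000},
]

def output_depends_on_protected(
    func: Callable[[Person], Any],
    base_inputs: List[Person],
    protected: Dict[str, List[Any]] = PROTECTED_VALUES,
) -> bool:
    """Return True if changing only one protected attribute changes func's output."""
    for base in base_inputs:
        for attr, values in protected.items():
            # Build counterfactual inputs that differ only in `attr`.
            outputs = {func({**base, attr: value}) for value in values}
            if len(outputs) > 1:
                return True
    return False

def affected_ratio(funcs: List[Callable[[Person], Any]], base_inputs: List[Person]) -> float:
    """Fraction of functions whose output depends on a protected attribute."""
    if not funcs:
        return 0.0
    flagged = sum(output_depends_on_protected(f, base_inputs) for f in funcs)
    return flagged / len(funcs)

print(affected_ratio([generated_eligibility, fair_eligibility], BASE_INPUTS))  # 0.5
```

Here only `generated_eligibility` changes its answer when `gender` is flipped (an income of 40000 passes for "male" but not for "female"), so one of the two functions is flagged and the ratio is 0.5.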

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2309.14345 [cs.SE]
	(or arXiv:2309.14345v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2309.14345

Submission history

From: Qingwen Bu
[v1] Sun, 3 Sep 2023 07:14:49 UTC (1,375 KB)
[v2] Tue, 9 Jan 2024 09:19:17 UTC (2,368 KB)
[v3] Fri, 24 May 2024 13:03:49 UTC (3,776 KB)