Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

Gupta, Umang; Dhamala, Jwala; Kumar, Varun; Verma, Apurv; Pruksachatkun, Yada; Krishna, Satyapriya; Gupta, Rahul; Chang, Kai-Wei; Steeg, Greg Ver; Galstyan, Aram

Computer Science > Computation and Language

arXiv:2203.12574 (cs)

[Submitted on 23 Mar 2022]

Title: Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

Authors: Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan

Abstract: Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Knowledge distillation without any fairness constraints may therefore carry the teacher model's biases over to the distilled model, or even amplify them. To address this, we present a novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation. We propose two modifications to base knowledge distillation, both built on counterfactual role reversal: modifying the teacher's probabilities and augmenting the training set. We evaluate gender polarity across professions in open-ended text generated by the resulting distilled and finetuned GPT-2 models and demonstrate a substantial reduction in gender disparity with only a minor compromise in utility. Finally, we observe that language models with reduced gender polarity in language generation do not show improved embedding fairness or downstream classification fairness.
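The abstract names two counterfactual role-reversal modifications to distillation without giving formulas. The Python sketch below illustrates one plausible reading and is not the authors' implementation. It rests on several assumptions: a tiny hand-written gender word map (`GENDER_PAIRS`, hypothetical), whole-word tokens instead of GPT-2 subword tokens, and a teacher-probability modification that averages the teacher's next-token distribution on the original context with its distribution on the role-reversed context, after mapping gendered tokens back. All function names (`role_reverse`, `augment`, `modified_teacher_probs`, `kd_loss`) are hypothetical.

```python
import math
import re

# Hypothetical, deliberately tiny list of gendered word pairs (an assumption;
# a real system would need a curated lexicon and would handle ambiguous forms).
GENDER_PAIRS = [
    ("he", "she"), ("his", "her"), ("man", "woman"), ("men", "women"),
    ("father", "mother"), ("son", "daughter"), ("boy", "girl"),
    ("brother", "sister"),
]

# Bidirectional lookup. Simplification: "her" always maps to "his",
# even where "him" would be grammatically correct.
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a] = b
    SWAP[b] = a


def role_reverse(text):
    """Return the counterfactual text with gendered words swapped, keeping capitalization."""
    def swap(match):
        word = match.group(0)
        repl = SWAP.get(word.lower())
        if repl is None:
            return word
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+", swap, text)


def augment(corpus):
    """Counterfactual data augmentation: append a role-reversed copy of each text it changes."""
    extra = []
    for text in corpus:
        cf = role_reverse(text)
        if cf != text:
            extra.append(cf)
    return list(corpus) + extra


def modified_teacher_probs(p_orig, p_cf):
    """Combine teacher next-token distributions from the original and role-reversed contexts.

    Assumption: gendered tokens in p_cf are mapped back (e.g. "she" -> "he") so that
    both distributions refer to the original context; the two are then averaged.
    """
    mapped = {}
    for tok, p in p_cf.items():
        back = SWAP.get(tok, tok)
        mapped[back] = mapped.get(back, 0.0) + p
    vocab = set(p_orig) | set(mapped)
    mixed = {t: 0.5 * (p_orig.get(t, 0.0) + mapped.get(t, 0.0)) for t in vocab}
    total = sum(mixed.values())  # renormalize to guard against rounding drift
    return {t: v / total for t, v in mixed.items()}


def kd_loss(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student): the standard distillation loss at one next-token position."""
    return sum(
        p * math.log(p / max(p_student.get(t, 0.0), eps))
        for t, p in p_teacher.items()
        if p > 0
    )


if __name__ == "__main__":
    print(role_reverse("He said his father was a doctor."))
    # prints: She said her mother was a doctor.
    teacher = modified_teacher_probs(
        {"he": 0.5, "she": 0.3, "it": 0.2},    # hypothetical context: "He works as a nurse because"
        {"she": 0.9, "he": 0.05, "it": 0.05},  # role-reversed: "She works as a nurse because"
    )
    print(sorted(teacher.items()))
    # he ~ 0.7, it ~ 0.125, she ~ 0.175
```

In this sketch the modified teacher distribution, rather than the raw one, would serve as the target in `kd_loss`, and `augment` would expand the distillation corpus. Averaging is only one possible combination rule, chosen here because it is simple and keeps the result a valid probability distribution.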

Comments:	To appear in the Findings of ACL 2022
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2203.12574 [cs.CL]
	(or arXiv:2203.12574v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.12574

Submission history

From: Umang Gupta
[v1] Wed, 23 Mar 2022 17:34:35 UTC (124 KB)