Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models

Ye, Wenqian; Xu, Fei; Huang, Yaojia; Huang, Cassie; A, Ji

arXiv:2110.01094 (cs)

[Submitted on 3 Oct 2021]

Abstract: Over the last few years, contextualized pre-trained neural language models such as BERT and GPT have shown significant gains on various NLP tasks. One way to enhance the robustness of existing pre-trained models is to generate and evaluate adversarial examples for use in data augmentation or adversarial learning. Meanwhile, gender bias embedded in these models appears to be a serious problem in practical applications. Many studies have covered gender bias produced by word-level information (e.g., gender-stereotypical occupations), while few have investigated sentence-level and implicit cases.
In this paper, we propose a method to automatically generate implicit gender bias samples at the sentence level, together with a metric to measure gender bias. Samples generated by our method are evaluated in terms of accuracy. The metric is used to guide the generation of examples from pre-trained models, so that those examples can be used to attack pre-trained models. Finally, we discuss how effective our generated examples are at reducing gender bias, as a basis for future research.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.01094 [cs.CL]
	(or arXiv:2110.01094v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.01094

Submission history

From: Wenqian Ye
[v1] Sun, 3 Oct 2021 20:22:54 UTC (14,828 KB)
