License: arXiv.org perpetual non-exclusive license
arXiv:2311.13892v3 [cs.CL] 25 Jan 2024

General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level

Abstract

The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. Compared with the numerous debiasing methods targeting the word level, biases at the phrase level have received relatively little attention, which limits debiasing performance in discipline domains. In this paper, we propose an automatic multi-token debiasing pipeline called General Phrase Debiaser, which is capable of mitigating phrase-level biases in masked language models. Specifically, our method consists of a phrase filter stage that generates stereotypical phrases from Wikipedia pages and a model debias stage that debiases models at the multi-token level to tackle bias challenges on phrases. The latter searches for prompts that trigger the model’s bias and then uses them for debiasing. State-of-the-art results on standard datasets and metrics show that our approach can significantly reduce gender biases in both career and multiple disciplines, across models with varying parameter sizes.

Index Terms—  Social Bias, Stereotype, Pretrained Language Model, Masked Language Model, NLP

1 Introduction

Masked language models (MLMs) [1, 2, 3, 4, 5, 6] have recently been employed both in traditional tasks such as text classification [7, 8, 9] and, when combined with models such as image generators [12, 13], in diverse multimodal tasks [10, 11]. We aim to develop MLMs with minimal human biases, even when the pretraining data unavoidably contains these biases. However, correcting implicit biases in pretrained MLMs can be very challenging, especially considering the high cost of retraining models from scratch.

Existing studies [14, 15, 16, 17, 18, 19] have introduced intuitive approaches that use additional corpora to retrieve contextualized embeddings or to locate biases and fine-tune accordingly, but they rely on external human-written corpora. Auto-Debias [20] employs the prompt [21] template “[attribute word] [T]…[T] [MASK]” to guide MLMs to automatically search for prompts that make the model reveal its bias, and then fine-tunes the MLMs with them. Nevertheless, real-world language environments are not so ideal: both attribute words and stereotypes should be treated as multi-token. Because these methods only correct biases at the word level, they struggle at the phrase level.

Motivated by this, we propose an automatic multi-token debias pipeline called General Phrase Debiaser to address the limitations of automatic debiasing mentioned above. The major contributions of our work are:

  • Unlike existing methods, we debias MLMs at the phrase granularity. To reduce the cost of manually constructing the phrase list, we obtain stereotypical phrases by filtering the hyperlinks of Wikipedia pages in the Phrase Filter Stage.

  • With the multi-token debias head we propose, “discriminative” prompts can be searched in the Model Debias Stage. These cloze-style prompts have the highest disagreement in generating stereotypical phrases (e.g., mathematical theory/dance art) with respect to demographic words (e.g., man/woman). We then fine-tune the model using the searched prompts.

  • Unlike the fine-tuning stage of Auto-Debias, our approach derives its loss from stereotypical phrases rather than from the model's entire vocabulary. This allows our method to adjust the model parameters more specifically, without affecting other gender-independent words or knowledge.

  • We conduct experiments on three well-known open-source MLMs, BERT [1], ALBERT [2], and DistilBERT [5], and achieve state-of-the-art average effect sizes (0.12, 0.16, and 0.52, respectively) on the SEAT test.

Our code and debiased model files are available at https://github.com/BingkangShi/general-phrase-debiaser.

Fig. 1: The proposed General Phrase Debiaser pipeline has two stages. In the phrase filter stage, we filter stereotypical phrases $S_{weighted}$ and $S_{unweighted}$ from Wikipedia pages using the MLM and stereotypical seeds. In the model debias stage, we search for biased prompts at multi-token granularity and fine-tune the MLM with them.
Fig. 2: Computation of multi-token JSD loss.

2 General Phrase Debiaser

2.1 Phrase Filter Stage

To minimize the cost of manually constructing stereotypes in many specific domains, we use the MLM that is to be debiased to filter the hyperlinks of Wikipedia pages. Hyperlinked phrases $S_{raw}$ that are semantically similar to the stereotypical seeds are retained. The stereotype seeds, comprising $N_{topic}$ topics, need to be manually specified, with each topic having $n_i$ hyponyms. We choose career, math, art, and science as topics to construct stereotypes, so $N_{topic}$ is 4 in this paper. Note that the filtered phrases under the math, art, and science topics are generated by our Phrase Filter Stage, while the phrases under the career topic were provided by previous work [15].

Let $\mathcal{M}$ be an MLM and $\mathcal{M}_{CLS}$ be the process of computing the classification embedding (CLS) that represents a sentence. The embedding of a phrase can be computed as follows:

$$\hat{E}(x) = \frac{1}{|T|} \sum_{t \in T} \mathcal{M}_{CLS}(t(x)) \tag{1}$$

where $t$ represents a sentence template and $T$ is the set of all templates. We follow the 14 blank-filling templates used in the SEAT test [22], such as “this is a __.” or “__ is here.”.

Then the cosine similarity between a phrase $ph_{raw} \in S_{raw}$ and $w_{i,j}$ can be computed as:

$$d(i, j, ph_{raw}) = \cos\left(\hat{E}(ph_{raw}),\ \hat{E}(w_{i,j})\right) \tag{2}$$

where $w_{i,j}$ is the $j$-th hyponym of the $i$-th topic, $j \in n_i$ and $i \in N_{topic}$.
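Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `encode_cls` is a hypothetical stand-in for $\mathcal{M}_{CLS}$ that returns a deterministic pseudo-random vector per sentence (a real pipeline would return the MLM's [CLS] embedding), and only two of the 14 templates are listed.

```python
import hashlib
import numpy as np

# Two of the SEAT-style blank-filling templates quoted in the text.
TEMPLATES = ["this is a {}.", "{} is here."]

def encode_cls(sentence, dim=16):
    """Hypothetical stand-in for M_CLS: maps a sentence to a fixed vector.

    A real pipeline would return the [CLS] embedding produced by the MLM.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)

def phrase_embedding(phrase, templates=TEMPLATES, encoder=encode_cls):
    """Eq. (1): average the sentence embedding over all templates."""
    vectors = [encoder(t.format(phrase)) for t in templates]
    return np.mean(vectors, axis=0)

def cosine_similarity(u, v):
    """Eq. (2): cosine of the angle between two embeddings."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity between a candidate hyperlink phrase and a seed hyponym.
d = cosine_similarity(phrase_embedding("number theory"),
                      phrase_embedding("algebra"))
```

Averaging over templates means a phrase is compared through several sentence contexts, so its similarity to a seed does not hinge on one particular frame.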

Each phrase set $\hat{s}_{i,j}$ contains $topK_p$ phrases $ph_{raw}$, selected according to $d(i, j, ph_{raw})$ sorted in ascending order. We can then collect $S_{weighted}$ as:

$$S_{weighted} = (S_0, S_1, \cdots, S_{N_{topic}-1}) \tag{3}$$

where the phrase set of the $i$-th topic is $S_i = (\hat{s}_0, \hat{s}_1, \cdots, \hat{s}_{n_i-1})$. After removing duplicate phrases, $S_{weighted}$ is transformed into $S_{unweighted}$.
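The construction of the two phrase sets can be illustrated as follows. This is a sketch under stated assumptions: the similarity scores are made up, the helper names are hypothetical, and keeping the $topK_p$ most similar phrases per hyponym is an assumption about how the ranking is used.

```python
def top_k_phrases(similarities, k):
    """Return the k phrases ranked highest by similarity to one hyponym."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in ranked[:k]]

def build_phrase_sets(scores_by_topic, k):
    """scores_by_topic: {topic: {hyponym: {phrase: similarity}}}.

    Returns (weighted, unweighted): the weighted list keeps a phrase once
    per hyponym that selected it, the unweighted list keeps it once overall.
    """
    weighted = []
    for hyponyms in scores_by_topic.values():
        for similarities in hyponyms.values():
            weighted.extend(top_k_phrases(similarities, k))
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unweighted = list(dict.fromkeys(weighted))
    return weighted, unweighted

# Illustrative scores only; they are not taken from the paper.
toy_scores = {
    "math": {
        "algebra": {"number theory": 0.9, "linear map": 0.8, "oil painting": 0.1},
        "geometry": {"number theory": 0.7, "euclidean space": 0.85, "oil painting": 0.2},
    }
}
s_weighted, s_unweighted = build_phrase_sets(toy_scores, k=2)
```

In this toy run, "number theory" is selected by both hyponyms, so it appears twice in `s_weighted` and once in `s_unweighted`.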

2.2 Finding Biased Prompts

Previous work, Auto-Debias [20], used cloze-style prompts to detect biases in attribute words within stereotypes. Let $\mathcal{V}$ be the vocabulary of an MLM; a prompt $x_{prompt} \in \mathcal{V}$ is a sequence of words with one masked token [MASK] and one attribute token. An MLM can be probed by a cloze-style prompt such as $x_{prompt}$ = “[attribute] majors in [MASK].”. The “[attribute]” slot is filled from a set $\mathcal{C} = \{(c_{1,1}, c_{1,2}, \cdots, c_{1,m}), (c_{2,1}, c_{2,2}, \cdots, c_{2,m}), \cdots\}$ composed of $m$-tuples, derived from the gender word list in [15]. The “[MASK]” position is where $\mathcal{M}$ predicts a stereotypical word. We can therefore obtain the stereotypical word probability as:

$$
\begin{aligned}
&P\left([MASK] = v \mid \mathcal{M}, x_{prompt}(c)\right) \\
&= \mathrm{softmax}\left(\mathcal{M}_{[MASK]}\left(v \mid x_{prompt}(c)\right)\right)
\end{aligned}
\tag{4}
$$

where $v \in \mathcal{V}$. $x_{prompt}(c) = c \oplus x \oplus [MASK]$ is a string composed of $x_{prompt}$ and $c \in \mathcal{C}$. For example, $x_{prompt}(c)$ = “$c$ majors in [MASK].”.

However, the above method is effective only when stereotypes are single-token. We therefore introduce a probability calculation method for stereotypes at multi-token granularity (as shown in Fig. 2):

$$
\begin{aligned}
P_{c_{k,m}}^{(n)} &= P\left([\mathbf{target}] = ph_i \mid \mathcal{M}, x_{prompt}^{\prime}(c_{k,m}^{(n)})\right) \\
&= \mathrm{softmax}\left(\sum_{l \in n} logit_l\right)
\end{aligned}
\tag{5}
$$

where the logit corresponding to the $l$-th [MASK] token position is:

$$logit_l = \mathcal{M}_{[MASK]_l}\left(ph_i \mid x_{prompt}^{\prime}(c_{k,m}^{(n)})\right) \tag{6}$$

where $[\mathbf{target}]$ is a sequence of $n$ [MASK] tokens, and $n$ is the maximum length of a stereotypical phrase $ph_i \in S_{unweighted}$. $c_{k,m}^{(n)}$, containing $n$ [MASK] tokens, is the $m$-th phrase of the $k$-th tuple in set $\mathcal{C}$. Here $x_{prompt}(c)$ becomes:

$$
\begin{aligned}
x_{prompt}^{\prime}\left(c_{k,m}^{(n)}\right) &= c_{k,m}^{(n)} \oplus x \oplus [\mathbf{target}] \\
&= c_{k,m}^{(n)} \oplus x \oplus [MASK][MASK] \cdots [MASK]
\end{aligned}
\tag{7}
$$
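One way to read Eqs. (5) to (7) is sketched below. It assumes the MLM's logits at the $n$ mask positions are already available as an array, that each candidate phrase has exactly $n$ tokens, and that the softmax is taken over the candidate phrases; `build_prompt` and `phrase_distribution` are hypothetical helper names, and the numbers are toy values.

```python
import numpy as np

def build_prompt(attribute, prompt_words, n, mask_token="[MASK]"):
    """Eq. (7): attribute, then the prompt words, then n mask tokens."""
    return " ".join([attribute, *prompt_words, *([mask_token] * n)])

def phrase_distribution(mask_logits, phrase_token_ids):
    """Eqs. (5)-(6): sum each phrase's per-position logits, then softmax.

    mask_logits: array of shape (n, vocab_size), one row per [MASK] position.
    phrase_token_ids: list of token-id lists, each of length n.
    """
    mask_logits = np.asarray(mask_logits, dtype=float)
    positions = np.arange(mask_logits.shape[0])
    # Row l contributes the logit of the phrase's l-th token.
    scores = np.array([mask_logits[positions, ids].sum() for ids in phrase_token_ids])
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

prompt = build_prompt("he", ["majors", "in"], n=2)

# Toy logits for n = 2 mask positions over a 3-token vocabulary.
toy_logits = [[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0]]
toy_phrases = [[0, 1], [2, 2]]  # two 2-token phrases given as token ids
probs = phrase_distribution(toy_logits, toy_phrases)
```

Here the first phrase has summed logit 2.0 + 3.0 = 5.0 and the second 1.0 + 1.0 = 2.0, so the first receives most of the probability mass.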

By repeatedly applying Eq. (5), we can obtain the distributions $P^{(1)}$, $P^{(2)}$, …, $P^{(n)}$ for stereotypical phrases of different token lengths. In step 2 and step 3 of Fig. 1, we use the Jensen-Shannon divergence (JSD), a symmetric and smoothed version of the Kullback–Leibler divergence (KLD), to measure the difference between the multiple distributions $P_{c_{k,1}}^{(n)}, P_{c_{k,2}}^{(n)}, \cdots, P_{c_{k,m}}^{(n)}$ as follows:

$$
\begin{aligned}
JSD\ loss_k^{(n)} &= \sum_{n} JSD\left(P_{c_{k,1}}^{(n)}, P_{c_{k,2}}^{(n)}, \cdots, P_{c_{k,m}}^{(n)}\right) \\
&= \frac{1}{m} \sum_{i} KLD\left(P_{c_{k,i}}^{(n)} \,\middle\|\, \frac{P_{c_{k,1}}^{(n)} + P_{c_{k,2}}^{(n)} + \cdots + P_{c_{k,m}}^{(n)}}{m}\right)
\end{aligned}
\tag{8}
$$

In this paper, JSD measures the difference between the two gender distributions, so $m = 2$. The KLD between two distributions $p_i$ and $p_j$ can be computed as $KLD(p_i \,\|\, p_j) = \sum_{v \in \mathcal{V}} p_i(v) \log\left(\frac{p_i(v)}{p_j(v)}\right)$.
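The JSD over $m$ distributions, as the average KLD of each distribution to their mean, can be computed as in the sketch below. The two example distributions are invented for illustration, and the small `eps` term is an added assumption to keep the logarithm finite when a probability is zero.

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """KLD(p || q) = sum_v p(v) * log(p(v) / q(v)); eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def jsd(distributions):
    """Average KLD of each distribution to the mean of all m distributions."""
    dists = np.asarray(distributions, dtype=float)
    mixture = dists.mean(axis=0)
    return float(np.mean([kld(p, mixture) for p in dists]))

# Toy phrase distributions for the two attribute words of one tuple (m = 2).
p_male = [0.7, 0.2, 0.1]
p_female = [0.1, 0.2, 0.7]
gap = jsd([p_male, p_female])
```

With the natural logarithm and $m = 2$, the value lies between 0 (identical distributions) and $\ln 2$ (disjoint support), which makes it usable both as a search score and as a fine-tuning loss.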

The loss of an input $x_{prompt}^{\prime}$ can be seen as the sum of the overall probability distribution differences over $\mathcal{C}$:

$$
\begin{aligned}
loss\left(x_{prompt}^{\prime}\right) &= \sum_{k} JSD\ loss_k^{(multi)} \\
&= \sum_{k} \sum_{n} JSD\ loss_k^{(n)}
\end{aligned}
\tag{9}
$$

As step 2 in Fig. 1 shows, we employ beam search [23] to find biased prompts that maximize $loss(x_{prompt}^{\prime})$. The searched biased prompts are collected for fine-tuning the MLM in step 3.
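The prompt search can be pictured with a generic beam search. This is a sketch rather than the paper's procedure: `score_fn` is a hypothetical placeholder for $loss(x_{prompt}^{\prime})$, which would require running the MLM, and collecting the surviving beams at every prompt length is an assumption.

```python
def beam_search(vocab, score_fn, max_len, beam_width):
    """Grow prompts one word at a time, keeping the top-scoring beams.

    vocab: candidate words (the search space).
    score_fn: maps a tuple of words to a score to maximize.
    Returns the beams kept at every length from 1 to max_len.
    """
    beams = [()]
    collected = []
    for _ in range(max_len):
        # Extend every surviving prompt by every candidate word.
        candidates = [beam + (word,) for beam in beams for word in vocab]
        candidates.sort(key=score_fn, reverse=True)
        beams = candidates[:beam_width]
        collected.extend(beams)
    return collected

# Toy scorer: per-word weights stand in for the JSD-based loss.
toy_weights = {"a": 1, "b": 5, "c": 3}

def toy_score(prompt):
    return sum(toy_weights[word] for word in prompt)

found = beam_search(["a", "b", "c"], toy_score, max_len=2, beam_width=2)
```

Each step scores `beam_width * len(vocab)` candidates, which is why restricting the search space to a few thousand frequent words keeps the search tractable.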

2.3 Fine-tuning MLM with Prompts

| Model | SEAT-6 | SEAT-6b | SEAT-7 | SEAT-7b | SEAT-8 | SEAT-8b | avg. |
|---|---|---|---|---|---|---|---|
| BERT | 0.48 | 0.11 | 0.25 | 0.25 | 0.40 | 0.61 | 0.35 |
| +Context-Debias [15] | 1.13 | - | 0.34 | - | 0.12 | - | 0.53 |
| +FairFil [19] | 0.18 | 0.08 | 0.12 | 0.08 | 0.20 | 0.24 | 0.15 |
| +Auto-Debias [20] | 0.08 | 0.02 | 0.36 | 0.40 | 0.12 | 0.20 | 0.20 |
| +General Phrase Debiaser | 0.00 | 0.13 | 0.19 | 0.10 | 0.02 | 0.27 | 0.12 |
| ALBERT | 0.51 | 0.02 | 0.58 | 1.02 | 0.99 | 1.20 | 0.72 |
| +Context-Debias [15] | 0.18 | - | 0.05 | - | 0.77 | - | 0.33 |
| +General Phrase Debiaser | 0.04 | 0.30 | 0.01 | 0.02 | 0.33 | 0.29 | 0.16 |
| DistilBERT | 1.26 | 0.25 | 0.31 | 1.22 | 0.74 | 0.98 | 0.79 |
| +Context-Debias [15] | 1.34 | - | 1.01 | - | 0.97 | - | 1.11 |
| +General Phrase Debiaser | 0.60 | 0.32 | 0.21 | 0.99 | 0.23 | 0.79 | 0.52 |

Table 1: Gender debiasing results of SEAT on BERT, ALBERT and DistilBERT. SEAT-6 and SEAT-6b are tests about career; SEAT-7, SEAT-7b, SEAT-8 and SEAT-8b are about discipline. Scores closer to 0 are better. “-” means the value is not reported in the original paper.

| Model | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI | QNLI | RTE | WNLI |
|---|---|---|---|---|---|---|---|---|---|
| BERT | 0.59 | 0.93 | 0.89/0.85 | 0.89/0.88 | 0.91/0.88 | 0.85/0.85 | 0.92 | 0.65 | 0.56 |
| +General Phrase Debiaser | 0.56 | 0.93 | 0.89/0.84 | 0.89/0.89 | 0.90/0.88 | 0.85/0.85 | 0.92 | 0.65 | 0.56 |
| ALBERT | 0.55 | 0.92 | 0.92/0.89 | 0.91/0.91 | 0.91/0.88 | 0.85/0.85 | 0.92 | 0.73 | 0.39 |
| +General Phrase Debiaser | 0.54 | 0.93 | 0.90/0.86 | 0.91/0.91 | 0.91/0.88 | 0.85/0.85 | 0.92 | 0.73 | 0.42 |
| DistilBERT | 0.47 | 0.91 | 0.89/0.84 | 0.86/0.86 | 0.90/0.87 | 0.82/0.82 | 0.88 | 0.58 | 0.56 |
| +General Phrase Debiaser | 0.46 | 0.91 | 0.89/0.84 | 0.86/0.86 | 0.90/0.87 | 0.82/0.82 | 0.89 | 0.62 | 0.56 |

Table 2: GLUE test results of the original and the gender-debiased MLMs with our method.

Given that existing work [15] demonstrated the presence of biases in all parameters of the models, we choose to fine-tune the entire $\mathcal{M}$ with the biased prompts found in step 2 to mitigate biases in the model. This corresponds to step 3 in Fig. 1.

In contrast to the prompt search phase, during debiasing fine-tuning we aim to minimize $loss(x_{prompt}^{\prime})$ to reduce the distribution discrepancy of $\mathcal{M}$ on $S_{weighted}$ induced by $x_{prompt}^{\prime}$. This distribution discrepancy is specific to $S_{weighted}$, meaning that our method propagates gradients through each phrase in the stereotype set, rather than debiasing the entire vocabulary $\mathcal{V}$ of $\mathcal{M}$ as done in Auto-Debias. As a result, our debiasing approach pays more attention to the stereotypical phrases in $S_{weighted}$ and has less impact on unrelated words.

3 Results and Evaluation

3.1 Evaluation Data And Details

Debias Data & Language Capability Data: We evaluate the proposed General Phrase Debiaser on two datasets: (1) the Sentence Embedding Association Test (SEAT) [22], which provides a commonly used metric for assessing biases in PLM embeddings, and (2) the General Language Understanding Evaluation (GLUE) benchmark [24], which measures general language modeling capability. We evaluate our method on three MLMs with different parameter sizes, BERT [1], ALBERT [2], and DistilBERT [5], and compare it with three other algorithms: Context-Debias [15], FairFil [19], and Auto-Debias [20].

Implementation Details: Hyperparameters play a critical role in final performance [25, 26, 27]. For completeness, we list the hyperparameters used in our study. In step 2 of Fig. 1, we use $S_{unweighted}$, which has 624 phrases. The maximum biased prompt length $PL$ is 5 and the beam search width $K$ is 100. We use the 5,000 highest-frequency words in Wikipedia as the search space $\mathcal{V}$, to avoid noise in the vocabulary $\mathcal{V}$ and to speed up the prompt search process. In step 3, we use $S_{weighted}$ with more than 500 phrases instead of $S_{unweighted}$, because $S_{weighted}$ takes into account the varying weights of different stereotypical phrases, resulting in better debiasing effects (as shown in Table 1). We choose $\mathcal{C}^{*}$ (an extension of $\mathcal{C}$, derived from the gender word list in [15]) as attribute phrases to construct more fine-tuning data. All models are trained with the AdamW [28] optimizer and an early stopping strategy. Our experiments run on a single NVIDIA 3090Ti.

3.2 Evaluation Result And Analysis

We run General Phrase Debiaser on the career field and the discipline field at the same time. The effect size score of the SEAT [16] benchmark reported in Table 1 measures the association between two sets of target concepts and two sets of attributes. It is obtained by calculating the normalized distance between a set of attribute sentence vectors and a set of concept sentence vectors output by the model; the closer this distance is to 0, the less biased the model is. The results demonstrate that our method reduces model biases, lowering the original average scores of BERT, ALBERT, and DistilBERT over the six SEAT tests from 0.35, 0.72, and 0.79 to 0.12, 0.16, and 0.52, respectively. Furthermore, compared with the other approaches in the benchmark, including those relying on manual datasets and those generating data automatically, General Phrase Debiaser shows state-of-the-art debiasing performance. Our analysis attributes this overall advantage to three aspects:

Simultaneous Debiasing across Multiple Domains. Our method can effectively reduce gender bias in career, math, art, and science simultaneously, without requiring multiple debiasing processes on the same model.

Knowledge Debiasing in Phrase Granularity. Our method operates at the phrase granularity rather than the word granularity, making it easier to probe and mitigate biases in disciplines. For example, in Table 1, General Phrase Debiaser achieves the best average score on the four tests (SEAT-7 to SEAT-8b) concerning math, art and science.

Keep Language Capability after Debias. We test gender-debiased versions of BERT, ALBERT, and DistilBERT on the General Language Understanding Evaluation (GLUE) benchmark [24]. The test results are presented in Table 2. The gender-debiased versions show only slight decreases compared to the original models on the GLUE test (at most 0.03 on any task, with most scores unchanged and a few slightly higher), demonstrating that our General Phrase Debiaser alleviates bias concerns while maintaining language modeling capability.
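For reference, the SEAT effect size reported in Table 1 follows the WEAT-style definition: the difference between the mean associations of the two target sets, divided by the standard deviation of the associations over both sets. The sketch below uses toy 2-D vectors in place of sentence embeddings, and the sample standard deviation (`ddof=1`) is an assumption about the normalization.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity to attribute set A minus that to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Normalized difference of mean associations of target sets X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    pooled_std = np.std(s_x + s_y, ddof=1)  # sample std over both target sets
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)

# Toy embeddings: X leans toward attribute set A, Y toward attribute set B.
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
score = effect_size(X, Y, A, B)
```

A value near 0 means the two target sets are about equally associated with both attribute sets, which is why scores closer to 0 are read as less biased.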

4 Conclusion

Our proposed method debiases MLMs at phrase granularity while maintaining language modeling capability, and achieves state-of-the-art results on the SEAT tests. Although decoder-only LLMs are gaining popularity, we still consider bias mitigation in encoder-only models crucial. Moreover, the concepts presented here can also be applied to cross-modal models that involve encoder-only models.

Acknowledgements

This work was supported by Grant 2020YFB1005400 from the National Key R&D Program of China.

References

  • [1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [2] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut, “Albert: A lite bert for self-supervised learning of language representations,” in International Conference on Learning Representations, 2020.
  • [3] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
  • [4] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. 2019, vol. 32, Curran Associates, Inc.
  • [5] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf, “Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019.
  • [6] Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, and Rongrong Ji, “You only compress once: Towards effective and elastic bert compression via exploit-explore stochastic nature gradient,” arXiv preprint arXiv:2106.02435, 2021.
  • [7] Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and Donald Brown, “Text classification algorithms: A survey,” Information, vol. 10, no. 4, pp. 150, 2019.
  • [8] Shaokun Zhang, Xiaobo Xia, Zhaoqing Wang, Ling-Hao Chen, Jiale Liu, Qingyun Wu, and Tongliang Liu, “Ideal: Influence-driven selective annotations empower in-context learners in large language models,” arXiv preprint arXiv:2310.10873, 2023.
  • [9] Shaokun Zhang, Yiran Wu, Zhonghua Zheng, Qingyun Wu, and Chi Wang, “Hypertime: Hyperparameter optimization for combating temporal distribution shifts,” arXiv preprint arXiv:2305.18421, 2023.
  • [10] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang, “Autogen: Enabling next-gen llm applications via multi-agent conversation framework,” arXiv preprint arXiv:2308.08155, 2023.
  • [11] Yiran Wu, Feiran Jia, Shaokun Zhang, Qingyun Wu, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, and Chi Wang, “An empirical study on challenging math problem solving with gpt-4,” arXiv preprint arXiv:2306.01337, 2023.
  • [12] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PMLR, 2021, pp. 8748–8763.
  • [13] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10684–10695.
  • [14] Paul Pu Liang, Irene Mengze Li, Emily Zheng, Yao Chong Lim, Ruslan Salakhutdinov, and Louis-Philippe Morency, “Towards debiasing sentence representations,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 5502–5515.
  • [15] Masahiro Kaneko and Danushka Bollegala, “Debiasing pre-trained contextualised embeddings,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 1256–1266.
  • [16] Aparna Garimella, Akhash Amarnath, Kiran Kumar, Akash Pramod Yalla, N Anandhavelu, Niyati Chhaya, and Balaji Vasan Srinivasan, “He is very intelligent, she is very beautiful? on mitigating social biases in language modelling and generation,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 4534–4545.
  • [17] James W. Cooley and John W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, no. 90, pp. 297–301, 1965.
  • [18] Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov, “Measuring and reducing gendered correlations in pre-trained models,” arXiv preprint arXiv:2010.06032, 2020.
  • [19] Pengyu Cheng, Weituo Hao, Siyang Yuan, Shijing Si, and Lawrence Carin, “Fairfil: Contrastive neural debiasing method for pretrained text encoders,” in International Conference on Learning Representations.
  • [20] Yue Guo, Yi Yang, and Ahmed Abbasi, “Auto-debias: Debiasing masked language models with automated biased prompts,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 1012–1023.
  • [21] Zhengbao Jiang, Frank F Xu, Jun Araki, and Graham Neubig, “How can we know what language models know?,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 423–438, 2020.
  • [22] Chandler May, Alex Wang, Shikha Bordia, Samuel R Bowman, and Rachel Rudinger, “On measuring social biases in sentence encoders,” in Proceedings of NAACL-HLT, 2019, pp. 622–628.
  • [23] Markus Freitag and Yaser Al-Onaizan, “Beam search strategies for neural machine translation,” ACL 2017, p. 56, 2017.
  • [24] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman, “Glue: A multi-task benchmark and analysis platform for natural language understanding,” in International Conference on Learning Representations.
  • [25] Shaokun Zhang, Feiran Jia, Chi Wang, and Qingyun Wu, “Targeted hyperparameter optimization with lexicographic preferences over multiple objectives,” in The Eleventh International Conference on Learning Representations, 2022.
  • [26] Xiawu Zheng, Chenyi Yang, Shaokun Zhang, Yan Wang, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao, and Rongrong Ji, “Ddpnas: Efficient neural architecture search via dynamic distribution pruning,” International Journal of Computer Vision, vol. 131, no. 5, pp. 1234–1249, 2023.
  • [27] Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, and Tongliang Liu, “Coreset selection with prioritized multiple objectives,” arXiv preprint arXiv:2311.08675, 2023.
  • [28] Ilya Loshchilov and Frank Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.