In developing VERB, we identify that all mechanisms in this class can be decoupled into a three-step process. The first step is to identify a linear concept subspace among the vectorized representations that captures the direction of bias (e.g., the concept of gender or nationality). The second is to use this subspace to transform the representations in a simple and controlled way. The last is to evaluate the transformed representations.
This decoupling provides several important advantages for VERB in addressing our design requirements. First, by identifying the linear subspaces, it orients the embedding view to show the most relevant components, which are the only ones being modified, in accordance with \(\langle\)R1\(\rangle\). This way, users can verify how changes happen with respect to concepts and need not worry that other, unseen dimensions are being distorted. Second, it allows the explanation of the debiasing process to be presented in simpler, easier-to-digest components. Third, it makes VERB modular, allowing users to mix and match these components and find effective ways to pair them for the task at hand, addressing \(\langle\)R2\(\rangle\).
4.1 Step 1: Subspace Identification
In embedded representations, the specific dimensions occupied by features are unknown. In this section, we discuss four methods in the literature used to determine the subspace that is the span of a specific concept (e.g., gender). Some of these methods (PCA and Paired-PCA) naturally generalize to identify multiple directions, but it is quite rare to use more than one direction to represent a concept. To keep subspace identification modular and simple, VERB currently identifies 1-dimensional subspaces, as described below. The identified direction is independent of any subsequent visualization or debiasing mechanism.
PCA. This is a general and simple approach to determine a subspace. It requires one set of word vectors, referred to as the
seed words, from which it computes the top principal component – the best 1-dimensional subspace, i.e., the one that minimizes the sum of squared distances from all (centered) word vectors. The resulting unit vector represents the subspace direction. Using VERB, we illustrate the PCA method in Figure
2(A) using a set of gendered seed words: “man, woman, brother, sister, he, she”. The red arrow above the black line shows the direction of the gender subspace obtained via PCA.
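The computation itself is small. Below is a minimal sketch (not VERB's implementation) of how such a direction could be obtained with NumPy; the function name `pca_direction` and the assumption that `seed_vectors` is a matrix whose rows are the seed-word embeddings are ours:

```python
import numpy as np

def pca_direction(seed_vectors):
    """Top principal component of a set of seed word vectors.

    seed_vectors: array of shape (n_words, d).
    Returns a unit vector in R^d spanning the 1-dimensional concept subspace.
    """
    X = np.asarray(seed_vectors, dtype=float)
    X = X - X.mean(axis=0)                      # center the seed vectors
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]                                   # first right-singular vector
    return v / np.linalg.norm(v)
```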
Paired-PCA. Another variant based on PCA was proposed by Bolukbasi et al. [
5]. It requires a list of paired words as the seeds; each pair has one word vector from each of the two groups. For example, for the gender concept shown in Figure
2(B), we use “man-woman, he-she, brother-sister” as seeds for subspace identification. The Paired-PCA method then reports the concept subspace as the first principal component of the difference vectors of the pairs. Because these vectors are already differences, we do not need to “center” them (remove their mean) first, as we do when PCA is applied directly to word vectors.
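Following the description above, a hedged sketch of Paired-PCA (a hypothetical helper, not the authors' code) differs only in operating on difference vectors and skipping the centering step:

```python
import numpy as np

def paired_pca_direction(pairs):
    """pairs: list of (f_vec, m_vec) tuples, one vector from each group.

    Returns the first principal component of the (uncentered) difference vectors.
    """
    D = np.array([f - m for f, m in pairs], dtype=float)
    _, _, vt = np.linalg.svd(D, full_matrices=False)   # no centering needed
    v = vt[0]
    return v / np.linalg.norm(v)
```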
2-Means method. The 2-means method [
16], for any two sets of words as seed sets, returns the normalized difference vector of their respective averages. Thus, for groups of words
\(F=\lbrace f_i\rbrace\) and
\(M=\lbrace m_i\rbrace\), it computes
\(f = \frac{1}{|F|}\sum _{i} f_i\) and
\(m = \frac{1}{|M|}\sum _{i} m_i\) as the mean of each set. Then the direction is calculated as
\(v = \frac{f - m}{\Vert f - m\Vert }.\) This method has the advantage that it does not require paired words or an equal number of words in the two seed sets. We give an example of applying the 2-means method to two sets of seed words in Figure
2(C), where
\({F=\lbrace }``woman\hbox{''}, ``sister\hbox{''}, ``she\hbox{''}\rbrace\) and
\(M=\lbrace ``man\hbox{''}, ``brother\hbox{''}, ``he\hbox{''}\rbrace\). The computed gender direction (black line segment) originates from the origin in the visualization.
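A sketch of the 2-means computation (assumed interface; `F` and `M` hold the two seed sets as row vectors):

```python
import numpy as np

def two_means_direction(F, M):
    """F, M: arrays of shape (n_f, d) and (n_m, d); the sets need not be
    paired or of equal size. Returns the unit vector along mean(F) - mean(M)."""
    f = np.asarray(F, dtype=float).mean(axis=0)   # mean of first seed set
    m = np.asarray(M, dtype=float).mean(axis=0)   # mean of second seed set
    v = f - m
    return v / np.linalg.norm(v)
```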
Classification normal. For two groups of seed words that can be classified using a linear support vector machine (SVM), the direction perpendicular to the classification boundary represents the direction of the difference between the two sets. Again, this requires only two sets
F and
M, but they do not need to be paired or of equal size. As illustrated in Figure
2(D), the dotted line represents the classification boundary between
\(F=\lbrace ``woman\hbox{''}, ``sister\hbox{''}, ``she\hbox{''}\rbrace\) and
\(M=\lbrace ``man\hbox{''}, ``brother\hbox{''}, ``he\hbox{''}\rbrace\), and the red arrow is its normal direction. The black segment emanating from the origin again indicates the gender direction. Ravfogel et al. [
59] used this direction iteratively to remove bias in word vectors by projections.
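A minimal sketch of this idea using scikit-learn's linear SVM (our assumption; the cited work may use a different solver or regularization):

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_direction(F, M):
    """Unit normal of a linear SVM boundary separating seed sets F and M."""
    X = np.vstack([F, M])
    y = np.array([0] * len(F) + [1] * len(M))     # group labels
    clf = LinearSVC().fit(X, y)
    w = clf.coef_[0]                              # normal to the separating hyperplane
    return w / np.linalg.norm(w)
```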
4.2 Step 2: Bias Mitigation
There are several methods to modify the embedding structure in ways that mitigate the encoded bias. Although there are more complicated, optimization-based methods designed for specific tasks such as gender bias in text [
82], we describe a subset of four debiasing methods that are quite simple to actuate (although their nuances may be intricate) and that rely specifically on the concept subspaces identified earlier. Again, VERB serves as the perfect visual medium to explain these debiasing methods. For the descriptions below, a point in the space of high-dimensional embedded representations is denoted as
\(x \in {\mathbb {R}}^d\) (e.g., for
\(d = 50\) or
\(d=300\)). A concept subspace is labeled
v and is restricted to be a unit vector in
\({\mathbb {R}}^d\).
Linear Projection (LP). This approach [
16] removes the component along the concept subspace from each data point
x. This procedure can be applied individually to each data point
x, where the component along
v is
\(\langle v,x \rangle v\), where
\(\langle v, x \rangle\) is the Euclidean dot product. The LP method then removes the component along
v for every point
\(x \in {\mathbb {R}}^d\) as
\(x^{\prime } = x - \langle v,x \rangle v.\) Using VERB, we give a simple example of applying two-means and LP debiasing to mitigate the gender bias in occupational words. The two seed sets are
M={“man”, “he”} and
F={“woman”, “she”}. The evaluation set is
E = {“receptionist”, “nurse”, “scientist”, “mathematician”}. As illustrated in Figure
3, VERB decomposes the LP method into an interpretable sequence of transformations. In Step 0, both the seed sets and the evaluation set are viewed from a PCA perspective, and the gender direction is identified using two-means. In Step 1, the viewing perspective is reoriented so that the gender direction aligns with the x-axis; we can clearly see that “receptionist” and “nurse” lie closer to the female direction, whereas “banker” and “engineer” lie closer to the male direction. The reorientation does not change any data; it simply changes the 2-dimensional subspace that is visualized, and the VERB interface smoothly animates it by interpolating the viewing angles. In Step 2, LP removes, for every word in the embedding, its component along the gender direction in
\({\mathbb {R}}^d\); all words are then shown aligned along the vertical axis. The underlying data is modified in this step. In Step 3, the transformed (debiased) points are reoriented again using a PCA perspective, and no clear gender association remains among the occupational words. This last PCA view differs from the original PCA view because the data was modified in Step 2.
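The core of LP (Step 2 above) is a one-line projection; here is a hedged sketch applied to a whole matrix of word vectors, with `X` assumed to hold one word vector per row:

```python
import numpy as np

def linear_projection(X, v):
    """Remove the component along the unit direction v from every row of X.

    Implements x' = x - <v, x> v for each word vector x.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    X = np.asarray(X, dtype=float)
    return X - np.outer(X @ v, v)                 # subtract each row's projection onto v
```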
Hard Debiasing (HD). An earlier approach (the first one proposed) by Bolukbasi et al. [
5], known as Hard Debiasing, uses a similar mechanism and is designed specifically for gender bias. It also requires an additional wordlist called the
equalize set, which is used to preserve some of the information about that concept. We summarize this mechanism next. The words that are used to define
v are considered definitionally gendered and not modified. The exception is another provided set of pairs of words (e.g., “boy-girl”, “man-woman”, “dad-mom”, “brother-sister”). These word pairs are
equalized; that is, they are first projected as in Linear Projection, but then each pair is extended along the direction
v, so that the two words in each pair end up as far apart as they were before the operation. The remaining words are then projected as in Linear Projection.
In our example with VERB, we again use
M = {“he”, “man”} and
F = {“she”, “woman”} and two-means to define a gender direction
v,
Q = {“boy-girl”, “sister-brother”} as the equalize set, and
E = {“engineer”, “lawyer”, “receptionist”, “nurse”} again as the evaluation set. As illustrated in Figure
4, Step 1 is obtained after reorienting the view so that the gender direction lies along the x-axis. Step 2 removes the component of each point along the gender direction, with the exception of
M (“he”, “man”) and
F (“she”, “woman”). Step 3 preserves some information regarding gender using the equalize set
Q, extending the words in
Q (“brother”, “sister”, “boy”, “girl”) along the gender direction so that each pair is as far apart as before. Step 4 reorients the modified words using PCA, from the viewing perspective with the most variance.
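For the equalize step, a simplified, hedged sketch following the description above (not necessarily Bolukbasi et al.'s exact formulation) first projects each pair off the gender direction and then pushes the two words back apart along it:

```python
import numpy as np

def equalize_pairs(pairs, v):
    """Simplified equalize step: each pair is projected off the unit
    direction v, then re-extended along v so the two words end up as far
    apart as they were originally. pairs: list of (a, b) word vectors.
    """
    v = v / np.linalg.norm(v)
    out = []
    for a, b in pairs:
        dist = np.linalg.norm(a - b)              # original separation
        a_p = a - np.dot(a, v) * v                # project off v
        b_p = b - np.dot(b, v) * v
        mid = (a_p + b_p) / 2.0                   # shared debiased midpoint
        sign = np.sign(np.dot(a - b, v)) or 1.0   # keep original orientation along v
        out.append((mid + sign * (dist / 2.0) * v,
                    mid - sign * (dist / 2.0) * v))
    return out
```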
Bolukbasi et al. [
5] described other methods, and later works by Wang et al. [
75] also provided slight variants of, or rediscovered, these approaches. One concern about Hard Debiasing is that it may leave residual bias [
23]. The authors of that critique helped develop the next approach as an alternative.
Iterative Nullspace Projection (INLP). INLP [
59] starts with a pair of large word lists (e.g., sets of male and female words). It selects the top
\(0.5\%\) most extreme words along either direction of the he-she vector, denoted as sets
M and
F, respectively. It then builds a linear classifier that best separates
M and
F, and linearly projects all words along the classifier normal (denoted as
\(v_1\)). However, a classifier with accuracy better than random may still be built on
M and
F after the projection. Let
\(v_2\) denote the classifier normal. INLP then applies linear projection to all words again along
\(v_2\). This continues for some large number of iterations, or until no classifier better than random can be found. Afterwards, the words that may encode bias, even by association (the sets
M and
F), cannot be linearly separated with accuracy better than random chance.
An example run of INLP using VERB is shown in Figure
5 using two sets of definitionally gendered words
M = {“man”, “he”, “him”, “his”, “guy”, “boy”, “grandpa”, “uncle”, “brother”, “son”, “nephew”, “mr”} and
F = {“woman”, “she”, “her”, “hers”, “gal”, “girl”, “grandma”, “aunt”, “sister”, “daughter”, “niece”}. A perfect separator/classifier can be found initially (shown in Step 1), and linear projection along its normal is shown in Step 2. The next classifier normal (shown in Step 4) is not a perfect separator. Yet, after projecting along it as well, and after a PCA reorientation as shown in Step 6, no sufficiently good classifier can be found, and the procedure stops.
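A compact, hedged sketch of the INLP loop (the stopping threshold and iteration cap are illustrative choices, not values from the paper):

```python
import numpy as np
from sklearn.svm import LinearSVC

def inlp_directions(X, labels, max_iters=20, stop_acc=0.55):
    """Iteratively fit a linear classifier separating the two seed groups
    and project all vectors off its normal, until accuracy is near chance.

    X: (n, d) seed-word vectors; labels: 0/1 group membership.
    Returns the list of removed directions; the same projections would be
    applied to the full vocabulary to debias it.
    """
    X = np.asarray(X, dtype=float)
    directions = []
    for _ in range(max_iters):
        clf = LinearSVC().fit(X, labels)
        if clf.score(X, labels) <= stop_acc:      # close to random: stop
            break
        v = clf.coef_[0]
        v = v / np.linalg.norm(v)
        directions.append(v)
        X = X - np.outer(X @ v, v)                # linear projection step
    return directions
```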
Orthogonal Subspace Correction and Rectification (OSCaR). A critique of the above techniques, especially INLP, is that they destroy information we might want to preserve. For example, we may still want to know that “grandpa” refers to a male grandparent. The OSCaR algorithm [
14] seeks a more controlled approach. It requires two specific concept subspaces, for instance, one representing gender
\(v_1\) and another representing occupations
\(v_2\). OSCaR does not “project away” the gender subspace; rather, it attempts to disassociate the two concepts by making their subspaces
orthogonal. In addition to orthogonalizing those subspaces, which can be done by rotating
\(v_2\) to
\(v_2^{\prime }\) so that
\(\langle v_1, v_2^{\prime } \rangle = 0\), it also rotates all other data points by a lesser amount. Points close to
\(v_1\) do not rotate much, whereas points close to
\(v_2\) rotate about as much as
\(v_2\). While OSCaR does not remove every possible association between data aligned with either of these subspaces, it does make the concepts as a whole orthogonal. Under the bias evaluation approaches described in Section
4.3, OSCaR is demonstrated to reduce bias by an amount similar to other debiasing approaches. Moreover, it retains the information along each of the original subspaces
\(v_1\) and
\(v_2\).
With VERB, Figure
6 shows the four steps of OSCaR. The first subspace
\(v_1\) representing gender is defined with words “he,” “his,” “him,” “she,” “her,” “hers,” “man,” and “woman.” The second subspace
\(v_2\) representing occupations is defined with words “engineer,” “scientist,” “lawyer,” “banker,” “nurse,” “homemaker,” “maid,” and “receptionist.” In the PCA view (Step 0), one can observe that the two subspaces are correlated, and the typical gender stereotypes of the occupations are present in the word representations, e.g., “maid” in the female direction and “engineer” in the male direction. The reoriented view in Step 1 aligns the gender direction (
\(v_1\)) along the x-axis. It shows the span of
\(v_1\) and
\(v_2\), which is the 2-dimensional subspace in which OSCaR modifies the data. It is also the viewing plane in which the angle between the two concept directions appears largest. In Step 2, the data is modified so that the gender and occupation subspaces become orthogonal. The evaluation set words “grandma,” “grandpa,” and “programmer” (along with all other words) can be seen to move as part of this transformation. Note how “programmer” remains near the other technically oriented careers, and how “grandpa-grandma” retains its inherently male-female relationship. Finally, in Step 3, another PCA view is shown on the modified data. Now, the two subspaces remain orthogonal, and the gender connotation in the occupations has been rectified; there is no apparent stereotypical correlation.
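A simplified, hedged sketch of a graded rotation in the spirit of OSCaR (an assumed interpolation scheme, not the authors' exact correction function): within the span of \(v_1\) and \(v_2\), points along \(v_1\) stay fixed, \(v_2\) is rotated until orthogonal to \(v_1\), and other points rotate by an interpolated amount; components outside the span are untouched:

```python
import numpy as np

def oscar_like_correction(X, v1, v2):
    """Graded rotation in span(v1, v2); v1, v2 are unit, non-parallel
    concept directions, X is an (n, d) array of word vectors."""
    v1 = v1 / np.linalg.norm(v1)
    u = v2 - np.dot(v2, v1) * v1                  # Gram-Schmidt: direction of v2'
    u = u / np.linalg.norm(u)
    theta = np.arctan2(np.dot(v2, u), np.dot(v2, v1))   # angle of v2 from v1
    shift = np.pi / 2 - theta                     # extra rotation taking v2 to v2'

    out = np.array(X, dtype=float)
    for i, x in enumerate(out):
        a, b = np.dot(x, v1), np.dot(x, u)        # coordinates inside the span
        rest = x - a * v1 - b * u                 # component outside the span
        phi = np.arctan2(b, a)                    # angle of x from v1
        frac = np.clip(phi / theta, 0.0, 1.0)     # 0 near v1, 1 near (or past) v2
        r = np.hypot(a, b)
        out[i] = rest + r * np.cos(phi + frac * shift) * v1 \
                      + r * np.sin(phi + frac * shift) * u
    return out
```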