Attenuating Bias in Word Vectors


Sunipa Dev                    Jeff Phillips
University of Utah            University of Utah

arXiv:1901.07656v1 [cs.CL] 23 Jan 2019

Thanks to NSF CCF-1350888, ACI-1443046, CNS-1514520, CNS-1564287, IIS-1816149, and NVidia Corporation. Part of the work by JP was done while visiting the Simons Institute for Theory of Computing.

Abstract

Word vector representations are well developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages. But they are prone to carrying and amplifying bias which can perpetuate discrimination in various applications. In this work, we explore new simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. We verify how names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how names can be used to detect other types of bias in the embeddings such as bias based on race, ethnicity, and age.

1    BIAS IN WORD VECTORS

Word embeddings are an increasingly popular application of neural networks wherein enormous text corpora are taken as input and words therein are mapped to a vector in some high dimensional space. Two commonly used approaches to implement this are WordToVec [15,16] and GloVe [17]. These word vector representations estimate similarity between words based on the context of their nearby text, or predict the likelihood of seeing words in the context of another. Richer properties were discovered such as synonym similarity, linear word relationships, and analogies such as man : woman :: king : queen. Their use is now standard in training complex language models.

However, it has been observed that word embeddings are prone to express the bias inherent in the data they are extracted from [3,4,7]. Further, Zhao et al. (2017) [18] and Hendricks et al. (2018) [6] show that machine learning algorithms and their output show more bias than the data they are generated from.

Word vector embeddings are used in machine learning towards applications which significantly affect people's lives, such as to assess credit [11], predict crime [5], and other emerging domains such as judging loan applications and resumes for jobs or college applications. So it is paramount that efforts are made to identify and if possible to remove bias inherent in them. Or at least, we should attempt to minimize the propagation of bias within them. For instance, in using existing word embeddings, Bolukbasi et al. (2016) [3] demonstrated that women and men are associated with different professions, with men associated with leadership roles and professions like doctor, programmer and women closer to professions like receptionist or nurse. Caliskan et al. (2017) [7] similarly noted how word embeddings show that women are more closely associated with arts than math while it is the opposite for men. They also showed how positive and negative connotations are associated with European-American versus African-American names.

Our work simplifies, quantifies, and fine-tunes these approaches: we show that very simple linear projection of all words based on vectors captured by common names is an effective and general way to significantly reduce bias in word embeddings. More specifically:

1a. We demonstrate that simple linear projection of all word vectors along a bias direction is more effective than the Hard Debiasing of Bolukbasi et al. (2016) [3] which is more complex and also partially relies on crowd sourcing.

1b. We show that these results can be slightly improved by dampening the projection of words which are far from the projection distance.

 2. We examine the bias inherent in the standard word pairs used for debiasing based on gender by randomly flipping or swapping these words in the raw text before creating the embeddings. We show that this alone does not eliminate bias in
    word embeddings, corroborating that simple language modification is not as effective as repairing the word embeddings themselves.

3a. We show that common names with gender association (e.g., john, amy) often provide a more effective gender subspace to debias along than using gendered words (e.g., he, she).

3b. We demonstrate that names carry other inherent, and sometimes unfavorable, biases associated with race, nationality, and age, which also correspond with bias subspaces in word embeddings, and that it is effective to use common names to establish these bias directions and remove this bias from word embeddings.

2    DATA AND NOTATIONS

We set as default the text corpus of a Wikipedia dump (enwiki-latest-pages-articles.xml.bz2) with 4.57 billion tokens and we extract a GloVe embedding from it in $D = 300$ dimensions per word. We restrict the word vocabulary to the most frequent 100,000 words. We also modify the text corpus and extract embeddings from it as described later.

So, for each word in the vocabulary $W$, we represent the word by the vector $w_i \in \mathbb{R}^D$ in the embedding. The bias (e.g., gender) subspace is denoted by a set of vectors $B$. It is typically considered in this work to be a single unit vector, $v_B$ (explained in detail later). As we will revisit, a single vector is typically sufficient, and will simplify descriptions. However, these approaches can be generalized to a set of vectors defining a multi-dimensional subspace.

Given a word embedding, debiasing typically takes as input a set $\mathcal{E} = \{E_1, E_2, \ldots, E_m\}$ of equality sets. An equality set $E_j$ for instance can be a single pair (e.g., {man, woman}), but could contain more words (e.g., {latina, latino, latinx}); these are words that, if the bias connotation (e.g., gender) were removed, should objectively all be equal. Our data sets will only use word pairs (as a default the ones in Table 1), and we will describe them as such hereafter for simpler descriptions. In particular, we will represent each $E_j$ as a set of two vectors $e_j^+, e_j^- \in \mathbb{R}^D$.

{man, woman}, {son, daughter}, {he, she}, {his, her},
{male, female}, {boy, girl}, {himself, herself},
{guy, gal}, {father, mother}, {john, mary}

Table 1: Gendered Word Pairs

Given such a set $\mathcal{E}$ of equality sets, the bias vector $v_B$ can be formed as follows [3]. For each $E_j = \{e_j^+, e_j^-\}$ create a vector $\vec{e}_j = e_j^+ - e_j^-$ between the pairs. Stack these to form a matrix $Q = [\vec{e}_1 \; \vec{e}_2 \; \ldots \; \vec{e}_m]$, and let $v_B$ be the top singular vector of $Q$. We revisit how to create such a bias direction in Section 4.

Now given a word vector $w \in W$, we can project it to its component along this bias direction $v_B$ as

$$\pi_B(w) = \langle w, v_B \rangle v_B.$$

3.1   Existing Method : Hard Debiasing

The most notable advance towards debiasing embeddings along the gender direction has been by Bolukbasi et al. (2016) [3] in their algorithm called Hard Debiasing (HD). It takes a set of words desired to be neutralized, $\{w_1, w_2, \ldots, w_n\} = W_N \subset W$, a unit bias subspace vector $v_B$, and a set of equality sets $E_1, E_2, \ldots, E_m$.

First, the words $w_1, w_2, \ldots, w_n \in W_N$ are projected orthogonal to the bias direction and normalized

$$w_i' = \frac{w_i - w_B}{\|w_i - w_B\|},$$

where $w_B$ denotes $\pi_B(w_i)$, the component of $w_i$ along the bias direction.

Second, it corrects the locations of the vectors in the equality sets. Let $\mu_j = \frac{1}{|E_j|} \sum_{e \in E_j} e$ be the mean of an equality set, and $\mu = \frac{1}{m} \sum_{j=1}^{m} \mu_j$ be the mean of the equality set means. Let $\nu_j = \mu - \mu_j$ be the offset of a particular equality set from the mean. Now each $e \in E_j$ in each equality set $E_j$ is first centered using their average and then neutralized as

$$e' = \nu_j + \sqrt{1 - \|\nu_j\|^2} \, \frac{\pi_B(e) - v_B}{\|\pi_B(e) - v_B\|}.$$

Intuitively $\nu_j$ quantifies the amount words in each equality set $E_j$ differ from each other in directions apart from the gender direction. This is used to center the words in each of these sets.

This renders word pairs such as man and woman as equidistant from the neutral words $w_i'$ with each word of the pair being centralized and moved to a position opposite the other in the space. This can filter out properties either word gained by being used in some other context, like mankind or humans for the word man.

The word set $W_N = \{w_1, w_2, \ldots, w_n\} \subset W$ which is debiased is obtained in two steps. First it seeds some words as definitionally gendered via crowd sourcing and using dictionary definitions; the complement (the words not selected in this step) is set as neutral.
Next, using this seeding an SVM is trained and used to predict among all $W$ the set of other biased words $W_B$ or neutral words $W_N$. This set $W_N$ is taken as desired to be neutral and is debiased. Thus not all words $W$ in the vocabulary are debiased in this procedure, only a select set chosen via crowd-sourcing and definitions, and its extrapolation. The word vectors in the equality sets are also handled separately. This makes this approach not a fully automatic way to debias the vector embedding.

3.2   Alternate and Simple Methods

We next present some alternatives to HD which are simple and fully automatic. These all assume a bias direction $v_B$.

Subtraction. As a simple baseline, for all word vectors $w$ subtract the gender direction $v_B$ from $w$:

$$w' = w - v_B.$$

Linear Projection. A better baseline is to project all words $w \in W$ orthogonally to the bias vector $v_B$.

$$w' = w - \pi_B(w) = w - \langle w, v_B \rangle v_B.$$

This enforces that the updated set $W' = \{w' \mid w \in W\}$ has no component along $v_B$, and hence the resulting span is only $D - 1$ dimensions. Reducing the total dimension from say 300 to 299 should have minimal effect on the expressiveness or generalizability of the word vector embeddings.

Bolukbasi et al. [3] apply this same step to a dictionary definition based extrapolation and crowd-source-chosen set of word pairs $W_N \subset W$. We quantify in Section 5 that this single universal projection step debiases better than HD.

For example, consider the bias as gender, and the equality set with words man and woman. Linear projection will subtract from their word embeddings the proportion that was along the gender direction $v_B$ learned from a larger set of equality pairs. It will make them close-by but not exactly equal. The word man is used in many more senses than the word woman; it is used to refer to humankind, to a person in general, and in expressions like "oh man". In contrast, for simpler word pairs with fewer word senses, like (he - she) and (him - her), we can expect them to be almost at identical positions in the vector space after debiasing, implying their synonymity.

Thus, this approach uniformly reduces the component of the word along the bias direction without compromising on the differences that words (and word pairs) have.

Figure 1: Illustration of $\eta$ and $\beta$ for word vector $w$.

3.3   Partial Projection

A potential issue with the simple approaches is that they can significantly change some embedded words which are definitionally biased (e.g., the biased words $W_B$ described by Bolukbasi et al. [3]). (We note that this may not actually be a problem (see Section 5); the change may only be associated with the bias, so removing it would then not change the meaning of those words in any way except the ones we want to avoid.) However, these intuitively should be words which have correlation with the bias vector, but also are far in the orthogonal direction. In this section we explore how to automatically attenuate the effect of the projection on these words.

This stems from the observation that given a bias direction, the words which are most extreme in this direction (have the largest dot product) sometimes have a reasonable biased context, but some do not. These "false positives" may be large normed vectors which also happen to have a component in the bias direction.

We start with a bias direction $v_B$ and mean $\mu$ derived from equality pairs (defined the same way as in the context of HD). Now given a word vector $w$ we decompose it into key values along two components, illustrated in Figure 1. First, we write its bias component as

$$\beta(w) = \langle w, v_B \rangle - \langle \mu, v_B \rangle.$$

This is the difference of $w$ from $\mu$ when both are projected onto the bias direction $v_B$.

Second, we write a (residual) orthogonal component

$$r(w) = w - \langle w, v_B \rangle v_B.$$

Let $\eta(w) = \|r(w)\|$ be its value. It is the orthogonal distance from the bias vector $v_B$; recall we chose $v_B$ to pass through the origin, so the choice of $\mu$ does not affect this distance.
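The quantities defined so far (the bias direction $v_B$, the Linear Projection step, and the decomposition into $\beta(w)$, $r(w)$, and $\eta(w)$) can be sketched in a few lines of NumPy. This is only an illustrative sketch on random toy vectors, not the authors' implementation: the function names, the toy data, and the choice to stack pair differences as rows (so that the top right singular vector is used) are assumptions made here. The sign of the singular vector is arbitrary.

```python
import numpy as np

def bias_direction(pairs):
    """Unit vector v_B: top singular direction of the stacked pair differences."""
    # One row per equality pair: e_j^+ - e_j^-  (shape: m x D).
    Q = np.stack([e_plus - e_minus for e_plus, e_minus in pairs])
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(Q, full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])

def linear_projection(W, v_B):
    """Apply w' = w - <w, v_B> v_B to every row w of W."""
    return W - np.outer(W @ v_B, v_B)

def decompose(w, v_B, mu):
    """Return (beta, r, eta) for a single word vector w."""
    beta = w @ v_B - mu @ v_B      # bias component relative to mu
    r = w - (w @ v_B) * v_B        # residual orthogonal to v_B
    eta = np.linalg.norm(r)        # orthogonal distance from v_B
    return beta, r, eta

# Toy data (hypothetical): 4 random "pairs" and 6 random "words" in D = 5.
rng = np.random.default_rng(0)
D = 5
pairs = [(rng.normal(size=D), rng.normal(size=D)) for _ in range(4)]
W = rng.normal(size=(6, D))

v_B = bias_direction(pairs)
mu = np.mean([(a + b) / 2 for a, b in pairs], axis=0)  # mean of the pair means
W_proj = linear_projection(W, v_B)
beta, r, eta = decompose(W[0], v_B, mu)
```

After the projection every row of `W_proj` has (numerically) zero inner product with `v_B`, and `r` coincides with the projected version of the same word, which is why $\eta(w)$ does not depend on $\mu$.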
Now we will maintain the orthogonal component ($r(w)$, which is in a subspace spanned by $D - 1$ out of $D$ dimensions) but adjust the bias component $\beta(w)$ to make it closer to $\mu$. But the adjustment will depend on the magnitude $\eta(w)$. As a default we set

$$w' = \mu + r(w)$$

so all word vectors retain their orthogonal component, but have a fixed and constant bias term. This is functionally equivalent to the Linear Projection approach; the only difference is that instead of having a 0 magnitude along $v_B$ (and the orthogonal part unchanged), it instead has a magnitude of constant $\mu$ along $v_B$ (and the orthogonal part still unchanged). This adds a constant to every inner product, and a constant offset to any linear projection or classifier. If we are required to work with normalized vectors (we do not recommend this as the vector length captures veracity information about its embedding), we can simply set $w' = r(w)/\|r(w)\|$.

Given this set-up, we now propose three modifications. In each set

$$w' = \mu + r(w) + \beta(w) \cdot f_i(\eta(w)) \cdot v_B$$

where $f_i$ for $i \in \{1, 2, 3\}$ is a function of only the orthogonal value $\eta(w)$. For the default case $f(\eta) = 0$, and

$$f_1(\eta) = \sigma^2 / (\eta + 1)^2$$
$$f_2(\eta) = \exp(-\eta^2 / \sigma^2)$$
$$f_3(\eta) = \max(0, \sigma / 2\eta)$$

Here $\sigma$ is a hyperparameter that controls the importance of $\eta$; in Section 3.4 we show that we can just set $\sigma = 1$.

In Figure 2 we see the regions of the $(\eta, \beta)$-space that the functions $f$, $f_1$ and $f_2$ consider gendered. $f$ projects all points onto the $y = \mu$ line. But variants $f_1$, $f_2$, and $f_3$ are represented by curves that dampen the bias reduction to different degrees as $\eta$ increases. Points P1 and P2 have the same dot products with the bias direction but different dot products along the other $D - 1$ dimensions. We can observe the effects of each dampening function as $\eta$ increases from P1 to P2.

[Figure 2 plot: x-axis "Proportion along the other 299 dimensions", y-axis "Proportion along the 1 dimensional gender subspace"; legend: f, f1, f2, f3, P1, P2.]

Figure 2: The gendered region as per the three variations of projection. Both points P1 and P2 have a dot product of 1.0 initially with the gender subspace. But their orthogonal distance to it differs, as expressed by their dot product with the other 299 dimensions.

3.4   SETTING σ = 1.

To complete the damping functions $f_1$, $f_2$, and $f_3$, we need a value $\sigma$. If $\sigma$ is larger, then more word vectors have bias completely removed; if $\sigma$ is smaller, then more words are unaffected by the projection. The goal is for words $S$ which are likely to carry a bias connotation to have little damping (small $f_i$ values) and for words $T$ which are unlikely to carry a bias connotation to have more damping (large $f_i$ values, so they are not moved much).

Given sets $S$ and $T$, we can define a gain function

$$\gamma_{i,\rho}(\sigma) = \sum_{s \in S} \beta(s)\,(1 - f_i(\eta(s))) - \rho \sum_{t \in T} \beta(t)\,(1 - f_i(\eta(t))),$$

with a regularization term $\rho$. The gain $\gamma$ is large when most bias words in $S$ have very little damping (small $f_i$, large $1 - f_i$), and the opposite is true for the neutral words in $T$. We want the neutral words to have large $f_i$ and hence small $1 - f_i$, so they do not change much.

To define the gain function, we need sets $S$ and $T$; we do so with the bias of interest as gender. The biased set $S$ is chosen among a set of 1000 popular names in $W$ which (based on and and SSN databases [1,2]) are strongly associated with a gender. The neutral set $T$ is chosen as the most frequent 1000 words from $W$, after filtering out obviously gendered words such as names, man, and he. We also omit occupation words like doctor and others which may carry unintentional gender bias (these are words we would like to automatically de-bias). The neutral set may not be perfectly non-gendered, but it provides a reasonable approximation of all non-gendered words.

We find for an array of choices for $\rho$ (we tried $\rho = 1$, $\rho = 10$, and $\rho = 100$), the value $\sigma = 1$ approximately maximizes the gain function $\gamma_{i,\rho}(\sigma)$ for each $i \in \{1, 2, 3\}$. So hereafter we fix $\sigma = 1$.

Although these sets $S$ and $T$ play a role somewhat similar to the crowd-sourced sets $W_B$ and $W_N$ from HD that we hoped to avoid, the role here is much reduced. This is just to verify that a choice of $\sigma = 1$ is reasonable, and otherwise they are not used.
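As a rough illustration of Sections 3.3 and 3.4, the damping functions, the partial projection update, and the gain function can be written as below. This is a sketch under stated assumptions rather than the authors' code. Two readings are assumptions made here: the $\mu$ term in the update is taken to be the component of $\mu$ along $v_B$ (matching the description of a constant magnitude along $v_B$), and $f_3$ is read as $\sigma / (2\eta)$, which needs $\eta > 0$. All function names and the toy vectors are hypothetical.

```python
import numpy as np

def f1(eta, sigma=1.0):
    return sigma**2 / (eta + 1.0) ** 2

def f2(eta, sigma=1.0):
    return np.exp(-(eta**2) / sigma**2)

def f3(eta, sigma=1.0):
    # ASSUMPTION: "sigma/2 eta" is read as sigma / (2 * eta); requires eta > 0.
    return max(0.0, sigma / (2.0 * eta))

def beta_r_eta(w, v_B, mu):
    """Bias component beta(w), residual r(w), and eta(w) = ||r(w)||."""
    beta = w @ v_B - mu @ v_B
    r = w - (w @ v_B) * v_B
    return beta, r, np.linalg.norm(r)

def partial_projection(w, v_B, mu, f=None, sigma=1.0):
    """w' = mu + r(w) + beta(w) * f(eta(w)) * v_B, with f = 0 when f is None."""
    beta, r, eta = beta_r_eta(w, v_B, mu)
    damp = 0.0 if f is None else f(eta, sigma)
    # ASSUMPTION: the "mu" term is taken as mu's component along v_B.
    mu_B = (mu @ v_B) * v_B
    return mu_B + r + beta * damp * v_B

def gain(S, T, v_B, mu, f, sigma=1.0, rho=1.0):
    """gamma(sigma) = sum over S of beta*(1 - f) minus rho * the same sum over T."""
    def total(X):
        out = 0.0
        for x in X:
            beta, _, eta = beta_r_eta(x, v_B, mu)
            out += beta * (1.0 - f(eta, sigma))
        return out
    return total(S) - rho * total(T)

# Toy data (hypothetical): a fixed unit bias direction and random vectors.
rng = np.random.default_rng(1)
D = 5
v_B = np.zeros(D)
v_B[0] = 1.0
mu = 0.1 * rng.normal(size=D)
w = rng.normal(size=D)
S = [rng.normal(size=D) for _ in range(3)]

w_default = partial_projection(w, v_B, mu)        # default case f = 0
w_damped = partial_projection(w, v_B, mu, f=f2)   # Gaussian-style damping
```

With $f = 0$ the result has bias component $\langle \mu, v_B \rangle$, and a damping value of 1 returns the word unchanged, so the $f_i$ interpolate between full projection and no change. A search over candidate values of `sigma` with `gain` would mirror the selection procedure described above.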

Figure 3: Fractional singular values for avg male - female words (as per Table 1) after flipping with probability (from left to right) 0.0 (the original data set), 0.5, 0.75, and 1.0.

Figure 4: Proportion of singular values along principal directions (left) using names as indicators, and (right) using word pairs from Table 1 as indicators.

                                                                                                          4     THE BIAS SUBSPACE

                                                                                                          We explore ways of detecting and defining the bias
3.5           Flipping the Raw Text
                                                                                                          subspace vB and recovering the most gendered words
                                                                                                          in the embedding. Recall as default, we use vB as the
Since the embeddings preserve inner products of the                                                       top singular vector of the matrix defined by stacking
data from which it is drawn, we explore if we can make                                                    vectors $\vec{e}_i = e_i^+ - e_i^-$ of biased word pairs. We primarily
the data itself gender unbiased and then observe how                                                      focus on gendered bias, using words in Table 1, and
that change shows up in the embedding. Unbiasing                                                          show later how to effectively extend to other biases.
a textual corpus completely can be very intricate and                                                     We discuss this in detail in Supplementary Material
complicated since there are many (sometimes im-                                                           C.
plicit) gender indicators in text. Nonetheless, we pro-
pose a simple way of neutralizing bias in textual data                                                    Most gendered words. The dot product, hvB , wi
by using word pairs E1 , E2 , . . . Em ; in particular, when                                              of the word vectors w with the gender subspace vB
we observe in raw text one part of a word pair, we ran-                                                   is a good indicator of how gendered a word is. The
domly flip it to the other pair. For instance for gen-                                                    magnitude of the dot product tells us of the length
dered word pairs (e.g., (he - she)) in a string “he was                                                   along the gender subspace and the sign tells us whether
a doctor” we may flip to “she was a doctor.”                                                              it is more female or male. Some of the words denoted
We implement this procedure over the entire input raw                                                     as most gendered are listed in Table 3.
text, and try various probabilities of flipping each ob-
served word, focusing on probabilities 0.5, 0.75 and
                                                                                                          4.1   Bias Direction using Names
1.00. The first 0.5-flip probability makes each element
of a word pair equally likely. The last 1.00-flip proba-
                                                                                                          When listing gendered words by $|\langle v_B, w \rangle|$, we observe
bility reverses the roles of those word pairs, and 0.75-
                                                                                                          that many gendered words are names. This indicates
flip probability does something in between. We per-
                                                                                                          the potential to use names as an alternative (and po-
form this set of experiments on the default Wikipedia
                                                                                                          tentially in a more general way) to bootstrap finding
data set and switch between word pairs (say man →
                                                                                                          the gender direction.
woman, she → he, etc), from a list larger that Table 3
consisting of 75 word pairs; see Supplementary Mate-                                                      From the top 100K words, we extract the 10
rial D.1.                                                                                                 most common male {m1 , m2 , . . . , m10 } and female
                                                                                                          {s1 , s2 , . . . , s10 } names which are not used in ambigu-
We observe how the proportion along the principal
                                                                                                          ous ways (e.g., not the name hope which could also
component changes with this flipping in Figure 3. We
                                                                                                          refer to the sentiment). We pair these 10 names from
see that flipping with 0.5 somewhat dampens the dif-
                                                                                                          each category (male, female) randomly and compute
ference between the different principal components.
                                                                                                          the SVD as before. We observe in 4 that the frac-
On the other hand flipping with probability 1.0 (and
                                                                                                          tional singular values show a similar pattern as with
to a lesser extent 0.75) exacerbates the gender compo-
                                                                                                          the list of correctly gendered word pairs like (man -
nents rather than dampening it. Now there are two
                                                                                                          woman), (he - she), etc. But this way of pairing names
components significantly larger than the others. This
                                                                                                          is quite imprecise. These names are not ‘opposites’ of
indicates this flipping is only addressing part of the ex-
                                                                                                          each other in the sense that word pairs are. So, we
plicit bias, but missing some implicit bias, and these
                                                                                                          modify how we compute vB now so that we can better
effects are now muddled.
                                                                                                          use names to detect the bias in the embedding. The
We list some gender biased analogies in the default em-                                                   following method gives us this advantage where we do
bedding and how they change with each of the methods                                                      not necessarily need word pairs or equality sets as in
described in this section in Table 2.                                                                     Bolukbasi et al. [3].
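
The random flipping of gendered word pairs in raw text, described earlier in this section, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the whitespace tokenization, and the assumption of lowercased text are ours.

```python
import random

def build_swap_table(pairs):
    """Map each word of a pair to its counterpart, in both directions."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table

def flip_text(text, pairs, p, seed=0):
    """Replace each observed pair word by its counterpart with probability p.

    Assumption: tokens are separated by whitespace and already lowercased.
    """
    rng = random.Random(seed)
    swap = build_swap_table(pairs)
    out = []
    for tok in text.split():
        # rng.random() lies in [0, 1), so p = 1.0 always flips and p = 0.0 never does
        if tok in swap and rng.random() < p:
            out.append(swap[tok])
        else:
            out.append(tok)
    return " ".join(out)

flipped = flip_text("he was a doctor", [("he", "she")], p=1.0)
```

With p = 0.5 each element of a pair becomes equally likely at every occurrence, while p = 1.0 swaps the roles of the two words throughout the text.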

Table 2: What analogies look like before and after damping gender by the different methods discussed: hard debiasing (HD), flipping words in the text corpus, subtraction, and projection.

        analogy head                    original        HD                        flipping                subtraction     projection
                                                                      0.5             0.75       1.0
   man : woman :: doctor :             nurse           surgeon        dr               dr     medicine    physician      physician
 man : woman :: footballer :        politician         striker    midfielder     goalkeeper   striker    politician     midfielder
     he : she :: strong :              weak            stronger      weak         strongly      many        well          stronger
    he : she :: captain :               mrs          lieutenant   lieutenant       colonel    colonel    lieutenant     lieutenant
   john : mary :: doctor :             nurse          physician    medicine        surgeon     nurse       father        physician
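
Ranking words by the magnitude of their dot product with the gender direction vB, as done to obtain the most gendered words, can be sketched as below. The toy vocabulary and vectors are made up for illustration, and which sign corresponds to female or male depends on how vB was oriented.

```python
import numpy as np

def most_gendered(vocab, vectors, v_b, top=5):
    """Return (word, score) pairs sorted by |<v_b, w>|, largest first."""
    v_b = v_b / np.linalg.norm(v_b)          # use a unit-length direction
    scores = vectors @ v_b                   # signed projection of every word
    order = np.argsort(-np.abs(scores))[:top]
    return [(vocab[i], float(scores[i])) for i in order]

# Hypothetical 2-d embedding; the first coordinate plays the role of v_b.
vocab = ["nurse", "captain", "table"]
vectors = np.array([[0.9, 0.1], [-0.8, 0.2], [0.05, 1.0]])
ranked = most_gendered(vocab, vectors, np.array([1.0, 0.0]), top=3)
```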

Table 3: Some of the most gendered words in default embedding; and most gendered adjectives and occupation words.
                    Gendered Words
    miss        herself    forefather      himself
    maid        heroine      nephew      congressman
 motherhood     jessica       zahir       suceeded
  adriana      seductive       him           sir
         Female Adjectives    Male Adjectives
             glamorous           strong
                diva            muscular
             shimmery           powerful
             beautiful             fast
       Female Occupations    Male Occupations
              nurse              soldier
               maid               captain
            housewife             officer
           prostitute           footballer

Table 4: Gendered occupations as observed in word embeddings using names as the gender direction indicator.
  Female Occ    Male Occ    Female* Occ     Male* Occ
     nurse       captain     policeman      policeman
     maid          cop       detective         cop
    actress        boss      character      character
   housewife     officer        cop         assassin
    dancer        actor       assassin      bodyguard
      nun       scientist      actor         waiter
   waitress     gangster       waiter         actor
   scientist     trucker       butler       detective

Our gender direction is calculated as

$$ v_{B,\mathrm{names}} = \frac{s - m}{\|s - m\|}, $$

where $s = \frac{1}{10}\sum_i s_i$ and $m = \frac{1}{10}\sum_i m_i$.

Using the default Wikipedia dataset, we found that this is a good approximator of the gender subspace defined by the first right singular vector calculated using gendered words from Table 1; their dot product is 0.809. We find similarly large dot product scores for other datasets too.

Here too we collect all the most gendered words as per the gender direction vB,names determined by these names. The most gendered words returned are similar to those found using the default vB, like occupational words, adjectives, and synonyms for each gender. We find names to express a similar classification of words along male - female vectors, with homemaker being more female and policeman being more male. We illustrate this in more detail in Table 4.

Using that direction, we debias by linear projection. There is a similar shift in analogy results. We see a few examples in Table 5.

5    QUANTIFYING BIAS

In this section we develop new measures to quantify how much bias has been removed from an embedding, and evaluate the various techniques we have developed for doing so.

As one measure, we use the Word Embedding Association Test (WEAT) developed by Caliskan et al. (2017) [7] as analogous to the IAT tests to evaluate the association of male and female gendered words with two categories of target words: career oriented words versus family oriented words. We detail WEAT and list the exact words used (as in [7]) in Supplementary Material B; smaller values are better.

Bolukbasi et al. [3] evaluated embedding bias using a crowdsourced judgement of whether an analogy produced by an embedding is biased or not. Our goal was to avoid crowdsourcing, so we propose two more automatic tests to qualitatively and uniformly evaluate an embedding for the presence of gender bias.

Embedding Coherence Test (ECT). A way to evaluate how the neutralization technique affects the embedding is to evaluate how the nearest neighbors change for (a) gendered pairs of words $\mathcal{E}$ and (b) indirect-bias-affected words such as those associated with sports or occupational words (e.g., football, captain, doctor). We use the gendered word pairs in Table 1 for $\mathcal{E}$ and the professions list P = {p1, p2, ..., pk} as proposed and used by Bolukbasi et al. (see

         Table 5: What analogies look like before and after removing the gender direction using names
                               analogy head              original   subtraction     projection
                          man : woman :: doctor :         nurse     physician       physician
                        man : woman :: footballer :    politician   politician     midfielder
                            he : she :: strong :           weak        very          stronger
                           he : she :: captain :            mrs     lieutenant     lieutenant
                          john : mary :: doctor :         nurse         dr              dr
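
The names-based direction of Section 4.1 and the linear projection behind the results in Table 5 can be sketched as follows. This is a minimal illustration on made-up vectors: the function names are ours, and we assume that linear projection means removing each word vector's component along the unit direction.

```python
import numpy as np

def names_direction(female_vecs, male_vecs):
    """Unit vector (s - m) / ||s - m|| from the two groups of name vectors."""
    s = np.mean(female_vecs, axis=0)   # mean of the female name vectors
    m = np.mean(male_vecs, axis=0)     # mean of the male name vectors
    d = s - m
    return d / np.linalg.norm(d)

def project_out(vectors, v_b):
    """Remove the component along the unit vector v_b from every row."""
    return vectors - np.outer(vectors @ v_b, v_b)

# Hypothetical 3-d name vectors (two names per group instead of ten).
female = np.array([[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]])
male = np.array([[-1.0, 0.0, 0.1], [-0.9, 0.2, 0.0]])
v_names = names_direction(female, male)
debiased = project_out(np.vstack([female, male]), v_names)
```

After the projection every row of `debiased` has zero dot product with `v_names`, which is the property the debiasing step relies on.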

also Supplementary Material D.2) to represent (b).

S1: For all word pairs $\{e_j^+, e_j^-\} = E_j \in \mathcal{E}$ we compute two means $m = \frac{1}{|\mathcal{E}|} \sum_{E_j \in \mathcal{E}} e_j^+$ and $s = \frac{1}{|\mathcal{E}|} \sum_{E_j \in \mathcal{E}} e_j^-$. We find the cosine similarity of both m and s to all words $p_i \in P$. This creates two vectors $u_m, u_s \in \mathbb{R}^k$.

S2: We transform these similarity vectors to replace each coordinate by its rank order, and compute the Spearman Coefficient (in [−1, 1], larger is better) between the rank order of the similarities to words in P.

Thus, here, we care about the order in which the words in P occur as neighbors to each word pair rather than the exact distance. The exact distance between each word pair would depend on the usage of each word and thus on all the different dimensions other than the gender subspace too. But the order being relatively the same, as determined using the Spearman Coefficient, would indicate the dampening of bias in the gender direction (i.e., if doctor by profession is the 2nd closest of all professions to both man and woman, then the embedding has a dampened bias for the word doctor in the gender direction). Neutralization should ideally bring the Spearman coefficient towards 1.

Embedding Quality Test (EQT). The demonstration by Bolukbasi et al. [3] about the skewed gender roles in embeddings using analogies is what we try to quantify in this test. We attempt to quantify the improvement in analogies with respect to bias in the embeddings.

We use the same sets $\mathcal{E}$ and P as in the ECT test. However, for each profession $p_i \in P$ we create a list $S_i$ of their plurals and synonyms from WordNet on NLTK [14].

S1: For each word pair $\{e_j^+, e_j^-\} = E_j \in \mathcal{E}$, and each occupation word $p_i \in P$, we test if the analogy $e_j^+ : e_j^- :: p_i$ returns a word from $S_i$. If yes, we set $Q(E_j, p_i) = 1$, and $Q(E_j, p_i) = 0$ otherwise.

S2: Return the average value across all combinations, $\frac{1}{|\mathcal{E}|} \frac{1}{k} \sum_{E_j \in \mathcal{E}} \sum_{p_i \in P} Q(E_j, p_i)$.

The scores for EQT are typically much smaller than for ECT. We explain two reasons for this.

First, EQT does not check whether the analogy makes relative sense, biased or otherwise. So, “man : woman :: doctor : nurse” is as wrong as “man : woman :: doctor : chair.” This pushes the score down.

Second, synonyms in each set $S_i$ as returned by WordNet [8] on the Natural Language Toolkit, NLTK [14], do not always contain all possible variants of the word. For example, the words psychiatrist and psychologist can be seen as analogous for our purposes here but linguistically are removed enough that WordNet does not put them together as synonyms. Hence, even after debiasing, if the analogy returns “man : woman :: psychiatrist : psychologist” S1 returns 0. Further, since the data also has several misspelt words, archeologist is not recognized as a synonym or alternative for the word archaeologist. For this too S1 returns 0.

The first caveat can be side-stepped by restricting the pool of words we search over for the analogous word to be from list P. But it is debatable if an embedding should be penalized equally for returning either nurse or chair for the analogy “man : woman :: doctor : ?” This measures the quality of analogies, with better quality having a score closer to 1.

Evaluating embeddings. We mainly run 4 tests to evaluate our methods: WEAT, EQT, and two variants of ECT: ECT (word pairs), which uses $\mathcal{E}$ defined by words in Table 1, and ECT (names), which uses vectors m and s derived from gendered names.

We observe in Table 6 that the ECT score increases for all methods in comparison to the non-debiased (the original) word embedding; the exception is flipping with 1.0 probability for ECT (word pairs) and all flipping variants for ECT (names). Flipping does nothing to affect the names, so it is not surprising that it does not improve this score; this further indicates that it is challenging to directly fix bias in raw text before creating embeddings. Moreover, HD has the lowest score (of 0.917) whereas projection obtains scores of 0.996 (with vB) and 0.943 (with vB,names).

EQT is a more challenging test, and the original em-

Table 6: Performance on ECT, EQT and WEAT by the different debiasing methods; and performance on
standard similarity and analogy tests.
         test             original   HD              flipping               subtraction            projection
                                              0.5       0.75     1.0    word pairs names       word pairs names
      ECT (word pairs)      0.798    0.917   0.983     0.984    0.683     0.963       0.936      0.996      0.943
        ECT (names)         0.832    0.968   0.714     0.662    0.587     0.923       0.966      0.935      0.999
           EQT              0.128    0.145   0.131     0.098    0.085     0.268       0.236      0.283      0.291
          WEAT              1.623    1.221   1.164      1.09    1.03      1.427       1.440      1.233      1.219
           WSim             0.637    0.537   0.567     0.537    0.536     0.627       0.636      0.627      0.629
          Simlex            0.324    0.314   0.317     0.314    0.264     0.302       0.312      0.321      0.321
       Google Analogy       0.623    0.561   0.565     0.561    0.321     0.538       0.565      0.565      0.584
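
The ECT scores reported in Table 6 follow steps S1 and S2 of the Embedding Coherence Test. A minimal sketch of that computation is given below on made-up vectors; the function names are ours, the rank transform assumes no ties, and the Spearman coefficient is computed as the Pearson correlation of the two rank vectors.

```python
import numpy as np

def cosine_to_all(x, P):
    """Cosine similarity of vector x to every row of P."""
    return (P @ x) / (np.linalg.norm(P, axis=1) * np.linalg.norm(x))

def ranks(a):
    """Rank order of each coordinate (0 = smallest); assumes no ties."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

def ect(E_plus, E_minus, P):
    """S1: similarities of the two group means to P. S2: Spearman of their ranks."""
    m = E_plus.mean(axis=0)
    s = E_minus.mean(axis=0)
    u_m = cosine_to_all(m, P)
    u_s = cosine_to_all(s, P)
    return float(np.corrcoef(ranks(u_m), ranks(u_s))[0, 1])

# Hypothetical data: one word pair and three profession vectors.
E_plus = np.array([[1.0, 0.0, 0.0]])
E_minus = np.array([[0.0, 1.0, 0.0]])
P_same = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0, 3.0]])
P_opposite = np.array([[3.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 3.0, 0.0]])
score_same = ect(E_plus, E_minus, P_same)          # professions ordered identically
score_opposite = ect(E_plus, E_minus, P_opposite)  # professions ordered in reverse
```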

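The EQT row of Table 6 averages the indicator Q(Ej, pi) over all word pairs and professions. The sketch below illustrates this on a made-up embedding. It assumes a 3CosAdd-style analogy solver over unit-normalized vectors that excludes the three query words, and it takes the synonym sets as a plain dictionary rather than querying WordNet; all names and vectors are hypothetical.

```python
import numpy as np

def analogy(a, b, c, vocab, vecs):
    """Word w maximizing cos(w, b) - cos(w, a) + cos(w, c), excluding a, b, c."""
    idx = {w: i for i, w in enumerate(vocab)}
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    target = unit[idx[b]] - unit[idx[a]] + unit[idx[c]]
    sims = unit @ target
    for w in (a, b, c):
        sims[idx[w]] = -np.inf     # never return one of the query words
    return vocab[int(np.argmax(sims))]

def eqt(pairs, professions, synonyms, vocab, vecs):
    """Fraction of (pair, profession) analogies that land in the synonym set."""
    hits = 0
    for e_plus, e_minus in pairs:
        for p in professions:
            if analogy(e_plus, e_minus, p, vocab, vecs) in synonyms[p]:
                hits += 1
    return hits / (len(pairs) * len(professions))

vocab = ["man", "woman", "doctor", "physician", "nurse"]
synonyms = {"doctor": {"physician", "doctors"}}
# Toy embedding in which "nurse" carries a strong component on the first axis.
biased = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, 0.9, 0.1], [-0.7, 0.7, 0]], dtype=float)
# Toy embedding in which "nurse" has no component on that axis.
neutral = biased.copy()
neutral[4] = [0.0, 0.5, 0.85]
score_biased = eqt([("man", "woman")], ["doctor"], synonyms, vocab, biased)
score_neutral = eqt([("man", "woman")], ["doctor"], synonyms, vocab, neutral)
```

In the first toy embedding the analogy returns nurse, so Q is 0; in the second it returns physician, so Q is 1.
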
bedding only achieves a score of 0.128, and HD only obtains 0.145 (that is, 12–15% of occupation words have their related word as nearest neighbor). On the other hand, projection increases this percentage to 28.3% (using vB) and 29.1% (using vB,names). Even subtraction does nearly as well, at between 23–27%. Generally, subtraction always performs slightly worse than projection.

For the WEAT test, the original data has a score of 1.623, and this is decreased the most by all forms of flipping, down to about 1.1. HD and projection do about the same, with HD obtaining a score of 1.221 and projection obtaining 1.219 (with vB,names) and 1.234 (with vB); values closer to 0 are better (see Supplementary Material B).

In the bottom of Table 6 we also run these approaches on standard similarity and analogy tests for evaluating the quality of embeddings. We use cosine similarity [13] on WordSimilarity-353 (WSim, 353 word pairs) [9] and SimLex-999 (Simlex, 999 word pairs) [10], each of which evaluates a Spearman coefficient (larger is better). We also use the Google Analogy Dataset using the function 3COSADD [12], which takes in three words which form a part of the analogy and returns the 4th word which fits the analogy the best.

We observe (as expected) that all the debiasing approaches reduce these scores. The largest decrease in scores (between 1% and 10%) is almost always from HD. Flipping at 0.5 rate is comparable to HD. And simple linear projection decreases the least (usually only about 1%, except on analogies where it is 7% (with vB) or 5% (with vB,names)).

In Table 7 we also evaluate the damping mechanisms defined by f1, f2, and f3, using vB. These are very comparable to simple linear projection (represented by f). The scores for ECT, EQT, and WEAT are all about the same as simple linear projection, usually slightly worse.

Table 7: Performance of damped linear projection using word pairs.
        Tests            f       f1      f2      f3
        ECT            0.996   0.994   0.995   0.997
        EQT            0.283   0.280   0.292   0.287
       WEAT            1.233   1.253   1.245   1.241
        WSim           0.627   0.628   0.627   0.627
       Simlex          0.321   0.324   0.324   0.324
    Google Analogy     0.565   0.571   0.569   0.569

While ECT, EQT and WEAT scores are in a similar range for all of f, f1, f2, and f3, the dampened approaches f1, f2, and f3 perform better on the Google Analogy test. This test set is devoid of bias and is made up of syntactic and semantic analogies. So, a score closer to that of the original, biased embedding tells us that more structure has been retained by f1, f2 and f3. Overall, any of these approaches could be used if a user wants to debias while retaining as much structure as possible, but otherwise linear projection (or f) is roughly as good as these dampened approaches.

6   DETECTING OTHER BIAS USING NAMES

We saw so far how projection combined with finding the gender direction using names works well and works as well as projection combined with finding the gender direction using word pairs.

We explore here a way of extending this approach to detect other kinds of bias where we cannot necessarily find good word pairs to indicate a direction, like Table 1 for gender, but where names are known to belong to certain protected demographic groups. For example, there is a divide between names that different racial groups tend to use more. Caliskan et al. [7] use a list of names that are more African-American (AA) versus names that are more European-American (EA) for their analysis of bias. There are similar lists of names that are distinctly and commonly used by different ethnic, racial (e.g., Asian, African-American) and even religious (e.g., Islamic) groups.

We first try this with two common demographic group divides: Hispanic / European-American and African-American / European-American.

Hispanic and European-American names. Even though we begin with the most commonly used Hispanic (H) names (Supplementary Material D), this is tricky as not all names occur as much as European-American names and are thus not as well embedded. We use the frequencies from the dataset to guide us in selecting commonly used names that are also most frequent in the Wikipedia dataset. Using the same method as Section 4.1, we determine the direction, vB,names, which encodes this racial difference and find the words most commonly aligned with it. Other Hispanic and European-American names are the closest words. But other words like latino or hispanic also appear to be close, which affirms that we are capturing the right subspace.

African-American and European-American names. We see a similar trend when we use African-American names and European-American names (Figure 5). We use the African-American names used by Caliskan et al. (2017) [7]. We determine the bias direction by using the method in Section 4.1.

Figure 5: Gender and racial bias in the embedding

We plot in Figure 5 a few occupation words along the axes defined by H-EA and AA-EA bias directions, and compare them with those along the male-female axis. The embedding is different among the groups, and likely still generally more subordinate-biased towards Hispanic and African-American names as it was for female. However, footballer is more Hispanic than European-American, while maid is more neutral in the racial bias setting than the gender setting. We see this pattern repeated across embeddings and datasets (see Supplementary Material A).

When we switch the type of bias, we also end up finding different patterns in the embeddings. In the case of both of these racial directions, there is a split in not just occupation words but other words that are detected as highly associated with the bias subspace. It shows up foremost among the closest words of the subspace of the bias. Here, we find words like drugs and illegal close to the H-EA direction while, close to the AA-EA direction, we retrieve several slang words used to refer to African-Americans. These word associations with each racial group can be detected by the WEAT tests (lower means less bias) using positive and negative words as demonstrated by Caliskan et al. (2017) [7]. We evaluate using the WEAT test before and after linear projection debiasing in Table 8. For each of these tests, we use half of the names in each category for finding the bias direction and the other half for WEAT testing. This selection is done arbitrarily and the scores are averaged over 3 such selections.

Table 8: WEAT positive-negative test scores before and after debiasing
                      Before Debiasing    After Debiasing
      EA-AA                1.803               0.425
      EA-H                 1.461               0.480
      Youth-Aged           0.915               0.704

More qualitatively, as a result of the dampening of bias, we see that biased words like other names belonging to these specific demographic groups, slang words, and colloquial terms like latinos are removed from the closest 10% words. This is beneficial since the distinguishability of demographic characteristics based on names is what shows up in these different ways, like in occupational or financial bias.

Age-associated names. We observed that names can be masked carriers of age too. Using the database for names through time [1] and extracting the most common names from the early 1900s as compared to the late 1900s and early 2000s, we find a correlation between these names (see Supplementary Material) and age related words. In Figure 6, we see a clear correlation between age and names. Bias in this case does not show up in professions as clearly as in gender but in terms of association with positive and negative words [7]. We again evaluate using a WEAT test in Table 8 the bias before and after debiasing the embedding.

Figure 6: Detecting Age with Names: a plot of age related terms along names from different centuries. (Plotted terms: youth, maturity, adolescence, young, senile, old, aged.)

7   DISCUSSION

Different types of bias exist in textual data. Some are easier to detect and evaluate. Some are harder to find suitable and frequent indicators for and thus, to dampen. Gendered word pairs and gendered names are frequent enough in textual data to allow us to successfully measure gender bias in different ways and project the word embeddings away from the subspace occupied by gender. Other types of bias don’t always have a list of word pairs to fall back on to do the same. But using names, as we see here, we can measure and detect the different biases anyway and then project the embedding away from them. In this work we also see how a weighted variant of projection removes bias while retaining best the inherent structure of the word embeddings.

 [1]

 [2]

 [3] T Bolukbasi, K W Chang, J Zou, V Saligrama, and A Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In ACM Transactions of Information Systems, 2016.

 [4] T Bolukbasi, K W Chang, J Zou, V Saligrama, and A Kalai. Quantifying and reducing bias in word embeddings. 2016.

 [5] Tim Brennan, William Dieterich, and Beate Ehret. Evaluating the predictive validity of the compas risk and needs assessment system. Criminal Justice and Behavior, 36(1):21–40, 2009.

 [6] Kaylee Burns, Lisa Anne Hendricks, Trevor Darrell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. CoRR, abs/1803.09797, 2018.

 [7] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.

 [8] Christiane Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.

 [9] L Finkelstein, E Gabrilovich, Y Matias, E Rivlin, Z Solan, G Wolfman, et al. Placing search in context: The concept revisited. In ACM Transactions of Information Systems, volume 20, pages 116–131, 2002.

[10] F Hill, R Reichart, and A Korhonen. Simlex-999: Evaluating semantic models with (genuine) similarity estimation. In Computational Linguistics, volume 41, pages 665–695, 2015.

[11] Amir E. Khandani, Adlar J. Kim, and Andrew Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.

[12] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, 2013.

[13] Omer Levy and Yoav Goldberg. Linguistic regularities of sparse and explicit word representations. In CoNLL, 2014.

[14] Edward Loper and Steven Bird. Nltk: The natural language toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP ’02, pages 63–70, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.

[15] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. Technical report, arXiv:1301.3781, 2013.

[16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.

[17] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. 2014.

[18] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. CoRR, abs/1707.09457,

Supplementary material for:
Attenuating Bias in Word Embeddings

A     Bias in different embeddings

We explore here how gender bias is expressed across different embeddings, datasets and embedding mechanisms. Similar patterns appear in all of them, as seen in Figure 7.

To verify how pervasive bias is across datasets and embeddings, we use the GloVe embeddings of a Wikipedia dump (enwiki-latest-pages-articles.xml.bz2, 4.7B tokens), Common Crawl (840B tokens, 2.2M vocab) and Twitter (27B tokens, 1.2M vocab) from, and the WordToVec embedding of Google News (100B tokens, 3M vocab) from https://

[Figure 7 plots: in each embedding, the occupation words captain, doctor, attorney, programmer, footballer, receptionist, dancer, maid, homemaker and nurse are placed along four gender axes: man - woman, he - she, john - mary, and avg male - avg female.]

Figure 7: Gender Bias in Different Embeddings: (a) GloVe on Wikipedia, (b) GloVe on Twitter, (c) GloVe on Common Crawl and (d) WordToVec on GoogleNews datasets.

[Figure 8 plots: the adjectives important, powerful, confident, intelligent, strong, shy, ugly, homely, beautiful and glamorous are placed along the avg male - avg female axis in panels (a) to (d).]

Figure 8: Bias in adjectives along the gender direction: (a) GloVe on default Wikipedia dataset, (b) GloVe on Common Crawl (840B token dataset), (c) GloVe on Twitter dataset and (d) WordToVec on Google News dataset

[Figure 9 plots: occupation words placed along an axis between two sets of names in panels (a) to (d).]

Figure 9: Racial bias in different embeddings: Occupation words along the European American - Hispanic axis in GloVe embeddings of (a) Common Crawl and (b) Twitter Dataset and the European American - African American axis in GloVe embeddings of (c) Common Crawl and (d) Twitter Dataset

B    WORD EMBEDDING ASSOCIATION TEST

The Word Embedding Association Test (WEAT) was defined by Caliskan et al. [7] as an analogue to the Implicit Association Test (IAT). It checks for human-like bias associated with words in word embeddings. For example, it found career oriented words (executive, career, etc.) to be more associated with male names and male gendered words ('man', 'boy', etc.) than with female names and female gendered words, and family oriented words ('family', 'home', etc.) to be more associated with female names and words than with male ones. Below we list the sets of words used for WEAT by Caliskan et al., which we also used in our work.

For two sets of target words X and Y and attribute words A and B, the WEAT test statistic is:

$$s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$$

where

$$s(w, A, B) = \mathrm{mean}_{a \in A} \cos(a, w) - \mathrm{mean}_{b \in B} \cos(b, w)$$

and $\cos(a, b)$ is the cosine similarity between vectors a and b.

This score is normalized by $\mathrm{std\text{-}dev}_{w \in X \cup Y}\, s(w, A, B)$. The closer this value is to 0, the less bias or preferential association the target word groups have to the attribute word groups.

Here target words are occupation words or career/family oriented words and attributes are male/female words or names.

Career : { executive, management, professional, corporation, salary, office, business, career }
Family : { home, parents, children, family, cousins, marriage, wedding, relatives }
Male names : { john, paul, mike, kevin, steve, greg, jeff, bill }
Female names : { amy, joan, lisa, sarah, diana, kate, ann, donna }
Male words : { male, man, boy, brother, he, him, his, son }
Female words : { female, woman, girl, she, her, hers, daughter }

C    DETECTING THE GENDER DIRECTION

For this, we take a set of gendered word pairs as listed in Table 3. From our default Wikipedia dataset, using the embedded vectors for these word pairs (i.e., (woman - man), (she - he), etc.), we create a basis for the subspace F, of dimension 10. We then try to understand the distribution of variance in this subspace. To do so, we project the entire dataset onto this subspace F, and take the SVD. Chart (a) in Figure 10 shows the singular values of the entire data in this subspace F. We observe that there is a dominant first singular vector/value which is almost twice the size of the second value. After this drop, the decay is significantly more gradual. This suggests using only the top singular vector of F as the gender subspace, not 2 or more of these vectors.

To grasp how much of this variation is from the correlation along the gender direction, and how much is just random variation, we repeat this experiment, again in Figure 10, with different ways of creating the subspace F. First, in chart (b), we generate 10 vectors, each from one word chosen randomly and one chosen from the gendered set (e.g., chair-woman). Second, in chart (c), we generate 10 vectors between two random words from our set of the 100,000 most frequent words; these are averaged over 100 random iterations due to higher variance in the plots. Finally, in chart (d), we generate 10 random unit vectors in $\mathbb{R}^{300}$. We observe that the pairs with one gendered word in each pair still exhibit a significant drop in singular values, but not as drastic as when both words in each pair are gendered. The other two approaches have no significant drop since they do not in general contain a gendered word and so do not span an interesting subspace. All remaining singular values, and their decay, appear similar to the non-leading ones from charts (a) and (b). This further indicates that there is roughly one important gender direction, and any related subspace is not significantly different from a random one in the word-vector embedding.

Figure 10: Fractional singular values for (a) male-female word pairs (b) one gendered word - one random word (c) random word pair (d) random unit vectors

Now, for any word w in vocabulary W of the embedding, we can define $w_B$ as the part of w along the gender direction.

Based on the experiments shown in Figure 10, it is justified to take the gender direction as the (normalized) first right singular vector, $v_B$, of the full data set projected onto the subspace F. Then, the component of a word vector w along $v_B$ is simply $\langle w, v_B \rangle v_B$.

Calculating this component when the gender subspace is defined by two or more of the top right singular vectors of V can be done similarly.

We should note here that the gender subspace defined here passes through the origin. Centering the data and using PCA to define the gender subspace lets the gender subspace not pass through the origin. We see a comparison of the two methods in Section 5, as HD uses PCA and we use SVD to define the gender direction.
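The procedure of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gender_direction` and `component_along` are hypothetical, the data is synthetic with a planted direction instead of a real embedding, and "fractional" singular values are assumed to mean each singular value divided by their sum.

```python
import numpy as np

def gender_direction(embeddings, pair_diffs):
    """Return (v_B, fractional singular values) for data projected onto F.

    embeddings: (n, d) array, one word vector per row.
    pair_diffs: (k, d) array of difference vectors such as woman - man.
    """
    # Orthonormal basis (as rows) for F, the span of the difference vectors.
    _, s, vt = np.linalg.svd(pair_diffs, full_matrices=False)
    basis = vt[s > 1e-10 * s[0]]
    # Project every word vector onto F, keeping the original coordinates.
    projected = embeddings @ basis.T @ basis
    # Plain SVD without centering, so the subspace passes through the origin.
    _, sing, vt_proj = np.linalg.svd(projected, full_matrices=False)
    sing = sing[: len(basis)]  # the remaining values are numerically zero
    v_b = vt_proj[0] / np.linalg.norm(vt_proj[0])
    # Assumption: "fractional" means normalized to sum to one.
    return v_b, sing / sing.sum()

def component_along(w, v_b):
    """Component of w along the unit vector v_B, i.e. <w, v_B> v_B."""
    return np.dot(w, v_b) * v_b

# Synthetic stand-in for an embedding: extra variance along a planted axis.
rng = np.random.default_rng(0)
d, n, k = 20, 500, 10
planted = np.zeros(d)
planted[0] = 1.0
embeddings = rng.normal(size=(n, d)) + 3.0 * rng.normal(size=(n, 1)) * planted
# Ten noisy "pair differences" that roughly point along the planted axis.
pair_diffs = planted + 0.1 * rng.normal(size=(k, d))

v_b, frac = gender_direction(embeddings, pair_diffs)
w = embeddings[0]
w_b = component_along(w, v_b)
```

Here `frac` plays the role of one chart in Figure 10, and `w - w_b` is the part of `w` orthogonal to the estimated direction. Replacing `pair_diffs` with differences of random rows of `embeddings` would mimic the random-pair control of chart (c).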

D     Word Lists

D.1   Word Pairs used for Flipping

actor actress
author authoress
bachelor spinster
boy girl
brave squaw
bridegroom bride
brother sister
conductor conductress
count countess
czar czarina
dad mum
daddy mummy
duke duchess
emperor empress
father mother
father-in-law mother-in-law
fiance fiancee
gentleman lady
giant giantess
god goddess
governor matron
grandfather grandmother
grandson granddaughter
he she
headmaster headmistress
heir heiress
hero heroine
him her
himself herself
host hostess
hunter huntress
husband wife
king queen
lad lass
landlord landlady
lord lady
male female
man woman
manager manageress
manservant maidservant
masseur masseuse
master mistress
mayor mayoress
milkman milkmaid
millionaire millionairess
monitor monitress
monk nun
mr mrs
murderer murderess
nephew niece
papa mama
poet poetess
policeman policewoman
postman postwoman
postmaster postmistress
priest priestess
prince princess
prophet prophetess
proprietor proprietress
shepherd shepherdess
sir madam
son daughter
son-in-law daughter-in-law
step-father step-mother
step-son step-daughter
steward stewardess
sultan sultana
tailor tailoress
uncle aunt
usher usherette
waiter waitress
washerman washerwoman
widower widow
wizard witch

D.2   Occupation Words

detective
ambassador
coach
officer
epidemiologist
rabbi
ballplayer
secretary
actress
manager
scientist
cardiologist
actor
industrialist
welder
biologist
undersecretary
captain
economist
politician
baron
pollster
environmentalist
photographer
mediator
character
housewife
jeweler
physicist
hitman

geologist                     novelist
painter                       senator
employee                      collector
stockbroker                   goalkeeper
footballer                    singer
tycoon                        acquaintance
dad                           preacher
patrolman                     trumpeter
chancellor                    colonel
advocate                      trooper
bureaucrat                    understudy
strategist                    paralegal
pathologist                   philosopher
psychologist                  councilor
campaigner                    violinist
magistrate                    priest
judge                         cellist
illustrator                   hooker
surgeon                       jurist
nurse                         commentator
missionary                    gardener
stylist                       journalist
solicitor                     warrior
scholar                       cameraman
naturalist                    wrestler
artist                        hairdresser
mathematician                 lawmaker
businesswoman                 psychiatrist
investigator                  clerk
curator                       writer
soloist                       handyman
servant                       broker
broadcaster                   boss
fisherman                     lieutenant
landlord                      neurosurgeon
housekeeper                   protagonist
crooner                       sculptor
archaeologist                 nanny
teenager                      teacher
councilman                    homemaker
attorney                      cop
choreographer                 planner
principal                     laborer
parishioner                   programmer
therapist                     philanthropist
administrator                 waiter
skipper                       barrister
aide                          trader
chef                          swimmer
gangster                      adventurer
astronomer                    monk
educator                      bookkeeper
lawyer                        radiologist
midfielder                    columnist
evangelist                    banker

neurologist                      technician
barber                           nun
policeman                        instructor
assassin                         alderman
marshal                          analyst
waitress                         chaplain
artiste                          inventor
playwright                       lifeguard
electrician                      bodyguard
student                          bartender
deputy                           surveyor
researcher                       consultant
caretaker                        athlete
ranger                           cartoonist
lyricist                         negotiator
entrepreneur                     promoter
sailor                           socialite
dancer                           architect
composer                         mechanic
president                        entertainer
dean                             counselor
comic                            janitor
medic                            firebrand
legislator                       sportsman
salesman                         anthropologist
observer                         performer
pundit                           crusader
maid                             envoy
archbishop                       trucker
firefighter                      publicist
vocalist                         commander
tutor                            professor
proprietor                       critic
restaurateur                     comedian
editor                           receptionist
saint                            financier
butler                           valedictorian
prosecutor                       inspector
sergeant                         steward
realtor                          confesses
commissioner                     bishop
narrator                         shopkeeper
conductor                        ballerina
historian                        diplomat
citizen                          parliamentarian
worker                           author
pastor                           sociologist
serviceman                       photojournalist
filmmaker                        guitarist
sportswriter                     butcher
poet                             mobster
dentist                          drummer
statesman                        astronaut
minister                         protester
dermatologist                    custodian

maestro
pianist
pharmacist
chemist
pediatrician
lecturer
musician
cabbie
farmer
headmaster
soldier
carpenter

D.3   Names used for Gender Bias Detection

Male : { john, william, george, liam, andrew, michael, louis, tony, scott, jackson }
Female : { mary, victoria, carolina, maria, anne, kelly, marie, anna, sarah, jane }

D.4   Names used for Racial Bias Detection and Dampening

European American : { brad, brendan, geoffrey, greg, brett, matthew, neil, todd, nancy, amanda, emily, rachel }
African American : { darnell, hakim, jermaine, kareem, jamal, leroy, tyrone, rasheed, yvette, malika, latonya, jasmine }
Hispanic : { alejandro, pancho, bernardo, pedro, octavio, rodrigo, ricardo, augusto, carmen, katia, marcella, sofia }

D.5   Names used for Age related Bias Detection and Dampening

Aged : { ruth, william, horace, mary, susie, amy, john, henry, edward, elizabeth }
Youth : { taylor, jamie, daniel, aubrey, alison, miranda, jacob, arthur, aaron, ethan }
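Name sets of this kind are what the WEAT statistic of Section B takes as attribute sets A and B. The following is a minimal sketch of that statistic, not the authors' code: the function names are hypothetical, the 3-dimensional vectors are invented toy values and do not come from any real embedding, and the normalization is assumed to use the population standard deviation, which the text does not specify.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def s_word(w, A, B):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return np.mean([cos(a, w) for a in A]) - np.mean([cos(b, w) for b in B])

def weat(X, Y, A, B):
    """Return the WEAT statistic s(X, Y, A, B) and its std-dev-normalized value."""
    sx = [s_word(x, A, B) for x in X]
    sy = [s_word(y, A, B) for y in Y]
    stat = sum(sx) - sum(sy)
    # Assumption: population standard deviation (ddof=0) over X union Y.
    return stat, stat / np.std(sx + sy)

# Hypothetical toy vectors, chosen only so that the example has a clear sign.
toy = {
    "career": [1.0, 0.1, 0.0], "salary": [0.9, 0.2, 0.0],
    "family": [-1.0, 0.1, 0.0], "home": [-0.9, 0.2, 0.0],
    "john": [0.8, 0.0, 0.5], "william": [0.7, 0.1, 0.5],
    "mary": [-0.8, 0.0, 0.5], "anna": [-0.7, 0.1, 0.5],
}

def vectors(words):
    return [np.array(toy[w], dtype=float) for w in words]

X, Y = vectors(["career", "salary"]), vectors(["family", "home"])
A, B = vectors(["john", "william"]), vectors(["mary", "anna"])
stat, normalized = weat(X, Y, A, B)
```

With a real embedding, `toy` would be replaced by a lookup into the trained word vectors, and the target and attribute sets by the lists given in Section B or in this appendix. A value of `stat` near 0 would indicate little preferential association between the target and attribute groups.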