Sustainable Modular Debiasing of Language Models


Anne Lauscher,1∗† Tobias Lüken,2∗ Goran Glavaš2
1 MilaNLP, Bocconi University, Via Sarfatti 25, 20136 Milan, Italy
2 Data and Web Science Group, University of Mannheim, B 6, 26, 68159 Mannheim, Germany
anne.lauscher@unibocconi.it, tlueken@mail.uni-mannheim.de, goran@informatik.uni-mannheim.de

∗ Equal contribution.
† Most of the work was conducted while Anne Lauscher was employed at the University of Mannheim.

arXiv:2109.03646v1 [cs.CL] 8 Sep 2021

Abstract

Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for the widespread adoption of state-of-the-art language technology. To remedy this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLM's parameters, which – besides being computationally expensive – comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE very effective in bias mitigation. We further show that – due to its modular nature – ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE to six target languages.

1 Introduction

Recent work has shown that pretrained language models such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019), or GPT-2 (Radford et al., 2019) tend to exhibit a range of stereotypical societal biases, such as racism and sexism (e.g., Kurita et al., 2019; Dev et al., 2020; Webster et al., 2020; Nangia et al., 2020; Barikeri et al., 2021, inter alia). The reason for this lies in the distributional nature of these models: the human-produced corpora on which these models are trained abound with stereotypically biased concept co-occurrences (for instance, male terms like man or son appear more often together with certain career terms like doctor or programmer than female terms like woman or daughter), and the PLMs, being trained with language modeling objectives, consequently encode these biased associations in their parameters. While this effect can lend itself to diachronic analysis of societal biases (e.g., Garg et al., 2018; Walter et al., 2021), it represents stereotyping, one of the main types of representational harm (Blodgett et al., 2020), and, if unmitigated, may cause severe ethical issues in various sociotechnical deployment scenarios.

To alleviate this problem and ensure fair language technology, previous work introduced a wide range of bias mitigation methods (e.g., Bordia and Bowman, 2019; Dev et al., 2020; Lauscher et al., 2020a, inter alia). All existing debiasing approaches, however, modify all parameters of the PLM, which has two prominent shortcomings: (1) it comes with a high computational cost,¹ and (2) it can lead to (catastrophic) forgetting (McCloskey and Cohen, 1989; Kirkpatrick et al., 2017) of the useful distributional knowledge obtained during pretraining. For example, Webster et al. (2020) incorporate counterfactual debiasing already into BERT's pretraining: this implies a debiasing framework in which a separate "debiased BERT" instance needs to be trained from scratch for each individual bias type and specification. In sum, current debiasing procedures designed for pretraining or full fine-tuning of PLMs have a large carbon footprint (Strubell et al., 2019) and consequently jeopardize the sustainability (Moosavi et al., 2020) of fair representation learning in NLP.

¹ While a full fine-tuning approach to PLM debiasing may still be feasible for moderate-sized PLMs like BERT (Devlin et al., 2019), it is prohibitively computationally expensive for giant language models like GPT-3 (Brown et al., 2020) or GShard (Lepikhin et al., 2020).
In this work, we move towards more sustainable removal of stereotypical societal biases from pretrained language models. To this end, we propose ADELE (Adapter-based DEbiasing of LanguagE Models), a debiasing approach based on the recently proposed modular adapter framework (Houlsby et al., 2019; Pfeiffer et al., 2020a). In ADELE, we inject additional parameters, the so-called adapter layers, into the layers of the PLM and incorporate the "debiasing" knowledge only in those parameters, without changing the pretrained knowledge in the PLM. We show that, while being substantially more efficient (i.e., sustainable) than existing state-of-the-art debiasing approaches, ADELE is just as effective in bias attenuation.

Contributions. The contributions of this work are three-fold: (i) we first present ADELE, our novel adapter-based framework for parameter-efficient and knowledge-preserving debiasing of PLMs. We combine ADELE with one of the most effective debiasing strategies, Counterfactual Data Augmentation (CDA; Zhao et al., 2018), and demonstrate its effectiveness in gender-debiasing of BERT (Devlin et al., 2019), the most widely used PLM. (ii) We benchmark ADELE on what is arguably the most comprehensive set of bias measures and data sets for both intrinsic and extrinsic evaluation of biases in representation spaces spanned by PLMs. Additionally, we study a previously neglected effect of fairness forgetting, present when debiased PLMs are subjected to large-scale downstream training for specific tasks (e.g., natural language inference, NLI); we show that ADELE's modular nature allows us to counter this undesirable effect by stacking a dedicated task adapter on top of the debiasing adapter. (iii) Finally, we successfully transfer ADELE's debiasing effects to six other languages in a zero-shot manner, i.e., without relying on any debiasing data in the target languages. We achieve this by training the debiasing adapter stacked on top of multilingual BERT on the English counterfactually augmented dataset.

2 ADELE: Adapter-Based Debiasing

In this work, we seek to fulfill the following three desiderata: (1) we want to achieve effective debiasing, comparable to that of existing state-of-the-art debiasing methods, while (2) keeping the training costs of debiasing significantly lower; and (3) fully preserving the distributional knowledge acquired in pretraining. To meet all three criteria, we propose debiasing based on the popular adapter modules (Houlsby et al., 2019; Pfeiffer et al., 2020a). Adapters are lightweight neural components designed for parameter-efficient fine-tuning of PLMs, injected into the PLM layers. In downstream fine-tuning, all original PLM parameters are kept frozen and only the adapters are trained. Because adapters have fewer parameters than the original PLM, adapter-based fine-tuning is more computationally efficient. And since fine-tuning does not update the PLM's original parameters, all distributional knowledge is preserved.

The debiasing adapters could, in principle, be trained using any of the debiasing strategies and training objectives from the literature, e.g., via additional debiasing loss objectives (Qian et al., 2019; Bordia and Bowman, 2019; Lauscher et al., 2020a, inter alia) or via data-driven approaches such as Counterfactual Data Augmentation (Zhao et al., 2018). For simplicity, we opt for the data-driven CDA approach: it has been shown to offer reliable debiasing performance (Zhao et al., 2018; Webster et al., 2020) and, unlike other approaches, it does not require any modification of the model architecture or the training procedure.

2.1 Debiasing Adapters

In this work, we employ the simple adapter architecture proposed by Pfeiffer et al. (2021), in which only one adapter module is added to each layer of the pretrained Transformer, after the feed-forward sub-layer. The more widely used architecture of Houlsby et al. (2019) inserts two adapter modules per Transformer layer, with the other adapter injected after the multi-head attention sub-layer. We opt for the "Pfeiffer architecture" because, in comparison with the "Houlsby architecture", it is more parameter-efficient and has been shown to yield slightly better performance on a wide range of downstream NLP tasks (Pfeiffer et al., 2020a, 2021). The output of the adapter, a two-layer feed-forward network, is computed as follows:

Adapter(h, r) = U · g(D · h) + r,    (1)

with h and r as the hidden state and residual of the respective Transformer layer. D ∈ R^{m×h} and U ∈ R^{h×m} are the linear down- and up-projections, respectively (h being the Transformer's hidden size, and m the adapter's bottleneck dimension), and g(·) is a non-linear activation function. The residual r is the output of the Transformer's feed-forward layer, whereas h is the output of the subsequent layer normalization. The down-projection D compresses token representations to the adapter size m < h, and the up-projection U projects the activated down-projections back to the Transformer's hidden size h. The ratio h/m captures the factor by which adapter-based fine-tuning is more parameter-efficient than full fine-tuning of the Transformer.
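To make the computation in Eq. (1) concrete, the following is a minimal PyTorch sketch of such a bottleneck adapter. The class and argument names are our own, the choice of ReLU for g(·) is an assumption, and the sketch deliberately omits how the module is wired into each Transformer layer.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Adapter(h, r) = U · g(D · h) + r (Eq. 1), with bottleneck size m < h."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # D: projects h -> m
        self.up = nn.Linear(bottleneck_size, hidden_size)    # U: projects m -> h
        self.activation = nn.ReLU()                          # g(.), assumed non-linearity

    def forward(self, hidden_states, residual):
        # hidden_states: output of the layer normalization after the feed-forward sub-layer
        # residual: output of the Transformer's feed-forward layer
        return self.up(self.activation(self.down(hidden_states))) + residual

# With h = 768 and m = 48 (reduction factor h/m = 16), each adapter adds
# 2 * 768 * 48 weight parameters; over 12 layers this amounts to 884,736 parameters.
adapter = BottleneckAdapter()
states = torch.randn(1, 128, 768)   # toy hidden states (batch, sequence length, hidden size)
output = adapter(states, states)    # toy call; in the real model r comes from the sub-layer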
In our case, we train the adapters for debiasing: we inject adapter layers into BERT (Devlin et al., 2019), freeze the original BERT parameters, and run a standard debiasing training procedure – language modeling on counterfactual data (§2.2) – during which we only tune the parameters of the debiasing adapters. At the end of the debiasing training, the debiasing functionality is isolated in the adapter parameters. This not only preserves the distributional knowledge in the Transformer's original parameters, but also allows for more flexibility and "on-demand" usage of the debiasing functionality in downstream applications. For example, one could train a separate set of debiasing adapters for each bias dimension of interest (e.g., gender, race, religion, sexual orientation) and selectively combine them in downstream tasks, depending on the constraints and requirements of the concrete sociotechnical environment.

2.2 Counterfactual Augmentation Training

In the context of representation debiasing, counterfactual data augmentation (CDA) refers to the automatic creation of text instances that in some way counter the stereotypical bias present in the representation space. CDA has been successfully used for attenuating a variety of bias types, e.g., gender and race, and in several variants, e.g., with general terms describing dominant and minoritized groups, or with personal names acting as proxies for such groups (Zhao et al., 2018; Lu et al., 2020). Most commonly, CDA modifies the training data by replacing terms describing one of the target groups (dominant or minoritized) with terms describing the other group. Let S be our training corpus, consisting of sentences s, and let T = {(t1, t2)_i}_{i=1}^N be a set of N term pairings between the dominant and the minoritized group (i.e., t1 is a term representing the dominant group, e.g., man, and t2 is a corresponding term representing the minoritized group, e.g., woman). For each sentence s_i and each pair (t1, t2), we check whether either t1 or t2 occurs in s: if t1 is present, we replace its occurrence with t2, and vice versa. We denote the counterfactual sentence of s obtained this way with s′ and the whole counterfactual corpus with S′. We adopt the so-called two-sided CDA of Webster et al. (2020): the final corpus for debiasing training consists of both the original and the counterfactually created sentences. Finally, we train the debiasing adapter via masked language modeling on the counterfactually augmented corpus S ∪ S′. We train sequentially, first exposing the adapter to the original corpus S and then to the augmented portion S′.
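As an illustration of this augmentation step, here is a small Python sketch of two-sided CDA with term-pair swapping. The whole-word, lower-cased matching and the tiny term list are simplifying assumptions (the full pair set T and the corpus are described in §3.2).

import re

# A few example gender term pairs (t1, t2); the full set T contains 193 pairs (see §3.2).
TERM_PAIRS = [("man", "woman"), ("he", "she"), ("his", "her"), ("son", "daughter")]
SWAP = {}
for t1, t2 in TERM_PAIRS:
    SWAP[t1], SWAP[t2] = t2, t1

def counterfactual(sentence):
    """Swap every occurrence of t1 with t2 and vice versa (whole words, lower-cased)."""
    tokens = re.findall(r"\w+|\W+", sentence.lower())
    return "".join(SWAP.get(token, token) for token in tokens)

def two_sided_cda(corpus):
    """Two-sided CDA: keep the original corpus S and append the counterfactual corpus S'.
    In this sketch, sentences without target terms are simply duplicated."""
    return list(corpus) + [counterfactual(s) for s in corpus]

corpus_S = ["He works as a doctor.", "The man bought his son a car."]
print(two_sided_cda(corpus_S))
# ['He works as a doctor.', 'The man bought his son a car.',
#  'she works as a doctor.', 'the woman bought her daughter a car.']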
3 Experiments

We showcase ADELE for arguably the most explored societal bias – gender bias – and the most widely used PLM, BERT. We profile its debiasing effects with a comprehensive set of intrinsic and downstream (i.e., extrinsic) evaluations.

3.1 Evaluation Data Sets and Measures

We test ADELE on three intrinsic (BEC-Pro, DisCo, WEAT) and two downstream debiasing benchmarks (Bias-STS-B and Bias-NLI). We now describe each of the benchmarks in more detail.

Bias Evaluation Corpus with Professions (BEC-Pro). We intrinsically evaluate ADELE on the BEC-Pro data set (Bartl et al., 2020), designed to capture gender bias w.r.t. professions. The data set consists of 2,700 sentence pairs in the format ("m [temp] p"; "f [temp] p"), where m is a male term (e.g., boy, groom), f is a female term (e.g., girl, bride), p is a profession term (e.g., mechanic, doctor), and [temp] is one of the predefined connecting templates, e.g., "is a" or "works as a".

We measure the bias on BEC-Pro using the bias measure of Kurita et al. (2019). They compute the association a_{t,p} between a gender term t (male or female) and a profession p as:

a_{t,p} = log( P(t)_t / P(t)_{t,p} ),    (2)

where P(t)_t is the probability of the PLM generating the target term t when only t itself is masked, and P(t)_{t,p} is the probability of t being generated when both t and the profession p are masked. The bias score b is then simply the difference in the association score between the male term m and its corresponding female term f: b = a_{m,p} − a_{f,p}. We measure the overall bias on the whole dataset in two complementary ways: (a) by averaging the bias scores b across all 2,700 instances (∅ bias) and (b) by measuring the percentage of instances for which b is below some threshold value; we report this score for two different thresholds (0.1 and 0.7).

Bartl et al. (2020) additionally published a German version of the BEC-Pro data set, which we use to evaluate ADELE's zero-shot transfer abilities.

Discovery of Correlations (DisCo). The second data set for intrinsic debiasing evaluation, DisCo (Webster et al., 2020), also relies on templates (e.g., "[PERSON] studied [BLANK] at college"). For each template, the [PERSON] slot is filled first with a male and then with a female term (e.g., for the pair (John, Veronica), we get John studied [BLANK] at college and Veronica studied [BLANK] at college). Next, for each of the two instances, the model is asked to fill the [BLANK] slot: the goal is to determine the difference in the probability distribution for the masked token, depending on which term is inserted into the [PERSON] slot. While Webster et al. (2020) retrieve the top three most likely terms for the masked position, we retrieve all terms t with probability p(t) > 0.1.²

Let C_m^(i) and C_f^(i) be the candidate sets obtained for the i-th instance when filled with a male [PERSON] term m and the corresponding female term f, respectively. We then compute two different measures. The first is the average fraction of shared candidates between the two sets (∅frac):

∅frac = (1/N) Σ_i |C_m^(i) ∩ C_f^(i)| / min(|C_m^(i)|, |C_f^(i)|),    (3)

with N as the total number of test instances. Intuitively, a higher average fraction of shared candidates indicates lower bias.

For the second measure, we retrieve the probabilities p(t) for all candidates t in the union of the two sets, C^(i) = C_m^(i) ∪ C_f^(i). We then compute the normalized average absolute probability difference (∅diff):

∅diff = (1/N) Σ_i [ Σ_{t∈C^(i)} |p_m(t) − p_f(t)| / ( (Σ_{t∈C_m^(i)} p_m(t) + Σ_{t∈C_f^(i)} p_f(t)) / 2 ) ].    (4)

We create test instances by collecting the 100 most frequent baby names for each gender from the US Social Security name statistics for 2019.³ We create pairs (m, f) from names at the same frequency rank in the two lists (e.g., Liam and Olivia). Finally, we remove pairs with ambiguous names that may also be used as general concepts (e.g., violet, a color), resulting in 92 final pairs.

² We argue that retrieving more terms from the distribution allows for a more accurate estimate of the bias.
³ https://www.ssa.gov/oact/babynames/limits.html

Word Embedding Association Test (WEAT). As the final intrinsic measure, we use the well-known WEAT test (Caliskan et al., 2017). Developed for detecting biases in static word embedding spaces, it computes the differential association between two target term sets A (e.g., male terms) and B (e.g., female terms) based on the mean (cosine) similarity of their embeddings with the embeddings of terms from two attribute sets X (e.g., science terms) and Y (e.g., art terms):

w(A, B, X, Y) = Σ_{a∈A} s(a, X, Y) − Σ_{b∈B} s(b, X, Y).    (5)

The association s of a term t ∈ A or t ∈ B is computed as:

s(t, X, Y) = (1/|X|) Σ_{x∈X} cos(t, x) − (1/|Y|) Σ_{y∈Y} cos(t, y).    (6)

The significance of the statistic is computed with a permutation test in which w(A, B, X, Y) is compared with the scores w(A∗, B∗, X, Y), where A∗ and B∗ are equally sized partitions of A ∪ B. We report the effect size, a normalized measure of separation between the association distributions:

( µ({s(a, X, Y)}_{a∈A}) − µ({s(b, X, Y)}_{b∈B}) ) / σ({s(t, X, Y)}_{t∈A∪B}),    (7)

where µ is the mean and σ is the standard deviation.

Since WEAT requires word embeddings as input, we first have to extract word-level vectors from a PLM like BERT. To this end, we follow Vulić et al. (2020) and obtain a vector x_i ∈ R^d for each word w_i (e.g., man) from the bias specification as follows: we prepend the word with BERT's sequence start token and append the separator token (e.g., [CLS] man [SEP]). We then feed the input sequence through the Transformer and compute x_i as the average of the term's representations from layers m : n. We experimented with inducing word-level embeddings by averaging representations over all consecutive ranges of layers [m : n], m ≤ n. We measure the gender bias using the WEAT 7 test (see the full specification in the Appendix), which compares male terms (e.g., man, boy) against female terms (e.g., woman, girl) w.r.t. their associations with science terms (e.g., math, algebra, numbers) and art terms (e.g., poetry, dance, novel).
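A compact sketch of this evaluation – extracting layer-averaged word vectors from BERT and computing the WEAT associations of Eqs. (5)–(7) – is given below. It assumes the Huggingface transformers API; the term lists, the layer range (0:3), and the use of the sample standard deviation are illustrative choices, and the permutation test is omitted.

import numpy as np
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def word_vector(word, layers=(0, 3)):
    """Encode "[CLS] word [SEP]" and average the word-piece vectors over layers m..n."""
    encoded = tokenizer(word, return_tensors="pt")
    hidden = bert(**encoded, output_hidden_states=True).hidden_states   # 13 tensors for BERT Base
    stacked = torch.stack(hidden[layers[0]: layers[1] + 1])             # (n-m+1, 1, seq, 768)
    return stacked[:, 0, 1:-1, :].mean(dim=(0, 1)).numpy()              # drop [CLS]/[SEP], average

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(t, X, Y):                                   # s(t, X, Y), Eq. (6)
    return np.mean([cos(t, x) for x in X]) - np.mean([cos(t, y) for y in Y])

def effect_size(A, B, X, Y):                                # normalized separation, Eq. (7)
    s_A = [association(a, X, Y) for a in A]
    s_B = [association(b, X, Y) for b in B]
    return (np.mean(s_A) - np.mean(s_B)) / np.std(s_A + s_B, ddof=1)

A = [word_vector(w) for w in ["man", "boy", "he"]]              # target set A: male terms
B = [word_vector(w) for w in ["woman", "girl", "she"]]          # target set B: female terms
X = [word_vector(w) for w in ["math", "algebra", "numbers"]]    # attribute set X: science terms
Y = [word_vector(w) for w in ["poetry", "dance", "novel"]]      # attribute set Y: art terms
print(effect_size(A, B, X, Y))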
Lauscher and Glavaš (2019) created XWEAT by translating some of the original WEAT bias specifications into six target languages: German (DE), Spanish (ES), Italian (IT), Croatian (HR), Russian (RU), and Turkish (TR). We use their translations of the WEAT 7 gender test in the zero-shot debiasing transfer evaluation of ADELE.

Bias-STS-B. The first extrinsic measure we use is Bias-STS-B, introduced by Webster et al. (2020) and based on the well-known Semantic Textual Similarity Benchmark (STS-B; Cer et al., 2017), a regression task in which models need to predict the semantic similarity of pairs of sentences. Webster et al. (2020) adapt STS-B for discovering gender-biased correlations. They start from neutral STS templates and fill them with a gendered term (man, woman) and a profession term from Rudinger et al. (2018) (e.g., A man is walking vs. A nurse is walking, and A woman is walking vs. A nurse is walking). The dataset consists of 16,980 such pairs. As a measure of bias, we compute the average absolute difference between the similarity scores of male and female sentence pairs, with a lower value corresponding to less bias. We couple the bias score with the actual STS task performance score (Pearson correlation with human similarity scores), measured on the STS-B development set.

Bias-NLI. We select the task of understanding biased natural language inferences (NLI) as the second extrinsic evaluation. To this end, we fine-tune the original BERT as well as our adapter-debiased BERT on the MNLI data set (Williams et al., 2018). For evaluation, we follow Dev et al. (2020) and create a synthetic NLI data set that tests for the gender-occupation bias: it comprises NLI instances for which an unbiased model should not be able to infer anything, i.e., it should predict the NEUTRAL class. We use the code of Dev et al. (2020) and, starting from the generic template "The [subject] [verb] a/an [object]", fill the slots with the term sets provided with the code. First, we fill the verb and object slots with common activities, e.g., "bought a car". We then create neutral entailment pairs by filling the subject slot with an occupation term, e.g., "physician", for the hypothesis and a gendered term, e.g., "woman", for the premise, resulting in the final instance: (woman bought a car, physician bought a car, NEUTRAL). Using the code and terms released by Dev et al. (2020), we produce a total of N = 1,936,512 Bias-NLI instances. Following the original work, we compute two bias scores: (1) the fraction neutral (FN) score is the percentage of instances for which the model predicts the NEUTRAL class; (2) the net neutral (NN) score is the average probability that the model assigns to the NEUTRAL class across all instances. In both cases, a higher score corresponds to lower bias. We couple FN and NN on Bias-NLI with the actual NLI accuracy on the MNLI matched development set (Williams et al., 2018).

3.2 Experimental Setup

Data. Aligned with BERT's pretraining, we carry out the debiasing MLM training on the concatenation of the English Wikipedia and the BookCorpus (Zhu et al., 2015). Since we are only training the parameters of the debiasing adapters, we uniformly subsample the corpus to one third of its original size. We adopt the set of gender term pairs T for CDA from Zhao et al. (2018) (e.g., actor-actress, bride-groom)⁴ and augment it with three additional pairs: his-her, himself-herself, and male-female, resulting in a total of 193 term pairs. Our final debiasing CDA corpus consists of 105,306,803 sentences.

⁴ https://github.com/uclanlp/corefBias/tree/master/WinoBias/wino

Models and Baselines. In all experiments, we inject ADELE adapters of bottleneck size m = 48 into the pretrained BERT Base Transformer (12 layers, 12 attention heads, 768 hidden size).⁵ We compare ADELE with the debiased BERT Large models released by Webster et al. (2020): (1) ZariCDA is counterfactually pretrained (from scratch), whereas (2) ZariDO was post-hoc MLM-fine-tuned on regular corpora, but with more aggressive dropout rates. In the cross-lingual zero-shot transfer experiments, we train ADELE on top of multilingual BERT (Devlin et al., 2019) in its base configuration (uncased, 12 layers, 768 hidden size).

⁵ We implement ADELE using the Huggingface transformers library (Wolf et al., 2020) in combination with the AdapterHub framework (Pfeiffer et al., 2020a).

For the debiasing training itself, we follow the standard MLM procedure for BERT training and mask 15% of the tokens. We then train ADELE's debiasing adapters on our CDA data set for 2 epochs, with a batch size of 16. We optimize the adapter parameters using the Adam algorithm (Kingma and Ba, 2015), with a constant learning rate of 3 · 10⁻⁵.
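For concreteness, a minimal sketch of this adapter-only MLM training is shown below. It assumes the adapter-transformers (AdapterHub) package; the add_adapter / train_adapter / set_active_adapters calls are reproduced from memory and may differ across library versions, and the two toy sentences stand in for the CDA corpus S ∪ S′. This is an illustration under those assumptions, not our released training code.

# Assumes the AdapterHub fork of transformers: pip install adapter-transformers
import torch
from torch.utils.data import DataLoader
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# One bottleneck adapter per layer after the feed-forward block ("pfeiffer" config);
# its default reduction factor of 16 yields a bottleneck of 768 / 16 = 48 = m.
model.add_adapter("gender_debias", config="pfeiffer")
model.train_adapter("gender_debias")        # freeze all original BERT parameters
model.set_active_adapters("gender_debias")

sentences = ["He works as a doctor.", "She works as a doctor."]   # stand-in for S ∪ S'
examples = [tokenizer(s, truncation=True, max_length=128) for s in sentences]
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=16, shuffle=True, collate_fn=collator)

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=3e-5)

model.train()
for epoch in range(2):                      # 2 epochs, batch size 16, as in our setup
    for batch in loader:
        loss = model(**batch).loss          # standard MLM loss over the masked tokens
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()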
Downstream Fine-tuning. Our two extrinsic evaluations require task-specific fine-tuning on the STS-B and MNLI training datasets, respectively. We couple BERT (with and without ADELE adapters) with the standard single-layer feed-forward softmax classifier and fine-tune all parameters in task-specific training.⁶ We optimize the hyperparameters on the respective STS-B and MNLI (matched) development sets. To this end, we search for the optimal number of training epochs in {2, 3, 4} and fix the learning rate to 2 · 10⁻⁵, the maximum sequence length to 128, and the batch size to 32. As in debiasing training, we use Adam (Kingma and Ba, 2015) for optimization.

⁶ The only exception is the fairness forgetting experiment in §4, in which we freeze both the Transformer and the debiasing adapters and train the dedicated task adapter on top.

4 Results and Discussion

Monolingual Evaluation. Our main monolingual English debiasing results on three intrinsic and two extrinsic benchmarks are summarized in Table 1. The results show that (1) ADELE successfully attenuates BERT's gender bias across the board, and (2) it is, in many cases, more effective in attenuating gender biases than the computationally much more intensive Zari models (Webster et al., 2020). In fact, on BEC-Pro and DisCo, ADELE substantially outperforms both Zari variants.

The results from the two extrinsic evaluations – STS and NLI – demonstrate that ADELE successfully attenuates the bias while retaining high task performance. The Zari variants yield slightly better task performance on both STS-B and MNLI: this is expected, as they are instances of the BERT Large Transformer with 336M parameters; in comparison, ADELE has only the 110M parameters of BERT Base and approx. 885K adapter parameters.⁷

⁷ ADELE adds 884,736 parameters to BERT Base: 12 (layers) × 2 (down-projection and up-projection matrices) × 768 (hidden size h of BERT Base) × 48 (bottleneck size m).

According to the WEAT evaluation on static embeddings extracted from BERT (§3.1), the original BERT Transformer is only slightly and insignificantly biased. Consequently, ADELE inverts the bias in the opposite direction. In Figure 1, we further analyze the WEAT bias effects w.r.t. the subset of BERT layers from which we aggregate the word embeddings. For the original BERT (Figure 1a), we obtain gender-unbiased embeddings if we aggregate representations from higher layers (e.g., [5:12], [6:9], or by taking final-layer vectors, [12:12]). For ADELE, we get the most gender-neutral embeddings by aggregating representations from lower layers (e.g., [0:3] or [1:3]); representations from higher layers (e.g., [6:12]) flip the bias into the opposite direction (blue color). Both Zari models produce embeddings which are relatively unbiased, but ZariCDA still exhibits slight gender bias in higher-layer representations. The dropout-based debiasing of ZariDO results in an interesting per-layer-region oscillating gender bias.

Zero-Shot Cross-Lingual Transfer. We show the results of zero-shot transfer of gender debiasing with ADELE (on top of mBERT) on German BEC-Pro in Table 2. On the EN BEC-Pro portion, ADELE is as effective on top of mBERT as it is on top of the EN BERT (see Table 1): it reduces mBERT's bias from 0.81 to 0.30. More importantly, the positive debiasing effect successfully transfers to German: the bias effect on the DE portion is reduced from 1.10 to 0.67, despite not using any German data in the training of the debiasing adapters. We also see an improvement with respect to the fraction of unbiased instances for both thresholds, expectedly with larger improvements for the more lenient threshold of 0.7.

In Table 3, we show the bias effects of static word embeddings, aggregated from layers of mBERT and ADELE-debiased mBERT, on the XWEAT gender-bias test 7 for six different target languages. We show the results for two aggregation strategies, including ([0:12]) and excluding ([1:12]) mBERT's (sub)word embedding layer.

Like BEC-Pro, WEAT confirms that ADELE also attenuates the bias in EN representations coming from mBERT. The results across the six target languages are somewhat mixed, but overall encouraging: for all significantly biased combinations of languages and layer aggregations from the original mBERT ([0:12] – IT, RU; [1:12] – HR, RU), ADELE successfully reduces the bias. E.g., for IT embeddings extracted from all layers ([0:12]), the bias effect size drops from a significant 1.02 to an insignificant −0.25. In the case of already insignificant biases in the original mBERT, ADELE often further reduces the bias effect size (DE, TR), and if not, the bias effects remain insignificant.

We additionally visualize all XWEAT bias effect sizes in the produced embeddings via heatmaps in Figure 2. The intuition we can get from the plots supports our conclusion: for all languages, especially for the source language EN and the tar-
Model      WEAT T7     BEC-Pro                       DisCo (names)        STS                 NLI
           e[0:12]↓    ∅ bias↓  t(0.1)↑  t(0.7)↑     ∅ frac↑   ∅ diff↓    ∅ diff↓   Pear↑     FN↑      NN↑      Acc↑
BERT        0.79*      1.33     0.05     0.37        0.8112    0.5146     0.313     88.78     0.0102   0.0816   84.77
ZariCDA     0.43*      1.11     0.07     0.45        0.7527    0.6988     0.087     89.37     0.1202   0.1628   85.52
ZariDO      0.23*      1.20     0.07     0.38        0.6422    0.9352     0.118     88.22     0.1058   0.1147   86.06
ADELE      -0.98       0.39     0.17     0.85        0.8862    0.3118     0.121     88.93     0.1273   0.1726   84.13

Table 1: Results of our monolingual gender bias evaluation. We report the WEAT effect size (e), the BEC-Pro average bias (∅ bias) and fraction of unbiased instances at thresholds 0.1 and 0.7, the DisCo average fraction (∅ frac) and average difference (∅ diff), the STS average similarity difference (∅ diff) and Pearson correlation (Pear), and the Bias-NLI fraction neutral (FN) and net neutral (NN) scores as well as MNLI-m accuracy (Acc), for the original BERT, ZariCDA and ZariDO (Webster et al., 2020), and ADELE. ↑: higher is better (lower bias); ↓: lower is better.

Figure 1: WEAT bias effect heatmaps for (a) original BERTBase , and the debiased BERTs, (b) BERTA DELE , (c)
ZariCDA (Webster et al., 2020), and (d) ZariCDA , for word embeddings averaged over different subsets of layers
[m : n]. E.g., [0 : 0] points to word embeddings directly obtained from BERT’s (sub)word embeddings (layer 0);
[1 : 7] indicates word vectors obtained by averaging word representations after Transformer layers 1 through 7.
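For readers who want to reproduce this kind of layer-wise analysis, the following minimal sketch (not the exact evaluation code behind the figures) shows how word vectors averaged over a layer subset [m : n] could be extracted with the Hugging Face transformers library; the model name and the mean-pooling over subword pieces are illustrative assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

# Minimal sketch: word vectors averaged over Transformer layers m..n,
# where layer 0 is the (sub)word embedding layer. Model choice and
# mean-pooling over subword pieces are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def word_vector(word, m, n):
    enc = tokenizer(word, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # tuple of 13 tensors: layers 0..12
    layers = torch.stack(hidden_states[m:n + 1]).mean(dim=0)  # average over layers m..n
    return layers.mean(dim=1).squeeze(0)                      # mean-pool over subword pieces

static_vec = word_vector("doctor", 0, 0)  # [0:0] -- embedding layer only
mid_vec = word_vector("doctor", 1, 7)     # [1:7] -- layers 1 through 7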

                EN                            DE
Model           ∅ bias   t(0.1)   t(0.7)     ∅ bias   t(0.1)   t(0.7)
mBERT           0.81     0.08     0.55       1.10     0.08     0.39
mBERTA          0.30     0.23     0.93       0.67     0.11     0.62

Table 2: Results for mBERT and mBERT debiased on EN data with ADELE (mBERTA) on BEC-Pro English and German. We report the average bias (∅ bias) and the fraction of biased instances for thresholds t(0.1) and t(0.7).

Layers   Model     EN       DE       ES       IT       HR       RU       TR
0:12     mBERT     1.42     0.59*    -0.47*   1.02     -0.57*   1.49     -0.55*
         mBERTA    0.20*    -0.04*   -0.49*   -0.25*   0.72*    1.24     -0.33*
1:12     mBERT     1.36     0.62*    -0.55*   -0.55*   1.08     0.62     -0.61*
         mBERTA    -0.08    -0.05*   -0.63*   -0.63*   0.79*    -0.05    -0.34*

Table 3: XWEAT effect sizes for original mBERT and zero-shot cross-lingual debiasing transfer of ADELE (mBERTA) from EN to six target languages. Results for two variants of embedding aggregation over Transformer layers: [1:12] – all Transformer layers; [0:12] – all layers plus mBERT's (sub)word embeddings ("layer 0"). Asterisks: insignificant bias effects at α < 0.05.

get language DE, the bias gets reduced, which is indicated by the lighter colors throughout all plots.

Fairness Forgetting. Finally, we investigate whether the debiasing effects persist even after large-scale fine-tuning in downstream tasks. Webster et al. (2020) report the presence of debiasing effects after STS-B training. With merely 5,749 training instances, however, STS-B is two orders of magnitude smaller than MNLI (392,702 training instances). Here we conduct a study on MNLI, testing for the presence of gender bias in Bias-NLI after ADELE's exposure to varying amounts of MNLI training data. We fully fine-tune BERTBase and BERTADELE (i.e., BERT augmented with debiasing adapters) on MNLI datasets of varying sizes (10K, 25K, 75K, 100K, 150K, and 200K) and measure, for each model, the Bias-NLI Net Neutral (NN) score as well as the NLI accuracy on the MNLI (matched) development set. For each model and each training set size, we carry out five training runs and report the average scores.
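A minimal sketch of this protocol, assuming the Hugging Face datasets version of MNLI; the dataset identifier, the seeding scheme, and the commented-out helper calls are illustrative placeholders rather than the code used for our experiments.

from datasets import load_dataset

# Sketch of the fairness-forgetting protocol: fine-tune on increasingly large
# MNLI subsets, five runs each, then evaluate Net Neutral on Bias-NLI.
SIZES = [10_000, 25_000, 75_000, 100_000, 150_000, 200_000]
mnli_train = load_dataset("multi_nli", split="train")

for size in SIZES:
    for seed in range(5):
        subset = mnli_train.shuffle(seed=seed).select(range(size))
        # fine_tune(model, subset)            # full fine-tuning of BERT / BERT+ADELE (not shown)
        # nn = net_neutral(model, bias_nli)   # hypothetical evaluation helpers (not shown)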
Figure 2: XWEAT effect size heat maps for (a–g) the original mBERT and (h–n) the debiased mBERTADELE in seven languages (source language EN and transfer languages DE, ES, IT, HR, RU, TR), for word embeddings averaged over different subsets of layers [m : n]. E.g., [0 : 0] points to word embeddings directly obtained from mBERT's (sub)word embeddings (layer 0); [1 : 7] indicates word vectors obtained by averaging word representations after Transformer layers 1 through 7. Lighter colors indicate less bias.
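For reference, the (X)WEAT effect sizes reported in Table 3 and in Figures 1 and 2 follow Caliskan et al. (2017); a minimal sketch of the computation from precomputed word vectors is given below. The function name, the NumPy-based implementation, and the choice of standard-deviation estimator are our own illustrative assumptions.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size (Caliskan et al., 2017): X, Y are target word vectors
    (e.g., male vs. female terms), A, B are attribute word vectors
    (e.g., career vs. family terms)."""
    def assoc(w):
        # association of one word: mean cosine to A minus mean cosine to B
        return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])
    s_x = [assoc(x) for x in X]
    s_y = [assoc(y) for y in Y]
    # difference of mean associations, normalized by the std. dev. over X ∪ Y
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)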

Figure 3: Bias and performance over time for different sizes of downstream (MNLI) training sets (#instances). We report the mean and the 95% confidence interval over five runs for Net Neutral (NN) on Bias-NLI and Accuracy (Acc) on the MNLI matched development set.

Figure 3 summarizes the results of our fairness forgetting experiment. We report the mean and the 95% confidence interval over the five runs for NN on Bias-NLI and Accuracy (Acc) on the MNLI-m development set. Several interesting observations emerge. First, the NN scores seem to be quite unstable across different runs (wide confidence intervals) for both BERT and ADELE, which is surprising given the size of the Bias-NLI test set (1,936,512 instances). This could point to a lack of robustness of the NN measure (Dev et al., 2020) as a means for capturing biases in fine-tuned Transformers. Second, after training on smaller datasets (10K), ADELE still retains much of its debiasing effect and is much fairer than BERT. With larger NLI training (already at 25K), however, much of its debiasing effect vanishes, although it still seems to be slightly (but consistently) fairer than BERT over time. We dub this effect fairness forgetting and will investigate it further in future work.
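For concreteness, the two Bias-NLI measures used here and in Table 4 below (Dev et al., 2020) can be computed from the classifier's softmax outputs roughly as in the following sketch; the array layout and the neutral-label index are illustrative assumptions.

import numpy as np

def bias_nli_scores(probs, neutral_idx=1):
    """Sketch of the Bias-NLI measures of Dev et al. (2020).
    probs: array of shape (num_pairs, 3) with softmax distributions over
    {entailment, neutral, contradiction} for the bias-probing pairs,
    all of which should ideally be judged neutral."""
    probs = np.asarray(probs)
    net_neutral = probs[:, neutral_idx].mean()                        # NN: mean neutral probability
    fraction_neutral = (probs.argmax(axis=1) == neutral_idx).mean()   # FN: share predicted neutral
    return net_neutral, fraction_neutral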
Preventing Fairness Forgetting. Finally, we propose a downstream fine-tuning strategy that can prevent fairness forgetting and which is aligned with the modular debiasing nature of ADELE: we (1) inject an additional task-specific adapter (TA) on top of ADELE's debiasing adapter and (2) update only the TA parameters in downstream (MNLI) training. This way, the debiasing knowledge stored in ADELE's debiasing adapters remains intact. Table 4 compares the Bias-NLI and MNLI performance of this fairness-preserving variant (ADELE-TA) against BERT and ADELE.

Model       FN↑      NN↑      Acc↑
BERT        0.010    0.082    84.77
ADELE       0.127    0.173    84.13
ADELE-TA    0.557    0.504    81.30

Table 4: Fairness preservation results for ADELE-TA. We report the bias measures Fraction Neutral (FN) and Net Neutral (NN) on the Bias-NLI data set together with NLI accuracy on the MNLI-m dev set.

Results strongly suggest that by freezing the debiasing adapters and injecting the additional task adapters, we indeed retain most of the debiasing effects of ADELE: according to the bias measures, ADELE-TA is massively fairer than the fully fine-tuned ADELE (e.g., an FN score of 0.557 vs. ADELE's 0.127). Preventing fairness forgetting comes at a tolerable task performance cost: ADELE-TA loses 3 points in NLI accuracy compared to fully fine-tuning BERT and ADELE for the task.
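A schematic PyTorch sketch of this fairness-preserving strategy is shown below. It is not the actual ADELE implementation (which builds on an adapter framework); the bottleneck size, the insertion point after a Transformer layer's output, and the placeholder loss are simplifying assumptions.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter in the style of Houlsby et al. (2019):
    down-projection, nonlinearity, up-projection, residual connection."""
    def __init__(self, hidden_size, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))

hidden_size = 768
debias_adapter = Adapter(hidden_size)   # assumed already trained with the debiasing objective
task_adapter = Adapter(hidden_size)     # (1) newly injected task-specific adapter (TA)

# (2) freeze the debiasing adapter (the PLM weights would stay frozen as well);
# only the TA receives gradient updates in downstream training
for p in debias_adapter.parameters():
    p.requires_grad = False

def adapted_layer_output(h):
    # TA stacked on top of the frozen debiasing adapter at a layer's output
    return task_adapter(debias_adapter(h))

optimizer = torch.optim.Adam(task_adapter.parameters(), lr=1e-4)
h = torch.randn(8, 128, hidden_size)           # dummy batch of layer outputs
loss = adapted_layer_output(h).pow(2).mean()   # placeholder loss for illustration
loss.backward()
optimizer.step()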
5   Related Work

We provide a brief overview of work in two areas which we bridge in this work: debiasing methods and parameter-efficient fine-tuning with adapters.

Adapter Layers in NLP. Adapters (Rebuffi et al., 2018) have been introduced to NLP by Houlsby et al. (2019), who demonstrated their effectiveness and efficiency for general language understanding (NLU).
Since then, they have been employed for various purposes: apart from NLU, task adapters have been explored for natural language generation (Lin et al., 2020) and machine translation quality estimation (Yang et al., 2020). Other works use language adapters encoding language-specific knowledge, e.g., for machine translation (Philip et al., 2020; Kim et al., 2019) or multilingual parsing (Üstün et al., 2020). Further, adapters have been shown useful in domain adaptation (Pham et al., 2020; Glavaš et al., 2021) and for injection of external knowledge (Wang et al., 2020; Lauscher et al., 2020b). Pfeiffer et al. (2020b) use adapters to learn both language and task representations. Building on top of this, Vidoni et al. (2020) prevent adapters from learning redundant information by introducing orthogonality constraints.

Debiasing Methods. A recent survey covering research on stereotypical biases in NLP is provided by Blodgett et al. (2020). In the following, we focus on approaches for mitigating biases in PLMs, which are largely inspired by debiasing methods for static word embeddings (e.g., Bolukbasi et al., 2016; Dev and Phillips, 2019; Lauscher et al., 2020a; Karve et al., 2019, inter alia). While several works propose projection-based debiasing for PLMs (e.g., Dev et al., 2020; Liang et al., 2020; Kaneko and Bollegala, 2021), most debiasing approaches require training. Here, some methods rely on debiasing objectives (e.g., Qian et al., 2019; Bordia and Bowman, 2019). In contrast, the debiasing approach we employ in this work, CDA (Zhao et al., 2018), relies on adapting the input data and is more generally applicable. Variants of CDA exist: e.g., Hall Maudslay et al. (2019) use names as bias proxies and substitute instances instead of augmenting the data, whereas Zhao et al. (2019) use CDA at test time to neutralize the models' biased predictions. Webster et al. (2020) investigate one-sided vs. two-sided CDA for debiasing BERT in pretraining and show dropout to be effective for bias mitigation.
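To make the CDA idea concrete, the sketch below shows the basic augmentation step of swapping gendered terms; the tiny word-pair dictionary and the whitespace tokenization are purely illustrative, and real implementations additionally handle morphology, casing, names, and the one-sided vs. two-sided variants discussed above.

# Illustrative counterfactual data augmentation (CDA): for every training
# sentence, add a copy in which gendered terms are swapped.
GENDER_PAIRS = {
    "he": "she", "she": "he", "his": "her", "her": "his",
    "man": "woman", "woman": "man", "son": "daughter", "daughter": "son",
}

def counterfactual(sentence):
    return " ".join(GENDER_PAIRS.get(tok.lower(), tok) for tok in sentence.split())

def augment(corpus):
    # two-sided CDA: keep each original sentence and add its counterfactual
    return [s for sent in corpus for s in (sent, counterfactual(sent))]

print(augment(["he is a doctor", "her son likes science"]))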
6   Conclusion

We presented ADELE, a novel sustainable and modular approach to debiasing PLMs based on adapter modules. In contrast to existing computationally demanding debiasing approaches, which debias the entire PLM via full fine-tuning, ADELE performs parameter-efficient debiasing by training dedicated debiasing adapters. We extensively evaluated ADELE on gender debiasing of BERT, demonstrating its effectiveness on three intrinsic and two extrinsic debiasing benchmarks. Further, applying ADELE on top of mBERT, we successfully transferred its debiasing effects to six target languages. Finally, we showed that by combining ADELE's debiasing adapters with task adapters, we can preserve representational fairness even after large-scale downstream training. We hope that ADELE catalyzes more research efforts towards making fair NLP fairer, i.e., more sustainable and more inclusive (i.e., more multilingual).

Acknowledgments

The work of Anne Lauscher and Goran Glavaš has been supported by the Multi2ConvAI Grant (Mehrsprachige und Domänen-übergreifende Conversational AI) of the Baden-Württemberg Ministry of Economy, Labor, and Housing (KI-Innovation). Additionally, Anne Lauscher has partially received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 949944, INTEGRATOR).

Further Ethical Considerations

In this work, we employed a binary conceptualization of gender due to the plethora of existing bias evaluation tests that are restricted to such a narrow notion of gender. Our work is of a methodological nature (i.e., we do not create additional data sets or text resources), and our primary goal was to demonstrate the bias attenuation effectiveness of our approach based on debiasing adapters: to this end, we relied on the available evaluation data sets from previous work. We fully acknowledge that gender is a spectrum: we fully support the inclusion of all gender identities (nonbinary, gender fluid, polygender, and other) in language technologies and strongly support work on creating resources and data sets for measuring and attenuating harmful stereotypical biases expressed towards all gender identities. Further, we acknowledge the importance of research on the intersectionality (Crenshaw, 1989) of stereotyping, which we did not consider here for similar reasons – lack of training and evaluation data. Our modular adapter-based debiasing approach, ADELE, however, is conceptually particularly suitable for addressing complex intersectional biases, and this is something we intend to explore in our future work.
References

Soumya Barikeri, Anne Lauscher, Ivan Vulić, and Goran Glavaš. 2021. RedditBias: A real-world resource for bias evaluation and debiasing of conversational language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1941–1955, Online. Association for Computational Linguistics.

Marion Bartl, Malvina Nissim, and Albert Gatt. 2020. Unmasking contextual stereotypes: Measuring and mitigating BERT's gender bias. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 1–16, Barcelona, Spain (Online). Association for Computational Linguistics.

Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of "bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, Online. Association for Computational Linguistics.

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 4356–4364, Red Hook, NY, USA. Curran Associates Inc.

Shikha Bordia and Samuel R. Bowman. 2019. Identifying and reducing gender bias in word-level language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 7–15, Minneapolis, Minnesota. Association for Computational Linguistics.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada. Association for Computational Linguistics.

Kimberlé Crenshaw. 1989. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989:139.

Sunipa Dev, Tao Li, Jeff M. Phillips, and Vivek Srikumar. 2020. On measuring and mitigating biased inferences of word embeddings. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 7659–7666. AAAI Press.

Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 879–887. PMLR.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644.

Goran Glavaš, Ananya Ganesh, and Swapna Somasundaran. 2021. Training and domain adaptation for supervised text segmentation. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 110–116, Online. Association for Computational Linguistics.

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It's all in the name: Mitigating gender bias with name-based counterfactual data substitution. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5267–5275, Hong Kong, China. Association for Computational Linguistics.

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799, Long Beach, CA, USA. PMLR.

Masahiro Kaneko and Danushka Bollegala. 2021. Debiasing pre-trained contextualised embeddings. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1256–1266, Online. Association for Computational Linguistics.

Saket Karve, Lyle Ungar, and João Sedoc. 2019. Conceptor debiasing of word representations evaluated on WEAT. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 40–48, Florence, Italy. Association for Computational Linguistics.

Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney. 2019. Pivot-based transfer learning for neural machine translation between non-English languages. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 866–876, Hong Kong, China. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA.

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526.

Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 166–172, Florence, Italy. Association for Computational Linguistics.

Anne Lauscher and Goran Glavaš. 2019. Are we consistently biased? Multidimensional analysis of biases in distributional word vectors. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 85–91, Minneapolis, Minnesota. Association for Computational Linguistics.

Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, and Ivan Vulić. 2020a. A general framework for implicit and explicit debiasing of distributional word vector spaces. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8131–8138.

Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, and Goran Glavaš. 2020b. Common sense or world knowledge? Investigating adapter-based knowledge injection into pretrained transformers. In Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 43–49, Online. Association for Computational Linguistics.

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.

Sheng Liang, Philipp Dufter, and Hinrich Schütze. 2020. Monolingual and multilingual reduction of gender bias in contextualized representations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5082–5093, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Zhaojiang Lin, Andrea Madotto, and Pascale Fung. 2020. Exploring versatile generative language model via parameter-efficient transfer learning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 441–459, Online. Association for Computational Linguistics.

Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. 2020. Gender bias in neural natural language processing. In Logic, Language, and Security, pages 189–202. Springer.

Michael McCloskey and Neal J Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Elsevier.

Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, and Thomas Wolf, editors. 2020. Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing. Association for Computational Linguistics, Online.

Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel R. Bowman. 2020. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.

Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. 2021.
AdapterFusion: Non-destructive task composition       Emma Strubell, Ananya Ganesh, and Andrew McCal-
  for transfer learning. In Proceedings of the 16th       lum. 2019. Energy and policy considerations for
  Conference of the European Chapter of the Associ-       deep learning in NLP. In Proceedings of the 57th
  ation for Computational Linguistics: Main Volume,       Annual Meeting of the Association for Computa-
  pages 487–503, Online. Association for Computa-         tional Linguistics, pages 3645–3650, Florence, Italy.
  tional Linguistics.                                     Association for Computational Linguistics.
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aish-     Ahmet Üstün, Arianna Bisazza, Gosse Bouma, and
  warya Kamath, Ivan Vulić, Sebastian Ruder,             Gertjan van Noord. 2020. UDapter: Language adap-
  Kyunghyun Cho, and Iryna Gurevych. 2020a.               tation for truly Universal Dependency parsing. In
  AdapterHub: A framework for adapting transform-         Proceedings of the 2020 Conference on Empirical
  ers. In Proceedings of the 2020 Conference on Em-       Methods in Natural Language Processing (EMNLP),
  pirical Methods in Natural Language Processing:         pages 2302–2315, Online. Association for Computa-
  System Demonstrations, pages 46–54, Online. Asso-       tional Linguistics.
  ciation for Computational Linguistics.
                                                        Marko Vidoni, Ivan Vulić, and Goran Glavaš.
Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, and Se-     2020. Orthogonal language and task adapters in
  bastian Ruder. 2020b. MAD-X: An Adapter-Based          zero-shot cross-lingual transfer. arXiv preprint
  Framework for Multi-Task Cross-Lingual Transfer.       arXiv:2012.06460.
  In Proceedings of the 2020 Conference on Empirical
  Methods in Natural Language Processing (EMNLP),       Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla
  pages 7654–7673, Online. Association for Computa-        Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden
  tional Linguistics.                                      Bar, Matt Malone, Thierry Poibeau, Roi Reichart,
                                                           and Anna Korhonen. 2020. Multi-SimLex: A large-
Minh Quang Pham, Josep Maria Crego, François Yvon,         scale evaluation of multilingual and crosslingual lex-
  and Jean Senellart. 2020. A study of residual            ical semantic similarity. Computational Linguistics,
  adapters for multi-domain neural machine transla-        46(4):847–897.
  tion. In Proceedings of the Fifth Conference on Ma-
  chine Translation, pages 617–628, Online. Associa-    Tobias Walter, Celina Kirschner, Steffen Eger, Goran
  tion for Computational Linguistics.                     Glavaš, Anne Lauscher, and Simone Paolo Ponzetto.
                                                          2021. Diachronic analysis of german parliamentary
Jerin Philip, Alexandre Berard, Matthias Gallé, and       proceedings: Ideological shifts through the lens of
   Laurent Besacier. 2020. Monolingual adapters for       political biases. arXiv preprint arXiv:2108.06295.
   zero-shot neural machine translation. In Proceed-
   ings of the 2020 Conference on Empirical Methods     Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei,
   in Natural Language Processing (EMNLP), pages          Xuanjing Huang, Cuihong Cao, Daxin Jiang, Ming
   4465–4470, Online. Association for Computational       Zhou, et al. 2020. K-adapter: Infusing knowl-
   Linguistics.                                           edge into pre-trained models with adapters. arXiv
                                                          preprint arXiv:2002.01808.
Yusu Qian, Urwa Muaz, Ben Zhang, and Jae Won
  Hyun. 2019. Reducing gender bias in word-level        Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beu-
  language models with a gender-equalizing loss func-     tel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi,
  tion. In Proceedings of the 57th Annual Meeting of      and Slav Petrov. 2020. Measuring and reducing
  the Association for Computational Linguistics: Stu-     gendered correlations in pre-trained models. arXiv
  dent Research Workshop, pages 223–228, Florence,        preprint arXiv:2010.06032.
  Italy. Association for Computational Linguistics.
                                                        Adina Williams, Nikita Nangia, and Samuel Bowman.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan,        2018. A broad-coverage challenge corpus for sen-
  Dario Amodei, and Ilya Sutskever. 2019. Language        tence understanding through inference. In Proceed-
  models are unsupervised multitask learners. OpenAI      ings of the 2018 Conference of the North American
  Blog, 1(8).                                             Chapter of the Association for Computational Lin-
                                                          guistics: Human Language Technologies, Volume
Sylvestre-Alvise Rebuffi, Andrea Vedaldi, and Hakan       1 (Long Papers), pages 1112–1122, New Orleans,
  Bilen. 2018. Efficient parametrization of multi-        Louisiana. Association for Computational Linguis-
  domain deep neural networks. In 2018 IEEE/CVF           tics.
  Conference on Computer Vision and Pattern Recog-
  nition, pages 8119–8127.                              Thomas Wolf, Lysandre Debut, Victor Sanh, Julien
                                                          Chaumond, Clement Delangue, Anthony Moi, Pier-
Rachel Rudinger, Jason Naradowsky, Brian Leonard,         ric Cistac, Tim Rault, Remi Louf, Morgan Funtow-
  and Benjamin Van Durme. 2018. Gender bias in            icz, Joe Davison, Sam Shleifer, Patrick von Platen,
  coreference resolution. In Proceedings of the 2018      Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu,
  Conference of the North American Chapter of the         Teven Le Scao, Sylvain Gugger, Mariama Drame,
  Association for Computational Linguistics: Human        Quentin Lhoest, and Alexander Rush. 2020. Trans-
  Language Technologies, Volume 2 (Short Papers),         formers: State-of-the-art natural language process-
  pages 8–14, New Orleans, Louisiana. Association         ing. In Proceedings of the 2020 Conference on Em-
  for Computational Linguistics.                          pirical Methods in Natural Language Processing: