Socially Aware Bias Measurements for Hindi Language Representations


Vijit Malik1   Sunipa Dev2   Akihiro Nishi2   Nanyun Peng2   Kai-Wei Chang2
1 Indian Institute of Technology Kanpur (IIT-K)
2 University of California, Los Angeles
vijitvm@iitk.ac.in, akihironishi@ucla.edu, {sunipa, violetpeng, kwchang}@cs.ucla.edu

Abstract

Language representations are an efficient tool used across NLP, but they are rife with encoded societal biases. These biases have been studied extensively, but with a primary focus on English language representations and on biases common in the context of Western society. In this work, we investigate the biases present in Hindi language representations, such as caste- and religion-associated biases. We demonstrate how biases are unique to specific language representations based on the history and culture of the region in which they are widely spoken, and also how the same societal bias (such as binary gender associated bias), when investigated across languages, is encoded by different words and text spans. With this work, we emphasize the necessity of social awareness, along with linguistic and grammatical artefacts, when modeling language representations in order to understand the biases encoded.

1 Introduction

Language models and representations (Pennington et al., 2014; Bojanowski et al., 2017; Devlin et al., 2018) are commonly used across the world to aid a variety of daily applications including machine translation (Kunchukuttan et al., 2017), information retrieval (Rao and Devi, 2018), chatbots (Bhagwat et al., 2019), sentiment classification (Kumar et al., 2019) and more. A large body of work is thus focused on building robust language representations for different languages, including those with few resources available.

Recent work has demonstrated how these representations capture and propagate societal biases such as those related to gender (Bolukbasi et al., 2016), race (Caliskan et al., 2017), and nationality (Dev and Phillips, 2019). It is important to note that languages reflect the culture and history of the regions in which they are popularly used, and as we switch between languages, the notion of bias, the types of societal biases, and what is considered discrimination or fair also change accordingly. This key difference, however, is not reflected in the effort made towards detecting, identifying and mitigating biases in language representations, with the majority of efforts focused predominantly on the English language and the context of Western society (Bolukbasi et al., 2016; Nangia et al., 2020). Some recent work tackles the challenges of societal biases in language representations coming from various cultures and languages such as Arabic, French, Spanish and German (Lauscher et al., 2020; Chávez Mulsa and Spanakis, 2020; Kurpicz-Briki, 2020; Zhou et al., 2019). Ghosh et al. (2021) and Sambasivan et al. (2020) explore biases and algorithmic fairness concerning non-Western contexts in machine learning, and Pujari et al. (2019) investigate gender bias in Hindi language representations. However, it is still unclear whether Hindi language representations capture the many other biases that are unique to Indian society and have the potential to cause great harm, such as caste- and religion-related biases.

In this work, we make three main contributions: (i) using a careful study of the social and cultural composition of Indian society, we highlight and devise measures to detect and quantify different societal biases present in Hindi language representations; (ii) we discuss the gendered nature of the Hindi language and the implications it has for bias detection techniques; and (iii) we demonstrate how translation of word lists or one-to-one mapping of bias measures across languages is insufficient to detect biases meaningfully. In particular, word lists are commonly used both for detection and mitigation of biases in word representations.
Even when detecting the same societal bias in a different language, direct translations of the words from English into the target language whose representations are being evaluated do not suffice to achieve the same goals, as the words may not exist, may not be commonly used, or may not be associated with the same sentiment or bias (Section 3). Sentiments (good, bad, evil, etc.) too can be encoded by distinct sets of objects and words (Section 3). Along with this social awareness, the differences in language structure, the changes in grammar, etc., all play important roles in understanding biased associations and distinguishing them from meaningful ones, which we demonstrate in this work.

2 Language and Culture Considerations

The perception and interpretation of biases are sensitive to the societal constructs, socio-cultural structures, and historical compositions of different regions of the world. Since languages are culture and region specific, there is a requirement to study these socio-cultural differences when trying to understand the biases and potential harms of language representations globally. Consider the example of Hindi, where along with gender bias, other biases such as those based on caste and religion are also rampant. Caste, especially, is unique to the Indian peninsula and is not usually considered when analysing biases in languages in Western studies. Similarly, region- and culture-specific biases are also present with respect to religion, occupation or geographical location (Thorat and Newman, 2007). Additionally, there are several key linguistic differences between the English and Hindi languages. For example, pronouns in English indicate the gender of the person: from 'He is good at sports.', one can easily infer that the subject is male. However, in Hindi, pronouns do not reveal gender: in 'veha khelne me achha hai', the pronoun 'veha' is gender neutral. In Hindi, one uses cues from the form of adjectives or verbs in the sentence to infer the gender of the subject, which too can in some cases be gender neutral. This, however, does not imply a lack of gender bias in Hindi, but rather makes it harder to detect and mitigate.

2.1 Gender in Hindi

The syntactical use of gender in Hindi is layered and significantly different from English in several ways (Hall, 2002). These differences affect the ways biases are likely to be encoded.

Gendered verbs: Verbs in Hindi can be gendered depending upon the tense of the sentence. In the case of the past tense and the perfect tenses (present, past and future), which are built upon the past tense, the gender of the verb is determined by the gender of the subject. For example: 'veha fal lene gaya/gayi' (He/She went to buy fruits).

Gendered adjectives: Adjectives in Hindi also provide gender cues about the subject. However, not all adjectives change form according to the gender of the subject. For example, the adjective 'gharelu' (domestic) in Hindi is used in the same form whether a man or a woman is domestic. On the other hand, an adjective like 'good' in English, when translated to Hindi, is either 'achha' for 'a good man' or 'acchhi' for 'a good woman'.

Gendered titles: Some titles for entities in Hindi can be gendered. For example, 'a school principal', when translated to Hindi, can be 'pradhanacharyaa' or 'pradhanacharya' depending upon whether the principal is female or male, respectively. Another example is 'adhyapak' (male teacher) and 'adhyapika' (female teacher).

Gender instantiations of inanimate nouns: The instantiation of grammatical gender for inanimate nouns when used in a sentence is an important aspect of the Hindi language. Note that these instantiations also depend upon the dialect spoken. As Hall (2002) pointed out, the word 'dahi' (yoghurt) is assigned feminine forms of verbs and adjectives in Western dialects, whereas it is assigned masculine forms in Eastern dialects of Hindi.

2.2 Caste, Religion, and Occupation Biases

In the past, discrimination based on attributes of caste, religion, and occupation has been prominent in India (Banerjee and Knight, 1985; Deshpande, 2010). While illegal, some of these biases are still common and, as a result, languages and text resources reflect the same. Caste is a form of social stratification unique to Hinduism and Indian society. It involves the concept of hereditary transmission of a style of life in terms of occupation, status and social interaction, and is thus commonly associated strongly with the last names used by persons [1].

[1] https://en.wikipedia.org/wiki/Caste
Originally, caste was decided by the occupation of the individual; however, it was later transformed into a hereditary form of status and became associated with strong biases pertaining to the purity, goodness, intelligence, etc. of individuals, leading to social exclusion. While discrimination on the basis of caste is illegal in the country, stereotypes and marginalization are still rampant and are thus commonly reflected in Hindi corpora. India is a secular country, but due to historical clashes there are biases against the relatively minority religious population practicing Islam as opposed to the majority religion of Hinduism. These biases are associated with perceptions (good, bad, evil, etc.) of the different populations. Although biases against other religions are also present, we especially focus upon Islam and Hinduism since these are the most prominent. As some last names are highly associated with their religion in India, we use last names as a proxy to study such a bias. Effectively, our study evaluates bias against people with last names historically associated with a certain religion. In addition to these biases, the gaps between rural and urban India in terms of education and poverty have led to a discrepancy of perception (positive versus negative) between urban and rural occupations. The Hindi language, which is popularly spoken in different parts of the country, thus may reflect similar biases in its language representations.

3 Measuring Biases in Hindi Language Representations

To quantify bias in English embeddings, Caliskan et al. (2017) propose the Word Embedding Association Test (WEAT), which measures the stereotypical association between two sets each of target concepts (X and Y) and attributes (A and B). The WEAT test statistic is a difference between sums over the respective target concepts,

s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)    (1)

where the quantity s(w, A, B) is calculated using cosine similarity as follows:

s(w, A, B) = \frac{\sum_{a \in A} \cos(w, a)}{|A|} - \frac{\sum_{b \in B} \cos(w, b)}{|B|}    (2)

The significance of the test statistic is measured by a non-parametric permutation test in which s(X, Y, A, B) is compared to s(X_i, Y_i, A, B), where (X_i, Y_i) denotes a random, equally sized split of the terms in X ∪ Y. A larger WEAT effect size indicates a larger bias.
An extension of WEAT to sentence encoders was proposed by May et al. (2019) as SEAT (the Sentence Embedding Association Test). Analogous to the word lists used in WEAT tests, SEAT uses sentence lists in which sentences are bleached so as to focus on the target word for which we want to measure bias, in the setting where we work with contextualized embeddings.

3.1 Gender Bias

For evaluating binary gender associated bias, we create WEAT tests (Caliskan et al., 2017) in two ways in Hindi. For the Translated tests [2], we directly translate each individual word in the career & family, arts & mathematics, and arts & science tests of Caliskan et al. (2017). Note that direct translations of some words, like 'Shakespeare' and 'NASA' in the arts and science lists, are not accurate. Also, some words from English, like 'cousin', do not have a corresponding word in Hindi. Next, we develop a set of socially aware Language-Specific tests, where we curate word lists (both attribute and target) based on popular word usage in Hindi and their associations, and on word frequencies in Hindi Wikipedia text. Table 1 highlights how translated sets capture less bias than the WEAT tests specifically created for Hindi, which are a better reflection of how much bias exists in the representations. In particular, the translated WEAT test for binary gender and science versus arts captures a significantly lower amount of bias, unlike what is prevalent in the society as well as in associated text.

[2] We use the Google Translate API to obtain the translations.

For the WEAT Hindi test set, we create another test to quantify gender bias across neutral adjectives, based on societal biases in Indic society (Gundugurti et al., 2015). Appendix B lists all word sets used in each WEAT test. Further, as seen in Section 2.1, there are four specific gendered word groups in Hindi, all of which are meaningfully gendered and important to be encoded and retained in our representations. For each such group, we construct an independent "Meaningful Encoding (ME)" WEAT test (Dev et al., 2020) (word lists in Appendix B). The importance of this is two-fold: (i) it allows us to verify whether meaningful gendered information is encoded in our representations, and (ii) it lets us compare with the biased associations (measured by WEAT) to gauge the overall magnitude of bias versus information about an attribute captured by our embeddings. In Table 2 we can see that for 300-dimensional Hindi GloVe embeddings (Kumar et al., 2020), significant bias is observed using the three (Language-Specific) WEAT tests for binary gender: adjectives, science vs. arts, and maths vs. arts. Each score is over 1.00 and comparable with the WEAT tests for meaningful information encoding, which highlights the high magnitude of bias encoded.

We also develop SEAT tests for Hindi. Similar to May et al. (2019), we make a list of bleached sentence templates and fit each target word (from a WEAT target list) into them to construct SEAT target lists for each part-of-speech category. We used the Hindi translations of the bleached sentences provided in the original work; however, we removed some ambiguous translations and added other bleached sentences that we constructed in their place. The list and examples can be found in Appendix D. We conduct SEAT tests upon 300-dimensional Hindi GloVe embeddings and Hindi ELMo (Kumar et al., 2020). One can observe in Table 2 that for GloVe, the SEAT scores measure bias to be similar to that measured by the WEAT tests. For ELMo, the bias is seen mainly with respect to binary gender and adjectives. From Table 1 we can note that the Language-Specific word lists provide better signals for the detection and quantification of bias. This highlights the significance of constructing the word lists in WEAT tests while keeping language and cultural considerations in mind.

3.2 Caste Bias

To evaluate Hindi representations for the presence of caste bias, we build two WEAT tests and two corresponding SEAT tests using the last names that are statistically more associated with stereotypically lower and upper castes. For lower castes, we randomly sample lower caste names from the list of scheduled castes provided by the Department of Social Justice and Empowerment in India [3]. Our first test is based upon detecting bias in occupations: whether some occupations are associated with upper castes (the upper strata of castes) and some occupations with lower castes. Note that some caste names themselves denote certain occupations in the Hindi language; for example, 'kumhar' means both a lower caste and the occupation of pottery. We made sure that our target word lists contain no such ambiguous entities.

[3] http://socialjustice.nic.in/UserView/index?mid=76750

Another WEAT test we build is based upon the association of positive and negative adjectives with caste names. For the Translated version, we take Hindi translations of the words from Caliskan et al. (2017) for detecting racial bias. For the Language-Specific test, we curate a new word list of adjectives based on words used popularly as positive or negative in Hindi (Thorat and Newman, 2007).

Table 2 highlights that there is significant caste related perception bias. For both the WEAT and SEAT tests, the measured biases are over 1.2 for GloVe embeddings. The results in Table 1 compare the Translated and Language-Specific adjective word lists. For the WEAT test, the bias measured by the Translated word lists is less than half of that measured by the Language-Specific word lists, emphasizing the importance of creating socially aware Language-Specific word lists which correlate better with the society and culture the language is associated with.

3.3 Religion Bias

We investigate religion associated biases in Hindi embeddings by constructing two WEAT and two SEAT tests. Our first test is based upon associating positive and negative adjectives with religious entities: one attribute list consists of Hindu religious entities and one consists of Muslim religious entities. In our second test, we associate adjectives with last names. This stems from the distinct last names commonly used by the two populations (see Appendix B). Similar to caste bias detection, we experiment with the Translated and Language-Specific adjective lists. Additionally, we evaluate whether religious information that is correctly associated is learnt by the representations (for example, the mosque being the place of worship in Islam should be associated with it in the representations). To ascertain this, we create a meaningful encoding (ME) test for religious information (word lists in Appendix B).

In Table 2 we see, using the WEAT and SEAT scores for 300-dimensional GloVe embeddings, that significant religious bias with respect to positive and negative perception is detected.

Table 1 compares the measured bias for the Translated and Language-Specific adjective word lists. Similar to earlier observations, here as well we measure significantly larger bias with the Language-Specific word lists.
WEAT (GloVe)                     SEAT (GloVe)
      Attribute                 Description
                                                             Translated Lang-Specific        Translated Lang-Specific
                         maths, arts vs male, female         0.94 (0.02)   1.12 (0.01)       0.87 (0.00)    1.14 (0.00)
      Gender
                        science, arts vs male, female        0.27 (0.31)   1.13 (0.02)       0.18 (0.17)    1.03 (0.00)
       Caste                  adjectives vs caste            0.72 (0.00)   1.52 (0.00)       0.74 (0.00)    1.40 (0.00)
                         adjectives vs religion terms        1.05 (0.00)   1.28 (0.01)       1.04 (0.00)    1.20 (0.00)
      Religion
                            adjectives vs lastnames          0.93 (0.00)   1.55 (0.00)       0.95 (0.00)    1.41 (0.00)
   Occupation       adjectives vs urban, rural occupations   -0.08 (0.59)  1.58 (0.00)       -0.13 (0.88)   1.42 (0.00)

Table 1: Comparison of word lists obtained by direct English to Hindi translations of word lists in Caliskan’s work
vs Language-Specific word lists. WEAT and SEAT bias measurements (with p-values in parentheses) are provided
in the table.

                                                                         WEAT                      SEAT
           Attribute                       Description
                                                                         GloVe         GloVe       ELMo    RoBERTa
                                   maths, arts vs male, female         1.12 (0.01)   1.14 (0.00)    0.17     -0.01
                       BM         science, arts vs male, female        1.13 (0.01)   1.03 (0.00)    0.14      0.33
                                    adjectives vs male, female         1.21 (0.02)   1.19 (0.00)    1.37      0.49
         Gender                  gendered verbs vs male, female        1.87 (0.00)   1.84 (0.00)    1.66     0.47
                               gendered adjectives vs male, female     1.70 (0.00)   1.63 (0.00)    1.78      0.38
                       ME
                                gendered entities vs male, female      1.14 (0.01)   1.12 (0.00)    1.77     -0.22
                                 gendered titles vs male, female       1.92 (0.00)   1.86 (0.00)    1.64      1.31
                                       occupations vs caste            1.44 (0.00)   1.26 (0.00)    0.89      0.89
          Caste        BM
                                        adjectives vs caste            1.52 (0.00)   1.40 (0.00)    0.48     -0.02
                                   adjectives vs religion terms        1.28 (0.01)   1.20 (0.00)    0.75      0.42
                       BM
         Religion                     adjectives vs lastnames          1.55 (0.00)   1.41 (0.00)    1.02      0.56
                       ME          religious entities vs religion      1.75 (0.00)   1.69 (0.00)    1.23      0.23
       Occupation      BM     adjectives vs urban, rural occupations   1.58 (0.00)   1.42 (0.00)    1.12      0.77

Table 2: Bias measurements (with p-values in parentheses) for Gender, Caste and Religion bias using WEAT
and SEAT tests. For SEAT over ELMo and RoBERTa we only provide the measured bias. These results are
for Language-Specific word lists. Note that BM stands for Bias Measuring test and ME stands for Meaningful
Encoding type test.

3.4 Rural vs. Urban Occupation Bias

In addition to the other cultural biases, we detect bias in urban and rural occupations, which is prevalent in Indic society, with urban occupations seen as better, richer, more desirable, and even of a higher social status. For this, we develop a WEAT test and a corresponding SEAT test. The attribute lists consisted of urban occupations and rural occupations, and the target lists consisted of polarized adjectives (Appendix B specifies the word lists).

Table 2 illustrates, with WEAT and SEAT scores, the biased associations of perception between urban and rural occupations. For 300-dimensional GloVe embeddings, the WEAT test measures bias as ≈ 1.6 and SEAT as ≈ 1.42. We observe that the measured bias is greater than 1.00 for ELMo as well, which highlights the presence of occupation-associated bias in the contextualized embeddings too.

4 Conclusion

Biases are inherently complex, as are their encodings in language representations. Their detection and mitigation, in order to be holistic, has to take into account a multitude of factors: the language and its grammar, the regions it is spoken in, the laws and policies therein, as well as the history and culture of the region. In this paper, we demonstrate how a predetermined set of biases and a peripheral evaluation consisting of a narrow perspective of how biases manifest themselves is not sufficient to achieve fair representations across the globe.

References

Biswajit Banerjee and John B Knight. 1985. Caste discrimination in the Indian urban labour market. Journal of Development Economics, 17(3):277–307.

Varad Bhagwat, Mrunali N Nagarkar, Pooja Paramane, and Shrikant Jindam. 2019. Review of chatbot system in Hindi language. In Review of Chatbot System in Hindi Language.
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. arXiv preprint arXiv:1607.06520.

Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Rodrigo Alejandro Chávez Mulsa and Gerasimos Spanakis. 2020. Evaluating bias in Dutch word embeddings. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 56–71, Barcelona, Spain (Online). Association for Computational Linguistics.

Manali S Deshpande. 2010. History of the Indian caste system and its impact on India today. CalPoly Student Research.

Sunipa Dev, Tao Li, Jeff M Phillips, and Vivek Srikumar. 2020. OSCaR: Orthogonal subspace correction and rectification of biases in word embeddings.

Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 879–887. PMLR.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Sayan Ghosh, Dylan Baker, David Jurgens, and Vinodkumar Prabhakaran. 2021. Detecting cross-geographic biases in toxicity modeling on social media.

Prasad Gundugurti, KL Vidya, and V Sriramya. 2015. The Indian "girl" psychology: A perspective. Indian Journal of Psychiatry, 57:S212–5.

Kira Hall. 2002. "Unnatural" Gender in Hindi, pages 133–162. Oxford University Press.

Saurav Kumar, Saunack Kumar, Diptesh Kanojia, and Pushpak Bhattacharyya. 2020. "A passage to India": Pre-trained word embeddings for Indian languages. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 352–357.

Yaman Kumar, Debanjan Mahata, Sagar Aggarwal, Anmol Chugh, Rajat Maheshwari, and Rajiv Ratn Shah. 2019. Bhaav: A text corpus for emotion analysis from Hindi stories. arXiv preprint arXiv:1910.04073.

Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhattacharyya. 2017. The IIT Bombay English-Hindi parallel corpus. arXiv preprint arXiv:1710.02855.

Mascha Kurpicz-Briki. 2020. Cultural differences in bias? Origin and gender bias in pre-trained German and French word embeddings. In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS).

Anne Lauscher, Rafik Takieddin, Simone Paolo Ponzetto, and Goran Glavaš. 2020. AraWEAT: Multidimensional analysis of biases in Arabic word embeddings. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 192–199, Barcelona, Spain (Online). Association for Computational Linguistics.

Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 622–628, Minneapolis, Minnesota. Association for Computational Linguistics.

Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel R. Bowman. 2020. CrowS-Pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Arun K. Pujari, Ansh Mittal, Anshuman Padhi, Anshul Jain, Mukesh Jadon, and Vikas Kumar. 2019. Debiasing gender biased Hindi words with word embedding. In Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2019, pages 450–456, New York, NY, USA. Association for Computing Machinery.

Pattabhi RK Rao and Sobha Lalitha Devi. 2018. EventXtract-IL: Event extraction from newswires and social media text in Indian languages @ FIRE 2018 - an overview. In FIRE (Working Notes), pages 282–290.

Nithya Sambasivan, Erin Arnesen, Ben Hutchinson, and Vinodkumar Prabhakaran. 2020. Non-portability of algorithmic fairness in India. ArXiv, abs/2012.03659.

Sukhadeo Thorat and Katherine S Newman. 2007. Caste and economic discrimination: Causes, consequences and remedies. Economic and Political Weekly, pages 4121–4124.
Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang. 2019. Examining gender bias in languages with grammatical gender. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5276–5284, Hong Kong, China. Association for Computational Linguistics.

Appendix

A WEAT

Let X and Y be equal-sized sets of target concept embeddings and let A and B be sets of attribute embeddings. Let cos(a, b) denote the cosine similarity between vectors a and b. The test statistic is a difference between sums over the respective target concepts,

s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)    (3)

where the quantity s(w, A, B) is calculated using cosine similarity as follows:

s(w, A, B) = \frac{\sum_{a \in A} \cos(w, a)}{|A|} - \frac{\sum_{b \in B} \cos(w, b)}{|B|}    (4)

The amount of bias in WEAT is analyzed by the effect size d, calculated as:

d = \frac{\mathrm{mean}_{x \in X}\, s(x, A, B) - \mathrm{mean}_{y \in Y}\, s(y, A, B)}{\mathrm{stddev}_{w \in X \cup Y}\, s(w, A, B)}    (5)

In order to compute the test significance (p-value), the X and Y lists are merged and 10,000 permutations P of the combined list are generated. The i-th permutation in P is split into a new pair of lists X_i and Y_i. The test statistic is then used to calculate the p-value as the fraction of permutations whose statistic exceeds the observed one:

p = \frac{|\{\, i \in P : s(X_i, Y_i, A, B) > s(X, Y, A, B) \,\}|}{|P|}    (6)
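The effect size of Eq. (5) and the permutation p-value of Eq. (6) can be computed along the following lines. This is a minimal, self-contained sketch; the random re-splitting of X ∪ Y (rather than exhaustive enumeration) mirrors the 10,000 sampled permutations described above, and the embedding lookup `emb` is assumed.

```python
import numpy as np

def _assoc(w, A, B, emb):
    # s(w, A, B) of Eq. (4).
    cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Eq. (5): difference of mean associations, normalized by the standard
    # deviation of the associations over all target words.
    sx = [_assoc(x, A, B, emb) for x in X]
    sy = [_assoc(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

def weat_p_value(X, Y, A, B, emb, n_perm=10_000, seed=0):
    # Eq. (6): fraction of random, equally sized re-splits (X_i, Y_i) of X ∪ Y
    # whose test statistic exceeds the observed s(X, Y, A, B).
    stat = lambda Xs, Ys: (sum(_assoc(x, A, B, emb) for x in Xs)
                           - sum(_assoc(y, A, B, emb) for y in Ys))
    observed = stat(X, Y)
    pooled = np.array(list(X) + list(Y))
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if stat(perm[:len(X)], perm[len(X):]) > observed:
            exceed += 1
    return exceed / n_perm
```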

                                                      B WEAT word lists
                                                      Refer to Table 7, Table 8 and Table 9.

                                                      C Debiasing
                                                      There have been notable advances towards debias-
                                                      ing embeddings along the direction of gender bias.
                                                      Both Bolukbasi et al. (2016) and Dev and Phillips
                                                      (2019) propose using linear projection to debias
                                                      word embeddings, but the former in addition also
                                                      equalizes word pairs about the attribute (e.g., gen-
                                                      der) axis. In this work we use the more general
                                                      approach of linear projection as it can be adapted
                                                      to several biases apart from gender.
In the method of linear projection, all words w ∈ W are debiased to w′ by being projected orthogonally to the identified bias vector v_B:

w' = w - \langle w, v_B \rangle v_B    (7)
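A minimal sketch of this projection step, assuming the word vectors are rows of a numpy matrix and the embedding lookup is a simple word-to-vector mapping, is:

```python
import numpy as np

def linear_project(W, v_bias):
    # Eq. (7): w' = w - <w, v_bias> v_bias for every row w of W,
    # i.e. remove the component of each word vector along the bias direction.
    v = v_bias / np.linalg.norm(v_bias)
    return W - np.outer(W @ v, v)

# Hypothetical usage: debias all vectors along a single word-pair gender direction.
# W = np.stack([emb[w] for w in vocab])     # rows are word vectors (assumed lookup)
# v_gender = emb["naari"] - emb["nar"]      # the {naari - nar} direction
# W_debiased = linear_project(W, v_gender)
```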
                                                         respect to verbs and adjectives directions. In an-
   In case of hard debiasing, we required list of
                                                         other experiment, we orthogonalize the semantic
equalizing pairs and list of words to not debias
                                                         gender with respect to all the 4 directions.
in Hindi. However, direct translation of the word
                                                            Table 3 demonstrates how different gender sub-
lists to Hindi did not always make sense. Since,
                                                         spaces affect the WEAT effect sizes in both bias
some words like ‘she’ and ‘he’ had overlapping
                                                         measuring and information retention tests. Note
translations and both the pronouns are referred to                                            ~ − nar}
                                                         that the single direction of {naari          ~     was
as ‘veha’ in Hindi. This overlapping translation is
                                                         able to debias the best upon the math and arts test.
true the other way round as well, the word grand-
                                                         Pairwise PCA over gendered words debiased the
father can be either ‘nana’ (maternal grandfather)
                                                         science and arts test quite significantly with an ef-
or ‘dada’ (paternal grandfather).
                                                         fect size of only 0.001 after debiasing. Hard de-
   For languages with grammatical gender,
                                                         biasing is not able to debias the first two tests in
Zhou et al. (2019) proposed to determine seman-
                                                         the WEAT setting, however, it reduces the effect
tic gender direction. To obtain the semantic
                                                         sizes in case of SEAT (see Table 4). In both WEAT
gender direction (ds ), the grammatical gender
                                                         and SEAT tests for neutral adjectives vs gendered
component (dg ) in the computed gender direction
                                                         words, hard debiasing performs best against any
(obtained from PCA over gendered word pairs,
                                                         other methods. Although hard debiasing works
dP CA ) to make the semantic gender direction
                                                         competitively, it comes with the downside that it
orthogonal to grammatical gender.
                                                         does not retain the gendered information in our in-
                                                         formation retention (IR) tests. Both of the LPSG
           ds = dP CA − hdP CA , dg idg           (8)
                                                         variants were able to debias competitively while at
  We use this orthogonalized gender direction to         the same time were best in retaining the gendered
perform linear debiasing. We refer to this method        information of IR tests.
as LPSG (Linear Projection with Semantic Gen-
der).                                                    C.2 Debiasing Caste
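One way to realize this, sketched below, is to take the top principal component of word-pair difference vectors as the bias direction and then orthogonalize it against a grammatical-gender direction as in Eq. (8). The specific word pairs, the use of a single principal component, and the `emb` lookup are assumptions made for illustration.

```python
import numpy as np

def pca_direction(diff_vectors):
    # Top principal component of a set of difference vectors, used as a bias direction.
    M = np.stack(diff_vectors)
    M = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])

def lpsg_direction(gender_pairs, grammatical_pairs, emb):
    # d_PCA from gendered word pairs, d_g from grammatically gendered pairs,
    # then Eq. (8): d_s = d_PCA - <d_PCA, d_g> d_g.
    d_pca = pca_direction([emb[m] - emb[f] for m, f in gender_pairs])
    d_g = pca_direction([emb[a] - emb[b] for a, b in grammatical_pairs])
    d_s = d_pca - np.dot(d_pca, d_g) * d_g
    return d_s / np.linalg.norm(d_s)

# gender_pairs could contain pairs such as ("purush", "mahila") or ("ladka", "ladki");
# grammatical_pairs would hold grammatically gendered word pairs (an assumption here).
# The resulting d_s is then used as the bias vector in the projection of Eq. (7).
```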
C.1 Debiasing Binary Gender

The first step in debiasing using linear projection is to identify a bias subspace or vector. We experiment with different settings to identify the gender vector in Hindi, including (i) a single gender-specific word-pair direction, {naari − nar}, and (ii) PCA over a list of paired gender-specific words (in the form {m_i − f_i}). For more results with other gender directions, refer to the Appendix. The word lists used in the experiment are provided in Appendix C.

For hard debiasing, we considered two types of gender-definition word lists. In one list we included only the gender-definitional pairs translated to Hindi from the original English lists (after some modifications to remove ambiguous translations). In another experiment, we added pairs of gendered verbs to the list as well.

Hindi is a language with grammatical gender. As introduced in Section 2.1, we have four special gendered word groups along which we want to preserve the information. The directions for gendered verbs (d_v), adjectives (d_a), titles (d_t) and entities (d_e) were calculated by conducting PCA over the respective word lists (Appendix C). For the LPSG method, we provide results of orthogonalizing the semantic gender direction with respect to the verbs and adjectives directions. In another experiment, we orthogonalize the semantic gender direction with respect to all four directions.

Table 3 demonstrates how different gender subspaces affect the WEAT effect sizes in both the bias measuring and the information retention tests. Note that the single direction {naari − nar} was able to debias best on the maths and arts test. Pairwise PCA over gendered words debiased the science and arts test quite significantly, with an effect size of only 0.001 after debiasing. Hard debiasing is not able to debias the first two tests in the WEAT setting; however, it reduces the effect sizes in the case of SEAT (see Table 4). In both the WEAT and SEAT tests for neutral adjectives vs. gendered words, hard debiasing performs best against all other methods. Although hard debiasing works competitively, it comes with the downside that it does not retain the gendered information in our information retention (IR) tests. Both of the LPSG variants were able to debias competitively while at the same time being best at retaining the gendered information in the IR tests.

C.2 Debiasing Caste

Similar to debiasing gender, for caste we first begin by determining the caste direction with two words, one stereotypically considered upper caste and one lower: {ghasiya − pandit}. We also try the direction {ghasiya − desai}. Since castes do not occur in pairs, a set of word pairs cannot be meaningfully constructed as was done with binary gender in English (Bolukbasi et al., 2016). Hence, we compose lists of stereotypically upper and lower castes, and conduct PCA over the combined list to obtain the vector of caste bias. Refer to Appendix C for the word lists used in the experiment. In Table 5 we can observe that linear debiasing using the single direction {ghasiya − desai} is unable to debias competitively when compared with the other two methods. Note that the single direction {ghasiya − pandit} is able to debias better than PCA over the list of caste names.

C.3 Debiasing Religion
In order to mitigate religious biases in Hindi, we acknowledge that in Indian culture the religion of a person is generally identifiable by their last name. We thus utilize last names to determine the direction of bias. We use both (i) a single pair of common last names, {acharya − nasir}, and (ii) a set of Hindu and Muslim entities.

Another religion direction is calculated by combining the word lists of Hindu and Muslim last names and conducting PCA over them; we call this religion bias direction d_last. The word lists are provided in Appendix C.

In the Hindi language, various religious entities are inherently associated with a particular religion; for example, "Bible is to Christianity as Bhagwad Gita is to Hinduism" is not a bias. To accommodate such cases, we again take motivation from Zhou et al. (2019) to obtain a direction d_ent from the entities word lists (Appendix C) and keep the religion direction d_last, calculated from the Hindu and Muslim last names, orthogonal to it:

d'_last = d_last - \langle d_last, d_ent \rangle d_ent    (9)

We believe that if we debias words using d′_last as the bias direction, we should be able to preserve the knowledge probed by the religion information retention test while debiasing competitively.

In Table 6, we see that linear debiasing by conducting PCA over a list of religious entities is not able to debias much in any of the tests. The same can be observed for linear debiasing using the single pair of common last names {acharya − nasir}. However, if we linearly debias by PCA over a list of last names, we are able to debias significantly; although the information retention WEAT effect size is lower than for the previous methods, those methods did not even affect the religion bias, which is our primary goal. Zhou's variant for religion debiasing performs well since it debiases competitively as well as retains a greater amount of the necessary religion information. Refer to Appendix B for the word lists used in the test.

D SEAT

We define a list of bleached sentences for each part-of-speech type of target word as follows:

• Hindi-SEAT-name: 'yeha hai', 'veha hai', 'vahan hai', 'yahan hai', ' yahan hai', ' vahan hai', 'iska naam hai', 'uska naam h'

• Hindi-SEAT-common-nouns: 'yeha hai', 'veha hai', 'vahan hai', 'yahan hai', ' yahan hai', ' vahan hai', 'vo hai', 'ye hai'

• Hindi-SEAT-verbs: 'yeha hai', 'veha hai', 'vo hai', 'ye hai', 'vahan hai', 'yahan hai'

• Hindi-SEAT-adjectives: 'yeha hai', 'veha hai', 'vo hai', 'ye hai'

In other words, if the target word is an adjective, we use the Hindi-SEAT-adjectives list of bleached sentences with each WEAT target word.
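A small sketch of how a SEAT target list might be assembled from these bleached templates follows; the position of the target-word slot inside each transliterated template is an assumption made for illustration.

```python
# Transliterated bleached templates for adjectives (from the list above);
# "{}" marks the assumed slot where the target word is inserted.
ADJ_TEMPLATES = ["yeha {} hai", "veha {} hai", "vo {} hai", "ye {} hai"]

def seat_sentences(target_words, templates=ADJ_TEMPLATES):
    # Build the SEAT target list: every bleached template filled with every target word.
    return [t.format(w) for w in target_words for t in templates]

# Example with stereotypically gendered adjectives from the WEAT word lists:
# seat_sentences(["sundar", "gharelu"])
# -> ["yeha sundar hai", "veha sundar hai", "vo sundar hai", ...]
```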
Description           Original        Linear Projection               Hard Debiasing                       LPSG
      (vs male, female)        WEAT                                          Gen.         Gen.               w/o           w/o
                                            naari-nar          PCA
                                                                            words     words,verbs        verbs & adj      all dir.
            maths, arts       1.12 (0.01)   0.44 (0.20)     0.77 (0.06)   1.48 (0.00) 1.13 (0.00)        0.95 (0.03)   1.04 (0.02)
           science, arts      1.13 (0.02)   0.50 (0.18)     0.00 (0.49)   1.66 (0.00) 1.57 (0.00)        0.24 (0.28)   0.42 (0.21)
            adjectives        1.22 (0.02)   0.96 (0.06)     0.82 (0.08)   0.37 (0.27) 0.49 (0.20)        0.94 (0.05)   0.94 (0.049)
          gendered verbs      1.87 (0.00)   1.87 (0.00)     1.79 (0.00)   1.12 (0.01) -1.18 (0.99)       1.85 (0.00)   1.85 (0.00)
           gendered adj       1.70 (0.00)   1.63 (0.00)     1.66 (0.00)   1.19 (0.00) 0.78 (0.05)        1.71 (0.00)   1.75 (0.00)
 IR
         gendered entities    1.14 (0.01)   1.01 (0.02)     0.99 (0.02)   0.66 (0.08) 0.27 (0.28)        1.13 (0.00)   1.18 (0.00)
          gendered titles     1.92 (0.00)   1.91 (0.00)     1.89 (0.00)   1.19 (0.00) 1.59 (0.00)        1.92 (0.00)   1.91 (0.00)

Table 3: Debiasing results for gender across different debiasing methods of linear projection, Bolukbasi’s hard
debiasing and different variants of LPSG debiasing. We provide WEAT effect sizes with p-values of the test in
parentheses. PCA for Linear Projection was done on gendered word pairs. IR stands for Information Retention.

         Description           Original        Linear Projection               Hard Debiasing                      LPSG
      (vs male, female)         SEAT                                         Gen.         Gen.               w/o          w/o
                                            naari-nar          PCA
                                                                            words     words,verbs        verbs & adj     all dir.
            maths, arts       1.14 (0.00)   0.64 (0.00)     0.78 (0.00)   0.76 (0.00) 1.09 (0.00)        0.96 (0.00)   0.99 (0.00)
           science, arts      1.03 (0.00)   0.55 (0.00)     0.07 (0.33)   0.70 (0.00) 0.91 (0.00)        0.26 (0.06)   0.38 (0.02)
            adjectives        1.19 (0.00)   0.98 (0.00)     0.80 (0.00)   0.23 (0.30) 0.34 (0.31)        0.94 (0.00)   0.92 (0.00)
          gendered verbs      1.84 (0.00)   1.83 (0.00)     1.67 (0.00)   0.31 (0.33) -0.70 (0.70)       1.80 (0.00)   1.78 (0.00)
           gendered adj       1.63 (0.00)   1.58 (0.00)     1.54 (0.00)   0.45 (0.17) 0.36 (0.33)        1.63 (0.00)   1.67 (0.00)
 IR
         gendered entities    1.12 (0.00)   1.02 (0.00)     0.99 (0.00)   0.42 (0.14) 0.45 (0.17)        1.13 (0.00)   1.16 (0.00)
          gendered titles     1.86 (0.00)   1.85 (0.00)     1.75 (0.00)   0.15 (0.41) 0.90 (0.22)        1.82 (0.00)   1.80 (0.00)

Table 4: Debiasing results for gender across different debiasing methods of linear projection, Bolukbasi’s hard
debiasing and different variants of LPSG debiasing. We provide SEAT effect sizes with p-values of the test in
parentheses. PCA for Linear Projection was done on gendered word pairs. IR stands for Information Retention.

                    Test      Description     Original                          Linear Projection
                    Type       (vs caste)       Score         ghasiya - desai      ghasiya - pandit      PCA
                              occupations    1.44 (0.00)        1.34 (0.00)          0.78 (0.09)      1.21 (0.02)
                   WEAT
                               adjectives    1.52 (0.00)        1.51 (0.00)          1.31 (0.01)      1.33 (0.00)
                              occupations    1.26 (0.00)        1.17 (0.00)          0.67 (0.00)      0.89 (0.00)
                   SEAT
                               adjectives    1.40 (0.00)        1.36 (0.00)          1.18 (0.00)      1.18 (0.00)

Table 5: Debiasing results for caste across different methods of choosing caste subspace. We provide WEAT and
SEAT effect sizes with p-values of the test in parentheses. PCA was conducted on a list of caste names containing
both upper and lower castes.
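
For the PCA variant in Table 5, the bias direction comes from a principal component analysis over the caste-name embeddings rather than from a single word pair. A hedged sketch of that subspace-selection step, under the assumption that the top component(s) of the centred name embeddings serve as the caste direction(s), is shown below; the returned direction(s) can then be removed with the same projection step sketched after Table 4.

    import numpy as np

    def pca_bias_directions(vectors, k=1):
        # vectors: embeddings of the definitional words, e.g., the upper- and
        # lower-caste names listed in Table 8. Centre them and return the
        # top-k right singular vectors as candidate bias directions.
        M = np.stack(vectors)
        M = M - M.mean(axis=0)
        _, _, Vt = np.linalg.svd(M, full_matrices=False)
        return Vt[:k]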

        Test     Description                                Original      Linear Projection                                   LPSG
        Type                                                Score         Acharya-Nasir   PCA entities   PCA lastnames        w/o entities
                   adjectives vs religion terms       1.28 (0.01)     1.28 (0.01) 1.28 (0.01) 0.91 (0.04)           0.92 (0.06)
       WEAT          adjectives vs lastnames          1.55 (0.00)     1.57 (0.00) 1.55 (0.00) 0.71 (0.10)           0.71 (0.11)
                 IR religious entities vs religion    1.75 (0.00)     1.61 (0.00) 1.72 (0.00) 1.54 (0.00)           1.59 (0.00)
                   adjectives vs religion terms       1.20 (0.00)     1.22 (0.00) 1.19 (0.00) 0.85 (0.00)           0.85 (0.00)
       SEAT          adjectives vs lastnames          1.41 (0.00)     1.43 (0.00) 1.41 (0.00) 0.70 (0.00)           0.68 (0.00)
                 IR religious entities vs religion    1.69 (0.00)     1.52 (0.00) 1.65 (0.00) 1.43 (0.00)           1.50 (0.00)

Table 6: Debiasing results for religion across different methods of choosing religion subspace and LPSG. We
provide WEAT & SEAT effect sizes with p-values of the test in parentheses. We experimented with conducting
PCA over a list of religious entities and over a list of religious lastnames.
Description         Word list type               List
                    Math words                   [ganit, beejganit, jyamiti, kalan, sameekaran, ganna, sankhya, yog]
Math, Arts vs
                    Arts words                   [kavita, kala, nritya, sahitya, upanyas, raag, naatak, murti]
Gender specific
                    Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
words
                    Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]
                    Science terms                [vigyan, praudyogiki, bhautik, rasayan, prayogshala, niyam, prayog, khagol]
Science, Arts vs
                    Arts terms                   [kavita, kala, naach, nritya, sahitya, upanyas, raag, naatak]
Gender specific
                    Male gendered words          [bhai, chacha, daada, beta, purush, pati, aadmi, ladka]
words
                    Female gendered words        [behen, chachi, daadi, beti, mahila, patni, aurat, ladki]
                    Stereo male adjectives       [krodhit, shramik, takatwar, nipun, veer, sahsi, diler]
Adjectives vs
                    Stereo female adjectives     [sundar, sharm, aakarshak, manmohak, madhur, gharelu, kamzor]
Gender specific
                    Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
words
                    Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]
                    Male verbs                   [gaya, aaya, khelta, baitha, leta, rehta, deta, padhta]
Gendered verbs
                    Female verbs                 [gayi, aayi, khelti, baithi, leti, rehti, deti, padhti]
vs Gender
                    Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
specific words
                    Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]
Gendered            Male adjectives        [accha, bura, ganda, lamba, chota, meetha, neela, bada, pehla]
adjectives vs       Female adjectives      [acchi, buri, gandi, lambi, choti, meethi, neeli, badi, pehli]
Gender              Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
specific words      Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]
Gendered            Male titles                  [adhyapak, shishya, vidvan, saadhu, kavi, chhatr, pradhanacharya, mahoday]
titles vs                                        [adhyapika, shishyaa, vidushi, saadhvi, kavitri, chhatra, pradhanacharya,
                    Female titles
Gender specific                                  mahodaya]
words               Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
                    Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]
Gendered            Male entities                [pajama, ghada, kurta, phool, kapda, pahiya, yantra, putla, taala]
entities vs         Female entities              [almaari, chadar, poshaak, bijli, buddhi, tasvir, ghadi, raakhi, kameez]
Gender specific     Male gendered words          [purush, aadmi, ladka, bhai, pati, chacha, maama, beta]
words               Female gendered words        [mahila, aurat, ladki, behen, patni, chachi, maami, beti]

                                Table 7: Word lists for Gender WEAT and SEAT tests
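
To illustrate how these lists feed the tests, the snippet below reuses the weat() sketch given after Table 3 for the "Math, Arts vs Gender specific words" setting; emb is an assumed lookup from a transliterated Hindi word to its embedding vector and is not part of the paper's released material.

    # Hypothetical usage; emb (word -> vector) and weat() from the earlier sketch are assumed.
    math_words   = ["ganit", "beejganit", "jyamiti", "kalan", "sameekaran", "ganna", "sankhya", "yog"]
    arts_words   = ["kavita", "kala", "nritya", "sahitya", "upanyas", "raag", "naatak", "murti"]
    male_words   = ["purush", "aadmi", "ladka", "bhai", "pati", "chacha", "maama", "beta"]
    female_words = ["mahila", "aurat", "ladki", "behen", "patni", "chachi", "maami", "beti"]

    effect, p = weat([emb[w] for w in math_words], [emb[w] for w in arts_words],
                     [emb[w] for w in male_words], [emb[w] for w in female_words])
    print(f"Math/Arts vs gender: effect size = {effect:.2f}, p = {p:.3f}")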

    Description      Word List type                      List
                     Stereo Upper caste occupations      [vyapar, jameendar, sunar, guru, munim, chikitsak, pandit]
    Occupations      Stereo Lower caste occupations      [safai, dhobi, mallah, maali, naai, mochi, machuara]
    vs Caste         Upper caste names                   [thakur, brahmin, rajput, kshatriya, arya, jaat, baniya, kayastha]
                     Lower caste names                   [dalit, shudra, bhangi, chamaar, valimiki, harijan, chuhda, jatav]
                     Upper caste adjectives              [ameer, gyani, veer, taakatvar, sundar, ucch, sahsi]
    Adjectives       Lower caste adjectives              [neech, ghrana, ganda, kamzor, gareeb, agyani, nirbal]
    vs Caste         Upper caste names                   [thakur, brahmin, rajput, kshatriya, arya, jaat, baniya, kayastha]
                      Lower caste names                   [dalit, shudra, bhangi, chamaar, valimiki, harijan, chuhda, jatav]

                                    Table 8: Word lists for Caste WEAT and SEAT tests

   Description      Word List type             List
   Adjectives       Positive adjectives        [shikshit, veer, ucch, sahsi, shant, dayalu, safal]
   vs               Negative adjectives        [neech, ghrana, ashikshit, hinsak, krodhi, nirdayi, atyachaari]
   Religion         Hindu religion terms       [hindu, bhagwan, geeta, brahmin, pandit, mandir, ram, vrat]
   terms            Muslim religion terms      [musalman, allah, quran, shiya, sunni, masjid, muhammad, roza]
   Adjectives       Positive adjectives        [shikshit, veer, ucch, sahsi, shant, dayalu, safal]
   vs               Negative adjectives        [neech, ghrana, ashikshit, hinsak, krodhi, nirdayi, atyachaari]
   Religion         Hindu lastnames            [sharma, verma, agrawal, gupta, chauhan, bansal, mittal, singh, chaudhary]
   Lastnames        Muslim lastnames           [yusuf, malik, khan, ansari, sheikh, abdullah, ahmad, pathan, mirza]
                    Hindu religion terms       [bhagwan, geeta, brahmin, pandit, mandir, ram, vrat]
   Religious
                    Muslim religion terms      [allah, quran, shiya, sunni, masjid, muhammad, roza]
   entities
                    Hindu religion             [hindu, hindutva]
   vs Religion
                    Muslim religion            [musalman, islam]

                               Table 9: Word lists for Religion WEAT and SEAT tests

    Description       Word List type                                                 List
   Adjectives v/s    Positive Adjectives                     [ameer, gyani, veer, takatvar, sundar, ucchh, sahsi]
     Rural and       Negative Adjectives                  [neech, ganda, ghrana, kamzor, gareeb, agyani, nirbal]
      Urban          Urban Occupations         [banker, vyavsayi, engineer, vakeel, vaigyanik, chaalak, abhineta, manager]
   Occupations       Rural Occupations           [lohar, jalvahak, kisaan, gwala, charwaaha, kumhar, jameendar, julaha]

                    Table 10: Word lists for Rural v/s Urban Occupations WEAT and SEAT tests