Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 13–20
Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020
© European Language Resources Association (ELRA), licensed under CC-BY-NC

A Multi-Dimensional View of Aggression when voicing Opinion

Arjit Srivastava1*, Avijit Vajpayee2*, Syed Sarfaraz Akhtar3*, Naman Jain2, Vinay Singh1, Manish Shrivastava1

1 International Institute of Information Technology, Hyderabad, Telangana, India
2 Department of Computer Science, Columbia University, New York, NY, USA
3 Department of Linguistics, University of Washington, Seattle, WA, USA

arjit.srivastava@research.iiit.ac.in, vinay.singh@research.iiit.ac.in, m.shrivastava@iiit.ac.in,
{ssa2184, nj2387}@columbia.edu, avijitv@uw.edu
                                                               Abstract
The advent of social media has vastly increased the number of opinions and arguments voiced on the internet. These virtual debates often present cases of aggression. While research has largely focused on analyzing aggression and stance in isolation from each other, this work is the first attempt to gain an extensive and fine-grained understanding of patterns of aggression and figurative language use when voicing opinion. We present a Hindi-English code-mixed dataset of opinion on the politico-social issue of the '2016 Indian banknote demonetisation' and annotate it across multiple dimensions such as aggression, hate speech, emotion arousal and figurative language usage (sarcasm/irony, metaphors/similes, puns/word-play).

Keywords: Social Media, Stance, Opinion, Aggression, Hate Speech, Figurative Language, Emotion

1. Introduction

There has been an explosion in the amount of data generated by users online. Social media and online forums encourage users to share their thoughts with the world, resulting in a vast resource of opinion-rich data. This has garnered a lot of attention from the research community, as it allows for analyzing the interactions between users as well as their usage of informal language in depth.

Stance detection is the task of automatically determining the opinion of users with respect to a given issue. The author of the opinion may be in favour of, against or neutral towards the issue. In this paper, we attempt to analyze, with respect to stance, the nuances of displayed aggression towards supporters / detractors of the opinion as well as the usage of various forms of figurative language such as metaphors, rhetorical questions, sarcasm, irony, puns and word-play. We additionally look at the emotion arousal level and instances of hate speech. The target issue analyzed in this paper is the '2016 Indian banknote demonetisation'. On 8 November 2016, the Government of India announced the demonetisation of all ₹500 and ₹1,000 banknotes of the Mahatma Gandhi Series. It also announced that new ₹500 and ₹2,000 banknotes would be circulated in exchange for the demonetised banknotes. However, this decision received mixed reactions from the people of India, with many questioning its effectiveness.

Culpeper (2011) defined verbal aggression as "any kind of linguistic behaviour which intends to damage the social identity of the target person and lower their status and prestige" (also cited by Kumar et al. (2018)). Baron and Richardson (2004) identified some characteristics of aggression:

  • A form of behaviour rather than an emotion, motive or attitude.
  • Visible intention to hurt or harm (may not be physical).
  • Must involve actions / intentions against living beings.
  • The recipient is motivated to avoid such treatment.

People often express their opinions on socio-political issues on social media forums like Twitter by displaying aggression towards people that support a contradicting belief or towards particular groups of stakeholders on the issue. Given below are some example tweets from our dataset on the target issue of demonetisation in India. The reader is warned about the strongly-worded and derogatory nature of these tweets.

  1. Tweet: 'ye AAPtards aise behave kar rahe hain jaise Modi ji ne Notebandi nahi inki Nassbandi kara di ho'

     Translation: 'These AAPtards are behaving as if it's not demonetisation but castration for them.'

     Gloss: "AAP": opposition political party; "AAPtards": slang term for supporters of AAP (inspired by the English slang "Libtards"); "Notebandi": demonetisation of higher currency notes; "Nassbandi": castration

* These authors contributed equally to this work.
Tweet 1 is in favour of the decision and is overtly aggressive towards the people who are voicing their views against it. Due to its abusive language and suggestion of violence, it is also labeled as hate speech. Lastly, this tweet contains word-play, as "Notebandi" (demonetisation) rhymes with "Nassbandi" (castration).

  2. Tweet: 'Aam admi se jyada politician ko dikate ho rhi h notebandi se aisa kyun?????'

     Translation: 'Politicians seem to be more affected by demonetisation compared to the common man. Why is that so?'

     In tweet 2, the author rhetorically questions why politicians seem more troubled about demonetisation than the general public, implying that the general public supports the legislation and corrupt politicians are opposing it. This is an example of covert aggression towards the politicians while voicing a favourable opinion on the decision.

  3. Tweet: 'tera kejri mar jaye sala suar to modi ji vaise he ek din ke liye notebandi vapis le lege'

     Translation: 'If your leader Kejri, a stupid pig, dies then Modi ji would take demonetisation back for a day.'

     Gloss: "Kejri": referring to Arvind Kejriwal (leader of opposition party AAP); "Modi ji": honorific referring to Narendra Modi (Prime Minister of India)

     Tweet 3 supports the decision of demonetisation and is overtly aggressive towards both the members of AAP and their leader Arvind Kejriwal. Its author suggests that the opposition leader should die and proceeds to verbally abuse him.

  4. Tweet: 'what if .. Modi Ji says Mitron ,,, kal raat ko zyada ho gayi thi ,,. Kuch nahi badla he.. #NoteBandi'

     Translation: 'What if Modi says that he had too much to drink last night and nothing has really changed. #Demonetisation'

     Tweet 4 makes a sarcastic joke about how Prime Minister Modi might have been joking and hung over while making this sudden announcement. Despite its humorous take, this tweet is non-aggressive and neutral in stance.

The main contribution of this paper is a unified dataset of 1001 Hindi-English code-mixed tweets annotated for multiple dimensions, namely:

  • Stance (favourable, against, neutral)
  • Aggression (covert, overt, non-aggressive)
  • Hate Speech (true, false)
  • Figurative language use
      – Sarcasm / Irony / Rhetorical Questions (true, false)
      – Puns / Word-play (true, false)
      – Metaphors / Similes (true, false)
  • Emotion arousal (1 to 5 rating)

** The dataset is publicly available at: https://github.com/arjitsrivastava/MultidimensionalViewOpinionMining

This is the first attempt at analysing social media opinion on a political issue across varied modalities. More in-depth datasets like the one we present here are required for:

  • Analyzing the less apparent forms of verbal aggression displayed on social media.
  • Better understanding linguistic patterns when voicing opinion and displaying aggression.
  • Analyzing the social dynamics of opinion.
  • Facilitating classification models that leverage corpora annotated for auxiliary tasks through transfer learning, joint modelling as well as semi-supervised label propagation methods.

The structure of this paper is as follows. In section 2, we review related work in the fields of stance detection, aggression detection, hate speech detection, figurative language constructions, emotion analysis and code-mixed data analysis. In section 3, we explain the annotation guidelines used for the creation of this dataset. In section 4, we present statistics and analysis on the corpus. Finally, in section 5, we present our conclusions and lay out the scope for extending this work.

2. Related Work

User generated data from social media forums like Twitter has attracted a lot of attention from the research community. Mohammad et al. (2017) and Krejzl et al. (2017) analyzed stance in tweets and online discussion forums respectively. The task of stance detection on tweets at SemEval 2016 (Mohammad et al., 2016) led to targeted interest in the area, with contributions from Augenstein et al. (2016), Liu et al. (2016) and others. Aggression and offensive language were the focus of a SemEval 2019 task (Zampieri et al., 2019b), and some of the works on aggression identification are Kumar et al. (2018) and Zampieri et al. (2019a). Closely related is detecting hate speech in social media, which has been explored by Malmasi and Zampieri (2017), Schmidt and Wiegand (2017), Davidson et al. (2017) and Badjatiya et al. (2017), among others.

Domains of verbal aggression, abuse and hate have so far been studied in isolation from stance and opinion mining. Additionally, the usage of figurative language expressions such as sarcasm, metaphor, rhetorical questions, puns etc.,
when voicing opinion or displaying aggression, has not been explored in depth. As most of the available datasets are annotated for a single task, they preclude understanding correlations along multiple dimensions. The next frontier is analyzing these patterns and correlations in depth. To do this, we undertook a data annotation effort on a single set of tweets for multiple tasks previously studied separately. We hope that this dataset makes way for joint modelling / multi-task learning systems as well as provides insights into the underlying latent factors.

For this work, we wished to analyze multiple dimensions of opinion on a single target issue. The choice of the 2016 demonetisation of higher currency notes in India as our target issue was motivated by the familiarity of the authors and annotators with its nuances, as well as the highly polarizing nature of opinions on the topic. Gafaranga (2007) describes code-mixing as the use of linguistic units from different languages in a single utterance or sentence, and code-switching as the co-occurrence of speech extracts belonging to two different grammatical systems. The majority of user generated data on social media is code-mixed and, consequently, so is our dataset. Code-mixed datasets of Hindi-English tweets have previously been created for humor (Khandelwal et al., 2018), sarcasm (Swami et al., 2018a), aggression (Kumar et al., 2018), hate speech (Bohra et al., 2018) and emotion (Vijay et al., 2018).

3. Annotation

Swami et al. (2018b) had collected 3500 code-mixed Hindi-English tweets using the Twitter Scraper API, filtering by the keywords "notebandi" and "demonetisation" over a period of 6 months after demonetisation was implemented, and annotated them for stance (favourable, against and neutral). We randomly sampled 1001 tweets from this dataset and annotated them for the following dimensions (3 domain expert annotators for each dimension):

  • Aggression: Overt vs. Covert vs. Neutral
  • Hate Speech: True vs. False
  • Sarcasm / Irony / Rhetorical Question: True vs. False
  • Metaphor / Simile: True vs. False
  • Pun / Word-play: True vs. False
  • Emotion Arousal: 5-point ordinal scale

The final label on each binary classification dimension was taken as the majority label from the choices of the 3 annotators. For aggression classification, which was multi-class, we provided adjudication in cases where no simple majority could be reached. For emotion arousal, scores from the individual annotators were averaged for the final emotion arousal score.

We also re-annotated the original dataset for stance, as it had favourable or against tags only on tweets that displayed outright support or disapproval respectively. We found that the majority of opinion was displayed through attacking / supporting other opinions on the issue, i.e. through indirect or implied support / disapproval. For example, look at the tweet below:

  5. Tweet: 'Notebandi k khilaf kyu ho...? Kaale dhan m share holder ho kya @ArvindKejriwal'

     Translation: 'Why are you against demonetisation? Are you a shareholder in black money @ArvindKejriwal'

     Gloss: "kale dhan": black money; "Arvind Kejriwal": leader of opposition political party AAP; "Notebandi": demonetisation

     Tweet 5 was originally classified as neutral in stance. We feel that cases like the above can be confidently annotated as favourable to the issue (i.e. favourable to demonetisation). The author rhetorically and sarcastically questions the opinion, intentions and reasons of those against the issue (in this case the leader of the opposition party). This tweet is also an example of what we consider covert aggression.

For aggression annotation, we follow the guidelines of Kumar et al. (2018), who presented a detailed typology of aggression on Hindi-English code-mixed data. We only annotate for aggression level; they had additional layers based on discursive role (attack, defend, abet) and discursive effect (physical threat, sexual aggression, gendered aggression, racial aggression, communal aggression, casteist aggression, political aggression, geographical aggression, general non-threatening aggression, curse). The definitions of the 3 aggression levels, along with examples from our dataset, are:

Covertly-Aggressive (C) Contains text which is an indirect attack and is often packaged as (insincere) polite expressions (through the use of conventionalized polite structures) such as satire, rhetorical questions, etc.

  6. Tweet: 'Notebandi ka niyam : khata nahi hai to khulwao. Aam aadmi : khulwa to lun. Par bhai bank main ghusun Kasey ?'

     Translation: 'Rule of demonetisation: If you don't have an account then open one. Common man: I'll open one, but tell me how to enter the bank first?'

     Disapproval of demonetisation through a sarcastic reference to the long queues in front of banks due to the high demand for exchange of demonetised currency.

Overtly-Aggressive (O) Contains texts in which aggression is overtly expressed, either through the use of specific kinds of lexical items, syntactic structures or lexical features.
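The aggregation of raw annotations described earlier in this section (majority vote over the three annotators for categorical dimensions, averaging for emotion arousal) can be sketched as follows. This is a minimal illustration of the procedure, not the authors' code; the label strings are ours:

```python
from collections import Counter
from statistics import mean

def majority_label(labels):
    """Majority label from the 3 annotators; returns None when no
    simple majority exists (such aggression cases were adjudicated
    by the authors)."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def arousal_score(ratings):
    """Final emotion-arousal score: mean of the individual 1-5 ratings."""
    return mean(ratings)
```

For example, `majority_label(["overt", "overt", "covert"])` yields `"overt"`, `majority_label(["overt", "covert", "non-aggressive"])` yields `None` (sent to adjudication), and `arousal_score([4, 5, 4])` gives roughly 4.33.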
  7. Tweet: 'Ye Notebandi Atankbaadiyo aur Bharashtachaariyo ki NAKEBANDI hai. Sare Rashtrabhakta is nakebandi ke sath aur samarthan me aye.'

     Translation: 'Demonetisation is a barricading of the terrorists and the corrupt. All the nationalists should support this barricading.'

Non-Aggressive (NAG) Refers to texts which do not lie in the above two categories.

  8. Tweet: 'kya Aam aadmi ke liye NoteBandi ka Faisla Shi hai?'

     Translation: 'Is the decision of demonetisation in the favour of the common man?'

Prior works on sarcasm and irony detection in social media data like Reddit (Wallace et al., 2014) and Twitter (Bamman and Smith, 2015) have shown that context is essential in understanding sarcasm. Therefore, most social media datasets of sarcasm are self-annotated, i.e. collected through hashtag-specific Twitter scraping such as #sarcasm and #notsarcasm. As we are re-annotating a previously scraped dataset which was not self-annotated through specific hashtags, we rely on the domain knowledge of expert annotators on the Indian socio-political scenario and the focus issue of demonetisation. This, however, is not a drawback, because in a dataset like ours, rich with strongly opinionated tweets, annotating sarcasm is fairly easy. In the current scope of the research, rhetorical questions are thought of as functioning similarly to sarcasm and irony. We understand that fine-grained linguistic differences between sarcasm, irony and rhetorical questions exist; for our purpose we have clubbed them into a single category of figurative language. Similarly, puns and word-play are merged into a single figurative language category, with annotation guidelines based on the SemEval 2017 task of detecting English puns (Miller et al., 2017). The rhyming usage of 'Notebandi' (demonetisation) with 'Nasbandi' (castration), as shown in the earlier examples, was the most common word-play seen. A third figurative language category of metaphors (and occasionally similes) can also be clearly observed in our corpus. Metaphor identification has typically been treated as a token-level or phrase-level tagging task (Shutova and Teufel, 2010). To be consistent with the other figurative language categories used in this work, we annotated metaphors at the tweet level, which was also the annotation level for the SemEval 2015 task on figurative language in Twitter data (Ghosh et al., 2015). The following tweet is an example of metaphor usage:

  9. Tweet: 'kabhi kabhi sher ka shikar karne ke liye bhed (aam janta) ko chara banana padta hai. notebandi'

     Translation: 'Sometimes sheep need to be sacrificed in order to hunt lions. Demonetisation.'

     In tweet 9, 'sheep' is a metaphor for some members of the common public and 'lions' is a metaphor for large scale corruption.

Burnap and Williams (2015) defined hate speech as responses that include written expressions of hateful and antagonistic sentiment toward a particular race, ethnicity, or religion. They used a binary classification scheme of hate speech vs. non hate speech, which was also followed by Bohra et al. (2018) for their dataset of Hindi-English code-mixed tweets. Malmasi and Zampieri (2017) used a 3-way classification scheme distinguishing hate speech, offensive language that is not hate speech, and non-offensive language. As aggression levels are highly predictive of offensive language but not of the hate speech category, we used a binary classification scheme. However, annotators faced difficulty in differentiating between a personal attack full of hatred and a community being targeted. An example:

 10. Tweet: 'ab itni taklif hai to atmadaah kyo nahi kar lete notebandi k khilf. Delhi walo ko bhi mukti milegi tumse'

     Translation: 'If you have such a huge issue with it, why don't you perform self-immolation? The people of Delhi would also get freedom from you.'

     In tweet 10, the author is referring to Arvind Kejriwal, who is the leader of the opposition party AAP and also the Chief Minister of Delhi (the capital of India). The author suggests Kejriwal should kill himself to free the residents of Delhi. In the process of supporting the decision of demonetisation, the author of the tweet is making extreme and graphic suggestions towards one of the main opponents of the target issue.

Emotion classification in text is widely understood as lying across two orthogonal dimensions: valence (polarity of emotion) and arousal (intensity of emotion) (Russell and Barrett, 1999). Despite that, many works on emotion classification in text have generally used directly annotated 6 emotion categories (happy, sad, anger, fear, disgust, surprise) instead of first annotating arousal and valence separately before mapping them into emotion categories. We restricted the scope of this project to analyzing only the emotion arousal level, as emotion valence is analogous to sentiment. For emotion arousal, Bradley and Lang (1999) averaged annotations on a 9-point scale, and Mohammad (2018) used a Best-Worst scale to obtain fine-grained scores. Similar to the SemEval 2017 task (Rosenthal et al., 2019) for sentiment analysis on Twitter, we use a 5-point ordinal scale (Very Low, Low, Neutral, High, Very High) for the emotion arousal level.

4. Data Statistics and Analysis

Table 1 presents tweet-level average statistics on the corpus. The tweets in the dataset contain mostly Hindi-language
tokens (written in Roman script rather than Devanagari). A total of 119 tweets had discernible code-mixing (3 or more English words). As our tweets were sampled from the dataset by Swami et al. (2018b), who had referred to their dataset as code-mixed, we continue to refer to it that way. Subsequent model building on this corpus would benefit from special handling for the token-level spelling differences that come with Devanagari to Latin script switching for Hindi.

        Avg. # tokens          21.1
        Avg. # tokens (EN)      1.0
        Avg. # tokens (HI)     16.9
        Avg. # tokens (Rest)    3.2

        Table 1: Tweet Level Statistics

Table 2 has the corpus-wide statistics across the various phenomena annotated. There is a significant skew towards favourable stance in the corpus. To accommodate for this imbalance, subsequent analyses of phenomena with respect to stance contain marginal class percentage statistics, for example the percentage of sarcastic tweets in favour of the issue with respect to the total number of tweets favourable to the issue. Another point to note is the very low number of hate speech instances. This could be attributed to the stringent guideline that only directed abusive attacks on specific groups/communities are to be regarded as hate speech. Annotations with looser guidelines, where personal offensive language against individuals is also considered hate speech, would correlate highly with the overt aggression category. Since we annotated tweets regarding a polarizing legislation, it was expected that a fair amount would display aggression (either covert or overt). The same observation is evident from the statistics.

        Task      Category    # Tweets
        Stance    Favour        583
                  Against       180

4.1. Annotation Agreement

We used Fleiss's kappa to measure inter-annotator agreement on the categorical annotation tasks; the results are given in table 3. Due to the clearly polarizing nature of the issue at hand, annotations for stance were of very high correlation. Hate speech annotations had the worst kappa score, which can be attributed to differing interpretations of what constitutes a personal abusive attack. For figurative language use, the annotations for puns and word-play showed higher correlation, as can be expected due to their apparentness in surface forms. Annotations for sarcasm / irony / rhetorical questions, while still showing high agreement, had a lower agreement rate than both metaphors / similes and puns / word-play. This can be attributed to the generally greater subjective nature of sarcasm as well as it being a more context-dependent phenomenon than metaphor or word-play.

        Task                                      Fleiss's kappa
        Stance                                         0.84
        Aggression                                     0.62
        Hate Speech                                    0.47
        Sarcasm / Irony / Rhetorical Questions         0.61
        Puns / Word-play                               0.72
        Metaphors / Similes                            0.65

Table 3: Fleiss's kappa score on multiple annotations across dimensions

        Spearman's Rank Correlation (Emotion Arousal)
        Annotator        2        3
            1          0.655    0.652
            2                   0.64

Table 4: Spearman correlation on emotion arousal annotations across annotator pairs

Table 4 gives the Spearman's rank correlation coefficient across the 3 annotators for emotional arousal, which was rated on an ordinal scale of 1 to 5. Annotating for emotion is a fairly difficult task, and annotating for only the arousal dimension even more so. However, we achieve a decent average correlation of 0.65, which can be attributed to the fact that these tweets were sampled for a polarizing issue with clearly apparent emotional states (high arousal emotions like anger as well as low arousal emotions like sadness). For each pair of annotators, the results of emotional arousal agreement were statistically significant with p-values
MEDIA NEWS AUR TV SAB SAALE CHOR AUE                           puns and we see that it does not follow the same pattern
     HARAMKHOR HAI. NOTEBANDI’                                      with respect to stance. The scope of this work is limited
                                                                    to a single issue and it would be interesting to note if
     Translation: ’Mr. Ravish, businessmen are hon-                 these trends are observed across datasets. A dataset of
     est and respect the law. But media, news and TV                annotations of multiple issues would allow for hypothesis
     (personalities) are thieves and bastards.’                     testing to validate these trends.

     Glosses: ”Ravish”:       Refering to news anchor               Finally in table 8, statistics for emotion arousal are pre-
     Ravish Kumar                                                   sented across stance classes. Opposed to prior analyzed
                                                                    phenomena (hate speech, aggression and figurative lan-
     Tweet 11, defends integrity of businessmen while               guage use), the data for emotion arousal is ordinal on a 1 to
     attacking and name calling news personalities.                 5 scale. The average emotion arousal for favourable stance
                                                                    (majority class) is much more than that in against stance
                                                                    (minority class). Similarly, looking at the very high arousal
      Stance     Marginal Class % of Hate Speech                    state bucket of 5 emotion arousal (when all three annota-
      Favour                 2.92%                                  tors gave a 5 rating), the percentage for majority stance
      Against                 2.2%                                  (favourable) is three times than that for minority stance
      Neutral                3.36%                                  (against). These findings are in line with the observations
                                                                    for other phenomena like overt aggression and figurative
    Table 5: Distribution of hate speech across stance              language use in the majority stance. The higher percentage
                                                                    of lowest arousal state tweets when against the issue must
                                                                    also be noted. These lowest arousal tweets correspond to
Table 6 gives the distribution of aggression categories
                                                                    emotions like depression and sadness.
(covert / overt / non) across stance. It is interesting to
note the comparisons for overt vs. covert aggression when
in favour (majority population stance in this sample) as                    5.   Conclusion and Future Work
opposed to against (minority population in this sample)
                                                                    This research was motivated by the need to provide a
on the issue. Although covert aggression evidence is
                                                                    ground-work for analysis of the nuances of opinion on
always more than overt aggression evidence across stance
                                                                    social media with respect to aggression and figurative lan-
categories, the difference is much lesser for favourable
                                                                    guage use. The observed correlations are encouraging and
stance samples. It is not difficult to hypothesise that
                                                                    call for a deeper analysis of these social dynamics. Testing
holding a majority stance on issues will lead to open
                                                                    for statistical significance along with corpus-linguistic
bullying in a lot of cases. Users in minority tend to be
                                                                    analysis of informative words for each category was
more covert to possibly avoid being bullied by the majority
                                                                    beyond our current scope. The first aim would be to create
group. Though validating this social hypothesis based on
                                                                    similar corpora on wide variety of issues (not limited to
analysis of multiple issues is beyond our current scope.
                                                                    political debate) to evaluate the consistency of these trends
                                                                    and determine significance of our findings.
Table 7 presents the distributions of figurative language
use across stance classes. It is evident from the data of
against issue category, the usage of all types of figurative        Though the scope of this project was limited to corpus
language is consistently high. It should also be noted              creation and analysis of interactions across phenomena,
that evidence for sarcasm is especially higher in against           the larger goal is to allow for better classification systems
issue opinion (minority stance in this dataset). Keeping in         on social media data. An immediate goal is to build
mind the observations on covert aggression when voicing             baseline models and analyze their performance on the
minority stance, it can be noted that covert aggression is          different phenomena annotated in this corpus. It would be
expressed through figurative language like sarcasm and              interesting to compare performance of models that directly
puns. Metaphors are not as disguised as sarcasm and                 model a single dimension with those models that have cas-
                                                                    caded or joint modeling on multiple dimensions. Another
                                                                    avenue we would like to explore is semi-supervised label
 Stance     Aggression     Marginal Class % Aggression              propagation utilizing both larger corpora on a single di-
              Overt                   8.3%                          mension such as sarcasm as well as this corpus containing
 Against      Covert                  40%                           multi-dimensional annotations. Having a single corpus of
              None                   51.7%                          annotations across dimensions has allowed the possibility
              Overt                  17.8%                          to explore transfer learning strategies in classification.
  Favour      Covert                 23.8%
              None                   58.3%                          For the sake of keeping this breadth-wise annotation ef-
              Overt                   8.8%                          fort manageable, we annotated for a 1001 tweets. We plan
 Neutral      Covert                 22.3%                          to extend this dataset to all 3500 tweets from the original
              None                   68.9%                          dataset created by Swami et al. (2018b). We further plan to
                                                                    annotate these tweets for named entities as well as 6 emo-
    Table 6: Distribution of aggression across stance               tion classes similar to Vijay et al. (2018).

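As a concrete companion to the agreement figures discussed in Section 4.1, the following is a minimal pure-Python sketch of Fleiss' kappa for a fixed number of raters per item. The toy ratings matrix (three raters labelling four tweets as favour / against / neutral) is illustrative only and is not drawn from the corpus.

```python
# Minimal Fleiss' kappa; table[i][k] counts the raters who assigned
# item i to category k. Assumes every item has the same rater count.

def fleiss_kappa(table):
    n_items = len(table)
    n_raters = sum(table[0])
    # Observed agreement per item, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_items
    # Chance agreement from marginal category proportions.
    totals = [sum(row[k] for row in table) for k in range(len(table[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Three raters, four tweets, categories: favour / against / neutral.
ratings = [
    [3, 0, 0],   # unanimous "favour"
    [0, 3, 0],   # unanimous "against"
    [2, 1, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(ratings), 3))  # → 0.467
```

A kappa of 1.0 indicates perfect agreement and 0.0 agreement at chance level; the mixed toy labels above land in between.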
 Stance      Sarcasm / Irony / Rhetorical Question       Pun / Word-play                  Metaphor / Simile
             Raw Count    Marginal Class %               Raw Count    Marginal Class %    Raw Count    Marginal Class %
 Against     45           25%                            30           16.7%               35           19.4%
 Favour      76           13%                            84           14.4%               123          21.1%
 Neutral     42           17.6%                          26           10.9%               31           13%

                                 Table 7: Distribution of figurative language across stance

 Stance      Marginal Class % by Emotional Arousal bucket                        Class Avg.
             1-2        2-3        3-4        4-5        5
 Favour      3.6%       23.67%     49.91%     18.01%     4.8%                    3.26
 Against     13.3%      26.67%     47.78%     10.56%     1.67%                   2.91
 Neutral     10.9%      36.97%     44.12%     6.72%      1.3%                    2.83

                              Table 8: Marginal distribution of emotional arousal across stance
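The pairwise annotator correlations behind the average Spearman correlation of 0.65 reported in Section 4.1 can be sketched as below: Spearman's rho is Pearson correlation over tie-corrected average ranks. The 1-to-5 arousal ratings in the example are made up for illustration, not taken from the annotations.

```python
# Spearman rank correlation for two annotators' ordinal ratings,
# with ties assigned the average of their 1-based ranks.

def _avg_ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(a, b):
    ra, rb = _avg_ranks(a), _avg_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical 1-5 arousal ratings from two annotators.
annot_1 = [5, 3, 4, 2, 5, 1, 3, 4]
annot_2 = [4, 3, 5, 1, 5, 2, 3, 4]
print(round(spearman_rho(annot_1, annot_2), 2))
```

Identical rankings yield 1.0 and exactly reversed rankings yield -1.0, so pairwise values near 0.65 indicate substantial but imperfect agreement on the arousal ordering.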

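The "Marginal Class %" columns of Tables 5 through 8 are percentages normalised within each stance class. A small sketch of that computation, over toy (stance, aggression) pairs that are illustrative rather than taken from the corpus:

```python
# Marginal class percentages: for each stance, the share of each
# label among the tweets holding that stance.

from collections import Counter, defaultdict

def marginal_percentages(pairs):
    """pairs: iterable of (stance, label) tuples.
    Returns {stance: {label: percentage within that stance}}."""
    counts = defaultdict(Counter)
    for stance, label in pairs:
        counts[stance][label] += 1
    return {
        stance: {label: 100.0 * n / sum(c.values()) for label, n in c.items()}
        for stance, c in counts.items()
    }

annotations = [
    ("favour", "overt"), ("favour", "covert"), ("favour", "none"),
    ("favour", "none"), ("against", "covert"), ("against", "covert"),
    ("against", "none"), ("against", "overt"),
]
print(marginal_percentages(annotations)["favour"]["none"])  # → 50.0
```

Each stance row therefore sums to 100%, which is why the per-stance figures in Tables 6 and 8 can be compared directly despite the class imbalance between favour and against.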
            6.     Bibliographical References

Augenstein, I., Rocktäschel, T., Vlachos, A., and Bontcheva, K. (2016). Stance detection with bidirectional conditional encoding. arXiv preprint arXiv:1606.05464.
Badjatiya, P., Gupta, S., Gupta, M., and Varma, V. (2017). Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 759–760.
Bamman, D. and Smith, N. A. (2015). Contextualized sarcasm detection on Twitter. In Ninth International AAAI Conference on Web and Social Media.
Baron, R. A. and Richardson, D. R. (2004). Human aggression. Springer Science & Business Media.
Bohra, A., Vijay, D., Singh, V., Akhtar, S. S., and Shrivastava, M. (2018). A dataset of Hindi-English code-mixed social media text for hate speech detection. In Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pages 36–41.
Bradley, M. M. and Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.
Burnap, P. and Williams, M. L. (2015). Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223–242.
Culpeper, J. (2011). Impoliteness: Using Language to Cause Offence. Cambridge University Press.
Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media.
Gafaranga, J. (2007). Code-switching as a conversational strategy. Handbook of Multilingualism and Multilingual Communication, 5(279):17.
Ghosh, A., Li, G., Veale, T., Rosso, P., Shutova, E., Barnden, J., and Reyes, A. (2015). SemEval-2015 task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 470–478.
Khandelwal, A., Swami, S., Akhtar, S. S., and Shrivastava, M. (2018). Humor detection in English-Hindi code-mixed social media content: Corpus and baseline system. arXiv preprint arXiv:1806.05513.
Krejzl, P., Hourová, B., and Steinberger, J. (2017). Stance detection in online discussions. CoRR, abs/1701.00504.
Kumar, R., Reganti, A. N., Bhatia, A., and Maheshwari, T. (2018). Aggression-annotated corpus of Hindi-English code-mixed data. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
Liu, C., Li, W., Demarest, B., Chen, Y., Couture, S., Dakota, D., Haduong, N., Kaufman, N., Lamont, A., Pancholi, M., et al. (2016). IUCL at SemEval-2016 task 6: An ensemble model for stance detection in Twitter. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 394–400.
Malmasi, S. and Zampieri, M. (2017). Detecting hate speech in social media. arXiv preprint arXiv:1712.06427.
Miller, T., Hempelmann, C., and Gurevych, I. (2017). SemEval-2017 task 7: Detection and interpretation of English puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 58–68, Vancouver, Canada. Association for Computational Linguistics.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., and Cherry, C. (2016). SemEval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41.
Mohammad, S. M., Sobhani, P., and Kiritchenko, S. (2017). Stance and sentiment in tweets. ACM Transactions on Internet Technology, 17(3).
Mohammad, S. (2018). Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 174–184.
Rosenthal, S., Farra, N., and Nakov, P. (2019). SemEval-2017 task 4: Sentiment analysis in Twitter. arXiv preprint arXiv:1912.00741.
Russell, J. A. and Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5):805.
Schmidt, A. and Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10.
Shutova, E. and Teufel, S. (2010). Metaphor corpus annotated for source-target domain mappings. In LREC, volume 2, pages 2–2.
Swami, S., Khandelwal, A., Singh, V., Akhtar, S. S., and Shrivastava, M. (2018a). A corpus of English-Hindi code-mixed tweets for sarcasm detection. CoRR, abs/1805.11869.
Swami, S., Khandelwal, A., Singh, V., Akhtar, S. S., and Shrivastava, M. (2018b). An English-Hindi code-mixed corpus: Stance annotation and baseline system. CoRR, abs/1805.11868.
Vijay, D., Bohra, A., Singh, V., Akhtar, S. S., and Shrivastava, M. (2018). Corpus creation and emotion prediction for Hindi-English code-mixed social media text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 128–135.
Wallace, B. C., Kertz, L., Charniak, E., et al. (2014). Humans require context to infer ironic intent (so computers probably do, too). In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 512–516.
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., and Kumar, R. (2019a). Predicting the type and target of offensive posts in social media. arXiv preprint arXiv:1902.09666.
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., and Kumar, R. (2019b). SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 75–86.
