Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media
Lixing Zhu1, Zheng Fang1, Gabriele Pergola1, Rob Procter1,2, Yulan He1,2
1 Department of Computer Science, University of Warwick, UK
2 The Alan Turing Institute, UK
{lixing.zhu, z.fang.4, gabriele.pergola.1, rob.procter, yulan.he}@warwick.ac.uk

Abstract

Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADET. A variational autoencoding architecture based on language models is employed to learn the topical information of the domain from unlabelled data. Then, the model is fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADET on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADET is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering. Our source code and dataset are available at http://github.com/somethingx1202/VADet.

[Figure 1 example tweets. Top:
"The AstraZeneca one is rough for up to 48 hours; after that you may still be a bit swollen but you'll basically feel fine. I've had that and the virus, and the vaccine is far less unpleasant."
"Have felt for the past 24 hours that I've been run over by three double decker buses after the AstraZeneca vaccine yesterday morning. Starting to feel a little normal now but it's not been nice!"
Bottom:
"This is quite baffling. I got my second Pfizer vaccine last week and I have gone totally off chocolate! As side effects go, it's not so bad."
"There are some very interesting ties between this vaccines creators and the eugenics movement which is concerning considering it's mainly been promoted as a vaccine for poor folks in the third world."]

Figure 1: Top: Expressions of aspects entangled with expressions of opinions. Bottom: Vaccine attitudes can be expressed towards a wide range of aspects/topics relating to vaccination, making it difficult to pre-define a set of aspect labels, as opposed to the corpora typically used for aspect-based sentiment analysis.

1 Introduction

The aim of vaccine attitude detection in social media is to extract people's opinions towards vaccines by analysing their online posts. This is closely related to aspect-based sentiment analysis, in which both aspects and the related sentiments need to be identified. Previous research has largely focused on product reviews and relied on aspect-level sentiment annotations to train models (Barnes et al., 2021), where aspect-opinions are extracted as triples (Peng et al., 2020), polarized targets (Ma et al., 2018) or sentiment spans (He et al., 2019). However, for the task of vaccine attitude detection on Twitter, such a volume of annotated data is barely available (Kunneman et al., 2020; Paul et al., 2021). This scarcity of data is compounded by the diversity of attitudes, making it difficult for models to identify all aspects discussed in posts (Morante et al., 2020).

As representative examples, consider the two tweets about personal experiences of vaccination at the top of Figure 1. The two tweets, despite addressing a common aspect (vaccine side-effects), express opposite stances towards vaccines. However, the aspect and the stances are so fused together that the whole of each tweet needs to be considered to derive the proper labels, making it difficult to disentangle them using existing methodologies. Additionally, in the case of vaccine attitude analysis, there is a wide variety of possible aspects discussed in posts, as shown at the bottom of Figure 1, where one tweet ironically addresses vaccine side-effects and the second expresses specific political concerns. This is different from traditional aspect-based sentiment analysis on product reviews, where only a small number of aspects need to be pre-defined.
The recently developed framework integrating the Variational Auto-Encoder (VAE) (Kingma and Welling, 2014) and Independent Component Analysis (ICA) (Khemakhem et al., 2020) sheds light on this problem. VAE is an unsupervised method that can be used to glean the information that must be retained from the vaccine-related corpus. Meanwhile, a handful of annotations would induce the separation of independent factors, following the ICA requirement for prior knowledge and inductive biases (Hyvarinen et al., 2019; Locatello et al., 2020a,b). To this end, we can disentangle the latent factors that are specific either to the aspect or to the stance, and improve the quality of the latent semantics learned from unannotated data.

We frame the problem of vaccine attitude detection as a joint aspect span detection and stance classification task, assuming that a tweet, which is limited to 280 characters, would usually discuss only one aspect. In particular, we extend a pre-trained language model (LM) by adding a topic layer, which aims to model the topical theme discussed in a tweet. In the absence of annotated data, the topic layer is trained to reconstruct the input message, built on a VAE. Given annotated data, where each tweet is annotated with an aspect span and a stance label, the learned topic can be disentangled into a stance topic and an aspect topic. The stance topic is used to predict the stance label of the given tweet, while the aspect topic is used to predict the starting and ending positions of the aspect span. By doing so, we can effectively leverage both unannotated and annotated data for model training.

To evaluate the effectiveness of our proposed model for vaccine attitude detection on Twitter, we have collected over 1.9 million tweets relating to COVID vaccines between February and April 2021. We have further annotated 2,800 tweets with both aspect spans and stance labels. In addition, we have also used an existing Vaccination Corpus1 in which 294 documents related to the online vaccination debate have been annotated with opinions towards vaccination. Our experimental results on both datasets show that the proposed model outperforms an existing opinion triple extraction model and a BERT QA model on both aspect span extraction and stance classification. Moreover, the learned latent aspect topics allow the clustering of user attitudes towards vaccines, facilitating easier discovery of positive and negative attitudes in social media.

1 https://github.com/cltl/VaccinationCorpus

The contributions of this work can be summarised as follows:

• We have proposed a novel semi-supervised approach for joint latent stance/aspect representation learning and aspect span detection;

• The developed disentangled representation learning facilitates better attitude detection and clustering;

• We have constructed an annotated dataset for vaccine attitude detection.

2 Related Work

Our work is related to three lines of research: aspect-based sentiment analysis, disentangled representation learning, and vaccine attitude detection.

Aspect-Based Sentiment Analysis (ABSA) aims to identify aspect terms and their polarities in text. Much work has focused on this task. The techniques used include Conditional Random Fields (CRFs) (Marcheggiani et al., 2014), Bidirectional Long Short-Term Memory networks (BiLSTMs) (Baziotis et al., 2017), Convolutional Neural Networks (CNNs) (Zhang et al., 2015b), Attention Networks (Yang et al., 2016; Pergola et al., 2021b), DenseLSTMs (Wu et al., 2018), NestedLSTMs (Moniz and Krueger, 2017), Graph Neural Networks (Zhang et al., 2019) and their combinations (Wang et al., 2018; Zhu et al., 2021; Wan et al., 2020), to name a few.

Zhang et al. (2015a) framed this task as text span detection, where they used text spans to denote aspects. The same annotation scheme was employed in (Li et al., 2018b), where intra-word attentions were designed to enrich the representations of aspects and predict their polarities. Li et al. (2018c) formalized the task as a sequence labeling problem under a unified tagging scheme. Their follow-up work (Li et al., 2019) explored BERT for end-to-end ABSA. Peng et al. (2020) modified this task by introducing opinion terms to shape the polarity. A similar modification was made in (Zhao et al., 2020) to extract aspect-opinion pairs. Position-aware tagging was introduced to encode the offset between the aspect span and the opinion term (Xu et al., 2020). More recently, instead of using pipeline approaches or sequence tagging, Barnes et al. (2021) adapted syntactic dependency parsing to perform aspect and opinion expression extraction, and polarity classification, thus formalizing the task as structured sentiment analysis.
Disentangled representation learning Deep generative models learn the hidden semantics of text, and in the context of NLP many attempt to capture independent latent factors to steer the generation of text (Hu et al., 2017; Li et al., 2018a; Pergola et al., 2019; John et al., 2019; Li et al., 2020). The majority of the aforementioned work employs VAEs (Kingma et al., 2014) to learn controllable factors, leading to an abundance of VAE-based models in disentangled representation learning (Higgins et al., 2017; Burgess et al., 2018; Chen et al., 2018). However, previous studies show that unsupervised learning of disentanglement by optimising the marginal likelihood in a generative model is impossible (Locatello et al., 2019). While it is also the case that non-linear ICA is unable to uncover the true independent factors, Khemakhem et al. (2020) established a connection between these two strands of work, which is of particular interest to us, since the proposed framework learns to approximate the true factorial prior given a few examples, recovering a disentangled latent variable distribution on top of additionally observed variables. In this paper, stance labels and aspect spans are additionally observed on a handful of data, which can be used as inductive biases that make disentanglement possible.

Vaccine attitude detection Very little literature exists on attitude detection for vaccination. In contrast, there is growing interest in Covid-19 corpus construction (Shuja et al., 2021). Of particular interest to us, Banda et al. (2021) built an on-going tweet dataset that traces the development of Covid-19 using 3 keywords: "coronavirus", "2019nCoV" and "corona virus". Hussain et al. (2021) utilized hydrated tweets from the aforementioned corpus to analyze sentiment towards vaccination. They used lexicon-based methods (i.e., VADER and TextBlob) and pre-trained BERT to classify the sentiment in order to gain insights into temporal sentiment trends. A similar approach was proposed in (Hu et al., 2021). Lyu et al. (2021) employed a topic model to discover vaccine-related themes in Twitter discussions and performed sentiment classification using lexicon-based methods. However, none of the work above constructed datasets about vaccine attitudes, nor did they train models to detect attitudes. Morante et al. (2020) built the Vaccination Corpus (VC) with events, attributions and opinions annotated in the form of text spans, which is the only dataset available to us to perform attitude detection.

3 Methodology

The goal of our work is to detect the stance expressed in a tweet (i.e., 'pro-vaccination', 'anti-vaccination', or 'neutral'), identify a text span that indicates the concerning aspect of vaccination, and cluster tweets into groups that share similar aspects. To this end, we propose a novel latent representation learning model that jointly learns a stance classifier and disentangles the latent variables capturing stance and aspect respectively. Our proposed Vaccine Attitude Detection (VADET) model is first trained on a large amount of unannotated Twitter data to learn latent topics via masked Language Model (LM) learning. It is then fine-tuned on a small amount of Twitter data annotated with stance labels and aspect text spans for simultaneous stance classification and aspect span start/end position detection. The rationale is that the inductive bias imposed by the annotations would encourage the disentanglement of latent stance topics and aspect topics. In what follows, we present our proposed VADET model, first under masked LM learning and later extended to the supervised setting for learning disentangled stance and aspect topics.

[Figure 2 diagram: a VAE topic layer is inserted between the lower ALBERT encoder layers 1-6 (the VAE encoder, producing q_φ(z|ψ(w))) and the higher ALBERT encoder layers 7-12 (the VAE decoder θ, computing p_θ(w|z, ψ(w))), with prior z ∼ N(0, I); the masked input '[CLS] Very [MASK] to those at Oxford…' is reconstructed as '[CLS] Very grateful to those at Oxford…'.]

Figure 2: VADET in masked language model learning. The latent variables are encoded via the topic layers incorporated into the masked language model.

VADET in masked LM learning We insert a topic layer into a pre-trained language model such as ALBERT, as shown in Figure 2, allowing the network to leverage pre-trained information while being fine-tuned on an in-domain corpus. We assume that there is a continuous latent variable z involved in the language model to reconstruct the original text from the masked tokens. We retain the weights of the language model and learn the latent representation during fine-tuning.
[Figure 3 diagram: the model takes the input '[CLS] Very grateful to those at Oxford. I've got my first #Covid19 vaccine.' with some tokens masked, and produces a stance label (via softmax over the reconstructed [CLS] representation), masked LM reconstructions, and aspect span start/end positions (via an MLP and softmax over the reconstructed token representations). The topic layer encodes z_s and z_w on the left (whole input) and z_a on the right (aspect span only), with priors z_a ∼ N(0, I) and z_s ∼ N(0, I).]

Figure 3: VADET in supervised learning. The text segment highlighted in blue is the annotated aspect span. The right part learns the latent aspect topic z_a from the aspect text span [w_a : w_b] only, under masked LM learning. The left part jointly learns the latent stance topic z_s and the latent aspect topic z_w from the whole input text, and is trained simultaneously for stance classification and aspect start/end position detection.

More concretely, the topic layer partitions the language model into lower layers and higher layers, denoted as ψ and θ, respectively. The lower layers constitute the Encoder, which parameterizes the variational posterior distribution denoted as q_φ(z|ψ(w)), while the higher layers reconstruct the input tokens and are referred to as the Decoder.

The objective of the VAE is to minimize the KL-divergence between the variational posterior distribution and the true posterior. This is equivalent to maximizing the Evidence Lower BOund (ELBO), expressed as:

    E_{q_φ(z|ψ(w))}[log p_θ(w^H | z, ψ(w))] − KL[q_φ(z|ψ(w)) || p(z)],   (1)

where q_φ(z|ψ(w)) is the encoder and p_θ(w^H | z, ψ(w)) is the decoder. Here, w = [w_[CLS], w_{1:n}], since the special classification embedding w_[CLS] is automatically prepended to the input sequence (Devlin et al., 2019), and w^H denotes the reconstructed input.

Following (Kingma and Welling, 2014), we choose a standard Gaussian distribution as the prior, denoted as p(z), and the diagonal Gaussian distribution z ∼ N(μ_φ(ψ(w)), σ_φ²(ψ(w))) as the variational distribution. The decoder computes the probability of the original token given the latent variable sampled from the Encoder. We use the Memory Scheme (Li et al., 2020) to concatenate z and ψ(w), making the latent representation compatible with the higher layers of the language model. Then the latent representation z is passed to θ to reconstruct the original text.
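To make this concrete, below is a minimal PyTorch-style sketch of such a topic layer (our illustration, not the released implementation; the mean-pooling of ψ(w), the single memory slot, and the hidden/latent sizes are assumptions):

    import torch
    import torch.nn as nn

    class TopicLayer(nn.Module):
        """Topic layer inserted between the lower LM layers (encoder, psi)
        and the higher LM layers (decoder, theta). Produces a diagonal
        Gaussian posterior q_phi(z | psi(w)) and the KL term of Eq. 1."""

        def __init__(self, hidden_size=768, latent_size=64):
            super().__init__()
            self.mu = nn.Linear(hidden_size, latent_size)
            self.logvar = nn.Linear(hidden_size, latent_size)
            # Memory scheme (Li et al., 2020): project z back to the
            # hidden size so it can be concatenated with psi(w).
            self.memory = nn.Linear(latent_size, hidden_size)

        def forward(self, psi_w):
            # psi_w: [batch, seq_len, hidden], hidden states of the lower layers.
            pooled = psi_w.mean(dim=1)                    # sentence-level summary
            mu, logvar = self.mu(pooled), self.logvar(pooled)
            # Reparameterisation trick: z = mu + sigma * eps.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            # Closed-form KL[q_phi(z|psi(w)) || N(0, I)], summed over dimensions.
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
            # Prepend the latent memory slot; the result is fed to theta,
            # whose masked-token log-likelihood gives the first term of Eq. 1.
            memory_slot = self.memory(z).unsqueeze(1)     # [batch, 1, hidden]
            return torch.cat([memory_slot, psi_w], dim=1), kl

The returned KL corresponds to the second term of Eq. 1; the first term is the masked-token log-likelihood computed by the higher layers θ on the returned sequence.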
VADET with disentanglement of aspect and stance One of the training objectives of vaccine attitude detection is to detect the text span that indicates the aspect and to predict the associated stance label. Existing approaches rely on structured annotations to indicate the boundary and dependency between the aspect span and opinion words (Xu et al., 2020; Barnes et al., 2021), or use a two-stage pipeline to detect the aspect span and the associated opinion separately (Peng et al., 2020). The problem is that the opinion expressed in a tweet and the aspect span often overlap. To mitigate this issue, we instead separate the stance and the aspect in their representations in the latent semantic space, that is, we disentangle the latent topics learned by VADET into latent stance topics and latent aspect topics. A recent study in disentangled representation learning (Locatello et al., 2019) shows that unsupervised learning of disentangled representations is theoretically impossible from i.i.d. observations without inductive biases, such as grouping information (Bouchacourt et al., 2018) or access to labels (Locatello et al., 2020b; Träuble et al., 2021). As such, we extend our model to a supervised setting in which the disentanglement of the latent vectors can be trained on annotated data.
Figure 3 outlines the overall structure of VADET in the supervised setting. On the right hand side, we show VADET learned from the annotated aspect text span [w_a : w_b] under masked LM learning. The latent variable z_a encodes the hidden semantics of the aspect expression. We posit that the aspect span is generated from a latent representation with a standard Gaussian distribution as its prior. The ELBO for reconstructing the aspect text span is:

    L_A = E_{q_φ(z_a|ψ(w_{a:b}))}[log p_θ(w^H_{a:b} | z_a, ψ(w_{a:b}))] − KL[q_φ(z_a|ψ(w_{a:b})) || p(z_a)],   (2)

where w^H_{a:b} denotes the reconstructed aspect span. Ideally, the latent variable z_a does not encode any stance information and only captures the aspect mentioned in the sentence. Therefore, z_s is detached in the language model on the right hand side, and no reconstruction loss is imposed on [CLS].
On the left hand side of Figure 3, we train VADET on the whole sentence. The input to VADET is formalized as '[CLS] text'. Instead of mapping an input to a single latent variable z, as in masked LM learning of VADET, the input is now mapped to a latent variable decomposed into two components, [z_s, z_w], one for the stance and another for the aspect. We use a conditionally factorized Gaussian prior over the latent variable, z_w ∼ p_θ(z_w|w_{a:b}), which enables the separation of z_s and z_w, since the diagonal Gaussian is factorized and the conditioning variable w_{a:b} is observed.

We establish an association between z_w and z_a by specifying p_θ(z_w|w_{a:b}) to be the encoder network of q_φ(z_a|w_{a:b}), since we want the latent semantics of the aspect span to encourage the disentanglement of the attitude in the latent space. In other words, the prior of z_w is configured as the approximate posterior of z_a, to enforce the association between the disentangled aspect in the sentence and the de facto aspect. As a result, the ELBO for the original text is written as

    E_{q_φ(z_w|ψ(w))}[log p_θ(w^H | z_w, ψ(w))] − KL[q_φ(z_w|ψ(w)) || q_φ(z_w|ψ(w_{a:b}))],   (3)

where w^H denotes the reconstructed input text and z_w|w ∼ N(μ_φ(ψ(w)), σ_φ²(ψ(w))). The KL-divergence allows for some variability, since there might be some semantic drift from the original semantics when the aspect span is placed in a longer sequence.

The annotation of the stance label provides an additional input. To exploit this inductive bias, we enforce the constraint that z_s participates in the generation of [CLS], which follows an approximate posterior q_φ(z_s|ψ(w_[CLS])). We place a standard Gaussian prior over z_s ∼ N(0, I) and obtain the ELBO

    E_{q_φ(z_s|ψ(w_[CLS]))}[log p_θ(w^H_[CLS] | z_s, ψ(w_[CLS]))] − KL[q_φ(z_s|ψ(w_[CLS])) || p(z_s)].   (4)

Since the variational family in Eq. 1 consists of Gaussian distributions with diagonal covariance, the joint space of [z_s, z_w] factorizes as q_φ(z_s, z_w|ψ(w)) = q_φ(z_s|ψ(w)) q_φ(z_w|ψ(w)) (Nalisnick et al., 2016). Assuming z_w to be solely dependent on ψ(w_{1:n}), we obtain the ELBO for the entire input sequence:

    L_S = E_{q_φ(z_w)} E_{q_φ(z_s)}[log p_θ(w^H | z, ψ(w))]
          − KL[q_φ(z_w|ψ(w_{1:n})) || q_φ(z_w|ψ(w_{a:b}))]
          − KL[q_φ(z_s|ψ(w)) || p(z_s)].   (5)

Note that the expectation term can be decomposed into the expectation terms in Eq. 3 and Eq. 4 according to the decoder structure. For the full derivation, please refer to Appendix A.
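Since every distribution involved is a diagonal Gaussian, each KL term above has a closed form. A small sketch (illustrative only; variable names are assumptions):

    import torch

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        """Closed-form KL[N(mu_q, diag e^logvar_q) || N(mu_p, diag e^logvar_p)],
        summed over the latent dimensions; this is the form taken by the
        KL terms of Eqs. 3-5."""
        var_q, var_p = logvar_q.exp(), logvar_p.exp()
        return 0.5 * torch.sum(
            logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
            dim=-1,
        )

    # For example, the second KL term of Eq. 5, with the span-side
    # statistics acting as the prior (we assume gradients through them
    # are stopped, mirroring the detachment described above):
    # kl_zw = gaussian_kl(mu_full, logvar_full,
    #                     mu_span.detach(), logvar_span.detach())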
Finally, we perform stance classification and classification of the starting and ending positions of the aspect span of a tweet. We use the negative log-likelihood loss for both the stance label and the aspect span:

    L_s = − log p(y_s | w^H_[CLS]),
    L_a = − log p(y_a | MLP(w^H_{1:n})) − log p(y_b | MLP(w^H_{1:n})),

where MLP is a fully-connected feed-forward network with tanh activation, y_s is the predicted stance label, and y_a and y_b are the starting and ending positions of the aspect span. The overall training objective in the supervised setting is:

    L = L_s + L_a − L_S − L_A.
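In training code, the overall objective can be assembled directly from these four terms; a sketch under assumed names (the logits would come from the stance head over w^H_[CLS] and from the MLP over w^H_{1:n}):

    import torch.nn.functional as F

    def supervised_loss(stance_logits, start_logits, end_logits,
                        stance_label, span_start, span_end,
                        elbo_sentence, elbo_aspect_span):
        """Overall objective L = L_s + L_a - L_S - L_A (minimised):
        classification losses plus the negated ELBOs of Eq. 5 and Eq. 2,
        since maximising an ELBO equals minimising its negative."""
        l_s = F.cross_entropy(stance_logits, stance_label)       # L_s
        l_a = (F.cross_entropy(start_logits, span_start)         # L_a
               + F.cross_entropy(end_logits, span_end))
        return l_s + l_a - elbo_sentence.mean() - elbo_aspect_span.mean()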
4 Experiments

We present below the experimental setup and evaluation results.

4.1 Experimental Setup

Datasets We evaluate our proposed VADET and compare it against baselines on two vaccine attitude datasets.
VAD is our constructed Vaccine Attitude Dataset. Following (Hussain et al., 2021), we crawl tweets using the Twitter streaming API with 60 pre-defined keywords2 relating to COVID-19 vaccines (e.g., Pfizer, AstraZeneca, and Moderna). Our final dataset comprises 1.9 million English tweets collected between February 7th and April 3rd, 2021. We randomly sample a subset of tweets for annotation. Upon an initial inspection, we found that over 97% of tweets mentioned only one aspect. As such, we annotate each tweet with a stance label and a text span characterizing the aspect. In total, 2,800 tweets have been annotated, of which 2,000 are used for training and the remaining 800 for testing. The statistics of the dataset are listed in Table 1. The stance labels are imbalanced. On the other hand, the average opinion length is longer than the average aspect length and is close to the average tweet length. For the purpose of evaluating tweet clustering and latent topic disentanglement, we further annotate tweets with a categorical label indicating the aspect category. Inspired by (Morante et al., 2020), we identify 24 aspect categories3 and each tweet is annotated with one of these categories. It is worth mentioning that the aspect category labels are not used for training.

2 The full keyword list and the details of dataset construction are presented in Appendix B.
3 The full list of aspect categories is shown in Table A1.
    Specification     VAD              VC
                      Train    Test    Train   Test
    # tweets           2000     800     1162    531
    # anti-vac.         638     240      822    394
    # neutral           142      76       41     27
    # pro-vac.         1220     484      299    110
    Avg. length        33.5   34.13     29.6  30.24
    len(aspect)        17.5   18.75     1.03   1.08
    len(opinion)      27.97   29.01     3.25   3.15
    # tokens            67k   27.3k    34.4k  16.8k

Table 1: Dataset statistics. '# tweets' denotes the number of tweets in VAD, and for VC it is the number of sentences. 'anti-vac.' means anti-vaccination while 'pro-vac.' means pro-vaccination. 'Avg. length' and '# tokens' measure the number of word tokens.
VC (Morante et al., 2020) is a vaccination corpus consisting of 294 Internet documents about the online vaccine debate annotated with events, 210 of which are annotated with opinions (in the form of text spans) towards vaccines. The stance label is taken to be the stance of the whole sentence; sentences with conflicting stance labels are regarded as neutral. We split the dataset into a ratio of 2:1 for training and testing, which eventually left us with 1,162 sentences for training and 531 sentences for testing.

Baselines We compare the experimental results with the following baselines:
BertQA (Li et al., 2018c): a pre-trained language model well-suited for span detection. With BertQA, attitude detection is performed by first classifying the stance label and then predicting the answer queried by the stance label. The text span is configured as the ground-truth answer. We rely on its HuggingFace4 (Wolf et al., 2020) implementation. We employ ALBERT (Lan et al., 2020) as the backbone language model for both BertQA and VADET.
ASTE (Peng et al., 2020): a pipeline approach consisting of aspect extraction (Li et al., 2018c) and sentiment labelling (Li et al., 2018b).

4 https://huggingface.co/transformers/model_doc/albert.html#albertforquestionanswering

Evaluation Metrics For stance classification, we use accuracy and the Macro-averaged F1 score. For aspect span detection, we follow Rajpurkar et al. (2016) in adopting the exact match (EM) accuracy of the starting-ending positions and the Macro-averaged F1 score of the overlap between the predicted and ground-truth aspect spans. For tweet clustering, we follow Xie et al. (2016) and Zhang et al. (2021) and use the Normalized Mutual Information (NMI) metric to measure how well the clustered groups align with the ground-truth categories. In addition, we also report the clustering accuracy.
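For concreteness, the span-overlap F1 can be computed over token positions in the usual SQuAD style; a small sketch of such a metric (our illustration, not the paper's evaluation script):

    def span_f1(pred_start, pred_end, gold_start, gold_end):
        """Token-level overlap F1 between a predicted and a gold span,
        given inclusive start/end positions. Exact match is simply
        (pred_start, pred_end) == (gold_start, gold_end)."""
        pred = set(range(pred_start, pred_end + 1))
        gold = set(range(gold_start, gold_end + 1))
        overlap = len(pred & gold)
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        return 2 * precision * recall / (precision + recall)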
4.2 Experimental Results

In all our experiments, VADET is first pre-trained in an unsupervised way on our collected 1.9 million tweets before being fine-tuned on the annotated training set from the VAD or VC corpora.

Stance Classification and Aspect Span Detection In Table 2, we report the performance on attitude detection. In stance classification, our model outperforms both baselines, with larger improvements over ASTE. On aspect span extraction, VADET yields even more noticeable improvements, with a 2.3% increase in F1 over BertQA on VAD, and 2.7% on VC. These results indicate that successful prediction relies on the hidden representations learned in the unsupervised training. The disentanglement of stance and aspect may have also contributed to the improvement.
    Model             VAD             VC
    Stance            Acc.    F1      Acc.    F1
    BertQA            0.754   0.742   0.719   0.708
    ASTE              0.723   0.710   0.704   0.686
    VADET             0.763   0.756   0.727   0.713
    Aspect Span       Acc.    F1      Acc.    F1
    BertQA            0.546   0.722   0.525   0.670
    ASTE              0.508   0.684   0.467   0.652
    VADET             0.556   0.745   0.541   0.697
    Cluster           Acc.    NMI     Acc.    NMI
    DEC (BertQA)      0.633   58.1    0.586   52.8
    K-means (BERT)    0.618   56.4    0.571   50.1
    DEC (VADET)       0.679   60.7    0.605   54.7

Table 2: Results for stance classification, aspect span extraction and aspect clustering on both the VAD and VC corpora.

[Figure 4: two scatter plots of clustered tweets, panels (a) VADET and (b) BertQA.]

Figure 4: Clustered groups of VADET and BertQA on the VAD dataset. Each color indicates a ground-truth aspect category. The clusters are dominated by: (1) Red: the (adverse) side effects of vaccines; (2) Green: explaining personal experiences with any aspect of vaccines; and (3) Cyan: the immunity level provided by vaccines.
Clustering To assess whether the learned latent aspect topics allow a meaningful categorization of documents into attitude clusters, we perform clustering using the disentangled representations that encode aspects, i.e., z_w. Deep Embedding Clustering (DEC) (Xie et al., 2016) is employed as the backend. For comparison, we also run DEC on the aspect representations of documents returned by BertQA. For each document, its aspect representation is obtained by averaging over the fine-tuned ALBERT representations of the constituent words in its aspect span. To assess the quality of the clusters, we need the annotated aspect categories for the documents in the test set. In VAD, we use the annotated aspect labels as the ground-truth categories, whereas in VC we use the annotated event types. Results are presented in the lower part of Table 2. We found a prominent increase in NMI score over the baselines. Using the learned latent aspect topics as features, DEC (VADET) outperforms DEC (BertQA) by 4.6% and 1.9% in accuracy on VAD and VC, respectively. We also notice that using K-means directly on the BERT-encoded tweet representations gives worse results compared to DEC. A similar trend is observed.
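The clustering evaluation protocol itself is straightforward; a sketch using scikit-learn (with K-means standing in for DEC for brevity, 24 clusters matching the number of annotated VAD aspect categories, and variable names assumed):

    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    def clustering_nmi(aspect_vectors, category_labels, n_clusters=24):
        """Cluster the latent aspect representations and score the result
        against the annotated categories with NMI."""
        predicted = KMeans(n_clusters=n_clusters,
                           random_state=0).fit_predict(aspect_vectors)
        return normalized_mutual_info_score(category_labels, predicted)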
Semantic coherence is also employed as an evaluation metric for cluster quality in an unsupervised way. Recent work by Bilal et al. (2021) found that Text Generation Metrics (TGMs) align well with human judgement in evaluating clusters in the context of microblog posts. A TGM, by definition, measures the similarity between the ground-truth and the generated text; the rationale is that a high TGM score means a sentence pair is semantically similar. Here, two metrics are used: BERTScore, which calculates the similarity of two sentences as a sum of cosine similarities between their tokens' embeddings (Zhang et al., 2020), and BLEURT, a pre-trained adjudicator that fine-tunes BERT on an external dataset of human ratings (Sellam et al., 2020). As in (Bilal et al., 2021), we adopt the Exhaustive Approach, in which the coherence score of a cluster C is the average TGM score over every possible tweet pair in the cluster:

    f(C) = (1/N²) Σ_{i,j∈[1,N], i≠j} TGM(tweet_i, tweet_j).
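With BERTScore as the TGM, the coherence of a cluster can be sketched as follows (using the bert-score package; BLEURT would be substituted analogously; averaging over pairs rather than dividing by N² only rescales the score by a constant):

    from itertools import combinations
    from bert_score import score as bert_score  # pip install bert-score

    def cluster_coherence(tweets):
        """Average pairwise BERTScore F1 over all tweet pairs in a cluster
        (the Exhaustive Approach of Bilal et al., 2021)."""
        pairs = list(combinations(tweets, 2))
        candidates = [a for a, _ in pairs]
        references = [b for _, b in pairs]
        _, _, f1 = bert_score(candidates, references, lang="en", verbose=False)
        return f1.mean().item()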
[Figure 5 bar charts: semantic coherence (BERTScore and BLEURT, roughly in the 0-0.4 range) of the clusters produced by DEC (BertQA), K-means (BERT) and DEC (VADET) on the Vaccine Attitude Dataset and the Vaccination Corpus.]

Figure 5: Semantic coherence evaluated in two metrics.

[Figure 6 plots: conditional perplexity (roughly in the 600-1200 range) of the compared models on the two corpora.]

Figure 6: Conditional perplexity on two corpora.
We further adopt the language model perplexity conditioned on z_a to evaluate the extent to which the disentangled representation improves language generation on held-out data. Perplexity is widely used in the literature on text style transfer (John et al., 2019; Yi et al., 2020), where the probability of the generated language is calculated conditioned on the controlled latent code. A lower perplexity score indicates better language generation performance. Following John et al. (2019), we compute an estimated aspect vector ẑ_a^(k) of a cluster k in the training set as

    ẑ_a^(k) = ( Σ_{i∈cluster k} z_{a,i}^(k) ) / (# tweets in cluster k),

where z_{a,i}^(k) is the learned aspect vector of the i-th tweet in the k-th cluster. For the stance vector z_s, we sample one value per tweet. The stance vector is concatenated with the aspect vector ẑ_a^(k) to calculate the probability of generating the held-out data, i.e., the testing set. For the baseline models, we choose β-VAE (Higgins et al., 2017) and SCHOLAR (Card et al., 2018). We train β-VAE on the same data with β set to different values. SCHOLAR is trained on tweet content and stance labels. For both baselines, we use the ELBO on the held-out data as an upper bound on perplexity.
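The centroid estimation is a plain within-cluster average; a minimal sketch (assuming z_a is an array of per-tweet aspect vectors and cluster_ids holds the cluster assignments):

    import numpy as np

    def cluster_aspect_centroids(z_a, cluster_ids):
        """Estimate one aspect vector per cluster by averaging the learned
        aspect vectors z_a of the tweets assigned to that cluster."""
        return {k: z_a[cluster_ids == k].mean(axis=0)
                for k in np.unique(cluster_ids)}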
Figure 6 plots the perplexity scores achieved by all the methods. Our model achieves the lowest perplexity on both datasets, decreasing the perplexity value by roughly 200 compared to the baseline models. SCHOLAR outperforms β-VAE under the three settings of the β value. We speculate that this might be due to the incorporation of the class labels in the training of SCHOLAR. Nevertheless, VADET produces coherent sentences within aspect groups, with latent codes tweaked to proxy the cluster centroids, showing that the disentangled representation does capture the desired factor.

Ablations We conduct ablation studies to investigate the effects of the semi-supervised variational latent representation learning and of the aspect-stance disentanglement on the latent semantics. We study their effects on stance classification and aspect span detection. The results are reported in Table 3.

    Model         VAD             VC
    Stance        Acc.    F1      Acc.    F1
    VADET         0.763   0.756   0.727   0.713
    VADET-D       0.751   0.746   0.736   0.716
    VADET-U       0.741   0.734   0.712   0.698
    Aspect Span   Acc.    F1      Acc.    F1
    VADET         0.556   0.745   0.541   0.697
    VADET-D       0.540   0.728   0.537   0.684
    VADET-U       0.528   0.712   0.525   0.653

Table 3: Results of stance classification and aspect span detection for VADET without disentanglement (-D) or without unsupervised pre-training (-U).

We can observe that on VAD, removing either disentangled learning or unsupervised pre-training results in a degradation of stance classification performance. However, on VC, we see a slight increase in classification accuracy without disentangled learning. We attribute this to the vagueness of the stance, which might cause the model to disentangle more than it should. On the aspect span detection task, we observe a consistent performance drop across all metrics and on both datasets. In particular, without the pre-training module, the performance drops more significantly. These results indicate that semi-supervised learning is highly effective with the VAE, and that the disentanglement of stance and aspect serves as a useful component, leading to noticeable improvements.
5 Conclusions

In this work, we presented a semi-supervised model to detect user attitudes and distinguish aspects of interest about vaccines on social media. We employed a Variational Auto-Encoder to encode the main topical information into the language model by unsupervised training on a massive, unannotated dataset. The model is then further trained in a semi-supervised setting that leverages annotated stance labels and aspect spans to induce the disentanglement between stances and aspects in a latent semantic space. We empirically showed the benefits of such an approach for attitude detection and aspect clustering on two vaccine corpora. Ablation studies show that disentangled learning and unsupervised pre-training are important for effective vaccine attitude detection. Further investigations of the quality of the disentangled representations verify the effectiveness of the disentangled factors. While our current work mainly focuses on the short texts of social media, where a sentence is assumed to discuss a single aspect, it would be interesting to extend our model to deal with longer texts, such as online debates, in which multiple arguments or aspects may appear in a single sentence.

Acknowledgements

This work was funded by the UK Engineering and Physical Sciences Research Council (grant nos. EP/T017112/1, EP/V048597/1). LZ is supported by a Chancellor's International Scholarship at the University of Warwick. YH is supported by a Turing AI Fellowship funded by UK Research and Innovation (grant no. EP/V020579/1).
References

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, Ekaterina Artemova, Elena Tutubalina, and Gerardo Chowell. 2021. A large-scale covid-19 twitter chatter dataset for open scientific research—an international collaboration. Epidemiologia, 2(3):315–324.

Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal. 2021. Structured sentiment analysis as dependency graph parsing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 3387–3402. Association for Computational Linguistics.

Ricky T. Q. Chen, Xuechen Li, Roger B. Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2019. An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 504–515, Florence, Italy. Association for Computational Linguistics.

Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning basic visual concepts with a constrained variational framework. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Tao Hu, Siqin Wang, Wei Luo, Mengxi Zhang, Xiao Huang, Yingwei Yan, Regina Liu, Kelly Ly, Viraj Kacker, Bing She, and Zhenlong Li. 2021. Revealing public opinion towards covid-19 vaccines with twitter data in the united states: Spatiotemporal perspective. J Med Internet Res, 23(9):e30854.

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1587–1596. PMLR.

Amir Hussain, Ahsen Tahir, Zain Hussain, Zakariya Sheikh, Mandar Gogate, Kia Dashtipour, Azhar Ali, and Aziz Sheikh. 2021. Artificial intelligence–enabled analysis of public attitudes on facebook and twitter toward covid-19 vaccines in the united kingdom and the united states: Observational study. J Med Internet Res, 23(4):e26627.

Aapo Hyvarinen, Hiroaki Sasaki, and Richard Turner. 2019. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 859–868. PMLR.

Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2019. Disentangled representation learning for non-parallel text style transfer. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 424–434, Florence, Italy. Association for Computational Linguistics.

Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. 2020. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR.

Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, pages 3581–3589, Cambridge, MA, USA. MIT Press.

Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings.

Florian Kunneman, Mattijs Lambooij, Albert Wong, Antal Van Den Bosch, and Liesbeth Mollema. 2020. Monitoring stance towards vaccination in twitter messages. BMC medical informatics and decision making, 20(1):1–14.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao. 2020. Optimus: Organizing sentences via pre-trained modeling of a latent space. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 4678–4699. Association for Computational Linguistics.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018a. Delete, retrieve, generate: a simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1865–1874, New Orleans, Louisiana. Association for Computational Linguistics.

Xin Li, Lidong Bing, Wai Lam, and Bei Shi. 2018b. Transformation networks for target-oriented sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 946–956, Melbourne, Australia. Association for Computational Linguistics.

Xin Li, Lidong Bing, Piji Li, Wai Lam, and Zhimou Yang. 2018c. Aspect term extraction with history attention and selective transformation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4194–4200. International Joint Conferences on Artificial Intelligence Organization.

Xin Li, Lidong Bing, Wenxuan Zhang, and Wai Lam. 2019. Exploiting BERT for end-to-end aspect-based sentiment analysis. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 34–41, Hong Kong, China. Association for Computational Linguistics.

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR.

Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, and Michael Tschannen. 2020a. Weakly-supervised disentanglement without compromises. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6348–6359. PMLR.

Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. 2020b. Disentangling factors of variations using few labels. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Joanne Chen Lyu, Eileen Le Han, and Garving K Luli. 2021. Covid-19 vaccine–related discussion on twitter: topic modeling and sentiment analysis. Journal of medical Internet research, 23(6):e24435.

Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Diego Marcheggiani, Oscar Täckström, Andrea Esuli, and Fabrizio Sebastiani. 2014. Hierarchical multi-label conditional random fields for aspect-oriented opinion mining. In Advances in Information Retrieval, pages 273–285, Cham. Springer International Publishing.

Joel Ruben Antony Moniz and David Krueger. 2017. Nested LSTMs. In Proceedings of the Ninth Asian Conference on Machine Learning, volume 77 of Proceedings of Machine Learning Research, pages 530–544, Yonsei University, Seoul, Republic of Korea. PMLR.

Roser Morante, Chantal van Son, Isa Maks, and Piek Vossen. 2020. Annotating perspectives on vaccination. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4964–4973, Marseille, France. European Language Resources Association.

Eric Nalisnick, Lars Hertel, and Padhraic Smyth. 2016. Approximate inference for deep latent gaussian mixtures. In NIPS Workshop on Bayesian Deep Learning, volume 2, page 131.

Elise Paul, Andrew Steptoe, and Daisy Fancourt. 2021. Attitudes towards vaccines and intention to vaccinate against covid-19: Implications for public health communications. The Lancet Regional Health - Europe, 1:100012.

Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8600–8607.

Gabriele Pergola, Lin Gui, and Yulan He. 2019. TDAM: A topic-dependent attention model for sentiment analysis. Information Processing & Management, 56(6):102084.

Gabriele Pergola, Lin Gui, and Yulan He. 2021a. A disentangled adversarial neural topic model for separating opinions from plots in user reviews. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2870–2883. Association for Computational Linguistics.

Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, and Yulan He. 2021b. Boosting low-resource biomedical QA via entity-aware masking strategies. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1977–1985. Association for Computational Linguistics.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.

Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892. Association for Computational Linguistics.

Junaid Shuja, Eisa Alanazi, Waleed Alasmary, and Abdulaziz Alashaikh. 2021. Covid-19 open source data sets: a comprehensive survey. Applied Intelligence, 51(3):1296–1325.

Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, and Stefan Bauer. 2021. On disentangled representations learned from correlated data. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10401–10412. PMLR.

Hai Wan, Yufei Yang, Jianfeng Du, Yanan Liu, Kunxun Qi, and Jeff Z. Pan. 2020. Target-aspect-sentiment joint detection for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):9122–9129.

Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, and Guodong Zhou. 2018. Aspect sentiment classification with both word-level and clause-level attention networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4439–4445. International Joint Conferences on Artificial Intelligence Organization.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45. Association for Computational Linguistics.

Chuhan Wu, Fangzhao Wu, Sixing Wu, Junxin Liu, Zhigang Yuan, and Yongfeng Huang. 2018. THU_NGN at SemEval-2018 task 3: Tweet irony detection with densely connected LSTM and multi-task learning. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 51–56, New Orleans, Louisiana. Association for Computational Linguistics.

Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 478–487, New York, New York, USA. PMLR.

Lu Xu, Hao Li, Wei Lu, and Lidong Bing. 2020. Position-aware tagging for aspect sentiment triplet extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 2339–2349. Association for Computational Linguistics.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics.

Xiaoyuan Yi, Zhenghao Liu, Wenhao Li, and Maosong Sun. 2020. Text style transfer via learning style instance supported latent space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3801–3807. International Joint Conferences on Artificial Intelligence Organization. Main track.

Chen Zhang, Qiuchi Li, and Dawei Song. 2019. Aspect-based sentiment classification with aspect-specific graph convolutional networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4568–4578, Hong Kong, China. Association for Computational Linguistics.

Dejiao Zhang, Feng Nan, Xiaokai Wei, Shang-Wen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew O. Arnold, and Bing Xiang. 2021. Supporting clustering with contrastive learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5419–5430. Association for Computational Linguistics.

Meishan Zhang, Yue Zhang, and Duy-Tin Vo. 2015a. Neural networks for open domain targeted sentiment. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 612–621, Lisbon, Portugal. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015b. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

He Zhao, Longtao Huang, Rong Zhang, Quan Lu, and Hui Xue. 2020. SpanMlt: A span-based multi-task learning framework for pair-wise aspect and opinion terms extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3239–3248. Association for Computational Linguistics.

Lixing Zhu, Gabriele Pergola, Lin Gui, Deyu Zhou, and Yulan He. 2021. Topic-driven and knowledge-aware transformer for dialogue emotion detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1571–1582, Online. Association for Computational Linguistics.
A Derivation of the Decomposed ELBO

Unsupervised training is based on maximizing the Evidence Lower Bound (ELBO):

\[
\mathbb{E}_{q_\phi(z_s, z_w|\psi(w))}[\log p_\theta(w|z_s, z_w, \psi(w))] - \mathrm{KL}[q_\phi(z_s, z_w|\psi(w)) \,\|\, p(z_s, z_w)],
\]

where $z$ is partitioned into $z_s$ and $z_w$. As in the standard VAE (Kingma and Welling, 2014), the variational distribution is a multivariate Gaussian with a diagonal covariance:

\[
q_\phi(z_s, z_w|\psi(w)) = \mathcal{N}(z_s, z_w \,|\, \mu, \sigma^2 I),
\]

where $\mu = [\mu_s, \mu_w]$ and $\sigma = [\sigma_s, \sigma_w]$. Since the covariance matrix is diagonal, $z_s$ and $z_w$ are uncorrelated. Therefore, the joint probability decomposes into:

\[
q_\phi(z_s, z_w|\psi(w)) = q_\phi(z_s|\psi(w)) \, q_\phi(z_w|\psi(w)),
\]

where $q_\phi(z_s|\psi(w)) = \mathcal{N}(z_s|\mu_s, \sigma_s^2 I)$ and $\phi$ are the variational parameters. The prior $[z_s, z_w] \sim \mathcal{N}(z_s, z_w|0, I)$ can likewise be decomposed into the product of $p(z_s)$ and $p(z_w)$, so the KL term becomes:

\[
\mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] + \mathrm{KL}[q_\phi(z_w|\psi(w)) \,\|\, p(z_w)].
\]

As for the decoder $p_\theta(w|z_s, z_w, \psi(w))$, the reconstructions of the masked tokens and of $w_{[CLS]}$ are independent of each other, i.e., they are not predicted autoregressively. Therefore, the joint probability decomposes into:

\[
p_\theta(w|z_s, z_w, \psi(w)) = p_\theta(w_{[CLS]}|z_s, z_w, \psi(w)) \, p_\theta(w_{1:n}|z_s, z_w, \psi(w)).
\]

We customize the decoder network to make $w_{[CLS]}$ depend solely on $z_s$, and obtain

\[
\mathbb{E}_{q_\phi(z_s)} \mathbb{E}_{q_\phi(z_w)}[\log p_\theta(w_{[CLS]}|z_s, \psi(w)) + \log p_\theta(w_{1:n}|z_w, \psi(w))].
\]

Here, we omit $\psi(w)$ in the expectation subscripts for notational simplicity. Given the supervision of annotated aspect spans, the prior of $z_w$ is constrained to $q_\phi(z_w|\psi(w_{a:b}))$, i.e., the encoder applied to the aspect span $w_{a:b}$; this changes the KL term to:

\[
\mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] + \mathrm{KL}[q_\phi(z_w|\psi(w_{1:n})) \,\|\, q_\phi(z_w|\psi(w_{a:b}))],
\]

and finally the ELBO is expressed as

\[
\mathbb{E}_{q_\phi(z_s)}[\log p_\theta(w_{[CLS]}|z_s, \psi(w))] + \mathbb{E}_{q_\phi(z_w)}[\log p_\theta(w_{1:n}|z_w, \psi(w))] - \mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] - \mathrm{KL}[q_\phi(z_w|\psi(w_{1:n})) \,\|\, q_\phi(z_w|\psi(w_{a:b}))].
\]
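The decomposed objective above maps directly onto code. The following is a minimal PyTorch sketch of the final loss, not the authors' released implementation: it assumes the encoders output means and log-variances of diagonal Gaussians, and all function and argument names are illustrative.

```python
import torch

def kl_to_std_normal(mu, logvar):
    # KL[ N(mu, diag(exp(logvar))) || N(0, I) ], summed over latent dims
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

def kl_between_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL[ N(mu_q, .) || N(mu_p, .) ] for diagonal Gaussians
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
        - 1.0,
        dim=-1,
    )

def negative_elbo(rec_cls, rec_tokens,
                  mu_s, logvar_s,      # q(z_s | psi(w))
                  mu_w, logvar_w,      # q(z_w | psi(w_{1:n}))
                  mu_ab, logvar_ab):   # q(z_w | psi(w_{a:b})), aspect span
    """Loss = -(ELBO) for the decomposed objective above.
    rec_cls:    log p(w_[CLS] | z_s), one value per example
    rec_tokens: log p(w_{1:n} | z_w) summed over masked tokens, per example
    """
    kl_s = kl_to_std_normal(mu_s, logvar_s)
    kl_w = kl_between_diag_gaussians(mu_w, logvar_w, mu_ab, logvar_ab)
    return -(rec_cls + rec_tokens - kl_s - kl_w).mean()
```

Minimizing the returned value maximizes the ELBO; the `kl_w` term is what injects the aspect-span supervision, pulling the sentence-level posterior over $z_w$ towards the posterior produced by the span encoder.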
B Data Collection and Preprocessing

We are approved users of the Twitter Academic Research API^5. We obtained ethical approval for the proposed research from the university's ethics committee before the start of our work. We collected tweets between February 7th and April 3rd, 2022 using 60 vaccine-related keywords. The exhaustive list is: 'covid-19 vax', 'covid-19 vaccine', 'covid-19 vaccines', 'covid-19 vaccination', 'covid-19 vaccinations', 'covid-19 jab', 'covid-19 jabs', 'covid19 vax', 'covid19 vaccine', 'covid19 vaccines', 'covid19 vaccination', 'covid19 vaccinations', 'covid19 jab', 'covid19 jabs', 'covid vax', 'covid vaccine', 'covid vaccines', 'covid vaccination', 'covid vaccinations', 'covid jab', 'covid jabs', 'coronavirus vax', 'coronavirus vaccine', 'coronavirus vaccines', 'coronavirus vaccination', 'coronavirus vaccinations', 'coronavirus jab', 'coronavirus jabs', 'Pfizer vaccine', 'BioNTech vaccine', 'Oxford vaccine', 'AstraZeneca vaccine', 'Moderna vaccine', 'Sputnik vaccine', 'Sinovac vaccine', 'Sinopharm vaccine', 'Pfizer jab', 'BioNTech jab', 'Oxford jab', 'AstraZeneca jab', 'Moderna jab', 'Sputnik jab', 'Sinovac jab', 'Sinopharm jab', 'Pfizer vax', 'BioNTech vax', 'Oxford vax', 'AstraZeneca vax', 'Moderna vax', 'Sputnik vax', 'Sinovac vax', 'Sinopharm vax', 'Pfizer vaccinate', 'BioNTech vaccinate', 'Oxford vaccinate', 'AstraZeneca vaccinate', 'Moderna vaccinate', 'Sputnik vaccinate', 'Sinovac vaccinate', 'Sinopharm vaccinate'.

Only tweets in English were collected. Retweets were discarded. For pre-processing, hyperlinks, usernames and irregular symbols were removed. Emojis and emoticons were converted to their literal meanings using an emoticon dictionary^6.

^5 https://developer.twitter.com/en/products/twitter-api/academic-research/application-info
^6 https://wprock.fr/en/t/kaomoji/
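For concreteness, the 60 keywords follow a regular pattern (4 COVID spellings × 7 vaccine terms, plus 8 brand names × 4 terms) and can be regenerated programmatically, and the cleaning steps can be sketched alongside. This is an illustrative reconstruction, not the original pipeline: the regular expressions and the `emoticon_map` argument (standing in for the cited emoticon dictionary) are assumptions.

```python
import re

# Rebuild the 60 query keywords from their generating pattern:
# 4 COVID spellings x 7 vaccine terms + 8 brand names x 4 terms = 60.
covid_terms = ["covid-19", "covid19", "covid", "coronavirus"]
vax_terms = ["vax", "vaccine", "vaccines", "vaccination",
             "vaccinations", "jab", "jabs"]
brands = ["Pfizer", "BioNTech", "Oxford", "AstraZeneca",
          "Moderna", "Sputnik", "Sinovac", "Sinopharm"]
brand_terms = ["vaccine", "jab", "vax", "vaccinate"]

keywords = [f"{c} {v}" for c in covid_terms for v in vax_terms] + \
           [f"{b} {t}" for t in brand_terms for b in brands]
assert len(keywords) == 60

# Illustrative cleaning rules; the exact patterns used in the paper
# are not given, so these regexes are assumptions.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")

def preprocess(tweet: str, emoticon_map: dict) -> str:
    """Strip hyperlinks and usernames, then replace emoticons/emojis
    with their literal meanings via a lookup table."""
    tweet = URL_RE.sub("", tweet)
    tweet = USER_RE.sub("", tweet)
    for emoticon, meaning in emoticon_map.items():
        tweet = tweet.replace(emoticon, meaning)
    return " ".join(tweet.split())  # collapse leftover whitespace
```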
C Hyper-parameters and Training Details

The dimensions of $z_a$, $z_w$ and $z_s$ are 768, 768 and 32, respectively. For each tweet, the number of samples drawn from $\epsilon \sim \mathcal{N}(0, I)$ is 1. We modified the LM fine-tuning script^7 from the HuggingFace library to implement VADET's masked LM learning. We use default settings for the training

^7 https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py
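The appendix does not specify how the latent heads are parameterized; purely as a hypothetical sketch consistent with the reported dimensions and the single reparameterized sample per tweet, the Gaussian heads might look as follows (`LatentHeads`, `HIDDEN` and all layer names are our assumptions):

```python
import torch
import torch.nn as nn

# Latent dimensions as reported in Appendix C; HIDDEN is assumed to be
# the BERT-base hidden size used by the underlying language model.
DIM_ZA, DIM_ZW, DIM_ZS = 768, 768, 32
HIDDEN = 768

class LatentHeads(nn.Module):
    """Hypothetical Gaussian heads producing (mu, logvar) for z_s and z_w."""

    def __init__(self):
        super().__init__()
        self.mu_s = nn.Linear(HIDDEN, DIM_ZS)
        self.logvar_s = nn.Linear(HIDDEN, DIM_ZS)
        self.mu_w = nn.Linear(HIDDEN, DIM_ZW)
        self.logvar_w = nn.Linear(HIDDEN, DIM_ZW)

    @staticmethod
    def sample(mu, logvar):
        # One reparameterized sample per tweet: z = mu + sigma * eps,
        # with eps ~ N(0, I) as stated above.
        return mu + (0.5 * logvar).exp() * torch.randn_like(mu)

    def forward(self, h_cls, h_tokens):
        z_s = self.sample(self.mu_s(h_cls), self.logvar_s(h_cls))
        z_w = self.sample(self.mu_w(h_tokens), self.logvar_w(h_tokens))
        return z_s, z_w
```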