Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media
Lixing Zhu1, Zheng Fang1, Gabriele Pergola1, Rob Procter1,2, Yulan He1,2
1 Department of Computer Science, University of Warwick, UK
2 The Alan Turing Institute, UK
{lixing.zhu, z.fang.4, gabriele.pergola.1, rob.procter, yulan.he}@warwick.ac.uk

Abstract

Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADET. A variational autoencoding architecture based on language models is employed to learn the topical information of the domain from unlabelled data. Then, the model is fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADET on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADET is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering. Our source code and dataset are available at http://github.com/somethingx1202/VADet.

[Figure 1 example tweets. Top:
"The AstraZeneca one is rough for up to 48 hours; after that you may still be a bit swollen but you'll basically feel fine. I've had that and the virus, and the vaccine is far less unpleasant."
"Have felt for the past 24 hours that I've been run over by three double decker buses after the AstraZeneca vaccine yesterday morning. Starting to feel a little normal now but it's not been nice!"
Bottom:
"This is quite baffling. I got my second Pfizer vaccine last week and I have gone totally off chocolate! As side effects go, it's not so bad."
"There are some very interesting ties between this vaccines creators and the eugenics movement which is concerning considering it's mainly been promoted as a vaccine for poor folks in the third world."]

Figure 1: Top: Expressions of aspects entangled with expressions of opinions. Bottom: Vaccine attitudes can be expressed towards a wide range of aspects/topics relating to vaccination, making it difficult to pre-define a set of aspect labels, as opposed to the corpora typically used for aspect-based sentiment analysis.

1 Introduction

The aim of vaccine attitude detection in social media is to extract people's opinions towards vaccines by analysing their online posts. This is closely related to aspect-based sentiment analysis, in which both aspects and the related sentiments need to be identified. Previous research has largely focused on product reviews and relied on aspect-level sentiment annotations to train models (Barnes et al., 2021), where aspect-opinions are extracted as triples (Peng et al., 2020), polarized targets (Ma et al., 2018) or sentiment spans (He et al., 2019). However, for the task of vaccine attitude detection on Twitter, such a volume of annotated data is barely available (Kunneman et al., 2020; Paul et al., 2021). This scarcity of data is compounded by the diversity of attitudes, making it difficult for models to identify all aspects discussed in posts (Morante et al., 2020).

As representative examples, consider the two tweets about personal experiences of vaccination at the top of Figure 1. The two tweets, despite addressing a common aspect (vaccine side-effects), express opposite stances towards vaccines. However, the aspect and the stances are so fused together that the whole of each tweet needs to be considered to derive the proper labels, making it difficult to disentangle them using existing methodologies. Additionally, in the case of vaccine attitude analysis, there is a wide variety of possible aspects discussed in posts, as shown at the bottom of Figure 1, where one tweet ironically addresses vaccine side-effects and the second expresses specific political concerns. This is different from traditional aspect-based sentiment analysis on product reviews, where only a small number of aspects need to be pre-defined.
The recently developed framework integrating the Variational Auto-Encoder (VAE) (Kingma and Welling, 2014) and Independent Component Analysis (ICA) (Khemakhem et al., 2020) sheds light on this problem. VAE is an unsupervised method that can be used to glean the information that must be retained from the vaccine-related corpus. Meanwhile, a handful of annotations would induce the separation of independent factors, following the ICA requirement for prior knowledge and inductive biases (Hyvarinen et al., 2019; Locatello et al., 2020a,b). To this end, we can disentangle the latent factors that are specific either to the aspect or to the stance, and improve the quality of the latent semantics learned from unannotated data.

We frame the problem of vaccine attitude detection as a joint aspect span detection and stance classification task, assuming that a tweet, which is limited to 280 characters, would usually discuss only one aspect. In particular, we extend a pre-trained language model (LM) by adding a topic layer, which aims to model the topical theme discussed in a tweet. In the absence of annotated data, the topic layer is trained to reconstruct the input message, built on a VAE. Given annotated data, where each tweet is annotated with an aspect span and a stance label, the learned topic can be disentangled into a stance topic and an aspect topic. The stance topic is used to predict the stance label of the given tweet, while the aspect topic is used to predict the starting and ending positions of the aspect span. By doing so, we can effectively leverage both unannotated and annotated data for model training.

To evaluate the effectiveness of our proposed model for vaccine attitude detection on Twitter, we have collected over 1.9 million tweets relating to COVID vaccines between February and April 2021. We have further annotated 2,800 tweets with both aspect spans and stance labels. In addition, we have also used an existing Vaccination Corpus1 in which 294 documents related to the online vaccination debate have been annotated with opinions towards vaccination. Our experimental results on both datasets show that the proposed model outperforms an existing opinion triple extraction model and a BERT QA model on both aspect span extraction and stance classification. Moreover, the learned latent aspect topics allow the clustering of user attitudes towards vaccines, facilitating easier discovery of positive and negative attitudes in social media.

1 https://github.com/cltl/VaccinationCorpus

The contributions of this work can be summarised as follows:

• We have proposed a novel semi-supervised approach for joint latent stance/aspect representation learning and aspect span detection;

• The developed disentangled representation learning facilitates better attitude detection and clustering;

• We have constructed an annotated dataset for vaccine attitude detection.

2 Related Work

Our work is related to three lines of research: aspect-based sentiment analysis, disentangled representation learning, and vaccine attitude detection.

Aspect-Based Sentiment Analysis (ABSA) aims to identify aspect terms and their polarities in text. Much work has focused on this task. The techniques used include Conditional Random Fields (CRFs) (Marcheggiani et al., 2014), Bidirectional Long Short-Term Memory networks (BiLSTMs) (Baziotis et al., 2017), Convolutional Neural Networks (CNNs) (Zhang et al., 2015b), Attention Networks (Yang et al., 2016; Pergola et al., 2021b), DenseLSTMs (Wu et al., 2018), NestedLSTMs (Moniz and Krueger, 2017), Graph Neural Networks (Zhang et al., 2019) and their combinations (Wang et al., 2018; Zhu et al., 2021; Wan et al., 2020), to name a few.

Zhang et al. (2015a) framed this task as text span detection, where they used text spans to denote aspects. The same annotation scheme was employed in (Li et al., 2018b), where intra-word attentions were designed to enrich the representations of aspects and predict their polarities. Li et al. (2018c) formalized the task as a sequence labeling problem under a unified tagging scheme. Their follow-up work (Li et al., 2019) explored BERT for end-to-end ABSA. Peng et al. (2020) modified this task by introducing opinion terms to shape the polarity. A similar modification was made in (Zhao et al., 2020) to extract aspect-opinion pairs. Position-aware tagging was introduced to encode the offset between the aspect span and the opinion term (Xu et al., 2020). More recently, instead of using pipeline approaches or sequence tagging, Barnes et al. (2021) adapted syntactic dependency parsing to perform aspect and opinion expression extraction, and polarity classification, thus formalizing the task as structured sentiment analysis.
Disentangled representation learning Deep generative models learn the hidden semantics of text, and in the context of NLP many attempt to capture independent latent factors to steer the generation of text (Hu et al., 2017; Li et al., 2018a; Pergola et al., 2019; John et al., 2019; Li et al., 2020). The majority of the aforementioned work employs VAEs (Kingma et al., 2014) to learn controllable factors, leading to an abundance of VAE-based models in disentangled representation learning (Higgins et al., 2017; Burgess et al., 2018; Chen et al., 2018). However, previous studies show that unsupervised learning of disentanglement by optimising the marginal likelihood in a generative model is impossible (Locatello et al., 2019). While it is also the case that non-linear ICA is unable to uncover the true independent factors, Khemakhem et al. (2020) established a connection between these two strands of work, which is of particular interest to us, since the proposed framework learns to approximate the true factorial prior given a few examples, recovering a disentangled latent variable distribution on top of additionally observed variables. In this paper, stance labels and aspect spans are additionally observed on a handful of data, which can be used as inductive biases that make disentanglement possible.

Vaccine attitude detection Very little literature exists on attitude detection for vaccination. In contrast, there is growing interest in Covid-19 corpus construction (Shuja et al., 2021). Of particular interest to us, Banda et al. (2021) built an on-going tweet dataset that traces the development of Covid-19 using 3 keywords: "coronavirus", "2019nCoV" and "corona virus". Hussain et al. (2021) utilized hydrated tweets from the aforementioned corpus to analyze sentiment towards vaccination. They used lexicon-based methods (i.e., VADER and TextBlob) and pre-trained BERT to classify the sentiment in order to gain insights into temporal sentiment trends. A similar approach was proposed in (Hu et al., 2021). Lyu et al. (2021) employed a topic model to discover vaccine-related themes in Twitter discussions and performed sentiment classification using lexicon-based methods. However, none of the work above constructed datasets about vaccine attitudes, nor did they train models to detect attitudes. Morante et al. (2020) built the Vaccination Corpus (VC) with events, attributions and opinions annotated in the form of text spans, which is the only dataset available to us to perform attitude detection.

3 Methodology

The goal of our work is to detect the stance expressed in a tweet (i.e., 'pro-vaccination', 'anti-vaccination', or 'neutral'), identify a text span that indicates the concerning aspect of vaccination, and cluster tweets into groups that share similar aspects. To this end, we propose a novel latent representation learning model that jointly learns a stance classifier and disentangles the latent variables capturing stance and aspect respectively. Our proposed Vaccine Attitude Detection (VADET) model is first trained on a large amount of unannotated Twitter data to learn latent topics via masked Language Model (LM) learning. It is then fine-tuned on a small amount of Twitter data annotated with stance labels and aspect text spans for simultaneous stance classification and aspect span start/end position detection. The rationale is that the inductive bias imposed by the annotations would encourage the disentanglement of latent stance topics and aspect topics. In what follows, we present our proposed VADET model, first under masked LM learning and later extended to the supervised setting for learning disentangled stance and aspect topics.

[Figure 2 diagram: a VAE topic layer is inserted between the lower ALBERT encoder layers 1-6 (the VAE encoder, producing q_φ(z|ψ(w))) and the higher ALBERT encoder layers 7-12 (the VAE decoder θ, computing p_θ(w|z, ψ(w))), with prior z ∼ N(0, I); the masked input '[CLS] Very [MASK] to those at Oxford…' is reconstructed as '[CLS] Very grateful to those at Oxford…'.]

Figure 2: VADET in masked language model learning. The latent variables are encoded via the topic layers incorporated into the masked language model.

VADET in masked LM learning We insert a topic layer into a pre-trained language model such as ALBERT, as shown in Figure 2, allowing the network to leverage pre-trained information while being fine-tuned on an in-domain corpus. We assume that there is a continuous latent variable z involved in the language model to reconstruct the original text from the masked tokens. We retain the weights of the language model and learn the latent representation during fine-tuning.
[Figure 3 diagram: the model takes the input '[CLS] Very grateful to those at Oxford. I've got my first #Covid19 vaccine.' with some tokens masked, and produces a stance label (via softmax over the reconstructed [CLS] representation), masked LM reconstructions, and aspect span start/end positions (via an MLP and softmax over the reconstructed token representations). The topic layer encodes z_s and z_w on the left (whole input) and z_a on the right (aspect span only), with priors z_a ∼ N(0, I) and z_s ∼ N(0, I).]

Figure 3: VADET in supervised learning. The text segment highlighted in blue is the annotated aspect span. The right part learns the latent aspect topic z_a from the aspect text span [w_a : w_b] only, under masked LM learning. The left part jointly learns the latent stance topic z_s and the latent aspect topic z_w from the whole input text, and is trained simultaneously for stance classification and aspect start/end position detection.

More concretely, the topic layer partitions the language model into lower layers and higher layers, denoted as ψ and θ, respectively. The lower layers constitute the Encoder, which parameterizes the variational posterior distribution denoted as q_φ(z|ψ(w)), while the higher layers reconstruct the input tokens and are referred to as the Decoder.

The objective of the VAE is to minimize the KL-divergence between the variational posterior distribution and the true posterior. This is equivalent to maximizing the Evidence Lower BOund (ELBO), expressed as:

    E_{q_φ(z|ψ(w))}[log p_θ(w^H | z, ψ(w))] − KL[q_φ(z|ψ(w)) || p(z)],   (1)

where q_φ(z|ψ(w)) is the encoder and p_θ(w^H | z, ψ(w)) is the decoder. Here, w = [w_[CLS], w_{1:n}], since the special classification embedding w_[CLS] is automatically prepended to the input sequence (Devlin et al., 2019), and w^H denotes the reconstructed input.

Following (Kingma and Welling, 2014), we choose a standard Gaussian distribution as the prior, denoted as p(z), and the diagonal Gaussian distribution z ∼ N(μ_φ(ψ(w)), σ_φ²(ψ(w))) as the variational distribution. The decoder computes the probability of the original token given the latent variable sampled from the Encoder. We use the Memory Scheme (Li et al., 2020) to concatenate z and ψ(w), making the latent representation compatible with the higher layers of the language model. Then the latent representation z is passed to θ to reconstruct the original text.
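To make this concrete, below is a minimal PyTorch-style sketch of such a topic layer (our illustration, not the released implementation; the mean-pooling of ψ(w), the single memory slot, and the hidden/latent sizes are assumptions):

    import torch
    import torch.nn as nn

    class TopicLayer(nn.Module):
        """Topic layer inserted between the lower LM layers (encoder, psi)
        and the higher LM layers (decoder, theta). Produces a diagonal
        Gaussian posterior q_phi(z | psi(w)) and the KL term of Eq. 1."""

        def __init__(self, hidden_size=768, latent_size=64):
            super().__init__()
            self.mu = nn.Linear(hidden_size, latent_size)
            self.logvar = nn.Linear(hidden_size, latent_size)
            # Memory scheme (Li et al., 2020): project z back to the
            # hidden size so it can be concatenated with psi(w).
            self.memory = nn.Linear(latent_size, hidden_size)

        def forward(self, psi_w):
            # psi_w: [batch, seq_len, hidden], hidden states of the lower layers.
            pooled = psi_w.mean(dim=1)                    # sentence-level summary
            mu, logvar = self.mu(pooled), self.logvar(pooled)
            # Reparameterisation trick: z = mu + sigma * eps.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            # Closed-form KL[q_phi(z|psi(w)) || N(0, I)], summed over dimensions.
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
            # Prepend the latent memory slot; the result is fed to theta,
            # whose masked-token log-likelihood gives the first term of Eq. 1.
            memory_slot = self.memory(z).unsqueeze(1)     # [batch, 1, hidden]
            return torch.cat([memory_slot, psi_w], dim=1), kl

The returned KL corresponds to the second term of Eq. 1; the first term is the masked-token log-likelihood computed by the higher layers θ on the returned sequence.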
VADET with disentanglement of aspect and stance One of the training objectives of vaccine attitude detection is to detect the text span that indicates the aspect and to predict the associated stance label. Existing approaches rely on structured annotations to indicate the boundary and dependency between the aspect span and opinion words (Xu et al., 2020; Barnes et al., 2021), or use a two-stage pipeline to detect the aspect span and the associated opinion separately (Peng et al., 2020). The problem is that the opinion expressed in a tweet and the aspect span often overlap. To mitigate this issue, we instead separate the stance and the aspect in their representations in the latent semantic space, that is, we disentangle the latent topics learned by VADET into latent stance topics and latent aspect topics. A recent study in disentangled representation learning (Locatello et al., 2019) shows that unsupervised learning of disentangled representations is theoretically impossible from i.i.d. observations without inductive biases, such as grouping information (Bouchacourt et al., 2018) or access to labels (Locatello et al., 2020b; Träuble et al., 2021). As such, we extend our model to a supervised setting in which the disentanglement of the latent vectors can be trained on annotated data.
Figure 3 outlines the overall structure of VADET in the supervised setting. On the right hand side, we show VADET learned from the annotated aspect text span [w_a : w_b] under masked LM learning. The latent variable z_a encodes the hidden semantics of the aspect expression. We posit that the aspect span is generated from a latent representation with a standard Gaussian distribution as its prior. The ELBO for reconstructing the aspect text span is:

    L_A = E_{q_φ(z_a|ψ(w_{a:b}))}[log p_θ(w^H_{a:b} | z_a, ψ(w_{a:b}))] − KL[q_φ(z_a|ψ(w_{a:b})) || p(z_a)],   (2)

where w^H_{a:b} denotes the reconstructed aspect span. Ideally, the latent variable z_a does not encode any stance information and only captures the aspect mentioned in the sentence. Therefore, z_s is detached in the language model on the right hand side, and no reconstruction loss is imposed on [CLS].
On the left hand side of Figure 3, we train VADET on the whole sentence. The input to VADET is formalized as '[CLS] text'. Instead of mapping an input to a single latent variable z, as in masked LM learning of VADET, the input is now mapped to a latent variable decomposed into two components, [z_s, z_w], one for the stance and another for the aspect. We use a conditionally factorized Gaussian prior over the latent variable, z_w ∼ p_θ(z_w|w_{a:b}), which enables the separation of z_s and z_w, since the diagonal Gaussian is factorized and the conditioning variable w_{a:b} is observed.

We establish an association between z_w and z_a by specifying p_θ(z_w|w_{a:b}) to be the encoder network of q_φ(z_a|w_{a:b}), since we want the latent semantics of the aspect span to encourage the disentanglement of the attitude in the latent space. In other words, the prior of z_w is configured as the approximate posterior of z_a, to enforce the association between the disentangled aspect in the sentence and the de facto aspect. As a result, the ELBO for the original text is written as

    E_{q_φ(z_w|ψ(w))}[log p_θ(w^H | z_w, ψ(w))] − KL[q_φ(z_w|ψ(w)) || q_φ(z_w|ψ(w_{a:b}))],   (3)

where w^H denotes the reconstructed input text and z_w|w ∼ N(μ_φ(ψ(w)), σ_φ²(ψ(w))). The KL-divergence allows for some variability, since there might be some semantic drift from the original semantics when the aspect span is placed in a longer sequence.

The annotation of the stance label provides an additional input. To exploit this inductive bias, we enforce the constraint that z_s participates in the generation of [CLS], which follows an approximate posterior q_φ(z_s|ψ(w_[CLS])). We place a standard Gaussian prior over z_s ∼ N(0, I) and obtain the ELBO

    E_{q_φ(z_s|ψ(w_[CLS]))}[log p_θ(w^H_[CLS] | z_s, ψ(w_[CLS]))] − KL[q_φ(z_s|ψ(w_[CLS])) || p(z_s)].   (4)

Since the variational family in Eq. 1 consists of Gaussian distributions with diagonal covariance, the joint space of [z_s, z_w] factorizes as q_φ(z_s, z_w|ψ(w)) = q_φ(z_s|ψ(w)) q_φ(z_w|ψ(w)) (Nalisnick et al., 2016). Assuming z_w to be solely dependent on ψ(w_{1:n}), we obtain the ELBO for the entire input sequence:

    L_S = E_{q_φ(z_w)} E_{q_φ(z_s)}[log p_θ(w^H | z, ψ(w))]
          − KL[q_φ(z_w|ψ(w_{1:n})) || q_φ(z_w|ψ(w_{a:b}))]
          − KL[q_φ(z_s|ψ(w)) || p(z_s)].   (5)

Note that the expectation term can be decomposed into the expectation terms in Eq. 3 and Eq. 4 according to the decoder structure. For the full derivation, please refer to Appendix A.
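Since every distribution involved is a diagonal Gaussian, each KL term above has a closed form. A small sketch (illustrative only; variable names are assumptions):

    import torch

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        """Closed-form KL[N(mu_q, diag e^logvar_q) || N(mu_p, diag e^logvar_p)],
        summed over the latent dimensions; this is the form taken by the
        KL terms of Eqs. 3-5."""
        var_q, var_p = logvar_q.exp(), logvar_p.exp()
        return 0.5 * torch.sum(
            logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
            dim=-1,
        )

    # For example, the second KL term of Eq. 5, with the span-side
    # statistics acting as the prior (we assume gradients through them
    # are stopped, mirroring the detachment described above):
    # kl_zw = gaussian_kl(mu_full, logvar_full,
    #                     mu_span.detach(), logvar_span.detach())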
Finally, we perform stance classification and classification of the starting and ending positions of the aspect span of a tweet. We use the negative log-likelihood loss for both the stance label and the aspect span:

    L_s = − log p(y_s | w^H_[CLS]),
    L_a = − log p(y_a | MLP(w^H_{1:n})) − log p(y_b | MLP(w^H_{1:n})),

where MLP is a fully-connected feed-forward network with tanh activation, y_s is the predicted stance label, and y_a and y_b are the starting and ending positions of the aspect span. The overall training objective in the supervised setting is:

    L = L_s + L_a − L_S − L_A.
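In training code, the overall objective can be assembled directly from these four terms; a sketch under assumed names (the logits would come from the stance head over w^H_[CLS] and from the MLP over w^H_{1:n}):

    import torch.nn.functional as F

    def supervised_loss(stance_logits, start_logits, end_logits,
                        stance_label, span_start, span_end,
                        elbo_sentence, elbo_aspect_span):
        """Overall objective L = L_s + L_a - L_S - L_A (minimised):
        classification losses plus the negated ELBOs of Eq. 5 and Eq. 2,
        since maximising an ELBO equals minimising its negative."""
        l_s = F.cross_entropy(stance_logits, stance_label)       # L_s
        l_a = (F.cross_entropy(start_logits, span_start)         # L_a
               + F.cross_entropy(end_logits, span_end))
        return l_s + l_a - elbo_sentence.mean() - elbo_aspect_span.mean()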
4 Experiments

We present below the experimental setup and evaluation results.

4.1 Experimental Setup

Datasets We evaluate our proposed VADET and compare it against baselines on two vaccine attitude datasets.
VAD is our constructed Vaccine Attitude Dataset. Following (Hussain et al., 2021), we crawl tweets using the Twitter streaming API with 60 pre-defined keywords2 relating to COVID-19 vaccines (e.g., Pfizer, AstraZeneca, and Moderna). Our final dataset comprises 1.9 million English tweets collected between February 7th and April 3rd, 2021. We randomly sample a subset of tweets for annotation. Upon an initial inspection, we found that over 97% of tweets mentioned only one aspect. As such, we annotate each tweet with a stance label and a text span characterizing the aspect. In total, 2,800 tweets have been annotated, of which 2,000 are used for training and the remaining 800 for testing. The statistics of the dataset are listed in Table 1. The stance labels are imbalanced. On the other hand, the average opinion length is longer than the average aspect length and is close to the average tweet length. For the purpose of evaluating tweet clustering and latent topic disentanglement, we further annotate tweets with a categorical label indicating the aspect category. Inspired by (Morante et al., 2020), we identify 24 aspect categories3 and each tweet is annotated with one of these categories. It is worth mentioning that the aspect category labels are not used for training.

2 The full keyword list and the details of dataset construction are presented in Appendix B.
3 The full list of aspect categories is shown in Table A1.
    Specification     VAD              VC
                      Train    Test    Train   Test
    # tweets           2000     800     1162    531
    # anti-vac.         638     240      822    394
    # neutral           142      76       41     27
    # pro-vac.         1220     484      299    110
    Avg. length        33.5   34.13     29.6  30.24
    len(aspect)        17.5   18.75     1.03   1.08
    len(opinion)      27.97   29.01     3.25   3.15
    # tokens            67k   27.3k    34.4k  16.8k

Table 1: Dataset statistics. '# tweets' denotes the number of tweets in VAD, and for VC it is the number of sentences. 'anti-vac.' means anti-vaccination while 'pro-vac.' means pro-vaccination. 'Avg. length' and '# tokens' measure the number of word tokens.
VC (Morante et al., 2020) is a vaccination corpus consisting of 294 Internet documents about the online vaccine debate annotated with events, 210 of which are annotated with opinions (in the form of text spans) towards vaccines. The stance label is taken to be the stance of the whole sentence; sentences with conflicting stance labels are regarded as neutral. We split the dataset into a ratio of 2:1 for training and testing, which eventually left us with 1,162 sentences for training and 531 sentences for testing.

Baselines We compare the experimental results with the following baselines:
BertQA (Li et al., 2018c): a pre-trained language model well-suited for span detection. With BertQA, attitude detection is performed by first classifying the stance label and then predicting the answer queried by the stance label. The text span is configured as the ground-truth answer. We rely on its HuggingFace4 (Wolf et al., 2020) implementation. We employ ALBERT (Lan et al., 2020) as the backbone language model for both BertQA and VADET.
ASTE (Peng et al., 2020): a pipeline approach consisting of aspect extraction (Li et al., 2018c) and sentiment labelling (Li et al., 2018b).

4 https://huggingface.co/transformers/model_doc/albert.html#albertforquestionanswering

Evaluation Metrics For stance classification, we use accuracy and the Macro-averaged F1 score. For aspect span detection, we follow Rajpurkar et al. (2016) in adopting the exact match (EM) accuracy of the starting-ending positions and the Macro-averaged F1 score of the overlap between the predicted and ground-truth aspect spans. For tweet clustering, we follow Xie et al. (2016) and Zhang et al. (2021) and use the Normalized Mutual Information (NMI) metric to measure how well the clustered groups align with the ground-truth categories. In addition, we also report the clustering accuracy.
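For concreteness, the span-overlap F1 can be computed over token positions in the usual SQuAD style; a small sketch of such a metric (our illustration, not the paper's evaluation script):

    def span_f1(pred_start, pred_end, gold_start, gold_end):
        """Token-level overlap F1 between a predicted and a gold span,
        given inclusive start/end positions. Exact match is simply
        (pred_start, pred_end) == (gold_start, gold_end)."""
        pred = set(range(pred_start, pred_end + 1))
        gold = set(range(gold_start, gold_end + 1))
        overlap = len(pred & gold)
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        return 2 * precision * recall / (precision + recall)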
4.2 Experimental Results

In all our experiments, VADET is first pre-trained in an unsupervised way on our collected 1.9 million tweets before being fine-tuned on the annotated training set from the VAD or VC corpora.

Stance Classification and Aspect Span Detection In Table 2, we report the performance on attitude detection. In stance classification, our model outperforms both baselines, with larger improvements over ASTE. On aspect span extraction, VADET yields even more noticeable improvements, with a 2.3% increase in F1 over BertQA on VAD, and 2.7% on VC. These results indicate that successful prediction relies on the hidden representations learned in the unsupervised training. The disentanglement of stance and aspect may have also contributed to the improvement.
    Model             VAD             VC
    Stance            Acc.    F1      Acc.    F1
    BertQA            0.754   0.742   0.719   0.708
    ASTE              0.723   0.710   0.704   0.686
    VADET             0.763   0.756   0.727   0.713
    Aspect Span       Acc.    F1      Acc.    F1
    BertQA            0.546   0.722   0.525   0.670
    ASTE              0.508   0.684   0.467   0.652
    VADET             0.556   0.745   0.541   0.697
    Cluster           Acc.    NMI     Acc.    NMI
    DEC (BertQA)      0.633   58.1    0.586   52.8
    K-means (BERT)    0.618   56.4    0.571   50.1
    DEC (VADET)       0.679   60.7    0.605   54.7

Table 2: Results for stance classification, aspect span extraction and aspect clustering on both the VAD and VC corpora.

[Figure 4: two scatter plots of clustered tweets, panels (a) VADET and (b) BertQA.]

Figure 4: Clustered groups of VADET and BertQA on the VAD dataset. Each color indicates a ground-truth aspect category. The clusters are dominated by: (1) Red: the (adverse) side effects of vaccines; (2) Green: explaining personal experiences with any aspect of vaccines; and (3) Cyan: the immunity level provided by vaccines.
Clustering To assess whether the learned latent aspect topics allow a meaningful categorization of documents into attitude clusters, we perform clustering using the disentangled representations that encode aspects, i.e., z_w. Deep Embedding Clustering (DEC) (Xie et al., 2016) is employed as the backend. For comparison, we also run DEC on the aspect representations of documents returned by BertQA. For each document, its aspect representation is obtained by averaging over the fine-tuned ALBERT representations of the constituent words in its aspect span. To assess the quality of the clusters, we need the annotated aspect categories for the documents in the test set. In VAD, we use the annotated aspect labels as the ground-truth categories, whereas in VC we use the annotated event types. Results are presented in the lower part of Table 2. We found a prominent increase in NMI score over the baselines. Using the learned latent aspect topics as features, DEC (VADET) outperforms DEC (BertQA) by 4.6% and 1.9% in accuracy on VAD and VC, respectively. We also notice that using K-means directly on the BERT-encoded tweet representations gives worse results compared to DEC. A similar trend is observed.
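The clustering evaluation protocol itself is straightforward; a sketch using scikit-learn (with K-means standing in for DEC for brevity, 24 clusters matching the number of annotated VAD aspect categories, and variable names assumed):

    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    def clustering_nmi(aspect_vectors, category_labels, n_clusters=24):
        """Cluster the latent aspect representations and score the result
        against the annotated categories with NMI."""
        predicted = KMeans(n_clusters=n_clusters,
                           random_state=0).fit_predict(aspect_vectors)
        return normalized_mutual_info_score(category_labels, predicted)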
Semantic coherence is also employed as an evaluation metric for cluster quality in an unsupervised way. Recent work by Bilal et al. (2021) found that Text Generation Metrics (TGMs) align well with human judgement in evaluating clusters in the context of microblog posts. A TGM, by definition, measures the similarity between the ground-truth and the generated text; the rationale is that a high TGM score means a sentence pair is semantically similar. Here, two metrics are used: BERTScore, which calculates the similarity of two sentences as a sum of cosine similarities between their tokens' embeddings (Zhang et al., 2020), and BLEURT, a pre-trained adjudicator that fine-tunes BERT on an external dataset of human ratings (Sellam et al., 2020). As in (Bilal et al., 2021), we adopt the Exhaustive Approach, in which the coherence score of a cluster C is the average TGM score over every possible tweet pair in the cluster:

    f(C) = (1/N²) Σ_{i,j∈[1,N], i≠j} TGM(tweet_i, tweet_j).
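With BERTScore as the TGM, the coherence of a cluster can be sketched as follows (using the bert-score package; BLEURT would be substituted analogously; averaging over pairs rather than dividing by N² only rescales the score by a constant):

    from itertools import combinations
    from bert_score import score as bert_score  # pip install bert-score

    def cluster_coherence(tweets):
        """Average pairwise BERTScore F1 over all tweet pairs in a cluster
        (the Exhaustive Approach of Bilal et al., 2021)."""
        pairs = list(combinations(tweets, 2))
        candidates = [a for a, _ in pairs]
        references = [b for _, b in pairs]
        _, _, f1 = bert_score(candidates, references, lang="en", verbose=False)
        return f1.mean().item()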
[Figure 5 bar charts: semantic coherence (BERTScore and BLEURT, roughly in the 0-0.4 range) of the clusters produced by DEC (BertQA), K-means (BERT) and DEC (VADET) on the Vaccine Attitude Dataset and the Vaccination Corpus.]

Figure 5: Semantic coherence evaluated in two metrics.

[Figure 6 plots: conditional perplexity (roughly in the 600-1200 range) of the compared models on the two corpora.]

Figure 6: Conditional perplexity on two corpora.
We further adopt the language model perplexity conditioned on z_a to evaluate the extent to which the disentangled representation improves language generation on held-out data. Perplexity is widely used in the literature on text style transfer (John et al., 2019; Yi et al., 2020), where the probability of the generated language is calculated conditioned on the controlled latent code. A lower perplexity score indicates better language generation performance. Following John et al. (2019), we compute an estimated aspect vector ẑ_a^(k) of a cluster k in the training set as

    ẑ_a^(k) = ( Σ_{i∈cluster k} z_{a,i}^(k) ) / (# tweets in cluster k),

where z_{a,i}^(k) is the learned aspect vector of the i-th tweet in the k-th cluster. For the stance vector z_s, we sample one value per tweet. The stance vector is concatenated with the aspect vector ẑ_a^(k) to calculate the probability of generating the held-out data, i.e., the testing set. For the baseline models, we choose β-VAE (Higgins et al., 2017) and SCHOLAR (Card et al., 2018). We train β-VAE on the same data with β set to different values. SCHOLAR is trained on tweet content and stance labels. For both baselines, we use the ELBO on the held-out data as an upper bound on perplexity.
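The centroid estimation is a plain within-cluster average; a minimal sketch (assuming z_a is an array of per-tweet aspect vectors and cluster_ids holds the cluster assignments):

    import numpy as np

    def cluster_aspect_centroids(z_a, cluster_ids):
        """Estimate one aspect vector per cluster by averaging the learned
        aspect vectors z_a of the tweets assigned to that cluster."""
        return {k: z_a[cluster_ids == k].mean(axis=0)
                for k in np.unique(cluster_ids)}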
Figure 6 plots the perplexity scores achieved by all the methods. Our model achieves the lowest perplexity on both datasets, decreasing the perplexity value by roughly 200 compared to the baseline models. SCHOLAR outperforms β-VAE under the three settings of the β value. We speculate that this might be due to the incorporation of the class labels in the training of SCHOLAR. Nevertheless, VADET produces coherent sentences within aspect groups, with latent codes tweaked to proxy the cluster centroids, showing that the disentangled representation does capture the desired factor.

Ablations We conduct ablation studies to investigate the effects of the semi-supervised variational latent representation learning and of the aspect-stance disentanglement on the latent semantics. We study their effects on stance classification and aspect span detection. The results are reported in Table 3.

    Model         VAD             VC
    Stance        Acc.    F1      Acc.    F1
    VADET         0.763   0.756   0.727   0.713
    VADET-D       0.751   0.746   0.736   0.716
    VADET-U       0.741   0.734   0.712   0.698
    Aspect Span   Acc.    F1      Acc.    F1
    VADET         0.556   0.745   0.541   0.697
    VADET-D       0.540   0.728   0.537   0.684
    VADET-U       0.528   0.712   0.525   0.653

Table 3: Results of stance classification and aspect span detection for VADET without disentanglement (-D) or without unsupervised pre-training (-U).

We can observe that on VAD, removing either disentangled learning or unsupervised pre-training results in a degradation of stance classification performance. However, on VC, we see a slight increase in classification accuracy without disentangled learning. We attribute this to the vagueness of the stance, which might cause the model to disentangle more than it should. On the aspect span detection task, we observe a consistent performance drop across all metrics and on both datasets. In particular, without the pre-training module, the performance drops more significantly. These results indicate that semi-supervised learning is highly effective with the VAE, and that the disentanglement of stance and aspect serves as a useful component, leading to noticeable improvements.
5 Conclusions

In this work, we presented a semi-supervised model to detect user attitudes and distinguish aspects of interest about vaccines on social media. We employed a Variational Auto-Encoder to encode the main topical information into the language model by unsupervised training on a massive, unannotated dataset. The model is then further trained in a semi-supervised setting that leverages annotated stance labels and aspect spans to induce the disentanglement between stances and aspects in a latent semantic space. We empirically showed the benefits of such an approach for attitude detection and aspect clustering on two vaccine corpora. Ablation studies show that disentangled learning and unsupervised pre-training are important for effective vaccine attitude detection. Further investigations of the quality of the disentangled representations verify the effectiveness of the disentangled factors. While our current work mainly focuses on the short texts of social media, where a sentence is assumed to discuss a single aspect, it would be interesting to extend our model to deal with longer texts, such as online debates, in which multiple arguments or aspects may appear in a single sentence.

Acknowledgements

This work was funded by the UK Engineering and Physical Sciences Research Council (grant nos. EP/T017112/1, EP/V048597/1). LZ is supported by a Chancellor's International Scholarship at the University of Warwick. YH is supported by a Turing AI Fellowship funded by UK Research and Innovation (grant no. EP/V020579/1).
References

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, Ekaterina Artemova, Elena Tutubalina, and Gerardo Chowell. 2021. A large-scale covid-19 twitter chatter dataset for open scientific research—an international collaboration. Epidemiologia, 2(3):315–324.

Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal. 2021. Structured sentiment analysis as dependency graph parsing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 3387–3402. Association for Computational Linguistics.

Ricky T. Q. Chen, Xuechen Li, Roger B. Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2019. An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 504–515, Florence, Italy. Association for Computational Linguistics.

Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning basic visual concepts with a constrained variational framework. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Tao Hu, Siqin Wang, Wei Luo, Mengxi Zhang, Xiao Huang, Yingwei Yan, Regina Liu, Kelly Ly, Viraj Kacker, Bing She, and Zhenlong Li. 2021. Revealing public opinion towards covid-19 vaccines with twitter data in the united states: Spatiotemporal perspective. J Med Internet Res, 23(9):e30854.

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1587–1596. PMLR.

Amir Hussain, Ahsen Tahir, Zain Hussain, Zakariya Sheikh, Mandar Gogate, Kia Dashtipour, Azhar Ali, and Aziz Sheikh. 2021. Artificial intelligence–enabled analysis of public attitudes on facebook and twitter toward covid-19 vaccines in the united kingdom and the united states: Observational study. J Med Internet Res, 23(4):e26627.

Aapo Hyvarinen, Hiroaki Sasaki, and Richard Turner. 2019. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 859–868. PMLR.

Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2019. Disentangled representation learning for non-parallel text style transfer. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 424–434, Florence, Italy. Association for Computational Linguistics.

Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. 2020. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR.

Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, pages 3581–3589, Cambridge, MA, USA. MIT Press.

Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings.

Florian Kunneman, Mattijs Lambooij, Albert Wong, Antal Van Den Bosch, and Liesbeth Mollema. 2020. Monitoring stance towards vaccination in twitter messages. BMC medical informatics and decision making, 20(1):1–14.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao. 2020. Optimus: Organizing sentences via pre-trained modeling of a latent space. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 4678–4699. Association for Computational Linguistics.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018a. Delete, retrieve, generate: a simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1865–1874, New Orleans, Louisiana. Association for Computational Linguistics.

Xin Li, Lidong Bing, Wai Lam, and Bei Shi. 2018b. Transformation networks for target-oriented sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 946–956, Melbourne, Australia. Association for Computational Linguistics.

Xin Li, Lidong Bing, Piji Li, Wai Lam, and Zhimou Yang. 2018c. Aspect term extraction with history attention and selective transformation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4194–4200. International Joint Conferences on Artificial Intelligence Organization.

Xin Li, Lidong Bing, Wenxuan Zhang, and Wai Lam. 2019. Exploiting BERT for end-to-end aspect-based sentiment analysis. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 34–41, Hong Kong, China. Association for Computational Linguistics.

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR.

Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, and Michael Tschannen. 2020a. Weakly-supervised disentanglement without compromises. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6348–6359. PMLR.

Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. 2020b. Disentangling factors of variations using few labels. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Joanne Chen Lyu, Eileen Le Han, and Garving K Luli. 2021. Covid-19 vaccine–related discussion on twitter: topic modeling and sentiment analysis. Journal of medical Internet research, 23(6):e24435.

Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Diego Marcheggiani, Oscar Täckström, Andrea Esuli, and Fabrizio Sebastiani. 2014. Hierarchical multi-label conditional random fields for aspect-oriented opinion mining. In Advances in Information Retrieval, pages 273–285, Cham. Springer International Publishing.

Joel Ruben Antony Moniz and David Krueger. 2017. Nested LSTMs. In Proceedings of the Ninth Asian Conference on Machine Learning, volume 77 of Proceedings of Machine Learning Research, pages 530–544, Yonsei University, Seoul, Republic of Korea. PMLR.

Roser Morante, Chantal van Son, Isa Maks, and Piek Vossen. 2020. Annotating perspectives on vaccination. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4964–4973, Marseille, France. European Language Resources Association.

Eric Nalisnick, Lars Hertel, and Padhraic Smyth. 2016. Approximate inference for deep latent gaussian mixtures. In NIPS Workshop on Bayesian Deep Learning, volume 2, page 131.

Elise Paul, Andrew Steptoe, and Daisy Fancourt. 2021. Attitudes towards vaccines and intention to vaccinate against covid-19: Implications for public health communications. The Lancet Regional Health - Europe, 1:100012.

Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8600–8607.

Gabriele Pergola, Lin Gui, and Yulan He. 2019. TDAM: A topic-dependent attention model for sentiment analysis. Information Processing & Management, 56(6):102084.

Gabriele Pergola, Lin Gui, and Yulan He. 2021a. A disentangled adversarial neural topic model for separating opinions from plots in user reviews. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2870–2883. Association for Computational Linguistics.

Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, and Yulan He. 2021b. Boosting low-resource biomedical QA via entity-aware masking strategies. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1977–1985. Association for Computational Linguistics.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.

Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892. Association for Computational Linguistics.

Junaid Shuja, Eisa Alanazi, Waleed Alasmary, and Abdulaziz Alashaikh. 2021. Covid-19 open source data sets: a comprehensive survey. Applied Intelligence, 51(3):1296–1325.

Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, and Stefan Bauer. 2021. On disentangled representations learned from correlated data. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10401–10412. PMLR.

Hai Wan, Yufei Yang, Jianfeng Du, Yanan Liu, Kunxun Qi, and Jeff Z. Pan. 2020. Target-aspect-sentiment joint detection for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):9122–9129.

Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, and Guodong Zhou. 2018. Aspect sentiment classification with both word-level and clause-level attention networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4439–4445. International Joint Conferences on Artificial Intelligence Organization.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45. Association for Computational Linguistics.

Chuhan Wu, Fangzhao Wu, Sixing Wu, Junxin Liu, Zhigang Yuan, and Yongfeng Huang. 2018. THU_NGN at SemEval-2018 task 3: Tweet irony detection with densely connected LSTM and multi-task learning. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 51–56, New Orleans, Louisiana. Association for Computational Linguistics.

Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 478–487, New York, New York, USA. PMLR.

Lu Xu, Hao Li, Wei Lu, and Lidong Bing. 2020. Position-aware tagging for aspect sentiment triplet extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 2339–2349. Association for Computational Linguistics.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics.

Xiaoyuan Yi, Zhenghao Liu, Wenhao Li, and Maosong Sun. 2020. Text style transfer via learning style instance supported latent space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3801–3807. International Joint Conferences on Artificial Intelligence Organization. Main track.

Chen Zhang, Qiuchi Li, and Dawei Song. 2019. Aspect-based sentiment classification with aspect-specific graph convolutional networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4568–4578, Hong Kong, China. Association for Computational Linguistics.

Dejiao Zhang, Feng Nan, Xiaokai Wei, Shang-Wen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew O. Arnold, and Bing Xiang. 2021. Supporting clustering with contrastive learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5419–5430. Association for Computational Linguistics.

Meishan Zhang, Yue Zhang, and Duy-Tin Vo. 2015a. Neural networks for open domain targeted sentiment. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 612–621, Lisbon, Portugal. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015b. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

He Zhao, Longtao Huang, Rong Zhang, Quan Lu, and Hui Xue. 2020. SpanMlt: A span-based multi-task learning framework for pair-wise aspect and opinion terms extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3239–3248. Association for Computational Linguistics.

Lixing Zhu, Gabriele Pergola, Lin Gui, Deyu Zhou, and Yulan He. 2021. Topic-driven and knowledge-aware transformer for dialogue emotion detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1571–1582, Online. Association for Computational Linguistics.
A Derivation of the Decomposed ELBO

Unsupervised training is based on maximizing the Evidence Lower Bound (ELBO):

\[
\mathbb{E}_{q_\phi(z_s, z_w|\psi(w))}[\log p_\theta(w|z_s, z_w, \psi(w))] - \mathrm{KL}[q_\phi(z_s, z_w|\psi(w)) \,\|\, p(z_s, z_w)],
\]

where $z$ is partitioned into $z_s$ and $z_w$. As in the standard VAE (Kingma and Welling, 2014), the variational distribution is a multivariate Gaussian with a diagonal covariance:

\[
q_\phi(z_s, z_w|\psi(w)) = \mathcal{N}(z_s, z_w \,|\, \mu, \sigma^2 I),
\]

where $\mu = [\mu_s, \mu_w]$ and $\sigma = [\sigma_s, \sigma_w]$. Since the covariance matrix is diagonal, $z_s$ and $z_w$ are uncorrelated. Therefore, the joint probability decomposes into:

\[
q_\phi(z_s, z_w|\psi(w)) = q_\phi(z_s|\psi(w)) \, q_\phi(z_w|\psi(w)),
\]

where $q_\phi(z_s|\psi(w)) = \mathcal{N}(z_s|\mu_s, \sigma_s^2 I)$ and $\phi$ are the variational parameters. The prior $[z_s, z_w] \sim \mathcal{N}(z_s, z_w|0, I)$ can likewise be decomposed into the product of $p(z_s)$ and $p(z_w)$, so the KL term becomes:

\[
\mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] + \mathrm{KL}[q_\phi(z_w|\psi(w)) \,\|\, p(z_w)].
\]

As for the decoder $p_\theta(w|z_s, z_w, \psi(w))$, the reconstructions of the masked tokens and of $w_{[CLS]}$ are independent of each other, i.e., they are not predicted autoregressively. Therefore, the joint probability decomposes into:

\[
p_\theta(w|z_s, z_w, \psi(w)) = p_\theta(w_{[CLS]}|z_s, z_w, \psi(w)) \, p_\theta(w_{1:n}|z_s, z_w, \psi(w)).
\]

We customize the decoder network to make $w_{[CLS]}$ depend solely on $z_s$, and obtain

\[
\mathbb{E}_{q_\phi(z_s)} \mathbb{E}_{q_\phi(z_w)}[\log p_\theta(w_{[CLS]}|z_s, \psi(w)) + \log p_\theta(w_{1:n}|z_w, \psi(w))].
\]

Here, we omit $\psi(w)$ in the expectation subscripts for notational simplicity. Given the supervision of annotated aspect spans, the prior of $z_w$ is constrained to $q_\phi(z_w|\psi(w_{a:b}))$, i.e., the encoder applied to the aspect span $w_{a:b}$; this changes the KL term to:

\[
\mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] + \mathrm{KL}[q_\phi(z_w|\psi(w_{1:n})) \,\|\, q_\phi(z_w|\psi(w_{a:b}))],
\]

and finally the ELBO is expressed as

\[
\mathbb{E}_{q_\phi(z_s)}[\log p_\theta(w_{[CLS]}|z_s, \psi(w))] + \mathbb{E}_{q_\phi(z_w)}[\log p_\theta(w_{1:n}|z_w, \psi(w))] - \mathrm{KL}[q_\phi(z_s|\psi(w)) \,\|\, p(z_s)] - \mathrm{KL}[q_\phi(z_w|\psi(w_{1:n})) \,\|\, q_\phi(z_w|\psi(w_{a:b}))].
\]
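The decomposed objective above maps directly onto code. The following is a minimal PyTorch sketch of the final loss, not the authors' released implementation: it assumes the encoders output means and log-variances of diagonal Gaussians, and all function and argument names are illustrative.

```python
import torch

def kl_to_std_normal(mu, logvar):
    # KL[ N(mu, diag(exp(logvar))) || N(0, I) ], summed over latent dims
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

def kl_between_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL[ N(mu_q, .) || N(mu_p, .) ] for diagonal Gaussians
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
        - 1.0,
        dim=-1,
    )

def negative_elbo(rec_cls, rec_tokens,
                  mu_s, logvar_s,      # q(z_s | psi(w))
                  mu_w, logvar_w,      # q(z_w | psi(w_{1:n}))
                  mu_ab, logvar_ab):   # q(z_w | psi(w_{a:b})), aspect span
    """Loss = -(ELBO) for the decomposed objective above.
    rec_cls:    log p(w_[CLS] | z_s), one value per example
    rec_tokens: log p(w_{1:n} | z_w) summed over masked tokens, per example
    """
    kl_s = kl_to_std_normal(mu_s, logvar_s)
    kl_w = kl_between_diag_gaussians(mu_w, logvar_w, mu_ab, logvar_ab)
    return -(rec_cls + rec_tokens - kl_s - kl_w).mean()
```

Minimizing the returned value maximizes the ELBO; the `kl_w` term is what injects the aspect-span supervision, pulling the sentence-level posterior over $z_w$ towards the posterior produced by the span encoder.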
B Data Collection and Preprocessing

We are approved users of the Twitter Academic Research API^5. We obtained ethical approval for the proposed research from the university's ethics committee before the start of our work. We collected tweets between February 7th and April 3rd, 2022 using 60 vaccine-related keywords. The exhaustive list is: 'covid-19 vax', 'covid-19 vaccine', 'covid-19 vaccines', 'covid-19 vaccination', 'covid-19 vaccinations', 'covid-19 jab', 'covid-19 jabs', 'covid19 vax', 'covid19 vaccine', 'covid19 vaccines', 'covid19 vaccination', 'covid19 vaccinations', 'covid19 jab', 'covid19 jabs', 'covid vax', 'covid vaccine', 'covid vaccines', 'covid vaccination', 'covid vaccinations', 'covid jab', 'covid jabs', 'coronavirus vax', 'coronavirus vaccine', 'coronavirus vaccines', 'coronavirus vaccination', 'coronavirus vaccinations', 'coronavirus jab', 'coronavirus jabs', 'Pfizer vaccine', 'BioNTech vaccine', 'Oxford vaccine', 'AstraZeneca vaccine', 'Moderna vaccine', 'Sputnik vaccine', 'Sinovac vaccine', 'Sinopharm vaccine', 'Pfizer jab', 'BioNTech jab', 'Oxford jab', 'AstraZeneca jab', 'Moderna jab', 'Sputnik jab', 'Sinovac jab', 'Sinopharm jab', 'Pfizer vax', 'BioNTech vax', 'Oxford vax', 'AstraZeneca vax', 'Moderna vax', 'Sputnik vax', 'Sinovac vax', 'Sinopharm vax', 'Pfizer vaccinate', 'BioNTech vaccinate', 'Oxford vaccinate', 'AstraZeneca vaccinate', 'Moderna vaccinate', 'Sputnik vaccinate', 'Sinovac vaccinate', 'Sinopharm vaccinate'.

Only tweets in English were collected. Retweets were discarded. For pre-processing, hyperlinks, usernames and irregular symbols were removed. Emojis and emoticons were converted to their literal meanings using an emoticon dictionary^6.

^5 https://developer.twitter.com/en/products/twitter-api/academic-research/application-info
^6 https://wprock.fr/en/t/kaomoji/
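For concreteness, the 60 keywords follow a regular pattern (4 COVID spellings × 7 vaccine terms, plus 8 brand names × 4 terms) and can be regenerated programmatically, and the cleaning steps can be sketched alongside. This is an illustrative reconstruction, not the original pipeline: the regular expressions and the `emoticon_map` argument (standing in for the cited emoticon dictionary) are assumptions.

```python
import re

# Rebuild the 60 query keywords from their generating pattern:
# 4 COVID spellings x 7 vaccine terms + 8 brand names x 4 terms = 60.
covid_terms = ["covid-19", "covid19", "covid", "coronavirus"]
vax_terms = ["vax", "vaccine", "vaccines", "vaccination",
             "vaccinations", "jab", "jabs"]
brands = ["Pfizer", "BioNTech", "Oxford", "AstraZeneca",
          "Moderna", "Sputnik", "Sinovac", "Sinopharm"]
brand_terms = ["vaccine", "jab", "vax", "vaccinate"]

keywords = [f"{c} {v}" for c in covid_terms for v in vax_terms] + \
           [f"{b} {t}" for t in brand_terms for b in brands]
assert len(keywords) == 60

# Illustrative cleaning rules; the exact patterns used in the paper
# are not given, so these regexes are assumptions.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")

def preprocess(tweet: str, emoticon_map: dict) -> str:
    """Strip hyperlinks and usernames, then replace emoticons/emojis
    with their literal meanings via a lookup table."""
    tweet = URL_RE.sub("", tweet)
    tweet = USER_RE.sub("", tweet)
    for emoticon, meaning in emoticon_map.items():
        tweet = tweet.replace(emoticon, meaning)
    return " ".join(tweet.split())  # collapse leftover whitespace
```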
C Hyper-parameters and Training Details

The dimensions of $z_a$, $z_w$ and $z_s$ are 768, 768 and 32, respectively. For each tweet, the number of samples drawn from $\epsilon \sim \mathcal{N}(0, I)$ is 1. We modified the LM fine-tuning script^7 from the HuggingFace library to implement VADET's masked LM learning. We use default settings for the training

^7 https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py
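The appendix does not specify how the latent heads are parameterized; purely as a hypothetical sketch consistent with the reported dimensions and the single reparameterized sample per tweet, the Gaussian heads might look as follows (`LatentHeads`, `HIDDEN` and all layer names are our assumptions):

```python
import torch
import torch.nn as nn

# Latent dimensions as reported in Appendix C; HIDDEN is assumed to be
# the BERT-base hidden size used by the underlying language model.
DIM_ZA, DIM_ZW, DIM_ZS = 768, 768, 32
HIDDEN = 768

class LatentHeads(nn.Module):
    """Hypothetical Gaussian heads producing (mu, logvar) for z_s and z_w."""

    def __init__(self):
        super().__init__()
        self.mu_s = nn.Linear(HIDDEN, DIM_ZS)
        self.logvar_s = nn.Linear(HIDDEN, DIM_ZS)
        self.mu_w = nn.Linear(HIDDEN, DIM_ZW)
        self.logvar_w = nn.Linear(HIDDEN, DIM_ZW)

    @staticmethod
    def sample(mu, logvar):
        # One reparameterized sample per tweet: z = mu + sigma * eps,
        # with eps ~ N(0, I) as stated above.
        return mu + (0.5 * logvar).exp() * torch.randn_like(mu)

    def forward(self, h_cls, h_tokens):
        z_s = self.sample(self.mu_s(h_cls), self.logvar_s(h_cls))
        z_w = self.sample(self.mu_w(h_tokens), self.logvar_w(h_tokens))
        return z_s, z_w
```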