Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media
Lixing Zhu1, Zheng Fang1, Gabriele Pergola1, Rob Procter1,2, Yulan He1,2
1 Department of Computer Science, University of Warwick, UK
2 The Alan Turing Institute, UK
{lixing.zhu, z.fang.4, gabriele.pergola.1, rob.procter, yulan.he}@warwick.ac.uk

arXiv:2205.03296v2 [cs.CL] 20 Jun 2022

Abstract

Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADET. A variational autoencoding architecture based on language models is employed to learn from unlabelled data the topical information of the domain. Then, the model is fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADET on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADET is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering. Our source code and dataset are available at http://github.com/somethingx1202/VADet.

[Figure 1. Top: "The AstraZeneca one is rough for up to 48 hours; after that you may still be a bit swollen but you'll basically feel fine. I've had that and the virus, and the vaccine is far less unpleasant." / "Have felt for the past 24 hours that I've been run over by three double decker buses after the AstraZeneca vaccine yesterday morning. Starting to feel a little normal now but it's not been nice!" Bottom: "This is quite baffling. I got my second Pfizer vaccine last week and I have gone totally off chocolate! As side effects go, it's not so bad." / "There are some very interesting ties between this vaccine's creators and the eugenics movement, which is concerning considering it's mainly been promoted as a vaccine for poor folks in the third world."]

Figure 1: Top: Expressions of aspects entangled with expressions of opinions. Bottom: Vaccine attitudes can be expressed towards a wide range of aspects/topics relating to vaccination, making it difficult to pre-define a set of aspect labels, as opposed to corpora typically used for aspect-based sentiment analysis.

1 Introduction

The aim of vaccine attitude detection in social media is to extract people's opinions towards vaccines by analysing their online posts. This is closely related to aspect-based sentiment analysis, in which both aspects and the related sentiments need to be identified. Previous research has largely focused on product reviews and relied on aspect-level sentiment annotations to train models (Barnes et al., 2021), where aspect-opinions are extracted as triples (Peng et al., 2020), polarized targets (Ma et al., 2018) or sentiment spans (He et al., 2019). However, for the task of vaccine attitude detection on Twitter, such a volume of annotated data is barely available (Kunneman et al., 2020; Paul et al., 2021). This scarcity of data is compounded by the diversity of attitudes, making it difficult for models to identify all aspects discussed in posts (Morante et al., 2020).

As representative examples, consider the two tweets about personal experiences of vaccination at the top of Figure 1. The two tweets, despite addressing a common aspect (vaccine side-effects), express opposite stances towards vaccines. However, the aspect and the stances are so fused together that the whole of each tweet needs to be considered to derive the proper labels, making it difficult to disentangle them using existing methodologies.
Additionally, in the case of vaccine attitude analysis, there is a wide variety of possible aspects discussed in posts, as shown at the bottom of Figure 1, where one tweet ironically addresses vaccine side-effects and the other expresses specific political concerns. This is different from traditional aspect-based sentiment analysis on product reviews, where only a small number of aspects need to be pre-defined.

The recently developed framework for integrating the Variational Auto-Encoder (VAE) (Kingma and Welling, 2014) and Independent Component Analysis (ICA) (Khemakhem et al., 2020) sheds light on this problem. VAE is an unsupervised method that can be used to glean information that must be retained from the vaccine-related corpus. Meanwhile, a handful of annotations would induce the separation of independent factors, following the ICA requirement for prior knowledge and inductive biases (Hyvarinen et al., 2019; Locatello et al., 2020a,b). To this end, we can disentangle the latent factors that are specific either to the aspect or to the stance, and improve the quality of the latent semantics learned from unannotated data.
We frame the problem of vaccine attitude detection as a joint aspect span detection and stance classification task, assuming that a tweet, which is limited to 280 characters, usually discusses only one aspect. In particular, we extend a pre-trained language model (LM) by adding a topic layer, which aims to model the topical theme discussed in a tweet. In the absence of annotated data, the topic layer is trained to reconstruct the input message, built on VAE. Given annotated data, where each tweet is labelled with an aspect span and a stance label, the learned topic can be disentangled into a stance topic and an aspect topic. The stance topic is used to predict the stance label of the given tweet, while the aspect topic is used to predict the start and end positions of the aspect span. By doing so, we can effectively leverage both unannotated and annotated data for model training.

To evaluate the effectiveness of our proposed model for vaccine attitude detection on Twitter, we have collected over 1.9 million tweets relating to COVID vaccines between February and April 2021. We have further annotated 2,800 tweets with both aspect spans and stance labels. In addition, we have also used an existing Vaccination Corpus[1] in which 294 documents related to the online vaccination debate have been annotated with opinions towards vaccination. Our experimental results on both datasets show that the proposed model outperforms an existing opinion triple extraction model and a BERT QA model on both aspect span extraction and stance classification. Moreover, the learned latent aspect topics allow the clustering of user attitudes towards vaccines, facilitating easier discovery of positive and negative attitudes in social media.

The contributions of this work can be summarised as follows:

• We have proposed a novel semi-supervised approach for joint latent stance/aspect representation learning and aspect span detection;

• The developed disentangled representation learning facilitates better attitude detection and clustering;

• We have constructed an annotated dataset for vaccine attitude detection.

[1] https://github.com/cltl/VaccinationCorpus

2 Related Work

Our work is related to three lines of research: aspect-based sentiment analysis, disentangled representation learning, and vaccine attitude detection.

Aspect-Based Sentiment Analysis (ABSA) aims to identify aspect terms and their polarities from text. Much work has focused on this task. The techniques used include Conditional Random Fields (CRFs) (Marcheggiani et al., 2014), Bidirectional Long Short-Term Memory networks (BiLSTMs) (Baziotis et al., 2017), Convolutional Neural Networks (CNNs) (Zhang et al., 2015b), Attention Networks (Yang et al., 2016; Pergola et al., 2021b), DenseLSTMs (Wu et al., 2018), NestedLSTMs (Moniz and Krueger, 2017), Graph Neural Networks (Zhang et al., 2019) and their combinations (Wang et al., 2018; Zhu et al., 2021; Wan et al., 2020), to name a few.

Zhang et al. (2015a) framed this task as text span detection, where they used text spans to denote aspects. The same annotation scheme was employed in (Li et al., 2018b), where intra-word attentions were designed to enrich the representations of aspects and predict their polarities. Li et al. (2018c) formalized the task as a sequence labeling problem under a unified tagging scheme.
Their follow-up work (Li et al., 2019) explored BERT for end-to-end ABSA. Peng et al. (2020) modified this task by introducing opinion terms to shape the polarity. A similar modification was made in (Zhao et al., 2020) to extract aspect-opinion pairs. Position-aware tagging was introduced to entrench the offset between the aspect span and the opinion term (Xu et al., 2020). More recently, instead of using pipeline approaches or sequence tagging, Barnes et al. (2021) adapted syntactic dependency parsing to perform aspect and opinion expression extraction and polarity classification, thus formalizing the task as structured sentiment analysis.
Disentangled representation learning Deep generative models learn the hidden semantics of text, and many attempt to capture independent latent factors to steer the generation of text in the context of NLP (Hu et al., 2017; Li et al., 2018a; Pergola et al., 2019; John et al., 2019; Li et al., 2020). The majority of the aforementioned work employs VAE (Kingma et al., 2014) to learn controllable factors, leading to an abundance of VAE-based models in disentangled representation learning (Higgins et al., 2017; Burgess et al., 2018; Chen et al., 2018). However, previous studies show that unsupervised learning of disentanglement by optimising the marginal likelihood in a generative model is impossible (Locatello et al., 2019). While it is also the case that non-linear ICA is unable to uncover the true independent factors, Khemakhem et al. (2020) established a connection between those two strands of work, which is of particular interest to us since the proposed framework learns to approximate the true factorial prior given few examples, recovering a disentangled latent variable distribution on top of additionally observed variables. In this paper, stance labels and aspect spans are additionally observed on a handful of data, which can be used as inductive biases that make disentanglement possible.

Vaccine attitude detection Very little literature exists on attitude detection for vaccination. In contrast, there is growing interest in Covid-19 corpus construction (Shuja et al., 2021). Of particular interest to us, Banda et al. (2021) built an on-going tweet dataset that traces the development of Covid-19 via 3 keywords: "coronavirus", "2019nCoV" and "corona virus". Hussain et al. (2021) utilized hydrated tweets from the aforementioned corpus to analyze the sentiment towards vaccination. They used lexicon-based methods (i.e., VADER and TextBlob) and pre-trained BERT to classify the sentiment in order to gain insights into the temporal sentiment trends. A similar approach has been proposed in (Hu et al., 2021). Lyu et al. (2021) employed a topic model to discover vaccine-related themes in Twitter discussions and performed sentiment classification using lexicon-based methods. However, none of the work above constructed datasets about vaccine attitudes, nor did they train models to detect attitudes. Morante et al. (2020) built the Vaccination Corpus (VC) with events, attributions and opinions annotated in the form of text spans, which is the only dataset available to us to perform attitude detection.

3 Methodology

The goal of our work is to detect the stance expressed in a tweet (i.e., 'pro-vaccination', 'anti-vaccination', or 'neutral'), identify a text span that indicates the concerning aspect of vaccination, and cluster tweets into groups that share similar aspects. To this end, we propose a novel latent representation learning model that jointly learns a stance classifier and disentangles the latent variables capturing stance and aspect, respectively. Our proposed Vaccine Attitude Detection (VADET) model is first trained on a large amount of unannotated Twitter data to learn latent topics via masked Language Model (LM) learning. It is then fine-tuned on a small amount of Twitter data annotated with stance labels and aspect text spans for simultaneous stance classification and aspect span start/end position detection. The rationale is that the inductive bias imposed by the annotations encourages the disentanglement of latent stance topics and aspect topics. In what follows, we present our proposed VADET model, first under masked LM learning and later extended to the supervised setting for learning disentangled stance and aspect topics.
the sentiment in order to gain insights into the The latent variables are encoded via the topic layers temporal sentiment trends. A similar approach incorporated into the masked language model. has been proposed in (Hu et al., 2021). Lyu et al. (2021) employed a topic model to discover vaccine- VAD ET in the masked LM learning We insert related themes in twitter discussions and performed a topic layer into a pre-trained language model such sentiment classification using lexicon-based meth- as ALBERT, as shown in Figure 2, allowing the ods. However, none of the work above constructed network to leverage pre-trained information while datasets about vaccine attitudes, nor did they train fine-tuned on an in-domain corpus. We assume that Very grateful to those at Oxford @user and p I got my first #COVID19 vac models to detect attitudes. Morante et al. (2020) there is a continuous latent variable z involved everyone from the @user as I got my first os in built the Vaccination Corpus (VC) with events, at- the language model to reconstruct the original #COVID19 vaccine . Quick , painless and no side iti text tributions and opinions annotated in the form of effects from . Well the apart from masked this weird tokens. We urge to buy retain ve the weights of text spans, which is the only dataset available to us a language model and learn the latent representa-
More concretely, the topic layer partitions the language model into lower layers and higher layers, denoted as ψ and θ, respectively. The lower layers constitute the Encoder, which parameterizes the variational posterior distribution denoted as qφ(z|ψ(w)), while the higher layers reconstruct the input tokens and are referred to as the Decoder. The latent representation z is then passed to θ to reconstruct the original text.

The objective of VAE is to minimize the KL-divergence between the variational posterior distribution and the true posterior. This is equivalent to maximizing the Evidence Lower BOund (ELBO), expressed as:

  E_{qφ(z|ψ(w))}[log pθ(w^H | z, ψ(w))] − KL[qφ(z|ψ(w)) || p(z)],   (1)

where qφ(z|ψ(w)) is the encoder and pθ(w^H | z, ψ(w)) is the decoder. Here, w = [w_[CLS], w_{1:n}], since the special classification embedding w_[CLS] is automatically prepended to the input sequence (Devlin et al., 2019), and w^H denotes the reconstructed input.

Following (Kingma and Welling, 2014), we choose a standard Gaussian distribution as the prior, denoted as p(z), and the diagonal Gaussian distribution z ∼ N(µφ(ψ(w)), σφ²(ψ(w))) as the variational distribution. The decoder computes the probability of the original token given the latent variable sampled from the Encoder. We use the Memory Scheme (Li et al., 2020) to concatenate z and ψ(w), making the latent representation compatible with the higher layers of the language model.
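To make the variational step concrete, below is a minimal PyTorch sketch of a topic layer of this kind: the lower-layer states ψ(w) are pooled into a diagonal Gaussian posterior, a latent z is sampled via the reparameterisation trick, and z is injected back into the hidden states for the higher layers. All module and variable names are our own illustrative assumptions, not the released VADET code; the paper concatenates z with ψ(w) under the Memory Scheme, whereas this sketch adds a linear projection for shape simplicity.

```python
import torch
import torch.nn as nn

class TopicLayer(nn.Module):
    """Hypothetical variational topic layer placed between the lower (psi)
    and higher (theta) blocks of a pre-trained LM encoder."""
    def __init__(self, hidden_size: int, latent_size: int):
        super().__init__()
        self.mu = nn.Linear(hidden_size, latent_size)
        self.logvar = nn.Linear(hidden_size, latent_size)
        self.proj = nn.Linear(latent_size, hidden_size)  # stand-in for the Memory Scheme

    def forward(self, psi_w: torch.Tensor):
        # psi_w: [batch, seq_len, hidden] hidden states from the lower layers.
        pooled = psi_w.mean(dim=1)                       # summarise the sequence
        mu, logvar = self.mu(pooled), self.logvar(pooled)
        # Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Closed-form KL[q_phi(z|psi(w)) || N(0, I)] for a diagonal Gaussian.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        # Inject z into the hidden states so the higher layers condition on it.
        h = psi_w + self.proj(z).unsqueeze(1)
        return h, z, kl.mean()
```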
VADET with disentanglement of aspect and stance One of the training objectives of vaccine attitude detection is to detect the text span that indicates the aspect and to predict the associated stance label. Existing approaches rely on structured annotations to indicate the boundary and dependency between the aspect span and opinion words (Xu et al., 2020; Barnes et al., 2021), or use a two-stage pipeline to detect the aspect span and the associated opinion separately (Peng et al., 2020). The problem is that the opinion expressed in a tweet and the aspect span often overlap. To mitigate this issue, we instead separate the stance and aspect representations in the latent semantic space, that is, we disentangle the latent topics learned by VADET into latent stance topics and latent aspect topics. A recent study in disentangled representation learning (Locatello et al., 2019) shows that unsupervised learning of disentangled representations is theoretically impossible from i.i.d. observations without inductive biases, such as grouping information (Bouchacourt et al., 2018) or access to labels (Locatello et al., 2020b; Träuble et al., 2021). As such, we extend our model to a supervised setting in which the disentanglement of the latent vectors can be trained on annotated data.
Figure 3 outlines the overall structure of VADET in the supervised setting.

[Figure 3 shows the supervised architecture: the left part feeds the full masked input "[CLS] Very [MASK] to those at Oxford. I've got my [MASK] #Covid19 [MASK]." through the topic layer to zs (for the stance label via softmax over w_[CLS]) and zw (for aspect span start/end detection via an MLP and softmax); the right part encodes the aspect span alone to za with prior z ∼ N(0, I).]

Figure 3: VADET in supervised learning. The text segment highlighted in blue is the annotated aspect span. The right part learns the latent aspect topic za from the aspect text span [wa : wb] only, under masked LM learning. The left part jointly learns the latent stance topic zs and the latent aspect topic zw from the whole input text, and is trained simultaneously for stance classification and aspect start/end position detection.

On the right-hand side, we show VADET learned from the annotated aspect text span [wa : wb] under masked LM learning. The latent variable za encodes the hidden semantics of the aspect expression. We posit that the aspect span is generated from a latent representation with a standard Gaussian distribution as its prior. The ELBO for reconstructing the aspect text span is:

  L_A = E_{qφ(za|ψ(wa:b))}[log pθ(w^H_{a:b} | za, ψ(wa:b))] − KL[qφ(za|ψ(wa:b)) || p(za)],   (2)

where w^H_{a:b} denotes the reconstructed aspect span. Ideally, the latent variable za does not encode any stance information and only captures the aspect mentioned in the sentence. Therefore, zs for the language model on the right-hand side is detached and the reconstruction loss for [CLS] is set free.

On the left-hand side of Figure 3, we train VADET on the whole sentence. The input to VADET is formalized as '[CLS] text'. Instead of mapping an input to a single latent variable z, as in masked LM learning of VADET, the input is now mapped to a latent variable decomposed into two components, [zs, zw], one for the stance and the other for the aspect. We use a conditionally factorized Gaussian prior over the latent variable, zw ∼ pθ(zw|wa:b), which enables the separation of zs and zw, since the diagonal Gaussian is factorized and the conditioning variable wa:b is observed.

We establish an association between zw and za by specifying pθ(zw|wa:b) to be the encoder network of qφ(za|wa:b), since we want the latent semantics of the aspect span to encourage the disentanglement of attitude in the latent space. In other words, the prior of zw is configured as the approximate posterior of za, to enforce the association between the disentangled aspect in the sentence and the de facto aspect. As a result, the ELBO for the original text is written as:

  E_{qφ(zw|ψ(w))}[log pθ(w^H | zw, ψ(w))] − KL[qφ(zw|ψ(w)) || qφ(zw|ψ(wa:b))],   (3)

where w^H denotes the reconstructed input text and zw|w ∼ N(µφ(ψ(w)), σφ²(ψ(w))). The KL-divergence allows for some variability, since there might be some semantic drift from the original semantics when the aspect span is placed in a longer sequence.

The annotation of the stance label provides an additional input. To exploit this inductive bias, we enforce the constraint that zs participates in the generation of [CLS], which follows an approximate posterior qφ(zs|ψ(w_[CLS])). We place the standard Gaussian as the prior over zs ∼ N(0, I) and obtain the ELBO:

  E_{qφ(zs|ψ(w_[CLS]))}[log pθ(w^H_[CLS] | zs, ψ(w_[CLS]))] − KL[qφ(zs|ψ(w_[CLS])) || p(zs)].   (4)

Since the variational family in Eq. 1 consists of Gaussian distributions with diagonal covariance, the joint space of [zs, zw] factorizes as qφ(zs, zw|ψ(w)) = qφ(zs|ψ(w)) qφ(zw|ψ(w)) (Nalisnick et al., 2016). Assuming zw to be solely dependent on ψ(w1:n), we obtain the ELBO for the entire input sequence:

  L_S = E_{qφ(zw)} E_{qφ(zs)}[log pθ(w^H | z, ψ(w))]
      − KL[qφ(zw|ψ(w1:n)) || qφ(zw|ψ(wa:b))]
      − KL[qφ(zs|ψ(w)) || p(zs)].   (5)

Note that the expectation term can be decomposed into the expectation terms in Eq. 3 and Eq. 4 according to the decoder structure. For the full derivation, please refer to Appendix A.

Finally, we perform stance classification and classification of the starting and ending positions of the aspect span of a tweet. We use the negative log-likelihood loss for both the stance label and the aspect span:

  L_s = − log p(ys | w^H_[CLS]),
  L_a = − log p(ya | MLP(w^H_{1:n})) − log p(yb | MLP(w^H_{1:n})),

where MLP is a fully-connected feed-forward network with tanh activation, ys is the predicted stance label, and ya and yb are the starting and ending positions of the aspect span. The overall training objective in the supervised setting is:

  L = L_s + L_a − L_S − L_A.
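As a sketch of how this overall objective can be assembled, the snippet below combines the two classification losses with the KL terms of Eqs. (2)-(5), using the closed-form KL between diagonal Gaussians. All argument names are hypothetical, not the authors' code: the reconstruction negative log-likelihoods of Eqs. (2)-(5) are assumed to be pre-computed and folded into recon_nll, and the aspect-span posterior is detached where it serves as the prior of zw.

```python
import torch
import torch.nn.functional as F

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL[N(mu_q, diag) || N(mu_p, diag)], summed over latent dims."""
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0,
        dim=-1,
    )

def supervised_loss(stance_logits, ys, start_logits, ya, end_logits, yb,
                    recon_nll, mu_w, logvar_w, mu_a, logvar_a, mu_s, logvar_s):
    zeros_s = torch.zeros_like(mu_s)
    zeros_a = torch.zeros_like(mu_a)
    ls = F.cross_entropy(stance_logits, ys)                      # stance NLL, Ls
    la = (F.cross_entropy(start_logits, ya)                      # span NLL, La
          + F.cross_entropy(end_logits, yb))
    # Eq. (5): KL[q(zw|psi(w_1:n)) || q(zw|psi(w_a:b))]; the prior of z_w is the
    # approximate posterior of z_a computed on the aspect span (detached).
    kl_w = kl_diag_gaussians(mu_w, logvar_w, mu_a.detach(), logvar_a.detach())
    # Eq. (5): KL[q(zs|psi(w)) || N(0, I)]; Eq. (2): KL[q(za|...) || N(0, I)].
    kl_s = kl_diag_gaussians(mu_s, logvar_s, zeros_s, zeros_s)
    kl_a = kl_diag_gaussians(mu_a, logvar_a, zeros_a, zeros_a)
    neg_elbo = recon_nll + (kl_w + kl_s + kl_a).mean()           # -(LS + LA)
    return ls + la + neg_elbo
```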
4 Experiments

We present below the experimental setup and evaluation results.

4.1 Experimental Setup

Datasets We evaluate our proposed VADET and compare it against baselines on two vaccine attitude datasets.

VAD is our constructed Vaccine Attitude Dataset. Following (Hussain et al., 2021), we crawl tweets using the Twitter streaming API with 60 pre-defined keywords[2] relating to COVID-19 vaccines (e.g., Pfizer, AstraZeneca, and Moderna).

[2] The full keyword list and the details of dataset construction are presented in Appendix B.
Our final dataset comprises 1.9 million English tweets collected between February 7th and April 3rd, 2021. We randomly sample a subset of tweets for annotation. Upon initial inspection, we found that over 97% of tweets mentioned only one aspect. As such, we annotate each tweet with a stance label and a text span characterizing the aspect. In total, 2,800 tweets have been annotated, of which 2,000 are used for training and the remaining 800 for testing. The statistics of the dataset are listed in Table 1. The stance labels are imbalanced. On the other hand, the average opinion length is longer than the average aspect length, and is close to the average tweet length. For the purpose of evaluation on tweet clustering and latent topic disentanglement, we further annotate tweets with a categorical label indicating the aspect category. Inspired by (Morante et al., 2020), we identify 24 aspect categories[3] and annotate each tweet with one of these categories. It is worth mentioning that the aspect category labels are not used for training.

[3] The full list of aspect categories is shown in Table A1.

VC (Morante et al., 2020) is a vaccination corpus consisting of 294 Internet documents about the online vaccine debate annotated with events, 210 of which are annotated with opinions (in the form of text spans) towards vaccines. The stance label is considered to be the stance for the whole sentence. Sentences with conflicting stance labels are regarded as neutral. We split the dataset into a ratio of 2:1 for training and testing. This eventually left us with 1,162 sentences for training and 531 sentences for testing.

  Specification   VAD Train  VAD Test  VC Train  VC Test
  # tweets        2000       800       1162      531
  # anti-vac.     638        240       822       394
  # neutral       142        76        41        27
  # pro-vac.      1220       484       299       110
  Avg. length     33.5       34.13     29.6      30.24
  len(aspect)     17.5       18.75     1.03      1.08
  len(opinion)    27.97      29.01     3.25      3.15
  # tokens        67k        27.3k     34.4k     16.8k

Table 1: Dataset statistics. '# tweets' denotes the number of tweets in VAD; for VC it is the number of sentences. 'anti-vac.' means anti-vaccination while 'pro-vac.' means pro-vaccination. 'Avg. length' and '# tokens' measure the number of word tokens.

Baselines We compare the experimental results with the following baselines:

BertQA (Li et al., 2018c): a pre-trained language model well-suited for span detection. With BertQA, attitude detection is performed by first classifying stance labels and then predicting the answer queried by the stance label. The text span is configured as the ground-truth answer. We rely on its HuggingFace[4] (Wolf et al., 2020) implementation. We employ ALBERT (Lan et al., 2020) as the backbone language model for both BertQA and VADET.

ASTE (Peng et al., 2020): a pipeline approach consisting of aspect extraction (Li et al., 2018c) and sentiment labelling (Li et al., 2018b).

[4] https://huggingface.co/transformers/model_doc/albert.html#albertforquestionanswering

Evaluation Metrics For stance classification, we use accuracy and the macro-averaged F1 score. For aspect span detection, we follow Rajpurkar et al. (2016) in adopting the exact match (EM) accuracy of the starting-ending positions and the macro-averaged F1 score of the overlap between the predicted and ground-truth aspect spans. For tweet clustering, we follow Xie et al. (2016) and Zhang et al. (2021) and use the Normalized Mutual Information (NMI) metric to measure how well the clustered groups align with the ground-truth categories. In addition, we also report the clustering accuracy.
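For illustration, here is a minimal sketch of the two span metrics under the SQuAD-style convention described above: exact match on the (start, end) positions, and F1 over the token overlap between the predicted and gold spans. Spans are assumed to be inclusive token-index pairs; the helper names are ours, not from the paper's evaluation code.

```python
def exact_match(pred_span, gold_span):
    """1.0 iff predicted (start, end) equals the gold (start, end)."""
    return float(pred_span == gold_span)

def span_f1(pred_span, gold_span):
    """Token-overlap F1 between two inclusive token-index spans."""
    pred = set(range(pred_span[0], pred_span[1] + 1))
    gold = set(range(gold_span[0], gold_span[1] + 1))
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# e.g. span_f1((3, 7), (5, 9)) returns 0.6: three shared tokens out of five each.
```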
4.2 Experimental Results

In all our experiments, VADET is first pre-trained in an unsupervised way on our collected 1.9 million tweets before being fine-tuned on the annotated training set from the VAD or VC corpora.

Stance Classification and Aspect Span Detection In Table 2, we report the performance on attitude detection. In stance classification, our model outperforms both baselines, with more significant improvements over ASTE. On aspect span extraction, VADET yields even more noticeable improvements, with a 2.3% increase in F1 over BertQA on VAD, and 2.7% on VC. These results indicate that successful prediction relies on the hidden representation learned in the unsupervised training. The disentanglement of stance and aspect may have also contributed to the improvement.

Clustering To assess whether the learned latent aspect topics allow meaningful categorization of documents into attitude clusters, we perform
clustering using the disentangled representations that encode aspects, i.e., zw. Deep Embedding Clustering (DEC) (Xie et al., 2016) is employed as the backend. For comparison, we also run DEC on the aspect representations of documents returned by BertQA. For each document, its aspect representation is obtained by averaging the fine-tuned ALBERT representations of the constituent words in its aspect span. To assess the quality of clusters, we need the annotated aspect categories for documents in the test set. In VAD, we use the annotated aspect labels as the ground-truth categories, whereas in VC we use the annotated event types.

  Stance          VAD Acc.  VAD F1  VC Acc.  VC F1
  BertQA          0.754     0.742   0.719    0.708
  ASTE            0.723     0.710   0.704    0.686
  VADET           0.763     0.756   0.727    0.713

  Aspect Span     VAD Acc.  VAD F1  VC Acc.  VC F1
  BertQA          0.546     0.722   0.525    0.670
  ASTE            0.508     0.684   0.467    0.652
  VADET           0.556     0.745   0.541    0.697

  Cluster         VAD Acc.  VAD NMI  VC Acc.  VC NMI
  DEC (BertQA)    0.633     58.1     0.586    52.8
  K-means (BERT)  0.618     56.4     0.571    50.1
  DEC (VADET)     0.679     60.7     0.605    54.7

Table 2: Results for stance classification, aspect span extraction and aspect clustering on both VAD and VC corpora.

[Figure 4: two cluster visualisations, (a) VADET and (b) BertQA.]

Figure 4: Clustered groups of VADET and BertQA on the VAD dataset. Each color indicates a ground-truth aspect category. The clusters are dominated by: (1) Red: the (adverse) side effects of vaccines; (2) Green: explaining personal experiences with any aspect of vaccines; and (3) Cyan: the immunity level provided by vaccines.

Results are presented in the lower part of Table 2. We found a prominent increase in NMI score over the baselines. Using the learned latent aspect topics as features, DEC (VADET) outperforms DEC (BertQA) by 4.6% and 1.9% in accuracy on VAD and VC, respectively. We also notice that running K-means directly on the BERT-encoded tweet representations gives worse results compared to DEC. A similar trend is observed when semantic coherence is employed as an evaluation metric for cluster quality in an unsupervised way. Recent work by Bilal et al. (2021) found that Text Generation Metrics (TGMs) align well with human judgement in evaluating clusters in the context of microblog posts. A TGM by definition measures the similarity between the ground truth and the generated text. The rationale is that a high TGM score means sentence pairs are semantically similar. Here, two metrics are used: BERTScore, which calculates the similarity of two sentences as a sum of cosine similarities between their tokens' embeddings (Zhang et al., 2020), and BLEURT, a pre-trained adjudicator that fine-tunes BERT on an external dataset of human ratings (Sellam et al., 2020). As in (Bilal et al., 2021), we adopt the Exhaustive Approach whereby, for a cluster C, its coherence score is the average TGM score of every possible tweet pair in the cluster:

  f(C) = (1/N²) Σ_{i,j∈[1,N], i<j} TGM(tweet_i, tweet_j).
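A sketch of this Exhaustive Approach is shown below. Here `tgm` stands for any text generation metric (such as BERTScore or BLEURT above) and is assumed to be a callable returning a pairwise similarity score; the function name is illustrative.

```python
from itertools import combinations

def cluster_coherence(tweets, tgm):
    """Average pairwise TGM score over all pairs i < j, normalised by N^2."""
    n = len(tweets)
    if n < 2:
        return 0.0
    total = sum(tgm(a, b) for a, b in combinations(tweets, 2))
    return total / n**2
```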
[Figure 5: bar charts of BERTScore and BLEURT coherence scores (y-axis 0 to 0.4) for DEC (BertQA), K-means (BERT) and DEC (VADET) on the Vaccine Attitude Dataset and the Vaccination Corpus.]

Figure 5: Semantic coherence evaluated with two metrics.

We adopt the language model perplexity conditioned on za to evaluate the extent to which the disentangled representation improves language generation on held-out data. Perplexity is widely used in the literature on text style transfer (John et al., 2019; Yi et al., 2020), where the probability of the generated language is calculated conditioned on the controlled latent code. A lower perplexity score indicates better language generation performance. Following John et al. (2019), we compute an estimated aspect vector ẑa^(k) of a cluster k in the training set as

  ẑa^(k) = ( Σ_{i ∈ cluster k} za,i ) / (# tweets in cluster k),

where za,i is the learned aspect vector of the i-th tweet in the k-th cluster. For the stance vector zs, we sample one value per tweet. The stance vector is concatenated with the aspect vector ẑa^(k) to calculate the probability of generating the held-out data, i.e., the test set. For the baseline models, we choose β-VAE (Higgins et al., 2017) and SCHOLAR (Card et al., 2018). We train β-VAE on the same data with β set to different values. SCHOLAR is trained on tweet content and stance labels. For both baselines we use the ELBO on the held-out data as an upper bound on perplexity.

[Figure 6: bar charts of conditional perplexity (roughly 600 to 1200) for the compared models on the two corpora.]

Figure 6: Conditional perplexity on the two corpora.

Figure 6 plots the perplexity scores achieved by all the methods. Our model achieves the lowest perplexity score on both datasets. It managed to decrease the perplexity value by roughly 200 compared to the baseline models. SCHOLAR outperforms β-VAE under three settings of the β value. We speculate that this might be due to the incorporation of the class labels in the training of SCHOLAR. Nevertheless, VADET produces congenial sentences in aspect groups, with latent codes tweaked to proxy centroids, showing that the disentangled representation does capture the desired factor.
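For concreteness, the estimated aspect vector ẑa^(k) defined above is simply a per-cluster mean of the learned aspect vectors. The sketch below assumes NumPy arrays of per-tweet aspect vectors and integer cluster assignments; all names are ours.

```python
import numpy as np

def cluster_aspect_centroids(aspect_vecs: np.ndarray, cluster_ids: np.ndarray):
    """aspect_vecs: [num_tweets, dim]; cluster_ids: [num_tweets] integer labels.
    Returns a dict mapping each cluster id k to its centroid, i.e. z_a^(k) hat."""
    centroids = {}
    for k in np.unique(cluster_ids):
        centroids[k] = aspect_vecs[cluster_ids == k].mean(axis=0)
    return centroids
```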
Ablations We conduct ablation studies to investigate the effects of the semi-supervised learning that uses the variational latent representation learning approach, and of the aspect-stance disentanglement, on the latent semantics. We study their effects on stance classification and aspect span detection. The results are reported in Table 3.

  Stance       VAD Acc.  VAD F1  VC Acc.  VC F1
  VADET        0.763     0.756   0.727    0.713
  VADET-D      0.751     0.746   0.736    0.716
  VADET-U      0.741     0.734   0.712    0.698

  Aspect Span  VAD Acc.  VAD F1  VC Acc.  VC F1
  VADET        0.556     0.745   0.541    0.697
  VADET-D      0.540     0.728   0.537    0.684
  VADET-U      0.528     0.712   0.525    0.653

Table 3: Results of stance classification and aspect span detection of VADET without disentanglement (-D) or unsupervised pre-training (-U).

We observe that on VAD, removing disentangled learning or unsupervised pre-training results in a degradation of the stance classification performance. However, on VC, we see a slight increase in classification accuracy without disentangled learning. We attribute this to the vagueness of the stance, which might cause the model to disentangle more than it should. On the aspect span detection task, we observe a consistent performance drop across all metrics and on both datasets. In particular, without the pre-training module, the performance drops more significantly. These results indicate that semi-supervised learning is highly effective with VAE, and that the disentanglement of stance and aspect serves as a useful component, which leads to noticeable improvements.
5 Conclusions

In this work, we presented a semi-supervised model to detect user attitudes and distinguish aspects of interest about vaccines on social media. We employed a Variational Auto-Encoder to encode the main topical information into the language model by unsupervised training on a massive, unannotated dataset. The model is then further trained in a semi-supervised setting that leverages annotated stance labels and aspect spans to induce the disentanglement between stances and aspects in a latent semantic space. We empirically showed the benefits of such an approach for attitude detection and aspect clustering on two vaccine corpora. Ablation studies show that disentangled learning and unsupervised pre-training are important for effective vaccine attitude detection. Further investigations of the quality of the disentangled representations verify the effectiveness of the disentangled factors. While our current work mainly focuses on the short texts of social media data, where a sentence is assumed to discuss a single aspect, it would be interesting to extend our model to deal with longer texts, such as online debates, in which multiple arguments or aspects may appear in a single sentence.

Acknowledgements

This work was funded by the UK Engineering and Physical Sciences Research Council (grant nos. EP/T017112/1, EP/V048597/1). LZ is supported by a Chancellor's International Scholarship at the University of Warwick. YH is supported by a Turing AI Fellowship funded by UK Research and Innovation (grant no. EP/V020579/1).

References

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, Ekaterina Artemova, Elena Tutubalina, and Gerardo Chowell. 2021. A large-scale covid-19 twitter chatter dataset for open scientific research—an international collaboration. Epidemiologia, 2(3):315–324.

Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal. 2021. Structured sentiment analysis as dependency graph parsing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 3387–3402. Association for Computational Linguistics.

Christos Baziotis, Nikos Pelekis, and Christos Doulkeridis. 2017. DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 747–754, Vancouver, Canada. Association for Computational Linguistics.

Iman Munire Bilal, Bo Wang, Maria Liakata, Rob Procter, and Adam Tsakalidis. 2021. Evaluation of thematic coherence in microblogs. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 6800–6814. Association for Computational Linguistics.

Diane Bouchacourt, Ryota Tomioka, and Sebastian Nowozin. 2018. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. 2018. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599.

Dallas Card, Chenhao Tan, and Noah A. Smith. 2018. Neural models for documents with metadata. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 2031–2040, Melbourne, Australia. Association for Computational Linguistics.
Ricky T. Q. Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2019. An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 504–515, Florence, Italy. Association for Computational Linguistics.

Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning basic visual concepts with a constrained variational framework. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
Tao Hu, Siqin Wang, Wei Luo, Mengxi Zhang, Xiao Huang, Yingwei Yan, Regina Liu, Kelly Ly, Viraj Kacker, Bing She, and Zhenlong Li. 2021. Revealing public opinion towards covid-19 vaccines with twitter data in the united states: Spatiotemporal perspective. J Med Internet Res, 23(9):e30854.

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1587–1596. PMLR.

Amir Hussain, Ahsen Tahir, Zain Hussain, Zakariya Sheikh, Mandar Gogate, Kia Dashtipour, Azhar Ali, and Aziz Sheikh. 2021. Artificial intelligence–enabled analysis of public attitudes on facebook and twitter toward covid-19 vaccines in the united kingdom and the united states: Observational study. J Med Internet Res, 23(4):e26627.

Aapo Hyvarinen, Hiroaki Sasaki, and Richard Turner. 2019. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 859–868. PMLR.

Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2019. Disentangled representation learning for non-parallel text style transfer. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 424–434, Florence, Italy. Association for Computational Linguistics.

Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. 2020. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR.

Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, pages 3581–3589, Cambridge, MA, USA. MIT Press.

Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings.

Florian Kunneman, Mattijs Lambooij, Albert Wong, Antal Van Den Bosch, and Liesbeth Mollema. 2020. Monitoring stance towards vaccination in twitter messages. BMC Medical Informatics and Decision Making, 20(1):1–14.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao. 2020. Optimus: Organizing sentences via pre-trained modeling of a latent space. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 4678–4699. Association for Computational Linguistics.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018a. Delete, retrieve, generate: a simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1865–1874, New Orleans, Louisiana. Association for Computational Linguistics.

Xin Li, Lidong Bing, Wai Lam, and Bei Shi. 2018b. Transformation networks for target-oriented sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 946–956, Melbourne, Australia. Association for Computational Linguistics.

Xin Li, Lidong Bing, Piji Li, Wai Lam, and Zhimou Yang. 2018c. Aspect term extraction with history attention and selective transformation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4194–4200. International Joint Conferences on Artificial Intelligence Organization.
Xin Li, Lidong Bing, Wenxuan Zhang, and Wai Lam. 2019. Exploiting BERT for end-to-end aspect-based sentiment analysis. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 34–41, Hong Kong, China. Association for Computational Linguistics.

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR.

Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, and Michael Tschannen. 2020a. Weakly-supervised disentanglement without compromises. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6348–6359. PMLR.
Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. 2020b. Disentangling factors of variations using few labels. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Joanne Chen Lyu, Eileen Le Han, and Garving K Luli. 2021. Covid-19 vaccine–related discussion on twitter: topic modeling and sentiment analysis. Journal of Medical Internet Research, 23(6):e24435.

Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Diego Marcheggiani, Oscar Täckström, Andrea Esuli, and Fabrizio Sebastiani. 2014. Hierarchical multi-label conditional random fields for aspect-oriented opinion mining. In Advances in Information Retrieval, pages 273–285, Cham. Springer International Publishing.

Joel Ruben Antony Moniz and David Krueger. 2017. Nested LSTMs. In Proceedings of the Ninth Asian Conference on Machine Learning, volume 77 of Proceedings of Machine Learning Research, pages 530–544, Yonsei University, Seoul, Republic of Korea. PMLR.

Roser Morante, Chantal van Son, Isa Maks, and Piek Vossen. 2020. Annotating perspectives on vaccination. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4964–4973, Marseille, France. European Language Resources Association.

Eric Nalisnick, Lars Hertel, and Padhraic Smyth. 2016. Approximate inference for deep latent gaussian mixtures. In NIPS Workshop on Bayesian Deep Learning, volume 2, page 131.

Elise Paul, Andrew Steptoe, and Daisy Fancourt. 2021. Attitudes towards vaccines and intention to vaccinate against covid-19: Implications for public health communications. The Lancet Regional Health - Europe, 1:100012.

Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8600–8607.
Gabriele Pergola, Lin Gui, and Yulan He. 2019. TDAM: A topic-dependent attention model for sentiment analysis. Information Processing & Management, 56(6):102084.

Gabriele Pergola, Lin Gui, and Yulan He. 2021a. A disentangled adversarial neural topic model for separating opinions from plots in user reviews. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2870–2883. Association for Computational Linguistics.

Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, and Yulan He. 2021b. Boosting low-resource biomedical QA via entity-aware masking strategies. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1977–1985. Association for Computational Linguistics.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.

Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892. Association for Computational Linguistics.

Junaid Shuja, Eisa Alanazi, Waleed Alasmary, and Abdulaziz Alashaikh. 2021. Covid-19 open source data sets: a comprehensive survey. Applied Intelligence, 51(3):1296–1325.

Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, and Stefan Bauer. 2021. On disentangled representations learned from correlated data. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10401–10412. PMLR.

Hai Wan, Yufei Yang, Jianfeng Du, Yanan Liu, Kunxun Qi, and Jeff Z. Pan. 2020. Target-aspect-sentiment joint detection for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):9122–9129.

Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, and Guodong Zhou. 2018. Aspect sentiment classification with both word-level and clause-level attention networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4439–4445. International Joint Conferences on Artificial Intelligence Organization.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45. Association for Computational Linguistics.
Chuhan Wu, Fangzhao Wu, Sixing Wu, Junxin Liu, Zhigang Yuan, and Yongfeng Huang. 2018. THU_NGN at SemEval-2018 task 3: Tweet irony detection with densely connected LSTM and multi-task learning. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 51–56, New Orleans, Louisiana. Association for Computational Linguistics.

Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 478–487, New York, New York, USA. PMLR.

Lu Xu, Hao Li, Wei Lu, and Lidong Bing. 2020. Position-aware tagging for aspect sentiment triplet extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 2339–2349. Association for Computational Linguistics.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics.

Xiaoyuan Yi, Zhenghao Liu, Wenhao Li, and Maosong Sun. 2020. Text style transfer via learning style instance supported latent space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3801–3807. International Joint Conferences on Artificial Intelligence Organization. Main track.

Chen Zhang, Qiuchi Li, and Dawei Song. 2019. Aspect-based sentiment classification with aspect-specific graph convolutional networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4568–4578, Hong Kong, China. Association for Computational Linguistics.
Dejiao Zhang, Feng Nan, Xiaokai Wei, Shang-Wen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew O. Arnold, and Bing Xiang. 2021. Supporting clustering with contrastive learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5419–5430. Association for Computational Linguistics.

Meishan Zhang, Yue Zhang, and Duy-Tin Vo. 2015a. Neural networks for open domain targeted sentiment. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 612–621, Lisbon, Portugal. Association for Computational Linguistics.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015b. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

He Zhao, Longtao Huang, Rong Zhang, Quan Lu, and Hui Xue. 2020. SpanMlt: A span-based multi-task learning framework for pair-wise aspect and opinion terms extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3239–3248. Association for Computational Linguistics.

Lixing Zhu, Gabriele Pergola, Lin Gui, Deyu Zhou, and Yulan He. 2021. Topic-driven and knowledge-aware transformer for dialogue emotion detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1571–1582, Online. Association for Computational Linguistics.
A Derivation of the Decomposed ELBO

Unsupervised training is based on maximizing the Evidence Lower Bound (ELBO):

  E_{qφ(zs,zw|ψ(w))}[log pθ(w | zs, zw, ψ(w))] − KL[qφ(zs, zw|ψ(w)) || p(zs, zw)],

where z is partitioned into zs and zw. As in the standard VAE (Kingma and Welling, 2014), the variational distribution is a multivariate Gaussian with a diagonal covariance:

  qφ(zs, zw|ψ(w)) = N(zs, zw | µ, σ²I),

where µ = [µs, µw] and σ = [σs, σw]. Since the covariance matrix is diagonal, zs and zw are uncorrelated. Therefore, the joint probability is decomposed into:

  qφ(zs, zw|ψ(w)) = qφ(zs|ψ(w)) qφ(zw|ψ(w)),

where qφ(zs|ψ(w)) = N(zs | µs, σs) and φ are the variational parameters. The prior [zs, zw] ∼ N(zs, zw | 0, I) can also be decomposed into the product of p(zs) and p(zw), so the KL term becomes:

  KL[qφ(zs|ψ(w)) || p(zs)] + KL[qφ(zw|ψ(w)) || p(zw)].

As for the decoder pθ(w | zs, zw, ψ(w)), the reconstructions of each masked token and of w_[CLS] are independent of each other, i.e., they are not predicted in an autoregressive way. Therefore, the joint probability is decomposed into:

  pθ(w | zs, zw, ψ(w)) = pθ(w_[CLS] | zs, zw, ψ(w)) pθ(w_{1:n} | zs, zw, ψ(w)).

We customize the decoder network to make w_[CLS] solely dependent on zs, and obtain

  E_{qφ(zs)} E_{qφ(zw)}[log pθ(w_[CLS] | zs, ψ(w)) + log pθ(w_{1:n} | zw, ψ(w))].

Here, we omit ψ(w) for notational simplicity. Given the supervision of annotated aspect spans, the prior of zw is constrained by qφ(zw|ψ(wa:b)) (i.e., the encoder of wa:b); this changes the KL term into:

  KL[qφ(zs|ψ(w)) || p(zs)] + KL[qφ(zw|ψ(w1:n)) || qφ(zw|ψ(wa:b))],

and finally the ELBO is expressed as

  E_{qφ(zs)}[log pθ(w_[CLS] | zs, ψ(w))]
  + E_{qφ(zw)}[log pθ(w_{1:n} | zw, ψ(w))]
  − KL[qφ(zs|ψ(w)) || p(zs)]
  − KL[qφ(zw|ψ(w1:n)) || qφ(zw|ψ(wa:b))].
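As a quick numerical illustration (ours, not from the paper) of the block-wise decomposition above: for a diagonal Gaussian posterior and a factorised standard-normal prior, the KL of the joint latent splits exactly into the sum of the KLs of the [zs] and [zw] blocks.

```python
import numpy as np

def kl_to_std_normal(mu, var):
    """KL[N(mu, diag(var)) || N(0, I)] in closed form for a diagonal Gaussian."""
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))

rng = np.random.default_rng(0)
mu = rng.normal(size=8)           # e.g. [mu_s (3 dims) | mu_w (5 dims)]
var = rng.uniform(0.5, 2.0, 8)
joint = kl_to_std_normal(mu, var)
split = kl_to_std_normal(mu[:3], var[:3]) + kl_to_std_normal(mu[3:], var[3:])
assert np.isclose(joint, split)   # KL decomposes block-wise
```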
B Data Collection and Preprocessing

We are qualified Twitter Academic Research API[5] users. We obtained ethical approval for our proposed research from the university's ethics committee before the start of our work. We collected tweets between February 7th and April 3rd, 2021, using 60 vaccine-related keywords. The exhaustive list is: 'covid-19 vax', 'covid-19 vaccine', 'covid-19 vaccines', 'covid-19 vaccination', 'covid-19 vaccinations', 'covid-19 jab', 'covid-19 jabs', 'covid19 vax', 'covid19 vaccine', 'covid19 vaccines', 'covid19 vaccination', 'covid19 vaccinations', 'covid19 jab', 'covid19 jabs', 'covid vax', 'covid vaccine', 'covid vaccines', 'covid vaccination', 'covid vaccinations', 'covid jab', 'covid jabs', 'coronavirus vax', 'coronavirus vaccine', 'coronavirus vaccines', 'coronavirus vaccination', 'coronavirus vaccinations', 'coronavirus jab', 'coronavirus jabs', 'Pfizer vaccine', 'BioNTech vaccine', 'Oxford vaccine', 'AstraZeneca vaccine', 'Moderna vaccine', 'Sputnik vaccine', 'Sinovac vaccine', 'Sinopharm vaccine', 'Pfizer jab', 'BioNTech jab', 'Oxford jab', 'AstraZeneca jab', 'Moderna jab', 'Sputnik jab', 'Sinovac jab', 'Sinopharm jab', 'Pfizer vax', 'BioNTech vax', 'Oxford vax', 'AstraZeneca vax', 'Moderna vax', 'Sputnik vax', 'Sinovac vax', 'Sinopharm vax', 'Pfizer vaccinate', 'BioNTech vaccinate', 'Oxford vaccinate', 'AstraZeneca vaccinate', 'Moderna vaccinate', 'Sputnik vaccinate', 'Sinovac vaccinate', 'Sinopharm vaccinate'.

Only tweets in English were collected. Retweets were discarded. For pre-processing, hyperlinks, usernames and irregular symbols were removed. Emojis and emoticons were converted to their literal meanings using an emoticon dictionary[6].

[5] https://developer.twitter.com/en/products/twitter-api/academic-research/application-info
[6] https://wprock.fr/en/t/kaomoji/

C Hyper-parameters and Training Details

The dimensions of za, zw and zs are 768, 768 and 32, respectively. For each tweet, the number of samples drawn from N(0, I) is 1. We modified the LM-fine-tuning script[7] from the HuggingFace library to implement VADET in the masked LM learning, and we use the default settings for training.

[7] https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py