Hindawi
Mathematical Problems in Engineering
Volume 2021, Article ID 6695913, 11 pages
https://doi.org/10.1155/2021/6695913

Research Article
Emotion Label Enhancement via Emotion Wheel and Lexicon

Xueqiang Zeng,1 Qifan Chen,1 Sufen Chen,2 and Jiali Zuo1
1School of Computer & Information Engineering, Jiangxi Normal University, Nanchang 330022, China
2School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China

 Correspondence should be addressed to Sufen Chen; csf@nit.edu.cn

 Received 22 November 2020; Revised 23 March 2021; Accepted 22 April 2021; Published 3 May 2021

 Academic Editor: Zenghui Wang

 Copyright © 2021 Xueqiang Zeng et al. This is an open access article distributed under the Creative Commons Attribution License,
 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Emotion Distribution Learning (EDL) is a recently proposed multiemotion analysis paradigm, which identifies basic emotions with different degrees of expression in a sentence. Different from traditional methods, EDL quantitatively models the expression degree of each emotion on a given instance as an emotion distribution. However, emotion labels are crisp in most existing emotion datasets. To utilize traditional emotion datasets in EDL, label enhancement aims to convert logical emotion labels into emotion distributions. This paper proposes a novel label enhancement method, called Emotion Wheel and Lexicon-based emotion distribution Label Enhancement (EWLLE), which utilizes the linguistic emotional information of affective words and the psychological knowledge of Plutchik's emotion wheel. The EWLLE method generates separate discrete Gaussian distributions for the emotion label of the sentence and the emotion labels of the affective words based on the psychological emotion distance, and combines the two types of information into a unified emotion distribution by superposition of the distributions. Extensive experiments on 4 commonly used text emotion datasets show that the proposed EWLLE method has a distinct advantage over the existing EDL label enhancement methods in the emotion classification task.

1. Introduction

Text emotion classification (recognition) is an important research topic with many promising novel applications [1], such as emotional human-computer interaction [2], intelligent customer service [3], music emotion classification [4], anticipating corporate financial performance [5], and online product review analysis [6]. The goal of text emotion recognition is to find out the writers' emotional states contained in sentences [1]. In recent years, researchers have proposed a lot of effective and fruitful work in the field of text emotion classification [2–6].

In general, the emotion expressed in a sentence is a mixture of a variety of basic emotions (e.g., anger, fear, joy, or sadness), where each basic emotion has a certain degree of contribution to the overall expression [7]. Traditionally, text emotion recognition models are based on Single-Label Learning (SLL) or Multi-Label Learning (MLL). In SLL, one sentence is assumed to be associated with only one emotion label [8]. To cope with the situation where one sentence simultaneously evokes several different emotions, MLL assigns multiple emotion labels to a sentence [9]. However, the modeling ability of MLL is insufficient to quantitatively analyze the ambiguity in multiple emotions. Given a sentence, MLL identifies the prominent emotion labels, while it cannot tell the specific expression intensity of each emotion [10].

To address this problem, by drawing on the idea of Label Distribution Learning (LDL) [11, 12], Zhou et al. [7] proposed Emotion Distribution Learning (EDL) for facial emotion recognition. In the next year, EDL was applied to text emotion classification [13]. EDL deals with multiple emotions by associating each instance (e.g., a facial image or a sentence) with an emotion distribution vector, where the vector's dimension is the number of all possible emotions. In the emotion distribution, each component represents the intensity of the corresponding emotion on the given instance. Obviously, EDL is suitable to solve the quantitative multiemotion analysis problem, especially when emotion ambiguities occur [7]. In recent years, many effective EDL methods have been proposed. For example, Zhou et al. [13] designed an EDL model by introducing the constraint of

interrelationships between emotions. Jia et al. [14] proposed an EDL method that considers the local correlation of emotion labels. Zhao et al. [15] put forward an EDL model based on meta-learning with small samples. He et al. [16] designed an EDL method based on graph convolutional neural networks, where the correlation between emotions is considered. Xiong et al. [17] proposed an EDL model based on convolutional neural networks that utilizes the polarity and the sparsity of emotion labels. Fan et al. [18] designed an EDL method to predict image emotion distributions by learning the labels' correlation. Xi et al. [19] proposed emotion distribution learning based on surface electromyography for predicting the intensities of basic emotions. Liang et al. [20] proposed a novel semisupervised multimodal emotion recognition model based on cross-modality distribution matching. All these EDL methods exhibited better performance than traditional models.

Most existing EDL works focused on improving emotion recognition accuracy by proposing novel prediction models. Few methods have been proposed to determine the emotion distribution from existing annotated datasets, which only contain single-labeled emotions. A radical solution to the problem would be the creation of datasets in which emotion distributions (or quantitative multilabeled instances) are annotated instead of single emotion labels, which is difficult to do in practice. A convenient method to obtain EDL datasets is to transfer the existing quantitative multilabeled datasets by label score normalization (normalizing the sum of label scores of each instance to be 1). However, quantitative multilabeled text emotion datasets are scarce, except for a few special ones such as SemEval 2007 Task 14 [21]. Most of the existing text emotion datasets are single-labeled. To utilize the traditional emotion datasets in EDL, label enhancement aims to transform single-labeled emotions to emotion distributions, whose idea is similar to that of LDL [22].

Label enhancement needs to leverage some extra knowledge to convert a single label to a distribution, in which the effectiveness of the knowledge is essential. An up-to-date introduction to label enhancement in label distribution learning can be found in Xu et al. [22]. Up to now, only several EDL label enhancement methods have been proposed [23, 24]. Yang et al. [23] proposed the Mikels' Wheel-based emotion distribution Label Enhancement (MWLE) method, which utilizes psychological emotional knowledge to transform emotion labels into distributions. However, the MWLE method was proposed for facial emotion classification, without considering the affective words that are effective in text emotion analysis. Affective words have different intensities and emotional tendencies, which are generally annotated based on linguistic knowledge. Emotional words contain a lot of emotional information, which is discriminative for text emotion recognition [25]. Zhang et al. [24] proposed the Lexicon-based emotion distribution Label Enhancement (LLE) method, which generates emotion distributions from a single label by introducing the linguistic information of affective words. The experimental results show that the performance of the LLE method is better than that of the MWLE method [24], but the disadvantage of the LLE method is that the psychological correlation between human emotions is not considered. Psychological emotional knowledge can be used to effectively obtain the intrinsic correlations among emotions [23], while affective words contain effective discriminating information about emotions [25]. Both of them are essential to efficient emotion classification. However, to the best of our knowledge, no existing label enhancement work has considered both the psychological and linguistic knowledge.

In this paper, we present a novel Emotion Wheel and Lexicon-based emotion distribution Label Enhancement (EWLLE) method, which calculates the psychological distances between emotions according to Plutchik's emotion wheel and utilizes the linguistic information of affective words from some classical lexicons. Plutchik's wheel of emotions is a well-known psychological model proposed by Robert Plutchik in 1980 to describe human emotional relationships [26]. We exploit Plutchik's wheel of emotions to determine similarities between different emotions through a Gaussian distribution. For a given sentence, based on psychological emotion distances, EWLLE generates discrete Gaussian distributions of the sentence emotion label and the affective words' emotion labels and then superposes them into a unified emotion distribution. Different from existing EDL label enhancement methods, EWLLE takes into consideration the psychological and linguistic emotional knowledge at the same time during the label enhancement procedure. Extensive experiments on 4 public text emotion datasets, TEC [27], Fairy Tales [28], CBET [29], and ISEAR [30], demonstrate that the proposed EWLLE method performs favorably against the state-of-the-art approaches in the task of text emotion recognition.

The rest of the paper is organized as follows. Section 2 introduces some related works on emotion label enhancement. Section 3 describes the proposed method of label enhancement based on the emotion wheel and lexicon in detail. Section 4 provides a series of comparative experiments to verify the effectiveness of the proposed method. Finally, Section 5 concludes the paper.

2. Related Works

2.1. Emotion Distribution Learning. In general, the emotional expression in a text sentence or facial image is usually a combination of multiple basic emotions. All related basic emotions play a certain role in the overall expression and together constitute an emotion distribution [7]. Figure 1 shows two representative sentences and the corresponding annotations from the SemEval dataset [21]. For sentence (a), the dominating emotion, surprise, accounts for a 41.9% expression level; meanwhile, the other two major emotions, joy and anger, present 20.1% and 16.9%, respectively. The situation of sentence (b) is analogous. The examples show that a single sentence may possibly contain multiple emotions with different intensities rather than a single label.

As we stated earlier, most traditional emotion recognition methods are based on Single-Label Learning (SLL) or Multi-Label Learning (MLL), which address the problem of "which emotions are used to describe the sample," rather

[Bar charts of expression level (%) over the six emotions 1: anger, 2: fear, 3: joy, 4: disgust, 5: sadness, 6: surprise for the two example sentences.]
Figure 1: Two sentences of the SemEval dataset and the corresponding emotion expression levels. (a) Hacker unlocks Apple music download protection. (b) Pacers' Jackson misses gun hearing.
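The annotations in Figure 1 can be read directly as discrete emotion distributions. The sketch below builds such a distribution for sentence (a) from the three expression levels reported in the text; splitting the leftover mass evenly over the other emotions is purely our illustrative assumption, since the figure's minor values are not given in the prose:

```python
# Expression levels for Figure 1, sentence (a), as reported in the text.
major = {"surprise": 0.419, "joy": 0.201, "anger": 0.169}

# Hypothetical: spread the remaining mass evenly over the other emotions,
# only so that the vector forms a complete distribution for this example.
rest = 1.0 - sum(major.values())
minor = {e: rest / 3 for e in ("fear", "disgust", "sadness")}

distribution = {**major, **minor}
assert abs(sum(distribution.values()) - 1.0) < 1e-9  # a valid emotion distribution
print(max(distribution, key=distribution.get))        # -> surprise (the dominating emotion)
```

Unlike a one-hot label, every component of such a vector carries a usable intensity, which is exactly the information SLL and MLL discard.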

than the problem of "how to describe the sample's emotions quantitatively." Actually, both SLL and MLL cannot obtain the specific intensity of each basic emotion [11]. If we simply annotate sentences with multiple major emotions whose expression levels are higher than a given threshold, the multiemotion recognition task could be solved by MLL, but such a simplistic approach would discard the information of the emotion expression level, which makes it impossible to quantitatively analyze related emotions in the subsequent affective computing task [7].

In light of the success of the novel machine learning paradigm of Label Distribution Learning (LDL) [11], Emotion Distribution Learning (EDL) was proposed to solve the facial and text emotion classification tasks [7, 13]. Different from SLL and MLL, EDL assigns each instance an emotion distribution, where each distribution component represents the intensity of the corresponding emotion on the given instance.

In the traditional single-label text emotion recognition task, each sentence $s_i$ has a corresponding emotion label $y_i$, $y_i \in \{1, 2, \ldots, C\}$, where $C$ is the number of all possible emotions. Rather than just predicting the emotion label $y_i$, the goal of EDL is to find a function to map sentence $s_i$ to an emotion distribution $d_i = \{d_i^j\}_{j=1}^{C}$, where $d_i^j$ represents the intensity of the $j$-th emotion for the sentence $s_i$, $d_i^j \in [0, 1]$, and $\sum_j d_i^j = 1$. Note that, unlike a probability distribution that assumes only one label is correct at a time, the emotion distribution allows an instance to have multiple emotion labels simultaneously. Any emotion with an intensity higher than 0 is a possible label for the instance, while the expression level of each emotion label can vary.

EDL predicts the intensity values of sentences across a set of emotion categories by the emotion distribution. Such information is important for understanding fine-grained emotions, especially when ambiguities exist [7]. In recent years, many effective EDL methods have been proposed [13–20]. However, one of the major problems for the development of EDL models is the lack of emotion distributions in annotated datasets, due to the difficulty of annotating emotion distributions. In the existing text emotion datasets, it is rare to see multiemotion score annotations (e.g., SemEval 2007 Task 14 [21]), which can be transferred to emotion distributions by label score normalization. In order to utilize abundant traditional single-labeled text emotion datasets, methods of label enhancement are required.

2.2. Label Enhancement. In label distribution learning, label enhancement refers to the process of recovering the label distribution from a single-label or multilabel dataset [22]. If we regard the original ground-truth label $y_i$ as having a score of 1, the basic idea of most label enhancement methods is to reduce the score of $y_i$ and increase the scores of some other related labels in the generated distribution, where the score of $y_i$ generally remains the highest. The label score adjustment strategies vary among different label enhancement methods. In practice, both the correlation among the labels and the topological information of the feature space are usually utilized to recover the label distribution from the logical label. After the concept of label distribution learning was proposed, the literature has put forward some effective label enhancement methods. The existing label enhancement algorithms are roughly divided into three types [22], i.e., fuzzy theory-based methods [31, 32], graph model-based methods [33, 34], and prior knowledge-based methods [35, 36]. We give a brief introduction to these 3 kinds of label enhancement methods as follows.

The fuzzy theory-based label enhancement method digs out the correlation among the labels by introducing fuzziness into the originally rigid logical labels and transforms the logical label into a label distribution. For example, the label enhancement algorithm based on fuzzy clustering uses the membership degree of the examples generated in the fuzzy clustering process to each cluster and converts the

membership degree of the example to the cluster into the membership degree of the category, thereby generating the label distribution [31]. The label enhancement algorithm based on the fuzzy kernel membership degree uses the kernel technique to calculate the fuzzy membership degree of the example to each category in the high-dimensional space, so as to mine the correlation between the categories in the training set [32].

The graph model-based label enhancement algorithm uses graph models to represent the topological structure between examples, establishes the relationship between examples and labels through some model assumptions, and then enhances the logical label to a label distribution. For instance, the label enhancement algorithm based on label propagation expresses the topological structure between examples through a graph model and uses the difference of path weights in the propagation process to make descriptive differences, so as to mine the relationship between the labels in the training set [33]. The manifold-based method reconstructs the manifolds of the feature space and the label space and uses the smoothing assumption to migrate the topological relationship of the feature space to the label space, thereby enhancing the logical label to a label distribution [34].

The prior knowledge-based label enhancement algorithm introduces some kind of prior knowledge, mines the implicit correlation between labels according to the characteristics of the dataset, and enhances the logical label into a label distribution. Obviously, the validity of the prior knowledge is the key to the success of this kind of method. By introducing the prior knowledge of the correlation among different head poses, Geng et al. [35] built a label distribution from the logical label and its neighboring head poses. In the application of facial age estimation, the lack of facial images with definite age labels makes traditional age prediction algorithms inefficient. Based on the prior knowledge of the facial similarity between adjacent human ages, Geng et al. [36] recovered a label distribution from the ground-truth age and its adjacent ages and proposed an adaptive label distribution learning model to learn the human age ambiguity. Furthermore, Zhang et al. [37] presented a developed prior assumption of facial age correlation, which limits the age label distribution to cover only a reasonable number of neighboring ages. Based on the developed prior knowledge, Zhang et al. [37] proposed a practical label distribution paradigm and outperformed current state-of-the-art facial age recognition methods.

2.3. Lexicon-Based Label Enhancement. In addition to the annotations (emotion labels or emotion distributions), text-based EDL can utilize the extra prior information of affective words contained in sentences compared to classical LDL. Affective words are words with different intensities of emotional tendency [25], which are associated with certain emotion labels. The affective lexicon is a dictionary of affective words, which is one of the most important linguistic resources in affective computing for text [25]. One sentence may include multiple affective words, and one affective word can be associated with several emotion labels. Many existing studies showed that affective words contain abundant discriminative information for text emotion recognition [25]. Researchers have proposed a variety of emotion recognition methods based on emotional lexicons, where affective words are extracted from sentences and then used to predict emotion labels. For example, Agrawal and An [38] utilized a constructed emotion lexicon to classify emotional sentences; Wang and Pal [39] proposed a multiconstraint emotion classification model based on an emotional lexicon.

Similar to that of LDL [22], label enhancement in EDL converts the emotion label $y_i$ of sentence $s_i$ to the emotion distribution $d_i = \{d_i^j\}_{j=1}^{C}$. Given the rich information encoded in affective words, it is desirable to introduce affective words into text-based EDL label enhancement models. Following this idea, Zhang et al. [24] proposed the method of Lexicon-based emotion distribution Label Enhancement (LLE), whose main idea is to attach secondary emotions based on affective words to the ground-truth emotion. As shown in Figure 2, the ground-truth emotion label of the example is sadness, which has the highest score in the generated emotion distribution. Meanwhile, four secondary emotions of anger, fear, joy, and disgust are extracted based on the lexicon and added to the emotion distribution. Secondary emotions have a lower score than the dominating emotion.

Given a sentence $s_i$ and its emotion label $y_i$, the specific approach of LLE is as follows [24]: (1) extract all affective words from the sentence $s_i$ based on affective lexicons and obtain the corresponding emotion label set $D$; (2) assign intensity scores to the corresponding emotion labels if there are affective words other than the ones with the ground-truth emotion; otherwise, the one-hot vector is used as the emotion distribution for the sentence $s_i$.

Formally, the intensity score of the ground-truth emotion $y_i$ is calculated by

$$d_i^{j=y_i} = \begin{cases} \varepsilon, & \text{if } D \setminus y_i \neq \emptyset, \\ 1, & \text{otherwise}. \end{cases} \quad (1)$$

Meanwhile, the expression level of the $j$-th emotion with $j \neq y_i$ is computed by

$$d_i^{j \neq y_i} = \begin{cases} (1 - \varepsilon) \dfrac{|y_j|}{|D \setminus y_i|}, & \text{if } D \setminus y_i \neq \emptyset, \\ 0, & \text{otherwise}, \end{cases} \quad (2)$$

where $|y_j|$ is the number of affective words of the $j$-th emotion in the sentence and $\varepsilon$ is the weight parameter of the ground-truth emotion label. After calculating the intensity scores for all emotion labels, $\sum_j d_i^j = 1$ is guaranteed by normalization.

Given a sentence with its logical emotion label, LLE extracts affective words from the text and attaches the corresponding emotional information to the logical label, which is used to generate the final emotion distribution. Compared to the label enhancement method without using affective words

[Diagram: the sentence "Not even worth a rejection letter." with the logical label (anger: 0, disgust: 0, fear: 0, joy: 0, sadness: 1, surprise: 0) passes through affective-word extraction against an affective lexicon to produce a generated label distribution over anger, disgust, fear, joy, sadness, and surprise.]
Figure 2: The lexicon-based emotion distribution label enhancement method.
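The LLE scheme illustrated in Figure 2 and defined by formulas (1) and (2) can be sketched as follows. The function name, the value of ε, and treating D as the multiset of the affective words' labels are our own illustrative assumptions, not the authors' code:

```python
from collections import Counter

def lle_enhance(affective_labels, y, emotions, eps=0.8):
    """Illustrative sketch of LLE, formulas (1)-(2).

    affective_labels: emotion labels of the affective words found in the
    sentence (the multiset D); y: ground-truth emotion label;
    eps: weight of the ground-truth emotion (hypothetical default).
    """
    counts = Counter(affective_labels)
    others = {e: c for e, c in counts.items() if e != y}  # the label set D \ y
    n_other = sum(others.values())                        # |D \ y|
    if n_other == 0:
        # No secondary emotions: the one-hot vector is used (bottom cases).
        return {e: 1.0 if e == y else 0.0 for e in emotions}
    # Formula (2): the (1 - eps) mass is shared by word-label frequency;
    # formula (1): the ground-truth emotion keeps the weight eps.
    d = {e: (1 - eps) * others.get(e, 0) / n_other for e in emotions}
    d[y] = eps
    total = sum(d.values())                 # explicit normalization step,
    return {e: v / total for e, v in d.items()}  # so the scores sum to 1
```

For instance, with ground truth `sadness` and extracted word labels `["anger", "fear", "sadness"]`, the sketch keeps `sadness` at ε and splits the remaining mass between `anger` and `fear`.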

[23], experimental results demonstrated that the emotion distribution generated by LLE has better performance [24].

In summary, the lack of textual emotion datasets with annotated emotion distributions is a distinct obstacle to the development of EDL. To address this problem, some prior knowledge-based label enhancement methods have recently been proposed to enhance logical labels to emotion distributions [23, 24]. However, most existing methods are deficient in the sense of utilizing extra prior knowledge, where the validity of the prior knowledge is the key to success. MWLE calculates emotion distances by Mikels' wheel and then adopts the Gaussian function to transform the sentence label to an emotion distribution. However, MWLE is a label enhancement method that does not use the prior linguistic knowledge of affective words [23]. LLE builds the emotion distribution based on the information of affective words. Nevertheless, LLE does not consider the prior psychological knowledge of human emotions, which contains the information of widely observed intercorrelations among emotions [24]. The major difference between MWLE and LLE is the use of two different kinds of prior knowledge. To the best of our knowledge, the prior knowledge of psychology and linguistics has never been used together in label enhancement methods.

In order to effectively integrate both psychological knowledge and linguistic knowledge into the label enhancement model, this paper uses Plutchik's wheel of emotions to calculate the psychological distance between emotions and proposes an emotion distribution label enhancement method based on the emotion wheel and the emotion lexicon. The details of the proposed method will be discussed in the next section.

3. The Emotion Wheel and Lexicon-Based Label Enhancement Method

3.1. Plutchik's Wheel of Emotions. Human emotional expression is a complex phenomenon, where intrinsic strong intercorrelations among emotions exist widely [26]. Some particular emotions often appear simultaneously in a face or a sentence, which shows a high positive correlation. Meanwhile, some other emotions present the opposite cooccurrence phenomenon, which can be regarded as a negative correlation. Robert Plutchik [26] proposed the theory of the emotion wheel, which is a classic model that describes human emotional relationships from a psychological perspective. Plutchik's wheel of emotions contains 8 basic emotions: anger, disgust, sadness, surprise, fear, trust, joy, and expect. As shown in Figure 3, these 8 emotions are divided into 4 groups of opposite emotions and allocated accordingly in the emotion wheel. The two emotions in a diagonal position are in opposition (negative correlation), and adjacent emotions have some kind of similarity (positive correlation).

Since the similarity in Plutchik's emotion wheel represents the corresponding psychological distance of emotions, we use the interval angle in the emotion wheel to measure the distance between emotions, where each 45-degree interval in the wheel is defined as 1 scale. The bigger the interval angle, the larger the emotional distance. For example, for the adjacent emotions of anger and expect, which are separated by 45°, the distance is 1. The distance between joy and sadness is 4, since they are two opposite emotions with a 180-degree interval. In our study, the distance between any two different emotions is an integer from 1 to 4, and the distance between the same emotions is 0.

In previous work on EDL, some effective models based on the prior psychological knowledge of human emotions have been proposed [13, 23, 40]. For instance, Zhou et al. [13] introduced emotion label constraints into the optimization function of the maximum entropy based EDL model, where the constraints are calculated according to Plutchik's emotion wheel. Another EDL method was proposed based on Mikels' wheel of emotions [40] and the convolutional neural network, where Mikels' emotion wheel is another classical psychological emotion model. The main difference between the two emotion wheels is that they contain different emotions. Both studies using wheels achieved excellent results.

For the task of emotion distribution label enhancement, Yang et al. [23] proposed the method of Mikels' Wheel-based emotion distribution Label Enhancement (MWLE). The MWLE method calculates the emotion distances based on Mikels' wheel of emotions and transfers the emotion label into an emotion distribution by the Gaussian function. However, the information of affective words is ignored, since no text is available in facial emotion analysis. As a result, the performance of MWLE is inferior to that of LLE [24]. Until

now, we have not found existing work on emotion distribution label enhancement that considers both psychological and linguistic emotional knowledge.

[Diagram of Plutchik's wheel with the 8 emotions expect, joy, trust, surprise, sadness, disgust, anger, and fear arranged in a circle; adjacent emotions are 1 scale apart and opposite emotions are 4 scales apart.]
Figure 3: Plutchik's wheel of emotions.

3.2. The Proposed Method. Just as upgrading a low-resolution image to a high-resolution one actually requires more information, label enhancement needs to introduce some external knowledge to effectively transform a single label into a distribution. In addition to the label $y_i$, we propose using both psychological emotional knowledge and the linguistic information of affective words. Based on the ground-truth sentence label and the emotion labels of affective words, combined with the emotion correlation knowledge learned from Plutchik's emotion wheel, we propose the Emotion Wheel and Lexicon-based emotion distribution Label Enhancement (EWLLE) method. EWLLE defines the psychological emotion distance according to the corresponding interval angle in Plutchik's wheel of emotions, then generates the discrete Gaussian distributions across all emotion categories for the ground-truth sentence label and the emotion labels of affective words, respectively, and finally integrates them into a unified emotion distribution by the superposition operation.

In particular, for the sentence $s_i$, EWLLE extracts all affective words $w_i = \{w_{i,k}\}_{k=1}^{n}$ by looking up the emotion lexicon, where $n$ is the number of affective words in $s_i$. The number $n$ is zero when no affective word is contained in the sentence. Meanwhile, each affective word $w_{i,k}$ has several (at least 1) associated emotion labels $\{p_{i,k}^t\}_{t=1}^{m_k}$, where $m_k$ is the number of emotion labels of $w_{i,k}$. The total number of emotion labels of affective words extracted from sentence $s_i$ is $\sum_{k=1}^{n} m_k$.

For the enhancement from an emotion label $\alpha$ to an emotion distribution, we make two reasonable assumptions. Firstly, the ground-truth emotion label $\alpha$ should have the highest value in the generated distribution to ensure its dominating position. Secondly, the score of the other emotions is reduced along with the distance to the label $\alpha$, which reflects the fact that an emotion similar to the dominating emotion has a higher weight than distant ones. Since the emotions form a loop in Plutchik's emotion wheel, the emotion distribution based on psychological distance will be a symmetrical distribution centered on the label $\alpha$ with decreasing values on both sides. The distribution may look like a Gaussian distribution or a triangular distribution. From the label enhancement results in the facial age estimation task, the Gaussian distribution is a better choice than the triangular distribution [41]. We follow this work and use the Gaussian distribution.

Based on the above principles, the EWLLE method adopts the discrete Gaussian function to enhance the emotion label $\alpha$ to the distribution $f^{\alpha} = \{f_a^{\alpha}\}_{a=1}^{C}$. Formally, the discrete Gaussian distribution $f^{\alpha}$ centered on the label $\alpha$ is calculated as follows:

$$f_a^{\alpha} = \frac{1}{\sqrt{2\pi}\sigma Z} \exp\left(-\frac{|a-\alpha|^2}{2\sigma^2}\right), \quad (3)$$

$$Z = \frac{1}{\sqrt{2\pi}\sigma} \sum_a \exp\left(-\frac{|a-\alpha|^2}{2\sigma^2}\right), \quad (4)$$

where $\sigma$ is the standard deviation of the Gaussian function, $Z$ is the normalization factor to ensure $\sum_a f_a^{\alpha} = 1$, and $|a-\alpha|$ is the distance between the emotion $a$ and the ground-truth emotion $\alpha$. We use the psychological distance described in Section 3.1 to calculate $|a-\alpha|$. When the standard deviation $\sigma$ is larger, more similar emotions are considered in the generated emotion distribution, since the corresponding Gaussian distribution is flatter. The standard deviation $\sigma$ is set to 1 in our experiments.

Once all emotion labels in sentence $s_i$ are obtained, EWLLE generates the Gaussian distributions $f^{y_i}$ and $f^{p_{i,k}^t}$ by formula (3) for the sentence label $y_i$ and the affective words' emotion labels $p_{i,k}^t$, respectively. Then, in order to combine the two kinds of information, EWLLE interpolates the distributions $f^{y_i}$ and $f^{p_{i,k}^t}$ to obtain the emotion distribution $d_i$ by

$$d_i = \frac{1-\lambda}{\sum_{k=1}^{n} m_k} \sum_{k=1}^{n} \sum_{t=1}^{m_k} f^{p_{i,k}^t} + \lambda \cdot f^{y_i}, \quad (5)$$

where $n$ is the number of affective words in sentence $s_i$, $m_k$ is the number of emotion labels of the $k$-th affective word $w_{i,k}$, $p_{i,k}^t$ is the $t$-th emotion label of the affective word $w_{i,k}$, $y_i$ is the ground-truth emotion label of sentence $s_i$, $f^{p_{i,k}^t}$ and $f^{y_i}$ are the generated Gaussian distributions of the emotion labels $p_{i,k}^t$ and the sentence label $y_i$, respectively, and the weight coefficient $\lambda$ is used to control the proportion of $f^{y_i}$ in the emotion distribution $d_i$. The specific steps of the EWLLE algorithm are shown in Algorithm 1.

The range of the parameter $\lambda$ is $[0, 1]$. The emotion distribution $d_i$ is solely generated from the ground-truth emotion label $y_i$ when $\lambda = 1$, where no affective word information is included in EWLLE. In contrast, when $\lambda = 0$, EWLLE produces the emotion distribution only based on the affective words and the lexicon, without the help of the annotation $y_i$. Since the manually labeled $y_i$ is generally accurate, its emotion discriminating power should be greater than that of the automatically extracted emotional information
Mathematical Problems in Engineering 7

Input: training sentence si and its emotion label yi, weighting parameter λ, emotion lexicon L
Output: emotion distribution di of si
(1) Extract all affective words {wi,k}, k = 1, ..., ni, from si by looking up L
(2) for each wi,k
(3)     Obtain all emotion labels {p^t_{i,k}}, t = 1, ..., mk, of wi,k by looking up L
(4)     Generate the discrete Gaussian distribution f_{p^t_{i,k}} for each p^t_{i,k} according to (3)
(5) end for
(6) Generate the discrete Gaussian distribution f_{yi} for yi according to (3)
(7) return di = ((1 − λ) / Σ_{k=1}^{ni} mk) · Σ_{k=1}^{ni} Σ_{t=1}^{mk} f_{p^t_{i,k}} + λ · f_{yi}

ALGORITHM 1: The pseudocode of EWLLE.
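Algorithm 1 can be sketched in plain Python. The wheel layout, the step-count distance, and the σ value below are illustrative assumptions standing in for the paper's interval-based distance and equation (3); only the overall superposition structure follows Algorithm 1.

```python
import math

# Plutchik's eight basic emotions in wheel order; the distance between two
# emotions is taken as the number of steps around the wheel (an assumption
# standing in for the paper's interval-based distance).
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]

def wheel_distance(a, b):
    """Circular distance between two emotions on the wheel."""
    d = abs(WHEEL.index(a) - WHEEL.index(b))
    return min(d, len(WHEEL) - d)

def discrete_gaussian(label, sigma=1.0):
    """Normalized discrete Gaussian over all emotions, centred on `label`."""
    raw = {e: math.exp(-wheel_distance(label, e) ** 2 / (2 * sigma ** 2))
           for e in WHEEL}
    z = sum(raw.values())
    return {e: v / z for e, v in raw.items()}

def ewlle(sentence_label, word_labels, lam=0.8, sigma=1.0):
    """Superpose sentence- and word-label Gaussians as in step (7) of Algorithm 1."""
    f_y = discrete_gaussian(sentence_label, sigma)
    if not word_labels:  # no affective words: fall back to the sentence label alone
        return f_y
    d = {e: lam * p for e, p in f_y.items()}
    w = (1.0 - lam) / len(word_labels)  # (1 - λ) spread over all word labels
    for lbl in word_labels:
        for e, p in discrete_gaussian(lbl, sigma).items():
            d[e] += w * p
    return d
```

With λ = 0.8 the sentence label dominates, so the resulting distribution still peaks at the ground-truth emotion while wheel neighbours of the involved emotions receive more mass than distant ones, and the scores sum to 1.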

of affective words. Therefore, we expect the optimal value of the parameter λ to be greater than 0.5, reflecting the fact that the sentence label is more important than the affective words' labels. We will investigate the effect of the parameter λ in the experiments.

The EWLLE method defines the psychological distance between emotions based on the interval perspective of Plutchik's emotion wheel, generates discrete Gaussian distributions for ground-truth emotion labels and affective words' emotion labels, respectively, based on this distance, and finally superposes the distributions of the two kinds of labels into a unified emotion distribution. Unlike existing label enhancement methods, EWLLE integrates both psychological and linguistic knowledge of emotions, so the generated label distributions carry more information. Furthermore, EWLLE is not a simple combination of the existing MWLE and LLE methods. LLE counts the affective words in a sentence and assigns scores to the secondary emotions according to those counts: the more often an emotion's affective words appear, the higher its score. These modeling steps of LLE cannot simply be combined with the psychological information used in MWLE. The following experimental results will demonstrate that EWLLE obtains better results by combining both psychological and lexical knowledge.

3. Experiment

3.1. Experimental Setup. The experiments were conducted on 4 widely used single-labeled text emotion datasets, i.e., TEC [27], Fairy Tales [28], CBET [29], and ISEAR [30]. Table 1 summarizes each dataset: the number of sentences per emotion, the total number of sentences, and the average number of words per sentence.

Table 1: The summary of 4 experimental datasets.

Datasets      Anger   Fear    Joy     Disgust  Sadness  Surprise   Total sentences   Avg. words/sentence
TEC           1,555   2,816   8,240   761      3,830    3,849      21,051            15.3
Fairy tales   216     166     444     —        264      114        1,204             24.0
CBET          8,540   8,540   8,540   8,540    8,540    8,540      51,240            15.0
ISEAR         1,087   1,090   1,090   1,081    1,083    —          5,431             21.7

We preprocessed the text in a standard manner: all numbers and stop words were removed, words were converted to lowercase, and word stemming was performed. Then, the pretrained word2vec word embedding model [42] was used to represent each word as a 300-dimensional vector. Words unseen in the word2vec model were initialized from the uniform distribution U[−1.0, 1.0]. Finally, each sentence was converted into a matrix and fed to the EDL prediction model.

The affective lexicon utilized in the methods of EWLLE and LLE is a combination of two classical lexicons, i.e., NRC [43] and EmoSenticNet [44]. NRC contains 14,182 affective words and 10 emotions; EmoSenticNet includes 13,189 affective words and 6 emotions. We retained the 6 emotions shared by the two lexicons, namely, anger, fear, joy, disgust, sadness, and surprise, and removed affective words not marked with any retained emotion. The emotion labels of an affective word were set to the union of its original labels. In the end, we obtained 15,603 affective words, each with 1.31 emotion labels on average.

The state-of-the-art EDL model based on a multitask convolutional neural network (CNN) was used as the prediction model [24]. The emotion with the highest score in the output emotion distribution is regarded as the predicted emotion. For the CNN, we use filter windows of 3, 4, and 5 with 100 feature maps each, a dropout rate of 0.5, a mini-batch size of 50, and the SGD optimizer, following the same routine as Zhang et al. [24].

The standard stratified 10-fold cross-validation procedure was applied: the dataset is divided into ten subsets of equal size according to the category proportions, and each subset is used in turn as the test set, with the remaining subsets as the training set. To make the experimental results comparable, all compared models used the same data divisions. The final performance is reported as the average emotion classification accuracy and the corresponding standard deviation over the 10 folds.

The code was implemented in Python with the machine learning framework PyTorch 1.3.1 and run on a Lenovo PC with an Intel(R) Core(TM) i7-6700 3.40 GHz CPU and 32 GB RAM.

To evaluate the performance of our proposed method, we conducted two sets of experiments: (1) analyzing the effect of the weight coefficient λ on the EWLLE method and (2) comparing the classification accuracy of EWLLE with some state-of-the-art EDL label enhancement methods.

3.2. Effects of the Parameter λ. As described in Section 3, the emotion distribution generated by EWLLE is a combination of the Gaussian distributions of the sentence emotion label and the affective words' emotion labels. The parameter λ plays an important role in controlling the relative proportion

of the two kinds of information: the sentence label and the affective words' labels. In order to investigate the effects of λ, we varied the value of λ from 0 to 1 with a step of 0.1 and recorded the corresponding classification accuracy of the CNN-based EDL model. Figure 4 shows the results on all experimental datasets.

As we can see from Figure 4, although the absolute classification accuracies differ considerably, the accuracy curves show similar trends on all datasets. When the value of λ increases from 0 to 0.7, the accuracies consistently improve, which indicates that incorporating the sentence emotion label is beneficial at this stage. However, when λ grows beyond a certain point, the scores generally drop, which illustrates that relying too much on the information from sentence labels, to the detriment of affective words, is harmful. The value 0.8 is optimal for λ on all 4 datasets; it is the point where the information from the sentence emotion label and from the affective words reaches a balance. Furthermore, the fact that the optimal value of λ is much greater than 0.5 verifies our earlier conjecture that sentence emotion labels are more important than affective words.

We also find that the optimal accuracy with λ = 0.8 is significantly higher than that with λ = 0 (only the information from affective words) or λ = 1 (only the sentence emotion label) on all 4 datasets. This demonstrates that it is essential to consider both sentence emotion labels and affective words' emotion labels in the label enhancement process.

3.3. Comparative Results of Different Label Enhancement Methods. We compared the proposed EWLLE method with several state-of-the-art label enhancement methods: One-hot, MWLE, and LLE. The comparative experiment worked as a pipeline. First, the traditional single-label datasets were converted into emotion-distribution-labeled ones by label enhancement. Then, the CNN model was trained on the enhanced datasets to predict the emotions, among which the one with the highest score was selected as the final prediction. Finally, the emotion classification accuracy was recorded as the performance of the corresponding method.

(i) One-hot: the sentence emotion label is represented directly by a one-hot vector, where the vector component of the ground-truth label is 1 and all others are 0. The length of the one-hot vector is the number of all possible emotion labels.

(ii) MWLE: Yang et al. [23] proposed an emotion distribution Label Enhancement method based on Mikels' wheel. MWLE calculates emotion distances on Mikels' wheel and then applies a Gaussian function to transform the sentence label into an emotion distribution. Two versions of MWLE were proposed by Yang et al. [23]; we use the better version (constraint 1) in our experiments. Note that Mikels' emotion wheel in MWLE was replaced by Plutchik's emotion wheel, because its emotions fit the emotion labels in the experimental datasets better.

(iii) LLE: Zhang et al. [24] designed the Lexicon-based emotion distribution Label Enhancement method. In the emotion distribution generated by LLE, the ground-truth label is assigned a certain score, and the remaining scores are allocated across all other emotion labels by counting the corresponding affective words. LLE does not include any psychological emotional knowledge.

(iv) EWLLE: our proposed Emotion Wheel and Lexicon-based emotion distribution Label Enhancement method. EWLLE considers both the psychological and the linguistic information. The weight coefficient λ was set to 0.8.

The detailed comparative results of the 4 label enhancement methods are shown in Table 2, where the mean classification accuracy ± the standard deviation over the ten-fold cross-validation is reported. The last row of Table 2 lists the scores of the compared methods averaged over the 4 datasets. The best score in each row is highlighted in bold.

The results in Table 2 clearly show that EWLLE outperforms the other label enhancement methods on all four datasets. Regarding the accuracy averaged over all datasets, EWLLE scores 0.663, which is 0.017 higher than LLE, 0.027 higher than MWLE, and 0.043 higher than One-hot. Compared with the suboptimal method, LLE, the performance of EWLLE is significantly improved, which indicates that introducing psychological emotional knowledge into label enhancement is necessary.

In addition, the performance of LLE is superior to that of MWLE, which is consistent with the experimental results of Zhang et al. [24]. The performance difference between LLE and MWLE illustrates the discriminative power of affective words, which benefits text-based EDL: it is not enough to enhance a single label into an emotion distribution solely from psychological emotional knowledge, as done in MWLE. Since neither prior emotional knowledge nor affective word information is included, the one-hot

[Figure 4 plots accuracy (y-axis, 0.58–0.74) against λ (x-axis, 0.0–1.0) with one curve per dataset: TEC, Fairy tales, CBET, and ISEAR.]

Figure 4: Accuracy of CNN based EDL model combined with EWLLE with various values of the parameter λ.
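The accuracies in Figure 4 and Table 2 are averages over the stratified 10-fold protocol of Section 3.1. A minimal sketch of such a split is shown below; the round-robin dealing of each class's indices is a simplification for illustration, not the authors' exact implementation.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Deal each class's example indices round-robin into k folds so that
    every fold keeps roughly the overall category proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

def cv_splits(labels, k=10):
    """Yield (train, test) index lists: each fold serves once as the test set."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Reporting the mean and standard deviation of the accuracy over the ten (train, test) pairs reproduces the "mean ± std" format of Table 2.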

 Table 2: Accuracy of CNN based EDL model with 4 different EDL label enhancement methods.
Datasets One-hot MWLE LLE EWLLE
TEC 0.567 ± 0.165 0.578 ± 0.190 0.592 ± 0.173 0.607 ± 0.169
Fairy tales 0.691 ± 0.227 0.708 ± 0.242 0.715 ± 0.235 0.733 ± 0.214
CBET 0.567 ± 0.182 0.588 ± 0.161 0.597 ± 0.130 0.603 ± 0.137
ISEAR 0.654 ± 0.203 0.669 ± 0.221 0.681 ± 0.197 0.707 ± 0.208
Avg. 0.620 ± 0.194 0.636 ± 0.204 0.646 ± 0.184 0.663 ± 0.182

method has the worst performance in the experiment, as expected.

4. Conclusions

In the field of Emotion Distribution Learning (EDL), label enhancement is an important technique for alleviating the shortage of emotion-distribution-annotated datasets. In this paper, we proposed Emotion Wheel and Lexicon-based emotion distribution Label Enhancement (EWLLE) to effectively enhance the sentence emotion labels of single-labeled datasets into emotion distributions. Unlike existing methods, EWLLE adopts both psychological emotional knowledge and the linguistic information of affective words. Based on Plutchik's wheel of emotions, EWLLE generates discrete Gaussian distributions for sentence emotion labels and affective words' emotion labels, respectively, and then superposes them into a unified emotion distribution. Extensive experimental results showed that EWLLE performs favorably against state-of-the-art label enhancement methods.

In future research, we will introduce more prior affective knowledge into the EDL label enhancement method and explore different affective modeling methods to make use of prior knowledge more effectively. In addition, the recognition of negation and other sophisticated affective words in the label enhancement method is also a problem we will study in the future.

Data Availability

The raw data required to reproduce these findings are available in the references cited in Section 3.1 of the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported in part by the Natural Science Foundation of China under Grant nos. 61866017, 61866018, and 61966019, in part by the Support Program for Outstanding Youth Talents in Jiangxi Province under Grant no. 20171BCB23013, and in part by the Natural Science Foundation of Jiangxi Province under Grant no. 20192BAB207027.

References

[1] A. Yadollahi, A. G. Shahraki, and O. R. Zaiane, "Current state of text sentiment analysis from opinion to emotion mining," ACM Computing Surveys, vol. 50, no. 2, pp. 1–33, 2017.
[2] H. Zhou, M. Huang, T. Zhang et al., "Emotional chatting machine: emotional conversation generation with internal and external memory," in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 730–739, New Orleans, LA, USA, February 2018.
[3] D. S. Manoharan and Sathish, "Geospatial and social media analytics for emotion analysis of theme park visitors using text mining and GIS," June 2020, vol. 2, no. 2, pp. 100–107, 2020.
[4] C. Chen and Q. Li, "A multimodal music emotion classification method based on multifeature combined network classifier," Mathematical Problems in Engineering, vol. 2020, Article ID 4606027, 11 pages, 2020.
[5] S. Che, W. Zhu, and X. Li, "Anticipating corporate financial performance from CEO letters utilizing sentiment analysis," Mathematical Problems in Engineering, vol. 2020, Article ID 5609272, 17 pages, 2020.
[6] B. S. Rintyarna, R. Sarno, and C. Fatichah, "Enhancing the performance of sentiment analysis task on product reviews by handling both local and global context," International Journal of Information and Decision Sciences, vol. 12, no. 1, pp. 75–101, 2020.
[7] Y. Zhou, H. Xue, and X. Geng, "Emotion distribution recognition from facial expressions," in Proceeding of the 23rd ACM International Conference on Multimedia, pp. 1247–1250, Brisbane, Australia, October 2015.
[8] M. Abdul-Mageed and L. Ungar, "EmoNet: fine-grained emotion detection with gated recurrent neural networks," in Proceeding of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 718–728, Vancouver, Canada, August 2017.
[9] J. Yu, L. Marujo, J. Jiang et al., "Improving multi-label emotion classification via sentiment classification with dual attention transfer network," in Proceeding of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1097–1102, Brussels, Belgium, November 2018.
[10] B.-B. Gao, C. Xing, C.-W. Xie, J. Wu, and X. Geng, "Deep label distribution learning with label ambiguity," IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2825–2838, 2017.
[11] X. Geng, "Label distribution learning," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1734–1748, 2016.
[12] D. Xue, Z. Hong, S. Guo et al., "Personality recognition on social media with label distribution learning," IEEE Access, vol. 5, pp. 13478–13488, 2017.
[13] D. Zhou, X. Zhang, Y. Zhou et al., "Emotion distribution learning from texts," in Proceeding of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 638–647, Texas, TX, USA, November 2016.
[14] X. Jia, X. Zheng, W. Li et al., "Facial emotion distribution learning by exploiting low-rank label correlations locally," in Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9833–9842, Seattle, WA, USA, June 2019.
[15] Z. Zhao and X. Ma, "Text emotion distribution learning from small sample: a meta-learning approach," in Proceeding of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 3957–3967, Hong Kong, China, November 2019.
[16] T. He and X. Jin, "Image emotion distribution learning with graph convolutional networks," in Proceeding of the 2019 International Conference on Multimedia Retrieval, pp. 382–390, Ontario, Canada, June 2019.
[17] H. Xiong, H. Liu, B. Zhong et al., "Structured and sparse annotations for image emotion distribution learning," in Proceeding of the 33rd AAAI Conference on Artificial Intelligence, pp. 363–370, Hawaii, HI, USA, January 2019.
[18] Y. Fan, H. Yang, Z. Li, and S. Liu, "Predicting image emotion distribution by learning labels' correlation," IEEE Access, vol. 7, pp. 129997–130007, 2019.
[19] X. Xi, Y. Zhang, X. Hua, S. M. Miran, Y.-B. Zhao, and Z. Luo, "Facial expression distribution prediction based on surface electromyography," Expert Systems with Applications, vol. 161, Article ID 113683, 2020.
[20] J. Liang, R. Li, and Q. Jin, "Semi-supervised multi-modal emotion recognition with cross-modal distribution matching," in Proceedings of the 28th ACM International Conference on Multimedia, pp. 2852–2861, New York, NY, USA, October 2020.
[21] C. Strapparava and R. Mihalcea, "Semeval-2007 task 14: affective text," in Proceeding of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 70–74, Prague, Czech Republic, June 2007.
[22] N. Xu, Y.-P. Liu, and X. Geng, "Label enhancement for label distribution learning," IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1632–1643, 2021.
[23] J. Yang, D. She, and M. Sun, "Joint image emotion classification and distribution learning via deep convolutional neural network," in Proceeding of the 26th International Joint Conference on Artificial Intelligence, pp. 3266–3272, Melbourne, Australia, August 2017.
[24] Y. Zhang, J. Fu, D. She et al., "Text emotion distribution learning via multi-task convolutional neural network," in Proceeding of the 27th International Joint Conference on Artificial Intelligence, pp. 4595–4601, Stockholm, Sweden, July 2018.
[25] Z. Teng, D. T. Vo, and Y. Zhang, "Context-sensitive lexicon features for neural sentiment analysis," in Proceeding of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1629–1638, Texas, TX, USA, November 2016.
[26] R. Plutchik, "A general psychoevolutionary theory of emotion," Theories of Emotion, vol. 1, pp. 3–33, 1980.
[27] S. M. Mohammad, "Emotional tweets," in Proceeding of the First Joint Conference on Lexical and Computational Semantics, pp. 246–255, Montréal, QC, Canada, June 2012.
[28] C. O. Alm and R. Sproat, "Emotional sequencing and development in fairy tales," in Proceeding of the 1st International Conference on Affective Computing and Intelligent Interaction, pp. 668–674, Beijing, China, October 2005.
[29] A. G. Shahraki, "Emotion mining from text," M.S. thesis, University of Alberta, Edmonton, Canada, 2015.
[30] K. R. Scherer and H. G. Wallbott, "Evidence for universality and cultural variation of differential emotion response patterning," Journal of Personality and Social Psychology, vol. 66, no. 2, pp. 310–328, February 1994.
[31] N. E. Gayer, F. Schwenker, and G. Palm, "A study of the robustness of KNN classifiers trained using soft labels," in Proceeding of the 2nd Conference on Artificial Neural Networks in Pattern Recognition, pp. 67–80, Berlin, Germany, September 2006.
[32] X. Jiang, Z. Yi, and J. C. Lv, "Fuzzy SVM with a new fuzzy membership function," Neural Computing and Applications, vol. 15, no. 3-4, pp. 268–276, 2006.
[33] Y. Li, M. Zhang, and X. Geng, "Leveraging implicit relative labeling-importance information for effective multi-label learning," in Proceeding of the IEEE International Conference on Data Mining, pp. 251–260, Barcelona, Spain, January 2016.
[34] P. Hou, X. Geng, and M. Zhang, "Multi-label manifold learning," in Proceeding of the 30th AAAI Conference on Artificial Intelligence, pp. 1680–1686, Arizona, AZ, USA, February 2016.
[35] X. Geng and Y. Xia, “Head pose estimation based on mul-
 tivariate label distribution,” in Proceeding of IEEE Conference
 on Computer Vision and Pattern Recognition, pp. 3742–3747,
Columbus, OH, USA, June 2014.
[36] X. Geng, Q. Wang, and Y. Xia, “Facial age estimation by
 adaptive label distribution learning,” in Proceeding of the 22nd
 International Conference on Pattern Recognition, pp. 4465–
 4470, Stockholm, Sweden, August 2014.
[37] H. Zhang, Y. Zhang, and X. Geng, “Practical age estimation
 using deep label distribution learning,” Frontiers of Computer
 Science, vol. 15, no. 3, pp. 1–6, 2020.
[38] A. Agrawal and A. An, “Unsupervised emotion detection
 from text using semantic and syntactic relations,” in Pro-
 ceeding of 2012 IEEE/WIC/ACM International Conferences on
 Web Intelligence and Intelligent Agent Technology, pp. 346–
 353, Macau, China, December 2012.
[39] Y. Wang and A. Pal, “Detecting emotions in social media: a
 constrained optimization approach,” in Proceeding of the 24th
 International Joint Conference on Artificial Intelligence,
 pp. 996–1002, Buenos Aires, Argentina, July 2015.
[40] J. A. Mikels, B. L. Fredrickson, G. R. Larkin, C. M. Lindberg,
 S. J. Maglio, and P. A. Reuter-Lorenz, “Emotional category
 data on images from the international affective picture sys-
 tem,” Behavior Research Methods, vol. 37, no. 4, pp. 626–630,
 2005.
[41] X. Geng, C. Yin, and Z. H. Zhou, “Facial age estimation by
 learning from label distributions,” IEEE Transactions on
 Pattern Analysis and Machine Intelligence, vol. 35, no. 10,
 pp. 2401–2412, 2013.
[42] T. Mikolov, I. Sutskever, K. Chen et al., “Distributed repre-
 sentations of words and phrases and their compositionality,”
 in Proceeding of the 26th Advances in Neural Information
 Processing Systems, pp. 3111–3119, Nevada, NV, USA, De-
 cember 2013.
[43] S. M. Mohammad and P. D. Turney, "NRC emotion lexicon,"
 NRC Technical Report, vol. 2, 2013.
[44] S. Poria, A. Gelbukh, E. Cambria, A. Hussain, and
 G.-B. Huang, “EmoSenticSpace: a novel framework for af-
 fective common-sense reasoning,” Knowledge-Based Systems,
 vol. 69, pp. 108–123, 2014.