The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Twitter Summarization Based on Social Network and Sparse Reconstruction∗

Ruifang He, Xingyi Duan
School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China
{rfhe,xingyiduan}@tju.edu.cn

∗ This work is supported by NSFC (61472277) and partly supported by the National Program on Key Basic Research Project (973 Program, 2013CB329301) and NSFC (61772361).
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

With the rapid growth of microblogging services such as Twitter, a vast number of short and noisy messages are produced by millions of users, which makes it difficult for people to quickly grasp the essential information on the topics they are interested in. In this paper, we study extractive topic-oriented Twitter summarization as a solution to this problem. Traditional summarization methods only consider text information, which is insufficient in the social media setting. Existing Twitter summarization techniques rarely explore relations between tweets explicitly, ignoring that information can spread along the social network. Inspired by the social theories that expression consistency and expression contagion are observed in social networks, we propose a novel approach for Twitter summarization over short and noisy messages by integrating Social Network and Sparse Reconstruction (SNSR). We explore whether social relations can help Twitter summarization, modeling relations between tweets as a social regularization and integrating it into a group sparse optimization framework. The framework conducts a sparse reconstruction process by selecting tweets that can best reconstruct the original tweets of a specific topic, while considering coverage and sparsity. We simultaneously design a diversity regularization to remove redundancy. In particular, we present a mathematical optimization formulation and develop an efficient algorithm to solve it. Due to the lack of a public corpus, we construct gold standard Twitter summary datasets for 12 different topics. Experimental results on these datasets show the effectiveness of our framework for handling the large scale of short and noisy messages in social media.

Introduction

Twitter has become one of the most popular social network platforms, through which large numbers of users can freely produce content (called tweets) on the topics they are interested in. However, the rapid growth of tweets makes it difficult for people to quickly grasp essential information. Twitter summarization aims to generate a succinct summary delivering the core information from a sheer volume of tweets on a given topic. It can be used to help ordinary people quickly acquire information and to help aid agencies monitor crisis progress so as to assist recovery and provide disaster relief.

Although document summarization has been researched for many years, it remains a knotty problem due to the large scale and the short, noisy and informal nature of messages in social media, such as tweets. Existing Twitter summarization approaches usually regard tweets as sentences and adopt traditional summarization methods (Inouye and Kalita 2011), such as SumBasic (Vanderwende et al. 2007), Centroid (Radev, Blair-Goldensohn, and Zhang 2001), LexRank (Erkan and Radev 2004) and TextRank (Mihalcea and Tarau 2004), to validate their performance on microblogging posts. However, it is not clear whether adding complexity to the methods improves the performance of Twitter summarization. Some other studies (Chang et al. 2016; Liu et al. 2012) explore static social features beyond the textual content, such as the number of replies, number of retweets, number of likes, author popularity (i.e., the number of followers of a given tweet's author) and temporal signals.

All the above methods ignore the fact that Twitter data is networked. There are some studies (Chang et al. 2013; Duan et al. 2012) exploiting social network information. These approaches mainly consider network information from the user-level perspective, assuming that high-authority users are more likely to post salient tweets. However, tweets are also potentially networked through user connections. Different from traditional methods, which obtain associated tweet information by measuring similarity between tweets purely based on content, networked tweets may carry more semantic clues than purely text-based methods can capture. So we need to explore a new method for modeling the tweet-level networked information.

Social theories indicate the reciprocal influence of networked information. People are likely to keep the same sentiment (Hu et al. 2013) or preference (Wang et al. 2015) on a specific topic within a short period, and this phenomenon is called expression consistency. Moreover, relationships between people are established through a series of interactions and feedback. The influence is subtle and can have a great impact on one's preferences, speaking manner or even expression content. Thus people gradually develop viewpoints about a topic similar to those of their friends and express them with almost the same tone and words, which is regarded as expression contagion. Inspired by these two social theories, we explore how to utilize them for Twitter summarization.
Recently, sparse reconstruction based summarization methods have been proposed (He et al. 2012; Liu, Yu, and Deng 2016; Yao, Wan, and Xiao 2015) and show significant performance on the traditional DUC/TAC corpora. Moreover, social information can be seamlessly combined into sparse reconstruction based methods. In this paper, we propose to integrate the social network into a unified optimization framework for Twitter summarization from the perspective of sparse reconstruction, through modeling the tweet-level networked information. The framework assumes that a good summary can best reconstruct the original corpus, and better addresses the coverage, sparsity and diversity of summaries in social media. Our contributions are summarized as follows:

• From the statistical perspective, we verify the existence of two social theories in Twitter data and formally define the problem of Twitter summarization to enable the utilization of the social network;
• We model the tweet-level networked information as a social regularization by integrating the social network into the sparse reconstruction based method;
• We design the group sparsity regularization for Twitter summarization to keep salient tweets at the corpus level, and the diversity regularization to avoid the more serious redundancy brought by the social network;
• We construct 12 gold standard topic-oriented tweet datasets by asking 24 volunteers to manually select the most informative tweets, giving 48 expert summaries in total;
• We empirically evaluate the proposed SNSR framework on these datasets, elaborate the effectiveness of the social network, and validate the newly designed sparsity and diversity schemes.

Related Work

Our proposed method belongs to the extractive and unsupervised style. Therefore, we mainly review the relevant research.

Multi-Document Summarization. Many traditional methods extract the result summary from the top sentences with the highest scores, by assigning salience scores to the sentences of the original documents. The computation strategies for salience include: (1) feature based methods, including Centroid and SumBasic, which consider the frequency and position of words to measure the sentence weight; (2) graph based methods, which are PageRank-like algorithms such as LexRank and TextRank built by random walks on sentence or word graphs. However, these methods face the redundancy problem. Some studies propose cluster based strategies to keep the diversity of the summary and avoid redundancy (Cai et al. 2010; Wang et al. 2011; Shen, Li, and Ding 2010; Wang et al. 2009; Gao et al. 2012; Litvak et al. 2015). They mainly use topic modeling, clustering algorithms or matrix factorization to produce summaries with better coverage. Recently, the appearance of sparse reconstruction based summarization methods, originally proposed by (He et al. 2012), brings new possibilities to resolve the classical challenges of summarization, including coverage, salience, and diversity. Further improved studies are found in (Yao, Wan, and Xiao 2015; Liu, Yu, and Deng 2016). However, the large scale of short and noisy texts in social media makes these methods unsuitable for Twitter.

Twitter Summarization. The prosperity of social media impels people to explore the adaptation of traditional summarization methods (Inouye and Kalita 2011) to Twitter, including the hybrid TF-IDF model and the phrase reinforcement algorithm, which finds the most commonly used phrase as the summary (Sharifi, Hutton, and Kalita 2010; Nichols, Mahmud, and Drews 2012). All these methods only consider text information. However, social media platforms can provide much richer information than text alone. (Duan et al. 2012; Liu et al. 2012) extended the PageRank algorithm by incorporating social properties. (Alsaedi, Burnap, and Rana 2016) proposed three methods for event summarization using temporal information and retweet information. (Chang et al. 2013; 2016) regarded Twitter summarization as a supervised classification task through mining rich social features, such as temporal signals and user influence. These approaches mainly use static social information or user-level network information, and do not further explore tweet-level networked relations, which may contain much more potential semantic clues.

Social Network Propagation. Social theories and social network analysis provide useful insights for combining topology and content in Twitter summarization. Social network propagation, also known as social influence or network influence, has been researched in several domains, such as sentiment analysis (Hu et al. 2013), topic identification (Wang et al. 2015), topic detection (Bi, Tian, and Sismanis 2014), and network inference (He et al. 2015). From this research, we know that sentiment and topics can spread along the network. In this paper, we further explore how expression content, which is the carrier of sentiment and topic, can spread along the network and influence Twitter summarization. A social regularization considering social network propagation can be seamlessly integrated into sparse reconstruction based summarization methods. Therefore, we further study the coverage, salience and diversity challenges of summarization in social media from the sparse reconstruction perspective.

Problem Statement

Assume that the tweet corpus of a specific topic is represented as a weighted term frequency inverse tweet frequency matrix, denoted as $S = [t_1, t_2, \ldots, t_n] \in \mathbb{R}^{m \times n}$, where $m$ is the size of the vocabulary and $n$ is the total number of tweets. Each column $t_i$ of $S$ stands for a single tweet vector. $U \in \mathbb{R}^{d \times n}$ denotes the user-tweet matrix, where $U_{ij} = 1$ means that the $j$th tweet is posted by the $i$th user. We construct the user-user matrix $F \in \mathbb{R}^{d \times d}$ from the following relationship, where $F_{ij} = 1$ indicates that the $i$th user is related to the $j$th user.

From the notation above, we formally describe Twitter summarization over short and noisy social media texts as follows: given a topic-oriented Twitter corpus $C$ with its content $S$ and social context, including the user-tweet matrix $U$ and the user-user matrix $F$, we aim to learn the reconstruction coefficient matrix $W$ to automatically produce a summary.
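To make the notation concrete, the following sketch (our own illustrative code, not the authors' implementation; the tokenization and the friendship input format are simplifying assumptions) builds the weighted term frequency inverse tweet frequency matrix $S$, the user-tweet matrix $U$, and the user-user matrix $F$ with NumPy.

```python
import numpy as np

def build_matrices(tweets, authors, friend_pairs):
    """tweets: list of token lists; authors: user id per tweet;
    friend_pairs: iterable of (user_a, user_b) relations.
    Returns S (m x n), U (d x n), F (d x d) as in the problem statement."""
    vocab = sorted({w for toks in tweets for w in toks})
    users = sorted(set(authors))
    w2i = {w: i for i, w in enumerate(vocab)}
    u2i = {u: i for i, u in enumerate(users)}
    m, n, d = len(vocab), len(tweets), len(users)

    # Term frequency counts per tweet.
    tf = np.zeros((m, n))
    for j, toks in enumerate(tweets):
        for w in toks:
            tf[w2i[w], j] += 1.0

    # Inverse tweet frequency: terms occurring in few tweets get higher weight.
    df = np.count_nonzero(tf, axis=1)
    itf = np.log(n / np.maximum(df, 1.0))
    S = tf * itf[:, None]                      # weighted TF-ITF matrix

    # User-tweet incidence: U[i, j] = 1 iff user i posted tweet j.
    U = np.zeros((d, n))
    for j, a in enumerate(authors):
        U[u2i[a], j] = 1.0

    # User-user relation: F[i, j] = 1 iff the two users are connected.
    F = np.zeros((d, d))
    for a, b in friend_pairs:
        if a in u2i and b in u2i:
            F[u2i[a], u2i[b]] = F[u2i[b], u2i[a]] = 1.0
    return S, U, F
```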
Data and Observations

Due to the lack of a public Twitter summarization corpus, in this section we first introduce how we collect the data; the construction scheme of the ground truth corpus is described in the experiment section. Then, we explore whether the social theories can bring motivating insights for Twitter summarization.

Data

We use the public Twitter data collected by the University of Illinois¹ as the raw data. According to the hashtags, we extract twelve popular topics from May, June and July 2011, covering politics, science and technology, sports, natural disasters, terrorist attacks and entertainment gossip. Each topic can have multiple hashtags, such as "#osama" and "#osamabinladen". We then search for tweets that contain any of these hashtags or any of the keywords obtained by stripping "#" from the hashtags. Observing the topic trends of tweet counts over time, the topics fall mainly into emergencies and hot events. To capture expression consistency and expression contagion within a short time interval, we further collect tweets within five days after an emergency occurred (e.g., the Oslo terrorist attack) and tweets within five days before and after a hot event occurred (e.g., Harry Potter). After obtaining the topic-oriented data, we filter out tweets beforehand if they satisfy any of the following conditions (see the preprocessing sketch after Table 1):

• The tweet appears more than once (only one copy is kept);
• The number of words, excluding hashtags, keywords, mentions, URLs and stop words, is less than 3;
• The tweet's author is not connected to any other user.

Due to limited space, the statistics of only some of the topics are shown in Table 1.

Table 1: Statistics of the datasets
                              Osama      Joplin     Mavs       Oslo
  Date                        0501       0522       0612       0722
  # of Tweets                 4780       2896       3859       4571
  # of Users                  1309       1082       1780       1026
  Max Degree of Users         69         68         76         77
  Min Degree of Users         1          1          1          1
  Max Tweets Number of Users  42         93         92         56
  Min Tweets Number of Users  2          1          1          2
  Ave. Tweets per User        3.65       2.68       2.18       4.46
  P-value (Consistency)       4.78e-125  2.1e-98    9.08e-211  2.62e-131
  P-value (Contagion)         1.82e-33   6.6e-09    8.09e-08   4.98e-19

¹ https://wiki.illinois.edu/wiki/display/forward/Dataset-UDI-TwitterCrawl-Aug2012
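As a rough illustration of these filtering rules, the sketch below (our own code; the tokenization, the stop-word list and the connectivity test are simplifying assumptions, not the paper's exact pipeline) drops duplicates, very short tweets, and tweets from isolated users.

```python
import re

def filter_tweets(tweets, stopwords, user_graph):
    """tweets: list of dicts {'text': str, 'user': str};
    user_graph: dict mapping a user to the set of users connected to them."""
    seen, kept = set(), []
    for t in tweets:
        text = t['text'].lower().strip()
        if text in seen:                      # rule 1: keep only one copy of duplicates
            continue
        seen.add(text)
        # rule 2: count content words, ignoring hashtags, mentions, URLs and stop words
        # (the paper also excludes topic keywords, omitted here for brevity)
        tokens = re.findall(r"\S+", text)
        content = [w for w in tokens
                   if not (w.startswith('#') or w.startswith('@')
                           or w.startswith('http') or w in stopwords)]
        if len(content) < 3:
            continue
        # rule 3: drop tweets whose author has no connection to other users
        if not user_graph.get(t['user']):
            continue
        kept.append(t)
    return kept
```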
Observations of Social Theories for Twitter Summarization

Social theories such as consistency (Abelson 1983) and contagion (Shalizi and Thomas 2011; Harrigan, Achananuparp, and Lim 2012) have been shown to be useful for social media mining (Harrigan, Achananuparp, and Lim 2012). The analysis indicates that the members of a social network often exhibit correlated behavior: sentiment and topics can be diffused through the network. Consistency means that social behaviors conducted by the same person stay consistent over a short period of time. Contagion means that friends can influence each other. In this subsection, we investigate expression consistency and expression contagion under a given topic for Twitter summarization. In our work, we redefine and explore consistency and contagion as follows:

• Expression consistency: Are the tweets posted by the same user more consistent than two randomly selected tweets?
• Expression contagion: Are two tweets posted by friends more similar than two randomly selected tweets?

To verify these two questions, we measure the distance between two tweets as $D_{ij} = \|t_i - t_j\|_2$, where $t_i$ denotes the vector of the $i$th tweet. The more similar the two tweets, the closer $D_{ij}$ is to 0. For the first question, we construct two vectors named $cons_c$ and $cons_r$ with an equal number of elements. Each element of the first vector is the distance between two tweets posted by the same user, and each element of the second vector is the distance between two randomly selected tweets. We then conduct a two-sample t-test on the two vectors $cons_c$ and $cons_r$. The null hypothesis is that there is no difference between the two vectors, $H_0: cons_c = cons_r$. The alternative hypothesis is that the distance between two tweets posted by the same user is less than that between randomly selected tweets, $H_1: cons_c < cons_r$.

Similarly, for the second question, we construct two vectors named $cont_c$ and $cont_r$ with an equal number of elements. Each element of the first vector is the distance between two tweets posted by friends, and each element of the second vector is the distance between two randomly selected tweets. We also conduct a two-sample t-test on the two vectors $cont_c$ and $cont_r$. The null hypothesis $H_0: cont_c = cont_r$ states that there is no difference between tweets posted by friends and randomly selected tweets. The alternative hypothesis $H_1: cont_c < cont_r$ states that the distance between two tweets posted by friends is less than that between randomly selected tweets. For all the topics, the consistency null hypothesis and the contagion null hypothesis are rejected at significance level $\alpha = 0.01$, with the p-values presented in the last two rows of Table 1.

This observation provides strong evidence for the existence of expression consistency and expression contagion. In the next section, we elaborate how to exploit these social theories for Twitter summarization.
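For illustration, a minimal sketch of this significance test with SciPy is shown below (our own code, not the paper's implementation; the sampling of random pairs and the variable names simply mirror the description above). `scipy.stats.ttest_ind` with `alternative='less'` performs the one-sided two-sample t-test for $H_1: cons_c < cons_r$.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def pair_distances(S, pairs):
    """Euclidean distance D_ij = ||t_i - t_j||_2 for each tweet index pair."""
    return np.array([np.linalg.norm(S[:, i] - S[:, j]) for i, j in pairs])

def consistency_test(S, same_user_pairs):
    """One-sided two-sample t-test: are same-user tweet pairs closer than random pairs?"""
    n = S.shape[1]
    cons_c = pair_distances(S, same_user_pairs)
    random_pairs = [tuple(rng.choice(n, size=2, replace=False)) for _ in same_user_pairs]
    cons_r = pair_distances(S, random_pairs)
    # H0: cons_c = cons_r versus H1: cons_c < cons_r
    t_stat, p_value = ttest_ind(cons_c, cons_r, alternative='less')
    return t_stat, p_value
```

The contagion test is identical, with friend pairs in place of same-user pairs.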
Our Approach

The large scale of short and noisy texts in social media brings more serious data sparseness, as well as content redundancy due to social network propagation. We investigate these new challenges for Twitter summarization from the perspective of sparse reconstruction.
SR - Sparse Reconstruction

Coverage. Treating Twitter summarization as a sparse reconstruction problem, the original corpus should be reconstructed from the summary tweets. Given the original corpus $C$, we can formally describe the reconstruction process as:

$$\min_{W} \ \frac{1}{2}\|S - SW\|_F^2 \qquad (1)$$

where $W = [W_{*1}, W_{*2}, \ldots, W_{*n}] \in \mathbb{R}^{n \times n}$ is the reconstruction coefficient matrix, each column $W_{*j} = [W_{1j}, W_{2j}, \ldots, W_{nj}]$ is the vector of coefficients used for representing tweet $t_j$, and each element $W_{ij}$ of $W_{*j}$ denotes the proportion of tweet $t_i$ in reconstructing tweet $t_j$. Since the original tweets should be regarded as non-negative linear combinations of summary tweets, we add the constraint $W \geq 0$. To avoid a tweet being reconstructed by itself, that is, the reconstruction matrix $W \approx I$, we add the constraint $\mathrm{diag}(W) = 0$. Eq. (1) is then transformed into:

$$\min_{W} \ \frac{1}{2}\|S - SW\|_F^2 \quad \text{s.t.} \ W \geq 0,\ \mathrm{diag}(W) = 0 \qquad (2)$$

Sparsity - Group Lasso. Since we only select a few tweets as a summary to reconstruct the original corpus, not all tweets should have an impact on reconstructing a given tweet, and most coefficients in each column should tend to zero. Inspired by sparse coding (Ye and Liu 2012), we regard each tweet as a group. The problem turns into selecting a subset of these groups to reduce the reconstruction loss. We add an $\ell_{2,1}$-norm constraint on $W$, so the entire objective function becomes:

$$\min_{W} \ \frac{1}{2}\|S - SW\|_F^2 + \lambda\|W\|_{2,1} \quad \text{s.t.} \ W \geq 0,\ \mathrm{diag}(W) = 0 \qquad (3)$$

where

$$\|W\|_{2,1} = \sum_{i=1}^{n}\|W(i,:)\|_2 \qquad (4)$$

$$\|W(i,:)\|_2 = \sqrt{\sum_{j=1}^{n} W_{ij}^2} \qquad (5)$$
SN - Model Networked Tweet-level Information

To reduce the reconstruction error and rectify the reconstruction process, we exploit the social theories to model the networked tweet-level information as a social regularization. The idea is that two correlated tweets which are originally close should stay close during the reconstruction. Essentially, we need to build a graph Lasso (Ye and Liu 2012).

In order to utilize the social relations for Twitter summarization, we use the two social theories mentioned above to construct a tweet-tweet correlation graph, by transforming the user-tweet relations and social relations into tweet-tweet correlation relations. Given the user-tweet matrix $U$ and the user-user matrix $F$, the tweet-tweet correlation matrix for expression consistency is defined as $T_{cons} = U^T U$, where $(T_{cons})_{ij} = 1$ denotes that two tweets are posted by the same user. For expression contagion, $T_{cont}$ is defined as $T_{cont} = U^T F U$, where $(T_{cont})_{ij} = 1$ denotes that two tweets are posted by friends. We then obtain the tweet-tweet correlation matrix, which can be either $T_{cons}$, $T_{cont}$ or the combination $T = T_{cons} + d\,T_{cont}$, where $d$ is a balance parameter between the two relations. In this paper, we simply set $d = 1$ to construct the relation matrix. $T_{ij} = 1$ denotes that two tweets have a correlated connection, otherwise $T_{ij} = 0$. We define the reconstruction matrix of $S$ as $\hat{S} = SW$, so the graph Lasso penalty term, namely the social regularization, is formulated as:

$$\Omega_{graph} = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} T_{ij}\,\|\hat{S}_{*i} - \hat{S}_{*j}\|_2^2 = \sum_{i=1}^{m} \hat{S}_{i*}(D - T)\hat{S}_{i*}^T = \mathrm{tr}(SWLW^TS^T) \qquad (6)$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $L = D - T$ is the Laplacian matrix, and $D \in \mathbb{R}^{n \times n}$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{n} T_{ij}$, i.e., each diagonal element is the degree of a tweet in $T$.

Finally, we incorporate the social regularization Eq. (6) into Eq. (3):

$$\min_{W} \ \frac{1}{2}\|S - SW\|_F^2 + \alpha\,\mathrm{tr}(SWLW^TS^T) + \lambda\|W\|_{2,1} \quad \text{s.t.} \ W \geq 0,\ \mathrm{diag}(W) = 0 \qquad (7)$$

where $\alpha$ is the parameter of the social regularization.
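The following sketch (our own code, assuming the binary matrices $U$ and $F$ from the problem statement; clipping the combined relation matrix to {0, 1} reflects the statement that $T_{ij}$ is either 1 or 0) builds $T_{cons}$, $T_{cont}$, the Laplacian $L = D - T$, and evaluates the social regularization term of Eq. (6).

```python
import numpy as np

def social_regularizer(S, W, U, F, d=1.0):
    """Graph-Lasso social regularization tr(S W L W^T S^T) of Eq. (6)."""
    T_cons = ((U.T @ U) > 0).astype(float)     # tweets sharing the same author
    T_cont = ((U.T @ F @ U) > 0).astype(float) # tweets posted by connected users
    T = np.clip(T_cons + d * T_cont, 0.0, 1.0) # combined tweet-tweet relation matrix
    D = np.diag(T.sum(axis=1))                 # degree matrix
    L = D - T                                  # Laplacian of the tweet-tweet graph
    S_hat = S @ W                              # reconstruction of the corpus
    return float(np.trace(S_hat @ L @ S_hat.T)), L
```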
Diversity - Model the Redundancy Information

Redundancy removal has always been a focus of summarization research. Social studies (Harrigan, Achananuparp, and Lim 2012) show that reciprocal ties and certain triadic structures substantially increase social contagion, which also brings more inherent redundancy and a lack of novelty to the messages in a social network. Therefore, this kind of redundancy challenge is more serious than in traditional summarization.

Sparse reconstruction based methods tend to select tweets that cover the whole corpus, yet there is no explicit tendency to select tweets covering different aspects of a topic. (Liu, Yu, and Deng 2016) introduced a correlation term to control the diversity, but their optimization process is very complex. (Yao, Wan, and Xiao 2015) introduced a dissimilarity matrix, which greatly reduces the optimization complexity. However, the way this matrix is computed is not suitable for tweets, given the large scale of short and noisy texts in social media, since it measures the encoding cost of each word using the sentence length or vocabulary size. This makes each dissimilarity value quite large and drives every element of $W$ towards zero, so it becomes hard to identify the salience of a tweet.
Inspired by the dissimilarity matrix, we introduce a relatively simple but effective cosine similarity matrix $\nabla$, where each element $\nabla_{ij} \in [0, 1]$ denotes the cosine similarity between tweet $t_i$ and tweet $t_j$. In the sparse reconstruction process, we add the constraint $\mathrm{diag}(W) = 0$ to prevent tweets from reconstructing themselves. By the same reasoning, we should also prevent tweets from being reconstructed by tweets that are very similar to them. Consider the example below:

• Tweet1: the mood was solemn at the garden of reflection in lower makefield following the death of osama bin laden. video: http://fb.me/tof3pqok
• Tweet2: the mood was solemn at the garden of reflection in lower makefield following osama bin laden's death. video: http://bit.ly/l9tvdw

Obviously the two tweets are similar to each other, which drives both reconstruction coefficients $W_{12}$ and $W_{21}$ close to 1. This raises the salience of both tweets throughout the corpus and brings more redundancy. In preliminary experiments, we indeed find many such similar pairs in the final summary when diversity is not handled. To better avoid this "similar" reconstruction phenomenon, we design $\nabla$ as:

$$\nabla_{ij} = \begin{cases} 1 & \text{if } \nabla_{ij} \geq \theta, \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

where $\theta$ is the threshold used to distinguish similar pairs from normal pairs. We then formally introduce the diversity regularization term

$$\mathrm{tr}(\nabla^T W) = \sum_{i=1}^{n}\sum_{j=1}^{n} \nabla_{ij} W_{ij}$$

into Eq. (7), so the objective function becomes:

$$\min_{W} \ \frac{1}{2}\|S - SW\|_F^2 + \alpha\,\mathrm{tr}(SWLW^TS^T) + \gamma\,\mathrm{tr}(\nabla^T W) + \lambda\|W\|_{2,1} \quad \text{s.t.} \ W \geq 0,\ \mathrm{diag}(W) = 0 \qquad (9)$$

where $\gamma$ is the parameter of the diversity regularization term.

By solving Eq. (9), the ranking score of tweet $t_i$ is calculated as $Score(t_i) = \|W(i,:)\|_2$. We select tweets according to this ranking score to form the final summary.
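A minimal sketch of the thresholded similarity matrix of Eq. (8) and of the diversity term $\mathrm{tr}(\nabla^T W)$ follows (our own code; the column normalization is an assumption for computing cosine similarity from the TF-ITF matrix $S$):

```python
import numpy as np

def diversity_matrix(S, theta=0.1):
    """Binary matrix of Eq. (8): entry 1 where cosine similarity >= theta."""
    norms = np.linalg.norm(S, axis=0, keepdims=True)
    Sn = S / np.maximum(norms, 1e-12)          # column-normalized tweet vectors
    cos = Sn.T @ Sn                            # pairwise cosine similarities
    nabla = (cos >= theta).astype(float)
    np.fill_diagonal(nabla, 0.0)               # self-pairs are already excluded by diag(W)=0
    return nabla

def diversity_term(nabla, W):
    """tr(nabla^T W) = sum_ij nabla_ij * W_ij."""
    return float(np.sum(nabla * W))
```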
Optimization Algorithm for SNSR

Inspired by (Ji and Ye 2009; Nesterov and Nesterov 2004; Hu et al. 2013), we derive an efficient algorithm to solve the optimization problem in Eq. (9). The objective function can be equivalently expressed as:

$$\min_{W} \ f(W) = \frac{1}{2}\|S - SW\|_F^2 + \alpha\,\mathrm{tr}(SWLW^TS^T) + \gamma\,\mathrm{tr}(\nabla^T W) \quad \text{s.t.} \ W \geq 0,\ \mathrm{diag}(W) = 0,\ \|W\|_{2,1} \leq z \qquad (10)$$

where $z \geq 0$ is the radius of the $\ell_{2,1}$-ball, and there is a one-to-one correspondence between $\lambda$ and $z$.

We omit the details of the mathematical derivations due to limited space; interested readers may refer to (Hu et al. 2013) and the SLEP package² (Sparse Learning with Efficient Projections). The entire procedure is described in Algorithm 1, where

$$G_{lr,V}(W) = f(V) + \mathrm{tr}\Big(\big(\tfrac{\partial f(V)}{\partial V}\big)^T (W - V)\Big) + \frac{lr}{2}\|W - V\|_F^2 \qquad (11)$$

and the shrinkage operator in Line 9 is defined as:

$$S_{\lambda/lr}(U_{i*}) = \max\Big(1 - \frac{\lambda}{lr\,\|U_{i*}\|_2},\ 0\Big)\,U_{i*} \qquad (12)$$

Algorithm 1: An Efficient Optimization Algorithm for SNSR
Input: S, U, F, ∇, W0, α, γ, λ, θ, ε
Output: W
 1: Initialize μ0 = 0, μ1 = 1, W1 = W0, lr = 0.1
 2: A = U^T U + U^T F U, L = D − A, ∇ = (∇ ≥ θ)
 3: for iter = 0, 1, 2, . . . do
 4:   V = W1 + ((μ0 − 1)/μ1)(W1 − W0)
 5:   ∂f(W)/∂W = S^T S W1 − S^T S + γ∇ + α S^T S W1 L
 6:   loop
 7:     U = V − (1/lr) ∂f(W)/∂W
 8:     for each row Ui∗ of U do
 9:       Wi∗ = S_{λ/lr}(Ui∗)
10:     end for
11:     W = W − diag(W), W = max(W, 0)
12:     if f(W) ≤ G_{lr,V}(W) then
13:       break
14:     end if
15:     lr = 2 · lr
16:   end loop
17:   Set funVal(iter) = f(W) + λ‖W‖2,1
18:   if |funVal(iter) − funVal(iter − 1)| ≤ ε then
19:     break
20:   end if
21:   W0 = W1
22:   W1 = W
23:   μ0 = μ1
24:   μ1 = (1 + sqrt(1 + 4μ1²)) / 2
25: end for

² http://www.yelab.net/software/SLEP/
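For readers who want to connect Algorithm 1 to code, the sketch below is our own simplified NumPy rendering, not the authors' implementation: it uses a fixed step size instead of the line search of Lines 6-16, and takes the gradient expression of Line 5 as given. It implements the row-wise shrinkage operator of Eq. (12), one accelerated proximal step, and the ranking of tweets by the row norms of W.

```python
import numpy as np

def shrink_rows(U_mat, tau):
    """Row-wise shrinkage of Eq. (12): scale each row by max(1 - tau/||row||_2, 0)."""
    norms = np.linalg.norm(U_mat, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * U_mat

def snsr_step(S, W0, W1, L, nabla, alpha, gamma, lam, mu0, mu1, lr=0.1):
    """One accelerated proximal-gradient iteration in the spirit of Algorithm 1."""
    V = W1 + ((mu0 - 1.0) / mu1) * (W1 - W0)                                  # Line 4
    grad = S.T @ S @ W1 - S.T @ S + gamma * nabla + alpha * S.T @ S @ W1 @ L  # Line 5
    W = shrink_rows(V - grad / lr, lam / lr)                                  # Lines 7-10
    np.fill_diagonal(W, 0.0)                                                  # Line 11: diag(W) = 0
    W = np.maximum(W, 0.0)                                                    # Line 11: W >= 0
    mu_next = (1.0 + np.sqrt(1.0 + 4.0 * mu1 ** 2)) / 2.0                     # Line 24
    return W, mu1, mu_next

def rank_tweets(W, k):
    """Score(t_i) = ||W(i,:)||_2; return indices of the top-k tweets for the summary."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)[:k]
```

A caller would loop, updating W0 ← W1, W1 ← W, μ0 ← μ1 and μ1 ← μ_next, until the change in the objective falls below ε, as in Lines 17-24.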
Experiments

Ground Truth and Evaluation Metric

In order to evaluate our approach, we construct a ground truth (expert summaries) Corpus for Twitter Summarization (CTS). For each of the twelve topics, we ask four volunteers to select 25 tweets as a summary each, yielding 48 expert summaries in total. We then ask three other volunteers to score all summaries based on coverage, diversity and the writing quality of the tweets, on a scale of [1, 5]. If only 0-6 tweets are satisfactory, the summary is scored 1; 12 satisfactory tweets score 2, 18 score 3, 24 score 4, and if all the tweets are good, we score it 5. The higher the score, the more likely the summary is a good one. We keep the summaries whose scores are greater than or equal to 3, and require the low-quality summaries to be revised until they are eligible.

We use ROUGE as our evaluation metric (Lin 2004), which measures the overlapping N-grams between the expert summaries and the model summary. In our experiments, we report the F-measures of ROUGE-1, ROUGE-2, and ROUGE-SU4.
Performance Evaluation

Since our model belongs to the extractive and unsupervised style, we only compare with the relevant systems, including text based and sparse reconstruction based methods. The upper bound given by the human summaries and the baselines shown in Table 2 are as follows.

Expert denotes the average mutual assessment of the expert summaries. Random selects tweets randomly. Centroid (Radev, Blair-Goldensohn, and Zhang 2001) ranks tweets by computing the similarity between each tweet and a pseudo-center tweet. LexRank (Erkan and Radev 2004) ranks tweets with a PageRank-like algorithm. LSA (Gong and Liu 2001) applies SVD (singular value decomposition) to the TF-IDF matrix and then selects the highest ranked tweets from each right singular vector. NNMF (Park et al. 2007) performs non-negative matrix factorization on the TF-IDF matrix and chooses the tweet with the maximum probability in each cluster. Two other methods based on sparse reconstruction are DSDR (He et al. 2012) and MDS-Sparse (Liu, Yu, and Deng 2016). SNSR-div, SNSR-sparse and SNSR-social are the degradation models of SNSR, where "-" denotes deleting the corresponding diversity, group sparse or social regularization from our model SNSR. The "-" naming rule for the models in Figure 1 is the same.
Table 2: Performance on the Twitter data
  System        ROUGE-1    ROUGE-2    ROUGE-SU4
  Expert        0.47814    0.16337    0.20389
  Random        0.41701    0.09439    0.14231
  Centroid      0.38190    0.12384    0.15668
  LexRank       0.42046    0.13273    0.17366
  LSA           0.43474    0.13023    0.16625
  NNMF          0.43784    0.13321    0.17433
  DSDR          0.43236    0.12946    0.16521
  MDS-Sparse    0.42240    0.10060    0.14666
  SNSR-div      0.40191    0.12940    0.15894
  SNSR-sparse   0.43327    0.13692    0.17749
  SNSR-social   0.43236    0.10271    0.15379
  SNSR          0.44887    0.13882    0.18147

From the overall comparison in Table 2, we have the following observations:

• The methods using matrix factorization, especially NNMF, show comparable performance. The probable reasons are twofold: (1) NNMF can also be regarded as a reconstruction method, similar to (Li et al. 2017), which exploits aspect term vectors to reconstruct the original term space; (2) it addresses the coverage and diversity challenges to some degree by mining sub-topics.
• In comparison with the three degradation models, both the social regularization and the diversity regularization are useful for SNSR, and the group sparsity regularization is also effective in obtaining salient tweet patterns at the corpus level rather than the cluster level.
• Observing the last four rows of Table 2, we can see that the social regularization has an obvious effect on ROUGE-2 and ROUGE-SU4, while the diversity regularization does well on ROUGE-1.

In summary, our SNSR method achieves better performance. This suggests that integrating social network information into the proposed sparse reconstruction framework helps improve Twitter summarization. Mining the group sparsity patterns of salient tweets and designing the diversity regularization against the redundancy brought by the social network are also effective.

Effect of Different Regularization

To further evaluate the effectiveness of (1) the social regularization, we conduct four groups of comparison experiments, shown in Figure 1(a); two models are compared in each group. In addition, we conduct similar evaluations for (2) the sparse regularization and (3) the diversity regularization, shown in Figure 1(b) and Figure 1(c) respectively.

[Figure 1: The influences of social regularization, sparse regularization and diversity regularization. (a) Social influence: SNSR vs. SNSR-social, SNSR-sparse vs. SNSR-social-sparse, SNSR-div vs. SNSR-social-div, SNSR-sparse-div vs. SNSR-social-sparse-div. (b) Sparse influence: SNSR vs. SNSR-sparse, SNSR-social vs. SNSR-social-sparse, SNSR-div vs. SNSR-sparse-div, SNSR-social-div vs. SNSR-social-sparse-div. (c) Diversity influence: SNSR vs. SNSR-div, SNSR-social vs. SNSR-social-div, SNSR-sparse vs. SNSR-sparse-div, SNSR-social-sparse vs. SNSR-social-sparse-div.]

By analyzing the experimental results, we have the following observations:

• The performance drops without any one of these three terms, which demonstrates the effectiveness of adding the social regularization, the group sparse regularization and the diversity regularization.
• For each regularization, we compute the average growth percentage in ROUGE-1, ROUGE-2 and ROUGE-SU4 over all of the corresponding degradation models. As seen in Table 3, the social regularization has the greatest influence on the entire model, followed by the sparse regularization and then the diversity regularization. Especially for ROUGE-2 and ROUGE-SU4, adding the social regularization outperforms the degradation models by 46.01% and 23.69% respectively, which demonstrates that adding the social regularization tends to select tweets close to the expert summaries.
• The diversity regularization performs better on ROUGE-1 than the other two terms, which demonstrates that redundancy removal indeed decreases duplicated words and increases word coverage in comparison with the expert summaries.

Table 3: Average improvement of different regularizations in SNSR over the degradation models
  Regularization   ROUGE-1   ROUGE-2   ROUGE-SU4
  social           4.84%     46.01%    23.69%
  sparse           9.98%     21.03%    14.87%
  diversity        10.27%    6.34%     12.51%

For the social regularization, we simply set $d = 1$ and have not discussed the different influences of the consistency relation and the contagion relation due to limited space. They will be further explored in the future.

Parameters Settings and Tunings

In our experiments, there are mainly four parameters to be analyzed: the social regularization weight $\alpha$, the diversity regularization weight $\gamma$, the group sparse regularization weight $\lambda$ and the diversity threshold $\theta$. We tune the four parameters greedily with a fixed step size; for example, $\alpha$ is searched in the range [0, 1] with a step size of 0.01.

Through preliminary experiments, we set $\alpha = 0.03$, $\lambda = 1$, and $\gamma = 1$. We set the similarity threshold $\theta = 0.1$, which is consistent with the observation that the similarity between sentences is mostly distributed in the interval [0, 0.1]. Through this, we try to avoid the "similar" reconstruction phenomenon.
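A sketch of such a greedy, one-parameter-at-a-time search is shown below (our own illustrative code; `evaluate` stands for any scorer, e.g. the average ROUGE-1 F-measure on a development topic, and the grids are assumptions in the spirit of the ranges above, not the exact ones used in the paper).

```python
import numpy as np

def greedy_tune(evaluate, grids, init):
    """Tune parameters one at a time, keeping the others fixed at their current best."""
    best = dict(init)
    for name, grid in grids.items():
        scores = []
        for value in grid:
            trial = dict(best, **{name: value})
            scores.append((evaluate(**trial), value))
        best[name] = max(scores)[1]            # keep the best value for this parameter
    return best

# Example grids (assumed): alpha swept in [0, 1] with step 0.01 as in the paper.
grids = {
    'alpha': np.arange(0.0, 1.0 + 1e-9, 0.01),
    'gamma': [0.01, 0.1, 1.0, 10.0],
    'lam':   [0.01, 0.1, 1.0, 10.0],
    'theta': [0.05, 0.1, 0.2, 0.3],
}
```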
Conclusion

The large scale of short and noisy texts in social media brings new challenges for summarization, which make traditional document summarization methods unsuitable for Twitter. In this paper, we study Twitter summarization from the sparse reconstruction perspective and propose the SNSR framework. The social network makes tweets correlated through user relations and also brings more serious coherent redundancy. Therefore, we model the networked tweet-level information given by social relations as a social regularization and integrate it into the sparse optimization framework. Meanwhile, we design the diversity regularization to avoid redundancy, especially the inherent redundancy brought by the social network and the coherent nature of a topic-oriented corpus. We use the group sparse regularization to extract better tweet patterns for the summary at the corpus level, mining the salient tweets while keeping the selected tweets diverse. In addition, we construct the CTS corpus. Experimental results on this corpus show that our model achieves better performance and that the proposed framework is effective.
References
Abelson, R. P. 1983. Whatever became of consistency theory? Personality and Social Psychology Bulletin 9(1):37-54.
Alsaedi, N.; Burnap, P.; and Rana, O. 2016. Automatic summarization of real world events using twitter. In The Tenth International AAAI Conference on Web and Social Media.
Bi, B.; Tian, Y.; and Sismanis, Y. 2014. Scalable topic-specific influence analysis on microblogs. In ACM International Conference on Web Search and Data Mining.
Cai, X.; Li, W.; Ouyang, Y.; and Yan, H. 2010. Simultaneous ranking and clustering of sentences: A reinforcement approach to multi-document summarization. In COLING.
Chang, Y.; Wang, X.; Mei, Q.; and Liu, Y. 2013. Towards twitter context summarization with user influence models. In WSDM.
Chang, Y.; Tang, J.; Yin, D.; Yamada, M.; and Liu, Y. 2016. Timeline summarization from social media with life cycle models. In IJCAI.
Duan, Y.; Chen, Z.; Wei, F.; Zhou, M.; and Shum, H.-Y. 2012. Twitter topic summarization by ranking tweets using social influence and content quality. In COLING.
Erkan, G., and Radev, D. R. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22:457-479.
Gao, D.; Li, W.; Ouyang, Y.; and Zhang, R. 2012. LDA-based topic formation and topic-sentence reinforcement for graph-based multi-document summarization. In AIRS.
Gong, Y., and Liu, X. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of the 24th ACM SIGIR, 19-25.
Harrigan, N.; Achananuparp, P.; and Lim, E.-P. 2012. Influentials, novelty, and social contagion: The viral power of average friends, close communities, and old news. Social Networks 34:470-480.
He, Z.; Chen, C.; Bu, J.; Wang, C.; Zhang, L.; Cai, D.; and He, X. 2012. Document summarization based on data reconstruction. In AAAI.
He, X.; Rekatsinas, T.; Foulds, J.; Getoor, L.; and Liu, Y. 2015. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. In ICML.
Hu, X.; Tang, L.; Tang, J.; and Liu, H. 2013. Exploiting social relations for sentiment analysis in microblogging. In WSDM.
Inouye, D., and Kalita, J. 2011. Comparing twitter summarization algorithms for multiple post summaries. In SocialCom.
Ji, S., and Ye, J. 2009. An accelerated gradient method for trace norm minimization. In ICML.
Li, P.; Wang, Z.; Lam, W.; Ren, Z.; and Bing, L. 2017. Salience estimation via variational auto-encoders for multi-document summarization. In AAAI.
Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004.
Litvak, M.; Vanetik, N.; Liu, C.; Xiao, L.; and Savas, O. 2015. Improving summarization quality with topic modeling. In CIKM.
Liu, H.; Yu, H.; and Deng, Z. 2016. Multi-document summarization based on two-level sparse representation model. In AAAI.
Liu, X.; Li, Y.; Wei, F.; and Zhou, M. 2012. Graph-based multi-tweet summarization using social signals. In COLING.
Mihalcea, R., and Tarau, P. 2004. TextRank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Nesterov, Y., and Nesterov, I. 2004. Introductory lectures on convex optimization: A basic course.
Nichols, J.; Mahmud, J.; and Drews, C. 2012. Summarizing sporting events using twitter. In IUI.
Park, S.; Lee, J.-H.; Kim, D.-H.; and Ahn, C.-M. 2007. Multi-document summarization based on cluster using non-negative matrix factorization. In Proceedings of the Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM).
Radev, D. R.; Blair-Goldensohn, S.; and Zhang, Z. 2001. Experiments in single and multidocument summarization using MEAD. In First Document Understanding Conference.
Shalizi, C. R., and Thomas, A. C. 2011. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research 40(2):211.
Sharifi, B.; Hutton, M.; and Kalita, J. 2010. Summarizing microblogs automatically. In HLT-NAACL.
Shen, C.; Li, T.; and Ding, C. H. Q. 2010. Integrating clustering and multi-document summarization by bi-mixture probabilistic latent semantic analysis (PLSA) with sentence bases. In AAAI.
Vanderwende, L.; Suzuki, H.; Brockett, C.; and Nenkova, A. 2007. Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion. Information Processing and Management 43:1606-1618.
Wang, D.; Zhu, S.; Li, T.; and Gong, Y. 2009. Multi-document summarization using sentence-based topic models. In ACL-IJCNLP 2009 Conference Short Papers, 297-300.
Wang, D.; Zhu, S.; Li, T.; Chi, Y.; and Gong, Y. 2011. Integrating document clustering and multidocument summarization. In KDD.
Wang, X.; Wang, Y.; Zuo, W.; and Cai, G. 2015. Exploring social context for topic identification in short and noisy texts. In AAAI.
Yao, J.; Wan, X.; and Xiao, J. 2015. Compressive document summarization via sparse optimization. In IJCAI.
Ye, J., and Liu, J. 2012. Sparse methods for biomedical data. SIGKDD Explorations Newsletter 14(1):4-13.
