Deep Attentive Learning for Stock Movement Prediction From Social Media Text and Company Correlations

Ramit Sawhney* (Netaji Subhas Institute of Technology), ramits.co@nsit.net.in
Shivam Agarwal* (Manipal Institute of Technology), shivamag99@gmail.com
Arnav Wadhwa (MIDAS, IIIT Delhi), arnavw96@gmail.com
Rajiv Ratn Shah (IIIT Delhi), rajivratn@iiitd.ac.in

* Equal contribution.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 8415-8426, November 16-20, 2020. © 2020 Association for Computational Linguistics.

Abstract

In the financial domain, risk modeling and profit generation heavily rely on the sophisticated and intricate stock movement prediction task. Stock forecasting is complex, given the stochastic dynamics and non-stationary behavior of the market. Stock movements are influenced by varied factors beyond the conventionally studied historical prices, such as social media and correlations among stocks. The rising ubiquity of online content and knowledge mandates an exploration of models that factor in such multimodal signals for accurate stock forecasting. We introduce an architecture that achieves a potent blend of chaotic temporal signals from financial data, social media, and inter-stock relationships via a graph neural network in a hierarchical temporal fashion. Through experiments on real-world S&P 500 index data and English tweets, we show the practical applicability of our model as a tool for investment decision making and trading.

1 Introduction

Stock prices have an intrinsically volatile and non-stationary nature, making their rise and fall hard to forecast (Adam et al., 2016). Investment in stock markets involves a high risk regarding profit-making. Prices are driven by diverse factors that include, but are not limited to, company performance (Anthony and Ramesh, 1992), historical trends (Kohara et al., 1997), and investor sentiment (Neal and Wheatley, 1998). Uninformed trading decisions can leave traders and investors prone to financial risk and monetary losses. On the contrary, careful investment choices can maximize profits (de Souza et al., 2018). Conventional research focused on time series and technical analysis of a stock, i.e., using patterns from historical price signals to forecast stock movements (B et al., 2013). However, price signals alone fail to capture market surprises and the impact of sudden, unexpected events. Social media texts such as tweets can have huge impacts on the stock market. For instance, US President Donald Trump shared tweets expressing negative sentiment against Lockheed Martin, which led to a loss of around $5.8 billion in the company's market capitalization (https://medium.com/scoop-markets/7-tweets-which-wiped-40-billion-off-the-stock-market).

The Efficient Market Hypothesis (EMH) (Malkiel, 1989) states that financial markets are informationally efficient, such that stock prices reflect all known information. Existing works (Sec. 2) mainly focus on subsets of stock-relevant data. Although useful, they do not jointly optimize learning over modalities like social media text and inter-stock relations, limiting their potential to capture a broader scope of stock-movement-affecting data, as we show in Sec. 6. Multimodal stock prediction involves multiple challenges (Hu et al., 2018). Both price signals and tweets exhibit sequential context dependencies, where singular samples may not be informative enough but can be considered as a sequence for a unified context. Tweets often have diverse influence on stock prices, based on their intrinsic content, such as breaking news as opposed to noise like vague comments. Fusing multiple modalities of vast stock-related data generated with varying characteristics (frequency, noise, source) is complex and mandates the careful design of joint optimization over modality-specific components.

Building on the EMH and prior work (Sec. 2), we propose MAN-SF: Multipronged Attention Network for Stock Forecasting, which jointly learns from historical prices, social media, and inter-stock relations. MAN-SF, through hierarchical attention, captures relevant signals across diverse data to train a Graph Attention Network (GAT) for stock prediction (Sec. 3). MAN-SF (Sec. 4) jointly learns from price and tweets over graph-based models for stock prediction.
Through varied experiments (Sec. 5), we show the predictive power of MAN-SF along with a profitability analysis (Sec. 6), and we qualitatively analyze MAN-SF in high-risk scenarios (Sec. 7).

2 Related Work

Predicting stock movements spans multiple domains (Jiang, 2020): 1) theoretical, with quantitative models like Modern Portfolio Theory (Elton et al., 2009) and the Black-Scholes model (Black and Scholes, 1973), and 2) practical, with investment strategies (Blitz and Van Vliet, 2007), portfolio management (Hocquard et al., 2013), and applications beyond the world of finance (Erb et al., 1994; Rich and Tracy, 2004). Financial models conventionally focused on technical analysis (TA), relying only on numerical features like past prices (Ding and Qin, 2019; Nguyen et al., 2019) and macroeconomic indicators like GDP (Hoseinzade et al., 2019). Such TA methods include discrete approaches like GARCH (Bollerslev, 1986), continuous approaches (Andersen, 2007), and neural approaches (Nguyen and Yoon, 2019; Nikou et al., 2019).

Newer models based on the EMH, categorized under fundamental analysis (FA) (Dichev and Tang, 2006), account for stock-affecting factors beyond numerical ones, such as investor sentiment expressed through news. Work in natural language processing (NLP) on sources such as news (Hu et al., 2018), social media data (Xu and Cohen, 2018), and earnings calls (Qin and Yang, 2019; Sawhney et al., 2020b) shows the merit of FA in capturing market sentiment, surprises, mergers, and acquisitions that traditional TA-based methods fail to account for. A limitation of existing NLP methods for stock prediction is that they assume stock movements to be independent of each other, contrary to true market function (Diebold and Yilmaz, 2014). This assumption hinders NLP-centric FA's ability to learn latent patterns across interrelated stocks.

Another line of FA revolves around employing graph-based methods to improve TA (e.g., price-based models) by augmenting them with inter-stock relations (Feng et al., 2019b; Sawhney et al., 2020a). Matsunaga et al. (2019) combine historical prices with stock graphs through Graph Convolutional Networks (GCNs), outperforming price-only models. Similarly, Kim et al. (2019) further improve graph neural network methods by weighing stock relations through attention mechanisms, as not all stock movements are equally correlated.

Despite the popularity of NLP and graph-based stock prediction, multimodal methods that capture inter-stock relations and market sentiment through linguistic cues are seldom explored. Jue Liu (2019) combines feature extraction from news sentiment scores and financial information (price-earnings ratio, etc.) with knowledge graph embeddings through TransR. However, such existing approaches (Deng et al., 2019) are unable to represent textual signals from social media and prices temporally, as they only utilize sentiment scores and do not account for stock correlations. To cover this gap in prior research, MAN-SF captures a broader set of features, as opposed to both conventional TA and FA, which singularly focus on either the text or graph modality, but not both together.

3 Problem Formulation

MAN-SF's main objective is to learn temporally relevant information jointly from tweets and historical price signals and to make use of corporate relations among stocks to predict movements. Following Xu and Cohen (2018), we formalize movement based on the difference between the adjusted closing prices of a stock s ∈ S on trading days t and t − 1. We formulate stock movement prediction as a binary classification problem.

Problem Statement: Given a stock s ∈ S, with historical price data and tweets for stock s over a lookback window of T days spanning the day range [t − T, t − 1], we define the price movement of stock s from day t − 1 to t as:

    Y_t = 0 if p^c_t < p^c_{t−1};  Y_t = 1 if p^c_t ≥ p^c_{t−1}   (1)

where p^c_t represents the widely used (Yang et al., 2020; Qin and Yang, 2019) adjusted closing price (see https://www.investopedia.com/terms/a/adjusted_closing_price.asp) of a given stock on day t. Here, 0 represents a price downfall, and 1 represents a rise in the price.
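For illustration, the labeling of Eq. (1) reduces to a one-line comparison of consecutive adjusted closes. A minimal sketch assuming pandas; the column name and values are placeholders, not the authors' code:

    import pandas as pd

    # Hypothetical single-stock frame, one row per trading day.
    prices = pd.DataFrame({"adj_close": [112.3, 113.1, 112.8, 114.0]})

    # Eq. (1): Y_t = 1 if the adjusted close rises or stays equal from day t-1 to t, else 0.
    # The first day has no previous close, so it is dropped.
    y = (prices["adj_close"] >= prices["adj_close"].shift(1)).astype(int).iloc[1:]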
4 MAN-SF: Components and Learning

In this section, we first give an overview of MAN-SF, followed by a detailed explanation of each component. As shown in Figure 1, MAN-SF first encodes market data for each stock over a fixed period.

Figure 1: An overview of MAN-SF: Encoding Mechanisms, GAT Mechanism, Joint Optimization.

Formally, we encode stock features xt ∈ Rw for each trading day t as xt = B(ct, qt), where ct ∈ Ru represents a social media feature that we obtain by encoding tweets over the lag window for each stock s ∈ S = {s1, s2, . . . sS}. Similarly, qt ∈ Rv are the features obtained from historical prices for a stock in the lag window. We detail these encoders first, and then explain the fusion B(·) over ct and qt to obtain xt ∈ Rw. We then describe the graph used to represent the inter-stock relations. Lastly, we explain the GAT to which the fused feature vector xt is passed to propagate features based on inter-stock relations, along with the joint optimization of MAN-SF.

4.1 Price Encoder

Technical analysis shows that historical price information is a strong indicator of future trends (Jeanblanc et al., 2009). Therefore, price data from each day is a crucial input to MAN-SF. The Price Encoder shown in Figure 2 encodes historical stock price movements to produce the price feature qt.

Figure 2: An overview of the Price Encoder.

It takes in a per-day price feature from the lookback of T days and encodes the temporal trend in prices. To capture such sequential dependencies across trading days, we use a Gated Recurrent Unit (GRU) (Cho et al., 2014; Giles et al., 2001). The output of the GRU on day i is denoted by:

    h_i = GRU_p(p_i, h_{i−1}),  t − T ≤ i ≤ t   (2)

where p_i ∈ R^{dp} is the price vector on day i for each stock s in the lookback. The raw price vector p_i = [p^c_i, p^h_i, p^l_i] comprises a stock's adjusted closing price, highest price, and lowest price for trading day i. Since it is the price change that determines the stock movement rather than the absolute price value, we normalize each vector by the previous adjusted closing price, p_i = p_i / p^c_{i−1}.

It has been shown that the stock trend of each day has a different impact on stock trend prediction (Feng et al., 2019a). Towards this end, we employ temporal attention ζ(·) (Li et al., 2018) that learns to weigh critical days and forms an aggregated feature representation across all hidden states of the GRU (Qin et al., 2017). The temporal attention mechanism yields qt = ζ(hp), where hp ∈ R^{dp×T} is the concatenated hidden states of GRU_p for each stock s. This temporal attention mechanism ζ(·) rewards days with more impactful information and aggregates it from all days in the lag window to produce price features qt ∈ Rv.

Temporal Attention: We use a temporal attention mechanism that is a form of additive attention (Bahdanau et al., 2014). The mechanism ζ(·) aggregates all the hidden representations of the GRU across different time-steps into an overall representation with learned adaptive weights (Feng et al., 2019a). We formulate this mechanism as:

    β_i = exp(h_iᵀ W h_z) / Σ_{i=1..T} exp(h_iᵀ W h_z)   (3)

    ζ(h_z) = Σ_i β_i h_i   (4)

where h_z ∈ R^{T×dm} denotes the concatenated hidden states of the GRU, β_i represents the learned attention weight for trading day i, and W is a learnable parameter matrix.
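A minimal PyTorch sketch of the Price Encoder of Eqs. (2)-(4). The GRU hidden size of 64 follows Sec. 5.1; the attention scorer below is a simplified learned linear function of each hidden state rather than the bilinear score of Eq. (3), and everything else is illustrative rather than the authors' implementation:

    import torch
    import torch.nn as nn

    class TemporalAttention(nn.Module):
        # Additive attention in the spirit of Eqs. (3)-(4): scores each day's
        # hidden state, softmax-normalizes the scores, and returns the
        # weighted sum (simplified linear scorer, not the bilinear h_i^T W h_z).
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1, bias=False)

        def forward(self, h):                 # h: (T, dim), one state per day
            beta = torch.softmax(self.score(h).squeeze(-1), dim=0)  # weights beta_i
            return (beta.unsqueeze(-1) * h).sum(dim=0)              # aggregated feature

    class PriceEncoder(nn.Module):
        def __init__(self, in_dim=3, hid_dim=64):
            super().__init__()
            self.gru = nn.GRU(in_dim, hid_dim)    # GRU_p of Eq. (2)
            self.attn = TemporalAttention(hid_dim)

        def forward(self, p):                 # p: (T, 3) normalized [close, high, low]
            h, _ = self.gru(p.unsqueeze(1))   # (T, 1, hid_dim)
            return self.attn(h.squeeze(1))    # price feature q_t

    encoder = PriceEncoder()
    p = torch.rand(5, 3)   # T = 5-day lookback of normalized price vectors
    q_t = encoder(p)       # 64-dimensional price feature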

4.2 Social Media Information Encoder (SMI)

Xu and Cohen (2018) suggest that tweets not only convey factual data but also portray user sentiment towards stocks (Fung et al., 2002), which influences financial prediction (Bollen et al., 2011). A variety of market factors beyond historical prices drive stock trends (Abu-Mostafa and Atiya, 1996), and with the rising ubiquity of the Internet, social media platforms such as Twitter influence investors to follow market trends (Tetlock, 2007; Hu et al., 2018). To this end, MAN-SF uses the SMI encoder to extract a feature vector ct from tweets. The encoder shown in Figure 3 extracts social media features ct by first encoding tweets within a day and then over multiple days using a hierarchical attention mechanism (Yang et al., 2016).

Figure 3: Social Media Information Encoder.

Tweet Embedding: For any given tweet tw, we generate an embedding vector m ∈ Rd. We explored word- and sentence-level embedding methods to learn tweet representations: Global Vectors for Word Representation (GloVe) (Pennington et al., 2014), FastText (Joulin et al., 2017), and the Universal Sentence Encoder (USE) (Cer et al., 2018). Empirically, sentence-level embeddings generated using the deep averaging network encoder variant of USE (implementation: https://tfhub.dev/google/universal-sentence-encoder/2) gave us the most promising results. Thus, we encode each tweet tw using USE.
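For reference, this embedding step can be reproduced with TensorFlow Hub. The footnoted v2 module is in the older TF1 format, so the sketch below loads the TF2-format v4 module instead, which is also a 512-dimensional deep-averaging-network variant; the tweets are placeholders:

    import tensorflow_hub as hub

    # The paper's footnote points to .../universal-sentence-encoder/2 (TF1 format).
    embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    tweets = ["$AAPL earnings beat expectations", "selling all my $AAPL today"]
    vectors = embed(tweets)   # shape (2, 512), one embedding m_j per tweet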
Learning Representations for one day: On any day i, a variable number of tweets [tw1, tw2, . . . twK] are posted for each stock s, and these capture and influence the stock trends (Fung et al., 2002). For each tweet, we obtain a representation using the Tweet Embedding layer (USE) as [m1, m2, . . . mK], where mj ∈ Rd and K is the number of tweets per stock on day i. To model the sequence of tweets within a day, we use a GRU. For stock s on each day i:

    h_j = GRU_m(m_j, h_{j−1}),  j ∈ [1, K]   (5)

The influence of online tweets on the market can vary greatly (Hu et al., 2018). To identify tweets that are likely to have a more substantial influence on the market, we use an intraday tweet-level attention. For each stock s on each day i, the mechanism can be summarized as:

    γ_j = exp(h_jᵀ W h_m) / Σ_{j=1..K} exp(h_jᵀ W h_m)   (6)

    r_i = Σ_j γ_j h_j   (7)

where h_m ∈ R^{K×dm} denotes a concatenation of all hidden states from GRU_m and dm is the dimension of each hidden state. γ_j represents the attention weights, and r_i represents the features obtained from the tweets published on day i for each stock s. W is a learned linear transformation.

Learning Representations across days: Analyzing a temporal sequence of tweets and combining them can provide a more reliable assessment of market trends (Zhao et al., 2017). We learn a social media representation from the sequence of day-level tweet representations r_i; this feature vector encodes all the information in a lookback window. We feed the temporal day-level tweet vectors to a GRU for sequential modeling:

    h_i = GRU_s(r_i, h_{i−1}),  t − T ≤ i ≤ t   (8)

where h_i summarizes the tweets on day i for stock s, as well as tweets from preceding days, while focusing on day i. Like historical prices, tweets from each day have a different impact on stock movements. Hence, the temporal attention mechanism previously described for historical prices is also used for social media. It learns to aggregate impactful information to form SMI features ct over a lookback of T days for each stock s. The temporal attention mechanism yields ct = ζ(hs), where hs ∈ R^{T×ds} represents the concatenated hidden states of GRU_s and ds is the size of the GRU's output space.

This temporal attention ζ(·), along with the intraday tweet-level attention, forms a hierarchical attention mechanism. This mechanism captures the fact that tweets are differently informative and have varied impacts during different market phases. The obtained SMI and price features for each stock are then blended to obtain a joint representation.
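Putting Eqs. (5)-(8) together, the SMI encoder is two GRU-plus-attention stages. A condensed sketch using the same simplified linear scorer as the Sec. 4.1 sketch; tweet counts, sizes, and inputs are placeholders, not the authors' code:

    import torch
    import torch.nn as nn

    def zeta(h, score):
        # Softmax-weighted sum over a sequence of hidden states, mirroring ζ(·)
        # (simplified linear scorer instead of the bilinear form).
        w = torch.softmax(score(h).squeeze(-1), dim=0)
        return (w.unsqueeze(-1) * h).sum(dim=0)

    d, hid = 512, 64                             # USE size and GRU hidden size (Sec. 5.1)
    gru_m = nn.GRU(d, hid)                       # intraday GRU_m of Eq. (5)
    gru_s = nn.GRU(hid, hid)                     # across-day GRU_s of Eq. (8)
    score_tweet = nn.Linear(hid, 1, bias=False)  # scorer for Eqs. (6)-(7)
    score_day = nn.Linear(hid, 1, bias=False)    # scorer for the day-level ζ(·)

    # Hypothetical lookback of T = 5 days with a variable number of tweets per day.
    days = [torch.rand(k, d) for k in (4, 2, 7, 3, 5)]

    r = []                                       # day-level features r_i
    for tweets in days:
        h, _ = gru_m(tweets.unsqueeze(1))        # (K, 1, hid)
        r.append(zeta(h.squeeze(1), score_tweet))

    h_s, _ = gru_s(torch.stack(r).unsqueeze(1))  # Eq. (8) over the T day vectors
    c_t = zeta(h_s.squeeze(1), score_day)        # SMI feature c_t = ζ(h_s)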
4.3 Blending Multimodal Information

Signals from different modalities often carry complementary information about different events in the market (Robert P. Schumaker, 2019). Direct concatenation treats information from the Price and SMI encoders equally (Li et al., 2016). Furthermore, the interdependencies between prices and tweets are not appropriately captured, damping the framework's capacity to learn their correlations with market trends (Li et al., 2014). We use a bilinear transformation that learns the pairwise feature interactions between historical price features and tweets. Formally, qt ∈ Rv and ct ∈ Ru are obtained from the Price Encoder and SMI Encoder, respectively. The output xt ∈ Rw is given by:

    xt = B(ct, qt) = ReLU(qtᵀ W ct + b)   (9)

where W ∈ R^{w×v×u} is the weight tensor and b ∈ Rw is the bias. Methods like direct mean and attention-based aggregation (Bahdanau et al., 2014) do not account for pairwise interactions, as shown in the results (Sec. 6). Other methods, like factorized bilinear pooling (Yu et al., 2017), reduce computational complexity; however, we empirically find that the generalized bilinear layer outperforms these techniques. This layer learns an optimum blend of features from prices and tweets in a translationally invariant manner.
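Eq. (9) is exactly the form computed by a standard bilinear layer. A minimal sketch using PyTorch's nn.Bilinear; the sizes are illustrative:

    import torch
    import torch.nn as nn

    v, u, w = 64, 64, 64                  # illustrative sizes of q_t, c_t, and x_t
    blend = nn.Bilinear(v, u, w)          # holds W ∈ R^(w x v x u) and bias b ∈ R^w

    q_t, c_t = torch.rand(1, v), torch.rand(1, u)   # batch of one stock-day
    x_t = torch.relu(blend(q_t, c_t))     # Eq. (9): ReLU(q_t^T W c_t + b)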
                                                        shared self-attention mechanism to each node i in
4.4    Graph Attention Network (GAT)                    its immediate neighborhood Ni . For each node
Stocks are often interlinked with one another, and      j ∈ Ni , we compute normalized attention coeffi-
thus, we model stocks and their relations as a graph.   cients αij representing the importance of relations
                                                        among stocks i and j. Formally, αij is given as:
Graph Creation Following Feng et al. (2019b),
we make use of Wiki company-based relations. Us-
ing Wikidata4 , we extract first and second-order re-         exp (LeakyReLU (aTw [W xi ⊕ W xj ]))
                                                        αij = P
lations between the company stocks in the S&P 500               exp (LeakyReLU (aTw [W xi ⊕W xk ]))
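The resulting graph can be held in any standard structure. A sketch assuming NetworkX, where the tickers and relation strings are hypothetical stand-ins paraphrasing the Wikidata examples above:

    import networkx as nx

    # Hypothetical (stock, stock, relation) links standing in for Wikidata extractions.
    links = [
        ("WFC", "BRK", "owned by"),                     # first-order
        ("BAC", "BRK", "owned by"),                     # first-order
        ("MSFT", "BRK", "owned by / board member of"),  # second-order, via Bill Gates
    ]

    G = nx.Graph()                                      # undirected G(S, E)
    for s1, s2, rel in links:
        G.add_edge(s1, s2, relation=rel)                # nodes are added implicitly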
Graph Attention: Graph-based representation learning through graph neural networks can be considered as information exchange between related nodes (Gilmer et al., 2017). As each stock has a different degree of influence on another stock, it is essential that the graph encoding suitably weighs the more relevant relations between stocks. To this end, we use graph attention networks (GATs), which are graph neural networks with node-level attention (Veličković et al., 2017).

We first describe a single GAT layer that is used throughout the GAT component. The input to the GAT is a set of stock (node) features h = [x1, x2, . . . x|S|], where xi is the encoded multimodal market information (Sec. 4.3). The GAT layer produces an updated set of node features h′ = [z1, z2, . . . z|S|], zi ∈ R^{w′}, based on the GAT mechanism (shown in Figure 1). We first apply a shared linear transform parameterized by W ∈ R^{w′×w} to all the nodes. Then, we apply a shared self-attention mechanism to each node i in its immediate neighborhood Ni. For each node j ∈ Ni, we compute normalized attention coefficients αij representing the importance of the relation between stocks i and j. Formally, αij is given as:

    αij = exp(LeakyReLU(awᵀ [W xi ⊕ W xj])) / Σ_{k∈Ni} exp(LeakyReLU(awᵀ [W xi ⊕ W xk]))   (10)

where ᵀ and ⊕ represent transpose and concatenation, respectively, and aw ∈ R^{2w′} is the learnable weight vector of a single-layer feed-forward neural network. The learned attention coefficients αij are used to weigh and aggregate feature vectors from neighboring nodes with a non-linearity σ.

The updated node feature vector zi is given as:

    zi = σ( Σ_{j∈Ni} αij W xj )   (11)

We use multi-head attention to stabilise training (Vaswani et al., 2017). Formally, U independent executors apply the above attention mechanism, and their output features are concatenated to yield:

    zi = ⊕_{k=1..U} σ( Σ_{j∈Ni} αij^k W^k xj )   (12)

where αij^k and W^k denote the normalised attention coefficients and the linear transformation parameter matrix computed by the k-th attention mechanism.
We use a two-layer GAT: the first layer is followed by an Exponential Linear Unit (Clevert et al., 2015), and the second layer outputs yi for each stock i, which is then used to classify the stock's future price movement. MAN-SF is trained using the Adam optimizer by minimizing the cross-entropy loss:

    Lcse = − Σ_{i=1..|S|} [ Yi ln(yi) + (1 − Yi) ln(1 − yi) ]   (13)

where Yi is the true price movement of stock i.
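For concreteness, the sketch below assembles a two-layer graph attention classifier and the loss of Eq. (13), using PyTorch Geometric's GATConv as a stand-in for the layer of Eqs. (10)-(12); the paper describes its own layer, and all sizes, edges, and labels here are placeholders:

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GATConv  # stand-in for Eqs. (10)-(12)

    class StockGAT(torch.nn.Module):
        def __init__(self, in_dim=64, hid_dim=8, heads=8):
            super().__init__()
            self.gat1 = GATConv(in_dim, hid_dim, heads=heads, concat=True)   # U heads, concatenated as in Eq. (12)
            self.gat2 = GATConv(hid_dim * heads, 1, heads=1, concat=False)   # one value y_i per stock

        def forward(self, x, edge_index):
            h = F.elu(self.gat1(x, edge_index))          # first layer followed by an ELU
            return torch.sigmoid(self.gat2(h, edge_index)).squeeze(-1)

    model = StockGAT()
    x = torch.randn(5, 64)                               # fused features x_t for 5 placeholder stocks
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # placeholder relation edges
    y = torch.tensor([1., 0., 1., 1., 0.])               # true movements Y_i
    loss = F.binary_cross_entropy(model(x, edge_index), y)   # Eq. (13)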
5 Experiments

5.1 Dataset and Training Setup

We adopt the StockNet dataset (Xu and Cohen, 2018) for the training and evaluation of MAN-SF. The dataset contains data of high-trade-volume stocks in the S&P 500 index on the NYSE and NASDAQ markets. Stock-specific tweets are extracted using regex queries made out of NASDAQ ticker symbols, for instance, $AMZN for Amazon. The price data is obtained from Yahoo Finance (https://finance.yahoo.com/industries). We shift a 5-day lag window along the trading days to generate samples. We label the samples according to the movement percentage of the closing price, such that movements ≥ 0.55% and ≤ −0.5% are labeled as positive and negative samples, respectively. This leaves us with 26,614 samples, divided as 49.78% and 50.22% between the two classes. We temporally split the dataset into Train:Validation:Test in a 70:10:20 ratio, leaving us with date ranges from 01/01/2014 to 31/07/2015 for training, 01/08/2015 to 30/09/2015 for validation, and 01/10/2015 to 01/01/2016 for testing. Following Xu and Cohen (2018), we align trading days by dropping samples that lack either prices or tweets, and further align the data across trading windows for related stocks to ensure data is available for all trading days in the window for all stocks. The hidden size of all GRUs is 64, and the USE embedding dimension is 512. We use U = 8 attention heads for both GAT layers. We use the Adam optimizer with a learning rate of 5e−4 and train MAN-SF for 10,000 epochs. It takes 3 hours to train and test MAN-SF on a Tesla K80 GPU. We use early stopping based on the Matthews Correlation Coefficient (MCC) over the validation set.

5.2 Evaluation

Following prior research on stock prediction (Ding et al., 2014; Xu and Cohen, 2018), we use accuracy, the F1 score, and MCC (implementations from sklearn: https://scikit-learn.org) for classification performance. We use MCC because, unlike the F1 score, MCC avoids bias due to data skew: it does not depend on the choice of the positive class and accounts for the true negatives. For a confusion matrix with entries tp (true positives), fn (false negatives), fp (false positives), and tn (true negatives):

    MCC = (tp × tn − fp × fn) / sqrt((tp + fp)(tp + fn)(tn + fp)(tn + fn))   (14)
Like prior work (Kim et al., 2019; Feng et al., 2019b), to evaluate MAN-SF's applicability to real-world trading, we assess its profitability on the test data of the S&P 500 index using two metrics: Cumulative Profit and the Sharpe Ratio (Sharpe, 1994). We follow a trading strategy where, if MAN-SF predicts a rise in a stock's value for the next day, one share of that stock is bought (long position) at the closing price of the current trading session and sold at the next day's closing price. Otherwise, if the strategy speculates a fall in price, a short sell (https://en.wikipedia.org/wiki/Short_(finance)) is performed. We compute the cumulative profit (Krauss, 2018) earned as:

    Profit_t = Σ_{i∈S} ((p^t_i − p^{t−1}_i) / p^{t−1}_i) × (−1)^{Action^{t−1}_i}   (15)

where S denotes the set of stocks and p^t_i denotes the price of stock i on day t. Action^{t−1}_i is a binary value in {0, 1}: it is 0 if the long position is taken at time t for stock i, and 1 otherwise.

Model                                                    F1 ↑           Accuracy ↑     MCC ↑

Technical Analysis (TA):
RAND                                                     0.502 ± 8e−4   0.509 ± 8e−4   −0.002 ± 1e−3
ARIMA (Brown, 2004)                                      0.513 ± 1e−3   0.514 ± 1e−3   −0.021 ± 2e−3
Selvin et al. (2017)                                     0.529 ± 5e−2   0.530 ± 5e−2   −0.004 ± 7e−2

Fundamental Analysis (FA):
RandForest (Venkata Sasank Pagolu, 2016)                 0.527 ± 2e−3   0.531 ± 2e−3    0.013 ± 4e−3
TSLDA (Nguyen and Shirai, 2015)                          0.539 ± 6e−3   0.541 ± 6e−3    0.065 ± 7e−3
HAN (Hu et al., 2018)                                    0.572 ± 4e−3   0.576 ± 4e−3    0.052 ± 5e−3
StockNet - TechnicalAnalyst (Xu and Cohen, 2018)         0.546          0.550           0.017
StockNet - FundamentalAnalyst (Xu and Cohen, 2018)       0.572          0.582           0.072
StockNet - IndependentAnalyst (Xu and Cohen, 2018)       0.573          0.575           0.037
StockNet - DiscriminativeAnalyst (Xu and Cohen, 2018)    0.559          0.562           0.056
StockNet - HedgeFundAnalyst (Xu and Cohen, 2018)         0.575          0.582           0.081
HATS (Kim et al., 2019)                                  0.560 ± 2e−3   0.562 ± 2e−3    0.117 ± 6e−3
Chen et al. (2018)                                       0.530 ± 7e−3   0.532 ± 7e−3    0.093 ± 9e−3
Adversarial LSTM (Feng et al., 2019a)                    0.570          0.572           0.148
MAN-SF (This work)                                       0.605 ± 2e−4   0.608 ± 2e−4    0.195 ± 6e−4

Table 1: Results compared with baselines; MAN-SF (last row) achieves the best results. TA and FA represent Technical Analysis and Fundamental Analysis models, respectively. Standard deviations are shown where available.

The Sharpe Ratio is a measure of the return of a portfolio compared to its risk. We calculate the Sharpe ratio as the ratio of the expected return Ra of a portfolio to its standard deviation:

    Sharpe Ratio_a = E[Ra] / std[Ra]   (16)
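The backtest arithmetic of Eqs. (15)-(16) is straightforward to reproduce; a sketch assuming NumPy arrays of placeholder prices and actions (the annualization factor is a common convention, not stated in the paper):

    import numpy as np

    # Placeholder prices (rows: stocks in S, columns: consecutive trading days)
    # and actions taken at each day t-1 (0 = long, 1 = short sell).
    prices = np.array([[100.0, 101.0, 99.0],
                       [ 50.0,  49.0, 52.0]])
    actions = np.array([[0, 1],
                        [1, 0]])

    returns = (prices[:, 1:] - prices[:, :-1]) / prices[:, :-1]
    profit = (returns * (-1.0) ** actions).sum(axis=0)   # Eq. (15), one value per day t

    sharpe = profit.mean() / profit.std()                # Eq. (16) on the realized returns
    annualized = np.sqrt(252) * sharpe                   # common annualization over ~252 trading days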
                                                                     and across days. Price features are modeled
5.3   Baselines
                                                                     sequentially (Xu and Cohen, 2018). We com-
We compare MAN-SF with the below baselines                           pare with all five variants of StockNet.
spanning both technical and fundamental analysis.
                                                                   • HATS: A hierarchical graph attention method
Technical Analysis: These methods uses only                          that uses a multi-graph to weigh different rela-
historical price information.                                        tionships between stocks. It uses only histori-
                                                                     cal price data (Kim et al., 2019).
  • RAND: Random guess as price rise or fall.

  • ARIMA: Autoregressive Integrated Moving                        • Chen et al. (2018): GCNs to model inter
    Average models historical prices as a non-                       stock relations with only historical price data.
    stationary time series (Brown, 2004).
                                                               6    Results and Analysis
  • Selvin et al. (2017): Three deep neural archi-
    tectures (RNN, CNN and LSTM) using prices.                 We now discuss the experimental results and some
    We compare with the best performing LSTM.                  findings with their financial implications.

                                                               Performance Comparison Table 1 shows the
Fundamental Analysis: These methods use
                                                               performance of the compared methods on Stock-
other modalities such as text information and com-
                                                               Net’s test data split from 01/10/2015 to 31/12/2015
pany relationships along with historical prices.
                                                               on the S&P 500 index averaged over ten differ-
  • RandForest: Random Forests classifier                      ent runs. Using a learned blend of historical price
    trained over word2vec (Mikolov et al., 2013)               and tweets using corporate relationships, MAN-
    embeddings for tweets.                                     SF achieves the best performance, outperforming
                                                               the strongest baselines, StockNet, and Adversarial
  • TSLDA: Topic Sentiment Latent Dirichlet Al-                LSTM. We also note that Fundamental Analysis
    location model is a generative model that uses             (FA) techniques outperform numerical only Tech-
    sentiments and topic modeling on social me-                nical Analysis (TA) methods, reiterating the effec-
    dia (Nguyen and Shirai, 2015).                             tiveness of factoring in social media signals and

Model Component                     F1 ↑     MCC ↑
LSTM + Historical Price             0.521    0.002
GRU + Social Media Text (BERT)      0.539    0.077
GCN + Historical Price              0.532    0.093
GRU + Social Media Text (USE)       0.546    0.101
GCN + Social Media Text (USE)       0.555    0.102
GAT + Historical Price              0.562    0.117
MAN-SF (Concatenation)              0.588    0.156
MAN-SF (Attention Fusion)           0.594    0.173
MAN-SF (Bilinear Transformation)    0.605    0.195

Table 2: Ablation study over MAN-SF's components.

Model      Sharpe Ratio ↑
StockNet   0.83
HATS       0.78
MAN-SF     1.05

Table 3: Annualized Sharpe Ratio comparison with baselines.

Figure 6: Cumulative profit trend over the test period (October to December 2015) for Adv-LSTM, StockNet, HATS, MAN-SF, and MAN-SF (Concat).
(a) Feature fusion maps   (b) Graph attention map
Figure 4: Feature weight heatmaps for MAN-SF.

Ablation Study: In Table 2, we observe the ability of price and text models to predict the market trend to an extent using unimodal features. Improvements over the individual modalities are noted with the inclusion of a graph-based learning model, i.e., GCN and GAT, validating the premise of using inter-stock relations for enhanced forecasting. When the text and price signals are fused and more relevant information is extracted using the attention mechanisms, a performance gain is seen. The ablation study ties in with the EMH: as we add additional modalities, we note an increment in MAN-SF's ability for stock prediction. Two critical observations from Table 2 are the substantial MCC gains when using GAT over GCN, and the contrast between fusing text and prices via concatenation versus bilinear transformations. We discuss these next.

Impact of Bilinear Transformations: Bilinear blending outperforms the concatenation and attention fusion variants, as seen in Table 2. We postulate that the bilinear transformation can better learn the interplay between the signals compared to the other variants. On examining Figure 4a, we observe that the bilinear layer blends highly non-linear relationships between the two signals, leading to a joint representation that captures more specific features, visible as areas of concentrated attention, as compared to simple concatenation-based fusion.

Analyzing Graph Attention: We notice that equally weighing all correlations using GCN-based models leads to smaller performance gains, as shown in Table 2, compared to GAT (the GAT and MAN-SF variants). To analyze this difference, we first calculate each neighbor's attention scores in the stock relations graph, as shown in Figure 4b. By analyzing the stock associations with the highest and lowest attention scores, we observe that some relations between stocks, such as being part of the same industry or having the same founder, are more critical than other relations, like stocks having the same country of origin. For instance, C (Citicorp) and JPM (JP Morgan) have a relatively high attention score and are part of the same investment and banking industry, whereas the attention score for JPM and CSCO (Cisco) is relatively low. We also observe that some stocks share hidden correlations captured by the GAT due to the market's temporal nature. We explain one such example in Section 7.

Profitability: We examine MAN-SF's practical applicability through a profitability analysis on real-world stock data. From Table 3 and Figure 6, we note that MAN-SF achieves higher risk-adjusted returns and an overall profit. MAN-SF outperforms the different baselines over the common testing period of three months using the stock data in the S&P 500 index. These observations show the profitability of MAN-SF over models that do not capture stock correlations (StockNet) and models that do not use the impact of textual data (HATS). We potentially attribute these improvements to MAN-SF's ability to learn a more concentrated blend of text and price features, as opposed to competitive models. We extend this analysis in the next section.

Figure 5: Graph sample showing attention weights for stock correlations (top left); Stock price movement depicting
inter-stock relationships (bottom left); Tweets with hierarchical temporal attention weights (right)

7 Qualitative Analysis

We conduct an extended analysis across two high-risk scenarios, as shown in Figure 5, to study the applicability of MAN-SF to investors in the stock market. The study is based on Apple's (AAPL) trend during 12th Nov to 18th Nov 2015. Figure 5 shows some of the tweets posted and AAPL's relations with relevant stocks such as Alibaba (BABA) and Google (GOOG), among others, during that period.

12th Nov to 16th Nov, failure of StockNet and models that do not capture inter-stock relations: From the price movement in Figure 5, we see that 12th to 16th November 2015 shows a decline in Apple's stock price. Here, we observe that StockNet predicts a further drop in Apple's price, and similar models that use only price and text are unable to correctly predict the price rise for Apple on 17th November. However, we discover that Apple shares a strong relationship with Alibaba and Google during that time, as indicated by the attention weights. MAN-SF incorporates inter-stock relations through graph attention to learn latent correlations between AAPL, BABA, and GOOG, as shown by the graph snippet in Figure 5. MAN-SF correctly predicts a rise in Apple's price and makes a profit, unlike StockNet. We attribute this prediction to MAN-SF likely having a broader context by blending multimodal signals.

14th Nov to 18th Nov, failure of HATS and models that do not leverage social media data: Despite Apple's sharp fall on 18th November, we see tweets with positive sentiment having higher attention weights during the lookback window, indicating a possible increase in Apple's price. MAN-SF correctly applies its hierarchical attention mechanisms over tweets and inter-stock correlations, thereby likely predicting a rise in Apple's stock price, similar to models such as StockNet. As opposed to these, models such as HATS forecast a continual decrease in Apple's price, potentially due to not factoring in social media data.

8 Conclusion and Future Work

We study stock movement prediction using natural language, graph-based, and numeric features. We propose MAN-SF, a neural model that jointly learns temporally relevant signals from chaotic multimodal data spanning historical prices, tweets, and inter-stock correlations in a hierarchical fashion. Extensive quantitative and qualitative experiments on real market data demonstrate MAN-SF's applicability for neural stock forecasting. We plan to further use news articles, earnings calls, and other data sources to capture market dynamics better. Another interesting direction for future research is to explore the cold start problem, where MAN-SF could be leveraged to predict stock movements for new stocks. Lastly, we would also like to extend MAN-SF's architecture so that it is not limited to modeling all stocks together (a constraint of its GAT component), to increase scalability to cross-market scenarios.

References

Yaser S. Abu-Mostafa and Amir F. Atiya. 1996. Introduction to financial forecasting. Applied Intelligence, 6(3):205–213.
                                                      8423
References

Yaser S. Abu-Mostafa and Amir F. Atiya. 1996. Introduction to financial forecasting. Applied Intelligence, 6(3):205–213.

Klaus Adam, Albert Marcet, and Juan Pablo Nicolini. 2016. Stock market volatility and learning. The Journal of Finance, 71(1):33–82.

Leif B. G. Andersen. 2007. Efficient simulation of the Heston stochastic volatility model. SSRN Electronic Journal.

Joseph H Anthony and K Ramesh. 1992. Association between accounting performance measures and stock prices. Journal of Accounting and Economics, 15(2-3):203–227.

Uma Devi B, Sundar D, and Alli P. 2013. An effective time series analysis for stock trend prediction using ARIMA model for Nifty Midcap-50. International Journal of Data Mining & Knowledge Management Process, 3(1):65–78.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Fischer Black and Myron Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654.

David C Blitz and Pim Van Vliet. 2007. The volatility effect. The Journal of Portfolio Management, 34(1):102–113.

J. Bollen, H. Mao, and X. Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8.

Tim Bollerslev. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Robert Goodell Brown. 2004. Smoothing, forecasting and prediction of discrete time series. Courier Corporation.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.

Yingmei Chen, Zhongyu Wei, and Xuanjing Huang. 2018. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM '18, pages 1655–1658, New York, NY, USA. Association for Computing Machinery.

Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.

Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.

Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z. Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference, WWW '19, pages 678–685, New York, NY, USA. Association for Computing Machinery.

Ilia D. Dichev and Vicki Wei Tang. 2006. Earnings volatility and earnings predictability. SSRN Electronic Journal.

F.X. Diebold and K. Yilmaz. 2014. On the network topology of variance decompositions: Measuring the connectedness of financial firms. Journal of Econometrics, 182(1):119–134.

Guangyu Ding and Liangxi Qin. 2019. Study on the prediction of stock price based on the associated network model of LSTM. International Journal of Machine Learning and Cybernetics.

Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2014. Using structured events to predict stock price movement: An empirical investigation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1415–1425.

Edwin J Elton, Martin J Gruber, Stephen J Brown, and William N Goetzmann. 2009. Modern portfolio theory and investment analysis. John Wiley & Sons.

Claude B Erb, Campbell R Harvey, and Tadas E Viskanta. 1994. Forecasting international equity correlations. Financial Analysts Journal, 50(6):32–45.

Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, and Tat-Seng Chua. 2019a. Enhancing stock movement prediction with adversarial training. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 5843–5849. International Joint Conferences on Artificial Intelligence Organization.

Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. 2019b. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems, 37(2).

Gabriel Pui Cheong Fung, Jeffrey Xu Yu, and Wai Lam. 2002. News sensitive stock trend prediction. In Advances in Knowledge Discovery and Data Mining, pages 481–493. Springer Berlin Heidelberg.

C. Lee Giles, Steve Lawrence, and Ah Chung Tsoi. 2001. Noisy time series prediction using a recurrent neural network and grammatical inference. Machine Learning, 44(1/2):161–183.

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 1263–1272. JMLR.org.

Alexandre Hocquard, Sunny Ng, and Nicolas Papageorgiou. 2013. A constant-volatility framework for managing tail risk. The Journal of Portfolio Management, 39(2):28–40.

Ehsan Hoseinzade, Saman Haratizadeh, and Arash Khoeini. 2019. U-CNNpred: A universal CNN-based predictor for stock markets.

Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 261–269.

Monique Jeanblanc, Marc Yor, and Marc Chesney. 2009. Mathematical methods for financial markets. Springer Science & Business Media.

Weiwei Jiang. 2020. Applications of deep learning in stock market prediction: recent progress.

Armand Joulin, Édouard Grave, Piotr Bojanowski, and Tomáš Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431.

Jue Liu, Zhuocheng Lu, and Wei Du. 2019. Combining enterprise knowledge graph and news sentiment analysis for stock price volatility prediction. In Proceedings of the 52nd Hawaii International Conference on System Sciences.

Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, and Jaewoo Kang. 2019. HATS: A hierarchical graph attention network for stock movement prediction. arXiv preprint arXiv:1908.07999.

Kazuhiro Kohara, Tsutomu Ishikawa, Yoshimi Fukuhara, and Yukihiro Nakamura. 1997. Stock price prediction using prior knowledge and neural networks. Intelligent Systems in Accounting, Finance & Management, 6(1):11–22.

Thomas Fischer and Christopher Krauss. 2018. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research.

Hao Li, Yanyan Shen, and Yanmin Zhu. 2018. Stock price prediction using attention-based multi-input LSTM. In ACML, volume 95 of Proceedings of Machine Learning Research, pages 454–469. PMLR.

Xiaodong Li, Xiaodi Huang, Xiaotie Deng, and Shanfeng Zhu. 2014. Enhancing quantitative intra-day stock return prediction by integrating both market news and stock prices information. Neurocomputing, 142:228–238.

Xiaodong Li, Haoran Xie, Ran Wang, Yi Cai, Jingjing Cao, Feng Wang, Huaqing Min, and Xiaotie Deng. 2016. Empirical analysis: stock market prediction via extreme learning machine. Neural Computing and Applications, 27(1):67–78.

Burton G. Malkiel. 1989. Efficient market hypothesis. In Finance, pages 127–134. Palgrave Macmillan UK.

Daiki Matsunaga, Toyotaro Suzumura, and Toshihiro Takahashi. 2019. Exploring graph neural networks for stock market predictions with rolling window analysis. arXiv preprint arXiv:1909.10660.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Robert Neal and Simon M Wheatley. 1998. Do measures of investor sentiment predict returns? Journal of Financial and Quantitative Analysis, 33(4):523–547.

Duc Huu Dat Nguyen, Loc Phuoc Tran, and Vu Nguyen. 2019. Predicting stock prices using dynamic LSTM models. In Communications in Computer and Information Science, pages 199–212. Springer International Publishing.

Thi-Thu Nguyen and Seokhoon Yoon. 2019. A novel approach to short-term stock price movement prediction using transfer learning. Applied Sciences, 9(22):4745.

Thien Hai Nguyen and Kiyoaki Shirai. 2015. Topic modeling based sentiment analysis on social media for stock market prediction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1354–1364.

Mahla Nikou, Gholamreza Mansourfar, and Jamshid Bagherzadeh. 2019. Stock price prediction using deep learning algorithm and its comparison with machine learning algorithms. Intelligent Systems in Accounting, Finance and Management, 26.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and Garrison Cottrell. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971.

Yu Qin and Yi Yang. 2019. What you say and how you say it matters: Predicting financial risk using verbal and vocal cues. In 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), page 390.

Robert Rich and Joseph Tracy. 2004. Uncertainty and labor contract durations. Review of Economics and Statistics, 86(1):270–287.

Robert P. Schumaker and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFinText system. ACM Transactions on Information Systems.

Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Ratn Shah. 2020a. Spatiotemporal hypergraph convolution network for stock forecasting. In 2020 IEEE International Conference on Data Mining (ICDM).

Ramit Sawhney, Puneet Mathur, Ayush Mangal, Piyush Khanna, Rajiv Shah, and Roger Zimmermann. 2020b. Multimodal multi-task financial risk forecasting. In Proceedings of the 28th ACM International Conference on Multimedia, MM '20, New York, NY, USA. Association for Computing Machinery.

Sreelekshmy Selvin, R Vinayakumar, EA Gopalakrishnan, Vijay Krishna Menon, and KP Soman. 2017. Stock price prediction using LSTM, RNN and CNN-sliding window model. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 1643–1647. IEEE.

William F Sharpe. 1994. The Sharpe ratio. Journal of Portfolio Management, 21(1):49–58.

Matheus José Silva de Souza, Danilo Guimarães Franco Ramos, Marina Garcia Pena, Vinicius Amorim Sobreiro, and Herbert Kimura. 2018. Examination of the profitability of technical analysis based on moving average strategies in BRICS. Financial Innovation, 4(1):3.

Paul C. Tetlock. 2007. Giving content to investor sentiment: The role of media in the stock market. Journal of Finance.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

Venkata Sasank Pagolu, Kamal Nayan Reddy, Ganapati Panda, and Babita Majhi. 2016. Sentiment analysis of Twitter data for predicting stock market movements. In 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES).

Yumo Xu and Shay B. Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970–1979, Melbourne, Australia. Association for Computational Linguistics.

Linyi Yang, Tin Lok James Ng, Barry Smyth, and Ruihai Dong. 2020. HTML: Hierarchical transformer-based multi-task learning for volatility prediction. In Proceedings of The Web Conference 2020, WWW '20, pages 441–451, New York, NY, USA. Association for Computing Machinery.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.

Zhou Yu, Jun Yu, Jianping Fan, and Dacheng Tao. 2017. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 1821–1830.

Z. Zhao, R. Rao, S. Tu, and J. Shi. 2017. Time-weighted LSTM model with redefined labeling for stock trend prediction. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), pages 1210–1217.