Swedish finance Twitter accounts short term impact on Swedish small cap companies
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2021
John Janér and Noah Rahimzadagan
KTH ROYAL INSTITUTE OF TECHNOLOGY
ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Abstract
Over the last five years, the number of retail investors has increased immensely.
Trying to make informed decisions, many of the more active investors look
to social media as a source of information. In early 2021, the eyes of the
world focused on retail investors as Gamestop, a video game retailing company,
experienced an immense price surge over the course of a few weeks on the
stock market. This event, among others, led the SEC (Securities and Exchange
Commission) to open up a discussion about the impact of social media on the
stock market. It seemed individual social media accounts were able to increase the
volatility of a number of different stocks. This study investigates the immediate
impact of larger Swedish Twitter accounts on the volatility and price of Swedish
small-cap companies. Sentiment analysis and data modeling in the Python
programming language were used in order to compare volatility and price changes
before and after tweets of different sentiments were made about the companies.
Our study was unable to find any correlation between tweets made and an immediate
change in price or an immediate increase in volatility, suggesting that Swedish
finance Twitter accounts have little to no immediate impact on Swedish small-cap
companies.
Keywords
Human behavior, Financial markets, Sentiment analysis, Twitter
Sammanfattning (Summary)
Over the last five years, the number of retail investors has increased markedly. When
retail investors try to make well-founded investment decisions, they often
use posts on social media as a guide. In early 2021, attention
turned to retail investors when the share price of the video game retailer
Gamestop rose by several hundred percent in the course of just a
few weeks. This price increase led the SEC (the Securities and Exchange Commission in
the USA) to open a discussion about the impact of social media on stock trading.
Much indicated that individual social media accounts were able
to increase the volatility of the share prices of certain companies. This research project
aims to investigate the immediate impact of Swedish Twitter accounts on
the price and price volatility of Swedish small-cap companies' shares. Sentiment analysis
and data modeling were carried out in the Python programming language in order to compare
volatility and price changes before and after tweets of different sentiments were made
about the companies. The study was unable to show a correlation between tweets made and an
immediate change in price or an immediate increase in volatility,
which suggests that Twitter accounts have little or no influence on Swedish
small-cap companies.
Nyckelord (Keywords)
Human behavior, Financial markets, Sentiment analysis, Twitter
Authors
John Janér and Noah Rahimzadagan
Information and Communication Technology
KTH Royal Institute of Technology
Place for Project
Stockholm, Sweden
Examiner
Pawel Herman
Stockholm, Sweden
KTH Royal Institute of Technology
Supervisor
Chris Peters
Stockholm, Sweden
KTH Royal Institute of Technology
Contents
1 Introduction
1.1 Problem statement
1.2 Scope of the study
1.3 Thesis outline
2 Theoretical Background
2.1 Definitions
2.2 Market data
2.3 Twitter data
2.4 Natural Language Processing
2.5 Economic theory
2.6 Previous research
3 Methods
3.1 Determining eligible Twitter accounts
3.2 Scraping
3.3 Natural language processing of the outputted CSV file
3.4 Evaluation
4 Result
4.1 Positive sentiment tweets
4.2 Negative sentiment tweets
4.3 Overall volatility for both sentiment tweets
5 Discussion
5.1 RQ1
5.2 RQ2
5.3 RQ3
5.4 Limitations
5.5 Future Work
6 Conclusion
References
1 Introduction
With the rise of social media, the ability to share feelings, opinions, and gossip
globally is no longer monopolized by traditional institutions such as newspapers
and TV networks. Most individuals are able to create an account at any given
social media platform and share thoughts and statements with the rest of the
world. On the 27th of January 2021, Gamestop, a video game retailing company,
experienced a price surge on the stock market. The price of one Gamestop share
reached as much as 350 USD, an immense increase considering that one share had
been worth only a small fraction of that price two weeks earlier [15].
Price fluctuations in the stock market can rarely be explained
with absolute certainty, and this also applies to the Gamestop price surge
of January 2021. However, many speculate that the increase in price can
to some extent be attributed to the online feud between institutional investors and
retail investors that brewed on the social media platform Reddit before the
price surge, which drove many retail investors to buy shares in Gamestop. The
speculation went as far as the American Congress having Keith Gill, one of many
advocates for buying GME stock on Reddit, brought in for a hearing in which he
had to testify and answer questions regarding the events surrounding Gamestop on
the stock market [7]. Elon Musk, an entrepreneur associated with Tesla, SpaceX,
and PayPal, has been reported tweeting about companies prior to price surges
[18]. While the tweets cannot entirely account for the increases in price, many
believe that the tweets had some significance.
1.1 Problem statement
Since there is clear evidence that social media, to some extent, influences investors
[16], the question arises of whether there is a correlation between information on
social media and stock prices. The aim of this project is therefore to determine
the short-term impact of high-profile Swedish finance Twitter accounts on the stock
price of Swedish, public small-cap companies. Specifically:
• RQ1 In what way are larger Swedish finance Twitter accounts able to move
the price of a stock?
Looking at small-cap companies, could finance Twitter accounts above
a certain follower threshold move the price of a stock in an immediate way?
Using sentiment analysis and data modeling in Python, can a correlation be found
between certain tweets and market movements?
• RQ2 Are tweets from these finance accounts contributing to increased risk
in small-cap companies?
Using the Parkinson volatility formula to evaluate volatility before and after the
tweets, do these tweets have an immediate impact on the volatility of the stock,
contributing to a riskier investment in the short term?
• RQ3 Does the nature of tweet sentiment affect the outcome of the change?
The outcome of change refers to the direction of the change: whether a certain
stock increases or decreases in price, or whether its volatility increases or
decreases. Studies have
suggested that positive information is perceived as more credible than negative
information [3]. Will positive sentiment tweets have a higher possibility to move
the stock price or increase the volatility?
1.2 Scope of the study
This research project will encompass sentiment analysis of the content of tweets
made by certain Swedish finance Twitter accounts in order to investigate a
correlation between the content of tweets and the state of the stock market. The
tweets that will be analyzed are scraped with a web scraper, a tool that fetches
data from websites. Tweets are chosen only if they happen to mention a certain
Swedish small-cap company. Tweets will be analyzed with natural language
processing and later their effect on the stock prices of the companies mentioned
will be investigated.
1.3 Thesis outline
In the following chapter, the theoretical background of the project is
presented, including the theory behind sentiment analysis.
The third chapter covers the methodology of this project and goes into detail
on how the tweets were scraped and how they were
evaluated. Chapter 4 presents the project's results and the
data that was gathered. Chapter 5 discusses the findings and reflects on previous
work. Chapter 6 presents the conclusion.
2 Theoretical Background
This section intends to describe the data sets used. Furthermore, the methods
applied in order to collect data sets will be described. Following directly below
will be definitions of terminology used throughout the paper. An introduction to
previous research related to the subject will also be given.
2.1 Definitions
2.1.1 Small Cap
A company with a market capitalization under one billion USD is considered
a "small-cap" company [1]. Institutional investors are generally not allowed to
hold large stakes in small-cap companies, which makes these companies more
accessible to retail investors.
2.1.2 Retail Investor
Retail investors, also known as individual investors, are non-professional
investors [9].
2.1.3 Volatility
Volatility is a measurement for the range of possible returns of a security. Higher
volatility usually indicates a riskier security [4].
2.1.4 Volume of Trade
Volume of Trade, or just "volume", refers to the total quantity of shares traded
of a specific security in any given time frame [23].
2.2 Market data
A data set was received from Nordic Growth Market, a Swedish stock exchange
housing companies with small market capitalization. The data was compiled
in a JSON file containing the ticker, name, price, volume, and date for every trade
made on the exchange dating back five years. Price data is important because
this research project aims to investigate the correlation between tweets
about certain Swedish small-cap companies and how the stock prices
of those companies change in relation to these tweets.
2.2.1 Market data format
Figure 2.1: Example JSON object from market data set.
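As an illustration of how such per-trade records could be turned into the daily values used later in the evaluation (close, high, and low), the following is a minimal sketch. The key names, the ISO timestamp format, and the sample trades are assumptions made for this example; they are not taken from the Nordic Growth Market file.

```python
import json
from collections import defaultdict

# Hypothetical sample: one JSON object per trade. The key names and the
# ISO timestamp format are assumptions, not the actual NGM schema.
sample = """[
  {"ticker": "ABC", "name": "Example AB", "price": 10.0, "volume": 100, "date": "2020-12-01T09:00:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 12.5, "volume": 50, "date": "2020-12-01T12:30:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 11.0, "volume": 200, "date": "2020-12-01T17:20:00"},
  {"ticker": "ABC", "name": "Example AB", "price": 11.5, "volume": 10, "date": "2020-12-02T09:05:00"}
]"""


def daily_bars(trades):
    """Group trades by (ticker, day) and compute high, low, close and volume."""
    grouped = defaultdict(list)
    for trade in trades:
        day = trade["date"][:10]  # keep only the YYYY-MM-DD part
        grouped[(trade["ticker"], day)].append(trade)

    bars = {}
    for key, day_trades in grouped.items():
        day_trades.sort(key=lambda t: t["date"])  # chronological order
        prices = [t["price"] for t in day_trades]
        bars[key] = {
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],  # price of the last trade of the day
            "volume": sum(t["volume"] for t in day_trades),
        }
    return bars


bars = daily_bars(json.loads(sample))
print(bars[("ABC", "2020-12-01")])
```

Grouping by ticker and calendar day keeps the sketch independent of how many trades a stock has on a given day.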
2.3 Twitter data
The sentiment analysis is performed on tweets made by certain Swedish finance
accounts. Web scraping was used in this work in order to access the data in
the HTML document that Twitter consists of. The tweets, in other words, were
collected using web scraping tools and stored in a CSV file for processing.
2.3.1 Twitter
Twitter is an American social network and microblog platform enabling users to
post and interact with messages called ”tweets.” The platform has more than 180
million daily active users [21] and an average of 330 million active monthly users
in 2019 [22]. While registered users can like, comment, and retweet, any user can
view and read tweets, making Twitter a powerful tool to spread information or
opinions.
Twitter is used extensively by politicians and other public figures because of its
accessibility [6]. Due to its large number of users, Twitter contains extensive
amounts of noise. Filtering through this noise is, for a human, nearly impossible.
However, individuals with a large enough following on the platform can pierce
through the noise using their followers as vessels for likes and retweets. This
allows certain people to spread information, true or false, to large groups within
our society.
2.3.2 Document Object Model
Ever since the birth of websites, the most fundamental part of a website has been
the HTML document, which models how the different components of a website
should be arranged. Furthermore, the HTML document contains the data that is
shown on the website.
The Document Object Model, or DOM, is a common way to define the logical structure
of an HTML document [12]. A DOM structure is easily accessed and manipulated
because of its tree-like structure. Furthermore, the DOM is designed to be
independent of any particular programming language. The DOM is of importance in
this research project since it provides a way to access the tweets that will
undergo sentiment analysis.
2.3.3 Web scraping
In order to obtain tweets for this research project, they had to be extracted from
Twitter.
Many websites such as Twitter make the majority of their data available to
everyone in the form of a feed that can be accessed with any web browser.
Searching through this feed by hand in order to gather data that stretches over a
specific time frame is both cumbersome and time-consuming. Web
scraping is a method used to collect information that is usually on display for
human consumption on a website. Also known as a crawler, a web scraper
commonly navigates the underlying HTML document of a website in order to find
specific, often pre-decided, strings of text across many different pages or profiles.
This data is then collected and compiled into any format desired by the user. One
type of web scraping is called DOM scraping. Since websites implement the DOM
through the underlying HTML document, the data of a website, Twitter
included, is organized in a tree-like structure, which can easily be accessed from
most programming languages. The DOM scraping in this research project was
performed with the Python library Twint.
Figure 2.2: Visualisation of DOM (Document Object Model) tree structure.
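To make the idea of DOM scraping concrete, the sketch below walks a small HTML fragment with Python's standard `html.parser` module and collects the text of certain elements. It does not show how Twint works internally; the HTML fragment and the class name `tweet-text` are invented for illustration.

```python
from html.parser import HTMLParser


class TweetTextParser(HTMLParser):
    """Collect the text inside every <p class="tweet-text"> element.

    The class name "tweet-text" is a hypothetical marker for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tweets = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "tweet-text") in attrs:
            self._inside = True
            self.tweets.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Text nodes are only kept while inside a matching <p> element.
        if self._inside:
            self.tweets[-1] += data


html = (
    '<div class="timeline">'
    '<div class="tweet"><p class="tweet-text">Strong report from Example AB</p></div>'
    '<div class="tweet"><p class="tweet-text">Selling my shares today</p></div>'
    "</div>"
)

parser = TweetTextParser()
parser.feed(html)
print(parser.tweets)
```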
2.4 Natural Language Processing
Natural Language Processing is the process of interpreting human language with
the help of computers [20]. In this work, the language that is interpreted
is the content of the scraped tweets, and the natural language
processing task performed is sentiment analysis. Determining a tweet's
sentiment entails classifying its content, which is text, with a label. In this
research project, the two labels are negative and positive. Natural
Language Processing consists of multiple steps that are explained in
the following sections.
2.4.1 Tokenization
Computers are unable to interpret human language as is. The text is therefore
first broken down into smaller units, called tokens, during the tokenization
process [25]. Breaking down a text string into tokens is a common first step in
natural language processing, since it is easier to train a computer to classify
separate words than a group of words. A text string is split into multiple tokens
that are inserted in a list.
The following text string: "Well yes. I would love a cinnamon bun!"
would generate the tokens:
[Well, yes, ., I, would, love, a, cinnamon, bun, !]
In the above example, the sentence is split on white space, with punctuation
marks kept as separate tokens, and then turned into a list of tokens. This kind
of tokenization is what was used in this work. The token lists are later
evaluated word for word by the classifier.
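A minimal sketch of this kind of tokenization is given below. It uses a regular expression from the Python standard library rather than the tokenizer actually used in the project, so the function name `tokenize` and the exact splitting rule are assumptions for illustration.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens.

    Illustrative rule: a token is either a run of word characters or one
    character that is neither a word character nor white space.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("Well yes. I would love a cinnamon bun!")
print(tokens)
# ['Well', 'yes', '.', 'I', 'would', 'love', 'a', 'cinnamon', 'bun', '!']
```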
2.4.2 Normalization of tokens
Normalization, in the context of natural language processing, means converting
multiple words that have the same meaning but different forms into the same
form. For instance, the words "sing", "sang" and "sung" all have the same meaning
but come in different forms. It is time-consuming and unnecessary to train
a classifier on the same word in several different forms, so normalization is
a common practice in natural language processing. Furthermore, stop words
(words such as "and") and punctuation marks are removed. Since stop words convey
little meaning and mainly serve to make texts easier for human readers, it is best
to remove these words before performing the sentiment analysis.
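The normalization step can be sketched as follows. The stop word list and the lookup table that maps word forms to a base form are tiny, hypothetical stand-ins for the resources a real library would provide.

```python
import string

# Tiny illustrative resources; both are assumptions made for this sketch.
STOP_WORDS = {"a", "and", "i", "the", "would"}
BASE_FORMS = {"sang": "sing", "sung": "sing", "loved": "love"}


def normalize(tokens):
    """Lowercase tokens, drop stop words and punctuation, map to base forms."""
    result = []
    for token in tokens:
        word = token.lower()
        is_punctuation = all(ch in string.punctuation for ch in word)
        if word in STOP_WORDS or is_punctuation:
            continue
        result.append(BASE_FORMS.get(word, word))
    return result


normalized = normalize(["Well", "yes", ".", "I", "sang", "and", "sung", "!"])
print(normalized)
# ['well', 'yes', 'sing', 'sing']
```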
2.4.3 Naïve Bayes Message Coding
A common practice when classifying text is the Naïve Bayes method. As the
name suggests, the method is naïve in the sense that it scores each word in a
stream independently of the other words in the same stream. This is easily
demonstrated with an example: the text streams "Dear friend" and "Friend
dear" are assigned the same classification score. There are more
sophisticated classification methods, but Naïve Bayes has been shown to
perform exceptionally well [17]. The name Bayes stems from the mathematician
Thomas Bayes and his mathematical formula, Bayes' Theorem, which
determines the probability of one event given a certain condition. Bayes' Theorem
is shown below.
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$
$P(A \mid B)$ is the conditional probability of event A occurring given that B
occurs.
$P(B \mid A)$ is the conditional probability of event B occurring given that A
occurs.
$P(A)$ and $P(B)$ are the probabilities of A and B occurring, respectively. Naïve
Bayes is derived from the above formula and, in this case, is used
to assign a score to a stream of words. The score is a unitless
number that represents how well the stream of words fits a certain class. The
Naïve Bayes formula can be broken down into the following parts.
The a priori probability, $P(A)$, is the probability of an event occurring without
any other information given. The a posteriori probability, $P(A \mid T)$, is the
probability of event A occurring given T. As the Latin terms suggest, the a priori
probability is known beforehand, while the a posteriori probability
is known only after the Naïve Bayes formula has been applied. The a posteriori
probability is also the score of the tweet for a certain class. The last part
of Naïve Bayes consists of the probabilities of each attribute $T_i$ given the
condition A, where $T_i$ is any word in the stream and A is one of the classes.
These probabilities are called likelihoods, and they are combined as a product:
$\prod_{i=1}^{n} P(T_i \mid A)$.
The equation for Naïve Bayes can be seen below.
$$P(A \mid T) = \frac{P(T_1 \mid A)\,P(T_2 \mid A) \cdots P(T_n \mid A)\,P(A)}{P(T)} \tag{2}$$
Simply put, the above formula is executed twice for each tweet and the class that
generates the highest posteriori probability with the tweet is assigned to the tweet.
It is executed twice for each tweet in this research project’s case since two classes
are used. The classes in this case are either negative or positive.
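The classification rule can be sketched in plain Python as below. This is not the NLTK classifier used in the project. The training tweets are invented, add-one smoothing is an assumption added so that unseen words do not give a zero probability, and logarithms are used only to avoid multiplying many small numbers. The denominator P(T) is left out because it is the same for both classes and does not change which class scores highest.

```python
import math
from collections import Counter


def train(labelled_tweets):
    """Estimate class priors and per-class word counts from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in labelled_tweets)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in labelled_tweets:
        word_counts[label].update(tokens)
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    priors = {label: n / total for label, n in label_counts.items()}
    return priors, word_counts, vocabulary


def classify(tokens, priors, word_counts, vocabulary):
    """Return the class with the highest (log) a posteriori score."""
    scores = {}
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        log_score = math.log(prior)  # a priori probability P(A)
        for token in tokens:
            # Likelihood P(T_i | A) with add-one smoothing (an assumption).
            count = word_counts[label][token] + 1
            log_score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = log_score
    return max(scores, key=scores.get)


# Invented training data, far smaller than the set used in the study.
training = [
    (["strong", "report", "buy"], "positive"),
    (["great", "growth", "buy"], "positive"),
    (["weak", "report", "sell"], "negative"),
    (["poor", "growth", "sell"], "negative"),
]
model = train(training)
label = classify(["buy", "strong", "growth"], *model)
print(label)
```

Because each word contributes its own factor, reordering the tokens of a tweet does not change the assigned class, which is the "naïve" independence assumption described above.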
2.5 Economic theory
Twitter, in this study, is viewed as a stream of information in which retail
investors seek new potential trades. This gives it the potential to influence price
movements, especially for the stock of smaller companies. This section aims to
briefly explain the widely accepted Efficient Market Theory on which this study
is based.
2.5.1 Efficient Market Theory
According to the efficient market hypothesis (EMH), a widely accepted theory a
generation ago, the price of and potential gain in any security or stock is dependent
on the availability of information to all participants. In a fully efficient market
a certain set of information, α, would not impact the market price of a stock if
revealed to all participants [13].
Figure 2.3: Graph showing company HSTK B releasing a sales report on December
1st 2020. The price rose almost 57% on the day. Since the company issued a report
this observation was omitted from this study’s results.
This is based on the notion that information travels quickly and is subsequently
incorporated into the market price without delay [14]. Figure 2.3 shows an
example of how the market quickly reacts to new information.
However, due to discrepancies between EMH and measured volatility in the
market many have questioned EMH and also questioned how efficient our
markets actually are [19].
2.5.2 Algorithmic trading
Algorithmic trades are transactions in the stock market made by computers.
In the United States, algorithmic trading makes up around 50% of market
liquidity. These computer-executed trades can sometimes lead to unexpected
movements in stocks. The algorithms are often proprietary, and identifying
algorithmic trading is almost impossible in most cases [10].
Although this study does not involve algorithmic trading, its undetectable effects
might limit findings in the study.
2.6 Previous research
In this section, we introduce studies that have evaluated the possibility of
predicting market movements using Twitter, as well as the impact of CEOs
tweeting on their company's stock performance. Methods used in these studies,
more specifically sentiment analysis and evaluation of changes in stock pricing,
have been replicated and applied in this study.
2.6.1 Twitter mood predicts the stock market
In J. Bollen and H. Mao’s study conducted in 2010, they used sentiment analysis
on Twitter to determine the broader ”mood” of the general public at a given point
[2]. This was later combined with machine learning algorithms to predict the
value movement of the Dow Jones Industrial Average (DJIA). They found an
accuracy of 87.6% in predicting the daily up and down changes of the closing
values of the DJIA as well as a Mean Average Percentage Error reduction by more
than 6%.
2.6.2 How Social Media usage by managers affects corporate value: The
case of Elon Musk
M. Corte investigates high-profile CEOs' social media usage and its impact on
their company's stock prices. This master's thesis focuses primarily on Elon Musk,
the CEO of Tesla [5]. Using sentiment analysis on Musk's tweets and comparing it
to the movements of the Tesla stock, Corte evaluated 188 tweets made by Musk in the
first quarter of 2020. When evaluating Tesla-related tweets, his models resulted
in a p-value of 0.08, which is not statistically significant. However, the same
model used on non-Tesla-related tweets received a much higher p-value, a result
Corte was unable to explain. Further evidence of the stock price moving several
percent minutes after Elon Musk's tweets made Corte believe that a statistically
significant result could be obtained using more advanced models.
2.6.3 Stock Price Forecasting via Sentiment Analysis on Twitter
The conference paper titled "Stock Price Forecasting via Sentiment Analysis on
Twitter" by J. Kordonis, S. Symeonidis, and A. Arampatzis investigates
stock market predictions using Twitter sentiment analysis [11]. In their study,
they analyze tweets relating to 16 of the most popular technology stocks on the
Nasdaq stock exchange. They then used machine learning (Support Vector Machine) to
predict the movement and daily closing prices of the stock market based on the
daily Twitter sentiment of the stock. They achieved an accuracy of
87% in predicting the movement of the stock and averaged a 1.669% error margin
in predicting the closing price on 23 June 2016.
3 Methods
The first part of this work is determining eligible Twitter accounts whose tweets
can be used as the data set for this research project. Once the Twitter
accounts have been determined, their tweets that mention certain Swedish small-cap
companies are scraped with Twint, a scraping
library for the Python programming language. Twint is a web scraper,
a piece of software that fetches data that is available on a website. In this
research project, the website is Twitter and the data consists of tweets. The tweets
are output to a CSV file, a format for storing tabular data. Natural language
processing is then performed on the CSV file in order
to interpret the sentiment of the tweets gathered. Natural language processing is
most easily described as a computer's way of interpreting human language; in this
step, the tweets are assigned sentiment scores. Price
information for the stocks was retrieved from the Nordic Growth Market data set.
The stocks were connected to their respective tweets and evaluated in Python to
calculate price changes as well as volatility using the Parkinson volatility formula.
Lastly, t-tests were conducted to determine if the results carried any statistical
significance.
Figure 3.1: Visualisation of method workflow. The method can be divided into
three parts: first, building the Twitter data set using Twint and performing its
sentiment analysis with the help of NLTK; second, retrieving all needed market
information from the NGM data set using Python; and lastly, connecting the two
data sets and conducting the evaluation.
3.1 Determining eligible Twitter accounts
The Swedish "Finance Twitter" community is a relatively small group with a set
of larger accounts that have significant follower engagement. This engagement
(likes, retweets, etc.) enables wide reach. When selecting these accounts we wanted
an unbiased approach. A threshold of 500 followers was selected, and the
accounts selected needed to have a sole focus on financial markets and stocks.
A list of one hundred eligible accounts was compiled and thirty of these accounts
were selected at random. This was done to ensure no prior bias regarding the
accounts' possible ability to affect the market.
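The selection procedure could be sketched as follows. The account handles and follower counts are placeholders generated for the example, and the fixed seed is only there to make the sketch reproducible; the source does not state how the random draw was implemented.

```python
import random

MIN_FOLLOWERS = 500  # follower threshold stated above
SAMPLE_SIZE = 30     # number of accounts drawn at random

# Placeholder candidates as (handle, follower count); not real accounts.
candidates = [(f"account_{i:03d}", 100 + 10 * i) for i in range(1, 140)]

# Keep only accounts at or above the follower threshold.
eligible = [handle for handle, followers in candidates if followers >= MIN_FOLLOWERS]

rng = random.Random(2021)  # fixed seed, an assumption for reproducibility
selected = rng.sample(eligible, k=SAMPLE_SIZE)
print(len(eligible), len(selected))
```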
3.2 Scraping
An easy way to retrieve data from Twitter is to use the Twitter API that Twitter Inc
has designed [24]. Unfortunately, the Twitter API can only fetch tweets up to one
week old, which is why a web scraper was used when fetching tweets for this research
project. Scraping, more commonly known as web scraping, is the process of
extracting data from a website. The website that was scraped in this case was
Twitter.com. The scraping in this work was done through Twint. Twint is a Python
library that allows users to access all tweets in a specified time period with certain
filtering conditions. In this work, three filtering conditions were used. The first
was whether the tweet mentioned any of the companies listed in the data set
provided to us by NGM (Nordic Growth Market). The second was to omit tweets
that were retweets. The third was to omit replies, namely tweets that
are part of a conversation. The motivation behind the filtering conditions is to
only fetch standalone tweets that mention Swedish small-cap companies. The
tweets were later outputted as rows in a CSV file. As mentioned in chapter 2,
DOM scraping was used when performing web scraping in this project.
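The three filtering conditions can be illustrated on already-collected rows, as in the sketch below. It deliberately does not call Twint; the CSV column names (`is_retweet`, `is_reply`), the company names, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical company names standing in for the NGM company list.
COMPANIES = {"Example AB", "Sample Holding"}

# Hypothetical scraped rows; the column names are assumptions for this sketch.
raw_csv = """date,username,tweet,is_retweet,is_reply
2020-12-01,investor1,Strong order intake for Example AB,False,False
2020-12-01,investor2,RT Strong order intake for Example AB,True,False
2020-12-02,investor3,@investor1 I agree about Example AB,False,True
2020-12-02,investor4,Nice weather in Stockholm today,False,False
"""


def keep(row):
    """Keep standalone tweets (no retweets, no replies) that mention a company."""
    text = row["tweet"].lower()
    mentions_company = any(name.lower() in text for name in COMPANIES)
    return (
        mentions_company
        and row["is_retweet"] == "False"
        and row["is_reply"] == "False"
    )


rows = list(csv.DictReader(io.StringIO(raw_csv)))
standalone = [row for row in rows if keep(row)]
print([row["username"] for row in standalone])
```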
3.3 Natural language processing of the outputted CSV file
The Python library NLTK was used when performing natural language processing
on the gathered tweets. NLTK includes a built-in Naïve Bayes classifier, which
was used for the classification. In this work, the classifier was trained with
750 positive tweets
and 250 negative tweets. The stream of words in this case is a single tweet, and the
labels are either positive or negative. A Python script was run in order to assign
every tweet a class, either positive or negative.
3.4 Evaluation
Before the evaluation began, the dates of the tweets were checked to make sure
no other information was made public on the day of the tweet. This included any
information distributed by the companies themselves, such as earnings reports,
order announcements, and general news, as well as any third-party institutional
news involving the companies.
A Python program was built to extract all necessary information from the market
data file received from Nordic Growth Market, for the day of the tweet as well as
the nine days prior, for all tweets. This ten-day period is equal to two weeks of
trading days. The information used was the date, the closing price of the stock,
and the highest and lowest trade price of the day.
The program then calculated the change in price for all days, in percent, using the
simple equation:
$$\text{Change}_i = \left(\frac{\text{Price}_i}{\text{Price}_{i-1}} - 1\right) \cdot 100, \quad i = \text{date} \tag{3}$$
To calculate the volatility the Parkinson volatility formula was implemented:
$$\text{Volatility} = \sqrt{\frac{1}{4T\ln 2}\sum_{t=1}^{T}\left(\ln\frac{h_t}{l_t}\right)^{2}}, \quad T = \text{time period} \tag{4}$$
The Parkinson volatility formula was used due to its incorporation of the daily
high price, $h_t$, and the daily low price, $l_t$, instead of using only the
closing price. This allows for the detection of price swings within a day of trading.
Volatility was calculated for two different time periods, the nine days prior to the
tweet and the day of the tweet.
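The two calculations can be sketched in Python as follows. The function names and the sample prices are invented for illustration and are not taken from the project's program or data.

```python
import math


def percent_change(price_today, price_previous):
    """Daily price change in percent, as in equation (3)."""
    return (price_today / price_previous - 1) * 100


def parkinson_volatility(highs, lows):
    """Parkinson volatility over T days from daily highs and lows, equation (4)."""
    T = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4 * T * math.log(2)))


# Invented example: nine prior trading days, then the day of the tweet.
prior_highs = [10.2, 10.4, 10.1, 10.3, 10.5, 10.6, 10.4, 10.3, 10.5]
prior_lows = [10.0, 10.1, 9.9, 10.0, 10.2, 10.3, 10.1, 10.1, 10.2]
tweet_high, tweet_low = 11.0, 10.4

vol_before = parkinson_volatility(prior_highs, prior_lows)
vol_tweet_day = parkinson_volatility([tweet_high], [tweet_low])

print(percent_change(10.8, 10.5))
print(vol_tweet_day > vol_before)
```

With T = 1 the formula reduces to the single-day high-low range scaled by a constant, which is how a one-day volatility can be compared with the nine-day value.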
The evaluation involved comparing the sentiment of each tweet to the outcome
of its corresponding trading day, as well as comparing the volatility on that day
to the volatility of the nine prior trading days. For a positive sentiment tweet,
the expected outcome was an increase in price; the reverse was expected for
negative tweets. An increase in
volatility was expected in both scenarios. A comparison was also made between
the outcomes of the different sentiments.
T-tests for all different cases were conducted to determine if there was any
statistical significance found between tweets and their impact on the market.
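The text does not specify which variant of the t-test was used. As one possible reading, the sketch below computes a paired t-statistic on the difference between tweet-day volatility and prior volatility, using only the standard library. The choice of a paired test, the function name, and the sample values are assumptions; converting the statistic into a p-value would additionally require the t-distribution, which is not included here.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the paired t-statistic and degrees of freedom for after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (std_diff / math.sqrt(n))
    return t_stat, n - 1


# Invented volatilities for four observations.
prior_vol = [0.020, 0.015, 0.030, 0.025]
tweet_day_vol = [0.022, 0.014, 0.035, 0.027]

t_stat, dof = paired_t_statistic(prior_vol, tweet_day_vol)
print(round(t_stat, 3), dof)
```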
4 Result
After the set of tweets had been cleaned and processed, 85 observations remained.
The observations were then grouped based on sentiment type and evaluated in
accordance with the process outlined in the prior chapter.
4.1 Positive sentiment tweets
Figure 4.1: The share of observations, in percent, that had a positive or
negative change in price on the day of the tweet. This result only includes positive
sentiment tweets.
Figure 4.2: The share of observations, in percent, that had an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. This result only includes positive sentiment tweets.
On 56% of the days when a positive sentiment tweet was posted, the price of
the mentioned stocks increased. However, no statistical significance was found
regarding the impact of the tweets on the stock price on the day of the tweet.
Examining the impact of positive tweets on the different stocks' volatility, the
volatility only increased in 36% of the observations. With a p-value of 0.08, no
statistical significance was found.
4.2 Negative sentiment tweets
Figure 4.3: The share of observations, in percent, that had a positive or
negative change in price on the day of the tweet. This result only includes negative
sentiment tweets.
Figure 4.4: The share of observations, in percent, that had an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. This result only includes negative sentiment tweets.
Approximately 38% of negative tweets were followed by a decrease in their
respective stock's price on the day of the tweet. No statistical significance was
found between the tweet sentiment and its impact on the stock market.
The volatility of the respective stocks increased in approximately 62% of the
observations, in comparison to the volatility of the stocks over the nine prior days.
184.3 Overall volatility for both sentiment tweets
Figure 4.5: The share of observations, in percent, with an increase or
decrease in volatility on the day of the tweet, compared to the volatility of the
stock over the nine prior days. The result includes both negative and positive tweets.
For all observations, the volatility of the observed stocks decreased in around 45%
of the cases. With a p-value of 0.326, no statistical significance was found for the
tweets' impact on the respective stocks' volatility.
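One way to obtain a p-value for a share such as "45% of the cases" is an exact two-sided binomial test against the null hypothesis that increases and decreases are equally likely. This section does not state which test produced the reported p-values, so the sketch below is an assumption about the kind of test, and the counts in the usage line are hypothetical, not the study's data.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are at most as likely as the
    observed count k under the null hypothesis of success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical counts: 45 decreases out of 100 observations.
print(binomial_two_sided_p(45, 100))
```

A p-value above the chosen significance level (commonly 0.05) means the observed split cannot be distinguished from a coin flip.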
5 Discussion
5.1 RQ1
When evaluating the impact of the tweets on price, no statistically significant
correlation was found for either positive or negative sentiment tweets, as shown
in Figures 4.1 and 4.3. These results may depend on many different parameters,
such as investors not acting particularly fast to new tweets from accounts they are
following or investors using these Twitter accounts as inspiration for their own
research rather than direct investment recommendations. In M. Corte's study on
the impact of Tesla CEO Elon Musk's Twitter usage, Corte strongly suggests, although
without showing statistical significance, that Musk's tweets do in fact move the price
of Tesla stock (section 2.6.2) [5]. This is most likely due to the fact that Musk
is the acting CEO of the company and is, therefore, the most knowledgeable
person when it comes to Tesla and its business. The Twitter accounts used in this
study are not, as far as their profiles say, in an active role in the companies they
discuss, basing their knowledge on information already available to the general
public. Another key factor is the size of the accounts, with Musk's following being
in the tens of millions while the Swedish Twitter accounts usually have fewer than
20 thousand followers. Furthermore, even though this study focuses on small companies
that are more reactive to market or trade volume changes, the trade volumes
needed to significantly move the price seem larger than the volumes possibly
generated by a tweet. As described in the theoretical background, section 2.4.2,
today’s markets are dominated by algorithmic trading which might counteract
any potential larger change in price for these companies. Since these algorithms
are often proprietary "black boxes", it is difficult for non-insiders to determine
how and where these algorithms are working. When we received our market data
set from Nordic Growth Market (NGM), our contact said he was positive one would be
able to find an impact of Twitter on the companies listed on their
market. Since this study was conducted in a general manner, with no
preconceived notion of which accounts or stocks might have a greater chance
of generating a favorable outcome, interesting further research might include
investigating specific companies highlighted by NGM themselves.
5.2 RQ2
The result closest to statistical significance was the change
in volatility for positive sentiment tweets, as seen in Figure 4.2. However, the
result was the opposite of the expected outcome, with lowered volatility on the
day of the tweets. This might be due to some inherent limitations of the generally
accepted way in which volatility is calculated. The volatility formula used in the
study uses the highest and lowest price traded on any given day. This allows for
detecting large changes within the day of trading, instead of relying solely on the
closing price. One consequence, however, is that a day on which a stock
steadily increases 5% from its opening price without going negative during the
day generates lower volatility than a day on which the open and close prices are
the same but the price at some point oscillates between negative 3% and positive
3%. Taking this into account, one reason for the results seen in Figure 4.2 might be
that positive sentiment tweets reduce periods of negative price movements,
thereby reducing the total measured volatility of the stock. Another possible
reason could be that a tweet is a reaction to the previous days' price changes
in a certain stock. A tweet could, for example, be posted in reaction to a certain
stock's recent decrease in price, implying the company is now undervalued. A
common theme in the previous research papers used in this study is the difficulty
of reaching a statistically significant result. The modern financial markets are
enormously complex with many moving parts and participants. Determining
how the market will move and why it moves with absolute certainty is in most
cases impossible. However, certain trends and suggested correlations can be
found, which are often used as a basis for many investment strategies. With this
said, even though no statistical significance was found in this case, the
reduced volatility result in Figure 4.2 need not be discarded entirely.
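The steady-rise versus oscillation example from this section can be worked through numerically. The sketch below assumes a simple range measure, `(high - low) / open`, as a stand-in for the study's volatility formula, which is not reproduced in this section; the price bars and function names are hypothetical.

```python
def intraday_range(open_: float, high: float, low: float) -> float:
    """High-low range as a fraction of the opening price (assumed proxy)."""
    return (high - low) / open_


def open_to_close_return(open_: float, close: float) -> float:
    """Relative price change from open to close."""
    return (close - open_) / open_


# Day A: rises steadily from 100 to 105 and never trades below the open.
steady = {"open": 100.0, "high": 105.0, "low": 100.0, "close": 105.0}
# Day B: swings between -3% and +3% and closes where it opened.
choppy = {"open": 100.0, "high": 103.0, "low": 97.0, "close": 100.0}

for name, bar in (("steady", steady), ("choppy", choppy)):
    rng = intraday_range(bar["open"], bar["high"], bar["low"])
    ret = open_to_close_return(bar["open"], bar["close"])
    print(f"{name}: range={rng:.2%}, open-to-close return={ret:.2%}")
```

Under this proxy the steady day has a 5% range and the oscillating day a 6% range, so the day with the larger net price move is measured as the less volatile one.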
5.3 RQ3
Comparing the price changes for positive sentiment tweets to those for
negative sentiment tweets, no significant difference was found. However, the difference
in volatility between positive and negative tweets, although not statistically
significant, is noticeable. As studies have suggested a bias in credibility towards
positive information compared to negative information [3], these results might
suggest that investors are more likely to act on a tweet containing a positive
sentiment, thereby reducing negative swings in pricing, as mentioned in
section 5.2. In the research paper "Twitter mood predicts the stock market" [2],
Bollen et al. reported an accuracy of 86.7% when taking public mood from tweets
into account when predicting the DJIA. Bollen et al., however,
used a data set of approximately 10 million
tweets from 2.7 million users, which from a quantitative point of view could give
more reliable results than this work's data set. The DJIA is a stock index that
reflects the stock market performance of thirty large companies in the United
States [8].
5.4 Limitations
This research project did not intend to forecast stock prices or fluctuations.
Nor does this study evaluate the performance over time of stocks mentioned by
individuals on Twitter. Rather, it investigates human behavior
and the impact of the ever-growing ubiquity of social media on retail investors'
immediate activity on the stock market.
5.5 Future Work
This study has suggested that larger Swedish Twitter accounts, in general, move
neither the price nor the volatility of the stocks they mention. However, since other
studies have implied that larger Twitter accounts, such as Elon Musk's, can
impact the movements of certain securities, interesting future work might include
determining the size of following needed to impact the stock prices of companies
mentioned in tweets. Furthermore, this study only focuses on the immediate
impact of the tweets. Many of the accounts used in this study tweeted about the
same company multiple times over a longer period of time. Future work
might therefore compare the performance of stocks popular on Twitter with that
of companies receiving little or no exposure on Twitter, evaluating whether
long periods of positive reinforcement on Twitter lead to greater performance
in small-cap corporations.
6 Conclusion
The results suggest that there is no correlation between the movement of
small-cap stocks and tweets from larger finance Twitter accounts mentioning the
companies. Although no statistical significance was found, it can be noted
that positive tweets may be associated with an immediate reduction in the
volatility of the mentioned stock. Furthermore, the type of sentiment, whether
positive or negative, seems to have little effect on the impact of the tweets on
the price of a stock. In conclusion, this study was unable to detect any direct
impact on the stocks mentioned by the Swedish finance Twitter accounts used in the study.
However, due to the general nature of this study with regard to the selection
of Twitter accounts and companies, further studies need to be conducted to
determine the ability of individual Twitter accounts to impact the market.
References
[1] Barone, Adam. “Small Cap”. In: Investopedia (2020). URL: https://www.
investopedia.com/terms/s/small-cap.asp.
[2] Bollen, Johan, Mao, Huina, and Zeng, Xiaojun. “Twitter mood predicts the
stock market”. In: Journal of computational science 2.1 (2011), pp. 1–8.
[3] Callison, Coy. “Do PR practitioners have a PR problem?: The effect
of associating a source with public relations and client-negative news
on audience perception of credibility”. In: Journal of Public Relations
Research 13.3 (2001), pp. 219–234.
[4] Chen, James. “Volatility”. In: Investopedia (2021). URL: https://www.
investopedia.com/terms/v/volatility.asp.
[5] Corte, Miguel Alexandre Barbeira. “How social media usage by managers
affects corporate value: the case of Elon Musk”. PhD thesis. 2020.
[6] Duncombe, Constance. “The politics of Twitter: emotions and the power of
social media”. In: International Political Sociology 13.4 (2019), pp. 409–
429.
[7] Fitzgerald, Maggie. “‘Roaring Kitty’ Keith Gill defends GameStop posts, says
he is as bullish as ever on the stock”. In: CNBC (2021).
[8] Ganti, Akhilesh. “Dow Jones Industrial Average (DJIA)”. In: Investopedia
(2021). URL: https://www.investopedia.com/terms/d/djia.asp.
[9] Hayes, Adam. “Retail Investor”. In: Investopedia (2021). URL: https://
www.investopedia.com/terms/r/retailinvestor.asp.
[10] Hendershott, Terrence, Riordan, Ryan, et al. “Algorithmic trading and
information”. In: Manuscript, University of California, Berkeley (2009).
[11] Kordonis, John, Symeonidis, Symeon, and Arampatzis, Avi. “Stock price
forecasting via sentiment analysis on twitter”. In: Proceedings of the 20th
Pan-Hellenic Conference on Informatics. 2016, pp. 1–6.
[12] Le Hégaret Lauren W, Jonathan R. “What is the Document Object Model?”
In: W3c (2000).
[13] Malkiel, Burton G. “Efficient market hypothesis”. In: Finance. Springer,
1989, pp. 127–134.
[14] Malkiel, Burton G. “The efficient market hypothesis and its critics”. In:
Journal of economic perspectives 17.1 (2003), pp. 59–82.
[15] Ossinger, Joanna. “GameStop’s Volatile Rally Smashes Wall Street Price
Targets”. In: Bloomberg (2021).
[16] Pineiro-Chousa, Juan, Vizcaino-Gonzalez, Marcos, and Perez-Pico, Ada
Maria. “Influence of social media over the stock market”. In: Psychology
& Marketing 34.1 (2017), pp. 101–108.
[17] Rish, Irina et al. “An empirical study of the naive Bayes classifier”. In: IJCAI
2001 workshop on empirical methods in artificial intelligence. Vol. 3. 22.
2001, pp. 41–46.
[18] Shead, Sam. “Elon Musk’s tweets are moving markets — and some investors
are worried”. In: CNBC (2021).
[19] Shiller, Robert J. “From efficient markets theory to behavioral finance”. In:
Journal of economic perspectives 17.1 (2003), pp. 83–104.
[20] Sun, Shiliang, Luo, Chen, and Chen, Junyu. “A review of natural language
processing techniques for opinion mining systems”. In: Information fusion
36 (2017), pp. 10–25.
[21] Tankovska. “Leading countries based on number of Twitter users as of
January 2021”. In: Statista (2021). URL: https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/.
[22] Tankovska. “Number of monthly active Twitter users worldwide from 1st
quarter 2010 to 1st quarter 2019”. In: Statista (2021). URL: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/.
[23] Twin, Alexandra. “Volume of Trade”. In: Investopedia (2021). URL: https:
//www.investopedia.com/terms/v/volumeoftrade.asp.
[24] “Twitter API”. In: Twitter (2021). URL: https://developer.twitter.com/en/docs/twitter-api.
[25] Webster, Jonathan J and Kit, Chunyu. “Tokenization as the initial phase in
NLP”. In: COLING 1992 Volume 4: The 15th International Conference on
Computational Linguistics. 1992.
TRITA-EECS-EX-2021:443
www.kth.se