Preprocessing Data: A Study on Testing Transformations for Stationarity of Financial Data - SARA BARWARY TINA ABAZARI


DEGREE PROJECT IN TECHNOLOGY,
FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019

Preprocessing Data: A Study on
Testing Transformations for
Stationarity of Financial Data

SARA BARWARY

TINA ABAZARI

KTH
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology year 2019
Supervisors at Handelsbanken: Rickard Henricsson, Peyman Dabiri & Cecilia Pettersson
Supervisors at KTH: Camilla Landén, Per Jörgen Säve-Söderbergh & Julia Liljegren
Examiner at KTH: Per Jörgen Säve-Söderbergh

TRITA-SCI-GRU 2019:270
MAT-K 2019:29

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

In this thesis in Industrial Economics and Applied Mathematics, written in
cooperation with Svenska Handelsbanken, given transformations were examined in
order to assess their ability to make a given time series stationary. In
addition, a parameter α belonging to each of the transformation formulas was to
be determined. To do this, an extensive study of previous research was
conducted, and two different hypothesis tests were used to confirm the output.
As a result, a value or interval for α was chosen for each transformation.
Moreover, the first difference transformation is shown to have a positive
effect on the stationarity of financial data.

Summary

This bachelor’s thesis in Industrial Economics and Applied Mathematics, carried
out in cooperation with Handelsbanken, examines given transformations in order
to assess their ability to make given time series stationary. In addition, a
parameter α belonging to each transformation’s formula was to be determined. To
do this, an extensive study of previous research was carried out, and two
different hypothesis tests were performed to confirm the output. A result was
compiled in which a value or an interval for α was chosen for each
transformation. Furthermore, the ”first difference” transformation turned out to
be good for the stationarity of financial data.

Keywords

Bachelor Thesis, financial outcome, transformations, stationarity, hypothesis
tests, EWMA

1 Preface

This Bachelor’s thesis was written in the spring of 2019 by Sara Barwary and
Tina Abazari during a five-year Master’s program in Industrial Engineering and
Management at KTH Royal Institute of Technology. The thesis is based on the
application of theory from mathematical statistics as well as from the field of
industrial economics. We would like to thank Cecilia Pettersson, Rickard
Henricsson and Peyman Dabiri at Handelsbanken for contributing to the work and
providing the resources needed. We would also like to express our appreciation
to our supervisor Camilla Landén and to Per Jörgen Säve-Söderbergh at KTH for
their help and support whenever we faced problems during the work. Julia
Liljegren at the Department of Industrial Engineering and Management also
provided valuable input and guidance to the project.


Contents

1 Preface

2 Introduction
  2.1 Background
  2.2 Research Question
  2.3 Goal and Purpose
  2.4 Scope and Limitations

3 Economic Theory
  3.1 Terminology
    3.1.1 Securities
    3.1.2 Market Index
    3.1.3 Exchange Rates
    3.1.4 Commodities
    3.1.5 Volatility
    3.1.6 Bonds
  3.2 Timing of Entry Framework
    3.2.1 First Mover Advantages
    3.2.2 First Mover Disadvantages
  3.3 Porter’s Five Forces

4 Mathematical Theory
  4.1 Time Series
    4.1.1 The Objectives of Time Series Analysis
    4.1.2 Time Series Decomposition
    4.1.3 Trends
    4.1.4 Seasonality
    4.1.5 Stationarity
  4.2 Stationarity Hypothesis Testing
    4.2.1 Dickey-Fuller Test
    4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test
  4.3 Transformations
    4.3.1 Level Transformation
    4.3.2 First Difference Transformation
    4.3.3 Mean EWMA-transformation
    4.3.4 Variance-EWMA Transformation
    4.3.5 Skewness EWMA Transformation
    4.3.6 Kurtosis-EWMA Transformation
    4.3.7 Autocorrelation Transformation
    4.3.8 Correlation-EWMA Transformation

5 Methodology
  5.1 Data Collection
  5.2 Data and Notations
    5.2.1 Exchange Rates (FX data)
    5.2.2 US Sectors Data
    5.2.3 Countries- Stock Index Data
    5.2.4 Commodities Data
    5.2.5 VIX- Market Volatility Index Data
    5.2.6 Bond (IR) Data
    5.2.7 Transformations
  5.3 Selection of Transformations and Hypothesis Tests
  5.4 Selection of Market Entry Frameworks
  5.5 Literature Study
  5.6 Procedure of Work

6 Results
  6.1 First Trial: Plots for Currency Rates, with Fixed α
    6.1.1 Statistics for First Trial
  6.2 Second Trial: Plots for Commodity, with a Fixed α
    6.2.1 Statistics for Second Trial
  6.3 Third Trial: Plots for Commodity Prices, with a Fixed α
    6.3.1 Statistics with trial 3
  6.4 Seasonality and Trends
  6.5 Skewness and Kurtosis
  6.6 First Differences on all Data
  6.7 Finding the Optimal α
    6.7.1 Currencies
    6.7.2 US-Sectors
    6.7.3 Countries Index
    6.7.4 Commodities
    6.7.5 VIX (Market Volatility)
    6.7.6 IR (Bonds)
    6.7.7 Aggregated α

7 Conclusions
  7.1 Interpretation and Impact
    7.1.1 Trial 1
    7.1.2 Trial 2
    7.1.3 Trial 3
    7.1.4 Skewness and Kurtosis
    7.1.5 First Difference as a Transformation
    7.1.6 Finding the Optimal α
  7.2 Analysis of Timing of Entry and Competitive Rivalry
  7.3 Future Work
  7.4 Benefits for SHB and its Stakeholders
  7.5 Final Words

2 Introduction

2.1 Background

In the last couple of years, machine-learning-based forecasting has gained
increasing attention and become more established. Moreover, using machine
learning to predict financial outcomes has become desirable among financial
institutions and private investors.1 There are ongoing discussions and research
about how to improve these prediction models, as well as about how to
pre-process input data in order to obtain predictions with high accuracy.2
Machine learning has become essential in this context because it combines
computer science and mathematics to develop models intended to deliver maximal
predictive precision.

Predictions of financial outcomes, for example security prices or market
indices, involve a time component since future price movements may depend on
past values. Thus, the time dimension needs to be taken into account when using
a machine-learning-based prediction model. Prices in the financial market can be
seen as observations at points in time, so financial prices over a time period
can be described as a time series. As mentioned, the interest in using machine
learning to predict price movements in the financial market has grown.
Consequently, time series forecasting has become an increasingly important area
of machine learning.3

The underlying assumption in time series forecasting and the related machine
learning methods is that the input data is a stationary process. That is, the
statistical properties of the time series, for example its mean, variance and
autocorrelation, should not change over time.4 However, most data is not
stationary. The longer the time span of historical observations, the greater the
probability that the time series shows non-stationary characteristics.5

1 Sarlin, Peter. Björk, Kaj-Mikael. ”Machine learning in finance”. Neurocomputing, Vol. 264, 2017: 1-88 (Retrieved 2019-02-02)
2 Palaniappan, Vivek. ”Using Machine Learning to Predict Stock Prices”, 2018-10-31. https://medium.com/analytics-vidhya/using-machine-learning-to-predict-stock-prices-c4d0b23b029a (Retrieved 2019-02-02)
3 Brownlee, Jason. ”What is Time Series Forecasting?”. Machine Learning Mastery, 2016-12-2. https://machinelearningmastery.com/time-series-forecasting/ (Retrieved 2019-05-01)
4 Lindgren, George. ”Stationary stochastic processes”, p. 13-16. http://www.math.chalmers.se/~rootzen/fintid/stationary120312.pdf (Retrieved 2019-02-02)
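The sketch below makes the definition concrete. It compares the sample mean, variance and lag-1 autocorrelation of the first and second halves of two simulated series. The first is white noise, which is stationary. The second is a random walk, which is not. The simulated data, the split-half comparison and the helper names `lag1_autocorr` and `half_stats` are illustrative assumptions, not the procedure used in this thesis. The formal tests are the ones presented in Section 4.2.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (deviations from the sample mean)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def half_stats(x):
    """(mean, variance, lag-1 autocorrelation) for each half of x.

    Illustrative helper only: a stationary series should give similar
    values for both halves."""
    x = np.asarray(x, dtype=float)
    mid = len(x) // 2
    return [(float(h.mean()), float(h.var()), lag1_autocorr(h))
            for h in (x[:mid], x[mid:])]

rng = np.random.default_rng(seed=0)
noise = rng.normal(0.0, 1.0, size=2000)   # white noise: stationary
walk = np.cumsum(noise)                    # random walk: non-stationary

for name, series in [("white noise", noise), ("random walk", walk)]:
    for i, (m, v, r) in enumerate(half_stats(series), start=1):
        print(f"{name}, half {i}: mean={m:6.2f}  var={v:8.2f}  acf(1)={r:5.2f}")
```

For white noise, the two halves have similar means, variances close to one and autocorrelation close to zero. For the random walk, the level wanders between the halves and the lag-1 autocorrelation is close to one. That behaviour is the typical signature of non-stationarity.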

For many machine learning methods, handling non-stationary data sets is a
challenge, since it can increase the risk of obtaining predictions that differ
significantly from the real outcomes. Non-stationary time series result from
data showing trends, seasonal effects, cycles, noise and other structures that
depend on the time of observation. Therefore, they cannot be analyzed with
traditional techniques, and forecasting them may instead require models of
higher complexity. To obtain more reliable output from a prediction model,
effects such as seasonal components and trends may need to be removed from the
input data set.6 It is possible to make data stationary, or at least
approximately stationary, by the use of mathematical transformations.
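A minimal example of such a transformation is first differencing, $\Delta X_t = X_t - X_{t-1}$. It turns a simulated random walk with drift into a series that fluctuates around a constant mean. The simulated series, the seed and the drift value are assumptions made for the illustration. The transformation as used in this thesis is described in Section 4.3.2.

```python
import numpy as np

def first_difference(x):
    """Return X_t - X_{t-1} for t = 1, ..., n-1 (one observation shorter than x)."""
    x = np.asarray(x, dtype=float)
    return x[1:] - x[:-1]   # same result as np.diff(x)

rng = np.random.default_rng(seed=1)
drift = 0.05                                  # assumed drift per period
level = 100 + np.cumsum(drift + rng.normal(0.0, 1.0, size=3000))
changes = first_difference(level)

# The level series drifts and wanders, so its mean differs between subperiods;
# the differenced series has roughly the same mean (close to the drift) in each.
for name, s in [("level", level), ("first difference", changes)]:
    parts = np.array_split(s, 3)
    print(name, [round(float(p.mean()), 2) for p in parts])
```

Differencing shortens the series by one observation. This has to be taken into account when aligning transformed inputs with the values a model is meant to predict.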

In the last couple of months, Svenska Handelsbanken AB (SHB) has been
discussing a market entry for new financial products. The idea is to predict the
returns of securities with a machine-learning-based model on which future
products can be based.

Rickard Henricsson at SHB conducted research ten years ago on mathematical
transformations and their ability to generate stationary financial data. While
considering this potential business idea, SHB has raised the question of
whether the transformations are still applicable to today’s data. Henricsson
found several transformations, including both established ones and his own
approximations. The approximations were derived with the aim of reducing the
complexity of some of the transformations. His studies resulted in the following
seven transformations:

• Differencing (First order)
• Exponentially weighted moving average: Mean
• Exponentially weighted moving average: Variance
• Exponentially weighted moving average: Skewness
• Exponentially weighted moving average: Kurtosis
• Exponentially weighted moving average: Autocorrelation
• Exponentially weighted moving average: Correlation

5 Adhikari, Ratnadip et al. ”An Introductory Study on Time Series Modeling and Forecasting”, p. 16-19. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-19)
6 Kang, Eugine. ”Time Series: Check Stationarity”, 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)

These transformations are defined and explained in more detail in the
theoretical background, Section 4. Apart from the first difference
transformation, all of them depend on an unknown constant α. Changing
the value of α results in different output from each transformation.
Consequently, the choice of α for a given transformation may affect whether the
data can be made stationary. This has raised SHB’s interest in examining which
specific values of α may make financial time series stationary.
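The sketch below shows why the choice of α matters. It computes an exponentially weighted moving average of the mean using the common recursion $m_t = \alpha x_t + (1 - \alpha) m_{t-1}$, with $m_0 = x_0$. This convention, in which α weights the newest observation, is an assumption. The formulas actually used for the transformations are given in Section 4.3 and may parameterize α differently. The helper name `ewma_mean` is hypothetical.

```python
import numpy as np

def ewma_mean(x, alpha):
    """EWMA of the mean: m_t = alpha * x_t + (1 - alpha) * m_{t-1}, with m_0 = x_0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    x = np.asarray(x, dtype=float)
    m = np.empty_like(x)
    m[0] = x[0]
    for t in range(1, len(x)):
        m[t] = alpha * x[t] + (1.0 - alpha) * m[t - 1]
    return m

x = np.array([1.0, 2.0, 3.0, 10.0, 3.0])
for alpha in (0.1, 0.5, 0.9):
    # A large alpha tracks the latest value closely; a small alpha smooths heavily.
    print(alpha, np.round(ewma_mean(x, alpha), 3))
```

With α = 0.5, the spike of 10 at the fourth observation moves the average to 6.125. With α = 0.1, the average only reaches 2.161. The same input therefore produces quite different transformed series depending on α.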

Furthermore, SHB is one of the biggest banks in the Nordic countries. In the
Nordic financial sector, few commercial players today provide financial products
based on machine-learning prediction of financial outcomes. Consequently, SHB has
the potential to be among the first players in this area. It is therefore of
interest for SHB to understand how the timing of market entry can affect its
business.

2.2 Research Question

The work of this thesis was done in cooperation with SHB. The aim is to examine
whether financial data can be transformed to become stationary, and for what
value or values of the parameter α stationarity is achieved. The time span of
all the data sets is 2001-01-01 to 2018-12-31. The main research questions are
consequently the following:
1. Are the given transformations sufficient to make the data stationary?
2. Which value or values of the parameter α for each transformation can
potentially make the data stationary?

In addition, the effects of the timing of entry into a new market will be
discussed. More precisely: given that the transformations can make financial
data stationary and that SHB can develop financial products based on
machine-learning prediction of financial outcomes, how will the timing of
potential new product launches affect SHB’s competitive advantage?

2.3 Goal and Purpose

Predicting financial outcomes for securities is relevant for private investors
as well as for financial institutions. The goal of this thesis is to examine how
financial data may be pre-processed in order to make it useful as input data to
a prediction model.

The main goal for SHB is to develop separately, based on the results of this
thesis, a machine-learning-based forecasting model for predicting financial
outcomes. More precisely, the model will indicate future price movements in the
financial market, mainly for stocks in well-developed countries such as the US.
Examples include US stock indices such as the Dow Jones Industrial Average or the
S&P 500.

Since SHB is currently in the initial phase of the model development, it is
important for them to know whether it is possible to make the input data for the
future model stationary. This research aims to provide insight into that
question. If it is not possible to make the data stationary, SHB may need to
consider further research on building a model on non-stationary data.
Alternatively, this thesis can indicate whether SHB needs to conduct further
research on how to make data stationary. Therefore, the broader purpose of this
study is to give a direction for SHB’s future work.

In the market entry discussion, SHB’s market will be limited to other banking
institutions in Sweden and the Nordics, since SHB’s main business activity lies
within this area.


2.4 Scope and Limitations

The scope of this thesis is limited to examining transformations given by SHB.
The data is also provided by SHB and relates mainly to the financial markets of
the US and other well-developed countries. Moreover, it is necessary to determine
what qualifies as stationarity, since there exist both a strict (strong) and a
weak form. For this research, it has been decided that it is sufficient for a
time series to fulfil the requirements of weak stationarity, since proving
strict stationarity for a whole data set is complex. Section 4.1.5 explains the
difference between these types of stationarity.7 Moreover, the project is
limited to two hypothesis tests, both chosen by SHB. These were chosen since they
are based on different model assumptions and hypotheses and may therefore give a
wider perspective on the analysis of the results.

7 ”Stationarity Differencing”. https://www.statisticshowto.datasciencecentral.com/stationarity/ (Retrieved 2019-03-02)


3 Economic Theory

This section explains financial-market terminology that is mentioned in the
thesis or used as input data in the research. The purpose is to make the content
of the thesis easier to understand. Theory regarding stocks, bonds and other
financial assets is provided to explain why they are important to look at when
studying an economy. Moreover, Porter’s Five Forces model will be introduced and
discussed, as well as the advantages and disadvantages of being a first mover to
the market.

3.1 Terminology

3.1.1 Securities

A security is a financial asset that can be traded. There are several types of
securities, and these are in general classified as equity securities, debt
securities and derivatives.

Equity securities represent ownership in an entity. The most common equity
security is a stock, which represents ownership of a share of a company.8

The issuer of a debt security borrows money that must later be repaid to the
holder. When a debt security is issued, its terms are specified, for example
the size of the loan, the maturity date and the interest rate. Corporate bonds
and government bonds are two common examples of debt securities.9

Derivatives are contracts between at least two parties. The value of the contract
is based on an underlying asset such as a stock, an interest rate or a market
index. There are various derivatives, such as options and futures.10
8 Kenton, Will. ”Security”, 2019-05-20. https://www.investopedia.com/terms/s/security.asp (Retrieved 2019-05-22)
9 Chen, James. ”Debt Security”, 2019-03-23. https://www.investopedia.com/terms/d/debtsecurity.asp (Retrieved 2019-05-20)
10 Chen, James. ”What is a Derivative?”, 2019-05-19. https://www.investopedia.com/ask/answers/12/derivative.asp (Retrieved 2019-05-22)

3.1.2 Market Index

A market index is a measurement of a segment of the financial market. More
precisely, the index shows the performance of the securities within the chosen
segment. A market index is computed from the prices of the securities. There are
several weighting methods for determining the impact of each price.11
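Two common weighting methods are price weighting and market-capitalization weighting:

- In price weighting, the index is the sum of constituent prices divided by a divisor.
- In market-capitalization weighting, each price is weighted by its number of shares outstanding, and the index is scaled relative to a base period.

The sketch below contrasts the two. The prices, share counts, divisor of 3 and base value of 100 are made-up illustrative assumptions. Real index providers also adjust their divisors for events such as stock splits, which is not modelled here.

```python
def price_weighted_index(prices, divisor):
    """Price-weighted index: sum of constituent prices divided by a divisor."""
    return sum(prices) / divisor

def cap_weighted_index(prices, shares, base_market_cap, base_value=100.0):
    """Market-cap-weighted index, scaled so that base_market_cap maps to base_value."""
    market_cap = sum(p * q for p, q in zip(prices, shares))
    return base_value * market_cap / base_market_cap

# Hypothetical three-stock segment (prices and share counts are illustrative only).
prices_0 = [50.0, 20.0, 200.0]   # base period
prices_1 = [55.0, 20.0, 180.0]   # next period
shares = [1_000, 10_000, 100]

base_cap = sum(p * q for p, q in zip(prices_0, shares))
print(price_weighted_index(prices_0, divisor=3), price_weighted_index(prices_1, divisor=3))
print(cap_weighted_index(prices_0, shares, base_cap), cap_weighted_index(prices_1, shares, base_cap))
```

The two indices move in opposite directions in this example:

- The price-weighted index falls from 90 to 85, because the highest-priced stock dropped.
- The cap-weighted index rises from 100 to about 101.1, because the first stock has a larger market value than the third, and it gained.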

3.1.3 Exchange Rates

An exchange rate is the value of one nation’s or economic zone’s currency
relative to the currency of another nation or economic zone. The exchange rate
is one of the most important indicators of a country’s economic health relative
to other countries. It is also vital to a country’s level of trade and to its
financial flows.12 Movements in the exchange rate influence the decisions of
businesses, governments and individuals in society. Collectively, this may
affect activity in the financial markets (for example how people trade and how
securities are valued).13

3.1.4 Commodities

Commodities are basic goods used in commerce and as inputs in the production of
other goods and services. Their price is usually determined by the market as a
whole. A commodity can be anything from a raw material to a chemical. Commodities
are most commonly bought and sold through futures contracts that standardize the
quantity and minimum quality of the commodity being traded. The commodities
market is important since it offers a marketplace where members can transact
business. It also establishes regulated trading with rules and regulations.
Moreover, it is a place for collecting, disseminating and grading commodities
according to quality.14

11 Young, Julie. ”Market Index”, 2019-05-02. https://www.investopedia.com/terms/m/marketindex.asp (Retrieved 2019-05-22)
12 Twin, Alexandra. ”6 Factors that Influence Exchange Rate”, 2019-05-20. https://www.investopedia.com/trading/factors-influence-exchange-rates/ (Retrieved 2019-05-20)
13 Hamilton, Adam. ”Understanding Exchange Rates and Why They Are Important”, 2018. https://www.rba.gov.au/publications/bulletin/2018/dec/pdf/understanding-exchange-rates-and-why-they-are-important.pdf (Retrieved 2019-05-20)

One example used in this thesis is the spot price of crude oil, which is
considered one of the most important commodities in the world. Since today’s
society and economy depend on non-renewable fossil fuels, crude oil plays an
important role in the commodities market. The cost of a barrel of crude oil is
determined by the global market, more precisely by its supply and demand. For
example, if the demand for crude oil is high and the supply is low, oil prices
rise. Since oil prices are volatile, predicting them is important for economists
and other experts. The price of oil can affect the costs of goods and services
in the economy, directly or indirectly through multiple steps, which can result
in inflation. West Texas Intermediate (WTI) crude oil is considered one of the
major benchmarks for crude oil.15

3.1.5 Volatility

Volatility is the standard deviation of the return of an asset. The standard
deviation is the square root of the variance. Both variance and standard
deviation measure the variability of returns.
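As a small worked example of this definition, the sketch computes simple returns from a price series and takes their sample standard deviation. Several choices here are common conventions assumed for the example, not taken from the thesis:

- simple returns rather than log returns,
- the sample (n − 1) denominator,
- annualization by √252 trading days.

The price series is made up.

```python
import math

def simple_returns(prices):
    """r_t = P_t / P_{t-1} - 1 for consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices[:-1], prices[1:])]

def volatility(returns):
    """Sample standard deviation (n - 1 denominator), i.e. the square root of the variance."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(variance)

prices = [100.0, 102.0, 99.96, 101.9592, 100.939608]   # illustrative prices
r = simple_returns(prices)                 # approximately +2%, -2%, +2%, -1%
daily_vol = volatility(r)
annual_vol = daily_vol * math.sqrt(252)    # assumes 252 trading days per year
print([round(x, 4) for x in r], round(daily_vol, 4), round(annual_vol, 4))
```

Here the returns average 0.25 % and their sample variance is 0.000425, giving a daily volatility of about 2.06 %.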

Volatility is an indicator of the risk level of an asset, for instance a
security, a portfolio or a market. The price of a highly volatile asset is
expected to be more challenging to predict. Consequently, volatile assets are
viewed as riskier than less volatile assets. In short, volatility is considered
the risk related to changes in the asset’s price.

The VIX Index is an example of a market volatility measure. Before making an
investment decision, investors often look at VIX values to gain insight into
market risk.16
14 Lioudis, Nick. ”Commodities Trading: An Overview”, 2018-05-18. https://www.investopedia.com/investing/commodities-trading-overview/ (Retrieved 2019-05-20)
15 Premkumar, Divya. ”How do oil prices affect stock market”, 2019-01-08. https://www.tradebrains.in/how-do-oil-prices-affect-the-stock-market/ (Retrieved 2019-05-01)
16 Kuepper, Justin. ”Volatility Definition”, 2019-04-18. https://www.investopedia.com/terms/v/volatility.asp (Retrieved 2019-05-01)

3.1.6 Bonds

A bond is a fixed income instrument representing a loan made by an investor to a
borrower. When companies or other institutions need to finance new projects,
ongoing operations or other investments, they can issue bonds directly to
investors. The borrower, that is the issuer of the bond, specifies the terms of
the loan, for example the interest payments and the maturity date. The interest
payment, called the coupon, is what bondholders earn for lending their funds. The
interest rate that determines the payment is called the coupon rate. A
government bond is a bond issued by a government. The Treasury yield is the
return on investment on the U.S. government’s debt obligations. It is important
when analysing stocks since it tends to signal investor confidence. When
confidence is high, bond prices drop and yields increase, since investors believe
they can find investments with higher returns elsewhere. When confidence is low,
the opposite occurs.
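The inverse relation between a bond’s price and its yield follows from discounting. Consider annual coupon $C$, face value $F$ and $n$ years to maturity. The price at yield $y$ is $P = \sum_{t=1}^{n} C/(1+y)^t + F/(1+y)^n$. The sketch below assumes annual coupons and a flat yield. These are simplifications for illustration, not taken from the thesis. The function name `bond_price` is hypothetical.

```python
def bond_price(face, coupon_rate, years, yield_rate):
    """Price of a bond with annual coupons, discounted at a flat annual yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1.0 + yield_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1.0 + yield_rate) ** years
    return pv_coupons + pv_face

# A 10-year bond with a 3% coupon: the price equals the face value when the
# yield is 3%, and falls as the required yield rises.
for y in (0.02, 0.03, 0.04):
    print(f"yield {y:.0%}: price {bond_price(1000.0, 0.03, 10, y):.2f}")
```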

Bonds affect the amount of liquidity in a country, since they determine, for
example, how easy or difficult it is to take loans and buy on credit. Since
bonds are so strongly related to the economy, they are important for
forecasting: bond yields indicate what investors think the economy will do.17

3.2 Timing of Entry Framework

When firms are about to enter a new market, either by launching a new product or
by expanding to new regions, one main concern is when to enter. Entrants are
usually divided into three categories depending on their time of entry: first
movers, early followers and late entrants. Earlier research has given
contradictory answers to the question of which entry timing strategy is optimal
and why.

First movers are the first to bring a new good or service to the market and sell
it. Early followers are relatively early to the market, even though they are not
the first to enter. Lastly, late entrants enter when a product is becoming, or
has become, more commercial, in other words when the product gains mass-market
penetration.

17 Amadeo, Kimberly. ”How Bonds Affect the U.S. Economy”, 2019-01-20. https://www.thebalance.com/how-do-bonds-affect-the-us-economy-3305601 (Retrieved 2019-05-01)

3.2.1 First Mover Advantages

The theory of timing of entry also covers the advantages and disadvantages of
being the first mover. According to the theory, the first mover can gain brand
loyalty and technological leadership. Additionally, first movers have more time
on the market, enabling them to gain more market share. This could eventually
result in a winner-takes-all market, since the company may be perceived as a
technological innovator and gain a reputation as a leader. Being first also
enables the player to shape the characteristics of the technology, for instance
its features and functionality, as well as its pricing.

Firms that enter the market early can capture important resources such as key
locations, government permits, patents on the technology and access to
distribution channels, and they can develop relationships with suppliers.
Another advantage of being early is exploiting buyer switching costs. In other
words, suppose buyers face switching costs when changing to a superior
technology and have invested time in the existing one. In that case, the first
mover that captures customers may be able to keep them. If the industry
encourages the adoption of a dominant design, the timing of entry could be
critical to a firm’s likelihood of success.

3.2.2 First Mover Disadvantages

Studies have shown that many first movers are exposed to higher costs, which
reduce the profits of their businesses. Becoming the first mover may require
additional resources for research and development. Late entrants, on the other
hand, can use existing work, technology and knowledge developed by the first
mover to create a similar product. They can also adapt the product or service
development to customers’ preferences instead of facing uncertainty about
customer requirements. As a result, they can avoid high development expenses.


Another disadvantage is that newly developed technologies may require other
technologies or components produced by other firms. This makes the first mover
dependent on the efforts of those firms, so first movers cannot rely on
enabling technologies being available. Moreover, when firms introduce new
technologies and innovations, appropriate suppliers or distributors often do not
exist. The firm may then have to assist its suppliers or even develop its own,
which is a time- and resource-demanding task.

3.3 Porter’s Five Forces

Porter’s Five Forces Framework, developed by Michael Porter, is a tool for
analyzing the market dynamics and the competition of a business. The purpose of
the model is to identify and analyze five competitive forces that shape every
industry and help determine an industry’s weaknesses and strengths. The insights
are often used to assess whether new product or service offerings can be
profitable. The model may also be used to answer strategic questions such as how,
where and when a market entry should be made. The five forces are:
- the threat of new entrants,
- the bargaining power of suppliers,
- the bargaining power of customers,
- the threat of substitute products,
- competitive rivalry.

Together, the first four forces describe the competitive rivalry.


Figure 3.1: Porter’s five forces model and important questions to answer during
the analysis


4 Mathematical Theory

The following section presents the mathematical theories and models used in the thesis, together with the assumptions on which these models are based.

4.1 Time Series

A time series is a series of data points measured over a time period and indexed in time order; in other words, the values taken by a variable over time, in chronological order.18 A time series is denoted $\{X_t\}$, $t = 0, 1, 2, \ldots$, where $t$ represents time and each $X_t$ is a random variable. Time series can be discrete, as here, or continuous, with $t \in [0, \infty)$.

4.1.1 The Objectives of Time Series Analysis

The primary objective of time series analysis is to develop mathematical models that describe the data sample, in order to extract meaningful statistics and characteristics of the data. In general, time series analysis has two main goals:
1. Identifying the nature of the phenomenon, i.e. what the series contains.
2. Forecasting, i.e. predicting future values of the time series variable.

Both goals require identifying the pattern observed in the time series. Once identified, the pattern can be interpreted and combined with other data in a forecasting model.19
18 "Time Series" http://www.businessdictionary.com/definition/time-series.html (Retrieved 2019-01-30)
19 "Time Series" https://www.stat.ncsu.edu/people/bloomfield/courses/st730/slides/SnS-01-2.pdf (Retrieved 2019-02-02)

4.1.2 Time Series Decomposition

Within time series analysis, a time series can be decomposed into several components. Let $\{X_t\}$ be a sequence of random variables. The series can then be decomposed either additively,

$$X_t = T_t + S_t + \epsilon_t,$$

or multiplicatively,

$$X_t = T_t \cdot S_t \cdot \epsilon_t,$$

where $T_t$ is the trend component, $S_t$ the seasonal component and $\epsilon_t$ an irregular component at time $t$.20

Over a long time period a time series may show a general tendency to increase, decrease or stagnate; this is represented by the trend component of the decomposition. The seasonal component exhibits patterns driven by seasonal factors such as the day of the week or the quarter of the year, and the period of the seasonality is fixed and known. The irregular component portrays events that do not occur regularly and are unpredictable.21 It corresponds to the residual obtained after the trend and seasonality have been removed, that is, $\epsilon_t$ is a random noise component. Additionally, $\epsilon_t$ is stationary at least in the weak sense described in Section 4.1.5.22
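To make the two decompositions concrete, the following Python sketch simulates one series of each kind and checks that removing the trend and seasonal parts leaves the irregular component. The linear or exponential trend, the sine-shaped seasonal pattern with period 5 and all parameter values are illustrative assumptions, not properties of the data used in this thesis.

```python
import math
import random


def additive_series(n, slope=0.1, period=5, amplitude=2.0, noise_sd=0.5, seed=0):
    """Simulate X_t = T_t + S_t + eps_t (linear trend, sine seasonality, Gaussian noise)."""
    rng = random.Random(seed)
    trend = [slope * t for t in range(n)]
    season = [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
    noise = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    series = [tr + s + e for tr, s, e in zip(trend, season, noise)]
    return series, trend, season, noise


def multiplicative_series(n, growth=0.001, period=5, amplitude=0.05, noise_sd=0.01, seed=0):
    """Simulate X_t = T_t * S_t * eps_t with strictly positive components."""
    rng = random.Random(seed)
    trend = [100.0 * math.exp(growth * t) for t in range(n)]
    season = [1.0 + amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
    noise = [math.exp(rng.gauss(0.0, noise_sd)) for _ in range(n)]
    series = [tr * s * e for tr, s, e in zip(trend, season, noise)]
    return series, trend, season, noise


# Additive: removing the (here known) trend and seasonal parts leaves the noise.
x, trend, season, noise = additive_series(250)
residual = [xt - tr - s for xt, tr, s in zip(x, trend, season)]

# Multiplicative: log X_t = log T_t + log S_t + log eps_t, i.e. additive in logs.
y, m_trend, m_season, m_noise = multiplicative_series(250)
log_residual = [math.log(yt) - math.log(tr) - math.log(s)
                for yt, tr, s in zip(y, m_trend, m_season)]
```

In real data the trend and seasonal components are unknown and must be estimated, for example with moving averages, so the residual only approximates $\epsilon_t$. The multiplicative case also shows why log-transforming a positive series is a common way of turning a multiplicative structure into an additive one.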

4.1.3 Trends

Knowing whether a time series contains a trend supports future forecasting. In some cases a trend reflects the accumulated effect of certain factors; in other cases it indicates an influence that needs further investigation. A trend can, for example, be linear, exponential or a mixture of different types.23

20 Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting" https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-16)
21 Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting", p. 12-18. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)
22 Brockwell, Peter J., Davis, Richard A. "Introduction to Time Series and Forecasting", 3rd ed., Springer, p. 20

4.1.4 Seasonality

In time series data, seasonality is the presence of variations that recur at specific regular intervals, for example every autumn. Identifying or removing seasonal components can give a clearer relationship between the input and output variables, and can provide information that helps improve model performance.24

4.1.5 Stationarity

A stationarity assumption is equivalent to saying that the generating mechanism of the process is itself time-invariant, so that neither the form nor the parameter values of the generating procedure change over time. A process $\{X_t\}$, $t \in \mathbb{Z}$ (where $\mathbb{Z}$ is the set of integers), is defined to be weakly stationary if it satisfies
1. $E[X_t] = \mu$ for all $t$,
2. $\mathrm{Var}[X_t] = \sigma_X^2 < \infty$ for all $t$,
3. $\gamma_X(s, t) = \gamma_X(s + h, t + h)$ for all $s, t, h \in \mathbb{Z}$, where $\gamma_X$ is the autocovariance function.

In other words, a weakly stationary stochastic process has a mean and variance that do not change over time, and its autocovariance, i.e. the covariance between the values of the process at two points in time, depends only on the distance between the time points and not on time itself.25 There is also a more restrictive definition of stationarity. A time series $\{X_t, t = 0, \pm 1, \pm 2, \ldots\}$ is strictly stationary if $(X_{t_1}, \ldots, X_{t_n})$ has the same joint probability distribution as $(X_{t_1+h}, \ldots, X_{t_n+h})$, that is

$$(X_{t_1}, \ldots, X_{t_n}) \overset{d}{=} (X_{t_1+h}, \ldots, X_{t_n+h})$$

for all integers $h$ and $n > 0$ and all time points $t_1, \ldots, t_n$.26

23 Deshpande, Bala. "Time series forecasting: understanding trend and seasonality", 2014-03-12. http://www.simafore.com/blog/bid/205420/Time-series-forecasting-understanding-trend-and-seasonality (Retrieved 2019-05-01)
24 Brownlee, Jason. "How to Identify and Remove Seasonality from Time Series Data with Python", 2016-12-23. https://machinelearningmastery.com/time-series-seasonality-with-python/ (Retrieved 2019-04-14)
25 A. Lincoln. Introduction to the theory of time series, Chapter 1, p. 4-6
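Conditions 1 to 3 can be examined empirically. The Python sketch below, which uses simulated data rather than the thesis data, computes the sample autocovariance $\hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-h}(X_{t+h}-\bar X)(X_t-\bar X)$ for Gaussian white noise, which is weakly stationary, and estimates $\mathrm{Var}[X_t]$ of a random walk across many simulated paths at two different times. Since a random walk with unit-variance shocks has $\mathrm{Var}[X_t] = t$, the second check shows condition 2 failing.

```python
import random
import statistics


def sample_autocovariance(x, h):
    """gamma_hat(h) = (1/n) * sum over t of (x[t+h] - mean) * (x[t] - mean)."""
    n = len(x)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < len(x)")
    m = statistics.fmean(x)
    return sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n


rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
gamma0 = sample_autocovariance(white_noise, 0)   # close to sigma^2 = 1
gamma1 = sample_autocovariance(white_noise, 1)   # close to 0


def random_walk_paths(n_paths, n_steps, seed=2):
    """Simulate X_t = X_{t-1} + eta_t with X_0 = 0 for many independent paths."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, path = 0.0, []
        for _ in range(n_steps):
            level += rng.gauss(0.0, 1.0)
            path.append(level)
        paths.append(path)
    return paths


# Cross-sectional variance at t = 10 and t = 100: roughly 10 and 100.
paths = random_walk_paths(2000, 100)
var_t10 = statistics.pvariance([p[9] for p in paths])
var_t100 = statistics.pvariance([p[99] for p in paths])
```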

Stationarity is important. If a time series is non-stationary, its behaviour and properties depend strongly on the particular period that was sampled, so the results of a regression on the data points are hard to trust. Also, if the variables in a regression model are not stationary, the assumptions underlying the asymptotic analysis may not hold.27 Non-stationary time series typically show trends, seasonal effects and other structures that depend on the time of observation.28 A time series is usually non-deterministic, so its future cannot be predicted with certainty; assuming stationarity reduces the complexity of forecasting it.29

There are several approaches to checking for stationarity; the most common are examining plots and running statistical tests.30 Plots of the series can be inspected for obvious trends or seasonal effects. Summary statistics, which condense a set of observations into a few numbers while communicating as much information as possible, can complement this: the data are partitioned into intervals, and the summary statistics of the intervals, such as their means and variances, are compared for obvious or significant differences. Statistical tests provide a method for making quantitative decisions about a particular sample.
26 Brockwell, Peter J., Davis, Richard A. "Introduction to Time Series and Forecasting", 3rd ed., Springer, p. 13
27 Ryabko, Daniil. "Asymptotic Nonparametric Statistical Analysis of Stationary Time Series", 2019-03-30. https://arxiv.org/abs/1904.00173 (Retrieved 2019-05-01)
28 Kang, Eugine. "Time Series: Check Stationarity", 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)
29 Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting", p. 12-18. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)
30 "Tests of Stationarity" https://people.maths.bris.ac.uk/~magpn/Research/LSTS/TOS.html (Retrieved 2019-02-12)

Figure 4.1: The following graph illustrates a non-stationary time series, a random
walk that has not been adjusted

Figure 4.2: This figure illustrates the same data after stationarity has been obtained with the first difference transformation. The differenced series fluctuates around a constant level, which indicates stationarity.
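The interval-based check described earlier in this section can be sketched in Python for the kind of series shown in Figures 4.1 and 4.2. The data are a simulated Gaussian random walk (an assumption standing in for real prices), and the split into four equally long intervals is an arbitrary choice. For the random walk the interval means typically drift apart, whereas for its first difference they stay close to zero.

```python
import itertools
import random
import statistics


def chunk_summary(x, n_chunks=4):
    """Split x into n_chunks consecutive, equally long pieces; return (mean, variance) of each."""
    size = len(x) // n_chunks
    chunks = [x[k * size:(k + 1) * size] for k in range(n_chunks)]
    return [(statistics.fmean(c), statistics.pvariance(c)) for c in chunks]


rng = random.Random(3)
steps = [rng.gauss(0.0, 1.0) for _ in range(2000)]
level = list(itertools.accumulate(steps))          # random walk, cf. Figure 4.1
diff = [b - a for a, b in zip(level, level[1:])]   # first difference, cf. Figure 4.2

level_means = [m for m, _ in chunk_summary(level)]
diff_means = [m for m, _ in chunk_summary(diff)]
level_spread = max(level_means) - min(level_means)
diff_spread = max(diff_means) - min(diff_means)
```

What counts as an obvious or significant difference between intervals is left to judgement here; the hypothesis tests in Section 4.2 make this decision quantitative.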


4.2 Stationarity Hypothesis Testing

As mentioned in the limitations of this project, we will only use two different stationarity tests. These hypothesis tests give an indication of whether a time series is stationary, but they cannot prove stationarity. If the null hypothesis is not rejected, it is not thereby confirmed: a non-significant result only means that the alternative hypothesis is not a strong competitor to the null hypothesis, and in general many other null hypotheses would not have been rejected either.31

4.2.1 Dickey-Fuller Test

A commonly used method for checking for a unit root is the Dickey-Fuller test, developed by David Dickey and Wayne Fuller (1979). The Dickey-Fuller test gives an indication of whether a process is stationary by testing whether it follows a unit root process.32 The augmented Dickey-Fuller (ADF) test extends the original Dickey-Fuller (DF) test to higher-order autocorrelation by adding lagged differences of the series to the test regression, since the DF test is only valid for AR(1) processes. An AR(1) process is an autoregressive process of the first order, meaning that the current value depends on the immediately preceding value plus random noise.33 Like the original DF test, the ADF test checks for a unit root in a time series sample; the difference is that it can be used for larger and more complicated time series models.34 If there is higher-order correlation rather than only AR(1) dependence, the augmented version must be used.

The null hypothesis is that a unit root is present, and the alternative hypothesis is that there is no unit root, which indicates that the data are stationary.
31 "Hypotesprövning" [Hypothesis testing] http://gauss.stat.su.se/gu/sg/2012VT/Kompendium/KAP17new.pdf (Retrieved 2019-05-03)
32 "ADF - Augmented Dickey Fuller Test" https://www.statisticshowto.datasciencecentral.com/adf-augmented-dickey-fuller-test/ (Retrieved 2019-03-15)
33 Pantelis, Anastasios. "Testing for unit roots in the presence of structural change", 2008. http://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=1338330&fileOId=1646631 (Retrieved 2019-03-09)
34 "The Augmented Dickey-Fuller Test" https://www.thoughtco.com/the-augmented-dickey-fuller-test-1145985 (Retrieved 2019-02-27)

Consider the first-order autoregressive model

$$X_t = \delta + \theta X_{t-1} + \epsilon_t,$$

where $\theta = 1$ corresponds to a unit root and $\epsilon_t$ is a white noise process with zero mean and constant variance. In a stationary AR(1) process ($|\theta| < 1$), the constant term can be expressed as $\delta = (1 - \theta)\mu$, where $\mu$ is the mean of the series.

The null hypothesis of a unit root is $\theta = 1$, which by this relation also implies $\delta = 0$. Testing $\theta = 1$ directly in the level equation is difficult, so the model is rewritten in differences by subtracting $X_{t-1}$ from both sides:

$$\Delta X_t = \delta + (\theta - 1) X_{t-1} + \epsilon_t = \delta + \pi X_{t-1} + \epsilon_t, \qquad \pi = \theta - 1.$$

The null hypothesis $\theta - 1 = 0$ is then equivalent to $\pi = 0$, and the hypotheses are formulated as

$$H_0: \pi = 0, \qquad H_1: \pi < 0.$$

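Once the hypotheses are established, the Dickey-Fuller test performs a t-type test on $H_0$ using the statistic

$$\hat{\tau} = \frac{\hat{\theta} - 1}{SE(\hat{\theta})} = \frac{\hat{\pi}}{SE(\hat{\pi})}.$$

Under $H_0$ this statistic does not follow the standard t-distribution. Instead, $\hat{\tau}$ is compared with a critical value, i.e. a point in the Dickey-Fuller distribution, and $H_0$ is rejected when $\hat{\tau}$ lies below that critical value.35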
When performing the ADF test, a p-value below 0.05 is taken as strong evidence against the null hypothesis, so the unit root is rejected and the series is treated as stationary. If the p-value is 0.05 or higher, the evidence against the null hypothesis is weak: the unit root cannot be rejected, and stationarity of the time series is not supported.

35 Verbeek, Marno. "A Guide to Modern Econometrics", 2nd ed., 2014, p. 265-268

4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test

The KPSS test is a test of the stationarity hypothesis proposed by Kwiatkowski, Phillips, Schmidt and Shin (1990). Like the Dickey-Fuller test, it gives an indication of whether a series has a unit root or is stationary; unlike the Dickey-Fuller test, however, its null hypothesis is stationarity and its alternative is a unit root.36

Let $X_t$, $t = 1, 2, \ldots, T$, be a time series of observed values, and assume that the series can be decomposed into a deterministic trend, a random walk and a stationary error. The data generating process (DGP) of $X_t$ in KPSS can then be defined as

$$X_t = Y_t + \epsilon_t + \xi_t,$$

where $Y_t$ is the deterministic trend term, $\epsilon_t$ is the error term and $\xi_t$ is the random walk term, so that

$$\xi_t = \xi_{t-1} + \eta_t.$$

By definition of the random walk, $\eta_t \sim \text{iid}(0, \sigma^2)$.37 If $\sigma^2 = 0$, i.e. the variance of $\eta_t$ is zero, then

$$\xi_t = \xi_{t-1}.$$

That is, the random walk reduces to a constant term and $X_t$ becomes trend-stationary, meaning that the series fluctuates around the deterministic trend. Consequently, the hypotheses can be formulated as

$$H_0: \sigma^2 = 0, \qquad H_1: \sigma^2 > 0.$$

36 "What is a Critical Value?", 2019. https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/basic-statistics/inference/supporting-topics/basics/what-is-a-critical-value/ (Retrieved 2019-05-04)
37 Nabeya, Seiji et al. "Asymptotic Theory of a Test for the Constancy of Regression Coefficients Against the Random Walk Alternative", 1987. https://projecteuclid.org/download/pdf_1/euclid.aos/1176350701 (Retrieved 2019-04-30)

Under the null hypothesis the process is trend-stationary, while the alternative hypothesis implies that $X_t$, $t = 1, 2, \ldots, T$, is a unit root process.38 To reduce complexity, the deterministic component of the series may also be removed by setting $Y_t = 0$. In this special case the null hypothesis is that $X_t$ is level-stationary, i.e. stationary around a constant level $\xi_0$ instead of around a trend, so that the mean no longer depends on $t$.39 A statistic that can be used to test the null hypothesis is the LM statistic, defined as

$$LM = \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}_{\epsilon}^2},$$

where

$$S_t = \sum_{i=1}^{t} e_i, \qquad t = 1, 2, \ldots, T.$$

That is, $S_t^2$ is the squared partial sum of the residuals from a regression of $X$ on the deterministic terms. For the trend-stationarity test, $e_t$, $t = 1, 2, \ldots, T$, denotes the residuals from a regression of $X$ on a time trend and an intercept, and $\hat{\sigma}_{\epsilon}^2$ is the estimated residual variance from that regression. If the aim is instead to test for level stationarity, the residual is redefined as

$$e_i = X_i - \bar{X},$$

which is the residual from a regression of $X$ on an intercept only.40

38 Cappuccio, Nunzio et al. "The Fragility of the KPSS Stationarity Test", 2009. http://leonardo3.dse.univr.it/home/workingpapers/fragility_kpss.pdf (Retrieved 2019-04-30)
39 Journal of Econometrics, "Testing the null hypothesis of stationarity against the alternative of a unit root", 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=IwAR3uwIVD3WTB (Retrieved 2019-04-30)
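A minimal Python sketch of the level-stationarity case ($Y_t = 0$, residuals $e_i = X_i - \bar X$), using simulated data. Following the usual KPSS implementation, it scales the sum of squared partial sums by $T^{-2}$ and replaces $\hat\sigma^2_\epsilon$ with a long-run variance estimate using Bartlett weights; the lag truncation $\lfloor 4(T/100)^{1/4} \rfloor$ is one common choice and is an assumption here. The statistic is compared with the asymptotic 5% critical value of about 0.463 for level stationarity, and large values reject stationarity. The `kpss` function in `statsmodels.tsa.stattools` provides a tested implementation.

```python
import random


def kpss_level(x, lags=None):
    """KPSS statistic for H0: level stationarity (residuals from an intercept-only fit)."""
    T = len(x)
    mean = sum(x) / T
    e = [xi - mean for xi in x]                 # e_i = X_i - mean(X)
    partial_sums, s = [], 0.0
    for ei in e:
        s += ei
        partial_sums.append(s)                  # S_t = e_1 + ... + e_t
    if lags is None:
        lags = int(4 * (T / 100) ** 0.25)       # assumed truncation rule
    # Long-run variance: gamma_0 + 2 * sum_j w_j * gamma_j with Bartlett weights.
    lrv = sum(ei * ei for ei in e) / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1)
        gamma_j = sum(e[t] * e[t - j] for t in range(j, T)) / T
        lrv += 2.0 * w * gamma_j
    return sum(st * st for st in partial_sums) / (T * T * lrv)


KPSS_CRITICAL_5PCT = 0.463   # asymptotic 5% value, level-stationarity case

rng = random.Random(5)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # stationary series
walk, level = [], 0.0
for eps in noise:
    level += eps
    walk.append(level)                                # random walk (unit root)

eta_noise = kpss_level(noise)
eta_walk = kpss_level(walk)
reject_noise = eta_noise > KPSS_CRITICAL_5PCT   # expected False in most samples
reject_walk = eta_walk > KPSS_CRITICAL_5PCT     # expected True
```

Because the KPSS and Dickey-Fuller tests have opposite null hypotheses, running both gives complementary evidence: a series is most convincingly treated as stationary when the ADF test rejects a unit root and the KPSS test does not reject stationarity.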

4.3 Transformations

This section presents the theory behind the transformations that Henricsson found relevant in previous research, and discusses their purpose. Data transformation is a process in which data are converted from one format to another; here, the goal is to transform non-stationary data into stationary data. To describe the equations, the following notation is introduced.

Data are measured on the range $(t_0, \ldots, t, \ldots, t_{max})$ and consist of $T$ elements. The dataset $X$ is an $N \times T$ matrix containing the $N$ variable vectors $(x_1, x_2, \ldots, x_N)$, where $x_i = (x_{i,t_0}, \ldots, x_{i,t}, \ldots, x_{i,t_{max}})$. For a given point in time $t$ and a given variable $i$, a number of approximating transformations are presented below.

Most of the transformations depend on a rate of decay $\alpha$, which can be varied to give different variants of each transformation, so a suitable value may have to be estimated. In general, the new forecast after a transformation follows the pattern

$$\text{NewForecast} = \alpha \cdot \text{NewData} + (1 - \alpha) \cdot \text{MostRecentForecast}.$$

The choice of $\alpha$ thus determines how much the new forecast reflects new data and how much it reflects the past.41 Previous studies have suggested that $\alpha$ should be below 0.3 to obtain a smoothing result.42
40 Journal of Econometrics, "Testing the null hypothesis of stationarity against the alternative of a unit root", 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=IwAR3uwIVD3WTB (Retrieved 2019-04-30)
41 Ragnarstrom, Elsa. "How to calculate forecast accuracy for stocked items with a lumpy demands", 2015. https://www.diva-portal.org/smash/get/diva2:901177/FULLTEXT01.pdf (Retrieved 2019-05-03)
42 "How To Identify Patterns in Time Series Data: Time Series Analysis"
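Unrolling the forecast recursion of Section 4.3 shows that the new forecast is a weighted average of all past data, where the observation $k$ steps back receives weight $\alpha(1-\alpha)^k$. The Python sketch below computes these weights, the recursion itself (started, as an assumption, at the first observation), and the number of steps after which a weight has halved, which is one way to read $\alpha$ as a rate of decay.

```python
import math


def smoothing_weights(alpha, n):
    """Weight alpha * (1 - alpha)**k given to the observation k steps back, k = 0..n-1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return [alpha * (1.0 - alpha) ** k for k in range(n)]


def half_life(alpha):
    """Number of steps k at which the weight has halved, i.e. (1 - alpha)**k = 1/2."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - alpha)


def smooth(data, alpha):
    """NewForecast = alpha * NewData + (1 - alpha) * MostRecentForecast."""
    forecast = data[0]               # assumed start: the first observation
    forecasts = [forecast]
    for value in data[1:]:
        forecast = alpha * value + (1.0 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


for a in (0.1, 0.3, 0.5):
    print(f"alpha={a}: weight on newest point {a:.2f}, half-life {half_life(a):.2f} steps")
```

The printed half-lives are about 6.6 steps for $\alpha = 0.1$, 1.9 steps for $\alpha = 0.3$ and 1 step for $\alpha = 0.5$, so the suggested range $\alpha < 0.3$ corresponds to a memory of roughly two steps or longer.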

4.3.1 Level Transformation

Let $\{X_t, t = 0, 1, 2, \ldots\}$ be a time series. The level transformation is then defined as

$$F1_{i,t} = X_{i,\bar{t}}, \qquad \bar{t} = \max\{t_j : t_j \le t\},$$

where $\bar{t}$ is the largest observation time in the sample at or before $t$, i.e. the time of the latest observation. In other words, if there are missing values, the most recent value obtained is used.
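In code, the level transformation is a "last observation carried forward" lookup. The Python sketch below assumes that the observation times are sorted and returns `None` for query times before the first observation (how such leading gaps are treated is not specified above); the observation days and prices are hypothetical.

```python
from bisect import bisect_right


def level_transform(obs_times, obs_values, query_times):
    """F1 at each query time t: the value observed at t_bar = max{t_j : t_j <= t}.

    obs_times must be sorted in increasing order; None is returned for query
    times before the first observation (an assumed convention).
    """
    result = []
    for t in query_times:
        k = bisect_right(obs_times, t)   # number of observation times <= t
        result.append(obs_values[k - 1] if k > 0 else None)
    return result


# Hypothetical data: prices observed on days 0, 1 and 4 (days 2 and 3 missing).
times = [0, 1, 4]
values = [100.0, 101.5, 99.0]
f1 = level_transform(times, values, range(6))
# f1 == [100.0, 101.5, 101.5, 101.5, 99.0, 99.0]
```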

4.3.2 First Difference Transformation

The first difference at time $t$, $F2_{i,t}$, is the change between the observation at time $t$ and the observation at the previous time step, $t - 1$, in the original series.43 The first difference transformation is defined as

$$F2_{i,t} = X_{i,\bar{t}} - X_{i,\bar{t}-1}.$$

A commonly encountered type of non-stationary behaviour is a process whose level changes while its variability remains homogeneous. In such cases, taking the first difference may lead to stationarity.44 In time series analysis, differencing is frequently used to remove dependence on time, including structures such as trend and seasonality.
42 (continued) http://www.statsoft.com/Textbook/Time-Series-Analysis (Retrieved 2019-05-03)
43 Kulahci, Murat et al. "Time Series Analysis and Forecasting by Example", 2011, p. 90
44 Bisgaard, S., Kulahci, M. "Time Series Analysis and Forecasting", 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting?fbclid=IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q (Retrieved 2019-02-18)
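A Python sketch of the first difference transformation applied to the output of the level transformation. Reading $\bar t - 1$ as the previous step of the forward-filled grid, so that a day without a new observation gets a change of zero, is an interpretation assumed here; the input values are hypothetical.

```python
def first_difference(levels):
    """F2_t = F1_t - F1_{t-1} on a regular grid; None where either value is missing."""
    diffs = [None]                       # no previous value at the first time step
    for prev, curr in zip(levels, levels[1:]):
        diffs.append(None if prev is None or curr is None else curr - prev)
    return diffs


# Forward-filled levels (hypothetical), e.g. output of the level transformation.
f1 = [100.0, 101.5, 101.5, 101.5, 99.0, 99.0]
f2 = first_difference(f1)
# f2 == [None, 1.5, 0.0, 0.0, -2.5, 0.0]
```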


4.3.3 Mean EWMA-transformation

An exponentially weighted moving average (EWMA) is a type of moving average that places greater weight and significance on the most recent data points. For example, it can be assumed that a security’s price depends more on recent prices than on prices from long ago. Each new EWMA value is computed from the previous EWMA value, and the weights decay exponentially, as the name indicates.45 This is a very popular scheme for producing a smoothed time series. In general, for a time series $\{X_t\}$ the smoothed version is given by the recursion46

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}.$$

The EWMA mean used here applies this recursion to the first difference:

$$F3_{i,t} = (1 - \alpha) F3_{i,t-1} + \alpha F2_{i,t}.$$
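A Python sketch of the EWMA mean recursion applied to first differences. The starting value $F3_{i,t_0}$ is not specified above; it is assumed to be zero here, which is natural for price changes but pulls early values towards zero. The input must not contain missing values, so a leading `None` from differencing would have to be dropped first.

```python
def ewma_mean(f2, alpha, init=0.0):
    """F3_t = (1 - alpha) * F3_{t-1} + alpha * F2_t, starting from F3 = init.

    f2 must contain numbers only (drop a leading None from differencing first).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    f3, prev = [], init
    for x in f2:
        prev = (1.0 - alpha) * prev + alpha * x
        f3.append(prev)
    return f3


# A constant change of 1 per step: the gap 1 - F3_t shrinks as (1 - alpha)**t.
f3 = ewma_mean([1.0, 1.0, 1.0, 1.0], alpha=0.5)
# f3 == [0.5, 0.75, 0.875, 0.9375]
```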

4.3.4 Variance-EWMA Transformation

As mentioned, exponentially weighted moving averages are often used to smooth irregular fluctuations in a time series in order to reveal patterns over a specific time period. Applying the same recursion to the squared deviation of the first difference from its EWMA mean gives the EWMA variance transformation

$$F4_{i,t} = (1 - \alpha) F4_{i,t-1} + \alpha (F2_{i,t} - F3_{i,t})^2.$$

From the EWMA variance, a future variance is estimated as a weighted average of past variances.47

45 "Exponentially Weighted Moving Average" https://www.value-at-risk.net/exponentially-weighted-moving-average-ewma (Retrieved 2019-03-02)
46 Jinka, Preetam. "Exponential Smoothing for Time Series Forecasting", 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting?fbclid=IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q (Retrieved 2019-02-18)
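A Python sketch that computes $F3$ and $F4$ together, so that each squared deviation uses the current EWMA mean $F3_{i,t}$ as in the formula. The zero starting values are assumptions.

```python
def ewma_mean_and_variance(f2, alpha, mean_init=0.0, var_init=0.0):
    """F3_t = (1-a) F3_{t-1} + a F2_t;  F4_t = (1-a) F4_{t-1} + a (F2_t - F3_t)**2."""
    m, v = mean_init, var_init          # assumed starting values
    f3, f4 = [], []
    for x in f2:
        m = (1.0 - alpha) * m + alpha * x              # update the mean first,
        v = (1.0 - alpha) * v + alpha * (x - m) ** 2   # so F4 uses F3_t
        f3.append(m)
        f4.append(v)
    return f3, f4


f3, f4 = ewma_mean_and_variance([2.0, -2.0], alpha=0.5)
```

With $\alpha = 0.5$ and the changes $2$ and $-2$, this gives $F3 = (1, -0.5)$ and $F4 = (0.5, 1.375)$.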

4.3.5 Skewness EWMA Transformation

This transformation measures the skewness of the change in the variable and uses it to transform the data. The formula used is

$$F5_{i,t} = (1 - \alpha) F5_{i,t-1} + \alpha (F2_{i,t} - F3_{i,t})^3.$$

4.3.6 Kurtosis-EWMA Transformation

This transformation measures the kurtosis of the change in the variable:

$$F6_{i,t} = (1 - \alpha) F6_{i,t-1} + \alpha (F2_{i,t} - F3_{i,t})^4.$$
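Since $F4$, $F5$ and $F6$ differ only in the power applied to the deviation $F2_{i,t} - F3_{i,t}$, one Python function can compute all three, again with assumed zero starting values. As defined, $F5$ and $F6$ are EWMA third and fourth central moments; dividing them by $F4^{3/2}$ and $F4^{2}$ gives scale-free skewness and kurtosis. That standardization is not part of the definitions above and is included only as an optional helper.

```python
def ewma_central_moments(f2, alpha, mean_init=0.0):
    """Return lists F3, F4, F5, F6 for one series of first differences F2.

    F3 is the EWMA mean; F4, F5 and F6 are EWMAs of (F2_t - F3_t)**p for
    p = 2, 3 and 4. All moment recursions start at zero (an assumption).
    """
    m = mean_init
    state = {2: 0.0, 3: 0.0, 4: 0.0}
    f3, f4, f5, f6 = [], [], [], []
    for x in f2:
        m = (1.0 - alpha) * m + alpha * x            # F3_t
        dev = x - m                                  # F2_t - F3_t
        for p in state:
            state[p] = (1.0 - alpha) * state[p] + alpha * dev ** p
        f3.append(m)
        f4.append(state[2])
        f5.append(state[3])
        f6.append(state[4])
    return f3, f4, f5, f6


def standardized(f4, f5, f6):
    """Optional scale-free versions F5 / F4**1.5 and F6 / F4**2 (None where F4 == 0)."""
    skew = [s / v ** 1.5 if v > 0 else None for v, s in zip(f4, f5)]
    kurt = [k / v ** 2 if v > 0 else None for v, k in zip(f4, f6)]
    return skew, kurt


f3, f4, f5, f6 = ewma_central_moments([2.0, -2.0], alpha=0.5)
skew, kurt = standardized(f4, f5, f6)
```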

4.3.7 Autocorrelation Transformation

In probability theory and statistics, the autocorrelation of a stochastic process is a number that represents the similarity between a time series and a lagged version of itself over successive time intervals. It is calculated in the same way as the correlation between two different time series, using the current and the past values of the same series. The result lies between -1 and 1. A positive autocorrelation means that above-average values tend to be followed by above-average values, and below-average values by below-average values.48 First, the EWMA autocovariance is calculated by the following formula
47 Breaking Down Finance. "Exponentially Moving Average Volatility (EWMA)". https://breakingdownfinance.com/finance-topics/risk-management/ewma/ (Retrieved 2019-05-03)
48 Kenton, Will. "Autocorrelation", 2019-03-31. https://www.investopedia.com/terms/a/autocorrelation.asp (Retrieved 2019-04-13)

$$F7_{i,t} = (1 - \alpha) F7_{i,t-1} + \alpha (F2_{i,t} - F3_{i,t})(F2_{i,t-1} - F3_{i,t-1}).$$

Normally, the autocovariance function of $X_t$ between times $t_1$ and $t_2$ is defined as

$$\gamma_X(t_1, t_2) = \mathrm{Cov}(X_{t_1}, X_{t_2}),$$

and the autocorrelation is defined as

$$\varphi_{X,X}(t_1, t_2) = \frac{\gamma_X(t_1, t_2)}{\sigma_{t_1} \sigma_{t_2}},$$

where $\sigma_t^2$ is the variance at time $t$.49 To obtain the EWMA autocorrelation between $t_1 = t$ and $t_2 = t - 1$, the autocovariance is replaced by the EWMA autocovariance and the standard variances by the corresponding EWMA variances, which gives

$$\text{EWMA autocorr}_t = \frac{F7_{i,t}}{\sqrt{F4_{i,t}} \sqrt{F4_{i,t-1}}}.$$
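A Python sketch of $F7$ and the EWMA autocorrelation for one series, with assumed zero starting values. The first step returns `None` because no lagged deviation exists yet. Because $F7$ and the two EWMA variances share the same exponential weights, the Cauchy-Schwarz inequality keeps the result within $[-1, 1]$ with these starting values.

```python
import math


def ewma_autocorrelation(f2, alpha, mean_init=0.0):
    """EWMA lag-one autocorrelation F7_t / (sqrt(F4_t) * sqrt(F4_{t-1})).

    F3, F4 and F7 follow the recursions in the text; F4 and F7 start at zero
    (an assumption). The first entry is None because no lagged deviation exists.
    """
    m = mean_init
    f4_prev = 0.0        # F4_{t-1}
    f7 = 0.0             # F7_{t-1}
    dev_prev = None      # F2_{t-1} - F3_{t-1}
    out = []
    for x in f2:
        m = (1.0 - alpha) * m + alpha * x                   # F3_t
        dev = x - m                                         # F2_t - F3_t
        f4 = (1.0 - alpha) * f4_prev + alpha * dev * dev    # F4_t
        if dev_prev is None:
            out.append(None)
        else:
            f7 = (1.0 - alpha) * f7 + alpha * dev * dev_prev   # F7_t
            denom = math.sqrt(f4) * math.sqrt(f4_prev)
            out.append(f7 / denom if denom > 0 else None)
        dev_prev, f4_prev = dev, f4
    return out


# Alternating changes should give a clearly negative autocorrelation.
ac = ewma_autocorrelation([1.0, -1.0] * 20, alpha=0.2)
```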

4.3.8 Correlation-EWMA Transformation

In probability theory, correlation measures the degree to which two time series move in relation to each other. As in the autocorrelation case, a positive correlation indicates that when one series moves up, the other tends to follow.50 Let $\{X_t, t = 0, 1, 2, \ldots\}$ be a time series representing one set of observed data, and $\{Y_t, t = 0, 1, 2, \ldots\}$ another time series representing another set of observed data.

To begin with, the EWMA covariance is calculated by the formula

$$F8_{i,j,t} = (1 - \alpha) F8_{i,j,t-1} + \alpha (F2_{i,t} - F3_{i,t})(F2_{j,t-1} - F3_{j,t-1})$$
49 Kulahci, Murat et al. "Time Series Analysis and Forecasting by Example", 2011, p. 62
50 Hayes, Adam. "Correlation", 2019-04-30. https://www.investopedia.com/terms/c/correlation.asp (Retrieved 2019-05-01)


where indices $i$ and $j$ correspond to $X_t$ and $Y_t$, respectively. In general, the covariance between two random variables $X$ and $Y$ is denoted $\mathrm{Cov}(X, Y)$, and the correlation between them is defined as

$$\varphi_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y},$$

where $\sigma_X^2$ is the variance of $X$ and $\sigma_Y^2$ is the variance of $Y$.51

Using the EWMA covariance and replacing the standard variances with the corresponding EWMA variances, the EWMA correlation is formulated as

$$\text{EWMA corr}_t = \frac{F8_{i,j,t}}{\sqrt{F4_{i,t}} \sqrt{F4_{j,t}}}.$$

51 Kulahci, Murat et al. "Time Series Analysis and Forecasting by Example", 2011, p. 62
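A Python sketch of $F8$ and the EWMA correlation for two series $i$ and $j$, with assumed zero starting values. The F8 formula above pairs the deviation of series $i$ at time $t$ with that of series $j$ at time $t-1$. The parameter `lag_j` (a name introduced here, not part of the original definitions) defaults to that one-step lag, and `lag_j=0` gives the contemporaneous variant that matches the general definition of correlation. Unlike the contemporaneous version, the lagged variant is not guaranteed to stay within $[-1, 1]$, because its denominator uses $F4_{j,t}$ rather than $F4_{j,t-1}$.

```python
import math
from collections import deque


def ewma_correlation(f2_i, f2_j, alpha, lag_j=1):
    """EWMA correlation F8_{i,j,t} / (sqrt(F4_{i,t}) * sqrt(F4_{j,t})).

    lag_j=1 follows the F8 formula as written (series j lagged one step);
    lag_j=0 pairs deviations at the same time t. lag_j is a hypothetical
    parameter introduced for this sketch. Starting values are zero (assumed).
    """
    if len(f2_i) != len(f2_j):
        raise ValueError("the two series must have the same length")
    m_i = m_j = 0.0      # F3 for series i and j
    v_i = v_j = 0.0      # F4 for series i and j
    f8 = 0.0
    past_dev_j = deque(maxlen=lag_j + 1)   # recent values of F2_j - F3_j
    out = []
    for x_i, x_j in zip(f2_i, f2_j):
        m_i = (1.0 - alpha) * m_i + alpha * x_i
        m_j = (1.0 - alpha) * m_j + alpha * x_j
        d_i, d_j = x_i - m_i, x_j - m_j
        v_i = (1.0 - alpha) * v_i + alpha * d_i * d_i
        v_j = (1.0 - alpha) * v_j + alpha * d_j * d_j
        past_dev_j.append(d_j)
        if len(past_dev_j) <= lag_j:
            out.append(None)               # not enough history for the lag yet
            continue
        f8 = (1.0 - alpha) * f8 + alpha * d_i * past_dev_j[0]   # F8_{i,j,t}
        denom = math.sqrt(v_i) * math.sqrt(v_j)
        out.append(f8 / denom if denom > 0 else None)
    return out


a = [1.0, -0.5, 2.0, 0.3, -1.2, 0.8]
same = ewma_correlation(a, a, alpha=0.3, lag_j=0)     # contemporaneous
lagged = ewma_correlation(a, a, alpha=0.3)            # as in the F8 formula
```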


5 Methodology

It was decided to limit the tools in this project to the programming language Python and the spreadsheet program Microsoft Excel. These tools were chosen because they handle time series data easily and can perform all the required hypothesis tests and transformations.52

5.1 Data Collection

The data were provided by SHB and consisted of prices of different securities and indices, recorded daily from 2001-01-01 to 2018-12-31 in order to capture real trends and seasonality of the time series. The data concerned US-related securities, such as US sector stock indices, US treasury bonds and exchange rates against the US dollar. Processing this type of data may lay the basis for SHB to use the data to predict future outcomes of the US stock market. For example, future values of US stock market indices such as the Dow Jones Index or the S&P 500 may potentially be forecast by a prediction model once the data have been preprocessed. This was an area of interest for SHB.

The data are considered quantitative, since they contain only numbers.53 Qualitative data were also used, in the form of discussions with professionals experienced in data preprocessing, for example on how to interpret results or to better understand the chosen data.

5.2 Data and Notations

This section presents the data and notation used in this thesis, with explanations.
52 Brownlee, Jason. "How to Check if Time Series Data is Stationary with Python", 2016-12-30. https://machinelearningmastery.com/time-series-data-stationary-python/ (Retrieved 2019-03-09)
53 "Collecting Data" http://betterthesis.dk/research-methods/lesson-1different-approaches-to-research/collecting-data (Retrieved 2019-02-09)

5.2.1 Exchange Rates (FX data)

An exchange rate shows the value of one currency unit relative to a unit of another currency in the foreign exchange market.54 In this report, a currency pair Currency1Currency2 represents the price in currency 2 of one unit of currency 1. The FX data consist of the currency pairs EURUSD, GBPUSD, AUDUSD, NZDUSD, USDCAD, USDCHF, USDJPY, USDNOK and USDSEK.
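Since the listed pairs quote USD sometimes as the first and sometimes as the second currency, their levels are not directly comparable. The following is a hypothetical Python helper, not part of the processing described in this thesis, that expresses each quote as USD per unit of the other currency under the convention above; the quote values in the example are made up.

```python
# Currency pairs used as FX data; "CCY1CCY2" is the price in CCY2 of one unit of CCY1.
PAIRS = ["EURUSD", "GBPUSD", "AUDUSD", "NZDUSD",
         "USDCAD", "USDCHF", "USDJPY", "USDNOK", "USDSEK"]


def usd_per_unit(pair, quote):
    """Hypothetical helper: express a quote as USD per one unit of the non-USD currency."""
    base, counter = pair[:3], pair[3:]
    if counter == "USD":
        return quote             # e.g. EURUSD is already USD per EUR
    if base == "USD":
        return 1.0 / quote       # e.g. USDSEK is SEK per USD, so invert it
    raise ValueError(f"{pair} does not contain USD")


# Made-up quotes for illustration only.
print(usd_per_unit("EURUSD", 1.25))   # USD per EUR
print(usd_per_unit("USDSEK", 10.0))   # USD per SEK
```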

5.2.2 US Sectors Data

The sector data used are indices, each describing the performance of a chosen sector in the United States. The indices are designed by Morgan Stanley Capital International (MSCI) and cover securities in the large and mid cap segment within the specific sector. MSCI is a provider of security indices and performance analytics.55 The classification of the securities follows the Global Industry Classification Standard (GICS®).56 The notations for the sectors are MXUS0EN (Energy), MXUS0MT (Materials), MXUS0IN (Industrials), MXUS0CD (Consumer Discretionary), MXUS0CS (Consumer Staples), MXUS0HC (Health Care), MXUS0FN (Financials), MXUS0IT (Information Technology), MXUS0TC (Telecom Services) and MXUS0UT (Utilities).

5.2.3 Country Stock Index Data

The country (and region) indices used are MXDE (Germany), MXEU (Europe), MXGB (United Kingdom), MXFR (France), MXCH (Switzerland), MXES (Spain), MXIT (Italy) and MXUS (the United States). Each index is developed
54 Investopedia, "Currency Pair Definition". https://www.investopedia.com/terms/c/currencypair.asp (Retrieved 2019-05-04)
55 "Index solutions". MSCI, https://www.msci.com/index-solutions (Retrieved 2019-05-18)
56 "MSCI USA MATERIALS INDEX". MSCI, 2019-04-30. https://www.msci.com/documents/10199/6ce4617e-9127-480f-8f3b-1fdf4c0c8962 (Retrieved 2019-05-03)
