Predicting the Amount of Professional Matches for Three Different Esports - A time series analysis


 Christopher Englesson and Ludvig Karlin

 Bachelor’s thesis in Statistics

 Advisor
 Lars Forsberg

 2021
Abstract
In this paper, we look at the compatibility of different forecasting methods applied to
time series data in esports, specifically three esports: League of Legends,
Counter-Strike: Global Offensive and Defense of the Ancients 2. The purpose of the study is to assess
whether forecasting the number of professional esport matches for the first three months of
2021 is possible and, if so, how accurately. The forecasting methods used in the report are
seasonal ARIMA (SARIMA), autoregressive neural networks (NNAR) and a seasonal naïve
model as a benchmark. The results show that, for the chosen methods, all three datasets
fulfilled the statistical requirements for producing forecasts and outperformed the
benchmark model, although with varying results. Of the three games, the one the study
was able to predict with the highest accuracy was CS:GO, where an NNAR model
achieved a mean absolute percentage error of 31%.

Keywords: Naïve, ARIMA, Neural Networks, seasonality, forecasting, esports.
Table of Contents
1. Introduction
 1.1 Problematization
 1.2 Purpose
 1.3 Research Questions

2. Theory
 2.1 Seasonal Naïve Model
 2.2 Autoregressive Integrated Moving Average
 2.2.1 Seasonal ARIMA
 2.2.2 Box-Jenkins Methodology
 2.3 Neural Networks
 2.4 Evaluation Measures
 2.4.1 Mean Error
 2.4.2 Root Mean Square Error
 2.4.3 Mean Absolute Error
 2.4.4 Mean Absolute Percentage Error
 2.4.5 Akaike Information Criterion

3. Data

4. Method
 4.1 Seasonal Naïve Model
 4.2 SARIMA
 4.3 Neural Network Autoregressive
 4.4 Model Evaluation

5. Results
 5.1 General Results
 5.2 LOL
 5.3 CS:GO
 5.4 Dota2
 5.5 Model Comparison

6. Analysis
 6.1 Seasonal Naïve Models
 6.2 SARIMA Models
 6.3 Neural Network Autoregressive Models

7. Conclusion

References

Appendix
 Appendix A
 Appendix B
 Appendix C
 Appendix D
1. Introduction
1.1 Problematization
In the early 1980s, the world was introduced to the first form of esports, when arcades
with a vast selection of games opened all around the world. These electronic devices
would become the start of a whole new genre of sport, so-called electronic sports (esports) (Lee
and Schoenstedt, 2011). Esports are, unlike traditional sports, a debated form of sport
(Hamari and Sjöblom 2017; Jonasson and Thiborg 2010; Pizzo et al. 2018), where the actual
exercise happens through electronic environments or in "virtual worlds". In practical terms,
esport is competitive gaming. Like traditional sports, the competition consists of
human-human interactions, although in esports these interactions are facilitated by some
electronic medium (Hamari and Sjöblom, 2017). This electronic medium can be anything from
a gaming console to a personal computer (Pizzo et al. 2018). Furthermore, esports consist of a
broad spectrum of different games and genres, considered different (e)sports, and do not
necessarily have to mimic traditional sports, even though some games do, like the soccer
game FIFA or the ice-hockey game NHL (Hamari and Sjöblom 2017; Pizzo et al. 2018).
Other games are closer to the perception of traditional gaming, like the first-person shooter
Counter-Strike: Global Offensive (CS:GO) or the online battle arena games League of
Legends (LOL) and Defense of the Ancients 2 (Dota2) (Hamari and Sjöblom 2017). Based
on this definition of esports, there are simultaneous similarities and differences to traditional
sports.

Esport shares, in terms of structure, many similarities with traditional sport, with players
competing for different teams, having managers and being subject to player transfers.
Additionally, various esports have recently been introducing multiple leagues, and some
colleges even offer esport scholarships (Pizzo et al. 2018). However, differences are more
apparent in how the two are exercised. Firstly, the physical movements required in esports
are limited to small muscle groups, with the focus being on fine motor skills. The second
aspect that differs between the two is the availability of the sport. Esports can only be
exercised with access to the right equipment as well as under the supervision of
institutions, unlike most traditional sports where anyone can play without permission from
institutions or access to expensive equipment (Jenny et al. 2017). Whether or not
esport is to be considered a sport will not be further explored in this paper; esport will
instead be evaluated as a phenomenon.

Esport has seen an enormous, and still rising, growth in popularity (Rosell Llorens, 2017). Among
the most popular esports during 2020 we find the games LOL, CS:GO and Dota2, in that order,
together exceeding a total of 1.1 billion hours watched (Borisov, 2021). As a result, the esport market
is expected to grow in size by about 32.5% by 2021 (Elasri-Ejjaberi,
Rodriguez-Rodriguez and Aparicio-Chueca, 2020) and to generate revenues near $2 billion
by the year 2022 (Reyes, 2021). Considering the large revenues esport generates, many
large-cap companies, such as Red Bull, Samsung, McDonald's, Toyota and Microsoft, have
shown increased interest in the industry in terms of sponsorships, thus exposing themselves to a
large new market (Elasri-Ejjaberi, Rodriguez-Rodriguez and Aparicio-Chueca 2020; Pizzo et
al. 2018). By extension, advertising and sponsorship account for 69% of the cash flowing into
the esport industry (Reyes 2021). Not only large-cap companies are drawn to the exploding
market that is esport, but also private investors and venture capitalists (Newman et al., 2020).
While esport, as an industry, is growing rapidly, the investments grow at an even quicker pace
(ibid.), meaning that the industry, as of now, attracts a variety of different stakeholders.

Reasonably, all the esport industry's stakeholders have an interest in predicting the future of
the industry, thus securing their own interests in the market. In statistics, forecasting methods
are often used for predicting the future, especially when considering time series data. Several
different methods are available; the most proven method for forecasting is the ARIMA
model, following the Box-Jenkins methodology (Zhang, 2003). Even though this model has
shown satisfactory results, new methods for forecasting are constantly emerging. One such
method is forecasting using machine learning, or more specifically neural networks
(Makridakis, Spiliotis and Assimakopoulos, 2018). While these methods seem to be proven
on time series, there is, to our knowledge, no forecasting research on time series data
trying to predict the growth of esport as an industry, nor the frequency of matches being
played. In addition to the fact that there is no relevant research on this subject,
comparing forecasting results can be challenging. A commonly used benchmark model is the
naïve model or the seasonal naïve model (Hyndman and Athanasopoulos, 2018; Makridakis,
Wheelwright and Hyndman, 1998).

1.2 Purpose
Arising from the rapid growth of the esport industry, its many stakeholders and the fact
that no relevant academic research has applied forecasting methods to the growth of esports,
this paper aims to predict how many professional matches will be played in the three
most popular esports during the first three months of 2021. Furthermore, we aim to achieve
this by using proven forecasting methods and comparing them to a benchmark model.

1.3 Research Questions
Can we predict how many professional matches will be played during the first three months
of 2021 for the three most popular esports and, if so, how accurately, by using ARIMA and
neural network models?

2. Theory
In this section, three different statistical approaches to forecasting will be explored, in the
following order: the seasonal naïve model, the autoregressive integrated moving average and
neural networks. Lastly, the error measures that facilitate evaluation of these forecasting
methods will be defined.

2.1 Seasonal Naïve Model
If one believes that the value today will be equal to the value yesterday, a naïve model may
be worth considering. Forecasting using this basic method means that the predicted value
equals the last observed value (Hyndman and Athanasopoulos 2018).

For seasonal data, there is the seasonal naïve method, which predicts each value to be
equal to the corresponding value in the previous season. In this case, predictions are generated
to be equal to the last observed value 52 weeks earlier. The forecast for time T + h can be written as:

 \hat{y}_{T+h|T} = y_{T+h-m(k+1)} , (5)

where m equals the seasonal period and k is the number of complete years in the forecast
period prior to time T + h (Hyndman and Athanasopoulos 2018). A naïve or seasonal naïve
model is often used as a benchmark for the error measures or accuracy of other, more complex,
models (Makridakis, Wheelwright and Hyndman, 1998).

2.2 Autoregressive Integrated Moving Average
In time series analysis, the autoregressive integrated moving average (ARIMA) model is a
generalization of the autoregressive moving average (ARMA) model and a common
forecasting method. An ARIMA model can be interpreted as three different parts, with the
first part referring to the autoregressive process. The autoregressive process states that the
output variable depends linearly on its own previous values as well as an error term that
represents what cannot be explained by the past values.

For this to be possible, we have to assume that the error terms are white noise,
\varepsilon_t \sim WN(0, \sigma^2), and that they are independent of the past values of y_t
throughout the entire time series. An autoregressive process of order p, where B is the
backshift operator, is commonly expressed as an AR(p) (Box, Jenkins and Reinsel 1994):

 (1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p) y_t = \varepsilon_t . (1)

The MA part of the ARIMA model refers to the moving average, which does not use the
past values of the variable but instead relies on the past error terms to make forecasts of
future values. The present value is found by adding weights to the previous error terms
(Box, Jenkins and Reinsel 1994). A moving average process of order q, or MA(q), is
commonly expressed as (Cryer and Chan, 2008):

 y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} . (2)

Lastly, we have the integration of the autoregressive and moving average processes with
some order of differencing, d. The integration means that the data values have been replaced
with the d-th difference between their values and previous values. To use an ARMA model,
the data need to be stationary, meaning that the properties of the series should not depend on
time. One way of achieving this is to use an integrated ARMA (ARIMA) model. The
ARIMA model is a common way to model non-stationary time series, where the d-th
difference, in general, generates a stationary ARMA process (Vandaele, 1983). We can
express a general ARIMA model as (Box, Jenkins and Reinsel, 1994):

 \phi_p(B) (1 - B)^d y_t = \theta_q(B) \varepsilon_t , (3)

where \phi_p(B) is the AR polynomial, \theta_q(B) is the MA polynomial and, again, B is the
backshift operator applied with the d-th difference (Box, Jenkins and Reinsel 1994).
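As a small illustration of the differencing step (a plain-Python sketch; the function name is ours):

```python
def difference(y, d=1):
    """Apply d-th order (non-seasonal) differencing: replace each value
    with its change from the previous value, repeated d times."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y

# A series with a linear trend becomes constant after one difference,
# and zero after two.
trend = [3, 5, 7, 9, 11]
print(difference(trend, d=1))  # [2, 2, 2, 2]
print(difference(trend, d=2))  # [0, 0, 0]
```

This is what the (1 - B)^d factor in equation (3) does to the series before the ARMA part is fitted.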

2.2.1 Seasonal ARIMA
A seasonal ARIMA (SARIMA) model is created by adding seasonal components to the
ARIMA process. The seasonal components are similar to the non-seasonal components but
backshift with regard to seasonal periods. The modelling procedure is very close to that of
ARIMA, but we also need to choose seasonal AR(P) and MA(Q) terms for the model. The
notation for a general SARIMA model can be expressed as SARIMA(p,d,q)(P,D,Q)_m, with m
being the seasonal frequency. We can express a general SARIMA model as (Box, Jenkins and
Reinsel, 1994):

 \Phi_P(B^m) \phi_p(B) (1 - B^m)^D (1 - B)^d y_t = \Theta_Q(B^m) \theta_q(B) \varepsilon_t , (4)

where \Phi_P(B^m) is the seasonal autoregressive term, \Theta_Q(B^m) the seasonal moving
average term and D the seasonal difference.

2.2.2 Box-Jenkins Methodology
A popular and proven modelling process is the Box-Jenkins method, described by Box,
Jenkins and Reinsel (1994) and applied by many (e.g. Cryer and Chan, 2008; Makridakis and
Hibon, 1997; Makridakis, Wheelwright and Hyndman, 1998; Vandaele, 1983). It is an
iterative modelling process used in most practical situations when all information about the
object for forecasting is not available nor comprehensible (Box, Jenkins and Reinsel, 1994).
The process follows iterative steps to prepare data, identify and evaluate models and lastly
produce forecasts with the chosen model. Summarized, the methodology follows these steps
(Makridakis, Wheelwright and Hyndman, 2018):

 1. Data preparation
 2. Model selection
 3. Model estimation
 4. Model diagnostics
 5. Forecasting

The first step concerns the initial raw data. Here the data are evaluated and should be
transformed or differenced in order to achieve a stationary time series. The second step
involves selecting a model based on examinations of the data's autocorrelation function
(ACF) and partial autocorrelation function (PACF). In the third step, identified models are
estimated and selected based on established criteria. Fourthly, the models are evaluated
through diagnostics, e.g. testing the residuals for autocorrelation. Only if the models pass the
fourth step can they proceed to the last step of forecasting. If a model does not pass the
diagnostics, one must go back to step two, where another model has to be specified that can
pass the diagnostics in step four (Makridakis, Wheelwright and Hyndman, 2018).

2.3 Neural Networks
Artificial neural networks are a machine learning method that allows complex nonlinear
relationships between the response variable and its predictors. The method is inspired by the
human brain and how its neurons are connected. Neural networks are commonly used for
classification purposes but have proven useful in other fields, such as forecasting, due to
their ability to capture non-linear relationships (Makridakis, Wheelwright and Hyndman,
1998).

Neural networks generally consist of three layers, with connections passing information from
one layer to another. Neural networks can feed information cyclically or in one direction; a
feed-forward neural network passes information along in one direction only (Hyndman and
Athanasopoulos, 2018). The first layer is called the input layer and consists of a number of
input values; these values enter nodes that transfer the information onto the next layer.
Each node in the input layer is connected to each node in the next layer, creating a
complex network. The nodes are connected through the layers using weights and biases,
which are obtained using a learning algorithm that minimises an error measure, such as the
mean square error (MSE). This means that the outputs of the nodes in one layer are the
inputs to the next. The intermediate layer is called the hidden layer. Like the input layer, the
hidden layer contains nodes, and it is what makes the neural network non-linear. A neural
network with no hidden layer is equivalent to ordinary linear regression (Hyndman and
Athanasopoulos, 2018).

The inputs to each node in the hidden layer are combined using a weighted linear
combination, which is then modified by a nonlinear function before being treated as output.
The input to node j in the hidden layer is calculated as

 z_j = b_j + \sum_{i=1}^{n} w_{i,j} x_i , (6)

where n is the number of nodes in the input layer, x_i are the input values and w_{i,j} and
b_j are the weights and bias for hidden node j. In the hidden layer, the output is modified
before being input to the next layer using a
nonlinear function, such as the sigmoid function. The general formula for the sigmoid
function is:

 s(z) = \frac{1}{1 + e^{-z}} . (7)

The sigmoid function tends to reduce the effect of extreme values, thus making the
method somewhat suitable for data containing outliers. The modified values become input
values to the next layer, which can be either more hidden layers or a final output layer.
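Equations (6) and (7) together describe one forward pass through a hidden layer. A minimal sketch in plain Python (the function names and weight values are ours, chosen purely for illustration, not trained):

```python
import math

def sigmoid(z):
    """Equation (7): squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, weights, biases):
    """Equation (6) followed by the sigmoid: each hidden node forms a
    weighted linear combination of the inputs plus its bias, then the
    result is passed through the nonlinear function.

    weights[j][i] connects input i to hidden node j."""
    return [sigmoid(b + sum(w_i * x_i for w_i, x_i in zip(w, x)))
            for w, b in zip(weights, biases)]

# Two inputs feeding two hidden nodes.
print(hidden_layer([1.0, 2.0],
                   weights=[[0.5, -0.25], [0.1, 0.1]],
                   biases=[0.0, 0.0]))  # first node: sigmoid(0) = 0.5
```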

To train the neural network, the weights start off by taking random values, which are later
updated using the observed data. Because of this, there is an element of randomness in the
neural network's predictions. Therefore, a common approach is to train the network several
times using random starting points and then take the average of the results. The model is
later tested against new data to give an idea of its accuracy (Hyndman and Athanasopoulos, 2018).

When using neural networks with time series data, the lagged values of the time series can be
used as input values in the first layer. This is called a neural network autoregressive (NNAR)
model. The model is similar to an AR model but uses the structure of a neural network.
The NNAR, similar to the SARIMA model, performs multi-step forecasting by taking
predicted values into account for further predictions (Hyndman and Athanasopoulos, 2018).
For this report, we use the notation NNAR(p,P,k)_m, where p is the number of lagged inputs,
P is the number of last observed values from the same season, k is the number of nodes in the
hidden layer and m is the seasonal frequency. The NNAR model does not require the data
to be stationary in order to be trained; however, transforming the data can sometimes help
improve model accuracy (Hyndman and Athanasopoulos, 2018).
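Constructing the lagged inputs for an NNAR model can be sketched as follows (plain Python; the function name is ours, and a real implementation such as R's nnetar() handles this internally):

```python
def lagged_inputs(y, p, P, m):
    """Build NNAR(p, P, k)_m training rows: for each target y[t], the
    inputs are the p most recent values and the P values observed at
    the same point in previous seasons (multiples of m steps back)."""
    start = max(p, P * m)  # first index where every required lag exists
    rows, targets = [], []
    for t in range(start, len(y)):
        recent = [y[t - i] for i in range(1, p + 1)]
        seasonal = [y[t - i * m] for i in range(1, P + 1)]
        rows.append(recent + seasonal)
        targets.append(y[t])
    return rows, targets

# NNAR(1, 1, k)_4 on a short toy series: each row holds the previous
# value and the value one season (4 steps) earlier.
y = [10, 20, 30, 40, 12, 22]
rows, targets = lagged_inputs(y, p=1, P=1, m=4)
print(rows)     # [[40, 10], [12, 20]]
print(targets)  # [12, 22]
```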

2.4 Evaluation measures

2.4.1 Mean Error
The mean error (ME) is one of the most basic error measures. It is calculated by
dividing the sum of the actual values minus the predicted values by the number of
predictions. Using the ME as an evaluation measure can be misleading, since negative and
positive errors can cancel each other out, thus displaying a good model when it is in fact
not (Hyndman and Athanasopoulos 2018). In this study, the ME is used as a quick
overview of under- or overestimation in the predictions rather than as a tool for forecasting
accuracy. The formula for ME is:

 \text{ME} = \frac{1}{n} \sum_{i=1}^{n} e_i , (8)

where e_i = y_i - \hat{y}_i, with y_i the actual value and \hat{y}_i the predicted value.

2.4.2 Root Mean Square Error
The root mean square error (RMSE) takes the square root of the MSE, meaning that the
measure takes the square root of the average squared difference between the predicted and
actual values. The value of the RMSE is hard to interpret by itself but can be a useful tool
when comparing multiple models applied to the same data set (Hyndman and Athanasopoulos
2018). RMSE is calculated as:

 \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} . (9)

2.4.3 Mean Absolute Error
The mean absolute error (MAE) is similar to the ME but measures the errors in absolute
values. This calculation provides a strictly positive output where negative and positive
errors do not cancel each other out, as is the case with the ME. This means that the MAE
gives a more accurate measure of how well the model fits the data, since all the absolute
errors are summed (Hyndman and Athanasopoulos 2018). The formula for MAE is:

 \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i| . (10)

2.4.4 Mean Absolute Percentage Error
The mean absolute percentage error (MAPE) is an error measure that can be used to compare
models between different data sets, given that it does not take the measuring unit into account
but instead outputs the errors as percentages. It is computed as:

 \text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{100\, e_i}{y_i} \right| , (11)

where e_i is derived as earlier. This lets us calculate the percentage error for a given time
point (Hyndman and Athanasopoulos, 2018).
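The four error measures in Sections 2.4.1 through 2.4.4 can be computed in a few lines (a plain-Python sketch; the function name and toy numbers are ours):

```python
def error_measures(actual, predicted):
    """ME, RMSE, MAE and MAPE as in equations (8)-(11),
    with e_i = actual_i - predicted_i."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    me = sum(errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(100 * e / a) for e, a in zip(errors, actual)) / n
    return me, rmse, mae, mape

# A -10 and a +10 error cancel in the ME but not in the MAE,
# illustrating why ME alone can be misleading.
me, rmse, mae, mape = error_measures([100, 200, 300], [110, 190, 300])
print(me)    # 0.0
print(mape)  # 5.0
```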

2.4.5 Akaike Information Criterion
The Akaike information criterion (AIC) is a method used for model selection (Akaike, 1974).
The main objective of the AIC is to estimate the relative loss of information for different
models. The criterion is designed so that models are easy to compare, by choosing the
model with the lowest AIC value. The AIC is defined as:

 \text{AIC} = -2 \ln(L) + 2k , (12)

where L is the maximized likelihood of the model and the term k takes the number of
parameters in the model into consideration. When using an ARIMA model, the term equals
k = p + q + 1, with the constant 1 referring to whether an intercept is included in the model.
If there is no intercept, the constant 1 is removed. The purpose of the term k is to penalize
the model for overfitting (Akaike, 1974).
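As a small numeric illustration of how the AIC trades fit against the penalty term (the log-likelihood values below are hypothetical):

```python
def aic(log_likelihood, k):
    """Equation (12): AIC = -2 ln(L) + 2k; lower is better."""
    return -2.0 * log_likelihood + 2 * k

# A second hypothetical model fits slightly better (higher log-likelihood)
# but pays for one extra parameter, so the simpler model is preferred.
aic_a = aic(log_likelihood=-120.0, k=2)  # e.g. ARIMA(1,0,1) without intercept
aic_b = aic(log_likelihood=-119.5, k=3)
print(aic_a, aic_b)  # 244.0 245.0
```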

3. Data
In this section the data used for the study will be presented as well as any transformations
made to it.

The data used in this study have been acquired through Abios Gaming, which is a world-leading
data supplier within the esport industry. Abios has since 2015 been collecting high-quality
data on a broad selection of esport genres using various data collection methods
(Abios, 2021). How Abios defines a "professional esport match" is, according to Franscensco
Katsoulakis¹, "In general, we collect data on all esports matches for the first three divisions in
each respective game", which describes how our data are collected. Thus, only matches played
in the top three divisions of the respective games will be used.

The three games used in the study are all played on a global level, meaning that the data
cover many different time zones. With this in mind, we assume that specific regional
deviations, e.g. holidays, that could affect the number of matches are not pronounced enough
to be taken into account when building the models.

The time series data used in this report span from January 2017 up to and including March
2021. The first four years, i.e. 2017 through 2020, are used to create the models and the first 12
weeks of 2021 are used to evaluate them. The data, which are collected on an hourly basis, have
been aggregated to a weekly basis, as hourly predictions would not only be difficult to
conduct but also irrelevant for stakeholders. The time period corresponds to 221 weeks with a
total of 142,381 covered matches in the three games LOL, CS:GO and Dota2. All years had
52 weeks except 2020, which had 53 weeks. Given this, weeks 52 and 53 in 2020 have been
merged to fit a 52-week yearly schedule. This has been done so that our seasonal models
operate accurately with the seasonal difference taken into account, since it could otherwise
severely harm the predictions (Hyndman and Athanasopoulos 2018). In addition, this data
manipulation did not render any extreme values, which indicates that no meaningful harm
was done to the data.
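The week 52/53 merge described above amounts to the following (a plain-Python sketch with hypothetical counts; the real data come from Abios):

```python
def merge_last_weeks(weekly_counts):
    """Fold a 53-week year into a 52-week schedule by summing the match
    counts of weeks 52 and 53, as done for 2020 in this study."""
    if len(weekly_counts) == 53:
        return weekly_counts[:51] + [weekly_counts[51] + weekly_counts[52]]
    return weekly_counts

# A hypothetical 53-week year: the last two weekly counts are combined.
year_2020 = [30] * 51 + [12, 8]
merged = merge_last_weeks(year_2020)
print(len(merged), merged[-1])  # 52 20
```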

The time it takes for one match to be played varies between the games. The game settings in
LOL and Dota2 are similar, although the average game length in professional play differs,
with Dota2 matches averaging 45 minutes (Hassall, 2020) and LOL matches lasting 32
minutes on average (Games of Legends Esport, 2021). Professional CS:GO matches have
been known to take 45 minutes on average (Scales, 2020).

¹ Franscensco Katsoulakis, Head of Data Quality, Abios Gaming AB, verbally, 10th of May 2021.

 Figure 3.1. Overview of Figures B.1, B.2 and B.3.

In Figure 3.1, which gives an overview of Figures B.1, B.2 and B.3 in the Appendix, we can
observe the number of matches played every week for the three games. The game with the
most professional matches played is CS:GO, with yearly peaks at around 500 matches.
Dota2 peaks higher than LOL during some weeks, but the LOL matches display a more
seasonal pattern. A reason for this might be that Riot Games, the creator of LOL, operates all
professional leagues and tournaments in the game (Rosell Llorens, 2017). The matches in
Dota2 do not seem as dependent on seasons, although we can see that a large number of
matches are played during some weeks.

4. Method
In this section each of the chosen theories' respective methodology will be described in detail.

4.1 Seasonal Naïve Model
By visually inspecting the time series plots (see Figure 3.1), some seasonality seems present.
Hence, we choose to apply a seasonal naïve model as our benchmark. The model is applied to
each data set using the snaive() function from the forecast package in R,
predicting that the value of the first week of January 2021 will equal the value of the
first week of January 2020, and so forth. The models are evaluated using the error measures
defined in Section 2.4.

4.2 SARIMA
The workflow applied when modelling the (S)ARIMA models is the Box-Jenkins
methodology. Hence, we follow the iterative steps defined in Section 2.2.2.

Having initially inspected the data, we proceed to formulate models in order to forecast the
first twelve weeks of 2021. For SARIMA modelling to make sense at all, the data need to
contain at least some correlation. By looking at correlograms, we can identify that all data
sets contain enough correlation to continue with our modelling and carry out the analysis.

As an initial step, a visual inspection of the time series plots in Figure 3.1 is done, where it is
hard to identify any clear trends in any of the data sets. Hence, we perform the Augmented
Dickey-Fuller (ADF) test to check the three data sets for a unit root. The hypotheses are
written as:

 H_0: a unit root is present (the series is non-stationary),
 H_1: no unit root is present (the series is stationary).

The data for LOL and CS:GO make us reject the null hypothesis, meaning that at the five
percent significance level there is no unit root. Thus, these two data sets are stationary and
ready to proceed in the modelling process. However, the test on the Dota2 data did not
make us reject the null hypothesis, suggesting that a unit root is present and that the data
are not stationary. This data set therefore needs transformation, or detrending, in order to
proceed in the modelling process. To detrend the data, the first difference is used, which
should make the data stationary. To verify that the first difference successfully detrended the
data set, the ADF test is repeated. Now we can reject the null hypothesis at the five percent
significance level and state that this time series is stationary, thus fulfilling the requirements
to proceed in the modelling process. All data sets now show correlation and stationarity, and
can hence proceed to the model selection step. For test output, see Table A.1 in the Appendix.

For model selection, the correlation structure of the data is used by reviewing its ACF and
PACF. We perform a visual inspection of the ACF and PACF for the respective data sets to
identify a (S)ARIMA model that could be a good fit. While this method does not necessarily
suggest that the chosen model is the "best" fit (Makridakis, Wheelwright and Hyndman,
1998), it does give some indication. For LOL and CS:GO, we identify that an ARIMA(1,0,0)
model seems to be a good fit, and we then test different models with varying seasonal
components for both data sets. The ACF and PACF for the differenced Dota2 data are not as
clear as for the other data sets. Different SARIMA models are chosen for testing, mainly
ARIMA(1,1,1) and ARIMA(2,1,1), both with various seasonal components. We then proceed
with the estimation and model diagnostics.

Mainly the error measures ME, RMSE, MAE and the AIC are used for model evaluation. All
these measures indicate, in varying manners, how large the error is for each model, and
consequently how far the predictions are from the actual values. The point is that low values
on the error measures indicate that the model's predictions lie closer to the actual values.
Firstly, we look at the in-sample errors for these measures, which give an indication of how
the model will perform when forecasting. The models with the lowest errors across the
different measures would then be chosen for forecasting. However, we can see that different
models possess the lowest values on different measures, see Table 5.1. Hence, all models
proceed to the forecasting step in order to evaluate which model renders the most accurate
predictions.

However, before the models are used for forecasting, each model's residuals need to be
tested for autocorrelation. This is done with the Ljung-Box test, in order to identify whether
the residuals are random, which would indicate that the model is an adequate fit given the
data. The test is performed with 20 lags and the hypotheses for the test are:

 H_0: the residuals are independently distributed (no autocorrelation),
 H_1: the residuals exhibit autocorrelation.

For test output, see Table 5.1 in the Results section.
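For reference, the Ljung-Box test statistic can be sketched in plain Python (the function name is ours; turning Q into a p-value additionally requires the chi-square distribution with h degrees of freedom, which is left to statistical software):

```python
def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k),
    where r_k is the lag-k sample autocorrelation of the residuals.
    Under H0 (no autocorrelation), Q is approximately chi-square
    distributed with h degrees of freedom."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[i] * dev[i - k] for i in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# A strongly alternating residual series is heavily autocorrelated,
# so Q lands far above any chi-square critical value for h = 5.
print(round(ljung_box_q([1, -1] * 25, h=5), 1))  # 244.4
```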

Lastly, the models that pass step four can proceed to forecasting. The in- and out-of-sample
errors are reviewed and compared to each game's respective seasonal naïve model to evaluate
which model produces the most accurate predictions. The out-of-sample MAPE for each
model is also reviewed, based on its interpretability, which simultaneously enables us to
compare models between data sets.

4.3 Neural Network Autoregressive
The function nnetar() in R, described by Hyndman and Athanasopoulos (2018), has been used
to estimate the models' parameters. The NNAR model for Dota2 has been constructed on the
first-differenced data set in order to achieve higher accuracy. The function uses the MSE as an
error measure and decides the optimal number of lags, p, according to the AIC for a linear
AR(p) model fitted to the seasonal data. The number of hidden nodes k is calculated as
k = (p + P + 1)/2 if not specified beforehand. In this study, the NNAR models have been
constructed using 20 networks fitted with random starting weights. These are later averaged
when producing forecasts for the first twelve weeks of 2021. The point forecasts are
compared to the actual data to calculate the out-of-sample evaluation error measures. Lastly,
the out-of-sample error measures, described earlier, are used to compare the rendered models
individually and against the seasonal naïve benchmark model.

4.4 Model Evaluation
The different models have been evaluated using the evaluation measures defined in
Section 2.4. A specific model is considered superior if the majority of its evaluation
measures are more accurate than those of another model.

5. Results
In this section the results are displayed and further described. Structure-wise, general
results are presented first, followed by results for LOL, CS:GO and Dota2 as well as a model
comparison.

5.1 General Results
As mentioned in the Method section, all data sets were tested for stationarity with the ADF
test; see Table A.1 in the Appendix for test output. LOL and CS:GO showed no sign of a unit
root being present at the five percent significance level, and are hence considered stationary.
However, when performing the test on Dota2 we could not reject the null hypothesis at the
five percent significance level. Thus, the data were transformed into first differences and
tested again. After this data manipulation we were able to reject the null hypothesis and thus
consider the data stationary. Given that all data sets are stationary, they could all proceed
into the modelling process, which is presented next in this section.

In Table 5.1, the results from the model testing are presented. Similarities between the LOL
and CS:GO data sets resulted in the same models being applied to both. Models for each
game are displayed together with their respective in- and out-of-sample error measures as
well as their Ljung-Box test p-values. Worth noting is that the models for Dota2 differ,
given the nature of the data, from the models for LOL and CS:GO. The lowest value
produced for each game's evaluation measures is in bold. Also, the models with the lowest
values produced overall for in- and out-of-sample are marked with blue and green
respectively.

Table 5.1: Results from model evaluation, forecasting and the Ljung-Box tests.

From Table 5.1 we can see the p-values generated by the Ljung-Box tests. Most of the
models' residuals show no sign of autocorrelation at the five percent significance level;
the exceptions are all the naïve models and the SARIMA(1,0,0)(0,1,0)52 applied to the
CS:GO data.
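The Q statistic behind these Ljung-Box p-values can be sketched in a few lines. This is an illustrative implementation of the statistic over lags 1..h; the residual series used is made up, and in the study the p-value would be obtained by comparing Q to a chi-square distribution.

```python
def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0

def ljung_box_q(residuals, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k); large Q signals autocorrelation."""
    n = len(residuals)
    return n * (n + 2) * sum(acf(residuals, k) ** 2 / (n - k) for k in range(1, h + 1))

# Strongly autocorrelated "residuals" (alternating signs) produce a large Q,
# which would be rejected as white noise at any reasonable significance level.
print(ljung_box_q([1, -1] * 20, 5))
```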

Furthermore, most of the in-sample evaluation measures for our models are lower than their
out-of-sample counterparts. The in-sample measures indicate how well a model fits the data,
but to ultimately evaluate the models, the out-of-sample measures are considered, since they
show how a model performs when exposed to new data.

5.2 LOL
Inspecting first the in-sample measures for the models applied to LOL, we can see that the
SARIMA(1,0,0)(0,1,1)52 generates the lowest values for all measures except ME, where the
NNAR(1, 1, 2)52 is lowest. However, observing the out-of-sample measures, the
SARIMA(1,0,0)(1,1,1)52 outperforms all the other models, even though it does so with lower
accuracy than it achieved in-sample.

5.3 CS:GO
Observing the model outputs for CS:GO, we again see that the SARIMA(1,0,0)(1,1,1)52 model
outperforms the other models in-sample across all measures except ME, where the
NNAR(1, 1, 2)52 achieves a lower value. Proceeding to the out-of-sample measures, the NNAR
outperforms the other models except the SARIMA(1,0,0)(1,0,0)52, which has a lower ME.

5.4 Dota2
Lastly, we consider the results for Dota2. Again, the models used for Dota2 differ from
those used for the other games. The NNAR(9, 1, 6)52 model, which also differs in structure
from the NNAR models for the other esports, outperformed the other models on the in-sample
measures, except for the AIC, which is not produced for the NNAR and for which the lowest
value was obtained by the SARIMA(1,1,1)(0,1,1)52. However, the NNAR performs worse on the
out-of-sample measures. The lowest out-of-sample RMSE and MAE were given by a
SARIMA(1,1,1)(0,0,1)52, and the lowest ME was produced by a SARIMA(2,1,1)(0,1,0)52 model.

5.5 Model Comparison
In Table 5.2 we summarize, for each game, the model with the lowest out-of-sample MAPE. The
highest MAPE, 77%, is achieved by the SARIMA model for LOL. The SARIMA model for the Dota2
data produces a MAPE of approximately 41%, and the NNAR model fitted to the CS:GO dataset
generates the lowest MAPE of roughly 31%.

 Table 5.2: Results from model evaluation based on MAPE.

6. Analysis
In this section the results are analysed with regard to our chosen methods. Each method is
analysed in the order it was introduced.

6.1 Seasonal Naïve Models
Our simplest model, the seasonal naïve model, performed poorly in comparison to the other
models. Based on the ME values, the seasonal naïve model showed large under- or
overestimations for all three games, with the exception of the in-sample measures for
CS:GO. All the Ljung-Box tests for the seasonal naïve models were significant, which is a
general warning sign for a poor prediction model (Hyndman and Athanasopoulos 2018). Even
though the test is of limited relevance for a naïve model, given that the model only uses
the last observed values for its predictions, it can still indicate poor predictive
ability. To illustrate, the out-of-sample ME for Dota2 was substantially larger in
magnitude than for the other Dota2 models, and negative, meaning that the seasonal naïve
model overestimated the amount of matches for the prediction period but underestimated it
for the training period. These results indicate that the seasonality in the datasets varies
over the years and that other tools might be necessary to capture the variation. One reason
may be the use of the weekly calendar, in which a given week of one year does not
correspond to the same dates the following year.
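The benchmark itself is simple to state in code: each forecast repeats the observation exactly one season earlier. The sketch below uses a toy period of 4 instead of the 52 weeks used in the study; the function and variable names are our own.

```python
def seasonal_naive_forecast(history, horizon, period=52):
    """Forecast each future step with the observation one full season earlier.

    Valid for horizon <= period, as in a single forecast season.
    """
    return [history[len(history) + h - period] for h in range(horizon)]

# Two toy "seasons" of length 4; the forecasts simply replay the last observed season,
# which is why shifting seasonality makes this benchmark miss badly.
history = [10, 20, 30, 40, 12, 22, 32, 42]
print(seasonal_naive_forecast(history, 3, period=4))  # [12, 22, 32]
```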

6.2 SARIMA Models
For the SARIMA models in LOL and CS:GO we could see that the SARIMA(1,0,0)(0,1,1)52
and SARIMA(1,0,0)(1,1,1)52 produced very similar results, and that in both games the model
with a seasonal AR term performed better on the test period than on the training period. The
differences in evaluation measures between the two models are small in both games, which
raises the question of whether the models are significantly different from each other.

The models fitted to the LOL dataset display relatively large differences between in- and
out-of-sample measures compared to CS:GO and Dota2, while the smallest differences are
found in the CS:GO models. For both LOL and CS:GO we can identify quite large positive mean
errors across all models. For Dota2 the opposite holds, with almost all models showing
negative and relatively small mean errors. Furthermore, some of the SARIMA models for Dota2
displayed out-of-sample values lower than the in-sample ones. In short, predicting the
amount of matches has proven difficult, with the in-sample evaluation measures clearly not
being a guarantee for accurate forecasts. This is also supported by the fact that, in every
game, the model with the lowest in-sample values was not the most accurate when applied to
new data.

6.3 Neural Network Autoregressive Models
Overall, the neural networks performed well on the training data, especially for the Dota2
dataset, where the NNAR is substantially better than the other models. However, the NNAR
models have difficulties on the test data, except for CS:GO, where the NNAR outperformed
all the other models. Figures C.1, C.2 and C.3 in the appendix show the NNAR models fitted
to the training datasets. Judging from the graphs, the NNAR models for LOL and Dota2 appear
to be overfitting, while the NNAR model for CS:GO does not. The evaluation measures in
Table 5.1 also confirm this, with small differences between the in- and out-of-sample
values for the CS:GO NNAR model. This can be explained by the estimation procedure, in
which the NNAR model selects the optimal number of lags p according to the AIC of a linear
AR(p) model fitted to the seasonal data. For CS:GO this process resulted in an
NNAR(1, 1, 2)52 model, which could not capture all the variation in the data and therefore
did not overfit.

7. Conclusion
So, are we able to predict how many matches will be played in these three esports during the
first three months of 2021, and if so, how well?

In conclusion, the data allowed time series modelling and analysis, meaning that we were
able to predict, with statistical support, how many matches would be played in the first
three months of 2021 for the three esports LOL, CS:GO and Dota2. In terms of accuracy, the
predictions were able to outperform the benchmark seasonal naïve models. Overall, the most
accurate model is the NNAR model for CS:GO, which achieved a MAPE of roughly 31%, compared
to the most accurate models for LOL and Dota2, which achieved MAPEs of 77% and 41%
respectively. However, the answer to how accurate our models are is ultimately arbitrary,
since it depends on how far the predicted values are allowed to deviate from the actual
values.

References
Abios Gaming AB. (2021). About. https://abiosgaming.com/about/ [retrieved 2021-05-15]

Akaike, H. (1974). A new look at the statistical model identification. IEEE transactions on
automatic control. 19(6): 716-723.

Borisov, A. (2021). Most popular esports games in 2020.
https://escharts.com/blog/most-popular-esports-games-2020 [retrieved 2021-05-19].

Box, G.E.P., Jenkins, G.M. & Reinsel, G.C. (1994). Time series analysis: forecasting and
control. 3rd ed. New Jersey: Prentice Hall.

Cryer, J.D. & Chan, K. (2008). Time series analysis: with applications in R. 2nd ed. New
York: Springer.

Elasri-Ejjaberi, A., Rodriguez-Rodriguez, S. & Aparicio-Chueca, P. (2020). Effect of eSport
sponsorship on brands: an empirical study applied to youth. Journal of Physical Education
and Sport. 20(2): 852-861

Games of Legends Esport. (2021). World Championship 2020: Overview.
https://gol.gg/tournament/tournament-stats/World%20Championship%202020/ [retrieved
2021-05-24]

Hamari, J. & Sjöblom, M. (2017). What is eSports and why do people watch it. Internet
research. 27(2): 211-232.

Hassall, M. (2020). The Longest Games in Dota2 History.
https://www.hotspawn.com/dota2/guides/the-longest-games-in-dota-2-history [retrieved
2021-05-24]

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd ed.
Melbourne: OTexts. OTexts.com/fpp2 [retrieved 2021-05-18]

Jenny, S.E., Manning, R.D., Keiper, M.C. & Olrich, T.W. (2017). Virtual(ly) Athletes:
Where eSports Fit Within the Definition of "Sport". Quest (National Association for
Kinesiology in Higher Education). 69(1): 1-18.

Jonasson, K. & Thiborg, J. (2010). Electronic sport and its impact on future sport. Sport in
society. 13(2): 287-299.

Katsoulakis, F. (2021). Head of Data Quality, Abios Gaming AB. Personal communication,
2021-05-10.

Lee, D. & Schoenstedt, L.J. (2011). Comparison of eSports and Traditional Sports
Consumption Motives. The ICHPER-SD Journal of Research in Health, Physical Education,
Recreation, Sport & Dance. 6(2): 39-44.

Makridakis, S. & Hibon, M. (1997). ARMA Models and the Box–Jenkins Methodology.
Journal of forecasting. 16(3): 147-163.

Makridakis, S., Spiliotis, E. & Assimakopoulos, V. (2018). Statistical and Machine Learning
forecasting methods: Concerns and ways forward. PloS one. 13(3): e0194889-e0194889.

Makridakis, S.G., Wheelwright, S.C. & Hyndman, R.J. (1998). Forecasting: methods and
applications. 3rd ed. New York: John Wiley & Sons.

Newman, J.I., Xue, H., Watanabe, N.M., Yan, G. & McLeod, C.M. (2020). Gaming Gone
Viral: An Analysis of the Emerging Esports Narrative Economy. Communication and sport.

Pizzo, A.D., Baker, B.J., Na, S., Lee, M.A., Kim, D. & Funk, D.C. (2018). eSport vs. Sport:
A Comparison of Spectator Motives. Sport marketing quarterly. 27(2): 108-123.

Reyes, M. S. (2021). The key industry players and trends growing the esports market which
is on track to surpass $1.5B by 2023. Business Insider.
https://www.businessinsider.com/esports-ecosystem-market-report [retrieved 2021-05-19]

Rosell Llorens, M. (2017). eSport Gaming: The Rise of a New Sports Practice. Sport, ethics
and philosophy. 11(4): 464-476.

Scales, K. (2020). A Beginner’s Guide To Esports: Counter-Strike: Global Offensive.
https://checkpointxp.com/2020/04/17/a-beginners-guide-to-esports-counter-strike-global-offe
nsive/ [retrieved 2021-05-24]

Vandaele, W. (1983). Applied time series and Box-Jenkins models. San Diego: Academic
Press.

Zhang, G.P. (2003). Time series forecasting using a hybrid ARIMA and neural network
model. Neurocomputing (Amsterdam). 50: 159-175.

Appendix

Appendix A
 Table A.1: ADF test output for all games with p-values

Appendix B

 Figure B.1: Time series plot for LOL

 Figure B.2: Time series plot for CS:GO

Figure B.3: Time series plot for Dota2

Appendix C

 Figure C.1: NNAR model over time series plot for LOL

 Figure C.2: NNAR model over time series plot for CS:GO

Figure C.3: NNAR model over first differenced time series plot for Dota2

Appendix D

 Figure D.1: Predicted values over actual values for LOL SARIMA(1,0,0)(0,1,1)52

 Figure D.2: Predicted values over actual values for CS:GO NNAR(1, 1, 2)52

Figure D.3: Predicted values over actual values for Dota2 SARIMA(1,1,1)(0,0,1)52
