BIGDATA PROCESSING USING MAPREDUCE FOREIGN EXCHANGE (EUR/USD CURRENCY PAIR)

BigData Processing Using MapReduce
     Foreign Exchange (EUR/USD Currency Pair)
                                                        Say Er Lim
                                          University Malaya, Selangor, Malaysia
                                                 say2@siswa.um.edu.my

                        Hui Kim Law, Saeed Aghabozorgi, Ying Wah Teh , and Tutut Herawan
                                     University Malaya, Selangor, Malaysia
           {stephyi_hk.qin@siswa.um.edu.my, saeed@um.edu.my, tehyw@um.edu.my, tutut@um.edu.my }

Abstract - This paper describes how to use Hadoop MapReduce to process big data. The big data used in this project is the foreign exchange rate of the EUR/USD currency pair, recorded minute by minute, day after day. First, the foreign exchange data are loaded, using Hadoop MapReduce, into a Linux environment simulated by Ubuntu, which has been set up on a desktop computer. After that, we extract the required data from Hadoop once it has been successfully loaded. Those data are then used to plot a time series and to predict the foreign exchange rate for the future (e.g. the next day).

Index Terms - foreign exchange rate, big data, Hadoop, MapReduce, prediction, moving average.

I. INTRODUCTION

Advances in technology and social networks have produced a large amount of data. The volume of data is increasing, and the data are becoming more complex, arriving at higher velocity and more varied in type. Big data sets may reach petabytes in size; they are collected from millions of people and consist of billions to trillions of records. Furthermore, big data comes from a variety of sources such as social media, the web, sales, customer information and others. Such large and complex data sets are difficult and slow to process with traditional data processing applications. The challenges for those applications include capturing, storing, transferring, processing and analyzing the data sets.

The big data used in this project is EUR/USD foreign exchange data. Foreign exchange is the conversion of one currency into another. The Cambridge Advanced Learner's Dictionary & Thesaurus defines foreign exchange as the system by which the type of money used in one country is exchanged for another country's money, making international trade easier. The foreign exchange market enables currency conversion to assist international trade and investment. The US dollar (USD), euro (EUR), Japanese yen (JPY), British pound (GBP) and Australian dollar (AUD) are the major currencies in the foreign exchange market. EUR/USD is a widely traded currency pair (Bekiros & Diks, 2008) [1].

The foreign exchange market is the largest asset class in the world, which leads to high liquidity; it is unique and its trading volume is huge. The market operates continuously, 24 hours per day. Thus, exchange rates are not constant: they may rise or fall every minute of every day. The foreign exchange rate is among the most important economic indices in the international monetary market.

In foreign exchange markets, there are normally two sets of price data: the bid price and the ask price. The ask price is the price at which a broker will sell you the position you require, while the bid price is the price at which a broker will buy your current day trading position from you. A broker uses the bid and ask prices to buy a current trading position or to sell a trading position to an intended buyer. In addition, a foreign exchange chart carries two further sets of data: the opening price and the closing price of each period. Many factors can affect the ask and bid prices in the foreign exchange market, such as market volatility, interest rate differentials, inflation differentials and others. Foreign exchange fluctuations may affect the profitability of an organization's business and expose the organization to exchange risk. Because the foreign exchange market trades every day, its data are large and fluctuate at a high rate. Therefore, these data need to be processed, stored and analyzed, and future rates predicted, in order to see the trend of the foreign exchange rate and to help buyers and sellers identify and make profitable trades.

In this paper, we explain the installation of Ubuntu and the configuration of Hadoop to store and retrieve data. We then explain the Moving Average approach, which is used to predict the foreign exchange rate.

The rest of this paper is organized as follows. Section II describes related work. The installation of Ubuntu and the configuration of Hadoop to simulate a Linux environment for processing big data are briefly discussed in Sections III and IV. In Section V, we outline the Moving Average algorithm that is applied to the foreign exchange time series data set and the system
architecture. The Graphical User Interface (GUI) of the user module is described in Section VI. In Section VII, conclusions and future perspectives are drawn.

II. RELATED WORKS

Meese and Rogoff showed that a naïve random walk benchmark model is better than conventional linear models at forecasting future exchange rates. Lye, Chan and Hooy employed artificial neural networks (ANNs) and an unconditional vector autoregressive (VAR) model to predict Yuan/USD exchange rates using monetary fundamentals. Their results show that ANNs perform better in market rate forecasts and are supported by monetary fundamentals [2]. Besides that, some researchers have used order flow in exchange rate prediction. They found that order flow can provide powerful information that allows the public to forecast the daily exchange rate. Mahdavi used the loss function approach of Bayesian statistics to forecast foreign exchange rates, proposing a loss function in the forecasting model; the Bayesian forecasts slightly outperformed the classical forecasts of foreign exchange [3]. In the paper "Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM", the authors studied the feasibility and effectiveness of a novel Grey model based on the Bernoulli differential equation for foreign exchange prediction. In the preliminary results of that paper, the Novel Nonlinear Grey Bernoulli Model (NGBM) improved on the precision of the traditional Grey forecasting model, and it was successfully applied to forecasting the annual foreign exchange rates of 13 countries in 2005 [4]. Furthermore, Grossmann and Simpson used a relative purchasing power parity (PPP) model based on the consumer price index (CPI) or the traded-goods price index (TPI), together with a linear forecasting technique, to determine Yen/US Dollar exchange rates over a short-term horizon. The TPI-based PPP model outperforms the pure random walk by a wider margin than the CPI-based PPP model [5]. However, the CPI-based PPP model also produced a lower forecast error than a random walk model. Other researchers studied an adaptive autoregressive moving average (ARMA) forecasting model trained with differential evolution (DE) and showed that the proposed ARMA-DE exchange rate prediction model has superior prediction potential over both short and long ranges compared with other models [6].

A. Forecasting Techniques

The neural network is one of the forecasting models used for the foreign exchange market. Yao and Tan state that neural network techniques are prime candidates for prediction in market environments with high volatility, complexity and noise (Yao & Tan, 2000). Neural network models can use fundamental and technical indicators as inputs to simulate fundamental and technical analysis, and can also decrease prediction risks [7]. In addition, an adaptive fuzzy network with a parallel genetic algorithm is also a good choice for predicting foreign exchange rates. A fuzzy inference system has the ability to approximate any non-linear mapping (Kosko, 1993). The genetic algorithm and the adaptive fuzzy network system optimize the network to approximate the mapping. The AutoRegressive Integrated Moving-Average (ARIMA) model is another foreign exchange forecasting model used by many researchers. ARIMA models are often referred to as Box-Jenkins models, since they were first popularized by Box and Jenkins. An ARIMA model combines a series' own past values, past errors, and the current and past values of other time series to predict a value in the time series. ARIMA modeling consists of three stages: identification, estimation and diagnostic checking, and forecasting.

III. HADOOP MAPREDUCE

MapReduce is a computing model used for efficiently processing large data sets distributed over a cluster of computers. Hadoop, in turn, is an open-source Java programming framework; it implements the MapReduce computational paradigm for processing large data sets in a distributed computing environment. MapReduce is a programming model and software framework proposed by Google (Dean & Ghemawat, 2008) [8]. Hadoop MapReduce is inspired by Google's MapReduce, introduced in 2004, in which an application is broken down into numerous small parts. Hadoop MapReduce is a popular big data processing engine dedicated to scalable, distributed, data-intensive computing. A MapReduce program in Hadoop consists of two separate user-defined functions, map and reduce. First, the data set is split into smaller chunks, which are distributed as input to the map process. The map process breaks the individual elements down into tuples (key/value pairs). After that, the Hadoop MapReduce framework sorts the outputs of the maps, which then become the input to the reduce process. The reduce job combines those data tuples into a smaller set of tuples to form the output.

IV. INSTALLATION OF UBUNTU

Before the foreign exchange data can be stored and processed, Hadoop MapReduce must be installed and configured on a personal computer (stand-alone system). The literature (Daneshyar & Patel, 2012) indicates that Hadoop MapReduce is better suited to installation in a Linux environment than in a Windows environment, because the Windows environment had problems connecting to the distributed cluster [9]. By default the personal computer uses the Windows environment, so it is highly recommended to install the Ubuntu operating system on the personal laptop in order to run Hadoop MapReduce. Ubuntu is a complete desktop Linux-based operating system that allows Linux applications to be compiled
and run on a Windows operating system securely (files and data stay protected), and it loads quickly on any computer. Installing Ubuntu enables Hadoop MapReduce to run on the Windows laptop on top of Ubuntu. After Ubuntu is installed, Hadoop MapReduce must be configured from the command line before it can be used. Then the foreign exchange rates for the EUR/USD currency pair can be loaded into Hadoop MapReduce, and the user writes Java code to extract the desired data, such as the date, time and closing ask of the EUR/USD foreign exchange rate, as the output.

V. MOVING AVERAGE TECHNIQUES

Time series data are ordered by time. The exchange rate is time series data, collected at specific points in time. The quantity being measured (the exchange rate) is referred to as the variable. Commonly, time series data are observed at annual, quarterly, monthly, weekly or daily frequency. In this project, we observed the exchange rate on a daily basis. Time series analysis comprises methods for analyzing time series data in order to extract useful and meaningful statistics and other characteristics of the data. The techniques of time series analysis may be parametric or non-parametric. Time series prediction is the use of a model to predict future values based on previously observed values. There exist many time series prediction techniques that use previously observed values as the basis for estimating future outcomes, such as the moving average, weighted moving average, exponential smoothing, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), linear prediction, trend estimation, growth curves and other techniques.

In this paper, the Moving Average technique is used to analyze the data and perform prediction. The output extracted from Hadoop MapReduce is passed to the Moving Average model, which performs a series of calculations on the closing ask of the foreign exchange rate in order to predict the future exchange rate. The moving average is also called the rolling average or running average in statistics. It is a simple and common technique used with time series data to analyze a set of data points; it can smooth out fluctuations and highlight longer-term trends. The moving average is often used in technical analysis of financial data such as stock prices, exchange rates or trading volumes, and can also be used in economics to examine microeconomic time series. Moreover, the moving average is one of the most widely used indicators in the Foreign Exchange Market (FOREX). The moving average formula is applied to predict the foreign exchange rate after the necessary data have been identified and extracted from Hadoop MapReduce.

The following example illustrates Moving Average modeling and prediction using a simulated data set containing time series data. The Moving Average model was chosen for big data analytics and prediction of the foreign exchange rate because the EUR/USD data analyzed form a per-minute time series within one day, and the focus is only on the closing ask. The closing ask is the focus because it best reflects the actual rate of the day and it carries over to the next day's opening ask; furthermore, people mostly use this ask rate to buy the current trading position from a broker or to change another country's currency. In addition, the moving average suits analysis and prediction of the foreign exchange rate because it relies on previously observed exchange rates to perform further forecasting.

Essentially, the analysis performed by Moving Average modeling is divided into two stages, "Identification" and "Prediction", which are summarized below.

A. Identification Stage

The first step in the identification stage is to specify the input data set, which is the foreign exchange rate of the EUR/USD currency pair. An identify statement is then used to read the EUR/USD foreign exchange rate data. After that, the wanted parameters are extracted from Hadoop MapReduce as output and used to plot a time series graph for the date entered by the user (as input). Table 1 shows an example of the EUR/USD foreign exchange rate data set, and the plotted time series of the EUR/USD foreign exchange rate is shown in Fig. 1. The system architecture is shown in Fig. 2.

TABLE 1. EUR/USD Foreign Exchange Rate Data Sets

   Date          Time        EUR/USD (Close, Ask)
   12-09-2012    00:09:00    1.28617
   12-09-2012    00:08:00    1.28617
   12-09-2012    00:07:00    1.28620
   12-09-2012    00:06:00    1.28618
   12-09-2012    00:05:00    1.28616
   12-09-2012    00:04:00    1.28627
   12-09-2012    00:03:00    1.28622
   12-09-2012    00:02:00    1.28625
   12-09-2012    00:01:00    1.28625
   12-09-2012    00:00:00    1.28625
   11-09-2012    23:59:00    1.28620
   11-09-2012    23:58:00    1.28616
   11-09-2012    23:57:00    1.28615
   11-09-2012    23:56:00    1.28632
   11-09-2012    23:55:00    1.28607
   11-09-2012    23:54:00    1.28611
   11-09-2012    23:53:00    1.28604
   11-09-2012    23:52:00    1.28602
   11-09-2012    23:51:00    1.28625
   11-09-2012    23:50:00    1.28619
   11-09-2012    23:49:00    1.28624
   11-09-2012    23:48:00    1.28625
   11-09-2012    23:47:00    1.28626
   11-09-2012    23:46:00    1.28621
   11-09-2012    23:45:00    1.28624
   11-09-2012    23:44:00    1.28622
   11-09-2012    23:43:00    1.28613
   11-09-2012    23:42:00    1.28605
   11-09-2012    23:41:00    1.28622
   11-09-2012    23:40:00    1.28633

Figure 1. Time Series of EUR/USD Foreign Exchange Rate (from Sept 11, 2012 to Sept 12, 2012).

Figure 2. System Architecture

B. Prediction Stage

Once the outputs have been extracted and the time series plotted, the next step is to use a formula to predict the future exchange rate. For example, if the exchange rates are $R_t, R_{t-1}, R_{t-2}, \dots, R_{t-(N-1)}$ for N periods, then the formula is:

\[ R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \dots + R_{t-(N-1)}}{N} \]

where $R_{t+1}$ = Predicted Closing Ask Rate for Period t+1
      $R_{t-1}$ = Closing Ask Rate for Period t-1
      $N$ = Number of Periods in the Moving Average

So, for example, a ten-period moving average would be:

\[ R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \dots + R_{t-9}}{10} \]

\[ R_{11} = \frac{1.28629 + 1.28615 + 1.28610 + 1.28611 + 1.28610 + 1.28610 + 1.28608 + 1.28609 + 1.28626 + 1.28609}{10} = 1.28614 \]

VI. GRAPHICAL USER INTERFACE (GUI)

The user module used in this paper is a Java Graphical User Interface (GUI). This module provides an interface through which the user selects the date of the exchange rate graph they wish to view, and it then predicts the next closing ask exchange rate accordingly. The GUI is shown in Fig. 3, Fig. 4 and Fig. 5.

Figure 3. The user interface of the EUR/USD Currency Prediction System

Figure 4. The user interface that lets the user make a selection based on their desired date
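
As a rough illustration of the map, sort and reduce steps described in Section III, applied to the extraction of date, time and closing ask, the following Python sketch simulates the three phases in memory. It is not the authors' Java/Hadoop code and does not use Hadoop at all. The raw record layout, the function names (`map_record`, `reduce_by_date`, `run_job`) and the open/high/low filler values are assumptions made for illustration; only the closing-ask values are taken from Table 1.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw record layout (an assumption, not given in the paper):
# "DD-MM-YYYY,HH:MM:SS,open_ask,high_ask,low_ask,close_ask"
# The close_ask values match Table 1; the other prices are made-up filler.
RAW_RECORDS = [
    "12-09-2012,00:01:00,1.28620,1.28630,1.28615,1.28625",
    "11-09-2012,23:59:00,1.28616,1.28622,1.28610,1.28620",
    "12-09-2012,00:00:00,1.28621,1.28628,1.28619,1.28625",
    "11-09-2012,23:58:00,1.28615,1.28618,1.28612,1.28616",
]


def map_record(line):
    """Map step: turn one input record into a (key, value) pair."""
    date, time, _open, _high, _low, close = line.split(",")
    return date, (time, float(close))


def reduce_by_date(date, values):
    """Reduce step: combine one date's pairs into a single time-ordered tuple."""
    return date, sorted(values)


def run_job(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    mapped = [map_record(line) for line in lines]   # map
    mapped.sort(key=itemgetter(0))                  # sort by key (date string)
    return [
        reduce_by_date(date, [value for _, value in group])   # reduce
        for date, group in groupby(mapped, key=itemgetter(0))
    ]


result = run_job(RAW_RECORDS)
for date, series in result:
    print(date, series)
```

Keys are sorted as plain strings here, which is enough for grouping but is not chronological for DD-MM-YYYY dates in general; a real job would likely parse the date before sorting.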

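
The N-period moving average of Section V-B can be checked numerically with a short Python sketch. The function name `predict_next` is hypothetical, and the paper's own implementation is in Java; the ten closing-ask values are those of the worked example for R11. Their chronological order is not stated in the paper, which does not matter for a simple (unweighted) average.

```python
def predict_next(rates, n):
    """Return the n-period simple moving average as the forecast R_{t+1}.

    `rates` is assumed to be ordered oldest first, so the last n entries
    are R_{t-(n-1)}, ..., R_t.
    """
    if n <= 0 or len(rates) < n:
        raise ValueError("need n > 0 and at least n observations")
    window = rates[-n:]
    return sum(window) / n


# The ten closing-ask values used in the paper's R11 example.
closing_asks = [1.28629, 1.28615, 1.28610, 1.28611, 1.28610,
                1.28610, 1.28608, 1.28609, 1.28626, 1.28609]

r11 = predict_next(closing_asks, 10)
print(f"{r11:.5f}")  # prints 1.28614
```

The unrounded average is 1.286137, which rounds to the 1.28614 reported in the worked example.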
Figure 5. Time Series of EUR/USD Foreign Exchange Rate generated based on the user selection.

VII. CONCLUSION

In this paper, we have proposed using Hadoop MapReduce for processing foreign exchange data. The programming language used in the user module is Java. A simple and clear technique (Moving Average) is used to forecast the exchange rate for the EUR/USD currency pair. In addition, we found that Hadoop MapReduce is suitable for processing a variety of big data sets; it can reduce processing time and produce accurate output quickly. Using another algorithm to predict the exchange rate, and processing the big data in the least time required, are opportunities for further work.

VIII. ACKNOWLEDGMENT

The authors would like to thank the reviewers for their comments on earlier versions of this paper. This research is funded by University of Malaya Research Grant (UM.C/625/1/HIR/MOHE/SC/13/2).

REFERENCES

[1]  Bekiros, S. D., & Diks, C. G. (2008). The nonlinear dynamic relationship of exchange rates: Parametric and nonparametric causality testing. Journal of Macroeconomics, 30(4), 1641-1650.
[2]  Lye, C. T., Chan, T. H., & Hooy, C. W. (2011). Forecasting Chinese Foreign Exchange with Monetary Fundamentals using Artificial Neural Networks. In 3rd Int Conf Inf Finance Eng (Vol. 12, pp. 560-564).
[3]  Mahdavi, M. (1997). A Bayesian approach to foreign exchange forecasting. Global Finance Journal, 8(1), 15-31.
[4]  Chen, C. I., Chen, H. L., & Chen, S. P. (2008). Forecasting of foreign exchange rates of Taiwan's major trading partners by novel nonlinear Grey Bernoulli model NGBM (1, 1). Communications in Nonlinear Science and Numerical Simulation, 13(6), 1194-1204.
[5]  Grossmann, A., & Simpson, M. W. (2010). Forecasting the Yen/US Dollar exchange rate: Empirical evidence from a capital enhanced relative PPP-based model. Journal of Asian Economics, 21(5), 476-484.
[6]  Rout, M., Majhi, B., Majhi, R., & Panda, G. (2013). Forecasting of currency exchange rates using an adaptive ARMA model with differential evolution based training. Journal of King Saud University-Computer and Information Sciences.
[7]  Yao, J., & Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34(1), 79-98.
[8]  Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, Vol. 51, No, 107-113.
[9]  Daneshyar, S., & Patel, A. (2012). Evaluation of Data Processing Using MapReduce Framework in Cloud and Stand-Alone Computing. International Journal, 3.
[10] Muhammad, A., & King, G. A. (1997, March). Foreign exchange market forecasting using evolutionary fuzzy networks. In Computational Intelligence for Financial Engineering (CIFEr), 1997., Proceedings of the IEEE/IAFE 1997 (pp. 213-219). IEEE.
[11] Iokibe, T., Murata, S., & Koyama, M. (1995, October). Prediction of foreign exchange rate by local fuzzy reconstruction method. In Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century., IEEE International Conference on (Vol. 5, pp. 4051-4054). IEEE.
[12] Gutjahr, S., Riedmiller, M., & Klingemann, J. (1997). Daily prediction of the foreign exchange rate between the us dollar and the german mark using neural networks. Proc. of SPICES, 492-498.
[13] Dittrich, J., & Quiané-Ruiz, J. A. (2012). Efficient big data processing in Hadoop MapReduce. Proceedings of the VLDB Endowment, 5(12), 2014-2015.
[14] Narayan, S., Bailey, S., & Daga, A. (2012, November). Hadoop Acceleration in an OpenFlow-based cluster. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: (pp. 535-538). IEEE.
[15] Schultz, J., Vierya, J., & Lu, E. (2012, November). Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: (pp. 1457-1458). IEEE.
[16] Saeed Reza Aghabozorgi and Teh Ying Wah. "Shape-based Clustering of Time Series Data", Journal of Intelligent Data Analysis 18(5) (ISI/SCOPUS Cited Publication / Accepted).
[17] Saeed Reza Aghabozorgi and Teh Ying Wah. "Incremental Clustering of Time Series Data by Fuzzy Clustering", Journal of Information Science and Engineering 28(4), 671-688 (ISI/SCOPUS Cited Publication / Published).
[18] Saeed Reza Aghabozorgi and Teh Ying Wah. "Stock Market Co-movement Assessment using a Three-Phase Clustering Method", Expert Systems With Applications, DOI:10.1016/j.eswa.2013.08.028 (ISI/SCOPUS Cited Publication).
[19] Saeed Reza Aghabozorgi, Teh Ying Wah, Amineh Amini, and Mahmoud Reza Saybani. "A New Approach to Present Prototypes in Clustering of Time Series", in Proceedings of The 7th International Conference of Data Mining, Las Vegas, USA, July 2011, pp. 214-220.

Say Er Lim was born in Muar, Johor, Malaysia, on 27 July 1990. She gained her bachelor of Information Technology (IT), majoring in management, at the University of Malaya, Malaysia (2010-2014).
Saeed Aghabozorgi received his B.Sc. in Computer Engineering and Software Discipline from the University of Isfahan, Iran, in 2002. He received his M.Sc. from Islamic Azad University, Iran, in 2005, and his Ph.D. from the University of Malaya in 2013. Currently, he is a lecturer at the Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. His current research area is data mining.

Law Hui Kim was born in the city of Malacca, Malaysia, on July 26, 1990. She gained her bachelor of Information Technology (IT), majoring in the management field, at the University of Malaya (UM), Kuala Lumpur, Malaysia (2010-2014).

Ying-Wah Teh received his B.Sc. and M.Sc. from Oklahoma City University and his Ph.D. from the University of Malaya. He is currently an Associate Professor at the Information Science Department, Faculty of Computer Science and Information Technology, University of Malaya. His research interests include data mining, text mining, document mining, cloud computing and big data.

Tutut Herawan received his PhD degree in computer science in 2010 from Universiti Tun Hussein Onn Malaysia. He is currently a senior lecturer at the Department of Information System, University of Malaya. His research areas include rough and soft set theory, DMKDD, and decision support in information systems. He is an editorial board member and acts as a reviewer for various journals. He has also served as a program committee member and co-organizer for numerous international conferences/workshops.