ANN-Based Forecasting of Foreign Currency Exchange Rates

Neural Information Processing - Letters and Reviews                                        Vol. 3, No. 2, May 2004

  LETTER

           ANN-Based Forecasting of Foreign Currency Exchange Rates
                                             Joarder Kamruzzaman
                           Gippsland School of Computing and Information Technology
                             Monash University, Churchill, Victoria-3842, Australia
                             E-mail: Joarder.Kamruzzaman@infotech.monash.edu.au

                                                Ruhul A. Sarker
                          School of Information Technology and Electrical Engineering
                 University of NSW, ADFA Campus, Northcott Drive, Canberra 2600, Australia
                                          E-mail: ruhul@cs.adfa.edu.au

                                           (Submitted on March 24, 2004)

           Abstract - In this paper, we investigate artificial neural network (ANN) based prediction
           modeling of foreign currency rates using three learning algorithms, namely Standard
           Backpropagation (SBP), Scaled Conjugate Gradient (SCG) and Backpropagation with
           Bayesian Regularization (BPR). The models were trained on historical data, using five
           technical indicators, to predict six currency rates against the Australian dollar. The
           forecasting performance of the models was evaluated and compared using a number of
           widely used statistical metrics. The results show that close predictions can be made using
           simple technical indicators, without extensive knowledge of market data. Among the three
           models, the SCG-based model outperforms the others on two commonly used metrics and
           attains results comparable to the BPR-based model on the other three. The effect of network
           architecture on the performance of the forecasting model is also presented, and future
           research directions for further improving the model are discussed.

           Keywords - Neural network, ARIMA, financial forecasting, foreign exchange

                                                 1. Introduction
      In the past, foreign exchange rates were determined only by the balance of payments. The balance of
payments was merely a way of listing a country's receipts and payments in international transactions.
Payments involve a supply of the domestic currency and a demand for foreign currencies; receipts involve a
demand for the domestic currency and a supply of foreign currencies. The balance was determined mainly by the
import and export of goods, so predicting exchange rates was not a complex task. Later, however, interest
rates and local and international supply-demand factors became increasingly relevant to each currency. On top
of this, fixed foreign exchange rates were abandoned and a floating exchange rate system was adopted by the
industrialized nations in 1973. More recently, further liberalization of trade has been discussed under the
General Agreement on Tariffs and Trade (GATT).
      Due to the introduction of floating exchange rates and the rapid expansion of global trading markets over
the last few decades, the foreign currency exchange market has experienced unprecedented growth. In fact, the
exchange rates play an important role in controlling dynamics of the exchange market as well as the trading of
goods in the import-export markets. For example, if the Australian currency is weaker than the US currency, US
traders would prefer to import certain Australian goods, and Australian producers and traders would find the
US an attractive export market. On the other hand, if Australia depends on the US for certain imported goods,
those goods become too costly for Australian consumers under such exchange rates, and Australia may look for a
better source, shifting its imports from the US to a new market. As these examples suggest, trade relations and
the cost of exported and imported goods depend directly on the exchange rates of the trading partners. An
international trade agreement between two partners may last anywhere from a few months to many years, while
exchange rates vary continuously during trading hours. As a result, accurate prediction of exchange rates is a
crucial factor for the success of many businesses and
financial institutions. Although the market is well known for its unpredictability and volatility, a number of
interested groups attempt to predict exchange rates using both conventional and modern techniques.
      Exchange rate prediction is one of the most challenging applications of modern time series forecasting. The
rates are inherently noisy, non-stationary and chaotic [5, 27]. These characteristics suggest that no complete
information can be extracted from the past behaviour of such markets to fully capture the dependency between
future exchange rates and past ones. One general assumption made in such cases is that the historical data
incorporate all of that behaviour, so the historical data become the major input to the prediction process.
Although well-known conventional forecasting techniques provide predictions of acceptable quality for many
stable systems, they seem inappropriate for non-stationary and chaotic systems such as currency exchange rates,
interest rates and share prices. The purpose of this paper is to investigate artificial neural network based
techniques for predicting foreign currency exchange rates. We experiment with three different algorithms, under
different architectures, on several exchange rates.
      For more than two decades, Box and Jenkins’ Auto-Regressive Integrated Moving Average (ARIMA)
technique [2] has been widely used for time series forecasting. Because of its popularity, the ARIMA model has
been used as a benchmark to evaluate many new modelling approaches [9]. However, ARIMA is a general
univariate model, developed under the assumption that the time series being forecast is linear and stationary
[3]. As indicated earlier, ARIMA is therefore not the right technique for predicting exchange rates.
      Artificial neural networks (ANNs), well-known function approximators for prediction and system
modelling, have recently shown great applicability in time-series analysis and forecasting [25-28]. ANNs
support multivariate analysis. Multivariate models can draw on greater information, where not only the lagged
time series being forecast but also other indicators (such as technical, fundamental and inter-market
indicators for financial markets) are combined to act as predictors. In addition, ANNs are more effective in
describing the dynamics of non-stationary time series because of their non-parametric, assumption-free,
noise-tolerant and adaptive properties. ANNs are universal function approximators that can map any nonlinear
function without a priori assumptions about the data [3].
      In several applications, Tang and Fishwick [22], Jhee and Lee [10], Kamruzzaman and Sarker [12-13],
Wang and Leu [23], Hill et al. [8], and many other researchers have shown that ANNs perform better than
ARIMA models, particularly for more irregular series and for multiple-period-ahead forecasting. Kaastra and
Boyd [11] provided a general introduction to how a neural network model should be developed for modelling
financial and economic time series, presenting many useful practical considerations. Zhang and Hu [28]
analysed the ability of backpropagation neural networks to forecast an exchange rate. Wang [24] cautioned
against the dangers of one-shot analysis, since the inherent nature of the data may vary. Klein and Rossin
[14] showed that data quality also affects the predictive accuracy of a model. More recently, Yao et al.
[25] evaluated the capability of a backpropagation neural network as an option price forecasting tool. They
also recognised that neural network models are context sensitive, and that studies of this type should be as
comprehensive as possible, covering different markets and different neural network models.
       In this paper, we apply ANNs to predict the exchange rates of the Australian dollar against six other
currencies, namely the US Dollar (USD), Great British Pound (GBP), Japanese Yen (JPY), Singapore Dollar (SGD),
New Zealand Dollar (NZD) and Swiss Franc (CHF), using their historical exchange rates. Three ANN-based models,
built with the existing standard backpropagation, scaled conjugate gradient and Bayesian regularization
learning algorithms, were considered. For each of the six currency rates, a total of 500 historical exchange
rates (the closing rate of each week) were collected and used to build the prediction models, and an
additional 65 rates were used to evaluate the models. The prediction results of these models, over 35 and 65
weeks, were compared on five different evaluation criteria: Normalized Mean Square Error (NMSE), Mean Absolute
Error (MAE), Directional Symmetry (DS), Correct Up trend (CU) and Correct Down trend (CD). The results show
that the scaled conjugate gradient and Bayesian regularization based models are competitive with each other
and forecast more accurately than standard backpropagation, which has been studied extensively elsewhere.
      In Section 2, the ANN forecasting model and performance metrics are defined. Sections 3 and 4 present the
experimental results and conclusions, respectively.

                                 2. Neural Network Forecasting Model
     Neural networks are a class of nonlinear models that can approximate any nonlinear function to an
arbitrary degree of accuracy, and they have the potential to be used as forecasting tools in many different
areas. The most commonly used architecture is the multilayer feedforward network, which consists of an input
layer, an output layer and one or more intermediate layers, called hidden layers. The nodes in each layer are
connected to each node in the layer above by interconnection strengths called weights. A training algorithm is
used to find a set of weights that minimizes the difference between the target output and the actual output
produced by the network. Many different neural network learning algorithms can be found in the literature, and
no study has analytically determined the generalization performance of each algorithm. In this study, we
experimented with three learning algorithms, namely standard Backpropagation (SBP), the Scaled Conjugate
Gradient algorithm (SCG) and Backpropagation with Bayesian Regularization (BPR), in order to evaluate which
algorithm predicts the exchange rates of the Australian dollar most accurately. The algorithms are described
briefly below.

2.1 Learning Algorithms
Standard Backpropagation (SBP): Backpropagation [21] updates the weights iteratively to map a set of input
vectors (x_1, x_2, ..., x_p) to a set of corresponding output vectors (y_1, y_2, ..., y_p). The input x_p is
presented to the network and multiplied by the weights. The weighted inputs to each unit of the upper layer
are summed, and the output is produced according to the following equations:

                                    y_p = f(W_o h_p + θ_o),                                            (1)
                                    h_p = f(W_h x_p + θ_h),                                            (2)

where W_o and W_h are the output and hidden layer weight matrices, h_p is the vector denoting the response of
the hidden layer for pattern 'p', θ_o and θ_h are the output and hidden layer bias vectors, respectively, and
f(.) is the sigmoid activation function. The cost function minimized in standard backpropagation is the sum of
squared errors, defined as

                                    E = (1/2) Σ_p (t_p − y_p)^T (t_p − y_p),                           (3)

where t_p is the target output vector for pattern 'p'. The algorithm uses the gradient descent technique to
adjust the connection weights between neurons. Denoting the fan-in weights to a single neuron by a weight
vector w, its update in the t-th epoch is governed by

                                    ∆w_t = −η ∇E(w)|_{w=w_t} + α ∆w_{t−1}.                             (4)

      The parameters η and α are the learning rate and the momentum factor, respectively. The learning rate
controls the step size in each iteration. For a large-scale problem, backpropagation learns very slowly, and
its convergence largely depends on the user choosing suitable values of η and α.
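As a concrete illustration of Eqs. (1)-(4), the following minimal NumPy sketch trains a single-hidden-layer sigmoid network by gradient descent with momentum. The network size, learning parameters and initialization here are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sbp(X, T, n_hidden=3, eta=0.1, alpha=0.9, epochs=5000, seed=0):
    """Standard backpropagation: gradient descent with momentum, Eqs. (1)-(4)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    Wh = rng.normal(0.0, 0.5, (n_in, n_hidden))   # hidden-layer weights W_h
    Wo = rng.normal(0.0, 0.5, (n_hidden, n_out))  # output-layer weights W_o
    bh, bo = np.zeros(n_hidden), np.zeros(n_out)  # biases theta_h, theta_o
    vWh, vWo = np.zeros_like(Wh), np.zeros_like(Wo)
    vbh, vbo = np.zeros_like(bh), np.zeros_like(bo)
    for _ in range(epochs):
        # forward pass, Eqs. (1)-(2)
        H = sigmoid(X @ Wh + bh)
        Y = sigmoid(H @ Wo + bo)
        # backward pass: gradients of the squared error of Eq. (3)
        d_out = (Y - T) * Y * (1.0 - Y)
        d_hid = (d_out @ Wo.T) * H * (1.0 - H)
        # momentum update, Eq. (4): dw_t = -eta * grad + alpha * dw_{t-1}
        vWo = -eta * (H.T @ d_out) + alpha * vWo; Wo += vWo
        vbo = -eta * d_out.sum(0) + alpha * vbo; bo += vbo
        vWh = -eta * (X.T @ d_hid) + alpha * vWh; Wh += vWh
        vbh = -eta * d_hid.sum(0) + alpha * vbh; bh += vbh
    return Wh, bh, Wo, bo

def predict(X, Wh, bh, Wo, bo):
    return sigmoid(sigmoid(X @ Wh + bh) @ Wo + bo)
```

The momentum term α∆w_{t−1} lets successive updates accumulate along consistent gradient directions, which partially mitigates the slow convergence noted above.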

Scaled Conjugate Gradient (SCG): The error surface in backpropagation may contain long ravines with sharp
curvature and a gently sloping floor, which causes slow convergence. In conjugate gradient methods, a search
is performed along conjugate directions, which generally produces faster convergence than steepest descent
[7]. In steepest descent, each new direction is perpendicular to the previous one; the path to the minimum is
a zigzag, and one step can be mostly undone by the next. In conjugate gradient methods, a new search direction
spoils as little as possible the minimization achieved by the previous direction, and the step size is
adjusted in each iteration. The general procedure for determining the new search direction is to combine the
new steepest descent direction with the previous search direction so that the current and previous search
directions are conjugate. Conjugate gradient techniques are based on the assumption that, for a general
non-quadratic error function, the error in the neighborhood of a given point is locally quadratic. The weight
changes in successive steps are given by

                                    w_{t+1} = w_t + α_t d_t,                                           (5)
                                    d_t = −g_t + β_t d_{t−1},                                          (6)
with
                                    g_t ≡ ∇E(w)|_{w=w_t},                                              (7)

           β_t = (g_t^T g_t)/(g_{t−1}^T g_{t−1})   or   β_t = (∆g_{t−1}^T g_t)/(g_{t−1}^T d_{t−1})
                or   β_t = (∆g_{t−1}^T g_t)/(g_{t−1}^T g_{t−1}),                                       (8)

where d_t and d_{t−1} are the conjugate directions in successive iterations. The step size is governed by the
coefficient α_t, and the search direction is determined by β_t. In scaled conjugate gradient, the step size
α_t is calculated by the following equations:

                                    α_t = −(d_t^T g_t)/δ_t,                                            (9)
                                    δ_t = d_t^T H_t d_t + λ_t ||d_t||²,                               (10)

where λ_t is the scaling coefficient and H_t is the Hessian matrix at iteration t. The term λ_t is introduced
because, for a non-quadratic error function, the Hessian need not be positive definite; in that case, without
λ_t, δ_t may become negative and the weight update may increase the error function. With a sufficiently large
λ_t, the modified Hessian is guaranteed to be positive (δ_t > 0), although for large values of λ_t the step
size becomes small. A comparison parameter ∆_t measures how well the local quadratic approximation predicts
the actual change in error, and the scaling coefficient is adjusted accordingly in each iteration:

           for ∆_t > 0.75, λ_{t+1} = λ_t/2;     for ∆_t < 0.25, λ_{t+1} = 4λ_t;     otherwise, λ_{t+1} = λ_t.
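To make the recurrences of Eqs. (5)-(10) concrete, the sketch below applies them to a simple quadratic error function f(w) = (1/2) w^T A w − b^T w, where the Hessian is constant and positive definite, so λ_t can be taken as zero and the step size of Eq. (9) is exact. This illustrates only the conjugate-direction machinery, not the full scaled algorithm with the λ_t adjustment.

```python
import numpy as np

def conjugate_gradient(A, b, w0, iters=10):
    """Conjugate-gradient minimization of f(w) = 0.5 w^T A w - b^T w.
    For this quadratic, delta_t = d^T A d (lambda_t = 0, Eq. 10) and the
    exact step size is alpha_t = -d^T g / delta_t (Eq. 9)."""
    w = w0.astype(float)
    g = A @ w - b                          # gradient, Eq. (7)
    d = -g                                 # first direction: steepest descent
    for _ in range(iters):
        delta = d @ A @ d                  # Eq. (10) with lambda = 0
        alpha = -(d @ g) / delta           # Eq. (9)
        w = w + alpha * d                  # Eq. (5)
        g_new = A @ w - b
        # Polak-Ribiere coefficient, one of the choices in Eq. (8)
        beta = ((g_new - g) @ g_new) / (g @ g)
        d = -g_new + beta * d              # Eq. (6)
        g = g_new
        if np.linalg.norm(g) < 1e-10:      # converged
            break
    return w
```

On an n-dimensional quadratic, this procedure reaches the minimum in at most n steps, which is the property that motivates using conjugate directions on the locally quadratic error surfaces assumed above.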

Bayesian Regularization (BPR): A desirable neural network model should produce small errors not only on the
training data but also on out-of-sample data. To produce a network with better generalization ability, MacKay
[17] proposed a method to constrain the size of the network parameters by regularization. The regularization
technique forces the network to settle to a set of weights and biases with smaller values, which makes the
network response smoother and less likely to overfit [7] and capture noise. In the regularization technique,
the cost function F is defined as

                                    F = γ E_D + (1 − γ) E_W,                                          (14)

where E_D is the same as E defined in Eq. (3), E_W = ||w||²/2 is the sum of squares of the network parameters,
and γ (0 < γ < 1) is the performance ratio that balances the data fit against the weight decay.

       If we assume a uniform prior distribution p(γ|M) for the regularization parameter γ, then maximizing the
  posterior probability is achieved by maximizing the likelihood function p(D|γ,M). Since all the probabilities
  have a Gaussian form, it can be expressed as

                          p(D|γ,M) = (π/γ)^{−N/2} [π/(1−γ)]^{−L/2} Z_F(γ),                            (17)

  where L is the total number of parameters in the network. Supposing that F has a single minimum as a function
  of w at w* and has the shape of a quadratic function in a small area surrounding that point, Z_F can be
  approximated as [17]

                          Z_F ≈ (2π)^{L/2} (det H*)^{−1/2} exp(−F(w*)),                               (18)

  where H = γ∇²E_D + (1−γ)∇²E_W is the Hessian matrix of the objective function. Substituting Eq. (18) into
  Eq. (17), the optimum value of γ at the minimum point can be determined.
        Foresee and Hagan [6] proposed applying a Gauss-Newton approximation to the Hessian matrix, which can be
  conveniently implemented if the Levenberg-Marquardt optimization algorithm [20] is used to locate the minimum
  point. This minimizes the additional computation required for regularization.
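The regularized objective of Eq. (14) itself is straightforward to compute. The sketch below is an illustrative NumPy implementation of that cost function only; the Bayesian estimation of γ described above (Eqs. 17-18) is deliberately not included.

```python
import numpy as np

def regularized_cost(errors, weights, gamma):
    """Regularized objective of Eq. (14): F = gamma*E_D + (1 - gamma)*E_W.

    errors  : vector of residuals (t_p - y_p) over all patterns
    weights : flattened vector of all network weights and biases
    gamma   : performance ratio in (0, 1); gamma near 1 emphasizes data
              fit, gamma near 0 emphasizes small weights
    """
    E_D = 0.5 * np.sum(np.square(errors))    # sum of squared errors, Eq. (3)
    E_W = 0.5 * np.sum(np.square(weights))   # weight-decay term ||w||^2 / 2
    return gamma * E_D + (1.0 - gamma) * E_W
```

Minimizing F instead of E_D alone pulls the weights toward zero, which is what produces the smoother, less overfit network response discussed above.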

  2.2 Forecasting Model
        Technical and fundamental analyses are the two major financial forecasting methodologies. In recent
  times, technical analysis has drawn particular academic interest due to increasing evidence that markets are
  less efficient than was originally thought [15]. Like many other economic time series, the exchange rate
  exhibits its own trend, cycle, season and irregularity. In this study, we used time-delayed moving averages
  as technical data. The advantage of a moving average is its tendency to smooth out some of the irregularity
  that exists between market days [26]. In our model, moving average values over past weeks are fed to the
  neural network to predict the following week's rate. The indicators are MA5, MA10, MA20, MA60, MA120 and X_i,
  namely the moving averages over one week, two weeks, one month, one quarter and half a year, and last week's
  closing rate, respectively. The predicted value is X_{i+1}. The neural network model therefore has six inputs
  for the six indicators, one hidden layer, and one output unit to predict the exchange rate. Yao et al. [26]
  have reported that increasing the number of inputs does not necessarily improve performance. Historical data
  are used to train the model; once trained, the model is used for forecasting.
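As an illustration of this input construction, the sketch below builds (input, target) pairs from a weekly closing-rate series. Since the paper does not state how the day-based MA5...MA120 windows map onto weekly data, the window lengths used here (1, 2, 4, 12 and 26 weekly closes, approximating one week through half a year) are assumptions, not the paper's exact indicators.

```python
import numpy as np

def build_dataset(closes, windows=(1, 2, 4, 12, 26)):
    """Build (inputs, target) pairs from a weekly closing-rate series.

    Each input row holds the trailing moving averages listed in `windows`
    plus last week's close X_i (six indicators in total); the target is
    next week's close X_{i+1}.
    """
    closes = np.asarray(closes, dtype=float)
    longest = max(windows)
    X, y = [], []
    # start once the longest window is fully available; stop one short of
    # the end so that the target X_{i+1} exists
    for i in range(longest - 1, len(closes) - 1):
        mas = [closes[i - w + 1 : i + 1].mean() for w in windows]
        X.append(mas + [closes[i]])   # five moving averages + last close
        y.append(closes[i + 1])       # next week's rate, the target
    return np.array(X), np.array(y)
```

With 565 weekly observations, the first 500 resulting rows would serve as the training set and the remainder as the evaluation set, mirroring the split described in Section 3.1.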

                                              3. Experimental Results and Discussion
     In the following, we describe the forex data collection, the performance metrics used to evaluate and
compare the predictive power of the models, and the simulation results.

3.1 Data Collection
      The data used in this study are the foreign exchange rates of six different currencies against the
Australian dollar from January 1991 to July 2002, made available by the Reserve Bank of Australia. We
considered the exchange rates of the US dollar (USD), British Pound (GBP), Japanese Yen (JPY), Singapore
dollar (SGD), New Zealand dollar (NZD) and Swiss Franc (CHF). As outlined in Section 2.2, 565 weekly data
points were considered, of which the first 500 were used for training and the remaining 65 for evaluating the
model. The plots of the historical rates for USD, GBP, SGD, NZD and CHF are shown in Fig. 1(a), and for JPY in
Fig. 1(b).
[Figure 1 appears here: two line plots of weekly exchange rates; x-axis: week number (1-565), y-axis: exchange rate.]

     Figure 1. Historical exchange rates for (a) USD, GBP, SGD, NZD and CHF, (b) JPY against the Australian dollar.


3.2 Performance Metrics

     The forecasting performance of the above models is evaluated with a number of widely used statistical
metrics, namely Normalized Mean Square Error (NMSE), Mean Absolute Error (MAE), Directional Symmetry (DS),
Correct Up trend (CU) and Correct Down trend (CD). These criteria are defined in Table 1, where x_k and x̂_k
are the actual and predicted values, respectively. NMSE and MAE measure the deviation between the actual and
forecast values; smaller values of these metrics indicate higher forecasting accuracy. The remaining measures
count the predictions that match the actual values in sign and directional change: DS measures the correctness
of the predicted direction, while CU and CD measure the correctness of the predicted up and down trends,
respectively.

                         Table 1: Performance metrics to evaluate the forecasting accuracy of the model.

     NMSE = Σ_k (x_k − x̂_k)² / Σ_k (x_k − x̄)²  =  (1/(σ² N)) Σ_k (x_k − x̂_k)²

     MAE = (1/N) Σ_k |x_k − x̂_k|

     DS = (100/N) Σ_k d_k,    d_k = 1 if (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, 0 otherwise

     CU = 100 (Σ_k d_k)/(Σ_k t_k),
          d_k = 1 if (x̂_k − x̂_{k−1}) > 0 and (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, 0 otherwise;
          t_k = 1 if (x_k − x_{k−1}) > 0, 0 otherwise

     CD = 100 (Σ_k d_k)/(Σ_k t_k),
          d_k = 1 if (x̂_k − x̂_{k−1}) < 0 and (x_k − x_{k−1})(x̂_k − x̂_{k−1}) ≥ 0, 0 otherwise;
          t_k = 1 if (x_k − x_{k−1}) < 0, 0 otherwise
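The deviation metrics of Table 1 translate directly into code. The NumPy sketch below follows the definitions above, with the small caveat that the directional measure here averages over the N−1 week-to-week changes rather than over N points.

```python
import numpy as np

def nmse(actual, pred):
    """Normalized mean square error: squared forecast error normalized by
    the variance (spread around the mean) of the actual series."""
    actual, pred = np.asarray(actual), np.asarray(pred)
    return np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)

def mae(actual, pred):
    """Mean absolute error between actual and predicted series."""
    actual, pred = np.asarray(actual), np.asarray(pred)
    return np.mean(np.abs(actual - pred))

def directional_symmetry(actual, pred):
    """Percentage of periods in which the predicted and actual week-to-week
    changes agree in sign (the DS metric of Table 1)."""
    a = np.diff(np.asarray(actual, dtype=float))
    p = np.diff(np.asarray(pred, dtype=float))
    return 100.0 * np.mean(a * p >= 0)
```

A perfect forecast gives NMSE = 0, MAE = 0 and DS = 100; an NMSE near 1 would mean the model does no better than predicting the series mean.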

3.3 Simulation Results
      A neural network model with six inputs representing the six technical indicators, one hidden layer and
one output unit was trained to predict the exchange rate. The final set of weights to which a network settles
(and hence its performance) depends on a number of factors, e.g., the initial weights chosen, the learning
parameters used during training (described in Section 2.1) and the number of hidden units. For each algorithm,
we trained 30 different networks with different initial weights and learning parameters. The number of hidden
units was varied between 3 and 7, and training was terminated after between 5000 and 10000 iterations. The
network that yielded the best result out of the 30 trials for each algorithm is presented here.
      We measured the performance metrics on the test data to investigate how well each neural network
forecasting model captured the underlying trend of the movement of each currency against the Australian
dollar. Table 2 shows the performance metrics achieved by each model over a forecasting period of 35 weeks,
and Table 3 shows the same over 65 weeks (the previous 35 weeks plus an additional 30 weeks). The results show
that the SCG and BPR models consistently perform better than the SBP model on all performance metrics for
almost all the currency exchange rates. For example, in forecasting the US dollar rate over 35 weeks, the NMSE
achieved by SCG and BPR is quite low, almost half of that achieved by SBP, meaning that these models predict
the exchange rate more closely than SBP. In predicting trend directions, SCG and BPR are also almost 10% more
accurate than SBP. The reason for the better performance of the SCG and BPR algorithms is their improved
learning techniques, which allow them to search the weight space more efficiently for a solution. A similar
trend is observed in predicting the other currencies.


                    Table 2: Measurement of prediction performance over 35 weeks.

  Currency     NN model    NMSE     MAE      DS      CU      CD
  US Dollar    SBP         0.5041   0.0047   71.42   76.47   70.58
               SCG         0.2366   0.0033   82.85   82.35   88.23
               BPR         0.2787   0.0036   82.85   82.35   88.23
  B. Pound     SBP         0.5388   0.0053   77.14   75.00   78.94
               SCG         0.1578   0.0030   77.14   81.25   73.68
               BPR         0.1724   0.0031   82.85   93.75   73.68
  J. Yen       SBP         0.1530   0.6372   74.28   72.72   76.92
               SCG         0.1264   0.6243   80.00   81.81   76.92
               BPR         0.1091   0.5806   80.00   81.81   76.92
  S. Dollar    SBP         0.2950   0.0094   85.71   82.35   88.88
               SCG         0.2321   0.0076   82.85   82.35   83.33
               BPR         0.2495   0.0080   82.85   82.35   83.33
  NZ Dollar    SBP         0.1200   0.0046   77.14   75.00   78.94
               SCG         0.0878   0.0038   85.71   87.50   84.21
               BPR         0.0898   0.0039   85.71   87.50   84.21
  S. Franc     SBP         0.1316   0.0101   80.00   75.00   86.66
               SCG         0.0485   0.0059   82.85   80.00   86.66
               BPR         0.0496   0.0057   80.00   75.00   86.66

                    Table 3: Measurement of prediction performance over 65 weeks.

  Currency     NN model    NMSE     MAE      DS      CU      CD
  US Dollar    SBP         0.0937   0.0043   75.38   81.57   69.23
               SCG         0.0437   0.0031   83.07   78.94   92.30
               BPR         0.0441   0.0030   83.07   78.94   92.30
  B. Pound     SBP         0.2231   0.0038   80.00   75.75   87.09
               SCG         0.0729   0.0023   84.61   87.87   83.87
               BPR         0.0790   0.0024   87.69   93.93   83.87
  J. Yen       SBP         0.0502   0.5603   76.92   75.67   78.57
               SCG         0.0411   0.5188   81.53   83.78   78.57
               BPR         0.0367   0.5043   81.53   83.78   78.57
  S. Dollar    SBP         0.0935   0.0069   83.07   82.35   83.87
               SCG         0.0760   0.0060   86.15   88.23   83.87
               BPR         0.0827   0.0063   86.15   88.23   83.87
  NZ Dollar    SBP         0.0342   0.0042   78.46   71.42   86.11
               SCG         0.0217   0.0033   84.61   82.14   88.88
               BPR         0.0221   0.0033   84.61   82.14   88.88
  S. Franc     SBP         0.1266   0.0098   83.07   80.00   86.66
               SCG         0.0389   0.0052   84.61   84.61   86.66
               BPR         0.0413   0.0051   81.53   77.14   86.66

      Between the SCG and BPR models, the former performs better for all currencies except the Japanese Yen in
terms of the two most commonly used criteria, NMSE and MAE. In terms of the other metrics, SCG yields slightly
better performance for the Swiss Franc, BPR slightly better for the US Dollar and British Pound, and the two
perform equally for the Japanese Yen, Singapore Dollar and New Zealand Dollar. With both algorithms, the
directional change prediction accuracy is above 80%, a marked improvement over the 70% accuracy achieved in a
similar study in [26].
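For concreteness, the evaluation metrics discussed above can be written down directly. The sketch below uses one common definition of NMSE, MAE and directional symmetry (DS); the exact formulas used in this paper may differ in minor details (CU and CD are analogous to DS, restricted to upward and downward moves respectively):

```python
import numpy as np

def nmse(actual, forecast):
    """Normalized mean squared error: MSE scaled by the variance of the actual series."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean((actual - forecast) ** 2) / np.var(actual)

def mae(actual, forecast):
    """Mean absolute error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast))

def directional_symmetry(actual, forecast):
    """Percentage of periods in which the forecast moves in the same
    direction as the actual series (the DS metric)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    da = np.diff(actual)    # actual period-to-period changes
    df = np.diff(forecast)  # forecast period-to-period changes
    return 100.0 * np.mean(np.sign(da) == np.sign(df))
```

A perfect forecast gives NMSE = 0 and DS = 100; a forecast no better than predicting the series mean gives NMSE near 1.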
      The comparative diagrams showing the forecasts produced by the neural network models and the actual time
series over 65 weeks for the six currencies are shown in Fig. 2(a)~(f). Fig. 2(a) shows the forecasting of USD by
all three models; the plots show that the forecasts of the SCG and BPR models follow the actual rate more closely.
For the other currencies, the SCG model prediction is very close to the actual exchange rate.
      In this study, we also investigated the influence of the neural network architecture on prediction performance.
Using the SCG learning algorithm, the model was trained with different numbers of hidden units to predict USD
against AUD. Out of 30 successful trials, the prediction performance of the best trial is reported in Table 4. The
architecture of the network is denoted by ‘i-h-o’, indicating i neurons in the input layer, h neurons in the hidden
layer and o neurons in the output layer. The performance varies with the number of hidden units; however, in all
cases it is better than that of the SBP model. The best performance is achieved with three hidden units, and
increasing the number of hidden nodes beyond three deteriorates the performance.

                              Table 4: Effect of hidden unit number on prediction performance.
    Architecture             35 week prediction                               65 week prediction
                     NMSE      MAE       DS        CU       CD      NMSE      MAE       DS          CU         CD
     6-2-1          0.3228      0.0036    80.00     82.35     82.35    0.0596    0.0033    81.53      78.94       88.46
     6-3-1          0.2366      0.0033    82.85     82.35     88.23    0.0437    0.0031    83.07      78.94       92.30
     6-4-1          0.3349      0.0039    77.14     76.47     82.35    0.0548    0.0034    80.00      76.31       88.46
     6-5-1          0.3401      0.0041    74.28     76.47     76.47    0.0564    0.0034    78.46      78.94       80.76
     6-6-1          0.2975      0.0036    77.14     76.47     82.35    0.0512    0.0033    80.00      76.31       88.46
     6-7-1          0.3517      0.0041    74.28     76.47     76.47    0.0808    0.0043    78.46      76.31       84.61
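The sweep reported in Table 4 can be mimicked in a few lines. The sketch below is a minimal stand-in, not the paper's experiment: it uses synthetic data in place of the indicator series and plain batch gradient descent rather than SCG, but it shows the procedure of training 6-h-1 networks for several hidden-layer sizes and ranking them by held-out NMSE:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden, epochs=2000, lr=0.02):
    """Train a one-hidden-layer tanh network with plain batch gradient descent."""
    n, n_in = X.shape
    W1 = rng.normal(0.0, 0.5, (n_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1));    b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden activations
        out = H @ W2 + b2                      # linear output unit
        err = out - y                          # gradient of 0.5*MSE w.r.t. out
        gW2 = H.T @ err / n;  gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)     # backprop through tanh
        gW1 = X.T @ dH / n;   gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic stand-in for the six-indicator input data used in the paper.
X = rng.normal(size=(200, 6))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

scores = {}
for h in (2, 3, 4, 5):                         # candidate 6-h-1 architectures
    params = train_mlp(X_tr, y_tr, h)
    e = predict(params, X_te) - y_te
    scores[h] = float(np.mean(e ** 2) / np.var(y_te))   # held-out NMSE
best = min(scores, key=scores.get)
```

The architecture with the lowest held-out NMSE would be retained, mirroring the selection of the 6-3-1 network in Table 4.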

     The generalization ability of a neural network, i.e., its ability to produce correct output in response to an
unseen input, is influenced by a number of factors: 1) the size of the training set, 2) the degrees of freedom of the
network, related to its architecture, and 3) the physical complexity of the problem at hand. In practice we have


     [Figure 2 here: weekly time-series plots of actual vs. forecast rates for (a) USD/AUD, (b) GBP/AUD,
     (c) JPY/AUD, (d) SGD/AUD, (e) NZD/AUD and (f) CHF/AUD; panel (a) shows the SBP, BPR and SCG
     forecasts, panels (b)-(f) the SCG forecast.]

               Figure 2. Forecasting of different currencies by SCG based neural network model over 65 weeks.

no control over the problem complexity, and in our simulation the size of the training set is fixed. This leaves the
generalization ability, i.e., the performance of the model, dependent on the architecture of the corresponding neural
network. Generalization performance can also be related to the complexity of the model in the sense that, in
order to achieve the best generalization, it is important to optimize the complexity of the prediction model [1]. In
the case of neural networks, the complexity can be varied by changing the number of adaptive parameters in the
network. A network with fewer weights is less complex than one with more weights. It is known that the
simplest hypothesis/model is least likely to overfit: a network that uses the fewest weights and biases to achieve
a given mapping is least likely to overfit the data and most likely to generalize well on unseen data. If
redundancy is added in the form of extra hidden units or additional parameters, it is likely to degrade


performance, because more than the necessary number of parameters is used to achieve the same mapping. In the
case of nonlinear regression, two extreme solutions should be avoided: underfitting (too few hidden neurons),
which fails to capture the underlying function, and overfitting (too many hidden neurons), which models the noise
in the data. This situation is also known as the bias-variance dilemma. One way of controlling the effective
complexity of the model in practice is to compare a range of models having different architectures and select the
one that yields the best performance on the test data. As can be seen in our simulation, the network with 3 hidden
units is optimal, yielding the best performance; increasing the hidden units adds parameters, introduces
redundancy and deteriorates the performance.
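The complexity-selection procedure described above, comparing a range of model complexities on held-out data, can be illustrated compactly with polynomial regression, where the polynomial degree plays the role of the hidden-unit count. This is an illustrative analogy under synthetic data, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=x.size)  # noisy underlying function

x_tr, y_tr = x[::2], y[::2]      # fit on half the points
x_va, y_va = x[1::2], y[1::2]    # validate on the other half

val_err = {}
for degree in (1, 3, 5, 9, 15):                   # increasing model complexity
    coeffs = np.polyfit(x_tr, y_tr, degree)       # fit polynomial of this degree
    resid = np.polyval(coeffs, x_va) - y_va
    val_err[degree] = float(np.mean(resid ** 2))  # held-out MSE

best_degree = min(val_err, key=val_err.get)
```

Degree 1 underfits (high bias), very high degrees chase the noise (high variance), and the validation error is minimized at an intermediate complexity, just as the 6-3-1 network minimizes NMSE in Table 4.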

                                   4. Conclusion and Further Research
      This paper has presented and compared three different neural network models for foreign currency
exchange forecasting using simple technical indicators. The scaled conjugate gradient based model achieves
closer prediction of all six currencies than the other models. As Medeiros et al. argue in [18], there is evidence
favouring linear and nonlinear models over the random walk, and nonlinear models stand a better chance when
nonlinearity is spread through the time series. A neural network model with an improved learning technique is
thus a promising candidate for forex prediction. Results in this study show that the SCG neural network model
achieves very close prediction in terms of the NMSE and MAE metrics. Several authors have argued that
directional change metrics may be a better standard for determining the quality of forecasting; in terms of this
metric, the SCG model achieves results comparable to the BPR model. However, which metric should be given
more importance depends on trading strategies and how the prediction is used for trading advantage [25]. Both
the SCG and BPR based models attain a significantly high rate of predicting the correct directional change
(above 80%). In this study, we considered six technical indicators as inputs. Other technical indicators, such as
RSI (relative strength index) and momentum, could also be considered, in which case sensitivity analysis may be
performed to eliminate less sensitive variables. We also investigated the effect of network architecture on the
performance metrics: the performance varies slightly with the number of units in the hidden layer, and the
optimum number of hidden units needs to be determined by trial and error.
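If RSI and momentum were added as inputs, they could be computed as follows. This is a standard textbook formulation (the RSI variant below uses simple rather than Wilder's exponentially smoothed averages), not the indicator set actually used in the paper:

```python
import numpy as np

def momentum(prices, n=4):
    """n-period momentum: price today minus price n periods ago."""
    p = np.asarray(prices, float)
    return p[n:] - p[:-n]

def rsi(prices, n=14):
    """Relative Strength Index over a sliding n-period window,
    using simple averages of gains and losses."""
    p = np.asarray(prices, float)
    delta = np.diff(p)
    out = []
    for i in range(n, len(delta) + 1):
        window = delta[i - n:i]
        gain = window[window > 0].sum() / n    # average up-move
        loss = -window[window < 0].sum() / n   # average down-move (positive)
        if loss == 0:
            out.append(100.0)                  # all moves up -> RSI saturates
        else:
            rs = gain / loss
            out.append(100.0 - 100.0 / (1.0 + rs))
    return np.array(out)
```

RSI is bounded in [0, 100], with high values conventionally read as overbought and low values as oversold.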
      In [26], Yao et al. outlined the need for an automatic model-building facility to avoid the tedious trial and
error method. This can be achieved by employing a constructive algorithm [16] that starts with a minimal
architecture and grows automatically by incrementally adding hidden units and layers one by one. It is claimed
that this type of algorithm is likely to build an optimum or near-optimum architecture automatically. This would
further add to the flexibility of the model, but its performance in forex prediction needs to be investigated.
Chen et al. [4] applied a PNN (Probabilistic Neural Network) to forecasting and trading of a stock index, noting
that PNN training is fast, which enables the user to develop a frequently updated training scheme. In actual
applications, retraining a forecasting model with the most recent data may be necessary to increase the chance
of achieving a better forecast. Fast learning may be an advantage, but it does not always guarantee improved
performance. A comparison between a PNN-based forecasting model and our model, with retraining at some
interval, would assess the suitability of the model in such applications. Further investigation in this direction will
focus on formulating an objective function in the ANN model that takes other goodness measures, e.g., trading
strategies and profit, into consideration. Recently, several applications combining wavelet transformation with
financial time series forecasting have been proposed, and a hybrid system using wavelet techniques in neural
networks looks promising [29]. However, the best way of combining wavelet techniques and neural
networks for forex prediction needs to be investigated and remains the focus of future research.
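As a minimal illustration of the wavelet idea, a single level of the Haar transform splits a series into a smooth approximation and a high-frequency detail component; in a hybrid scheme each component could be forecast by a separate network and the forecasts recombined. This is a generic sketch of the decomposition step, not the method of [29]:

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar wavelet transform on an even-length series.
    Returns (approximation, detail), each half the input length."""
    s = np.asarray(signal, float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)  # smoothed, low-frequency part
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)  # high-frequency residual
    return approx, detail

def haar_inverse(approx, detail):
    """Exact reconstruction of the original series from one Haar level."""
    s = np.empty(2 * len(approx))
    s[0::2] = (approx + detail) / np.sqrt(2)
    s[1::2] = (approx - detail) / np.sqrt(2)
    return s
```

Because the transform is orthonormal, it is lossless and preserves the signal's energy, so nothing is discarded before the forecasting stage.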

                                                      References
[1] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, NY, 1995.
[2] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1990.
[3] L. Cao and F. Tay, “Financial forecasting using support vector machines,” Neural Comput. & Applic, vol. 10, pp.184-
    192, 2001.
[4] A. Chen, M. Leung and H. Daouk, “Application of neural networks to an emerging financial market: forecasting and
    trading the Taiwan Stock Index,” Computer & Operations Research, vol. 30, pp. 901-903, 2003.
[5] G. Deboeck, Trading on the Edge: Neural, Genetic and Fuzzy Systems for Chaotic Financial Markets, New York Wiley,
    1994.
[6] F. D. Foresee and M.T. Hagan, “Gauss-Newton approximation to Bayesian regularization,” Proc. IJCNN 1997, pp.
    1930-1935, 1997.
[7] M.T. Hagan, H.B. Demuth and M.H. Beale, Neural Network Design, PWS Publishing, Boston, MA, 1996.


[8] T. Hill, M. O’Connor and W. Remus, “Neural network models for time series forecasts,” Management Science, vol. 42,
     pp. 1082-1092, 1996.
[9] H. B. Hwarng and H. T. Ang, “A simple neural network for ARMA(p,q) time series,” OMEGA: Int. Journal of
     Management Science, vol. 29, pp. 319-333, 2002.
[10] W. C. Jhee and J. K. Lee, “Performance of neural networks in managerial forecasting,” Intelligent Systems in
     Accounting, Finance and Management, vol. 2, pp. 55-71, 1993.
[11] I. Kaastra and M. Boyd, “Designing a neural network for forecasting financial and economic time-series,”
     Neurocomputing, vol. 10, pp. 215-236, 1996.
[12] J. Kamruzzaman and R. Sarker, “Comparing ANN based models with ARIMA for prediction of forex rates,” ASOR
     Bulletin, vol. 22, pp. 2-11, 2003.
[13] J. Kamruzzaman and R. Sarker, “Forecasting of currency exchange rates using ANN: a case study,” Proc. of IEEE Int.
     Conf. on Neural Networks & Signal Processing (ICNNSP03), pp. 793-797, 2003.
[14] B. D. Klein and D. F. Rossin, “Data quality in neural network models: effect of error rate and magnitude of error on
      predictive accuracy,” OMEGA: Int. Journal of Management Science, vol. 27, pp. 569-582, 1999.
[15] B. LeBaron, “Technical trading rule profitability and foreign exchange intervention,” Journal of Int. Economics, vol. 49,
     pp. 124-143, 1999.
[16] M. Lehtokangas, “Modified constructive backpropagation for regression,” Neurocomputing, vol. 35, pp. 113-122, 2000.
[17] D.J.C. Mackay, “Bayesian interpolation,” Neural Computation, vol. 4, pp. 415-447, 1992.
[18] M.C. Medeiros, A. Veiga and C.E. Pedreira, “Modeling exchange rates: smooth transitions, neural network and linear
     models,” IEEE Trans. Neural Networks, vol. 12, no. 4, 2001.
[19] M.F. Møller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, pp. 525-533,
     1993.
[20] J.J. Moré, “The Levenberg-Marquardt algorithm: implementation and theory,” in G.A. Watson, ed., Numerical Analysis,
      Lecture Notes in Mathematics 630, pp. 105-116, Springer-Verlag, London, 1977.
[21] D.E. Rumelhart, J.L. McClelland and the PDP research group, Parallel Distributed Processing, vol. 1, MIT Press, 1986.
[22] Z. Tang and P. A. Fishwick, “Backpropagation neural nets as models for time series forecasting,” ORSA Journal on
      Computing, vol. 5, no. 4, pp. 374-385, 1993.
[23] J. H.Wang and J. Y. Leu, “Stock market trend prediction using ARIMA-based neural networks,” Proc. of IEEE Int. Conf.
     on Neural Networks, vol. 4, pp. 2160-2165, 1996.
[24] S. Wang, “An insight into the standard back-propagation neural network model for regression analysis,” OMEGA: Int.
     Journal of Management Science, vol. 26, pp.133-140, 1998.
[25] J. Yao, Y. Li and C. L. Tan, “Option price forecasting using neural networks,” OMEGA: Int. Journal of Management
      Science, vol. 28, pp. 455-466, 2000.
[26] J. Yao and C.L. Tan, “A case study on using neural networks to perform technical forecasting of forex,”
     Neurocomputing, vol. 34, pp. 79-98, 2000.
[27] S. Yaser and A. Atiya, “Introduction to Financial Forecasting,” Applied Intelligence, vol. 6, pp. 205-213, 1996.
[28] G. Zhang and M. Y. Hu, “Neural network forecasting of the British Pound/US dollar exchange rate,” OMEGA: Int.
     Journal of Management Science, vol. 26, pp. 495-506, 1998.
[29] B. Zhang, R. Coggins, M. Jabri, D. Dersch and B. Flower, “Multiresolution forecasting for futures trading using wavelet
      decompositions,” IEEE Trans. Neural Networks, special issue on Financial Engineering, vol. 12, no. 4, pp. 765-775, 2001.

                     Joarder Kamruzzaman received his B.Sc and M.Sc in Electrical Engineering from
                     Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 1986 and
                     1989 respectively, and PhD in Information Systems Engineering from Muroran Institute of
                     Technology, Japan in 1993. Currently he is a faculty member in the Faculty of Information
                     Technology, Monash University, Australia. His research interests include neural networks,
                     fuzzy systems, genetic algorithms, computer networks and bioinformatics.

                   Ruhul Sarker received his Ph.D. in 1991 from DalTech, Dalhousie University, Halifax,
                   Canada, and is currently a senior faculty member in the School of Information Technology
                   and Electrical Engineering, University of New South Wales, ADFA Campus, Canberra,
                   Australia. His main research interests are Evolutionary Optimization, Neural Networks and
                   Applied Operations Research. He has recently edited four books and has published more than
                   90 refereed papers in international journals and conference proceedings. Dr Sarker is actively
                   involved with a number of national and international conference and workshop organizations
                   in the capacity of chair/co-chair or program committee member. He recently served as a
                   technical co-chair for IEEE-CEC2003. He is also a member of the task force for promoting
evolutionary multi-objective optimization operated by the IEEE Neural Networks Society.
