ICA and Committee Machine-Based Algorithm
      for Cursor Control in a BCI System

        Jianzhao Qin (1), Yuanqing Li (1,2), and Andrzej Cichocki (3)

    (1) Institute of Automation Science and Engineering,
        South China University of Technology, Guangzhou, 510640, China
    (2) Institute for Infocomm Research, Singapore 119613
    (3) Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science
        Institute, Wako shi, Saitama 3510198, Japan

        Abstract. In recent years, brain-computer interface (BCI) technology
        has developed very rapidly. BCIs provide a new communication
        interface that translates brain activity into control signals for
        devices such as computers and robots. The preprocessing of the
        electroencephalographic (EEG) signal and the translation algorithm
        play important roles in EEG-based BCIs. In this study, we employed an
        independent component analysis (ICA)-based preprocessing method and a
        committee machine-based translation algorithm for the offline analysis
        of a cursor control experiment. The results show that ICA is an
        effective preprocessing method and that the committee machine is a
        good choice of translation algorithm.

1     Introduction
BCIs give their users a communication and control channel that does not depend
on the brain’s normal output pathways (i.e., peripheral nerves and muscles).
These new communication systems can improve the quality of life of people with
severe motor disabilities, and provide a new way for able-bodied people to
control computers or other devices (e.g., a robot arm).
    EEG-based BCIs record EEG at the scalp to control cursor movement or to
select letters or icons. Since the EEG signal is contaminated by noise and
artifacts, such as eye movements, eye blinks and EMG, a BCI should include a
preprocessing procedure that separates the useful EEG signal from the noise
(including artifacts). A good preprocessing method can greatly improve the
information transfer rate (ITR) of a BCI. ICA has been widely used in blind
source separation [1], [2], [3], and in biomedical signal analysis, including
EEG signal analysis [4]. In the offline analysis of a cursor control
experiment, we used an ICA-based preprocessing method. The results show that
the accuracy rate improved for all three subjects after ICA preprocessing.
    A translation algorithm transforms the EEG features derived by the signal
preprocessing stage into actual device control commands. In the offline case
without feedback, the translation algorithm primarily performs a pattern
recognition task: we extract features from the preprocessed EEG signal, then
classify them into several classes that indicate the user’s different
intentions. In supervised learning, if the training set is small (as is usual
in BCIs), overfitting may arise. A good translation algorithm should therefore
generalize well. In the analysis, we designed a simple and efficient committee
machine as the translation algorithm to handle the overfitting problem.

J. Wang, X. Liao, and Z. Yi (Eds.): ISNN 2005, LNCS 3496, pp. 973–978, 2005.
© Springer-Verlag Berlin Heidelberg 2005

2     Methods
In this section, we first describe the experimental data set and illustrate the
framework of our offline analysis, then introduce the ICA preprocessing and the
feature extraction. Finally, the structure of the committee machine and the
classification procedure are presented.

2.1   Data Description
The EEG-based cursor control experiment was carried out at the Wadsworth Center.
The recorded data set was provided in the BCI Competition 2003. The data set
and the details of this experiment are available on the web site
    http://ida.first.fraunhofer.de/projects/bci/competition.
The data set was recorded from three subjects (AA, BB, CC). The framework
of our offline analysis is depicted in Fig. 1.

                           Fig. 1. Framework diagram

2.2   Independent Component Analysis
Independent component analysis is a method for solving the blind source
separation problem [5]. A random source vector S(n) is defined by
                 S(n) = [S_1(n), S_2(n), \ldots, S_m(n)]^T                     (1)
where the m components are a set of independent sources and the argument n
denotes discrete time. The observed vector X(n) is a linear mixture of the
sources,
                                  X(n) = A S(n)                                (2)
where A, a nonsingular m-by-m matrix, is called the mixing matrix. The source
vector S(n) and the mixing matrix A are both unknown. The task of blind source
separation is to find a demixing matrix C such that the original source vector
S(n) can be recovered as
                                  Y(n) = C X(n)                                (3)
The ICA method is based on the assumption that the original sources are
statistically independent. The objective of an ICA algorithm is to find a
demixing matrix C such that the components of Y are statistically independent.
We assume that the multichannel EEG can be modelled by (2), where X(n) is the
recorded multichannel EEG at time n, A is the mixing matrix, and S(n) is the
source vector at time n.
    There are many algorithms that implement ICA. Bell and Sejnowski (1995) [6]
proposed the infomax algorithm. The natural gradient (1995) was proposed and
applied to ICA by Amari et al. [7]. In the analysis, we applied a natural
gradient-based flexible ICA algorithm [8], which can separate mixtures of sub-
and super-Gaussian source signals. We expected ICA preprocessing to separate
the useful EEG components from the noise (including artifacts).
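
The natural gradient update for model (2)-(3) can be illustrated with a short
NumPy sketch. It is not the flexible ICA algorithm of [8]: as a stand-in it
uses the score function phi(y) = y + k tanh(y), with k = +1 for super-Gaussian
and k = -1 for sub-Gaussian outputs, chosen from the sign of the sample
kurtosis. The function name, the whitening step, the learning rate, the
iteration count and the toy mixing matrix are all assumptions made for
illustration.

```python
import numpy as np

def natural_gradient_ica(X, n_iter=2000, lr=0.1):
    """Estimate a demixing matrix C so that Y = C @ X has independent rows.

    X has shape (channels, samples). Illustrative sketch only.
    """
    X = X - X.mean(axis=1, keepdims=True)
    n_samples = X.shape[1]

    # Whitening (an added convenience step, not described in the text).
    evals, evecs = np.linalg.eigh(X @ X.T / n_samples)
    V = np.diag(evals ** -0.5) @ evecs.T
    Z = V @ X

    W = np.eye(X.shape[0])
    for _ in range(n_iter):
        Y = W @ Z
        # Sign of the excess kurtosis picks the sub-/super-Gaussian model.
        kurt = (Y ** 4).mean(axis=1) / (Y ** 2).mean(axis=1) ** 2 - 3.0
        k = np.sign(kurt)[:, None]
        phi = Y + k * np.tanh(Y)
        # Natural gradient step: W <- W + lr * (I - E[phi(y) y^T]) W
        W += lr * (np.eye(len(W)) - phi @ Y.T / n_samples) @ W
    return W @ V

rng = np.random.default_rng(0)
n = 10000
S = np.vstack([
    rng.laplace(scale=1 / np.sqrt(2), size=n),       # super-Gaussian, unit variance
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=n),    # sub-Gaussian, unit variance
])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])      # hypothetical mixing matrix
C = natural_gradient_ica(A @ S)
P = np.abs(C @ A)               # near a scaled permutation if separation worked
```

If separation succeeds, each row of P has one dominant entry, which reflects
that the sources are recovered only up to ordering and scale.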

2.3   Feature Extraction
In the analysis, we extracted and combined two kinds of features from the
preprocessed EEG: the power feature and the CSP feature.
    The data include 64 channels of EEG signal, but we used only 9 channels,
with channel numbers [8, 9, 10, 15, 16, 17, 48, 49, 50], for the ICA
preprocessing and power feature extraction. These 9 channels cover the left
sensorimotor cortex, which is the most important area when the subject uses
his or her EEG to control the cursor in this experiment. Each trial has a
length of 368 samples (subjects AA and BB) or 304 samples (subject CC). We
treated the position of the cursor as being updated once in every time interval
of 160 adjacent samples, with two subsequent time intervals overlapping by 106
(subjects AA and BB) or 124 (subject CC) samples. Thus there were 5 updates of
the position of the cursor in each trial. Only the single best component, i.e.
the one with the best correct recognition rate on the training sets (sessions
1–6), was used for power feature extraction. For each trial, the power feature
is defined as

            PF = [PF_1, PF_2, PF_3, PF_4, PF_5]                                 (4)

            PF_n = w_1 \sum_{f \in [11,14]} P_n(f) + w_2 \sum_{f \in [22,26]} P_n(f)   (5)

where P_n(f) is the power spectrum of the n-th time bin. The parameters w_1 and
w_2 are determined by experiment. The criterion for choosing the two parameters
is similar to that for choosing the best component.
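
A minimal sketch of the power feature in (4)-(5) follows. Several details are
assumptions rather than statements from the text: the sampling rate of 160 Hz
is inferred from 40000 samples spanning 250 s, the band limits are read as Hz
with inclusive bounds, the spectrum is estimated with a plain periodogram, and
the five time bins are spaced evenly so that they span the trial exactly (which
gives overlaps of 108 samples for 368-sample trials and 124 samples for
304-sample trials). The function names and the weights in the example are
hypothetical.

```python
import numpy as np

FS = 160    # Hz; assumed from 40000 samples spanning 250 s
WIN = 160   # samples per time bin

def window_starts(trial_len, n_bins=5, win=WIN):
    """Evenly spaced start indices so n_bins windows span the whole trial."""
    return np.linspace(0, trial_len - win, n_bins).round().astype(int)

def power_feature(component, w1, w2, fs=FS, win=WIN):
    """PF = [PF_1, ..., PF_5] for one trial of a single component (1-D array)."""
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    band1 = (freqs >= 11) & (freqs <= 14)
    band2 = (freqs >= 22) & (freqs <= 26)
    pf = []
    for start in window_starts(len(component), win=win):
        segment = component[start:start + win]
        spectrum = np.abs(np.fft.rfft(segment)) ** 2 / win   # periodogram
        pf.append(w1 * spectrum[band1].sum() + w2 * spectrum[band2].sum())
    return np.array(pf)

# Toy check: a 12 Hz oscillation falls in the first band, a 40 Hz one in neither.
t = np.arange(368) / FS
pf_mu = power_feature(np.sin(2 * np.pi * 12 * t), w1=1.0, w2=0.5)
pf_other = power_feature(np.sin(2 * np.pi * 40 * t), w1=1.0, w2=0.5)
```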
    CSP is a technique that has been applied to EEG analysis to find spatial
structures of event-related (de-)synchronization [9]. Our CSP feature is defined
as in [9]. The CSP analysis consists of calculating a matrix W and diagonal
matrix D:
                  W Σ_1 W^T = D   and   W Σ_4 W^T = 1 − D                       (6)
where Σ_1 and Σ_4 are the normalized covariance matrices of the
trial-concatenated data matrices of targets 1 and 4, respectively, and 1
denotes the identity matrix. W can be obtained by joint diagonalization. Prior
to calculating features by CSPs, common average reference
[10] was carried out, then the referenced EEG was band-pass filtered to
10–15 Hz. The CSP feature for each trial consists of the 6 most discriminating
main diagonal elements of the transformed covariance matrix of the trial,
followed by a log-transformation [9].
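
The joint diagonalization in (6) can be sketched with two eigendecompositions:
whiten the sum of the two covariance matrices, then diagonalize one whitened
covariance. This is one standard way to satisfy (6), shown under assumptions:
the function names are hypothetical, the "most discriminating" components are
taken as the rows of W with the largest and smallest diagonal entries of D, and
the variances are normalized by their sum before the logarithm. Random data
stand in for the EEG.

```python
import numpy as np

def normalized_cov(X):
    """Normalized spatial covariance of X with shape (channels, samples)."""
    C = X @ X.T
    return C / np.trace(C)

def csp(sigma_1, sigma_4):
    """Return W, d with W S1 W^T = diag(d) and W S4 W^T = I - diag(d)."""
    evals, U = np.linalg.eigh(sigma_1 + sigma_4)
    P = np.diag(evals ** -0.5) @ U.T            # whitens sigma_1 + sigma_4
    d, B = np.linalg.eigh(P @ sigma_1 @ P.T)    # d is sorted in ascending order
    return B.T @ P, d

def csp_feature(trial, W, n_keep=6):
    """Log-variances of the n_keep extreme components (first and last rows)."""
    half = n_keep // 2
    rows = np.r_[0:half, W.shape[0] - half:W.shape[0]]
    var = np.var(W[rows] @ trial, axis=1)
    return np.log(var / var.sum())

rng = np.random.default_rng(1)
X1 = rng.standard_normal((8, 2000)) * np.linspace(0.5, 2.0, 8)[:, None]
X4 = rng.standard_normal((8, 2000)) * np.linspace(2.0, 0.5, 8)[:, None]
S1, S4 = normalized_cov(X1), normalized_cov(X4)
W, d = csp(S1, S4)
features = csp_feature(X1[:, :160], W)   # one 6-dimensional CSP feature vector
```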

3     Committee Machine-Based Translation Algorithm

The multi-layer perceptron is a powerful tool for supervised pattern
recognition, but when the number of training samples is small relative to the
number of network parameters, overfitting may arise. In this section, based on
the features described above, we describe a committee machine consisting of
several small-scale multi-layer perceptrons that addresses the overfitting
problem.
    In our analysis, the data from sessions 1–6 (about 1200 samples) were used
for training. A statistical theory of overfitting [11] suggests that
overfitting may occur when N < 30W, where N is the number of training samples
and W denotes the number of network parameters. With N ≈ 1200, the number of
network parameters should therefore be less than 1200/30 = 40. To satisfy this
requirement, we designed a committee machine that divides the task into 2
simple tasks, so that the structure and training of each network in the
committee machine can be simplified.
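
The parameter budget can be checked with a few lines of arithmetic. The text
does not state the hidden-layer size, so the 3 hidden units below are a
hypothetical choice, and counting one bias per neuron is also an assumption.
The input sizes follow the features above (5 power features, 6 CSP features)
and the 4 outputs correspond to the four targets.

```python
def max_parameters(n_train, ratio=30):
    """Largest W that still satisfies N >= ratio * W."""
    return n_train // ratio

def mlp_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a perceptron with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

limit = max_parameters(1200)              # 1200 / 30
power_expert = mlp_parameters(5, 3, 4)    # expert fed with the power features
csp_expert = mlp_parameters(6, 3, 4)      # expert fed with the CSP features
single_net = mlp_parameters(11, 3, 4)     # one network fed with all 11 features
```

Under these assumptions each expert stays below the limit, while a single
network on the combined features with the same hidden layer does not.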

                   Fig. 2. The structure of a committee machine

    The structure of the committee machine is depicted in Fig. 2. The units of
this committee machine are several small-scale three-layer (including the input
layer) perceptrons with nonlinear activation functions. We call these networks
'experts'; they are divided into two groups. One group of experts makes its
decisions using power features, while the other group's decisions are based on
CSP features. The experts in the same group share common inputs, but are
trained from different initial values. Each network has four output neurons
corresponding to the four target positions. The final decision of a group is
made by averaging the outputs of all its experts, and the final outputs of the
two groups are then linearly combined to produce the overall output of the
machine.
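
The combination step can be sketched as follows, given the four-dimensional
outputs of already trained experts. The combination weight alpha and the use of
the largest combined output as the selected target are assumptions, and the
expert outputs in the example are made-up numbers.

```python
import numpy as np

def committee_decision(power_outputs, csp_outputs, alpha=0.5):
    """Combine expert outputs into one target index (0..3).

    power_outputs, csp_outputs: arrays of shape (n_experts, 4), one row per
    expert. alpha is a hypothetical weight for the linear combination.
    """
    power_group = np.mean(power_outputs, axis=0)   # average within each group
    csp_group = np.mean(csp_outputs, axis=0)
    combined = alpha * power_group + (1.0 - alpha) * csp_group
    return int(np.argmax(combined)), combined

power_outputs = np.array([[0.1, 0.6, 0.2, 0.1],
                          [0.2, 0.5, 0.2, 0.1],
                          [0.1, 0.3, 0.5, 0.1]])
csp_outputs = np.array([[0.1, 0.4, 0.4, 0.1],
                        [0.0, 0.7, 0.2, 0.1]])
target, scores = committee_decision(power_outputs, csp_outputs)
```

Averaging within a group smooths out the differences caused by the varied
initial values, so a single poorly trained expert has limited influence.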

4     Results

We trained ICA on 250 s (40000 samples) of EEG recording randomly chosen from
sessions 1–6. All the trials in sessions 7–10 were used to test our method. For
the purpose of comparison, we performed feature extraction and classification
under three conditions:
  1) ICA was used for preprocessing, and the committee machine was used for
     classification.
  2) Without ICA preprocessing, the best channel of the raw EEG signal was
     chosen for power feature extraction, and the committee machine was used
     for classification.
  3) ICA was used for preprocessing, while the committee machine was replaced
     by a normal multiple-layer network for classification.
The results for the three subjects under the three conditions are shown in
Table 1.

Table 1. Accuracy rates (%) for the three subjects obtained under the above
three conditions

    Subject Condition Session 7 Session 8 Session 9 Session 10 Average accuracy
    AA      1         71.20     71.20     66.49     69.63       69.63
    AA      2         68.06     68.06     65.45     68.59       67.54
    AA      3         70.68     64.40     63.35     59.69       64.53
    BB      1         63.87     62.30     47.12     54.97       57.07
    BB      2         62.30     61.78     42.41     48.17       53.67
    BB      3         62.30     57.07     46.07     46.60       53.01
    CC      1         66.67     72.82     54.36     81.54       68.85
    CC      2         63.59     70.26     50.77     72.82       64.36
    CC      3         61.54     66.15     54.87     68.21       62.69

5     Discussion and Conclusion

Table 1 shows that the ICA preprocessing method improved the average accuracy
of the offline analysis for all three subjects (condition 1 versus condition
2), by about 2 to 4.5 percentage points. Furthermore, comparing conditions 1
and 3 shows that the committee machine generalizes better than the normal
multiple-layer network, with average gains of about 4 to 6 percentage points.
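
The comparison between conditions can be reproduced directly from the session
accuracies in Table 1. The snippet below only restates the table and takes
differences of the per-subject means; the variable names are illustrative.

```python
import numpy as np

# Accuracies (%) for sessions 7-10 from Table 1, indexed [subject][condition].
table = {
    "AA": {1: [71.20, 71.20, 66.49, 69.63],
           2: [68.06, 68.06, 65.45, 68.59],
           3: [70.68, 64.40, 63.35, 59.69]},
    "BB": {1: [63.87, 62.30, 47.12, 54.97],
           2: [62.30, 61.78, 42.41, 48.17],
           3: [62.30, 57.07, 46.07, 46.60]},
    "CC": {1: [66.67, 72.82, 54.36, 81.54],
           2: [63.59, 70.26, 50.77, 72.82],
           3: [61.54, 66.15, 54.87, 68.21]},
}

means = {s: {c: float(np.mean(v)) for c, v in conds.items()}
         for s, conds in table.items()}

# Condition 1 vs 2: effect of ICA; condition 1 vs 3: effect of the committee.
ica_gain = {s: m[1] - m[2] for s, m in means.items()}
committee_gain = {s: m[1] - m[3] for s, m in means.items()}

for s in table:
    print(f"{s}: ICA gain {ica_gain[s]:.2f}, committee gain {committee_gain[s]:.2f}")
```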
   In the analysis, we used ICA as the preprocessing method for BCI. This
method has some advantages. First, we believe that ICA preprocessing can
separate useful source components from noise, so we can choose one or two
components that contain more useful information for extracting power features
than the channels before preprocessing. Second, since we choose a small number
of ICA components for feature extraction, the computational burden is reduced.
Furthermore, the dimensionality of the feature space is reduced; as a
consequence, not only is the structure of the classifier simplified, but the
overfitting problem is also alleviated to some extent.
   Meanwhile, a committee machine was used as the translation algorithm, which
can also improve the performance of BCIs. The key point of a committee machine
is to divide a complex computational task into a number of simple tasks. Owing
to their simple network structure, the constituent experts of the machine are
easy to train, and the generalization performance is improved.

Acknowledgements
This study was supported by the National Natural Science Foundation of China
(No. 60475004, No. 60325310), the Guangdong Province Science Foundation for
Research Team Program (No. 04205789), and the Excellent Young Teachers Program
of MOE, China.

References
 1. Li, Y., Wang, J., Zurada, J.M.: Blind Extraction of Singularly Mixed Sources
    Signals. IEEE Trans. on Neural Networks, 11 (2000) 1413–2000
 2. Li, Y., Wang, J.: Sequential Blind Extraction of Linearly Mixed Sources. IEEE
    Trans. on Signal Processing, 50 (2002) 997–1006
 3. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing: Learning
    Algorithms and Applications. John Wiley, New York (2002)
 4. Makeig, S., Bell, A.J., Jung, T.-P., Sejnowski, T.J.: Independent Component
    Analysis of Electroencephalographic Data. Adv. Neural Info. Processing
    Systems, 8 (1996) 145–151
 5. Comon, P.: Independent Component Analysis - A New Concept? Signal Processing,
    36 (1994) 287–314
 6. Bell, A.J., Sejnowski, T.J.: An Information-maximization Approach to Blind
    Separation and Blind Deconvolution. Neural Computation, 7 (1995) 1129–1159
 7. Amari, S., Cichocki, A., Yang, H.H.: A New Learning Algorithm for Blind Signal
    Separation. Advances in Neural Information Processing Systems, 8 (1996)
    757–763
 8. Choi, S., Cichocki, A., Amari, S.: Flexible Independent Component Analysis.
    Proc. of the 1998 IEEE Workshop on NNSP, (1998) 83–92
 9. Ramoser, H., Gerking, J.M., Pfurtscheller, G.: Optimal Spatial Filtering of
    Single Trial EEG during Imagined Hand Movement. IEEE Trans. Rehab. Eng. 8
    (2000) 441–446
10. McFarland, D.J., McCane, L.M., David, S.V., Wolpaw, J.R.: Spatial Filter
    Selection for EEG-based Communication. Electroenc. Clin. Neurophys. 103
    (1997) 386–394
11. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Statistical Theory
    of Overtraining - Is Cross-Validation Asymptotically Effective? Advances in
    Neural Information Processing Systems. Vol. 8. MIT Press, Cambridge, MA
    (1996) 176–182