Appears in the Proceedings of the Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA '99), Washington, D.C., USA, March 22-24, 1999.

Comparative Assessment of Independent Component Analysis (ICA) for Face Recognition

Chengjun Liu and Harry Wechsler
Department of Computer Science, George Mason University,
4400 University Drive, Fairfax, VA 22030-4444, USA
{cliu, wechsler}@cs.gmu.edu

Abstract

This paper addresses the relative usefulness of Independent Component Analysis (ICA) for face recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as the Bayesian framework or Fisher's Linear Discriminant (FLD). Sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened Principal Component Analysis (PCA) space where the small trailing eigenvalues are discarded. The reason for this finding is that during whitening the eigenvalues of the covariance matrix appear in the denominator, and the small trailing eigenvalues mostly encode noise. As a consequence, the whitening component, if used in an uncompressed image space, would fit misleading variations and thus generalize poorly to new data. Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD. The reason for the last finding is that the Mahalanobis distance embedded in the MAP classifier duplicates to some extent the whitening component, while using FLD is counter to the independence criterion intrinsic to ICA.

1. Introduction

Face recognition is important not only because it has many potential applications in research fields such as Human-Computer Interaction (HCI), biometrics, and security, but also because it is a typical Pattern Recognition (PR) problem whose solution would help solve other classification problems. A successful face recognition methodology depends heavily on the particular choice of the features used by the (pattern) classifier [4], [17], [3]. Feature selection in pattern recognition involves the derivation of salient features from the raw input data in order to reduce the amount of data used for classification and simultaneously provide enhanced discriminatory power.

One popular technique for feature selection and dimensionality reduction is Principal Component Analysis (PCA) [8], [6]. PCA is a standard decorrelation technique, and following its application one derives an orthogonal projection basis which directly leads to dimensionality reduction, and possibly to feature selection. PCA was first applied to reconstruct human faces by Kirby and Sirovich [10] and to recognize faces by Turk and Pentland [19]. The recognition method, known as eigenfaces, defines a feature space, or "face space", which drastically reduces the dimensionality of the original space; face detection and identification are then carried out in the reduced space.

Independent Component Analysis (ICA) has emerged recently as a powerful solution to the problem of blind source separation [5], [9], [7], while its possible use for face recognition has been shown by Bartlett and Sejnowski [1]. ICA searches for a linear transformation to express a set of random variables as linear combinations of statistically independent source variables [5]. The search criterion involves the minimization of the mutual information expressed as a function of high-order cumulants. Basically, PCA considers only the 2nd-order moments and uncorrelates the data, while ICA accounts for higher-order statistics and identifies the independent source components from their linear mixtures (the observables). ICA thus provides a more powerful data representation than PCA [9]. As PCA derives only the most expressive features for face reconstruction rather than face classification, one would usually use some subsequent discriminant analysis to enhance PCA performance [18].

This paper makes a comparative assessment of the use of ICA as a discriminant analysis criterion whose goal is to enhance PCA stand-alone performance. Experiments in support of our comparative assessment of ICA for face recognition are carried out using a large data set consisting of 1,107 images drawn from the FERET database [16]. The comparative assessment suggests that for enhanced face recognition performance ICA should be carried out in a compressed and whitened space, and that ICA performance deteriorates when it is augmented by additional decision rules such as the Bayes classifier or Fisher's linear discriminant analysis.
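To make the distinction between decorrelation and independence concrete, the following minimal numpy sketch (an illustration added for this text, not part of the paper's experiments) mixes two independent uniform sources and shows that PCA whitening leaves the components uncorrelated yet still statistically dependent; closing exactly this gap is what ICA's higher-order criterion is for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources with unit variance.
n = 100_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

# Linear mixing produces the "observables".
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# PCA whitening: eigendecompose the covariance, project, and rescale.
cov = np.cov(x)
lam, phi = np.linalg.eigh(cov)
z = np.diag(lam ** -0.5) @ phi.T @ x

print(np.cov(z).round(3))                        # ~identity: z is uncorrelated
# Second-order decorrelation is not independence: the squares of the
# whitened components remain correlated, unlike those of the true sources.
print(np.corrcoef(z[0] ** 2, z[1] ** 2)[0, 1])   # visibly nonzero
print(np.corrcoef(s[0] ** 2, s[1] ** 2)[0, 1])   # ~0 for independent sources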
2. Background

PCA provides an optimal signal representation technique in the mean-square-error sense. The motivation behind using PCA for human face representation and recognition comes from its optimal and robust image compression and reconstruction capability [10], [15]. PCA yields projection axes based on the variations from all the training samples, hence these axes are fairly robust for representing both training and testing images (not seen during training). PCA does not distinguish, however, the different roles of the variations (within- and between-class variations), and it treats them equally. This leads to poor performance when the distributions of the face classes are not separated by the mean-difference but by the covariance-difference [6].

Swets and Weng [18] point out that PCA derives only the most expressive features, which are unrelated to actual face recognition, and that in order to improve performance one needs additional discriminant analysis. One such discriminant criterion, the Bayes classifier, yields the minimum classification error when the underlying probability density functions (pdf) are known. The use of the Bayes classifier is conditioned on the availability of an adequate amount of representative training data in order to estimate the pdf. As an example, Moghaddam and Pentland [13] developed a Probabilistic Visual Learning (PVL) method which uses the eigenspace decomposition as an integral part of estimating the pdf in high-dimensional image spaces. To address the lack of sufficient training data, Liu and Wechsler [12] introduced the Probabilistic Reasoning Models (PRM), where the conditional pdf for each class is modeled using the pooled within-class scatter, and the Maximum A Posteriori (MAP) Bayes classification rule is implemented in the reduced PCA subspace.

The Fisher's Linear Discriminant (FLD) is yet another popular discriminant criterion. By applying first PCA for dimensionality reduction and then FLD for discriminant analysis, Belhumeur, Hespanha, and Kriegman [2] and Swets and Weng [18] developed similar methods (Fisherfaces and the Most Discriminant Features (MDF) method) for face recognition. Methods that combine PCA and the standard FLD, however, lack in generalization ability as they overfit to the training data. To address the overfitting problem, Liu and Wechsler [11] introduced the Enhanced FLD Models (EFM) to improve on the generalization capability of standard FLD-based classifiers such as Fisherfaces [2].
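A minimal sketch of the kind of MAP rule used in the PRM approach described above, assuming Gaussian class-conditional densities with equal priors and a shared, pooled within-class covariance in a reduced feature space; the function names and the equal-prior simplification are ours, not taken from [12].

```python
import numpy as np

def pooled_within_class_cov(feats, labels):
    """Pooled within-class scatter of the feature vectors (one per row),
    shared across all classes as in PRM-style density modeling."""
    d = feats.shape[1]
    s_w = np.zeros((d, d))
    for c in np.unique(labels):
        xc = feats[labels == c]
        xc = xc - xc.mean(axis=0)
        s_w += xc.T @ xc
    return s_w / len(feats)

def map_classify(x, class_means, cov_inv):
    """MAP rule under equal priors and a shared covariance: assign x to the
    class whose mean minimizes the Mahalanobis distance."""
    dists = [(x - m) @ cov_inv @ (x - m) for m in class_means]
    return int(np.argmin(dists))
```

Note how the Mahalanobis distance rescales every direction by the inverse of the pooled covariance; this rescaling is the sense in which, as argued in the abstract, the MAP rule duplicates to some extent the whitening component of ICA.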

3. Independent Component Analysis (ICA)

As PCA considers only the 2nd-order moments, it lacks information on higher-order statistics. ICA accounts for higher-order statistics and identifies the independent source components from their linear mixtures (the observables). ICA thus provides a more powerful data representation than PCA [9], as its goal is that of providing an independent rather than uncorrelated image decomposition and representation.

ICA of a random vector searches for a linear transformation which minimizes the statistical dependence between its components [5]. In particular, let $X \in \mathbb{R}^N$ be a random vector representing an image, where $N$ is the dimensionality of the image space. The vector is formed by concatenating the rows or the columns of the image, which may be normalized to have a unit norm and/or an equalized histogram. The covariance matrix of $X$ is defined as

$$\Sigma_X = E\{[X - E(X)][X - E(X)]^t\} \qquad (1)$$

where $E(\cdot)$ is the expectation operator, $t$ denotes the transpose operation, and $\Sigma_X \in \mathbb{R}^{N \times N}$. The ICA of $X$ factorizes the covariance matrix $\Sigma_X$ into the following form

$$\Sigma_X = F \Delta F^t \qquad (2)$$

where $\Delta$ is diagonal real positive and $F$ transforms the original data $X$ into $Z$,

$$X = F Z \qquad (3)$$

such that the components of the new data $Z$ are independent or "the most independent possible" [5].

To derive the ICA transformation $F$, Comon [5] developed an algorithm which consists of three operations: whitening, rotation, and normalization. First, the whitening operation transforms the random vector $X$ into another vector $U$ that has a unit covariance matrix:

$$U = \Lambda^{-1/2} \Phi^t X \qquad (4)$$

where $\Phi$ and $\Lambda$ are derived by solving the following eigenvalue equation:

$$\Sigma_X = \Phi \Lambda \Phi^t \qquad (5)$$

where $\Phi = [\phi_1, \phi_2, \dots, \phi_N]$ is an orthonormal eigenvector matrix and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \dots, \lambda_N\}$ is a diagonal eigenvalue matrix of $\Sigma_X$. One notes that whitening, an integral ICA component, counteracts the fact that the Mean Square Error (MSE) preferentially weighs low frequencies [14]. The rotation operations then perform source separation (to derive the independent components) by minimizing the mutual information, approximated using higher-order cumulants. Finally, the normalization operation derives unique independent components in terms of orientation, unit norm, and order of projections [5].
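The following numpy sketch summarizes this pipeline under stated assumptions: compressed whitening per Eqs. 4-5 keeping only the $m$ leading eigenvectors (as motivated by the sensitivity analysis in Section 4 below), followed by an orthogonal rotation estimated with a kurtosis-based symmetric fixed-point iteration in the spirit of Hyvarinen and Oja [7] — a stand-in for Comon's exact cumulant-minimization rotation [5]. The normalization step is omitted, and all names are ours.

```python
import numpy as np

def compressed_whiten(images, m):
    """Project centered images (one vectorized image per row) onto the m
    leading eigenvectors of the covariance matrix and rescale each
    direction by 1/sqrt(lambda_i) (Eqs. 4-6), discarding the small
    trailing eigenvalues."""
    x = images - images.mean(axis=0)
    cov = (x.T @ x) / len(x)
    lam, phi = np.linalg.eigh(cov)               # ascending eigenvalues
    lam, phi = lam[::-1][:m], phi[:, ::-1][:, :m]
    u = x @ phi / np.sqrt(lam)                   # whitened, m-dimensional
    return u, phi, lam

def rotate_fixed_point(u, iters=200):
    """Estimate the orthogonal rotation on whitened data u (samples in
    rows) with a symmetric fixed-point iteration using the kurtosis
    nonlinearity, in the spirit of [7]."""
    n, m = u.shape
    w = np.linalg.qr(np.random.default_rng(0).normal(size=(m, m)))[0]
    for _ in range(iters):
        y = u @ w.T                               # current source estimates
        w_new = (y ** 3).T @ u / n - 3.0 * w      # E{y^3 u^t} - 3 w
        uu, _, vv = np.linalg.svd(w_new)          # symmetric re-orthonormalization
        w = uu @ vv
    return w                                      # independent components: u @ w.T
```

A usage sketch: `u, phi, lam = compressed_whiten(train, m=40)` followed by `w = rotate_fixed_point(u)`; the paper's final normalization (fixing sign, norm, and the order of the projections) is left out here.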

4. Sensitivity Analysis of ICA

Eq. 4 can be rearranged into the following form:

$$U = \left[\frac{\phi_1}{\sqrt{\lambda_1}}, \frac{\phi_2}{\sqrt{\lambda_2}}, \dots, \frac{\phi_N}{\sqrt{\lambda_N}}\right]^t X \qquad (6)$$

where $\Phi$ and $\Lambda$ are the eigenvector and eigenvalue matrices, respectively (see Eq. 5). Eq. 6 shows that during whitening the eigenvalues appear in the denominator. The trailing eigenvalues, which tend to capture noise as their values are fairly small, thus cause the whitening step to fit misleading variations and make the ICA criterion generalize poorly when exposed to new data. As a consequence, if the whitening step is preceded by a dimensionality reduction procedure, ICA performance is enhanced and its computational complexity reduced. Specifically, the space dimensionality is first reduced so that the small trailing eigenvalues, which amplify the effects of noise and lead to decreased ICA performance (see Fig. 3 and Fig. 4), are discarded and more robust whitening results are obtained. Toward that end, during the whitening transformation one should keep a balance between the need that the selected eigenvalues account for most of the spectral energy of the raw data and the requirement that the trailing eigenvalues of the covariance matrix are not too small. As a result, the eigenvalue spectrum of the training data supplies useful information for deciding the dimensionality $m$ of the subspace; we set the dimension of the reduced space as $m = 40$.

In order to assess the sensitivity of ICA to the dimension of the compressed and whitened space where it is implemented, we carried out a comparative assessment for whitened subspaces of different dimensionality $m$. Fig. 3 and Fig. 4 show the ICA face recognition performance when different $m$ is chosen during the whitening transformation (see Eq. 9), as the number of features used ranges up to the dimension of the compressed space. The curve corresponding to $m = 40$ performs better than all the other curves obtained for different $m$ values. Fig. 4 shows that ICA performance deteriorates quite severely as $m$ becomes larger. The reason for this deterioration is that for large values of $m$ more small trailing eigenvalues (see Fig. 2) are included in the whitening step, and this leads to an unstable transformation. Note that smaller values of $m$ lead to decreased ICA performance as well, because overfitting then takes place.

[Figure: ICA correct retrieval rate versus the number of features (20 to 70) for whitened subspaces of dimension m = 30, 40, 50, and 70 (see the discussion of Fig. 3 and Fig. 4).]

To improve the generalization ability of Fisherfaces, we implemented it in the reduced PCA subspace. The comparative performance of eigenfaces, Fisherfaces, and ICA is plotted in Fig. 5, with ICA implemented in the $m = 40$ PCA subspace. Fig. 5 shows that the ICA criterion performs better than both eigenfaces and Fisherfaces.

[Figure 5: comparative face recognition performance of eigenfaces, Fisherfaces, and ICA.]

We also assessed the ICA discriminant criterion against two other popular discriminant criteria: the MAP rule of the Bayes classifier and the Fisher's linear discriminant, as they are embedded within the Probabilistic Reasoning Models (PRM) [12] and the Enhanced FLD Models (EFM) [11]. Fig. 6 plots the comparative performances of PRM-1 and EFM-1 against the ICA method, with $m$ again set to 40. ICA is shown to have face recognition performance comparable to the MAP rule of the Bayes classifier and the Fisher's linear discriminant as embedded within PRM-1 and EFM-1.

[Figure 6: correct retrieval rate versus the number of features (18 to 38) for ICA, PRM-1, and EFM-1.]

[Figure: correct retrieval rate versus the number of features (5 to 40) for ICA and ICA+FLD.]
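As an illustration of the balance argued for above, the following small sketch picks $m$ from the training eigenvalue spectrum; the 95% energy target and the eigenvalue floor are hypothetical knobs chosen for illustration, not values taken from the paper.

```python
import numpy as np

def choose_m(eigvals, energy=0.95, floor_ratio=1e-3):
    """Pick the whitened-subspace dimensionality m from the eigenvalue
    spectrum: keep enough eigenvalues to capture `energy` of the spectral
    energy, but drop trailing eigenvalues smaller than `floor_ratio` times
    the largest one, since dividing by sqrt(lambda_i) during whitening
    would amplify their noise."""
    lam = np.sort(eigvals)[::-1]
    cum = np.cumsum(lam) / lam.sum()
    m_energy = int(np.searchsorted(cum, energy)) + 1   # energy requirement
    m_floor = int(np.sum(lam > floor_ratio * lam[0]))  # eigenvalue floor
    return min(m_energy, m_floor)
```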
6. Conclusions

This paper addresses the relative usefulness of independent component analysis for face recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as the MAP criterion of the Bayes classifier or the Fisher's linear discriminant. The sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened space where most of the representative information of the original data is preserved and the small trailing eigenvalues are discarded. The dimensionality of the compressed subspace is decided based on the eigenvalue spectrum of the training data. The discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates significantly when augmented by additional discriminant criteria such as the FLD.

Acknowledgments: This work was partially supported by the DoD Counterdrug Technology Development Program, with the U.S. Army Research Laboratory as Technical Agent, under contract DAAL01-97-K-0118.

References

[1] M. Bartlett and T. Sejnowski. Independent components of face images: A representation for face recognition. In Proc. the 4th Annual Joint Symposium on Neural Computation, Pasadena, CA, May 17, 1997.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.
[3] R. Brunelli and T. Poggio. Face recognition: Features vs. templates. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(10):1042-1053, 1993.
[4] R. Chellappa, C. Wilson, and S. Sirohey. Human and machine recognition of faces: A survey. Proc. IEEE, 83(5):705-740, 1995.
[5] P. Comon. Independent component analysis, a new concept? Signal Processing, 36:287-314, 1994.
[6] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1991.
[7] A. Hyvarinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492, 1997.
[8] I. T. Jolliffe. Principal Component Analysis. Springer, New York, 1986.
[9] J. Karhunen, E. Oja, L. Wang, R. Vigario, and J. Joutsensalo. A class of neural networks for independent component analysis. IEEE Trans. on Neural Networks, 8(3):486-504, 1997.
[10] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(1):103-108, 1990.
[11] C. Liu and H. Wechsler. Enhanced Fisher linear discriminant models for face recognition. In Proc. the 14th International Conference on Pattern Recognition, Queensland, Australia, August 17-20, 1998.
[12] C. Liu and H. Wechsler. Probabilistic reasoning models for face recognition. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, USA, June 23-25, 1998.
[13] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
[14] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(13):607-609, 1996.
[15] P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation. Network: Computation in Neural Systems, 7:477-500, 1996.
[16] P. Phillips, H. Moon, P. Rauss, and S. Rizvi. The FERET September 1996 database and evaluation procedure. In Proc. First Int'l Conf. on Audio- and Video-based Biometric Person Authentication, pages 12-14, Switzerland, 1997.
[17] A. Samal and P. Iyengar. Automatic recognition and analysis of human faces and facial expression: A survey. Pattern Recognition, 25(1):65-77, 1992.
[18] D. L. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(8):831-836, 1996.
[19] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.