Discriminant Kernels derived from the optimum nonlinear discriminant analysis

Author

Kurita, Takio

Author_Institution

Hiroshima Univ., Hiroshima, Japan

fYear

2011

fDate

July 31 2011-Aug. 5 2011

Firstpage

299

Lastpage

306

Abstract

Linear discriminant analysis (LDA) is one of the well-known methods for extracting the best features for multi-class discrimination. Recently, kernel discriminant analysis (KDA) has been successfully applied in many applications. KDA is one of the nonlinear extensions of LDA and constructs a nonlinear discriminant mapping by using kernel functions. However, the kernel function is usually defined a priori, and it is not known what the optimum kernel function for nonlinear discriminant analysis is. In addition, class information is not usually used to define the kernel functions. In this paper, the optimum kernel function in terms of the discriminant criterion is derived by investigating the optimum discriminant mapping constructed by the optimum nonlinear discriminant analysis (ONDA). Otsu derived ONDA by assuming the underlying probabilities, in a manner similar to Bayesian decision theory. He showed that the optimum nonlinear discriminant mapping can be obtained by using variational calculus. The optimum nonlinear discriminant mapping can be defined as a linear combination of the Bayesian a posteriori probabilities, and the coefficients of the linear combination are obtained by solving the eigenvalue problem of matrices defined by using the Bayesian a posteriori probabilities. This means that ONDA is closely related to Bayesian decision theory. Otsu also showed that LDA can be interpreted as a linear approximation of ONDA through a linear approximation of the Bayesian a posteriori probabilities. In this paper, the optimum kernel function is derived by investigating the optimum discriminant mapping constructed by ONDA. The derived kernel function is also expressed in terms of the Bayesian a posteriori probabilities, which means that class information is naturally introduced into the kernel function. For real applications, a family of discriminant kernel functions can be defined by changing the method used to estimate the Bayesian a posteriori probabilities.
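As a rough illustration of the idea that a kernel can be built from class posterior probabilities, the sketch below computes a prior-normalised inner product of posterior vectors. The abstract does not state the kernel's formula, so this specific form, K(x, y) = sum_k P(C_k|x) P(C_k|y) / P(C_k), is an assumption chosen because it is a valid positive semi-definite kernel that depends only on posteriors and priors; it should not be read as the paper's definition. The function names `posterior_kernel` and `gram_matrix` are hypothetical, and the posterior values are made up for the example.

```python
from typing import List, Sequence


def posterior_kernel(
    post_x: Sequence[float],
    post_y: Sequence[float],
    priors: Sequence[float],
) -> float:
    """Assumed kernel form: sum_k P(C_k|x) * P(C_k|y) / P(C_k).

    post_x and post_y hold estimated class posteriors for two samples;
    priors holds the class prior probabilities (all must be positive).
    """
    if not (len(post_x) == len(post_y) == len(priors)):
        raise ValueError("posterior and prior vectors must have equal length")
    if any(pk <= 0.0 for pk in priors):
        raise ValueError("class priors must be strictly positive")
    return sum(px * py / pk for px, py, pk in zip(post_x, post_y, priors))


def gram_matrix(
    posteriors: Sequence[Sequence[float]],
    priors: Sequence[float],
) -> List[List[float]]:
    """Kernel (Gram) matrix over a set of samples given their posteriors."""
    return [[posterior_kernel(a, b, priors) for b in posteriors] for a in posteriors]


# Illustrative posteriors for three samples in a two-class problem.
example_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
example_priors = [0.5, 0.5]
example_gram = gram_matrix(example_posteriors, example_priors)
```

Under this assumed form, swapping the posterior estimator (for example, a parametric density model versus a nearest-neighbour estimate) changes `example_posteriors` and therefore yields a different kernel, which is one way to read the abstract's remark about a family of discriminant kernels.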

Keywords

Bayes methods; decision theory; eigenvalues and eigenfunctions; matrix algebra; variational techniques; Bayesian a posterior probabilities; Bayesian decision theory; class information; discriminant criterion; discriminant kernels; eigenvalue problem; feature extraction; kernel discriminant analysis; linear approximation; matrices; multiclass discrimination; optimum kernel function; optimum nonlinear discriminant analysis; optimum nonlinear discriminant mapping; variational calculus; Bayesian methods; Covariance matrix; Decision theory; Eigenvalues and eigenfunctions; Kernel; Linear approximation; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), The 2011 International Joint Conference on

Conference_Location

San Jose, CA

ISSN

2161-4393

Print_ISBN

978-1-4244-9635-8

Type

conf

DOI

10.1109/IJCNN.2011.6033235

Filename

6033235