• DocumentCode
    3492632
  • Title
    Discriminant Kernels derived from the optimum nonlinear discriminant analysis
  • Author
    Kurita, Takio
  • Author_Institution
    Hiroshima Univ., Hiroshima, Japan
  • fYear
    2011
  • fDate
    July 31 2011-Aug. 5 2011
  • Firstpage
    299
  • Lastpage
    306
  • Abstract
    Linear discriminant analysis (LDA) is one of the well-known methods for extracting the best features for multi-class discrimination. Recently, kernel discriminant analysis (KDA) has been successfully applied in many applications. KDA is a nonlinear extension of LDA that constructs a nonlinear discriminant mapping by using kernel functions. However, the kernel function is usually defined a priori, and it is not known which kernel function is optimum for nonlinear discriminant analysis. Moreover, class information is not usually used to define the kernel functions. In this paper, the optimum kernel function in terms of the discriminant criterion is derived by investigating the optimum discriminant mapping constructed by the optimum nonlinear discriminant analysis (ONDA). Otsu derived ONDA by assuming underlying probabilities, as in Bayesian decision theory, and showed that the optimum nonlinear discriminant mapping can be obtained by using variational calculus. The optimum nonlinear discriminant mapping is a linear combination of the Bayesian a posteriori probabilities, and the coefficients of the linear combination are obtained by solving an eigenvalue problem of matrices defined by these a posteriori probabilities. This means that ONDA is closely related to Bayesian decision theory. Otsu also showed that LDA can be interpreted as a linear approximation of ONDA, obtained through a linear approximation of the Bayesian a posteriori probabilities. The kernel function derived here from the ONDA mapping is likewise expressed in terms of the Bayesian a posteriori probabilities, so class information is naturally introduced into the kernel function. For real applications, a family of discriminant kernel functions can be defined by changing the method used to estimate the Bayesian a posteriori probabilities.
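The recipe described in the abstract (ONDA coefficients from an eigenvalue problem on posterior-based matrices, then a kernel built from the posteriors) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the paper's implementation: the posteriors P(C_k|x_i) are taken as given from any estimator, the class priors are estimated as the mean posterior, the matrix Gamma is the sample covariance of the posteriors, and the problem is written as Gamma u = lambda Omega u with Omega = diag(P(C_k)) and U^T Omega U = I (Otsu's normalization may differ). The kernel k(x, x') = sum_k P(C_k|x) P(C_k|x') / P(C_k) is an assumed form, chosen because it equals the inner product of the full (K-1)-dimensional ONDA features plus one. The names `jacobi_eigh`, `onda`, `onda_map` and `discriminant_kernel` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, max_sweeps: int = 50) -> Tuple[Vector, Matrix]:
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, v); column j of v is a unit eigenvector for eigenvalues[j].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def onda(posteriors: List[Vector], n_components: int) -> Tuple[Vector, Vector, Matrix]:
    """Sketch of ONDA coefficients from estimated posteriors P(C_k | x_i).

    Assumed formulation: Gamma u = lambda Omega u with U^T Omega U = I, where
    Gamma_ij = mean[(P(C_i|x) - w_i)(P(C_j|x) - w_j)] and Omega = diag(w).
    Returns (lambdas, w, U); column j of U holds the coefficients u_1j..u_Kj.
    """
    n, k = len(posteriors), len(posteriors[0])
    if not 1 <= n_components <= k - 1:
        raise ValueError("n_components must lie in [1, K-1]")
    w = [sum(p[i] for p in posteriors) / n for i in range(k)]  # priors as mean posterior
    gamma = [[sum((p[i] - w[i]) * (p[j] - w[j]) for p in posteriors) / n
              for j in range(k)] for i in range(k)]
    r = [1.0 / math.sqrt(wi) for wi in w]  # diagonal of Omega^(-1/2)
    m = [[r[i] * gamma[i][j] * r[j] for j in range(k)] for i in range(k)]
    vals, vecs = jacobi_eigh(m)  # symmetric form of the generalized problem
    # Centering gives the trivial constant mapping eigenvalue 0, so the top
    # n_components eigenvectors are the non-trivial discriminant directions.
    top = sorted(range(k), key=lambda j: vals[j], reverse=True)[:n_components]
    u = [[r[i] * vecs[i][j] for j in top] for i in range(k)]  # u = Omega^(-1/2) v
    return [vals[j] for j in top], w, u


def onda_map(p: Vector, u: Matrix) -> Vector:
    """y(x) = sum_k P(C_k|x) u_k: a linear combination of the posteriors."""
    return [sum(p[k] * u[k][j] for k in range(len(p))) for j in range(len(u[0]))]


def discriminant_kernel(p: Vector, q: Vector, w: Vector) -> float:
    """Assumed posterior-based kernel: sum_k P(C_k|x) P(C_k|x') / P(C_k)."""
    return sum(pk * qk / wk for pk, qk, wk in zip(p, q, w))


if __name__ == "__main__":
    # Hypothetical posterior estimates for six samples and three classes.
    post = [[0.8, 0.15, 0.05], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
            [0.2, 0.6, 0.2], [0.05, 0.15, 0.8], [0.1, 0.3, 0.6]]
    lambdas, w, u = onda(post, n_components=2)
    print("eigenvalues:", lambdas)
    y0, y1 = onda_map(post[0], u), onda_map(post[1], u)
    print(discriminant_kernel(post[0], post[1], w), sum(a * b for a, b in zip(y0, y1)) + 1.0)
```

With all K-1 components kept, the two printed numbers agree, which illustrates how class information enters the kernel through the posteriors. Feeding in posteriors from a different estimator (for example, the outputs of a trained classifier) changes only the input list, which is one way to read the family of discriminant kernels mentioned in the abstract.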
  • Keywords
    Bayes methods; decision theory; eigenvalues and eigenfunctions; matrix algebra; variational techniques; Bayesian a posterior probabilities; Bayesian decision theory; class information; discriminant criterion; discriminant kernels; eigenvalue problem; feature extraction; kernel discriminant analysis; linear approximation; matrices; multiclass discrimination; optimum kernel function; optimum nonlinear discriminant analysis; optimum nonlinear discriminant mapping; variational calculus; Bayesian methods; Covariance matrix; Decision theory; Eigenvalues and eigenfunctions; Kernel; Linear approximation; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2011 International Joint Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4244-9635-8
  • Type
    conf
  • DOI
    10.1109/IJCNN.2011.6033235
  • Filename
    6033235