• DocumentCode
    1760995
  • Title
    Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems
  • Author
    Feng Yang; K.Z. Mao; Gary Kee Khoon Lee; Wenyin Tang
  • Author_Institution
    Dept. of Comput. Sci., Agency for Sci., Technol. & Res. (A*STAR), Singapore, Singapore
  • Volume
    27
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 1 2015
  • Firstpage
    88
  • Lastpage
    101
  • Abstract
    Although mostly used for pattern classification, linear discriminant analysis (LDA) can also be used in feature selection as an effective measure of the separative ability of a feature subset. When applied to feature selection on high-dimensional small-sized (HDSS) data, which generally exhibit class imbalance, LDA encounters four problems: singularity of the scatter matrix, overfitting, overwhelming of the minority class by the majority class, and prohibitive computational complexity. In this study, we propose an LDA-based feature selection method, minority class emphasized linear discriminant analysis (MCE-LDA), with a new regularization technique to address the first three problems. Unlike conventional forms of regularization, which give equal or more emphasis to the majority class, the proposed regularization places more emphasis on the minority class, with the expectation of improving overall performance by alleviating both the overwhelming of the minority class by the majority class and overfitting in the minority class. To reduce computational overhead, an incremental implementation of LDA-based feature selection is introduced. Comparative studies with other forms of LDA regularization, as well as with other popular feature selection methods, on five HDSS problems show that MCE-LDA can produce feature subsets with excellent performance in both classification and robustness. Further experimental results on true positive rate (TPR) and true negative rate (TNR) also verify the effectiveness of the proposed technique in alleviating the overwhelming and overfitting problems.
  • Keywords
    feature selection; pattern classification; HDSS data; MCE-LDA; TNR; TPR; feature subset selection; feature subset separative ability; high-dimensional small-sized problems; majority class; minority class emphasized linear discriminant analysis; regularization technique; true negative rate; true positive rate; Computational complexity; Data engineering; Error analysis; IEEE transactions; Knowledge engineering; Linear discriminant analysis; Vectors; class emphasis; classification; regularized linear discriminant analysis; robustness
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2320732
  • Filename
    6807689