DocumentCode :
1760995
Title :
Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems
Author :
Feng Yang ; Mao, K.Z. ; Lee, Gary Kee Khoon ; Wenyin Tang
Author_Institution :
Dept. of Comput. Sci., Agency for Sci., Technol. & Res. (A*STAR), Singapore, Singapore
Volume :
27
Issue :
1
fYear :
2015
fDate :
Jan. 1 2015
Firstpage :
88
Lastpage :
101
Abstract :
Although mostly used for pattern classification, linear discriminant analysis (LDA) can also be used in feature selection as an effective measure of the separative ability of a feature subset. When applied to feature selection on high-dimensional small-sized (HDSS) data, generally with class imbalance, LDA encounters four problems: singularity of the scatter matrix, overfitting, overwhelming of the minority class by the majority class, and prohibitive computational complexity. In this study, we propose an LDA-based feature selection method, minority class emphasized linear discriminant analysis (MCE-LDA), with a new regularization technique that addresses the first three problems. Unlike conventional forms of regularization, which give equal or more emphasis to the majority class, the proposed regularization places more emphasis on the minority class, with the expectation of improving overall performance by alleviating both the overwhelming of the minority class by the majority class and overfitting in the minority class. To reduce computational overhead, an incremental implementation of LDA-based feature selection is introduced. Comparative studies against other forms of LDA regularization and other popular feature selection methods on five HDSS problems show that MCE-LDA can produce feature subsets with excellent classification performance and robustness. Further experimental results on true positive rate (TPR) and true negative rate (TNR) also verify the effectiveness of the proposed technique in alleviating the overwhelming and overfitting problems.
Keywords :
feature selection; pattern classification; HDSS data; MCE-LDA; TNR; TPR; feature subset selection; feature subset separative ability; high-dimensional small-sized problems; majority class; minority class emphasized linear discriminant analysis; regularization technique; true negative rate; true positive rate; Computational complexity; Data engineering; Error analysis; IEEE transactions; Knowledge engineering; Linear discriminant analysis; Vectors; class emphasis; classification; regularized linear discriminant analysis; robustness
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2320732
Filename :
6807689