Title :
Selecting Discrete and Continuous Features Based on Neighborhood Decision Error Minimization
Author :
Hu, Qinghua ; Pedrycz, Witold ; Yu, Daren ; Lang, Jun
Author_Institution :
Harbin Inst. of Technol., Harbin, China
Abstract :
Feature selection plays an important role in pattern recognition and machine learning. Feature evaluation and classification complexity estimation arise as key issues in the construction of selection algorithms. To estimate classification complexity in different feature subspaces, a novel feature evaluation measure, called the neighborhood decision error rate (NDER), is proposed, which is applicable to both categorical and numerical features. We first introduce a neighborhood rough-set model to divide the sample set into decision positive regions and decision boundary regions. Then, the samples that fall within decision boundary regions are further grouped into recognizable and misclassified subsets based on the class probabilities that occur in their neighborhoods. The percentage of misclassified samples is taken as the estimate of the classification complexity of the corresponding feature subspace. We present a forward greedy strategy for searching the feature subset, which minimizes the NDER and, correspondingly, the classification complexity of the selected feature subset. Both theoretical and experimental comparisons with other feature selection algorithms show that the proposed algorithm is effective for discrete features, continuous features, and mixtures of the two.
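The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation, and several details are assumptions: the max-norm distance, numeric features scaled to [0, 1], a 0/1 mismatch distance for categorical (string) features, the neighborhood radius `delta`, the rule that a sample counts as misclassified when the majority class in its neighborhood differs from its own label, and the stopping rule of the greedy search. The function names and the toy data are hypothetical.

```python
from collections import Counter


def distance(a, b, features):
    """Max-norm distance between samples a and b over the chosen features.

    Assumption: numeric features lie in [0, 1]; categorical features
    (strings) contribute 0 on a match and 1 on a mismatch.
    """
    worst = 0.0
    for f in features:
        if isinstance(a[f], str):
            d = 0.0 if a[f] == b[f] else 1.0
        else:
            d = abs(a[f] - b[f])
        worst = max(worst, d)
    return worst


def nder(X, y, features, delta):
    """Fraction of samples whose neighborhood majority class differs
    from their own label (each sample is inside its own neighborhood)."""
    errors = 0
    for i, xi in enumerate(X):
        votes = Counter(
            y[j] for j, xj in enumerate(X)
            if distance(xi, xj, features) <= delta
        )
        majority = votes.most_common(1)[0][0]
        if majority != y[i]:
            errors += 1
    return errors / len(X)


def forward_select(X, y, delta):
    """Forward greedy search: repeatedly add the feature that lowers the
    error most, and stop when no feature gives a strict improvement."""
    selected = []
    remaining = set(range(len(X[0])))
    best_err = nder(X, y, selected, delta)  # baseline: empty subspace
    while remaining:
        err, f = min(
            (nder(X, y, selected + [f], delta), f) for f in sorted(remaining)
        )
        if err >= best_err:
            break
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err


# Made-up toy data: feature 0 separates the classes, features 1 and 2 do not.
X_toy = [
    (0.10, "a", 0.5), (0.15, "b", 0.1), (0.20, "a", 0.9),
    (0.80, "b", 0.5), (0.85, "a", 0.1), (0.90, "b", 0.9),
]
y_toy = [0, 0, 0, 1, 1, 1]

subset, error = forward_select(X_toy, y_toy, delta=0.15)
print(subset, error)
```

On this toy data the search keeps only feature 0, because every neighborhood in that subspace is pure (all samples fall in the decision positive region), so adding further features cannot reduce the error.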
Keywords :
category theory; computational complexity; decision theory; error statistics; feature extraction; greedy algorithms; learning (artificial intelligence); minimisation; pattern classification; rough set theory; sampling methods; search problems; NDER; categorical feature; class probability; classification complexity estimation; continuous feature subspace selection algorithm; decision boundary region; decision positive region; discrete feature subspace selection algorithm; feature evaluation measure; feature subset search; forward greedy strategy; machine learning; neighborhood decision error rate minimization; neighborhood rough-set model; numerical feature; pattern recognition; sample set; Continuous feature; decision error minimization; discrete feature; feature selection; neighborhood; rough sets;
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMCB.2009.2024166