DocumentCode
3208152
Title
Feature selection for classifying high-dimensional numerical data
Author
Wu, Yimin ; Zhang, Aidong
Author_Institution
Dept. of Comput. Sci. & Eng., SUNY, Buffalo, NY, USA
Volume
2
fYear
2004
fDate
27 June-2 July 2004
Abstract
Classifying high-dimensional numerical data is a very challenging problem. In high-dimensional feature spaces, the performance of supervised learning methods suffers from the curse of dimensionality, which degrades both classification accuracy and efficiency. To address this issue, we present an efficient feature selection method to facilitate classifying high-dimensional numerical data. Our method employs balanced information gain to measure the contribution of each feature to data classification, and it calculates feature correlation with a novel extension of balanced information gain. By integrating feature contribution and correlation, our feature selection approach uses a forward sequential selection algorithm to select uncorrelated features with large balanced information gain. Extensive experiments have been carried out on image and gene microarray datasets to demonstrate the effectiveness and robustness of the presented method.
Keywords
data analysis; learning (artificial intelligence); pattern classification; data classification; feature selection method; forward sequential selection algorithm; gene microarray datasets; high-dimensional numerical data; supervised learning methods; Bioinformatics; Feedback; Filters; Gain measurement; Information retrieval; Multimedia systems; Pattern recognition; Supervised learning; Training data; Uncertainty
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
ISSN
1063-6919
Print_ISBN
0-7695-2158-4
Type
conf
DOI
10.1109/CVPR.2004.1315171
Filename
1315171