DocumentCode
424117
Title
Efficient feature selection for high-dimensional data using two-level filter
Author
Li, Yun ; Wu, Zhong-Fu ; Liu, Jia-Min ; Tang, Yan-Yun
Author_Institution
Dept. of Comput., Chongqing Univ., China
Volume
3
fYear
2004
fDate
26-29 Aug. 2004
Firstpage
1711
Abstract
Feature selection is a key problem in pattern recognition and machine learning, and finding the optimal feature subset is difficult because the problem is NP-hard. In many current applications, such as information retrieval, the dimensionality of the feature set or instance set is very high, so feature selection from high-dimensional data is an urgent task for researchers. This paper presents a new approach: a two-level filter model system that integrates Relief with a newly developed feature clustering algorithm to reduce the dimensionality of large-scale feature sets using feature correlation (relevance), including both feature-feature correlation and feature-class correlation. Our major contributions are: (1) to present a system for performing feature selection on high-dimensional data; (2) to analyze how the system architecture changes according to the time cost of its parts; (3) to summarize and comment on the calculations of feature correlation; (4) to perform experiments demonstrating the effectiveness of the proposed approach, which show that the full system efficiently achieves a better compromise between dimensionality reduction and classification accuracy than any single part of the system used alone. In many cases, it improves both the accuracy rate and the dimensionality reduction.
Keywords
computational complexity; correlation theory; feature extraction; filtering theory; learning (artificial intelligence); optimisation; pattern classification; pattern clustering; set theory; NP-hard problems; classification accuracy; dimensionality reduction; feature class correlation; feature clustering algorithm; feature selection; feature-feature correlation; high dimensional data; information retrieval; large scale feature set; machine learning; optimal feature subset; pattern recognition; system architecture; two level filter model system; Clustering algorithms; Costs; Educational institutions; Information retrieval; Large scale integration; Machine learning; Optical computing; Optical filters; Optical materials; Pattern recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN
0-7803-8403-2
Type
conf
DOI
10.1109/ICMLC.2004.1382051
Filename
1382051