DocumentCode :
424117
Title :
Efficient feature selection for high-dimensional data using two-level filter
Author :
Li, Yun ; Wu, Zhong-Fu ; Liu, Jia-Min ; Tang, Yan-Yun
Author_Institution :
Dept. of Comput., Chongqing Univ., China
Volume :
3
fYear :
2004
fDate :
26-29 Aug. 2004
Firstpage :
1711
Abstract :
Feature selection is a key problem in pattern recognition and machine learning, and finding the optimal feature subset is difficult because the problem is NP-hard. In many current applications, such as information retrieval, the dimensionality of the feature set or instance set is very high, so feature selection from high-dimensional data is also an urgent task for researchers. This paper presents a new approach: a two-level filter model system that integrates Relief with a newly developed feature clustering algorithm to reduce the dimensionality of large-scale feature sets via feature correlation (relevance), including both feature-feature correlation and feature-class correlation. Our major contributions are: (1) to present a system that performs feature selection from high-dimensional data; (2) to analyze how the system architecture changes according to the time cost of its parts; (3) to summarize and comment on the calculations of feature correlation; (4) to perform experiments demonstrating the effectiveness of the proposed approach, which show that the system can efficiently achieve a better compromise between dimensionality reduction and classification accuracy than any single part of the system used alone. In many cases, it can improve both the accuracy rate and the dimensionality reduction.
Keywords :
computational complexity; correlation theory; feature extraction; filtering theory; learning (artificial intelligence); optimisation; pattern classification; pattern clustering; set theory; NP-hard problems; classification accuracy; dimensionality reduction; feature class correlation; feature clustering algorithm; feature selection; feature-feature correlation; high dimensional data; information retrieval; large scale feature set; machine learning; optimal feature subset; pattern recognition; system architecture; two level filter model system; Clustering algorithms; Costs; Educational institutions; Information retrieval; Large scale integration; Machine learning; Optical computing; Optical filters; Optical materials; Pattern recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
Type :
conf
DOI :
10.1109/ICMLC.2004.1382051
Filename :
1382051