Regional Information Center for Science and Technology - Efficient feature selection for high-dimensional data using two-level filter

DocumentCode :

424117

Title :

Efficient feature selection for high-dimensional data using two-level filter

Author :

Li, Yun ; Wu, Zhong-Fu ; Liu, Jia-Min ; Tang, Yan-Yun

Author_Institution :

Dept. of Comput., Chongqing Univ., China

Volume :

fYear :

2004

fDate :

26-29 Aug. 2004

Firstpage :

1711

Abstract :

Feature selection is a key problem in pattern recognition and machine learning, and finding the optimal feature subset is difficult because the problem is NP-hard. In many current applications, such as information retrieval, the dimensionality of the feature set or instance set is very high, so feature selection from high-dimensional data is an urgent task for researchers. This paper presents a new approach: a two-level filter model system that integrates Relief with a newly developed feature clustering algorithm to reduce the dimensionality of large-scale feature sets using feature correlation (relevance), including both feature-feature correlation and feature-class correlation. Our major contributions are: (1) to present a system that performs feature selection on high-dimensional data; (2) to analyze how the system architecture changes according to the time cost of the system's parts; (3) to summarize and comment on the calculations of feature correlation; (4) to perform experiments that show the effectiveness of the proposed approach; these show that the full system achieves a better compromise between dimensionality reduction and classification accuracy than any single part of the system used alone. In many cases, it improves both the accuracy rate and the dimensionality reduction.
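
The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not specify the feature clustering algorithm, so a simple greedy grouping by absolute Pearson correlation stands in for it. The level order (Relief first), the function names, and both thresholds are also assumptions made for illustration. The sketch assumes numeric features, exactly two classes, and at least two instances per class.

```python
import random
from math import sqrt


def relief_weights(X, y, n_iter=None, seed=0):
    """Basic Relief: score each feature by how well it separates an
    instance from its nearest miss compared with its nearest hit."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature value range, used to normalise differences to [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
             for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / m
    return w


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant feature carries no correlation signal
    return cov / sqrt(va * vb)


def two_level_filter(X, y, relevance_threshold=0.0, corr_threshold=0.9, seed=0):
    """Level 1 (feature-class): drop features whose Relief weight is not
    above relevance_threshold. Level 2 (feature-feature): walk the
    survivors from highest to lowest weight and keep a feature only if it
    is not strongly correlated with one already kept (a stand-in for the
    paper's feature clustering, which the abstract does not describe)."""
    w = relief_weights(X, y, seed=seed)
    candidates = [j for j in range(len(w)) if w[j] > relevance_threshold]
    candidates.sort(key=lambda j: w[j], reverse=True)
    cols = {j: [row[j] for row in X] for j in candidates}
    selected = []
    for j in candidates:
        if all(abs(pearson(cols[j], cols[k])) < corr_threshold
               for k in selected):
            selected.append(j)
    return sorted(selected)
```

Calling two_level_filter(X, y) with X as a list of numeric rows and y as a list of class labels returns the indices of the retained features. Running the cheaper relevance filter first shrinks the set on which pairwise correlations are computed, which is one plausible reading of the abstract's point about time cost, not a confirmed detail of the paper.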

Keywords :

computational complexity; correlation theory; feature extraction; filtering theory; learning (artificial intelligence); optimisation; pattern classification; pattern clustering; set theory; NP-hard problems; classification accuracy; dimensionality reduction; feature class correlation; feature clustering algorithm; feature selection; feature-feature correlation; high dimensional data; information retrieval; large scale feature set; machine learning; optimal feature subset; pattern recognition; system architecture; two level filter model system; Clustering algorithms; Costs; Educational institutions; Information retrieval; Large scale integration; Machine learning; Optical computing; Optical filters; Optical materials; Pattern recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on

Print_ISBN :

0-7803-8403-2

Type :

conf

DOI :

10.1109/ICMLC.2004.1382051

Filename :

1382051

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=424117