Title :
C-Focus-3: a C-Focus with a New Heuristic Search Strategy
Author :
Santoro, Daniel M. ; Nicoletti, Maria Do Carmo ; Hruschka, Estevam R., Jr.
Author_Institution :
Univ. Fed. de S. Carlos, Sao Carlos
Abstract :
Feature selection is particularly important in areas such as machine learning and data mining. A training set is generally described as a set of instances, each represented as a vector of feature-value pairs with an associated class; given such a set, a feature selection method tries to identify the features that are relevant for describing the concept embedded in it. The Focus algorithm is a very popular feature subset selection approach for discrete data; its version for continuous data is known as C-Focus. The selection process implemented by Focus searches for the smallest feature subset that does not produce inconsistency in the training set. Because of this search mechanism, however, the use of Focus (or C-Focus) becomes restrictive when the number of features is very large. This paper proposes a heuristic search method to be used by Focus (C-Focus), aiming at improving its performance. Empirical results of a system implementing C-Focus with the new search strategy (C-Focus-3) show that the proposed heuristic performs very well when the relevant subset contains more than half of the features present in the original training set.
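The consistency-based search that the abstract attributes to Focus can be illustrated with a minimal sketch for discrete data. This is the textbook exhaustive formulation, assuming that "inconsistency" means two instances agreeing on every selected feature while belonging to different classes. It is not the C-Focus-3 heuristic, whose details the abstract does not give, and the names `is_consistent` and `focus` are hypothetical.

```python
from itertools import combinations


def is_consistent(instances, labels, subset):
    """Return True if no two instances share the same values on `subset`
    while having different class labels (assumed consistency criterion)."""
    seen = {}
    for row, label in zip(instances, labels):
        key = tuple(row[i] for i in subset)
        # Remember the first label seen for this projection; any later
        # instance with the same projection must carry the same label.
        if seen.setdefault(key, label) != label:
            return False
    return True


def focus(instances, labels):
    """Exhaustive search over feature subsets in order of increasing size.

    Returns the first (hence smallest) consistent subset as a tuple of
    feature indices, or None if even the full feature set is inconsistent.
    """
    n_features = len(instances[0])
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(instances, labels, subset):
                return subset
    return None


# Toy data: the class is the XOR of features 0 and 1; feature 2 is redundant.
instances = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 1, 1, 0]
smallest = focus(instances, labels)
```

In the worst case the loop visits every one of the 2^n feature subsets, which is why this kind of search becomes restrictive when the number of features is very large and why a heuristic ordering of the candidates is attractive.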
Keywords :
data mining; learning (artificial intelligence); search problems; C-Focus-3; feature selection; feature-value pairs; heuristic search strategy; machine learning; Filters; Gain measurement; Intelligent systems; Learning systems; Machine learning algorithms; Proposals; Search methods
Conference_Title :
Intelligent Systems Design and Applications, 2007. ISDA 2007. Seventh International Conference on
Conference_Location :
Rio de Janeiro
Print_ISBN :
978-0-7695-2976-9
DOI :
10.1109/ISDA.2007.20