DocumentCode :
2314929
Title :
Unified Strategy for Feature Selection and Data Imputation
Author :
Bratu, Camelia Vidrighin ; Potolea, Rodica
Author_Institution :
Comput. Sci. Dept., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania
fYear :
2009
fDate :
26-29 Sept. 2009
Firstpage :
413
Lastpage :
419
Abstract :
Data-related issues are the main cause of insufficient performance in data mining. Existing strategies for tackling these issues include procedures for handling incomplete data, which are mandatory in various schemes, and feature selection; both augment the learning process. Our previous work on data imputation has shown that a good imputation policy for attributes strongly correlated with the class can improve learning accuracy. Feature selection also enhances the performance of an inducer. This paper validates the performance and stability of our combined methodology for pre-processing data. The novelty of the method lies in combining feature selection with data imputation to obtain an improved version of the training set. The experimental results show that, when mining incomplete data, our combined pre-processing methodology boosts the accuracy of a classifier. Moreover, it produces results better than or similar to those of each individual step it combines, feature selection and imputation.
Keywords :
data handling; data mining; learning (artificial intelligence); pattern classification; data imputation; feature selection; learning process; Cleaning; Computer science; Data analysis; Data preprocessing; Filtering; Humans; Performance evaluation; Scientific computing; Stability; classification; combined methodology; imputation; pre-processing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2009 11th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-1-4244-5910-0
Electronic_ISBN :
978-1-4244-5911-7
Type :
conf
DOI :
10.1109/SYNASC.2009.53
Filename :
5460822