• DocumentCode
    2314929
  • Title
    Unified Strategy for Feature Selection and Data Imputation
  • Author
    Bratu, Camelia Vidrighin ; Potolea, Rodica
  • Author_Institution
    Comput. Sci. Dept., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania
  • fYear
    2009
  • fDate
    26-29 Sept. 2009
  • Firstpage
    413
  • Lastpage
    419
  • Abstract
    Data-related issues are the main causes of insufficient performance in data mining. Existing strategies for tackling these issues include procedures for handling incomplete data, which are mandatory in various schemes, and feature selection; both augment the learning process. Our previous work on data imputation has shown that a good imputation policy for attributes strongly correlated with the class can improve learning accuracy. Feature selection likewise enhances the performance of an inducer. This paper focuses on validating the performance and stability of our combined methodology for pre-processing data. The novelty of the method lies in combining feature selection with data imputation in order to obtain an improved version of the training set. The experimental results show that, when mining incomplete data, our combined pre-processing methodology boosts the accuracy of a classifier. Moreover, it produces results better than or similar to those of each of the individual steps it combines, feature selection and imputation.
  • Keywords
    data handling; data mining; learning (artificial intelligence); pattern classification; data imputation; feature selection; learning process; Cleaning; Computer science; Data analysis; Data preprocessing; Filtering; Humans; Performance evaluation; Scientific computing; Stability; classification; combined methodology; imputation; pre-processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2009 11th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-1-4244-5910-0
  • Electronic_ISBN
    978-1-4244-5911-7
  • Type
    conf
  • DOI
    10.1109/SYNASC.2009.53
  • Filename
    5460822