• DocumentCode
    2251105
  • Title
    On the Use of Data Mining Tools for Data Preparation in Classification Problems
  • Author
    Goncalves, Paulo M.; Barros, Roberto S. M.; Vieira, Davi C. L.
  • Author_Institution
    Centro de Inf., Univ. Fed. de Pernambuco, Recife, Brazil
  • fYear
    2012
  • fDate
    May 30, 2012 - June 1, 2012
  • Firstpage
    173
  • Lastpage
    178
  • Abstract
    Data preparation is a critical phase of the KDD (Knowledge Discovery in Databases) process: if the data is not correctly prepared, every subsequent phase is compromised. DMPML is a framework that stores preprocessed data for different data mining algorithms in an XML document and uses an XSLT document to retrieve the encoding that a given data mining algorithm requires. This paper compares DMPML with three data mining applications that implement the directed graph approach (Weka, RapidMiner, and KNIME) in terms of the time spent creating and executing the data preparation tasks for two data mining algorithms. The tests used three types of data sets: numerical, categorical, and mixed. We observed that the scheme used by DMPML can simplify the use of different data mining algorithms and significantly reduce the time spent creating the data preparation tasks.
  • Keywords
    XML; data mining; data preparation; directed graphs; pattern classification; DMPML; KDD process; XML document; XSLT document; classification problems; data mining algorithms; data mining tools; data preparation tasks; directed graph approach; knowledge discovery in databases; Communities; Computers; Educational institutions; Testing; Time measurement; Tools comparison
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-1536-4
  • Type
    conf
  • DOI
    10.1109/ICIS.2012.79
  • Filename
    6211093