• DocumentCode
    1971414
  • Title

    Supervised multivariate discretization in mixed data with Random Forests

  • Author

    Berrado, Abdelaziz ; Runger, George C.

  • Author_Institution
    Ind. Eng., EMI, Rabat
  • fYear
    2009
  • fDate
    10-13 May 2009
  • Firstpage
    211
  • Lastpage
    217
  • Abstract
    Discretizing continuous attributes is necessary before mining association rules or applying several inductive learning algorithms to a heterogeneous data space. This data preprocessing step should be carried out with minimal information loss; that is, the mutual information among attributes on the one hand, and between attributes and the class labels on the other, should be preserved. This paper introduces a novel supervised, global, and dynamic discretization algorithm called RFDisc (Random Forests Discretizer). It derives its ability to conserve the data properties from the Random Forests learning algorithm. RFDisc is simple, relatively fast, and automatically learns the number of bins into which each continuous attribute is to be discretized. Empirical results indicate that the accuracies of classification algorithms such as CART, applied to several data sets, are comparable before and after discretization with RFDisc. Furthermore, C5.0 achieves the highest classification accuracy on data discretized with RFDisc when compared with other well-known discretization algorithms.
  • Keywords
    data mining; learning by example; random processes; RFDisc; association rules mining; classification algorithms; continuous attributes; data preprocessing step; dynamic discretization algorithm; inductive learning algorithms; random forests discretizer; random forests learning algorithm; supervised multivariate discretization; Association rules; Classification algorithms; Data mining; Design automation; Electromagnetic interference; Heuristic algorithms; Industrial engineering; Information entropy; Machine learning algorithms; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
  • Conference_Location
    Rabat
  • Print_ISBN
    978-1-4244-3807-5
  • Electronic_ISBN
    978-1-4244-3806-8
  • Type

    conf

  • DOI
    10.1109/AICCSA.2009.5069327
  • Filename
    5069327