  • DocumentCode
    906967
  • Title
    CAIM discretization algorithm
  • Author
    Kurgan, Lukasz A.; Cios, Krzysztof J.
  • Author_Institution
    Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
  • Volume
    16
  • Issue
    2
  • fYear
    2004
  • Firstpage
    145
  • Lastpage
    153
  • Abstract
    The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.
  • Keywords
    data mining; decision trees; learning (artificial intelligence); optimisation; very large databases; CLIP4 rule algorithm; class-attribute interdependence maximization; classification rules; decision tree algorithm; discretization algorithm; machine learning algorithm; supervised data discretization; Algorithm design and analysis; Classification tree analysis; Discrete transforms; Internet; Machine learning algorithms; Performance evaluation; Spatial databases; Testing
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2004.1269594
  • Filename
    1269594