Abstract :
Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as Liu, B. et al. (1998), achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to high processing overhead; and (2) its confidence-based rule evaluation measure may lead to overfitting. In comparison with associative classification, traditional rule-based classifiers, such as C4.5, FOIL and RIPPER, are substantially faster, but their accuracy, in most cases, may not be as high. In this paper, we propose a new classification approach, CLoPAR (Classification based on Predictive Association Rules), which combines the advantages of both associative classification and traditional rule-based classification. Instead of generating a large number of candidate rules as in associative classification, CLoPAR adopts a greedy algorithm to generate rules directly from training data. Moreover, CLoPAR generates and tests more rules than traditional rule-based classifiers to avoid missing important rules. To avoid overfitting, CLoPAR uses expected accuracy to evaluate each rule and uses the best k rules in prediction.
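The last step of the abstract (scoring each rule by expected accuracy, then predicting from the best k rules) can be illustrated with a minimal sketch. The abstract does not give the formula, so this sketch assumes a Laplace-style estimate, (correct + 1) / (covered + number of classes), and assumes that prediction picks the class whose top-k matching rules have the highest average score. The function names, the rule representation as (conditions, label, score) tuples, and the default k are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def laplace_accuracy(n_correct, n_covered, n_classes):
    """Assumed Laplace-style expected accuracy of a single rule.

    n_correct: training examples covered by the rule that have its class label
    n_covered: all training examples covered by the rule
    n_classes: number of distinct class labels
    """
    return (n_correct + 1) / (n_covered + n_classes)


def predict_best_k(rules, example, k=5):
    """Predict a label from the best k matching rules of each class.

    rules: iterable of (conditions, label, score) tuples, where conditions
           is a frozenset of attribute values (hypothetical representation)
    example: set of attribute values describing the instance to classify
    Returns None when no rule matches the example.
    """
    scores_by_class = defaultdict(list)
    for conditions, label, score in rules:
        # A rule matches when all of its conditions hold for the example.
        if conditions <= example:
            scores_by_class[label].append(score)

    if not scores_by_class:
        return None

    def average_of_top_k(scores):
        top = sorted(scores, reverse=True)[:k]
        return sum(top) / len(top)

    # Choose the class whose best k rules have the highest mean score.
    return max(scores_by_class, key=lambda c: average_of_top_k(scores_by_class[c]))
```

Compared with ranking rules by raw confidence, the smoothed estimate penalizes rules that cover very few training examples, which is one plausible way to reduce the overfitting the abstract mentions.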
Keywords :
associative processing; classification; data mining; C4.5; CLoPAR; FOIL; RIPPER; associative classification; classification accuracy; predictive association rules; rule evaluation measure; rule-based classification; association rules; dynamic programming; greedy algorithms; intelligent systems; testing; training data