Title :
An extended Chi2 algorithm for discretization of real value attributes
Author :
Su, Chao-Ton ; Hsu, Jyh-Hwa
Author_Institution :
Dept. of Ind. Eng. & Eng. Manage., Nat. Tsing Hua Univ., Hsinchu, Taiwan
fDate :
3/1/2005
Abstract :
The variable precision rough sets (VPRS) model is a powerful data mining tool that has been widely applied to knowledge acquisition. Despite its diverse applications in many domains, the VPRS model cannot be applied directly to real-world classification tasks involving continuous attributes; a discretization method is required to preprocess the data. Discretization is an effective technique for handling continuous attributes in data mining, especially for classification problems. The modified Chi2 algorithm is one modification of the Chi2 algorithm: it replaces the inconsistency check in Chi2 with the quality of approximation, a measure taken from rough sets theory (RST), and takes into account the effect of degrees of freedom. However, classification with a controlled degree of uncertainty, or misclassification error, is outside the realm of RST. The modified Chi2 algorithm also ignores the effect of variance in the two merged intervals. In this study, we propose a new algorithm, named the extended Chi2 algorithm, to overcome these two drawbacks. When evaluated with the See5 software, the proposed algorithm performs better than the original and modified Chi2 algorithms.
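As a rough illustration of the two quantities the abstract refers to, the following Python sketch computes the Pearson chi-square statistic for a pair of adjacent intervals (the merge criterion used by Chi2-style discretizers) and the rough-set quality of approximation (the fraction of objects whose condition-attribute values determine a single class). It is not the extended Chi2 algorithm from the paper. The function names `chi_square` and `quality_of_approximation`, the data layout, and the handling of zero expected counts are assumptions made for this example.

```python
from collections import defaultdict


def chi_square(interval_a, interval_b):
    """Pearson chi-square statistic for two adjacent intervals.

    Each argument is a list of per-class counts in the same class order.
    Assumption: cells with an expected count of zero are skipped; some
    descriptions of ChiMerge substitute a small constant instead.
    """
    rows = [interval_a, interval_b]
    n_classes = len(interval_a)
    row_totals = [sum(r) for r in rows]
    col_totals = [interval_a[j] + interval_b[j] for j in range(n_classes)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(n_classes):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                continue  # observed count is also zero here
            stat += (rows[i][j] - expected) ** 2 / expected
    return stat


def quality_of_approximation(objects, labels):
    """Fraction of objects lying in the positive region.

    An object counts as consistent when every object that shares its
    (discretized) condition-attribute values also shares its class.
    """
    classes_seen = defaultdict(set)
    counts = defaultdict(int)
    for obj, label in zip(objects, labels):
        key = tuple(obj)
        classes_seen[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items()
                     if len(classes_seen[key]) == 1)
    return consistent / len(labels)


# Two intervals with fully separated classes give a large statistic,
# while identical class distributions give zero (a candidate for merging).
separated = chi_square([4, 0], [0, 4])
identical = chi_square([2, 2], [2, 2])

# Two objects share the value (1,) but differ in class, so only the
# third object is consistent.
gamma = quality_of_approximation([(1,), (1,), (2,)], ["a", "b", "a"])
```

A discretizer in this family repeatedly merges the adjacent pair with the lowest chi-square value; in the modified Chi2 algorithm described above, a check on the quality of approximation takes the place of the inconsistency check when deciding whether merging may continue.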
Keywords :
computational complexity; data integrity; data mining; learning (artificial intelligence); pattern classification; rough set theory; statistical analysis; modified Chi2 algorithm; real value attributes discretization; variable precision rough sets model; Approximation algorithms; Chaos; Classification algorithms; Entropy; Error correction; Rough sets; Software algorithms; Software performance; Uncertainty
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.39