Title of article :
Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data
Author/Authors :
Mantas, Carlos J. and Abellán, Joaquín
Issue Information :
Journal with serial number, year 2014
Abstract :
In the area of classification, C4.5 is a well-known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of over-fitting. A modification of C4.5, called Credal-C4.5, is presented in this paper. This new procedure uses a mathematical theory based on imprecise probabilities and uncertainty measures. In this way, Credal-C4.5 estimates the probabilities of the features and the class variable by using imprecise probabilities. In addition, it uses a new split criterion, called Imprecise Information Gain Ratio, which applies uncertainty measures on convex sets of probability distributions (credal sets). In this manner, Credal-C4.5 builds trees for solving classification problems assuming that the training set is not fully reliable. We carried out several experimental studies comparing this new procedure with other ones and obtained the following principal conclusion: in domains with class noise, Credal-C4.5 obtains smaller trees and better performance than the classic C4.5.
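The abstract does not spell out the split criterion, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that the credal set for the class variable comes from the Imprecise Dirichlet model (listed in the keywords) with a hyperparameter s, that the uncertainty measure is the maximum Shannon entropy over that credal set, and that the resulting gain is divided by the usual C4.5 split information. The function names and the default s = 1.0 are hypothetical, and the branch weights use plain relative frequencies as a simplification, whereas the abstract states that feature probabilities are also estimated imprecisely.

```python
import math
from collections import Counter


def max_entropy_idm(counts, s=1.0):
    """Maximum Shannon entropy (bits) over an IDM credal set.

    Assumption: the credal set contains every distribution
    p_i = (n_i + s_i) / (N + s) with s_i >= 0 and sum(s_i) = s.
    The maximum is found by "water-filling": the extra mass s is
    poured onto the smallest counts to make them as even as possible.
    """
    n = sorted(float(c) for c in counts)
    if not n:
        return 0.0
    total = sum(n) + s
    if total == 0:
        return 0.0
    remaining = s
    level = n[0]   # current height of the lowest counts
    i = 1          # how many counts sit at that height
    while i < len(n) and remaining > 0:
        need = (n[i] - level) * i  # mass needed to reach the next count
        if need > remaining:
            break
        remaining -= need
        level = n[i]
        i += 1
    level += remaining / i
    adjusted = [level] * i + n[i:]
    return -sum((a / total) * math.log2(a / total) for a in adjusted if a > 0)


def imprecise_gain_ratio(labels, feature_values, s=1.0):
    """Hypothetical sketch of an imprecise information gain ratio.

    Simplification: branch weights are relative frequencies.
    """
    classes = sorted(set(labels))
    n_total = len(labels)

    # Group the class labels by the value of the candidate feature.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)

    def class_counts(ys):
        c = Counter(ys)
        return [c[k] for k in classes]  # keep zero counts for absent classes

    h_class = max_entropy_idm(class_counts(labels), s)
    h_cond = sum(
        len(g) / n_total * max_entropy_idm(class_counts(g), s)
        for g in groups.values()
    )
    split_info = -sum(
        len(g) / n_total * math.log2(len(g) / n_total) for g in groups.values()
    )
    if split_info == 0:
        return 0.0  # the feature takes a single value, so there is no split
    return (h_class - h_cond) / split_info
```

With s = 0 the sketch reduces to the classic gain ratio. With s > 0 a branch that looks pure on few examples keeps some residual uncertainty, so such splits score lower, which is in line with the abstract's claim of smaller trees under class noise.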
Keywords :
Imprecise Dirichlet model , Uncertainty measures , Imprecise probabilities , C4.5 algorithm , Noisy data , Credal Decision Trees
Journal title :
Expert Systems with Applications