DocumentCode :
3184295
Title :
On Classification Confidence and Ranking Using Decision Trees
Author :
Tóth, Norbert ; Pataki, Béla
Author_Institution :
Budapest Univ. of Technol. & Econ., Budapest
fYear :
2007
fDate :
June 29, 2007 - July 2, 2007
Firstpage :
133
Lastpage :
138
Abstract :
In this paper a novel method is proposed that extends the decision tree framework, allowing standard decision tree classifiers to provide a certainty value for every input sample they classify. This value is calculated for each input sample individually and represents the classifier's certainty in that classification. The algorithm consists of three main parts. 1) The input sample's distance to the decision boundary is calculated; this step involves solving a set of linearly constrained quadratic programs. The distance-calculating procedure also allows the use of different distance metrics, where the minimal-distance projection is not necessarily invariant. 2) Kernel density estimation is performed on the distance values of a training set to obtain conditional true and false classification profiles. 3) Using these conditional densities, Bayesian computation is applied to calculate the conditional true classification probability, which is used as the classification certainty. The proposed algorithm is not limited to axis-parallel trees; it can be applied to any decision tree whose decisions are hyperplanes (not necessarily parallel to the axes). The algorithm does not alter the tree structure, and the growth process is not modified; it only uses the training data to obtain true and false classification profiles conditional on the distance from the decision boundary. The usability of the method is demonstrated on two examples: an artificial two-dimensional dataset and a real-world nine-dimensional dataset. It is shown that the method can significantly increase classification accuracy (at the cost of rejecting a certain number of samples whose classification would be too "risky"). It is also demonstrated that the classification certainty value can be effectively used for ranking purposes.
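The three steps summarized above (distance to the boundary via a quadratic program, kernel density estimation on the distances, Bayesian combination) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the polyhedral leaf representation (A, b), the SLSQP solver choice, the default KDE bandwidth, and all function names are illustrative assumptions.

# Minimal sketch of the three-step certainty estimate, assuming each tree leaf
# region is available as a half-space system {z : A z <= b} with a class label.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

def distance_to_region(x, A, b):
    """Euclidean distance from x to the polyhedron {z : A z <= b},
    solved as a linearly constrained quadratic program (step 1)."""
    res = minimize(
        fun=lambda z: np.sum((z - x) ** 2),
        x0=np.asarray(x, dtype=float),
        method="SLSQP",
        constraints=[{"type": "ineq", "fun": lambda z: b - A @ z}],
    )
    return np.sqrt(res.fun)

def boundary_distance(x, predicted_class, leaves):
    """Distance to the decision boundary: nearest leaf region whose class
    differs from the tree's prediction. `leaves` is a list of (A, b, label)."""
    return min(distance_to_region(x, A, b)
               for A, b, label in leaves if label != predicted_class)

def fit_certainty_model(distances, correct):
    """Step 2: conditional KDEs of the boundary distance for correctly and
    incorrectly classified training samples, plus the prior P(correct)."""
    distances = np.asarray(distances)
    correct = np.asarray(correct, dtype=bool)
    kde_true = gaussian_kde(distances[correct])
    kde_false = gaussian_kde(distances[~correct])
    return kde_true, kde_false, correct.mean()

def certainty(d, kde_true, kde_false, p_true):
    """Step 3: Bayes' rule gives P(correct | distance d), used as the
    classification certainty and as a ranking / rejection score."""
    num = kde_true(d) * p_true
    den = num + kde_false(d) * (1.0 - p_true)
    return float(num / den)

In this sketch, rejecting samples whose certainty falls below a chosen threshold reproduces the accuracy/rejection trade-off mentioned in the abstract, and sorting samples by the certainty value gives the ranking use case.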
Keywords :
Bayes methods; decision trees; quadratic programming; Bayesian computation; artificial two dimensional dataset; axis parallel trees; classification certainty; classification confidence; classification profiles; conditional densities; conditional true classification probability; decision trees; distance metrics; kernel density estimation; linearly constrained quadratic programs; real world nine dimensional dataset; Bayesian methods; Classification algorithms; Classification tree analysis; Costs; Decision trees; Kernel; Probability; Training data; Tree data structures; Usability;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Intelligent Engineering Systems, 2007. INES 2007. 11th International Conference on
Conference_Location :
Budapest
Print_ISBN :
1-4244-1147-5
Electronic_ISBN :
1-4244-1148-3
Type :
conf
DOI :
10.1109/INES.2007.4283686
Filename :
4283686