Title :
Decision Trees for Uncertain Data
Author :
Tsang, Smith ; Kao, B. ; Yip, Kevin Y. ; Ho, Wai-Shing ; Lee, Sau Dan
Author_Institution :
Dept. of Comput. Sci., Univ. of Hong Kong, Hong Kong
Date :
March 29 2009-April 2 2009
Abstract :
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information, which originates from measurement/quantisation errors, data staleness, multiple repeated measurements, etc. The value uncertainty is represented by multiple values forming a probability distribution function (pdf). We discover that the accuracy of a decision tree classifier can be much improved if the whole pdf, rather than a simple statistic, is taken into account. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Since processing pdfs is computationally more costly, we propose a series of pruning techniques that can greatly improve the efficiency of the construction of decision trees.
Keywords :
data handling; decision trees; probability; uncertain systems; classical decision tree building algorithms; data tuples; decision tree classifier; probability distribution function; pruning techniques; uncertain data; uncertain information; value uncertainty; Buildings; Classification tree analysis; Clustering algorithms; Computer science; Data engineering; Decision trees; Probability distribution; Quantization; Statistical distributions; Testing; c4.5; classification; decision tree; uncertain data;
Conference_Title :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3422-0
ISSN :
1084-4627
DOI :
10.1109/ICDE.2009.26