DocumentCode :
2850832
Title :
Decision tree evolution using limited number of labeled data items from drifting data streams
Author :
Fan, Wei ; Huang, Yi-an ; Yu, Philip S.
Author_Institution :
IBM T. J. Watson Res., Hawthorne, NY, USA
fYear :
2004
fDate :
1-4 Nov. 2004
Firstpage :
379
Lastpage :
382
Abstract :
Most previously proposed methods for mining data streams make the unrealistic assumption that a "labeled" data stream is readily available and can be mined at any time. In most real-world problems, however, labeled data streams are rarely available immediately. For this reason, models are reconstructed only when labeled data periodically become available. This passive stream mining model has several drawbacks. We propose the concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified at an affordable cost. When the estimated loss exceeds a tolerable threshold, the model evolves using a small number of instances with verified true class labels. Previous work on active mining concentrates on error guessing and estimation. In this paper, we discuss several approaches to decision tree evolution.
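The abstract does not say how the loss is estimated or how the decision tree is evolved, so the following Python sketch illustrates only the demand-driven trigger it describes, under simple assumptions: the loss is taken to be the plain 0-1 error rate on a few instances whose labels were verified, and the labeling `oracle` stands in for paying a cost to verify a true class label. The names `estimate_loss`, `needs_evolution`, `monitor_chunk`, and `oracle` are hypothetical and do not come from the paper.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Instance = Sequence[float]
Model = Callable[[Instance], int]


def estimate_loss(model: Model, verified: Sequence[Tuple[Instance, int]]) -> float:
    """Point estimate of 0-1 loss from instances whose true labels were verified."""
    if not verified:
        raise ValueError("need at least one verified instance")
    errors = sum(1 for x, y in verified if model(x) != y)
    return errors / len(verified)


def needs_evolution(model: Model, verified: Sequence[Tuple[Instance, int]],
                    threshold: float) -> bool:
    """Assumed trigger: evolve only when the estimated loss exceeds the threshold."""
    return estimate_loss(model, verified) > threshold


def monitor_chunk(model: Model, chunk: Sequence[Instance],
                  oracle: Callable[[Instance], int], k: int, threshold: float,
                  rng: Optional[random.Random] = None
                  ) -> Tuple[float, bool, List[Tuple[Instance, int]]]:
    """Verify labels for at most k randomly sampled instances and test the trigger."""
    rng = rng or random.Random(0)
    sample = rng.sample(list(chunk), min(k, len(chunk)))
    # Each oracle call models paying an affordable cost to verify one true label.
    verified = [(x, oracle(x)) for x in sample]
    loss = estimate_loss(model, verified)
    # The verified instances are returned so they could feed a (not shown) evolution step.
    return loss, loss > threshold, verified
```

For example, a one-feature stump `lambda x: int(x[0] > 0.5)` that misclassifies two of four verified instances gets an estimated loss of 0.5, which exceeds a hypothetical threshold of 0.3 and so would trigger evolution. With only a handful of verified labels this point estimate is noisy; a practical monitor would likely account for sampling variance before deciding to evolve the model.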
Keywords :
data mining; decision trees; data stream mining; decision tree evolution; demand-driven active data mining; drifting data streams; labeled data; Change detection algorithms; Costs; Credit cards; Data mining; Data warehouses; Decision trees; Educational institutions; Engines; Estimation error; Production;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
Type :
conf
DOI :
10.1109/ICDM.2004.10026
Filename :
1410315