DocumentCode
2850832
Title
Decision tree evolution using limited number of labeled data items from drifting data streams
Author
Fan, Wei ; Huang, Yi-an ; Yu, Philip S.
Author_Institution
IBM T. J. Watson Res., Hawthorne, NY, USA
fYear
2004
fDate
1-4 Nov. 2004
Firstpage
379
Lastpage
382
Abstract
Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely immediately available. For this reason, models are reconstructed only when labelled data become available periodically. This passive stream mining model has several drawbacks. We propose the concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified by paying an affordable cost. When the estimated loss exceeds a tolerable threshold, the model evolves by using a small number of instances with verified true class labels. Previous work on active mining concentrates on error guessing and estimation. In this paper, we discuss several approaches to decision tree evolution.
Keywords
data mining; decision trees; data stream mining; decision tree evolution; demand-driven active data mining; drifting data streams; labeled data; Change detection algorithms; Costs; Credit cards; Data mining; Data warehouses; Decision trees; Educational institutions; Engines; Estimation error; Production
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN
0-7695-2142-8
Type
conf
DOI
10.1109/ICDM.2004.10026
Filename
1410315