  • DocumentCode
    2850832
  • Title
    Decision tree evolution using limited number of labeled data items from drifting data streams
  • Author
    Fan, Wei ; Huang, Yi-an ; Yu, Philip S.
  • Author_Institution
    IBM T. J. Watson Res., Hawthorne, NY, USA
  • fYear
    2004
  • fDate
    1-4 Nov. 2004
  • Firstpage
    379
  • Lastpage
    382
  • Abstract
    Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely immediately available. For this reason, models are reconstructed only when labelled data become available periodically. This passive stream mining model has several drawbacks. We propose the concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified by paying an affordable cost. When the estimated loss exceeds a tolerable threshold, the model evolves by using a small number of instances with verified true class labels. Previous work on active mining concentrates on error guessing and estimation. In this paper, we discuss several approaches to decision tree evolution.
  • Keywords
    data mining; decision trees; data stream mining; decision tree evolution; demand-driven active data mining; drifting data streams; labeled data; Change detection algorithms; Costs; Credit cards; Data mining; Data warehouses; Decision trees; Educational institutions; Engines; Estimation error; Production
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
  • Print_ISBN
    0-7695-2142-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2004.10026
  • Filename
    1410315