• DocumentCode
    1797823
  • Title
    Using HDDT to avoid instances propagation in unbalanced and evolving data streams
  • Author
    Dal Pozzolo, Andrea ; Johnson, R. ; Caelen, Olivier ; Waterschoot, Serge ; Chawla, Nitesh V. ; Bontempi, Gianluca
  • Author_Institution
    Comput. Sci. Dept., Univ. Libre de Bruxelles, Brussels, Belgium
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    588
  • Lastpage
    594
  • Abstract
    Hellinger Distance Decision Trees (HDDT) [10] have previously been used for static datasets with skewed distributions. In unbalanced data streams, state-of-the-art techniques use instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with the class imbalance problem. However, it is not always possible to revisit or store old instances of a stream. In this paper we show how HDDT can be successfully applied to unbalanced and evolving data streams. Using HDDT allows us to remove instance propagation between batches, with several benefits: i) improved predictive accuracy, ii) speed, and iii) a single pass through the data. We use a Hellinger-weighted ensemble of HDDTs to combat concept drift and to increase the accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.
  • Keywords
    data handling; decision trees; learning (artificial intelligence); C4.5; HDDT; Hellinger distance decision trees; Hellinger weighted ensemble; instance propagation; skewed distributions; static datasets; unbalanced data streams; Accuracy; Algorithm design and analysis; Computational modeling; Credit cards; Decision trees; Equations; Mathematical model; Concept drift; Data streams; Fraud detection; HDDT; Hellinger distance; Unbalanced data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889638
  • Filename
    6889638