DocumentCode
1797823
Title
Using HDDT to avoid instances propagation in unbalanced and evolving data streams
Author
Dal Pozzolo, Andrea ; Johnson, R. ; Caelen, Olivier ; Waterschoot, Serge ; Chawla, Nitesh V. ; Bontempi, Gianluca
Author_Institution
Comput. Sci. Dept., Univ. Libre de Bruxelles, Brussels, Belgium
fYear
2014
fDate
6-11 July 2014
Firstpage
588
Lastpage
594
Abstract
Hellinger Distance Decision Trees (HDDT) [10] have previously been used for static datasets with skewed distributions. In unbalanced data streams, state-of-the-art techniques use instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with the class imbalance problem. However, it is not always possible to revisit or store old instances of a stream. In this paper we show how HDDT can be successfully applied to unbalanced and evolving data streams. Using HDDT allows us to remove instance propagation between batches, with several benefits: i) improved predictive accuracy, ii) speed, and iii) a single pass through the data. We use a Hellinger-weighted ensemble of HDDTs to combat concept drift and to increase the accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.
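The abstract relies on HDDT being usable on skewed batches without resampling or propagating old minority instances. The sketch below illustrates why. It implements the Hellinger distance split criterion as it is commonly stated for HDDT: for a candidate split, it compares the per-branch proportions of positives (normalized by the total number of positives) with the per-branch proportions of negatives (normalized by the total number of negatives). Because each class is normalized by its own size, the score does not depend on the class ratio. This is a minimal, hypothetical illustration restricted to a single numeric feature and binary thresholds. The function names `hellinger_split_distance` and `best_threshold` are invented here. It is not the authors' implementation and does not cover their Hellinger-weighted ensemble.

```python
import math
from collections import Counter


def hellinger_split_distance(values, labels, threshold, positive=1):
    """Hypothetical sketch: Hellinger distance between the positive-class and
    negative-class distributions over the two branches of `value <= threshold`."""
    pos_total = sum(1 for y in labels if y == positive)
    neg_total = len(labels) - pos_total
    if pos_total == 0 or neg_total == 0:
        return 0.0  # a single class gives nothing to separate
    # Count instances per (branch, is_positive) pair.
    counts = Counter((v <= threshold, y == positive) for v, y in zip(values, labels))
    total = 0.0
    for branch in (True, False):
        # Normalize each class by its own size, so class skew cancels out.
        p = counts[(branch, True)] / pos_total
        n = counts[(branch, False)] / neg_total
        total += (math.sqrt(p) - math.sqrt(n)) ** 2
    return math.sqrt(total)


def best_threshold(values, labels, positive=1):
    """Pick the threshold with the largest Hellinger distance (assumed helper)."""
    candidates = sorted(set(values))
    # Skip the largest value: splitting there sends every instance left.
    candidates = candidates[:-1] or candidates
    best = max(candidates, key=lambda t: hellinger_split_distance(values, labels, t, positive))
    return best, hellinger_split_distance(values, labels, best, positive)


if __name__ == "__main__":
    balanced = hellinger_split_distance([1, 2, 3, 4], [0, 0, 1, 1], 2)
    # Same split with ten times as many negatives: the score is unchanged.
    skewed = hellinger_split_distance([1] * 10 + [2] * 10 + [3, 4], [0] * 20 + [1, 1], 2)
    print(balanced, skewed)
```

Both calls score the perfectly separating split at sqrt(2), the maximum for this unnormalized form. This skew insensitivity is what lets HDDT train on each batch as it arrives instead of rebalancing it with propagated instances.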
Keywords
data handling; decision trees; learning (artificial intelligence); C4.5; HDDT; Hellinger distance decision trees; Hellinger weighted ensemble; instance propagation; skewed distributions; static datasets; unbalanced data streams; Accuracy; Algorithm design and analysis; Computational modeling; Credit cards; Decision trees; Equations; Mathematical model; Concept drift; Data streams; Fraud detection; HDDT; Hellinger distance; Unbalanced data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4799-6627-1
Type
conf
DOI
10.1109/IJCNN.2014.6889638
Filename
6889638