DocumentCode :
1797823
Title :
Using HDDT to avoid instances propagation in unbalanced and evolving data streams
Author :
Dal Pozzolo, Andrea ; Johnson, R. ; Caelen, Olivier ; Waterschoot, Serge ; Chawla, Nitesh V. ; Bontempi, Gianluca
Author_Institution :
Comput. Sci. Dept., Univ. Libre de Bruxelles, Brussels, Belgium
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
588
Lastpage :
594
Abstract :
Hellinger Distance Decision Trees (HDDT) [10] have previously been used for static datasets with skewed distributions. In unbalanced data streams, state-of-the-art techniques use instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with class imbalance. However, it is not always possible to revisit or store old instances of a stream. In this paper we show how HDDT can be successfully applied to unbalanced and evolving data streams. Using HDDT allows us to remove instance propagation between batches, with several benefits: i) improved predictive accuracy, ii) speed, and iii) a single pass through the data. We use a Hellinger-weighted ensemble of HDDTs to combat concept drift and increase the accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.
Keywords :
data handling; decision trees; learning (artificial intelligence); C4.5; HDDT; Hellinger distance decision trees; Hellinger weighted ensemble; instance propagation; skewed distributions; static datasets; unbalanced data streams; Accuracy; Algorithm design and analysis; Computational modeling; Credit cards; Decision trees; Equations; Mathematical model; Concept drift; Data streams; Fraud detection; HDDT; Hellinger distance; Unbalanced data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889638
Filename :
6889638