DocumentCode :
3720769
Title :
Drift detection in data stream classification without fully labelled instances
Author :
Edwin Lughofer;Eva Weigl;Wolfgang Heidl;Christian Eitzinger;Thomas Radauer
Author_Institution :
Department of Knowledge-Based Mathematical Systems, Johannes Kepler University of Linz, Altenbergerstrasse 69, A-4040, Austria
fYear :
2015
Firstpage :
1
Lastpage :
8
Abstract :
Drift detection is an important issue in classification-based stream mining, because it allows operators to be informed of unintended changes in the system. Current detection approaches usually assume that fully labelled streams are available, which is often unrealistic in on-line real-world applications. We propose two ways to improve the economy and applicability of drift detection: (1) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising classifier performance, and (2) a fully unsupervised approach based on the degree of overlap between the classifier's output certainty distributions. Both variants rely on a modified version of the Page-Hinkley test in which a fading factor down-weights older samples, making the test more flexible in detecting successive drift occurrences in a stream. The approaches are compared with the fully supervised variant (the state of the art, SoA) on two real-world on-line applications. The semi-supervised approach detects three real drifts in these streams with a delay that is lower than or equal to that of the supervised variant: about 200 (versus 300) samples and about 70 samples, respectively, while requiring only 20% labelled samples.
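The abstract does not give the exact form of the modified Page-Hinkley test, so the following is only a minimal Python sketch of one common way to add a fading factor: the cumulative deviation is multiplied by a factor slightly below 1 at every step, so older samples contribute less. The class name `FadingPageHinkley`, the parameter names, and the default values are assumptions made for illustration and are not taken from the paper.

```python
class FadingPageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Assumption: the fading factor is applied to the cumulative sum
    (cum_t = fading * cum_{t-1} + deviation_t). This is one common
    variant; the paper's exact formulation may differ.
    """

    def __init__(self, delta=0.005, threshold=5.0, fading=0.999):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.fading = fading        # 1.0 gives the classic, non-fading test
        self.reset()

    def reset(self):
        """Forget all history, e.g. after a drift has been handled."""
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x):
        """Consume one value (e.g. a 0/1 error indicator); return True on alarm."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # running mean of the stream
        # Faded cumulative deviation from the running mean.
        self.cum = self.fading * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


if __name__ == "__main__":
    detector = FadingPageHinkley()
    # Synthetic stream: no errors for 200 samples, then constant errors.
    stream = [0.0] * 200 + [1.0] * 50
    for i, value in enumerate(stream):
        if detector.update(value):
            print("drift signalled at sample", i)
            break
```

In a semi-supervised setting such as the one described above, `update` would be fed only the error indicators of the samples selected for labelling. How those samples are chosen is outside the scope of this sketch.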
Keywords :
"Fading","Electronic mail","Delays","Data models","Biological system modeling","Computational modeling","Feature extraction"
Publisher :
IEEE
Conference_Titel :
Evolving and Adaptive Intelligent Systems (EAIS), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/EAIS.2015.7368802
Filename :
7368802