DocumentCode :
3602807
Title :
Evaluation of Multiclass Novelty Detection Algorithms for Data Streams
Author :
Ribeiro de Faria, Elaine ; Ribeiro Goncalves, Isabel ; Gama, Joao ; Carlos Ponce de Leon Ferreira Carvalho, Andre
Author_Institution :
Comput. Sch., Fed. Univ. of Uberlandia, Uberlandia, Brazil
Volume :
27
Issue :
11
fYear :
2015
Firstpage :
2961
Lastpage :
2973
Abstract :
Data stream mining is an emergent research area that investigates knowledge extraction from large amounts of continuously generated data, produced by a non-stationary distribution. Novelty detection, the ability to identify new or previously unknown situations, is a useful capability for learning systems, especially when dealing with data streams, where concepts may appear, disappear, or evolve over time. Several studies currently investigate the application of novelty detection techniques in data streams. However, there is no consensus regarding how to evaluate the performance of these techniques. In this study, we propose a new evaluation methodology for multiclass novelty detection in data streams able to deal with: i) unsupervised learning, which generates novelty patterns without an association with the true classes, where one class may be composed of a set of novelty patterns, ii) a confusion matrix that grows over time, iii) a confusion matrix with a column representing unknown examples, i.e., those not explained by the model, and iv) the representation of the evaluation measures over time. We propose a new methodology to associate the novelty patterns detected by the algorithm, in an unsupervised fashion, with the true classes. Finally, we evaluate the proposed methodology using known novelty detection algorithms on artificial and real data sets.
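The abstract describes the evaluation setting only at a high level. The Python sketch below illustrates the ideas it lists: a confusion matrix whose rows and columns grow as new labels appear in the stream, an "unknown" column for examples the model cannot explain, an association of novelty patterns with true classes, and measures recorded at checkpoints over time. The class name `StreamConfusionMatrix`, the `UNKNOWN` label, the majority-count association rule, and the two measures (error rate among explained examples, fraction of unknown examples) are illustrative assumptions, not the paper's exact definitions.

```python
from collections import defaultdict

# Hypothetical label for examples the model cannot explain (the "unknown" column).
UNKNOWN = "unknown"


class StreamConfusionMatrix:
    """Confusion matrix for a stream: rows are true classes, columns are
    predicted labels (known classes, novelty patterns, and UNKNOWN).
    Both dimensions grow as new labels appear in the stream."""

    def __init__(self):
        # counts[true_label][predicted_label] -> number of examples
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, true_label, predicted_label):
        self.counts[true_label][predicted_label] += 1

    def columns(self):
        cols = set()
        for row in self.counts.values():
            cols.update(row)
        return cols

    def associate_novelty_patterns(self):
        """Assumed rule: map each predicted label except UNKNOWN to the true
        class that shares the most examples with it (ties go to the class
        seen first in the stream)."""
        mapping = {}
        for col in self.columns():
            if col == UNKNOWN:
                continue
            mapping[col] = max(self.counts, key=lambda t: self.counts[t].get(col, 0))
        return mapping

    def measures(self):
        """Return (error rate among explained examples, fraction of UNKNOWN)."""
        mapping = self.associate_novelty_patterns()
        total = sum(sum(row.values()) for row in self.counts.values())
        unknown = sum(row.get(UNKNOWN, 0) for row in self.counts.values())
        explained = total - unknown
        correct = sum(
            n
            for t, row in self.counts.items()
            for p, n in row.items()
            if p != UNKNOWN and mapping[p] == t
        )
        error = (explained - correct) / explained if explained else 0.0
        unk_rate = unknown / total if total else 0.0
        return error, unk_rate


# Usage: a toy stream of (true class, predicted label) pairs, where NP1 and NP2
# stand for hypothetical novelty patterns; measures are recorded every 4 examples.
stream = [
    ("A", "A"), ("B", "B"), ("C", UNKNOWN), ("C", "NP1"),
    ("C", "NP1"), ("D", "NP2"), ("D", "NP2"), ("A", "NP1"),
]
cm = StreamConfusionMatrix()
history = []
for i, (y_true, y_pred) in enumerate(stream, start=1):
    cm.update(y_true, y_pred)
    if i % 4 == 0:
        history.append((i, *cm.measures()))

print(cm.associate_novelty_patterns())
print(history)
```

Because the association is recomputed at every checkpoint, a novelty pattern may be reassigned to a different class as more examples arrive; the abstract does not state whether the paper's methodology behaves this way.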
Keywords :
data mining; matrix algebra; unsupervised learning; artificial data sets; confusion matrix; data stream mining; evaluation measures; evaluation methodology; knowledge extraction; learning systems; multiclass novelty detection algorithms; nonstationary distribution; novelty patterns; real data sets; true classes; Context; Data mining; Decision support systems; Electronic mail; Mathematical model; Measurement uncertainty; Time measurement; Evaluation methodologies; data streams; novelty detection
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2015.2441713
Filename :
7118190