DocumentCode :
2711167
Title :
SERA: Selectively recursive approach towards nonstationary imbalanced stream data mining
Author :
Chen, Sheng ; He, Haibo
Author_Institution :
Dept. of Electr. & Comput. Eng., Stevens Inst. of Technol., Hoboken, NJ, USA
fYear :
2009
fDate :
14-19 June 2009
Firstpage :
522
Lastpage :
529
Abstract :
Recent years have witnessed rapidly growing interest in stream data mining. Despite the considerable success achieved so far, current approaches generally assume that the class distribution of the stream data is relatively balanced. However, in applications such as network intrusion detection, credit fraud detection, spam classification, and many others, the class distribution is mostly imbalanced and misclassifying a minority example is very costly. Concept drift is an unavoidable issue in stream data mining research, and it is even more difficult to handle when the classifier has to learn from an imbalanced data stream whose target concept keeps drifting. In this article, we propose a selectively recursive approach (SERA) to deal with the problem of learning from nonstationary imbalanced data streams. By selectively absorbing previously received minority examples into the current training data chunk and potentially assigning sampling probabilities proportionally to the majority and minority examples, SERA can alleviate the difficulty that conventional stream data mining methods face when learning from nonstationary imbalanced data streams. Experiments on synthetic datasets show that, compared to existing approaches, our approach is competitive on general assessment metrics and is capable of significant performance improvement in predicting minority instances.
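The idea of selectively absorbing earlier minority examples into the current chunk might be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the selection criterion, so the sketch assumes Euclidean distance to the mean of the current chunk's minority examples, and the function names `select_past_minority` and `build_training_chunk`, the parameter `k`, and the label convention are all hypothetical. The sampling-probability step mentioned in the abstract is not modeled.

```python
import math


def select_past_minority(past_minority, current_minority, k):
    """Pick the k stored minority examples closest to the current minority mean.

    Assumption: Euclidean distance is used as the similarity measure; the
    abstract does not state which criterion the method relies on.
    """
    if not current_minority or k <= 0:
        return []
    dim = len(current_minority[0])
    n = len(current_minority)
    mean = [sum(x[i] for x in current_minority) / n for i in range(dim)]
    ranked = sorted(past_minority, key=lambda x: math.dist(x, mean))
    return ranked[:k]


def build_training_chunk(chunk, past_minority, k, minority_label=1):
    """Return the current chunk plus up to k absorbed past minority examples.

    chunk is a list of (features, label) pairs; past_minority is a list of
    feature tuples kept from earlier chunks (hypothetical storage layout).
    """
    current_minority = [x for x, y in chunk if y == minority_label]
    absorbed = select_past_minority(past_minority, current_minority, k)
    return chunk + [(x, minority_label) for x in absorbed]


chunk = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((5.0, 5.0), 0)]
past = [(1.1, 0.9), (9.0, 9.0), (0.8, 1.2)]
augmented = build_training_chunk(chunk, past, k=2)
```

In this toy run the far-away stored example `(9.0, 9.0)` is left out, which mirrors the intuition that stale minority examples from a drifted concept should not be reused.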
Keywords :
data mining; learning (artificial intelligence); probability; sampling methods; SERA; class distribution; concept drift; current training data chunk; imbalanced data stream; nonstationary imbalanced stream data mining; sampling probability; selectively recursive approach; Costs; Data mining; Helium; Intrusion detection; Learning systems; Neural networks; Predictive models; Sampling methods; Streaming media; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2009. IJCNN 2009. International Joint Conference on
Conference_Location :
Atlanta, GA
ISSN :
1098-7576
Print_ISBN :
978-1-4244-3548-7
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2009.5178874
Filename :
5178874