Title :
Incremental Ensemble Classifier Addressing Non-stationary Fast Data Streams
Author :
Parker, Brandon S. ; Khan, Latifur ; Bifet, Albert
Author_Institution :
Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
Classifying data points in a data stream poses a fundamentally different set of challenges than data mining on static data. While streaming data is often placed in the context of "Big Data" (or, more specifically, "Fast Data"), wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the stream, the available labels (or concepts) often change, and a concept's definition in the feature space can also evolve (or drift) over time. The core issue is that the hidden generative function of the data is not constant but evolves over time; this is known as a non-stationary distribution. In this paper, we describe a new approach to using ensembles for stream classification. While the core method is straightforward, it is specifically designed to adapt quickly, and with very little overhead, to the dynamic and evolving nature of data streams generated from non-stationary functions. Our method, M3, is based on a weighted majority ensemble of heterogeneous model types in which model weights are updated online using reinforcement learning techniques. We compare our method with current leading algorithms as implemented in the Massive Online Analysis (MOA) framework using UCI benchmark and synthetic stream generator data sets, and find that our method shows particularly strong gains over the baseline method when ground truth is of limited availability to the classifiers.
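Illustrative sketch (not taken from the paper): the abstract does not state M3's weight-update rule, so the example below substitutes a classic multiplicative weighted-majority update as a stand-in for the reinforcement-style reward and penalty. All class names, the `penalty` and `reward` parameters, and the toy stream are assumptions made for illustration only. It shows how a weighted vote over heterogeneous incremental models can shift toward whichever member tracks a drifting concept.

```python
from collections import defaultdict


class MajorityClass:
    """Toy learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = defaultdict(int)

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class LastLabel:
    """Toy learner: always predicts the most recent label seen."""

    def __init__(self):
        self.last = None

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


class WeightedMajorityEnsemble:
    """Weighted vote over incremental models (hypothetical stand-in, not M3)."""

    def __init__(self, models, penalty=0.5, reward=1.05):
        self.models = list(models)
        self.weights = [1.0 / len(self.models)] * len(self.models)
        self.penalty = penalty  # assumed factor applied after a wrong prediction
        self.reward = reward    # assumed factor applied after a correct prediction

    def predict(self, x):
        # Sum the weights of the models voting for each label.
        votes = defaultdict(float)
        for model, weight in zip(self.models, self.weights):
            votes[model.predict(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Test-then-train: score each model on (x, y) before it learns from it.
        for i, model in enumerate(self.models):
            correct = model.predict(x) == y
            self.weights[i] *= self.reward if correct else self.penalty
            model.learn(x, y)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Toy stream with an abrupt concept change: 20 examples of "a", then 10 of "b".
ensemble = WeightedMajorityEnsemble([MajorityClass(), LastLabel()])
for label in ["a"] * 20 + ["b"] * 10:
    ensemble.update(None, label)

prediction = ensemble.predict(None)
print(prediction, ensemble.weights)
```

In this toy run the majority-class model keeps predicting the old concept after the change, so its weight shrinks and the vote follows the faster-adapting model. When ground truth is only intermittently available, `update` would simply be skipped for unlabeled points.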
Keywords :
Big Data; data mining; learning (artificial intelligence); pattern classification; M3 method; MOA framework; UCI benchmark; data generative function; data point classification; incremental ensemble classifier; massive online analysis framework; nonstationary distribution; nonstationary fast data streams; one-pass algorithms; reinforcement learning techniques; static data; stream classification; synthetic stream generator data sets; weighted majority ensemble; Accuracy; Equations; Heuristic algorithms; Prediction algorithms; Training; Training data; Fast Data; Stream mining; classifier; non-stationary distribution
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
DOI :
10.1109/ICDMW.2014.116