Title :
Novel Class Detection and Feature via a Tiered Ensemble Approach for Stream Mining
Author :
Parker, Brendon ; Mustafa, Albara M. ; Khan, Latifur
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
Static data mining assumptions with regard to features and labels often fail in the streaming context. Features evolve, concepts drift, and novel classes are introduced. Therefore, any classification algorithm that intends to operate on streaming data must have mechanisms to mitigate the obsolescence of classifiers trained early in the stream. This is typically accomplished by either continually updating a monolithic model or incrementally updating an ensemble. Traditional static data mining algorithms are futile in a streaming context (and often in a distributed sensor network) due to their need to iterate over the entire data set locally. Our approach -- named HSMiner (Hierarchical Stream Miner) -- takes a hierarchical decomposition approach to the ensemble classifier concept. By breaking the classification problem into tiers, we can better prune irrelevant features and counter individual classification error through weighted voting and boosting. In addition, the atomic decomposition of feature inputs enables a straightforward mapping for distributing the ensemble among resources in the network. The implementation proves to be fast and very memory conservative, and we emulate a distributed environment via signal-linked threads. We present a theoretical and empirical analysis of our approach, specifically examining the trade-offs of three different novel class detection variations, and compare these results to a similar method using benchmark data sets.
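The tiered weighted voting mentioned in the abstract can be illustrated with a minimal sketch. This is not the HSMiner implementation: the two-tier layout, the threshold-based single-feature learners, and all names and weights below (`weighted_vote`, `threshold_classifier`, `predict_model`, `predict_ensemble`) are assumptions made for illustration only. Boosting, pruning, and novel class detection are omitted.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Return the label with the largest summed weight from (label, weight) pairs."""
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)


def threshold_classifier(threshold, low_label, high_label):
    """Hypothetical single-feature weak learner: compares one value to a threshold."""
    return lambda value: high_label if value >= threshold else low_label


def predict_model(model, x):
    """Lower tier: each single-feature classifier votes on its own feature."""
    return weighted_vote(
        (clf(x[feature]), weight) for feature, clf, weight in model["members"]
    )


def predict_ensemble(ensemble, x):
    """Upper tier: each model's decision is a vote weighted by the model weight."""
    return weighted_vote((predict_model(m, x), m["weight"]) for m in ensemble)


# Illustrative ensemble of two models over a two-feature instance (assumed values).
ensemble = [
    {"weight": 0.6, "members": [
        (0, threshold_classifier(0.5, "a", "b"), 0.7),
        (1, threshold_classifier(2.0, "a", "b"), 0.3),
    ]},
    {"weight": 0.4, "members": [
        (0, threshold_classifier(0.8, "a", "b"), 0.6),
        (1, threshold_classifier(1.0, "a", "b"), 0.4),
    ]},
]

prediction = predict_ensemble(ensemble, [0.6, 1.5])
print(prediction)
```

Because each lower-tier member reads exactly one feature, a member could in principle be placed on whichever network resource observes that feature, which is the kind of mapping the abstract alludes to. The sketch itself runs in a single process.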
Keywords :
data mining; distributed processing; pattern classification; HSMiner; benchmark data sets; class detection variations; classification algorithm; distributed environment; ensemble classifier concept; feature input atomic decomposition; hierarchical decomposition approach; hierarchical stream miner; monolithic model; signal-linked threads; static data mining assumptions; stream mining; streaming data; tiered ensemble approach; weighted boosting; weighted voting; Accuracy; Algorithm design and analysis; Classification algorithms; Context; Data mining; Heuristic algorithms; Training; concept drift; distributed stream mining; feature evolution; hierarchical ensembles; novel class detection;
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4799-0227-9
DOI :
10.1109/ICTAI.2012.168