Title :
Evolving Big Data Stream Classification with MapReduce
Author :
Haque, Ashraful ; Parker, Brendon ; Khan, Latifur ; Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Date :
June 27, 2014 - July 2, 2014
Abstract :
Big Data Stream mining poses inherent challenges that are absent from traditional data mining. A Big Data Stream not only receives a large volume of data continuously, but may also contain different types of features. Moreover, its concepts and features tend to evolve throughout the stream. Traditional data mining techniques are not sufficient to address these challenges. In our current work, we have designed HSMiner, a multi-tiered ensemble-based method that addresses these challenges in order to label instances in an evolving Big Data Stream. However, this method requires building a large number of AdaBoost ensembles for each of the numeric features after receiving each new data chunk, which is very costly. Thus, HSMiner may face scalability issues when classifying a Big Data Stream. To address this problem, we propose three approaches to building this large number of AdaBoost ensembles using MapReduce-based parallelism. We compare these approaches from different aspects of design. We also show empirically that these approaches help our base method achieve significant scalability and speedup.
Keywords :
Big Data; data mining; learning (artificial intelligence); pattern classification; AdaBoost; HSMiner; MapReduce; multitiered ensemble based method; stream classification; stream mining; Distributed databases; Indexes; Parallel processing; Scalability; Sorting; Distributed Processing; Evolving Big Data Stream
Conference_Title :
2014 IEEE 7th International Conference on Cloud Computing (CLOUD)
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
DOI :
10.1109/CLOUD.2014.82