DocumentCode :
172926
Title :
Evolving Big Data Stream Classification with MapReduce
Author :
Haque, Ashraful ; Parker, Brendon ; Khan, Latifur ; Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
570
Lastpage :
577
Abstract :
Big Data Stream mining poses inherent challenges that are not present in traditional data mining. A Big Data Stream not only receives a large volume of data continuously, but may also contain different types of features. Moreover, the concepts and features tend to evolve throughout the stream. Traditional data mining techniques are not sufficient to address these challenges. In our current work, we have designed HSMiner, a multi-tiered ensemble-based method, to address these challenges when labeling instances in an evolving Big Data Stream. However, this method requires building a large number of AdaBoost ensembles for each of the numeric features after receiving each new data chunk, which is very costly. Thus, HSMiner may face scalability issues when classifying a Big Data Stream. To address this problem, we propose three approaches to building this large number of AdaBoost ensembles using MapReduce-based parallelism. We compare these approaches with respect to different aspects of their design. We also show empirically that these approaches enable our base method to achieve significant scalability and speedup.
Keywords :
Big Data; data mining; learning (artificial intelligence); pattern classification; AdaBoost; HSMiner; MapReduce; multitiered ensemble based method; stream classification; stream mining; Distributed databases; Indexes; Parallel processing; Scalability; Sorting; Distributed Processing; Evolving Big Data Stream
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
Type :
conf
DOI :
10.1109/CLOUD.2014.82
Filename :
6973788