Title :
A novel algorithm for distributed data mining in HDFS
Author :
Natarajan, Sriraam ; Sehar, Sountharrajan
Author_Institution :
Comput. Sci. & Eng., Bannari Amman Inst. of Technol., Sathyamangalam, India
Abstract :
The evolution of cloud computing over the Internet and the drastic increase in data size and intensity (Big Data) have made MapReduce and distributed file systems such as HDFS (Hadoop Distributed File System) the paradigm of choice for distributed data mining applications. With the size and complexity of data growing every day, distributed data mining algorithms have to be designed to handle Big Data in a way that is compatible with the latest distributed computing technology. Earlier research in data mining focused on increasing the performance of single-task computing algorithms rather than on distributed computing, which would provide a faster and more scalable environment for processing large datasets. Existing algorithms for distributed frequent pattern mining include TPFP-tree, BTP tree, and CARM, but these algorithms suffer from unbalanced workload management across their clusters. In this paper, a novel algorithm named Association Rule Mining based on Hadoop (ARMH) is proposed to utilize the clusters effectively and to mine frequent patterns from large databases. The Hadoop distributed framework helps manage the workload among the clusters. ARMH was implemented in Hadoop using the MapReduce programming paradigm.
Keywords :
Big Data; Java; computational complexity; data mining; distributed databases; network operating systems; parallel algorithms; parallel programming; pattern clustering; ARMH algorithm; BTP tree algorithm; Big Data handling; CARM algorithm; HDFS; Hadoop distributed file system; Hadoop distributed framework; Internet; MapReduce programming; TPFP-tree algorithm; association rule mining based on Hadoop algorithm; cloud computing; data complexity; data intensity; data size; distributed computing; distributed data mining algorithms; distributed frequent pattern data mining; single task computing algorithms; unbalanced workload management; Clustering algorithms; Databases; Data Mining; Distributed Computing; Hadoop; Map Reduce;
Conference_Title :
Advanced Computing (ICoAC), 2013 Fifth International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3447-8
DOI :
10.1109/ICoAC.2013.6921933