Title :
Parallel Processing of Big Data Using Power Iteration Clustering over MapReduce
Author :
Jayalatchumy, D. ; Thambidurai, P. ; Vasumathi, A. Alamaelu
Author_Institution :
PKIET, Karaikal, India
fDate :
Feb. 27 2014-March 1 2014
Abstract :
Extracting useful information from datasets measured in gigabytes and terabytes is a real challenge for data miners. Clustering algorithms face scalability problems when dealing with big data. These problems can be addressed with parallel algorithms, executed together with their input data on high-performance computers. Graph-based applications in particular require considerable computation time. Power iteration clustering (PIC) is a simple, fast, and relatively scalable algorithm, but it requires the data and its associated matrix to fit in memory, which becomes infeasible for big data applications. Parallel PIC (p-PIC) improves scalability, and this paper explores different parallelization strategies for minimizing communication cost. The algorithm runs on the MapReduce parallel framework. The paper implements p-PIC on Hadoop, a cloud platform for parallel storage and computing.
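As a rough illustration of why the matrix-vector product at the core of PIC suits a map/reduce split, the following minimal sketch partitions the row-normalized affinity matrix into row blocks, computes each block's share of the product in a "map" step, and merges the pieces in a "reduce" step. It is not the authors' implementation and does not use Hadoop; the function names, the `num_chunks` parameter, the stopping rule (change in the max-norm step size falling below `tol`), and the assumption that every row of the affinity matrix has a positive sum are all choices made for this example.

```python
from functools import reduce


def row_normalize(affinity):
    """Return W = D^-1 A, where D holds the row sums of A (assumed positive)."""
    return [[a / sum(row) for a in row] for row in affinity]


def map_partial_product(chunk, v):
    """Map step: one worker multiplies its block of rows by the vector v."""
    start, rows = chunk
    return [(start + i, sum(w * x for w, x in zip(row, v)))
            for i, row in enumerate(rows)]


def reduce_merge(left, right):
    """Reduce step: concatenate the (row index, value) pairs from two workers."""
    return left + right


def pic_embedding(affinity, num_chunks=2, max_iter=50, tol=1e-6):
    """Sketch of the PIC power iteration with a row-partitioned product."""
    W = row_normalize(affinity)
    n = len(W)
    size = -(-n // num_chunks)  # ceiling division: rows per chunk
    chunks = [(s, W[s:s + size]) for s in range(0, n, size)]

    # Start from the normalized degree vector.
    degrees = [sum(row) for row in affinity]
    total = sum(degrees)
    v = [d / total for d in degrees]

    prev_delta = None
    for _ in range(max_iter):
        pairs = reduce(reduce_merge,
                       (map_partial_product(c, v) for c in chunks), [])
        new_v = [0.0] * n
        for idx, val in pairs:
            new_v[idx] = val
        norm = sum(abs(x) for x in new_v)  # L1 normalization
        new_v = [x / norm for x in new_v]

        # Stop early once the step size itself stops changing.
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if prev_delta is not None and abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v
```

The returned one-dimensional embedding would then be passed to a standard clustering step such as k-means. In this sketch only the vector `v` and the per-row results move between steps, while each block of matrix rows stays with its worker.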
Keywords :
Big Data; cloud computing; data mining; parallel algorithms; parallel programming; pattern clustering; Big Data applications; Hadoop cloud; MapReduce; communication cost minimization; computing platform; data miners; graph based application; high performance computer; information extraction; p-PIC; p-PIC algorithm; parallel algorithm; parallel framework; parallel processing; parallel store; parallelization strategies; power iteration clustering; scalability; Algorithm design and analysis; Clustering algorithms; Data handling; Data storage systems; Fault tolerance; Information management; Scalability; Fault tolerance; GBC; Hadoop; p-PIC;
Conference_Title :
Computing and Communication Technologies (WCCCT), 2014 World Congress on
Conference_Location :
Trichirappalli
Print_ISBN :
978-1-4799-2876-7
DOI :
10.1109/WCCCT.2014.16