DocumentCode :
678644
Title :
Verification and validation of MapReduce program model for parallel K-means algorithm on Hadoop cluster
Author :
Kumar, Ajit ; Kiran, M. ; Prathap, B.R.
Author_Institution :
Dept. of Comput. Sci. & Eng., Christ Univ., Bangalore, India
fYear :
2013
fDate :
4-6 July 2013
Firstpage :
1
Lastpage :
8
Abstract :
With the development of information technology, the volume of data stored electronically keeps growing. The data volumes processed by many applications will routinely cross the petabyte threshold, which in turn increases computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To this end, we have analyzed various MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster, using the concept of MapReduce. In this experiment we verified and validated several MapReduce applications: wordcount, grep, terasort, and the parallel K-Means clustering algorithm. We found that execution time decreases as the number of nodes increases, but we also encountered some interesting cases during the experiment; we recorded the resulting performance changes and plotted performance graphs. This experiment is a research study of the above MapReduce applications that also verifies and validates the MapReduce program model for the parallel K-Means algorithm on a four-node Hadoop cluster.
Keywords :
learning (artificial intelligence); parallel algorithms; pattern clustering; program verification; Hadoop cluster; MapReduce program model validation; MapReduce program model verification; PKMeans; data volumes processing; grep; information technology; machine learning; parallel K-means algorithm; parallel clustering algorithm; parallel k-means clustering algorithm; performance requirements; processing algorithms; scalability requirements; scientific data analyses; terasort; wordcount; Algorithm design and analysis; Clustering algorithms; Computational modeling; Distributed databases; File systems; Partitioning algorithms; Unsupervised learning; Hadoop; Machine learning; MapReduce; grep; k-means; terasort; wordcount;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference on
Conference_Location :
Tiruchengode
Print_ISBN :
978-1-4799-3925-1
Type :
conf
DOI :
10.1109/ICCCNT.2013.6726852
Filename :
6726852