Regional Information Center for Science and Technology - Verification and validation of MapReduce program model for parallel K-means algorithm on Hadoop cluster

DocumentCode :

678644

Title :

Verification and validation of MapReduce program model for parallel K-means algorithm on Hadoop cluster

Author :

Kumar, Ajit ; Kiran, M. ; Prathap, B.R.

Author_Institution :

Dept. of Comput. Sci. & Eng., Christ Univ., Bangalore, India

fYear :

2013

fDate :

4-6 July 2013

Firstpage :

Lastpage :

Abstract :

With the development of information technology, a growing volume of data is being stored electronically. The data volumes processed by many applications will routinely cross the petabyte threshold, which in turn increases computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To that end, we have analyzed various MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster, using the concept of MapReduce. In this experiment we verified and validated several MapReduce applications: wordcount, grep, terasort, and the parallel K-Means clustering algorithm. We found that execution time decreases as the number of nodes increases; we also observed some interesting cases during the experiment, recorded the resulting performance changes, and drew performance graphs. This experiment is essentially a research study of the above MapReduce applications and an effort to verify and validate the MapReduce program model for the parallel K-Means algorithm on a four-node Hadoop cluster.

Keywords :

learning (artificial intelligence); parallel algorithms; pattern clustering; program verification; Hadoop cluster; MapReduce program model validation; MapReduce program model verification; PKMeans; data volumes processing; grep; information technology; machine learning; parallel K-means algorithm; parallel clustering algorithm; parallel k-means clustering algorithm; performance requirements; processing algorithms; scalability requirements; scientific data analyses; terasort; wordcount; Algorithm design and analysis; Clustering algorithms; Computational modeling; Distributed databases; File systems; Partitioning algorithms; Unsupervised learning; Hadoop; Machine learning; MapReduce; grep; k-means; terasort; wordcount;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference on

Conference_Location :

Tiruchengode

Print_ISBN :

978-1-4799-3925-1

Type :

conf

DOI :

10.1109/ICCCNT.2013.6726852

Filename :

6726852

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=678644