DocumentCode :
2893919
Title :
MapReduce Design of K-Means Clustering Algorithm
Author :
Anchalia, Prajesh P. ; Koundinya, Anjan K. ; Srinath, N.K.
Author_Institution :
Dept. of CSE, R.V. Coll. of Eng., Bangalore, India
fYear :
2013
fDate :
24-26 June 2013
Firstpage :
1
Lastpage :
5
Abstract :
A cluster is a collection of data members having similar characteristics. Data mining is the process of establishing relations in, or deriving information from, raw data by performing operations such as clustering on the data set. Data collected in practical scenarios is more often than not completely random and unstructured. Hence, there is always a need to analyze unstructured data sets to derive meaningful information. This is where unsupervised algorithms come into the picture, processing unstructured or even semi-structured data sets. K-Means Clustering is one such technique, used to give structure to unstructured data so that valuable information can be extracted. This paper discusses the implementation of the K-Means Clustering Algorithm over a distributed environment using Apache™ Hadoop. The key to the implementation of the K-Means Algorithm is the design of the Mapper and Reducer routines, which is discussed in the later part of the paper. The steps involved in executing the K-Means Algorithm are also described, based on a small-scale implementation of the algorithm on an experimental setup, to serve as a guide for practical implementations.
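The abstract names the Mapper and Reducer routines as the core of the design but does not reproduce them. The following is a minimal single-process sketch of how one K-Means iteration is commonly split into a map step and a reduce step. It is not the authors' Hadoop code: the function names (`mapper`, `reducer`, `kmeans_iteration`, `kmeans`), the Euclidean distance measure, and the convergence tolerance are all assumptions made for illustration, and the shuffle phase that Hadoop performs across nodes is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def mapper(point, centroids):
    """Map step (assumed form): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reducer(points):
    """Reduce step (assumed form): average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce pass, simulated in a single process."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle/group-by-key
    for p in points:
        key, value = mapper(p, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its old position.
    return [reducer(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In a real Hadoop job each iteration would typically be a separate MapReduce job, with the current centroids distributed to every mapper and the reducer output feeding the next iteration; the loop in `kmeans` only mimics that driver logic.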
Keywords :
data mining; data structures; distributed processing; pattern clustering; Apache Hadoop; MapReduce design; data members; data mining; k-means clustering algorithm; mapper routines; reducer routines; semistructured data sets; unstructured data sets; Algorithm design and analysis; Clustering algorithms; Data mining; Distributed databases; Google; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Applications (ICISA), 2013 International Conference on
Conference_Location :
Suwon
Print_ISBN :
978-1-4799-0602-4
Type :
conf
DOI :
10.1109/ICISA.2013.6579448
Filename :
6579448