DocumentCode :
1899732
Title :
Modified Fuzzy K-mean clustering using MapReduce in Hadoop and cloud
Author :
Garg, Dweepna ; Gohil, Parth ; Trivedi, Khushboo
Author_Institution :
Dept. of Comput. Sci. & Eng., Babaria Inst. of Technol., Vadodara, India
fYear :
2015
fDate :
5-7 March 2015
Firstpage :
1
Lastpage :
5
Abstract :
Apache Hadoop is an open-source software framework for storing and processing Big Data, both structured and unstructured. It is currently one of the biggest motivators in the market because data storage in it is inexpensive. Hadoop stores data in a distributed file system (HDFS), which lets the user store large amounts of data simply by adding more nodes to a Hadoop cluster. Clustering such large amounts of data is a challenge. MapReduce, the programming model used by Hadoop, enables parallelization by decomposing a problem over a large dataset into smaller portions of data that are processed in parallel. Mahout, a scalable machine learning library that runs on Hadoop, provides clustering algorithms. In this paper, a Hadoop multi-node cluster is built on Amazon EC2. The paper focuses on the Fuzzy K-means clustering algorithm, modified with a centroid generation method implemented using MapReduce in Hadoop. Experimental results show that modifying Fuzzy K-means with Canopy generation in MapReduce decreases the number of iterations and thereby the execution time.
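The abstract's idea (seed Fuzzy K-means with Canopy-generated centroids, then iterate with map and reduce steps) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and not Mahout's API: the function names, the canopy thresholds `t1`/`t2`, the fuzziness exponent `m = 2`, and the convergence tolerance are all assumptions, and the map, shuffle, and reduce phases are simulated with plain Python loops rather than run on Hadoop. The canopy step follows the common loose/tight-threshold scheme, using the mean of each canopy's members as the seed centroid.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Canopy generation (assumed variant): one cheap pass that yields seed centroids.

    Points within the loose threshold t1 join a canopy; points within the
    tight threshold t2 are removed from the candidate list.
    """
    assert t1 > t2
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        members = [center] + [p for p in remaining if dist(p, center) < t1]
        dim = len(center)
        # The mean of the canopy's members becomes an initial centroid.
        centers.append(tuple(sum(p[i] for p in members) / len(members) for i in range(dim)))
        remaining = [p for p in remaining if dist(p, center) >= t2]
    return centers


def memberships(point, centers, m):
    """Fuzzy membership of one point in every cluster; the values sum to 1."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:  # point coincides with a center: crisp membership
        j = d.index(0.0)
        return [1.0 if k == j else 0.0 for k in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** exp for k in range(len(centers)))
            for j in range(len(centers))]


def map_point(point, centers, m):
    """'Map' phase: emit (cluster_id, (u^m * point, u^m)) for one point."""
    out = []
    for j, uj in enumerate(memberships(point, centers, m)):
        w = uj ** m
        out.append((j, ([w * x for x in point], w)))
    return out


def reduce_cluster(pairs, dim):
    """'Reduce' phase: weighted mean of all values emitted for one cluster."""
    num = [0.0] * dim
    den = 0.0
    for vec, w in pairs:
        num = [a + b for a, b in zip(num, vec)]
        den += w
    return tuple(a / den for a in num)


def fuzzy_kmeans(points, centers, m=2.0, tol=1e-4, max_iter=100):
    """Iterate map -> group by cluster id -> reduce until centers stop moving."""
    dim = len(points[0])
    for it in range(1, max_iter + 1):
        groups = {j: [] for j in range(len(centers))}
        for p in points:
            for j, val in map_point(p, centers, m):
                groups[j].append(val)  # simulated shuffle: group by key
        new_centers = [reduce_cluster(groups[j], dim) for j in range(len(centers))]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            return centers, it
    return centers, max_iter


# Usage on a toy 2-D dataset (illustrative values, not data from the paper).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seeds = canopy_centers(data, t1=3.0, t2=1.5)
print(len(seeds))  # 2
centers, iters = fuzzy_kmeans(data, seeds)
print(centers, iters)
```

Because the canopy pass already places seeds near the dense regions, the fuzzy iterations start close to a fixed point, which is the mechanism the abstract credits for the reduced iteration count; the sketch only shows that mechanism and makes no claim about the paper's measured results.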
Keywords :
Big Data; cloud computing; fuzzy set theory; learning (artificial intelligence); parallel programming; pattern clustering; public domain software; Amazon EC2; Apache Hadoop; Canopy generation; Hadoop multinode cluster; Mahout; MapReduce; big data; centroid generation method; cloud computing; data clustering; data storage; distributed file system; execution time; modified fuzzy K-mean clustering; open source software framework; parallelization technique; programming model; scalable machine learning library; Classification algorithms; Clustering algorithms; Random access memory; Cloud; Fuzzy K-mean clustering; HDFS; Hadoop; Mahout; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical, Computer and Communication Technologies (ICECCT), 2015 IEEE International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6084-2
Type :
conf
DOI :
10.1109/ICECCT.2015.7226046
Filename :
7226046