DocumentCode
1787790
Title
Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means
Author
Jain, Eeti ; Jain, S.K.
Author_Institution
Dept. of Comput. Eng., Nat. Inst. of Eng., Kurukshetra, India
fYear
2014
fDate
26-28 Sept. 2014
Firstpage
29
Lastpage
33
Abstract
The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites such as Twitter hold large amounts of data that must be handled properly. This creates the need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms, which run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering large datasets. In this work, we deploy Mahout on the Hadoop platform and perform experiments with datasets from Twitter. We evaluate the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets from the Twitter website.
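For readers unfamiliar with the two algorithms being compared, the sketch below contrasts their assignment steps: k-means places each user vector in exactly one cluster, while fuzzy k-means gives it a degree of membership in every cluster. It is a minimal single-machine illustration in plain Python, assuming users are already represented as numeric vectors (for example, term weights derived from their tweets) and using the standard fuzzy c-means membership rule with an illustrative fuzziness parameter m = 2. The function names are hypothetical; this is not Mahout's API and not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_assign(point, centroids):
    """Hard (k-means) assignment: return the index of the nearest centroid."""
    distances = [euclidean(point, c) for c in centroids]
    return distances.index(min(distances))


def fuzzy_memberships(point, centroids, m=2.0):
    """Soft (fuzzy k-means) assignment using the fuzzy c-means rule
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)).
    The memberships are non-negative and sum to 1; m > 1 is the fuzziness
    parameter (assumed value 2 here, chosen only for illustration)."""
    distances = [euclidean(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster
    # (this also avoids dividing by a zero distance).
    for i, d in enumerate(distances):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


if __name__ == "__main__":
    user = [1.0, 0.0]                      # hypothetical user feature vector
    centroids = [[0.0, 0.0], [3.0, 0.0]]   # hypothetical cluster centres
    print(kmeans_assign(user, centroids))      # hard label
    print(fuzzy_memberships(user, centroids))  # soft memberships
```

For a user at [1.0, 0.0] with centroids at [0.0, 0.0] and [3.0, 0.0], kmeans_assign returns 0, while fuzzy_memberships returns approximately [0.8, 0.2]: the user is mostly, but not exclusively, associated with the first cluster. A distributed implementation would perform these per-point steps in parallel across the dataset; the sketch shows only the arithmetic.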
Keywords
fuzzy set theory; learning (artificial intelligence); parallel processing; pattern clustering; social networking (online); Hadoop; Mahout; Twitter; fuzzy k-means clustering; k-means algorithm; machine learning library; online Web sites; parallel clustering algorithm; performance evaluation; Clustering algorithms; Convergence; Data mining; Finite element analysis; Internet; Libraries; Vectors; Clustering; Hadoop; Mahout; Parallel; Twitter;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Communication Technology (ICCCT), 2014 International Conference on
Conference_Location
Allahabad
Print_ISBN
978-1-4799-6757-5
Type
conf
DOI
10.1109/ICCCT.2014.7001465
Filename
7001465