Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means

Author

Jain, Eeti ; Jain, S.K.

Author_Institution

Dept. of Comput. Eng., Nat. Inst. of Eng., Kurukshetra, India

fYear

2014

fDate

26-28 Sept. 2014

Firstpage

29

Lastpage

33

Abstract

The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites like Twitter hold large amounts of data that have to be handled properly. So, there is a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms that run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering. In this work, we implement Mahout over the Hadoop platform and perform experiments with datasets from Twitter. We study the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets from the Twitter website.

Keywords

fuzzy set theory; learning (artificial intelligence); parallel processing; pattern clustering; social networking (online); Hadoop; Mahout; Twitter; fuzzy k-means clustering; k-means algorithm; machine learning library; online Web sites; parallel clustering algorithm; performance evaluation; Clustering algorithms; Convergence; Data mining; Finite element analysis; Internet; Libraries; Vectors; Clustering; Hadoop; Mahout; Parallel; Twitter;

fLanguage

English

Publisher

IEEE

Conference_Title

Computer and Communication Technology (ICCCT), 2014 International Conference on

Conference_Location

Allahabad

Print_ISBN

978-1-4799-6757-5

Type

conf

DOI

10.1109/ICCCT.2014.7001465

Filename

7001465