Title :
Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means
Author :
Jain, Eeti ; Jain, S.K.
Author_Institution :
Dept. of Comput. Eng., Nat. Inst. of Eng., Kurukshetra, India
Abstract :
The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites such as Twitter hold large amounts of data that must be handled properly. There is therefore a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms, which run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering such large datasets. In this work, we implement Mahout over the Hadoop platform and perform experiments with datasets from Twitter. In this paper, we evaluate the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets from the Twitter website.
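The difference between the two algorithms compared in the abstract is hard versus soft assignment: k-means places each user in exactly one cluster, while fuzzy k-means (also known as fuzzy c-means) gives each user a degree of membership in every cluster. The sketch below illustrates only that difference on a single machine; it is not Mahout's or Hadoop's implementation. The user vectors (hypothetical term counts for the words "hadoop", "mahout", "cricket", "football"), the first-k-points initialization, and the fuzziness exponent m = 2 are all illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Hypothetical toy data: each row is one user's term counts over the words
# ["hadoop", "mahout", "cricket", "football"] (illustrative only).
users: List[Vector] = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [0, 1, 6, 5],
    [0, 0, 5, 6],
]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_mean(vectors: List[Vector], weights: List[float]) -> Vector:
    """Component-wise weighted mean of a list of vectors."""
    total = sum(weights)
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(len(vectors[0]))]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Hard k-means: every point is assigned to exactly one cluster."""
    centroids = [list(map(float, p)) for p in points[:k]]  # assumed init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: centroid = plain mean of its members (kept if cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = weighted_mean(members, [1.0] * len(members))
    return labels, centroids


def fuzzy_kmeans(points: List[Vector], k: int, m: float = 2.0, iters: int = 50,
                 eps: float = 1e-9) -> Tuple[List[List[float]], List[Vector]]:
    """Fuzzy k-means (fuzzy c-means): every point has a membership in every cluster."""
    centroids = [list(map(float, p)) for p in points[:k]]  # assumed init: first k points
    memberships: List[List[float]] = []
    for _ in range(iters):
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a centroid does not divide by zero.
            d = [max(dist(p, c), eps) for c in centroids]
            # u_j = 1 / sum_l (d_j / d_l)^(2/(m-1)); each row sums to 1.
            memberships.append([
                1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                for j in range(k)
            ])
        # Centroid update: mean of all points weighted by membership^m.
        for j in range(k):
            centroids[j] = weighted_mean(points, [row[j] ** m for row in memberships])
    return memberships, centroids


if __name__ == "__main__":
    labels, _ = kmeans(users, 2)
    print(labels)  # [0, 0, 1, 1]
    memberships, _ = fuzzy_kmeans(users, 2)
    for row in memberships:
        print([round(u, 3) for u in row])
```

Both functions group the first two toy users together and the last two together; the fuzzy version additionally reports how strongly each user belongs to each group, which is the extra information fuzzy k-means trades extra computation for.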
Keywords :
fuzzy set theory; learning (artificial intelligence); parallel processing; pattern clustering; social networking (online); Hadoop; Mahout; Twitter; fuzzy k-means clustering; k-means algorithm; machine learning library; online Web sites; parallel clustering algorithm; performance evaluation; Clustering algorithms; Convergence; Data mining; Finite element analysis; Internet; Libraries; Vectors; Clustering; Hadoop; Mahout; Parallel; Twitter;
Conference_Title :
Computer and Communication Technology (ICCCT), 2014 International Conference on
Conference_Location :
Allahabad
Print_ISBN :
978-1-4799-6757-5
DOI :
10.1109/ICCCT.2014.7001465