DocumentCode :
264798
Title :
Categorizing Twitter users on the basis of their interests using Hadoop/Mahout platform
Author :
Jain, Eeti ; Jain, S.K.
Author_Institution :
Dept. of Comput. Eng., Nat. Inst. of Eng., Kurukshetra, India
fYear :
2014
fDate :
15-17 Dec. 2014
Firstpage :
1
Lastpage :
5
Abstract :
The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites such as Twitter hold large amounts of data that must be handled properly. There is therefore a need for a platform that can perform data clustering faster, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms which run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering. In this work, we have categorized Twitter users on the basis of their interest patterns by implementing Mahout over the Hadoop platform, and we have performed experiments on Twitter datasets. We have also evaluated the performance of k-means and fuzzy k-means and compared their results to find out which algorithm works better on this type of dataset.
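The abstract compares hard k-means with fuzzy k-means. The sketch below is a minimal single-machine illustration of how the two update rules differ, assuming Euclidean distance and a fuzziness exponent m = 2. It is not Mahout's distributed MapReduce implementation or the authors' code; the function names (`kmeans`, `fuzzy_kmeans`, `memberships`) and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], centroids: List[Point], iters: int = 10) -> List[Point]:
    """Hard k-means: every point is assigned to exactly one (nearest) centroid."""
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = mean of its assigned points; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids


def memberships(p: Point, centroids: List[Point], m: float = 2.0) -> List[float]:
    """Fuzzy membership of p in each cluster; the values sum to 1."""
    d = [dist(p, c) for c in centroids]
    for j, dj in enumerate(d):
        if dj == 0.0:  # p lies exactly on a centroid: full membership in that cluster
            return [1.0 if k == j else 0.0 for k in range(len(d))]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** e for k in range(len(d))) for j in range(len(d))]


def fuzzy_kmeans(points: List[Point], centroids: List[Point],
                 m: float = 2.0, iters: int = 10) -> List[Point]:
    """Fuzzy k-means: every point pulls on every centroid, weighted by membership**m."""
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centroids, m) for p in points]
        updated: List[Point] = []
        for j, old in enumerate(centroids):
            w = [u_i[j] ** m for u_i in u]
            total = sum(w)
            if total == 0.0:
                updated.append(old)  # no point carries weight here; keep the old centroid
            else:
                updated.append(tuple(
                    sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)
                ))
        centroids = updated
    return centroids


# Hypothetical 2-D "interest vectors" forming two obvious groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
init = [(0.0, 0.0), (10.0, 10.0)]
hard = kmeans(pts, init)
soft = fuzzy_kmeans(pts, init)
print(hard)
print(soft)
```

On this toy data the hard centroids land on the group means (1/3, 1/3) and (31/3, 31/3), while the fuzzy centroids differ only slightly, because every point also contributes a small weight to the distant cluster. That soft assignment is what lets fuzzy k-means place a user with mixed interests partly in several clusters.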
Keywords :
distributed processing; fuzzy set theory; learning (artificial intelligence); pattern clustering; social networking (online); Hadoop-Mahout platform; Twitter user categorization; data clustering; fuzzy k-means; k-means algorithm; machine learning library approach; online Websites; parallel clustering algorithm; Clustering algorithms; Data mining; Density measurement; Euclidean distance; Libraries; Time measurement; Vectors; Clustering; Hadoop; Mahout; Parallel; Twitter;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Industrial and Information Systems (ICIIS), 2014 9th International Conference on
Conference_Location :
Gwalior
Print_ISBN :
978-1-4799-6499-4
Type :
conf
DOI :
10.1109/ICIINFS.2014.7036529
Filename :
7036529