• DocumentCode
    1787790
  • Title
    Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means
  • Author
    Jain, Eeti; Jain, S.K.
  • Author_Institution
    Dept. of Comput. Eng., Nat. Inst. of Eng., Kurukshetra, India
  • fYear
    2014
  • fDate
    26-28 Sept. 2014
  • Firstpage
    29
  • Lastpage
    33
  • Abstract
    The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites like Twitter hold large amounts of data that have to be handled properly. There is therefore a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms that run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering. In this work, we implement Mahout on the Hadoop platform and perform experiments with datasets from Twitter. We evaluate the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets from the Twitter website.
  • Keywords
    fuzzy set theory; learning (artificial intelligence); parallel processing; pattern clustering; social networking (online); Hadoop; Mahout; Twitter; fuzzy k-means clustering; k-means algorithm; machine learning library; online Web sites; parallel clustering algorithm; performance evaluation; Clustering algorithms; Convergence; Data mining; Finite element analysis; Internet; Libraries; Vectors; Clustering; Hadoop; Mahout; Parallel; Twitter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Communication Technology (ICCCT), 2014 International Conference on
  • Conference_Location
    Allahabad
  • Print_ISBN
    978-1-4799-6757-5
  • Type
    conf
  • DOI
    10.1109/ICCCT.2014.7001465
  • Filename
    7001465