• DocumentCode
    702920
  • Title

    Document clustering algorithm using modified k-means

  • Author

    Agrawal, Ranjana ; Phatak, Madhura

  • Author_Institution
    Dept. of Computer Engineering, MAEER's MIT, Pune, India
  • fYear
    2012
  • fDate
    19-20 Oct. 2012
  • Firstpage
    294
  • Lastpage
    296
  • Abstract
    Document clustering is the task of grouping a set of documents into clusters so that documents in the same cluster are more similar to each other than to those in other clusters. One application of document clustering is in web search engine retrieval systems, where it helps users find relevant information more quickly and focus their search in the appropriate direction. K-means is a commonly used algorithm for document clustering, but it has some disadvantages. Its main limitations are: 1) the number of clusters K must be given as input; 2) depending on the initialization, it converges to different local minima; 3) it is slow and cannot be used for a large number of data points; 4) it cannot handle empty clusters. In this paper, we develop a novel algorithm to eliminate all of these basic drawbacks of K-means.
  • Keywords
    Cosine similarity; Document clustering; K-means; Threshold;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Titel
    Communication and Computing (ARTCom2012), Fourth International Conference on Advances in Recent Technologies in
  • Conference_Location
    Bangalore, India
  • Type

    conf

  • DOI
    10.1049/cp.2012.2553
  • Filename
    7087842
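
The K-means limitations listed in the abstract can be seen in a minimal baseline sketch. The code below is plain K-means with cosine similarity, written for illustration only; it is not the modified algorithm of the paper, which the abstract does not describe. The function names, the `seed` parameter, the toy term-count vectors, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for this sketch.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(docs, k, seed=0, max_iter=100):
    """Baseline K-means on unit vectors, assigning by highest cosine similarity.

    Returns (labels, centroids). The caller must supply k (limitation 1), and
    the result depends on the random initial centroids (limitation 2).
    """
    rng = random.Random(seed)
    points = [normalize(d) for d in docs]
    # Initial centroids are k distinct documents picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Every iteration compares all points with all centroids (limitation 3).
        new_labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Empty cluster (limitation 4): plain K-means defines no rule
                # here; this sketch simply keeps the previous centroid.
                continue
            mean = [sum(col) / len(members) for col in zip(*members)]
            centroids[j] = normalize(mean)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical term-count vectors for four short documents.
    docs = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
    for seed in range(3):
        labels, _ = kmeans_cosine(docs, k=2, seed=seed)
        print(seed, labels)
```

Running the sketch with several seeds is a quick way to check whether the partition changes with the initial centroids, which is the behaviour the abstract's second limitation refers to.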