  • DocumentCode
    3453854
  • Title
    Research on Text Clustering Algorithms
  • Author
    Li Qun; Huang Xinyuan
  • Author_Institution
    Sch. of Inf. Sci. & Technol., Beijing Forestry Univ., Beijing, China
  • fYear
    2010
  • fDate
    27-28 Nov. 2010
  • Firstpage
    1
  • Lastpage
    3
  • Abstract
    The volume of Web documents is enormous. Text clustering places documents that have the most words in common into the same cluster, so a Web search engine can organize the large result set returned for a given query. In this article, we study three kinds of clustering algorithms: prototype-based, density-based, and hierarchical. We compare two typical algorithms, K-medoids and DBSCAN. The results show that K-medoids is sensitive to the choice of initial center points, while DBSCAN performs better.
  • Keywords
    pattern clustering; query processing; search engines; text analysis; DBSCAN; K-medoids; Web document; Web search engine; density based clustering; hierarchical clustering algorithms; prototype based clustering; text clustering algorithm; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Films; Forestry; Noise; Partitioning algorithms;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2010 2nd International Workshop on Database Technology and Applications (DBTA)
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-6975-8
  • Electronic_ISBN
    978-1-4244-6977-2
  • Type
    conf
  • DOI
    10.1109/DBTA.2010.5659055
  • Filename
    5659055
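The abstract contrasts a prototype-based algorithm (K-medoids) with a density-based one (DBSCAN). Below is a minimal Python sketch of both, under assumptions that are not taken from the paper: each document is reduced to its set of words, the distance is Jaccard distance (so documents sharing more words are closer), K-medoids is a simple alternating assign/update loop that starts from explicitly supplied medoids, and the toy corpus and the `eps`/`min_pts` values are hypothetical. It does not reproduce the paper's experiments.

```python
# Minimal sketch, NOT the paper's implementation. Assumptions: documents are
# word sets, distance is Jaccard distance, K-medoids is a simple alternating
# (assign/update) loop, and the corpus and parameters are hypothetical.


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: documents sharing more words are closer."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def assign(dist, medoids):
    """Index of the nearest medoid for every point (ties go to the first)."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Alternating K-medoids starting from the given initial medoid indices.

    The outcome depends on `medoids`, i.e. on the initial center points.
    """
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


def dbscan(dist, eps, min_pts):
    """Textbook DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = list(neighbors)
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins, but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point: keep expanding
    return labels


# Hypothetical toy corpus: two topical groups plus one unrelated document.
docs = [
    "forest tree wood leaf",
    "tree wood leaf branch",
    "forest wood leaf branch",
    "search engine query web",
    "web query index search",
    "engine web index query",
    "cooking recipe",  # shares no words with the others
]
sets = [set(d.split()) for d in docs]
dist = [[jaccard_distance(a, b) for b in sets] for a in sets]

km_labels, km_medoids = k_medoids(dist, medoids=[0, 3])
db_labels = dbscan(dist, eps=0.5, min_pts=3)
print("K-medoids:", km_labels)  # [0, 0, 0, 1, 1, 1, 0]
print("DBSCAN:   ", db_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy run K-medoids has to put the unrelated "cooking recipe" document into one of its k clusters, whereas DBSCAN labels it as noise (-1). Passing different `medoids` to `k_medoids` is a simple way to experiment with the initialization sensitivity the abstract reports; this sketch makes no claim about how large that effect is on real Web documents.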