• DocumentCode
    2357974
  • Title

    Design and Implementation of Chinese Text Clustering System

  • Author

    Tan, Ying ; Huang, Lan ; Qi, Hong ; Zhai, Yandong

  • Author_Institution
    Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
  • fYear
    2009
  • fDate
    25-27 Aug. 2009
  • Firstpage
    1136
  • Lastpage
    1140
  • Abstract
    Clustering is the core technology of text mining. Through text clustering, a large number of text documents can be divided into several meaningful classes or clusters. Based on the features of Chinese documents, this paper designs and implements a Chinese Text Clustering System that clusters Chinese documents automatically. First, the system automatically segments the input Chinese document sets into words using the reverse maximum matching method. Second, further text preprocessing is performed. Finally, the K-means clustering algorithm is used to obtain the clustering results. The prototype system can also be used to cluster Chinese Web pages so that search engines can find a user's interest model, which will improve the efficiency of searching for the target content.
  • Keywords
    Internet; data mining; pattern clustering; search engines; text analysis; Chinese Web pages clustering; Chinese text clustering system; Chinese word automatic segmentation; K-means clustering algorithm; reverse maximum matching; text mining; text preprocessing; Clustering algorithms; Computer science; Educational institutions; Electronic mail; Particle separators; Prototypes; Web pages; Chinese text clustering; Chinese word segmentation; K-means algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INC, IMS and IDC, 2009. NCM '09. Fifth International Joint Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-5209-5
  • Electronic_ISBN
    978-0-7695-3769-6
  • Type

    conf

  • DOI
    10.1109/NCM.2009.234
  • Filename
    5331328
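
The abstract names reverse maximum matching as the segmentation step but gives no implementation details. The following is a minimal sketch of that general technique, not the authors' code: the function name, the toy dictionary, the sample sentence, and the `max_len` limit of 4 characters are all assumptions made for illustration. The scan runs from the end of the text toward the start and, at each position, takes the longest dictionary word that ends there, falling back to a single character when nothing matches.

```python
def reverse_maximum_match(text, dictionary, max_len=4):
    """Segment `text` right to left, preferring the longest dictionary word.

    `dictionary` is a set of known words; `max_len` is an assumed upper
    bound on word length (hypothetical value, not taken from the paper).
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


# Toy dictionary and sentence, chosen only for illustration.
DICTIONARY = {"研究", "研究生", "生命", "起源"}
SENTENCE = "研究生命的起源"
tokens = reverse_maximum_match(SENTENCE, DICTIONARY)
print(tokens)
```

With this toy dictionary, a left-to-right longest match would take "研究生" first and leave "命" stranded, whereas the right-to-left scan keeps "生命" together. The resulting token lists could then be turned into term vectors and passed to a K-means implementation, the clustering step the abstract mentions.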