• DocumentCode
    2865832
  • Title

    Short Text Feature Extraction and Clustering for Web Topic Mining

  • Author

    He, Hui ; Chen, Bo ; Xu, Weiran ; Guo, Jun

  • Author_Institution
    Beijing Univ. of Posts & Telecommun., Beijing
  • fYear
    2007
  • fDate
    29-31 Oct. 2007
  • Firstpage
    382
  • Lastpage
    385
  • Abstract
    This paper introduces an algorithm for clustering Chinese short texts, based on Chinese chunks, for mining web topics. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks from the texts; these chunks reflect the semantic structure of the text and its character dependencies. The RPCL algorithm is then applied to cluster the texts with high precision, without needing to know the exact number of clusters in advance. Experimental results show that this approach markedly reduces dimensionality and improves clustering performance on Chinese short texts compared with traditional methods.
  • Keywords
    Internet; data mining; feature extraction; pattern clustering; text analysis; Chinese short text feature clustering; N-gram feature extraction; RPCL algorithm; Web topic mining; character dependency; text semantic structure; Clustering algorithms; Data mining; Feature extraction; Frequency; Helium; Image segmentation; Internet; Knowledge engineering; Natural languages; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantics, Knowledge and Grid, Third International Conference on
  • Conference_Location
    Shan Xi
  • Print_ISBN
    0-7695-3007-9
  • Electronic_ISBN
    978-0-7695-3007-9
  • Type

    conf

  • DOI
    10.1109/SKG.2007.76
  • Filename
    4438575