• DocumentCode
    506860
  • Title
    Chinese Web Comments Clustering Analysis with a Two-phase Method
  • Author
    Wang, Yexin ; Zhao, Li ; Zhang, Yan
  • Author_Institution
    Dept. of Machine Intell., Peking Univ., Beijing, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    430
  • Lastpage
    434
  • Abstract
    A meaningful Web topic, especially a hot one, usually has tens of thousands of comments. It is valuable to group these comments into clusters and identify the mainstream opinions. However, such analysis faces two difficulties. First, Web comments have no explicit link relationships of the kind found among Web pages or blog comments. Second, most comments are very short, some only one or two words. Traditional clustering algorithms such as CURE and DBSCAN therefore fail when applied to these comments directly. In this paper we propose a two-phase algorithm that first combines highly synonymous comments into longer ones based on a connected graph model, and then applies improved clustering methods to the new collection. Experimental results on two real data sets show that our algorithm performs better than traditional algorithms such as CURE.
  • Keywords
    Internet; graph theory; information analysis; pattern clustering; Chinese Web comments clustering analysis; Web comment; blog comment; connected graph model; synonymous comments combination; two-phase clustering algorithm; Clustering algorithms; Clustering methods; Fuzzy systems; Information services; Machine intelligence; Web mining; Web pages; Web sites; Yarn
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type
    conf
  • DOI
    10.1109/FSKD.2009.560
  • Filename
    5358557
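
The two-phase idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the thresholds, or the "improved clustering methods", so everything below is an assumption made for illustration. Character-bigram Jaccard similarity stands in for synonym detection, connected components of the similarity graph stand in for the connected graph model, and a greedy single-pass clusterer stands in for the second phase. The function names and threshold values are hypothetical.

```python
from itertools import combinations


def bigrams(text):
    """Character bigrams; an assumed stand-in for Chinese word segmentation."""
    text = text.strip()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_synonymous(comments, threshold=0.6):
    """Phase 1 (assumed form): link comments whose similarity reaches the
    threshold, then return the connected components as groups of indices."""
    parent = list(range(len(comments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [bigrams(c) for c in comments]
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cluster_groups(comments, groups, threshold=0.2):
    """Phase 2 (assumed stand-in): greedy single-pass clustering of the
    merged groups, each treated as one longer pseudo-comment."""
    clusters = []  # each entry: (bigram set, list of comment indices)
    for members in groups:
        grams = set().union(*(bigrams(comments[i]) for i in members))
        for cluster_grams, cluster_members in clusters:
            if jaccard(grams, cluster_grams) >= threshold:
                cluster_grams.update(grams)
                cluster_members.extend(members)
                break
        else:
            clusters.append((grams, list(members)))
    return [sorted(idx) for _, idx in clusters]


def two_phase(comments, merge_threshold=0.6, cluster_threshold=0.2):
    """Run both phases and return clusters as lists of comment indices."""
    groups = merge_synonymous(comments, merge_threshold)
    return cluster_groups(comments, groups, cluster_threshold)
```

Calling two_phase(list_of_comment_strings) returns clusters as lists of indices into the input. Merging first gives very short comments a longer combined representation before the looser second-phase comparison, which is the motivation the abstract gives for the two-phase design.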