• DocumentCode
    3278720
  • Title
    Clustering web documents based on Multiclass spectral clustering
  • Author
    He, Xing ; Wang, Jia-bing ; Zhang, Zhong-xian ; Cai, Yi-rong
  • Author_Institution
    Dept. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
  • Volume
    4
  • fYear
    2011
  • fDate
    10-13 July 2011
  • Firstpage
    1466
  • Lastpage
    1471
  • Abstract
    Multiclass spectral clustering is a clustering method that has been successfully applied to image segmentation and many other tasks. In this paper, multiclass spectral clustering is used to cluster web documents, including both English and Chinese pages. Our experiments show that multiclass spectral clustering is well suited to web document clustering: the method works well not only on English web documents but also on Chinese web documents. We applied our method to a web search engine, so that users can easily find suitable results simply by selecting the desired classes.
  • Keywords
    Internet; document image processing; image segmentation; natural language processing; pattern clustering; search engines; Chinese pages; English Web document clustering; English pages; Web search engine; multiclass spectral clustering; Buildings; Classification algorithms; Clustering algorithms; Indexes; Machine learning; Sun; search engine; web document clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
  • Conference_Location
    Guilin
  • ISSN
    2160-133X
  • Print_ISBN
    978-1-4577-0305-8
  • Type
    conf
  • DOI
    10.1109/ICMLC.2011.6017004
  • Filename
    6017004