DocumentCode
3278720
Title
Clustering web documents based on Multiclass spectral clustering
Author
He, Xing ; Wang, Jia-bing ; Zhang, Zhong-xian ; Cai, Yi-rong
Author_Institution
Dept. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
Volume
4
fYear
2011
fDate
10-13 July 2011
Firstpage
1466
Lastpage
1471
Abstract
Multiclass spectral clustering is a clustering method that has been successfully applied to image segmentation and many other tasks. In this paper, multiclass spectral clustering is used to cluster web documents, including both English and Chinese pages. Our experiments show that multiclass spectral clustering is well suited to web document clustering and that it works well for both English and Chinese web documents. We applied the method to a web search engine, where users can easily find suitable results simply by selecting the desired classes.
Keywords
Internet; document image processing; image segmentation; natural language processing; pattern clustering; search engines; Chinese pages; English Web document clustering; English pages; Web search engine; multiclass spectral clustering; Buildings; Classification algorithms; Clustering algorithms; Indexes; Machine learning; Sun; web document clustering
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location
Guilin
ISSN
2160-133X
Print_ISBN
978-1-4577-0305-8
Type
conf
DOI
10.1109/ICMLC.2011.6017004
Filename
6017004