DocumentCode
2334501
Title
Automatic topic identification using webpage clustering
Author
He, Xiaofeng ; Ding, Chris H Q ; Zha, Hongyuan ; Simon, Horst D.
Author_Institution
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
fYear
2001
fDate
2001
Firstpage
195
Lastpage
202
Abstract
Grouping Web pages into distinct topics is one way of organizing the large amount of retrieved information on the Web. In this paper, we report that an unsupervised clustering method, based on a similarity metric that incorporates textual information, hyperlink structure, and co-citation relations, can automatically and effectively identify relevant topics, as shown in experiments on several retrieved sets of Web pages. The clustering method is a state-of-the-art spectral graph partitioning method based on the normalized cut criterion, which was first developed for image segmentation.
Keywords
information analysis; information resources; information retrieval; pattern clustering; Web page clustering; automatic topic identification; co-citation relations; hyperlink structure; normalized cut criterion; similarity metric; spectral graph partitioning method; textual information; unsupervised clustering method; Clustering algorithms; Clustering methods; Computer science; Image segmentation; Information retrieval; Laboratories; Organizing; Search engines; Taxonomy; Web sites;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989518
Filename
989518