• DocumentCode
    2334501
  • Title

    Automatic topic identification using webpage clustering

  • Author

    He, Xiaofeng ; Ding, Chris H Q ; Zha, Hongyuan ; Simon, Horst D.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    195
  • Lastpage
    202
  • Abstract
    Grouping Web pages into distinct topics is one way of organizing the large amount of retrieved information on the Web. In this paper, we report that an unsupervised clustering method, using a similarity metric that incorporates textual information, hyperlink structure, and co-citation relations, can automatically and effectively identify relevant topics, as shown in experiments on several retrieved sets of Web pages. The clustering method is a state-of-the-art spectral graph partitioning method based on the normalized cut criterion, first developed for image segmentation.
  • Keywords
    information analysis; information resources; information retrieval; pattern clustering; Web page clustering; automatic topic identification; co-citation relations; hyperlink structure; normalized cut criterion; similarity metric; spectral graph partitioning method; textual information; unsupervised clustering method; Clustering algorithms; Clustering methods; Computer science; Image segmentation; Information retrieval; Laboratories; Organizing; Search engines; Taxonomy; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2001.989518
  • Filename
    989518