  • DocumentCode
    2099470
  • Title
    Using keyword extraction for Web site clustering
  • Author
    Tonella, Paolo ; Ricca, Filippo ; Pianta, Emanuele ; Girardi, Christian
  • Author_Institution
    Centro per la Ricerca Scientifica e Tecnologica, Ist. Trentino di Cultura, Trento, Italy
  • fYear
    2003
  • fDate
    22 Sept. 2003
  • Firstpage
    41
  • Lastpage
    48
  • Abstract
    Reverse engineering techniques have the potential to support Web site understanding by providing views that show the organization of a site and its navigational structure. However, representing each Web page as a node in the diagrams recovered from the source code of a Web site often leads to huge and unreadable graphs. Moreover, since the level of connectivity is typically high, the edges in such graphs make the overall result even less usable. Clustering can be used to produce cohesive groups of pages that are displayed as a single node in reverse engineered diagrams. In this paper, we propose a clustering method based on the automatic extraction of the keywords of a Web page. The presence of common keywords is exploited to decide when it is appropriate to group pages together. A second use of the keywords is in the automatic labeling of the recovered clusters of pages.
  • Keywords
    Web sites; information retrieval; pattern clustering; reverse engineering; text analysis; Web applications; Web site clustering; keyword extraction; reverse engineered diagrams; Application software; Clustering algorithms; Clustering methods; Data mining; HTML; Labeling; Navigation; Proposals; Reverse engineering; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Site Evolution, 2003. Theme: Architecture. Proceedings. Fifth IEEE International Workshop on
  • Conference_Location
    Amsterdam, The Netherlands
  • Print_ISBN
    0-7695-2016-2
  • Type
    conf
  • DOI
    10.1109/WSE.2003.1234007
  • Filename
    1234007