• DocumentCode
    974706
  • Title
    THESUS, a closer view on Web content management enhanced with link semantics
  • Author
    Varlamis, Iraklis ; Vazirgiannis, Michalis ; Halkidi, Maria ; Nguyen, Benjamin
  • Author_Institution
    Dept. of Informatics, Athens Univ. of Econ. & Bus., Greece
  • Volume
    16
  • Issue
    6
  • fYear
    2004
  • fDate
    6/1/2004
  • Firstpage
    685
  • Lastpage
    700
  • Abstract
    With the unstoppable growth of the World Wide Web and the great success of Web search engines such as Google and AltaVista, users now turn to the Web whenever they look for information. However, many users are neophytes when it comes to computer science, yet they are often specialists in a certain domain. These users would like to add more semantics to guide their search through World Wide Web material, whereas most current search features are based on raw lexical content. We show how the incoming links of a page can be used efficiently to classify that page in a concise manner. This enhances the browsing and querying of Web pages. We focus on the tools needed to manage the links and their semantics. We further process these links using a hierarchy of concepts, akin to an ontology, and a thesaurus. This work is demonstrated by a prototype system, called THESUS, that organizes thematic Web documents into semantic clusters. Our contributions are the following: 1) a model and language to exploit link semantics information, 2) the THESUS prototype system, 3) its innovative aspects and algorithms, more specifically, the novel similarity measure between Web documents applied to different clustering schemes (DB-Scan and COBWEB), and 4) a thorough experimental evaluation proving the value of our approach.
  • Keywords
    content management; document handling; hypermedia; search engines; semantic Web; Web documents; Web search engines; link analysis; semantic clusters; semantics information; World Wide Web; Computer science; Ontologies; Prototypes; Raw materials; Thesauri; Web pages; Web search; Web sites; link analysis and management
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2004.16
  • Filename
    1294890