  • DocumentCode
    408305
  • Title
    Exploring similarity among Web pages using the hyperlink structure
  • Author
    Huang, Shou-Hsuan Stephen ; Molina-Rodríguez, Carlos Humberto ; Quevedo-Torrero, Jesús Ubaldo ; Fonseca-Lozada, Mario Francisco
  • Author_Institution
    Dept. of Comput. Sci., Houston Univ., TX, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    5-7 April 2004
  • Firstpage
    344
  • Abstract
    Hyperlinks inside HTML pages contain a wealth of information about the relationships among Web pages. Given a set of Web pages, we can explore the hyperlink relationships among these pages. This paper first provides formal definitions of hyperlink relations. We then use the notations to define similarity between two Web pages and between two sets of Web pages. For each one of them, we provide several definitions of similarity using forward and backward links. The similarity measure gives us a number between 0 and 1. We also demonstrate how to use the similarity measure to study clustering within a set of pages and to determine the "diversity" of a set of Web pages.
  • Keywords
    Web sites; hypermedia markup languages; HTML pages; Web pages; formal definitions; hyperlink relations; hyperlink relationships; hyperlink structure; Computer science; Data mining; HTML; Information retrieval; Internet; Keyword search; Search engines; Search methods; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
  • Print_ISBN
    0-7695-2108-8
  • Type
    conf
  • DOI
    10.1109/ITCC.2004.1286477
  • Filename
    1286477
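
The abstract describes a similarity measure between 0 and 1 built from forward and backward links, but this record does not reproduce the paper's formulas. As a rough illustration only, the sketch below assumes a Jaccard-style overlap of link sets; that choice, the function names, and the sample page names are all hypothetical and should not be read as the authors' definitions.

```python
def backward_links(forward):
    """Invert a forward-link map so each page maps to the pages linking to it."""
    back = {page: set() for page in forward}
    for src, targets in forward.items():
        for dst in targets:
            back.setdefault(dst, set()).add(src)
    return back


def jaccard(a, b):
    """Overlap of two sets, in [0, 1]; two empty sets are treated as 0 (an assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def page_similarity(p, q, links):
    """Similarity of pages p and q under a given link map (forward or backward)."""
    return jaccard(links.get(p, set()), links.get(q, set()))


# Hypothetical toy link graph: page -> set of pages it links to.
forward = {
    "a.html": {"c.html", "d.html"},
    "b.html": {"c.html", "e.html"},
    "c.html": set(),
}
backward = backward_links(forward)

# a.html and b.html share one of three distinct forward targets.
forward_sim = page_similarity("a.html", "b.html", forward)
# d.html and e.html are linked from different pages, so no backward overlap.
backward_sim = page_similarity("d.html", "e.html", backward)
```

Passing the forward map compares the pages that two pages point to, while passing the inverted map compares the pages that point to them; either way the result stays within 0 and 1, as the abstract requires.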