• DocumentCode
    2324188
  • Title

    BlockWeb: An IR Model for Block Structured Web Pages

  • Author

    Bruno, Emmanuel ; Faessel, Nicolas ; Le Maitre, J. ; Scholl, Michel

  • Author_Institution
    LSIS, Univ. du Sud Toulon-Var, La Garde
  • fYear
    2009
  • fDate
    3-5 June 2009
  • Firstpage
    219
  • Lastpage
    224
  • Abstract
    BlockWeb is a model that we have developed for indexing and querying Web pages according to their content as well as their visual rendering. Pages are split into blocks, which has several advantages for page indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to the content of its neighboring blocks. In this paper, we present the BlockWeb model and show its usefulness for indexing the images of Web pages, through an experiment performed on electronic versions of French daily newspapers. We also present the engine we have implemented for block extraction, indexing, and querying according to the BlockWeb model.
  • Keywords
    Web sites; indexing; query processing; rendering (computer graphics); BlockWeb; IR model; Web page indexing; Web page querying; block structured Web pages; visual rendering; Data mining; Data models; Engines; Indexing; Large scale integration; Permeability; Rendering (computer graphics); Vocabulary; Web pages; XML; block decomposition; image indexing; propagation; web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Content-Based Multimedia Indexing, 2009. CBMI '09. Seventh International Workshop on
  • Conference_Location
    Chania
  • Print_ISBN
    978-1-4244-4265-2
  • Electronic_ISBN
    978-0-7695-3662-0
  • Type

    conf

  • DOI
    10.1109/CBMI.2009.36
  • Filename
    5137844