DocumentCode
2324188
Title
BlockWeb: An IR Model for Block Structured Web Pages
Author
Bruno, Emmanuel ; Faessel, Nicolas ; Le Maitre, J. ; Scholl, Michel
Author_Institution
LSIS, Univ. du Sud Toulon-Var, La Garde
fYear
2009
fDate
3-5 June 2009
Firstpage
219
Lastpage
224
Abstract
BlockWeb is a model that we have developed for indexing and querying Web pages according to their content as well as their visual rendering. Pages are split into blocks, which has several advantages for page indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to the content of neighboring blocks. In this paper, we present the BlockWeb model and show its usefulness for indexing the images of Web pages, through an experiment performed on electronic versions of French daily newspapers. We also present the engine we have implemented for block extraction, indexing and querying according to the BlockWeb model.
Keywords
Web sites; indexing; query processing; rendering (computer graphics); BlockWeb; IR model; Web page indexing; Web page querying; block structured Web pages; visual rendering; Data mining; Data models; Engines; Indexing; Large scale integration; Permeability; Rendering (computer graphics); Vocabulary; Web pages; XML; block decomposition; image indexing; propagation; web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Content-Based Multimedia Indexing, 2009. CBMI '09. Seventh International Workshop on
Conference_Location
Chania
Print_ISBN
978-1-4244-4265-2
Electronic_ISBN
978-0-7695-3662-0
Type
conf
DOI
10.1109/CBMI.2009.36
Filename
5137844