DocumentCode
2014759
Title
A General Approach for Partitioning Web Page Content Based on Geometric and Style Information
Author
Guo, Hui ; Mahmud, Jalal ; Borodin, Yevgen ; Stent, Amanda ; Ramakrishnan, I.V.
Author_Institution
Stony Brook Univ., Stony Brook
Volume
2
fYear
2007
fDate
23-26 Sept. 2007
Firstpage
929
Lastpage
933
Abstract
In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
Keywords
Internet; general-purpose approach; geometric-style information; partitioning Web page content; visual separators; Clustering algorithms; Computer science; HTML; Humans; Marketing and sales; Ontologies; Particle separators; Partitioning algorithms; Rendering (computer graphics); Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location
Parana
ISSN
1520-5363
Print_ISBN
978-0-7695-2822-9
Type
conf
DOI
10.1109/ICDAR.2007.4377051
Filename
4377051