  • DocumentCode
    2014759
  • Title
    A General Approach for Partitioning Web Page Content Based on Geometric and Style Information
  • Author
    Guo, Hui ; Mahmud, Jalal ; Borodin, Yevgen ; Stent, Amanda ; Ramakrishnan, I.V.
  • Author_Institution
    Stony Brook Univ., Stony Brook
  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    929
  • Lastpage
    933
  • Abstract
    In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
  • Keywords
    Internet; general-purpose approach; geometric-style information; partitioning Web page content; visual separators; Clustering algorithms; Computer science; HTML; Humans; Marketing and sales; Ontologies; Particle separators; Partitioning algorithms; Rendering (computer graphics); Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4377051
  • Filename
    4377051