• DocumentCode
    3509205
  • Title
    Proposal of Seam Degree and Content Similarity for Web Page Segmentation
  • Author
    Zeng, Jun; Flanagan, Brendan; Xiong, Qingyu; Wen, Junhao; Hirokawa, Sachio
  • Author_Institution
    Grad. Sch. of Inf., Kyushu Univ., Fukuoka, Japan
  • fYear
    2013
  • fDate
    Aug. 31-Sept. 4, 2013
  • Firstpage
    9
  • Lastpage
    14
  • Abstract
    Page segmentation has received great attention in recent years. However, most research has been based on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this paper, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of relying on pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method based on these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our method can effectively segment a page into different semantic parts.
  • Keywords
    Internet; Web design; heuristic programming; Web page segmentation; content similarity; large-scale page segmentation; page block; pre-defined heuristic analysis; principled page segmentation method; seam degree; visual cues; Educational institutions; Finite element analysis; HTML; Semantics; Vectors; Visualization; Web pages; content similarity; page segmentation; seam degree; semantic segment;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Applied Informatics (IIAI-AAI), 2013 IIAI International Conference on
  • Conference_Location
    Los Alamitos, CA
  • Print_ISBN
    978-1-4799-2134-8
  • Type
    conf
  • DOI
    10.1109/IIAI-AAI.2013.56
  • Filename
    6630309