• DocumentCode
    527280
  • Title

    Precise web page segmentation based on semantic block headers detection

  • Author

    Zhang, Aihua ; Jing, Jiwu ; Kang, Le ; Zhang, Lingchen

  • Author_Institution
    Dept. of Electron. Eng. & Inf. Sci., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2010
  • fDate
    16-18 Aug. 2010
  • Firstpage
    63
  • Lastpage
    68
  • Abstract
    Web page segmentation is an important technology for web-driven applications such as search engines and web browsers on mobile devices. Current research in this field attempts to mine features of visual presentation and document structure, but it is difficult to choose proper features that yield a precise result. Approaches that focus solely on either vision-based methods or DOM structure analysis have shortcomings and are not satisfactory in practice. This paper presents a novel algorithm for web page segmentation. By extracting block headers, the algorithm partitions a web page into semantic blocks. The algorithm exploits both the visual and the structural features of a web page from a simple but novel perspective. We apply the algorithm to a group of real-world web pages for verification and obtain very positive results.
  • Keywords
    Web sites; mobile computing; online front-ends; search engines; DOM structure analysis; Web browser; Web page segmentation; Web-driven applications; document object model; document structure; mobile device; search engine; semantic block headers detection; vision-based method; visual presentation; block header; block node; content row; web page segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Content, Multimedia Technology and its Applications (IDC), 2010 6th International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-7607-7
  • Electronic_ISBN
    978-8-9886-7827-5
  • Type

    conf

  • Filename
    5568593