DocumentCode :
2012084
Title :
Web Document Analysis Based on Visual Segmentation and Page Rendering
Author :
Cong Kinh Nguyen ; Likforman-Sulem, Laurence ; Moissinac, Jean-Claude ; Faure, Claudie ; Lardon, Jérémy
Author_Institution :
Telecom-ParisTech, Paris, France
fYear :
2012
fDate :
27-29 March 2012
Firstpage :
354
Lastpage :
358
Abstract :
This paper proposes an approach for segmenting a Web page into its semantic parts. Such analysis may be useful for adapting blogs or other pages for display on small devices. The approach exploits both the dynamic layout obtained after page rendering and textual information. The method first segments the page into blocks and then classifies the blocks into semantic parts using an SVM-based machine learning approach with a set of 30 textual and visual features. Evaluation is conducted on a Web blog database, and results are reported for both block classification and the segmentation of blogs into articles.
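The abstract outlines a pipeline in which the page is rendered, segmented into blocks, and each block is classified by an SVM over textual and visual features. The Python sketch below illustrates only the block classification step, under loudly stated assumptions: the blocks are assumed to be already segmented from the rendered page; the three features (text length, link density, rendered width) are hypothetical stand-ins for the paper's 30; the training blocks are toy data, not from the paper's blog database; and the linear SVM is trained with the basic Pegasos sub-gradient update, which is not necessarily the solver the authors used. For brevity it is binary (main content vs. other); labeling blocks with several semantic parts could train one such classifier per class (one-vs-rest).

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered page block. These attributes are hypothetical, for illustration only."""
    text: str
    link_chars: int   # characters of the block's text that sit inside <a> links
    width: float      # rendered width in pixels


def _scale(v: float) -> float:
    """Map a value in [0, 1] to [-1, 1], clipping values above 1."""
    return 2.0 * min(v, 1.0) - 1.0


def features(b: Block, page_width: float) -> list[float]:
    """Three hypothetical features scaled to [-1, 1], plus a constant bias term."""
    n = max(len(b.text), 1)
    return [
        _scale(len(b.text) / 500.0),   # textual: amount of text (saturates at 500 chars)
        _scale(b.link_chars / n),      # textual: link density
        _scale(b.width / page_width),  # visual: rendered width relative to the page
        1.0,                           # bias, regularized along with the weights here
    ]


def train_pegasos(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos stochastic sub-gradient steps; labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # step on (lam/2)*||w||^2
            if margin < 1.0:                           # hinge loss active: move toward y*x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x) -> int:
    """+1 = main content block, -1 = other (e.g. navigation)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Toy training blocks (made-up values, not data from the paper).
PAGE_WIDTH = 1000.0
training = [
    (Block("x" * 450, 10, 700), +1),   # long, few links, wide: article body
    (Block("x" * 600, 30, 650), +1),
    (Block("x" * 350, 0, 800), +1),
    (Block("x" * 40, 36, 200), -1),    # short, mostly links, narrow: navigation
    (Block("x" * 80, 70, 250), -1),
    (Block("x" * 25, 25, 180), -1),
]
X = [features(b, PAGE_WIDTH) for b, _ in training]
y = [label for _, label in training]
w = train_pegasos(X, y)

print(predict(w, features(Block("x" * 500, 20, 720), PAGE_WIDTH)))  # → 1
print(predict(w, features(Block("x" * 50, 45, 220), PAGE_WIDTH)))   # → -1
```

Scaling every feature to [-1, 1] keeps a large raw value (such as a pixel width) from dominating the dot product, which is a common preprocessing step for SVMs. Folding the bias in as a constant feature means it is regularized together with the other weights, a simplification of the standard SVM formulation.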
Keywords :
Web sites; document handling; learning (artificial intelligence); pattern classification; rendering (computer graphics); support vector machines; SVM-based machine learning approach; Web blog database; Web document analysis; Web page segmentation; block classification; block segmentation; dynamic layout; page rendering; textual features; textual information; visual segmentation; visual-based features; Conferences; Text analysis; Internet document; semantic block
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
Type :
conf
DOI :
10.1109/DAS.2012.95
Filename :
6195393