DocumentCode :
153392
Title :
Newspaper Article Extraction Using Hierarchical Fixed Point Model
Author :
Bansal, Ankur ; Chaudhury, Santanu ; Roy, Sanjay Dhar ; Srivastava, J.B.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol. Delhi, New Delhi, India
fYear :
2014
fDate :
7-10 April 2014
Firstpage :
257
Lastpage :
261
Abstract :
This paper presents a novel learning-based framework for extracting articles from newspaper images using a fixed-point model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed-point model uses contextual information and the features of each block to learn the layout of newspaper images, and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model that works in two stages. In the first stage, a semantic label (heading, sub-heading, text block, image, or caption) is assigned to each segmented block. In the second stage, these labels are used as input to group related blocks into news articles. Experimental results show the applicability of our algorithm to newspaper labeling and article extraction.
Keywords :
document image processing; feature extraction; image segmentation; regression analysis; hierarchical fixed point model; image processing techniques; newspaper article extraction; newspaper image extraction; newspaper labeling; semantic label; Accuracy; Feature extraction; Image segmentation; Labeling; Layout; Text analysis; Newspaper article; fixed point model; layout analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
Type :
conf
DOI :
10.1109/DAS.2014.42
Filename :
6831009