Title :
Document layout analysis for Indian newspapers using contour based symbiotic approach
Author :
Singh, V. ; Kumar, Bijendra
Author_Institution :
Centre for Dev. of Adv. Comput., Noida, India
Abstract :
Document layout analysis is a necessary process for automated document recognition systems. It identifies, categorizes, and labels the semantics of text blocks to enable meaningful information retrieval from document images. Our primary target documents are newspaper and magazine pages, which have complex layouts that follow no static rules. We propose an effective approach to document layout analysis in which the strengths of the bottom-up and top-down approaches, i.e. region growing and segmentation respectively, are utilized simultaneously. The methodology employs image morphological operations, contour analysis, connected component analysis, and projection analysis. The proposed algorithm has been successfully implemented and applied to a large number of Indian-script newspaper and magazine pages. The results are evaluated by the number of blocks detected, taking their correct ordering information into account.
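The abstract names its building blocks but gives no algorithmic detail, so the following is only a minimal sketch of how a top-down step (projection analysis) and a bottom-up step (connected component analysis) might be combined. It is not the authors' algorithm. The function names `projection_cuts`, `component_boxes`, and `reading_order`, the `min_gap` parameter, the use of 8-connectivity, and the binary list-of-lists image format (1 = ink, 0 = background) are all assumptions made for illustration.

```python
from collections import deque


def projection_cuts(img, axis=0, min_gap=1):
    """Top-down step (assumed form): split at runs of empty rows or columns.

    axis=0 cuts between rows, axis=1 between columns. Returns half-open
    (start, end) index ranges; a run of at least `min_gap` empty lines
    separates two ranges.
    """
    lines = img if axis == 0 else list(zip(*img))
    profile = [sum(line) for line in lines]  # ink count per row/column
    segments, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(profile) - gap))
    return segments


def component_boxes(img):
    """Bottom-up step (assumed form): bounding boxes of 8-connected ink.

    Each box is (top, left, bottom, right), inclusive pixel coordinates.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first region growing
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def reading_order(img):
    """Order component boxes by horizontal band, then left to right."""
    bands = projection_cuts(img, axis=0)
    boxes = component_boxes(img)

    def band_of(box):
        return next(i for i, (s, e) in enumerate(bands) if s <= box[0] < e)

    return sorted(boxes, key=lambda b: (band_of(b), b[1]))
```

On a real newspaper scan, the input would first need binarization and probably morphological smoothing so that characters merge into word- or block-sized components; those steps are omitted here.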
Keywords :
document image processing; image retrieval; information analysis; optical character recognition; Indian magazine pages; Indian script newspaper; bottom up approach; connected component analysis; contour analysis; contour based symbiotic approach; document images; document layout analysis; document recognition systems; image morphological operations; information retrieval; ordering information; projection analysis; top-down approach; Computers; Image segmentation; Informatics; Layout; Optical character recognition software; Text analysis; White spaces; Layout retention; contours analysis
Conference_Title :
Computer Communication and Informatics (ICCCI), 2014 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-2353-3
DOI :
10.1109/ICCCI.2014.6921723