DocumentCode
135059
Title
An approach for printed document labeling
Author
Adak, Chandranath
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Kalyani, Kalyani, India
fYear
2014
fDate
1-2 Feb. 2014
Firstpage
1
Lastpage
4
Abstract
A document image contains text and non-text regions; it may be printed, handwritten, or a hybrid of both. In this paper we deal with printed documents, where the textual regions consist of printed characters and the non-text regions are mainly photographic images. We propose a model that labels the different components of a printed document image, i.e., it identifies headings, subheadings, captions, articles, and photos. Our method begins with a preprocessing stage in which fuzzy c-means clustering segments the document image into a printed (object) region and background. The Hough transform is then used to find white-line dividers in the object region, and grid structure examination is used to extract the non-text portion. After that, we use a horizontal histogram to find text lines, and then we label the different components. Our method gives promising results on printed documents in different scripts.
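The horizontal-histogram step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_text_lines`, the `min_ink` threshold, and the toy `page` array are assumptions made for illustration, and the input is assumed to be an already binarized image (1 = printed pixel, 0 = background).

```python
def find_text_lines(binary_image, min_ink=1):
    """Return (start_row, end_row) pairs, end inclusive, for each run of inked rows.

    binary_image: list of rows, each a list of 0/1 pixels (assumed binarized).
    min_ink: hypothetical threshold; rows with fewer foreground pixels count as gaps.
    """
    # Horizontal histogram: number of foreground pixels in each row.
    histogram = [sum(row) for row in binary_image]

    lines = []
    start = None
    for y, count in enumerate(histogram):
        if count >= min_ink and start is None:
            start = y                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:
        # The last text line runs to the bottom edge of the image.
        lines.append((start, len(histogram) - 1))
    return lines


# Toy 5x4 image: one text line on rows 1-2, another on row 4.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
text_lines = find_text_lines(page)
print(text_lines)
```

On a real scan, noise would likely call for a `min_ink` above 1; the height of each detected line is one plausible cue for separating headings from body text, though the abstract does not state which features the labeling uses.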
Keywords
Hough transforms; document image processing; fuzzy set theory; pattern clustering; text analysis; Hough transformation; document image; fuzzy c-means clustering; grid structure examination; horizontal histogram; nontext portion; object region; preprocessing stage; printed characters; printed document image; printed document labeling; textual region; white-line dividers; Histograms; Image analysis; Image segmentation; Labeling; Optical character recognition software; Text analysis; Transforms; Document Image Analysis; Document Labeling; Fuzzy C-Means Clustering; Hough Transform; Optical Character Recognition;
fLanguage
English
Publisher
ieee
Conference_Titel
Automation, Control, Energy and Systems (ACES), 2014 First International Conference on
Conference_Location
Hooghly
Print_ISBN
978-1-4799-3893-3
Type
conf
DOI
10.1109/ACES.2014.6808032
Filename
6808032