DocumentCode
3482284
Title
Textline information extraction from grayscale camera-captured document images
Author
Bukhari, Syed Saqib ; Breuel, Thomas M. ; Shafait, Faisal
Author_Institution
Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany
fYear
2009
fDate
7-10 Nov. 2009
Firstpage
2013
Lastpage
2016
Abstract
Cameras offer flexible document imaging, but with uneven shading and non-planar page shape. Therefore, camera-captured documents need to go through dewarping before being processed by traditional text recognition methods. Curled textline detection is an important step of dewarping. Previous approaches to curled textline detection use binarization as a pre-processing step, which can negatively affect the detection results under uneven shading. Furthermore, these approaches are sensitive to high degrees of curl and estimate x-line and baseline pairs using regression, which may result in inaccurate estimation. We introduce a novel curled textline detection approach for grayscale document images. First, the textline structure is enhanced by matched filter bank smoothing, and then the central lines of textlines are detected using ridges. Then, x-line and baseline pairs are estimated by adapting active contours (snakes) over the ridges. Unlike other approaches, our approach does not use binarization and is applied directly to grayscale images. We achieved 91% detection accuracy with good estimation of x-line and baseline pairs on the dataset of the CBDAR 2007 document image dewarping contest.
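The abstract names the first two stages (filter bank smoothing, then ridge detection) without giving filter shapes or the ridge criterion. The following is an illustrative sketch, not the authors' implementation. It assumes the bank is a set of elongated Gaussian kernels at a few orientations around horizontal, keeps the per-pixel maximum response, and treats a ridge pixel as one with strong negative curvature across the line that is also a vertical local maximum (so near-horizontal textlines are assumed). All function names and parameter values are hypothetical, and the snake-based x-line/baseline fitting is not shown.

```python
import numpy as np


def oriented_gaussian_kernel(sigma_x, sigma_y, theta, size):
    """Hypothetical helper: elongated 2-D Gaussian rotated by theta (radians), summing to 1."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the long axis (sigma_x) points along theta.
    u = xs * np.cos(theta) + ys * np.sin(theta)
    v = -xs * np.sin(theta) + ys * np.cos(theta)
    k = np.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2))
    return k / k.sum()


def convolve_same(image, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the input size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]


def filter_bank_smooth(gray, angles_deg=(-20, -10, 0, 10, 20),
                       sigma_x=8.0, sigma_y=2.0, size=33):
    """Assumed bank: max response over oriented Gaussians on the inverted image.

    gray is a float array in [0, 1] with dark text on a light page.
    """
    ink = 1.0 - gray  # text becomes bright on a dark background
    responses = [
        convolve_same(ink, oriented_gaussian_kernel(sigma_x, sigma_y, np.deg2rad(a), size))
        for a in angles_deg
    ]
    return np.max(responses, axis=0)


def ridge_mask(smoothed, threshold):
    """Assumed ridge test: strong negative curvature plus a vertical local maximum."""
    gy, gx = np.gradient(smoothed)   # derivatives along rows (y) and columns (x)
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    # Smaller eigenvalue of the symmetric 2x2 Hessian [[gxx, b], [b, gyy]].
    b = 0.5 * (gxy + gyx)
    lam_min = 0.5 * (gxx + gyy) - np.sqrt((0.5 * (gxx - gyy)) ** 2 + b ** 2)
    # Thin the ridge band to its centre by requiring a peak across the (near-horizontal) line.
    up = np.roll(smoothed, 1, axis=0)
    down = np.roll(smoothed, -1, axis=0)
    vertical_peak = (smoothed >= up) & (smoothed >= down)
    return (lam_min < -threshold) & vertical_peak
```

For example, on a synthetic white page with one dark horizontal bar standing in for a textline, `ridge_mask(filter_bank_smooth(img), threshold=0.02)` marks pixels along the bar's centre rows and nothing in the empty margins. Taking the maximum over orientations is what lets a fixed bank follow curled lines: each pixel keeps the response of whichever kernel best matches the local line direction.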
Keywords
cameras; channel bank filters; document image processing; image segmentation; information filtering; matched filters; object detection; text analysis; active contour adaptation; baseline pair estimation; curled textline detection approach; document image dewarping; grayscale camera-captured document images; grayscale images; image segmentation; match filter bank smoothing; nonplanar page shape; regression analysis; ridge detection; text dewarping; text recognition methods; textline information extraction; x-line estimation; Active contours; Anisotropic magnetoresistance; Cameras; Data mining; Gray-scale; Matched filters; Optical character recognition software; Shape; Smoothing methods; Text recognition; Curled Textline Detection; Grayscale Camera-Captured Document Image Segmentation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Image Processing (ICIP), 2009 16th IEEE International Conference on
Conference_Location
Cairo
ISSN
1522-4880
Print_ISBN
978-1-4244-5653-6
Electronic_ISBN
1522-4880
Type
conf
DOI
10.1109/ICIP.2009.5413799
Filename
5413799