DocumentCode
1992776
Title
A model-based line detection algorithm in documents
Author
Zheng, Yefeng ; Li, Huiping ; Doermann, David
Author_Institution
Lab. for Language & Media Process., Maryland Univ., College Park, MD, USA
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
44
Abstract
In this paper we present a novel model-based approach to detecting severely broken parallel lines in noisy textual documents. These lines must be detected and removed before the text can be segmented and recognized. We use the directional single-connected chain, a vectorization-based algorithm, to extract line segments. We then instantiate a parallel line model with three parameters: the skew angle, the vertical line gap, and the vertical translation. A coarse-to-fine approach improves the accuracy of the parameter estimates. The model lets us incorporate high-level contextual information, which improves detection results even when lines are severely broken. Experiments on our database of 168 noisy Arabic document images show that the method detects 94% of the lines.
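The three-parameter parallel line model described in the abstract can be illustrated with a minimal Python sketch. It assumes a parameterization the record does not state: line k satisfies y = x·tan(θ) + k·d + t, where θ is the skew angle, d the vertical line gap, and t the vertical translation. It also assumes the input is a list of (x, y) points taken from segments that some vectorization step has already extracted. The fitting criterion (a circular mean of residuals taken modulo d, with t solved in closed form) and the names `phase_score`, `grid`, and `fit_parallel_lines` are hypothetical substitutes, not the authors' method. Only the idea of a coarse-to-fine search over the model parameters comes from the abstract.

```python
import cmath
import math


def phase_score(points, theta, gap):
    """Score how well `points` fit the hypothetical model
    y = x*tan(theta) + k*gap + offset (k = 0, 1, 2, ...).

    Each residual r = y - x*tan(theta) is mapped to an angle on a circle of
    circumference `gap`. Points on one family of parallel lines share the
    same angle, so the mean resultant vector is long. Returns the score
    (in [0, 1]) and the offset in [0, gap) implied by the mean angle.
    """
    total = sum(cmath.exp(2j * math.pi * (y - x * math.tan(theta)) / gap)
                for x, y in points)
    mean = total / len(points)
    offset = (gap * cmath.phase(mean) / (2 * math.pi)) % gap
    return abs(mean), offset


def grid(center, half_width, n):
    # 2n+1 evenly spaced samples in [center - half_width, center + half_width].
    step = half_width / n
    return [center + i * step for i in range(-n, n + 1)], step


def fit_parallel_lines(points, theta_range, gap_range, levels=4, n=5):
    """Coarse-to-fine grid search over (skew angle, line gap).

    The vertical translation is not searched: phase_score solves it for
    each (theta, gap) hypothesis. Each level re-centres the grid on the
    best hypothesis so far and shrinks it to one previous step per side.
    """
    theta_c = sum(theta_range) / 2
    theta_hw = (theta_range[1] - theta_range[0]) / 2
    gap_c = sum(gap_range) / 2
    gap_hw = (gap_range[1] - gap_range[0]) / 2
    best = None  # (score, theta, gap, offset)
    for _ in range(levels):
        thetas, theta_step = grid(theta_c, theta_hw, n)
        gaps, gap_step = grid(gap_c, gap_hw, n)
        for theta in thetas:
            for gap in gaps:
                score, offset = phase_score(points, theta, gap)
                if best is None or score > best[0]:
                    best = (score, theta, gap, offset)
        _, theta_c, gap_c, _ = best
        theta_hw, gap_hw = theta_step, gap_step
    _, theta, gap, offset = best
    return theta, gap, offset


if __name__ == "__main__":
    # Synthetic "broken" lines: 5 lines, skew 0.02 rad, gap 20 px, offset 7 px,
    # with every other 50-px run of each line missing.
    pts = [(x, x * math.tan(0.02) + 20 * k + 7)
           for k in range(5) for x in range(0, 400, 10) if (x // 50) % 2 == 0]
    print(fit_parallel_lines(pts, (-0.05, 0.05), (15, 30)))
```

The score cannot tell a gap d from d/2, d/3, and so on, because the same points lie on the denser line families too. The gap range passed to this sketch should therefore exclude integer fractions of the expected line spacing. The sketch does not attempt to resolve that ambiguity.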
Keywords
image enhancement; image processing; image recognition; image segmentation; optical character recognition; text analysis; coarse-to-fine approach; directional single-connected chain; estimation accuracy; line segment extraction; model-based line detection algorithm; noisy Arabic document images; noisy textual documents; parallel line model; severely broken parallel lines; skew angle; text recognition; text segmentation; vectorization based algorithm; vertical line gap; vertical translation; Concurrent computing; Context modeling; Data mining; Detection algorithms; Educational institutions; Electronic mail; Image segmentation; Laboratories; Optical character recognition software; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227625
Filename
1227625