DocumentCode :
1992883
Title :
An approach to extracting the target text line from a document image captured by a pen scanner
Author :
Bai, Zhen-Long; Huo, Qiang
Author_Institution :
Comput. Sci. & Inf. Syst. Dept., Hong Kong Univ., Hong Kong, China
fYear :
2003
fDate :
3-6 Aug. 2003
Firstpage :
76
Abstract :
In this paper, we present a new approach to extracting the target text line from a document image captured by a pen scanner. Given the binary image, a set of possible text lines is first formed by nearest-neighbor grouping of connected components (CCs). These lines are then refined by merging text lines and adding missed CCs. The possible target text line is identified using a geometric-feature-based score function and fed to an OCR engine for character recognition. If the recognition result is confident enough, the target text line is accepted. Otherwise, all the remaining text lines are fed to the OCR engine to verify whether an alternative target text line exists or whether the whole image should be rejected. The effectiveness of this approach is confirmed by experiments on a testing database of 117 document images captured by C-Pen and ScanEye pen scanners.
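The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `group_into_lines`, `score`, and `select_target`, the greedy grouping rule, the particular score (line width minus distance from the vertical center of the image), and the `ocr` callable returning a `(text, confidence)` pair are all hypothetical, and the line merging and missed-CC refinement steps are omitted.

```python
from typing import Callable, List, Optional, Tuple

# Bounding box of one connected component: (x0, y0, x1, y1).
Box = Tuple[int, int, int, int]
Line = List[Box]


def v_overlap(a: Box, b: Box) -> int:
    """Vertical overlap in pixels between two boxes (negative if disjoint)."""
    return min(a[3], b[3]) - max(a[1], b[1])


def group_into_lines(boxes: List[Box], max_gap: int) -> List[Line]:
    """Greedy left-to-right grouping (an assumed stand-in for the paper's
    nearest-neighbor grouping): each box joins the vertically overlapping
    line whose last box is horizontally nearest, if within max_gap."""
    lines: List[Line] = []
    for box in sorted(boxes, key=lambda b: b[0]):
        best, best_gap = None, None
        for line in lines:
            last = line[-1]
            gap = box[0] - last[2]
            if v_overlap(box, last) > 0 and gap <= max_gap:
                if best is None or gap < best_gap:
                    best, best_gap = line, gap
        if best is None:
            lines.append([box])
        else:
            best.append(box)
    return lines


def score(line: Line, image_height: int) -> float:
    """Hypothetical geometric score: favor wide lines near the vertical
    center of the image. The paper's actual score function is not given
    in the abstract."""
    x0 = min(b[0] for b in line)
    x1 = max(b[2] for b in line)
    y0 = min(b[1] for b in line)
    y1 = max(b[3] for b in line)
    center_offset = abs((y0 + y1) / 2 - image_height / 2)
    return (x1 - x0) - center_offset


def select_target(
    lines: List[Line],
    image_height: int,
    ocr: Callable[[Line], Tuple[str, float]],
    threshold: float,
) -> Optional[Line]:
    """Try the top-scoring line first; if OCR is not confident, check the
    remaining lines; return None to reject the whole image."""
    if not lines:
        return None
    ranked = sorted(lines, key=lambda l: score(l, image_height), reverse=True)
    _, conf = ocr(ranked[0])
    if conf >= threshold:
        return ranked[0]
    best, best_conf = None, threshold
    for line in ranked[1:]:
        _, conf = ocr(line)
        if conf >= best_conf:
            best, best_conf = line, conf
    return best
```

Passing the OCR engine in as a callable keeps the sketch independent of any particular recognizer; any function that maps a candidate line to a confidence value can be substituted.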
Keywords :
document image processing; feature extraction; optical character recognition; C-Pen scanner; ScanEye pen scanner; binary image; captured document image; character recognition; connected components; geometric feature based score function; nearest-neighbor grouping; target text line extraction; testing database; text line merging; Character recognition; Computer science; Data mining; Engines; Image recognition; Information systems; Merging; Optical character recognition software; Target recognition; Text recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
Type :
conf
DOI :
10.1109/ICDAR.2003.1227631
Filename :
1227631