DocumentCode :
1994294
Title :
Fast lexicon-based word recognition in noisy index card images
Author :
Lucas, Simon M. ; Patoulas, Gregory ; Downton, Andy C.
Author_Institution :
Comput. Sci. Dept., Essex Univ., Colchester, UK
fYear :
2003
fDate :
3-6 Aug. 2003
Firstpage :
462
Abstract :
This paper describes a complete system for reading type-written lexicon words in noisy images, in this case museum index cards. The system is conceptually simple and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue-based algorithm for the best matches with a set of words (the lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.
Keywords :
feature extraction; image classification; image denoising; image matching; image recognition; optical character recognition; OCR classifier; character hypothesis graph; flat dynamic programming; lexicon-based word recognition; museum index cards; noisy images; noisy index card images; performance evaluation; priority-queue based algorithm; type-written lexicon words; Algorithm design and analysis; Computer science; Dynamic programming; Image recognition; Image segmentation; Optical character recognition software; Packaging machines; Search methods; Systems engineering and theory; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
Type :
conf
DOI :
10.1109/ICDAR.2003.1227708
Filename :
1227708