DocumentCode :
860138
Title :
Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading
Author :
Liu, Cheng-Lin ; Koga, Masashi ; Fujisawa, Hiromichi
Author_Institution :
Central Res. Lab., Hitachi Ltd., Kokubunji, Japan
Volume :
24
Issue :
11
fYear :
2002
fDate :
11/1/2002
Firstpage :
1425
Lastpage :
1437
Abstract :
This paper describes a handwritten character string recognition system for Japanese mail address reading on a very large vocabulary. The address phrases are recognized as a whole because there is no extra space between words. The lexicon contains 111,349 address phrases, which are stored in a trie structure. In recognition, the text line image is matched with the lexicon entries (phrases) to obtain reliable segmentation and retrieve valid address phrases. The paper first introduces some effective techniques for text line image preprocessing and presegmentation. In presegmentation, the text line image is separated into primitive segments by connected component analysis and touching pattern splitting based on contour shape analysis. In lexicon matching, consecutive segments are dynamically combined into candidate character patterns. An accurate character classifier is embedded in lexicon matching to select characters matched with a candidate pattern from a dynamic category set. A beam search strategy is used to control the lexicon matching so as to achieve real-time recognition. In experiments on 3,589 live mail images, the proposed method achieved a correct rate of 83.68 percent while keeping the error rate below 1 percent.
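The lexicon matching step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TrieNode`, `build_trie`, `toy_distance`, `match_lexicon`, the `MERGES` table, and the parameters `max_merge` and `beam_width` are all hypothetical, and the classifier is a toy stand-in that works on character fragments rather than image data. The sketch only shows how consecutive primitive segments can be merged into candidate patterns, scored against the characters allowed by the current trie node (the dynamic category set), and pruned with a beam.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    terminal: bool = False  # True if a lexicon phrase ends at this node


def build_trie(phrases):
    """Store the lexicon phrases in a trie, one character per edge."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


# Assumption: a toy table saying which fragment pairs form one character.
# A real system would classify the merged image pattern instead.
MERGES = {("日", "月"): "明"}


def toy_distance(pattern, ch):
    """Toy classifier: 0.0 if the merged pattern looks like ch, else 1.0."""
    looks_like = pattern[0] if len(pattern) == 1 else MERGES.get(pattern)
    return 0.0 if looks_like == ch else 1.0


def match_lexicon(segments, root, distance=toy_distance,
                  max_merge=2, beam_width=5):
    """Beam search over segment groupings, constrained by the trie."""
    n = len(segments)
    # beams[i] holds hypotheses (cost, text, node) that consumed i segments.
    beams = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, "", root)]
    for i in range(n):
        beams[i].sort(key=lambda h: h[0])
        for cost, text, node in beams[i][:beam_width]:  # prune to the beam
            for k in range(1, max_merge + 1):
                if i + k > n:
                    break
                pattern = tuple(segments[i:i + k])  # candidate character
                # Only characters that extend a lexicon phrase are scored.
                for ch, child in node.children.items():
                    beams[i + k].append(
                        (cost + distance(pattern, ch), text + ch, child))
    # Keep hypotheses that used every segment and end on a complete phrase.
    return sorted((c, t) for c, t, node in beams[n] if node.terminal)


results = match_lexicon(["日", "月", "日"], build_trie(["明日", "日月"]))
print(results[0])  # (0.0, '明日')
```

In this toy run the first two fragments are merged into 明 because only that grouping leads to a complete lexicon phrase at zero cost, which is the sense in which the segmentation is lexicon-driven.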
Keywords :
document image processing; handwritten character recognition; image segmentation; optical character recognition; real-time systems; string matching; tree data structures; vocabulary; Japanese address reading; OCR; beam search strategy; connected component analysis; contour shape analysis; experiments; handwritten character string recognition; image matching; image segmentation; lexicon-driven character segmentation; mail address reading; real-time recognition; text line image preprocessing; touching pattern splitting; trie structure; very large vocabulary; Character recognition; Handwriting recognition; Image analysis; Image recognition; Image segmentation; Pattern analysis; Pattern matching; Postal services; Text recognition; Vocabulary;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2002.1046151
Filename :
1046151