DocumentCode :
2022072
Title :
A Suffix Tree Based Handwritten Chinese Address Recognition System
Author :
Yan Jiang ; Xiaoqing Ding ; Zheng Ren
Author_Institution :
Tsinghua Univ., Beijing
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
292
Lastpage :
296
Abstract :
The main contribution of this paper is a suffix tree based data structure for automatic handwritten Chinese address reading. Since many papers have already discussed destination address block (DAB) location for Chinese mail, we do not elaborate on it here. Instead, we focus on improving address matching performance after DAB location. As in some conventional methods, the extracted text lines are pre-segmented into a series of radicals. We then build a hierarchical structure of substrings from the recognized characters of valid radical combinations. Coarse address candidates are selected at the same time. In address matching, we incorporate postcode information to filter out redundant addresses. The pre-segmented radicals are compared with each candidate address, and a cost function combining recognition cost and structural cost is evaluated for the final decision. In the system, character segmentation, recognition, string searching and matching are considered simultaneously by taking advantage of lexicon knowledge. The suffix tree greatly facilitates the substring generation process and enables the matching process to start from any character, so that potentially fragmentary information can be collected. Therefore, our algorithm is more robust to intervening noise and irregular writing styles. Finally, we test the system on 1,000 handwritten Chinese envelopes and achieve a correct rate of 85.30% at an average of 3.0 seconds per mail piece.
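The idea that matching can start from any character can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a naive, uncompressed suffix trie in place of a true suffix tree, the three lexicon addresses are made up for illustration, and the names `SuffixTrie`, `coarse_candidates` and `LEXICON` are hypothetical. The voting rule (each matched fragment adds its length to an address's score) is likewise an assumption and stands in for the paper's cost function, which the abstract does not specify.

```python
from collections import defaultdict


class SuffixTrie:
    """Naive generalized suffix trie (uncompressed) over an address lexicon."""

    def __init__(self):
        self.children = {}
        # Ids of lexicon entries that contain the substring spelled by the
        # path from the root to this node.
        self.entries = set()

    def insert(self, text, entry_id):
        # Index every suffix so that a lookup may begin at any character.
        for start in range(len(text)):
            node = self
            for ch in text[start:]:
                node = node.children.setdefault(ch, SuffixTrie())
                node.entries.add(entry_id)

    def find(self, fragment):
        """Return ids of lexicon entries that contain `fragment`."""
        node = self
        for ch in fragment:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.entries)


def coarse_candidates(trie, fragments):
    """Rank entries by total length of matched fragments (assumed scoring)."""
    votes = defaultdict(int)
    for frag in fragments:
        for entry_id in trie.find(frag):
            votes[entry_id] += len(frag)
    # Highest score first; ties are broken by entry id for determinism.
    return sorted(votes, key=lambda e: (-votes[e], e))


# Hypothetical lexicon, for illustration only.
LEXICON = [
    "北京市海淀区清华园",
    "北京市朝阳区建国路",
    "上海市浦东新区世纪大道",
]

trie = SuffixTrie()
for i, address in enumerate(LEXICON):
    trie.insert(address, i)

# Fragments recognized out of order or separated by noise still vote.
ranking = coarse_candidates(trie, ["海淀", "朝阳区", "北京"])
```

Because every suffix is indexed, a fragment recognized in the middle of a text line can retrieve candidate addresses even when the characters before it were missed. A production system would likely use a compressed suffix tree to keep memory use manageable for a full address lexicon.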
Keywords :
document image processing; feature extraction; handwritten character recognition; image segmentation; optical character recognition; string matching; tree data structures; DAB location; address matching performance; automatic handwritten Chinese address reading; character segmentation; destination address block; handwritten Chinese address recognition system; hierarchical structure; optical character recognition; postcode information; redundant address filtering; string matching; string searching; suffix tree based data structure; text line pre-segmentation; Character generation; Character recognition; Cost function; Data mining; Handwriting recognition; Information filtering; Information filters; Noise robustness; Tree data structures; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378721
Filename :
4378721