DocumentCode
1993987
Title
Improving Chinese/English OCR performance by using MCE-based character-pair modeling and negative training
Author
Huo, Qiang ; Feng, Zhi-Dan
Author_Institution
Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
364
Abstract
In the past several years, we've been developing a high-performance OCR engine for machine-printed Chinese/English documents. We have reported previously (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer: (1) using MCE-trained character-pair models to avoid error-prone character-level segmentation in some troublesome cases, and (2) performing MCE-based negative training to improve the rejection capability of the recognition models on hypothesized garbage images during the recognition process. The efficacy of the proposed techniques is confirmed by experiments in a benchmark test.
Keywords
error handling; minimisation; natural language interfaces; optical character recognition; Chinese OCR performance improvement; English OCR performance improvement; MCE-based character-pair modeling; OCR engine; benchmark testing; character modeling techniques; character recognition; confidence-guided progressive search; error-prone character-level segmentation; fast match techniques; hypothesized garbage images; minimum classification error training; negative training; recognition accuracy; recognition efficiency; rejection capability; Benchmark testing; Character generation; Character recognition; Computer science; Image recognition; Image segmentation; Information systems; Optical character recognition software; Robustness; Search engines
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227690
Filename
1227690