DocumentCode :
2691385
Title :
A Chinese OCR spelling check approach based on statistical language models
Author :
Zhuang, Li ; Bao, Ta ; Zhu, Xiaoyan ; Wang, Chunheng ; Naoi, Satoshi
Author_Institution :
DCST, Tsinghua Univ., Beijing, China
Volume :
5
fYear :
2004
fDate :
10-13 Oct. 2004
Firstpage :
4727
Abstract :
This work describes an effective spelling check approach for Chinese OCR based on a new multi-knowledge statistical language model. The model combines the conventional n-gram language model with a new LSA (latent semantic analysis) language model, so that both local (syntactic) and global (semantic) information are used. In addition, visually similar Chinese characters are used during the Viterbi search to expand the candidate list, which makes it more likely that the correct character is among the candidates. With this approach, the best recognition accuracy increases from 79.3% to 91.9%, a 60.9% reduction in errors.
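The 60.9% figure is a relative error reduction: the error rate falls from 20.7% (100 - 79.3) to 8.1% (100 - 91.9). A minimal sketch of that arithmetic follows; the function name `relative_error_reduction` is hypothetical, introduced here only for illustration, and is not taken from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the original errors removed, given accuracies in percent.

    Hypothetical helper for illustration; not from the paper.
    """
    err_before = 100.0 - acc_before  # error rate before correction, in percent
    err_after = 100.0 - acc_after    # error rate after correction, in percent
    return (err_before - err_after) / err_before


# Accuracy figures quoted in the abstract.
reduction = relative_error_reduction(79.3, 91.9)
print(f"{reduction:.1%}")  # prints 60.9%
```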
Keywords :
maximum likelihood estimation; natural languages; optical character recognition; spelling aids; Chinese optical character recognition; Viterbi search process; latent semantic analysis language; spelling check; statistical language models; Character recognition; Computer errors; Engines; Image recognition; Information analysis; Natural languages; Optical character recognition software; Optical computing; Probability; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2004 IEEE International Conference on
ISSN :
1062-922X
Print_ISBN :
0-7803-8566-7
Type :
conf
DOI :
10.1109/ICSMC.2004.1401278
Filename :
1401278