DocumentCode
1582589
Title
A system for recognizing Vietnamese document images based on HMM and linguistics
Author
Quan, Vu Hai ; Kiem, Hoang ; Trung, Pham Nam ; Tin, Lam Tri ; Ha, Nguyen Duc Hoang ; Nguyen, An H.
Author_Institution
Fac. of Inf. Technol., Univ. of Natural Sci., Ho Chi Minh, Viet Nam
fYear
2001
fDate
2001
Firstpage
627
Lastpage
630
Abstract
The authors present a system for recognizing Vietnamese document images and propose a method to increase its accuracy. Based on features of the Vietnamese language, the number of characters to be recognized can be minimized and spell-checking can be integrated into the recognition process. The authors also explain how to combine HMMs with the proposed method in recognition systems. Finally, based on statistical models of word frequency, a dictionary of Vietnamese word frequencies was built to predict the next words to be recognized and to aid in post-processing. The performance of the proposed approach was evaluated on Vietnamese literature from 1990 to 1997 with a total of 3469518 words (about 16866511 characters). Experimental results show that the method was effective.
Keywords
dictionaries; document image processing; hidden Markov models; image recognition; linguistics; natural languages; spelling aids; HMM; OCR; Vietnamese document image recognition; Vietnamese language; Vietnamese literature; Vietnamese word frequency; dictionary; hidden Markov model; linguistics; post processing; recognition process; spell-checking; statistical models; Character recognition; Frequency; Hidden Markov models; Image recognition; Information technology; Natural languages; Optical character recognition software; Predictive models; Probability; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location
Seattle, WA
Print_ISBN
0-7695-1263-1
Type
conf
DOI
10.1109/ICDAR.2001.953865
Filename
953865