Title :
Script-independent, HMM-based text line finding for OCR
Author :
Lu, Zhidong ; Schwartz, Richard ; Raphael, Christopher
Author_Institution :
BBN Technol., GTE Corp., Cambridge, MA, USA
Abstract :
We present a new, script-independent, HMM-based technique to locate text lines on images containing one or more paragraphs of simple-column text. The parameters of the HMMs are trained online on each image using an unsupervised training procedure. We present results of line-finding experiments in Arabic, Chinese and English to demonstrate the performance as well as the script-independent nature of the technique. A comparison of HMM-based line finding with manual line finding shows that using the HMM-based technique does not lead to a significant increase in the recognition error rate.
Keywords :
feature extraction; optical character recognition; real-time systems; unsupervised learning; Arabic characters; Chinese characters; English characters; OCR; text line finding; Character recognition; Error analysis; Hidden Markov models; Manuals; Mathematics; Optical character recognition software; Pixel; Speech; Text recognition
Conference_Title :
Pattern Recognition, 2000. Proceedings. 15th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
0-7695-0750-6
DOI :
10.1109/ICPR.2000.902979