Title :
OCR in Bangla: an Indo-Bangladeshi language
Author :
Pal, U. ; Chaudhuri, B.B.
Author_Institution :
Electron. & Commun. Sci. Unit, Indian Stat. Inst., Calcutta, India
Abstract :
In this paper, a complete OCR system is described for documents in a single Bangla (Bengali) font. Character shapes are recognized by a combination of template matching and feature matching. Images digitized by a flatbed scanner are subjected to skew correction; line, word and character segmentation; simple and compound character separation; feature extraction; and finally character recognition. A feature-based tree classifier is used for simple character recognition. Preprocessing such as thinning and skeletonization is not necessary in this scheme, and hence the system is quite fast. At present, the system has an accuracy of about 96%. Some character occurrence statistics have also been computed, to be used in modeling an error detection and correction technique in the near future.
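The abstract names line segmentation as a pipeline stage but does not say how it is done. As an illustration only, the sketch below assumes a common technique, the horizontal projection profile, applied to an already deskewed binary image (1 = ink, 0 = background). The function name `segment_lines`, the `min_ink` threshold, and the toy `page` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each text line.

    Assumption: `image` is a deskewed binary page, 1 = ink, 0 = background.
    A text line is taken to be a maximal run of rows whose ink count
    is at least `min_ink`.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None  # first row of the line currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:                  # line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7x4 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

The same profile idea, applied column-wise within a line, is often used for word separation; whether the described system does so is not stated in the abstract.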
Keywords :
optical character recognition; Bangla; Indo-Bangladeshi language; OCR; character segmentation; character separation; character shape recognition; feature based tree classifier; feature extraction; feature matching; flatbed scanner; skew correction; template matching; Character recognition; Error analysis; Error correction; Feature extraction; Image segmentation; Libraries; Natural languages; Optical character recognition software; Shape; Text recognition
Conference_Title :
Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on
Conference_Location :
Jerusalem
Print_ISBN :
0-8186-6270-0
DOI :
10.1109/ICPR.1994.576917