DocumentCode
3166383
Title
OCR in Bangla: an Indo-Bangladeshi language
Author
Pal, U. ; Chaudhuri, B.B.
Author_Institution
Electron. & Commun. Sci. Unit, Indian Stat. Inst., Calcutta, India
Volume
2
fYear
1994
fDate
9-13 Oct 1994
Firstpage
269
Abstract
This paper describes a complete OCR system for documents in a single Bangla (Bengali) font. Character shapes are recognized by a combination of template matching and feature matching. Images digitized by a flatbed scanner are subjected to skew correction; line, word, and character segmentation; separation of simple and compound characters; feature extraction; and finally character recognition. A feature-based tree classifier is used to recognize simple characters. Preprocessing such as thinning and skeletonization is not necessary in this scheme, so the system is quite fast. At present, the system has an accuracy of about 96%. Some character occurrence statistics have also been computed, to be used in modeling an error detection and correction technique in the near future.
Keywords
optical character recognition; Bangla; Indo-Bangladeshi language; OCR; character segmentation; character separation; character shape recognition; feature based tree classifier; feature extraction; feature matching; flatbed scanner; skew correction; template matching; Character recognition; Error analysis; Error correction; Feature extraction; Image segmentation; Libraries; Natural languages; Optical character recognition software; Shape; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on
Conference_Location
Jerusalem
Print_ISBN
0-8186-6270-0
Type
conf
DOI
10.1109/ICPR.1994.576917
Filename
576917