OCR in Bangla: an Indo-Bangladeshi language

Author

Pal, U. ; Chaudhuri, B.B.

Author_Institution

Electron. & Commun. Sci. Unit, Indian Stat. Inst., Calcutta, India

Volume

2

fYear

1994

fDate

9-13 Oct 1994

Firstpage

269

Abstract

In this paper, a complete OCR system is described for documents printed in a single Bangla (Bengali) font. The character shapes are recognized by a combination of template matching and feature matching. Images digitized by a flatbed scanner are subjected to skew correction; line, word and character segmentation; simple and compound character separation; feature extraction; and finally character recognition. A feature-based tree classifier is used for simple character recognition. Preprocessing steps such as thinning and skeletonization are not necessary in our scheme, and hence the system is quite fast. At present, the system has an accuracy of about 96%. In addition, some character occurrence statistics have been computed to model an error detection and correction technique in the near future.
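
The abstract names line segmentation as one pipeline stage but does not say how it is done. As an illustration only, the sketch below shows one common way such a stage might work: scanning a binarized page row by row and grouping consecutive rows that contain ink. The function name `segment_lines`, the list-of-lists image layout, and the choice of a row-scan approach are all assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines (hypothetical helper).

    `image` is a list of pixel rows, with 1 for ink and 0 for background.
    Returns (start_row, end_row_exclusive) for each run of rows with ink.
    """
    lines = []
    start = None  # first row of the text line currently being scanned
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line runs to the page bottom
    return lines


# Toy 5x4 page: one text line on rows 1-2, another on row 4.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
text_lines = segment_lines(page)
```

Running the same scan over columns within each detected line would be one way to approach word and character segmentation, although connected strokes may require additional handling that this sketch does not attempt.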

Keywords

optical character recognition; Bangla; Indo-Bangladeshi language; OCR; character segmentation; character separation; character shape recognition; feature based tree classifier; feature extraction; feature matching; flatbed scanner; skew correction; template matching; Character recognition; Error analysis; Error correction; Feature extraction; Image segmentation; Libraries; Natural languages; Optical character recognition software; Shape; Text recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on

Conference_Location

Jerusalem

Print_ISBN

0-8186-6270-0

Type

conf

DOI

10.1109/ICPR.1994.576917

Filename

576917