An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents

Author

Mohanty, Sanghamitra ; Dasbebartta, Himadri Nandini ; Behera, Tarun Kumar

Author_Institution

Dept. of Comput. Sci. & Applic., Utkal Univ., Bhubaneswar

fYear

2009

fDate

4-6 Feb. 2009

Firstpage

398

Lastpage

401

Abstract

Recognition of documents containing multiple scripts is a challenging task that demands additional effort from OCR (optical character recognition) designers to improve the accuracy rate. Previously, OCR systems were developed only for single-script documents, mainly for English and for individual regional languages. Old documents, multiscript as well as uniscript, need to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, they share some common features. For the purposes of this paper, we need to distinguish between the Roman script and the Oriya script. Most English (that is, Roman script) characters are linear as well as circular in nature, whereas Oriya characters are circular in nature. We therefore need to separate the two scripts, paragraph-wise or line-wise, on the basis of these features.

Keywords

natural language processing; optical character recognition; text analysis; English languages; English texts; Oriya texts; bilingual optical character recognition; multiscripts; printed documents; regional languages; Application software; Character recognition; Cleaning; Computer science; Image segmentation; Natural languages; Noise generators; Optical character recognition software; Optical design; Pattern recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on

Conference_Location

Kolkata

Print_ISBN

978-1-4244-3335-3

Type

conf

DOI

10.1109/ICAPR.2009.49

Filename

4782818