DocumentCode
2727395
Title
An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents
Author
Mohanty, Sanghamitra ; Dasbebartta, Himadri Nandini ; Behera, Tarun Kumar
Author_Institution
Dept. of Comput. Sci. & Applic., Utkal Univ., Bhubaneswar
fYear
2009
fDate
4-6 Feb. 2009
Firstpage
398
Lastpage
401
Abstract
Recognizing documents that contain multiple scripts is a challenging task that demands more effort from OCR (optical character recognition) designers to improve the accuracy rate. Previously, OCR systems were developed only for single-script documents, mainly for English and regional languages. Old documents, multiscript as well as uniscript, need to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, they share some common features. For this paper, we need to distinguish between the Roman script and the Oriya script. Most English (that is, Roman script) characters are linear as well as circular in nature, whereas Oriya characters are circular in nature. We therefore need to separate these scripts by considering their features paragraph-wise or line-wise.
Keywords
natural language processing; optical character recognition; text analysis; English languages; English texts; Oriya texts; bilingual optical character recognition; multiscripts; printed documents; regional languages; Application software; Character recognition; Cleaning; Computer science; Image segmentation; Natural languages; Noise generators; Optical character recognition software; Optical design; Pattern recognition;
fLanguage
English
Publisher
ieee
Conference_Titel
Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on
Conference_Location
Kolkata
Print_ISBN
978-1-4244-3335-3
Type
conf
DOI
10.1109/ICAPR.2009.49
Filename
4782818
Link To Document