• DocumentCode
    2727395
  • Title

    An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents

  • Author

    Mohanty, Sanghamitra ; Dasbebartta, Himadri Nandini ; Behera, Tarun Kumar

  • Author_Institution
    Dept. of Comput. Sci. & Applic., Utkal Univ., Bhubaneswar
  • fYear
    2009
  • fDate
    4-6 Feb. 2009
  • Firstpage
    398
  • Lastpage
    401
  • Abstract
    Recognizing documents that contain multiple scripts is a challenging task that demands more effort from OCR (optical character recognition) designers to improve accuracy. Earlier OCR systems were developed for single-script documents only, mainly for English and regional languages. Old documents, whether single-script or multiscript, need to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, they share some common features. For this work we need to distinguish between the Roman script and the Oriya script. Most English (that is, Roman script) characters are linear as well as circular in nature, whereas Oriya characters are circular in nature. We therefore separate the scripts by considering their features paragraph-wise or line-wise.
  • Keywords
    natural language processing; optical character recognition; text analysis; English languages; English texts; Oriya texts; bilingual optical character recognition; multiscripts; printed documents; regional languages; Application software; Character recognition; Cleaning; Computer science; Image segmentation; Natural languages; Noise generators; Optical character recognition software; Optical design; Pattern recognition
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Advances in Pattern Recognition, 2009. ICAPR '09. Seventh International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-3335-3
  • Type

    conf

  • DOI
    10.1109/ICAPR.2009.49
  • Filename
    4782818