• DocumentCode
    604918
  • Title
    An OCR for separation and identification of mixed English-Gujarati digits using kNN classifier
  • Author
    Chaudhari, Shailesh A.; Gulati, Ravi M.
  • Author_Institution
    Veer Narmad South Gujarat Univ., Surat, India
  • fYear
    2013
  • fDate
    1-2 March 2013
  • Firstpage
    190
  • Lastpage
    193
  • Abstract
    This paper addresses the script identification problem in bilingual printed document images. We propose an OCR system that separates and identifies mixed English and Gujarati digits. The system is first trained on standard data samples. For testing, data samples are collected from various printed sources such as newspapers, books, and magazines. Each randomly sized, preprocessed image is normalized to a uniform size. A statistical approach is used for feature extraction, and a kNN classifier is used for classification. The model achieves an average accuracy of 99.26% for Gujarati digits and 99.20% for English digits, with an overall accuracy of 99.23%.
  • Keywords
    document image processing; natural language processing; optical character recognition; pattern classification; statistical analysis; OCR system; bilingual printed document images; kNN classifier; mixed English Gujarati digits; script identification problem; standard data samples; statistical approach; uniform sized image; Accuracy; Character recognition; Feature extraction; Image recognition; Optical character recognition software; Support vector machine classification; Normalization; Pre-processing; Vector; kNN Classifier
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems and Signal Processing (ISSP), 2013 International Conference on
  • Conference_Location
    Gujarat
  • Print_ISBN
    978-1-4799-0316-0
  • Type
    conf
  • DOI
    10.1109/ISSP.2013.6526900
  • Filename
    6526900
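The abstract above names the pipeline stages (size normalization, statistical feature extraction, kNN classification) but not their parameters. The Python sketch below illustrates those stages under loudly stated assumptions, not as the authors' implementation: images are binary lists of 0/1 rows; normalization is nearest-neighbour resizing to a hypothetical 16x16 grid; the "statistical" features are assumed to be zone pixel densities on a hypothetical 4x4 grid; kNN uses Euclidean distance with k=3; and the labels such as "eng-1" and "guj-1" are a hypothetical scheme that encodes script and digit together. The vbar/hbar images are synthetic stand-ins, not real English or Gujarati glyphs.

```python
from collections import Counter
import math

# ASSUMPTION: normalized size and zoning grid are illustrative choices,
# not values taken from the paper.
NORM_SIZE = 16
ZONES = 4


def normalize(img, size=NORM_SIZE):
    """Resize a binary image (list of rows of 0/1) to size x size
    with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def zone_features(img, zones=ZONES):
    """Assumed statistical features: foreground-pixel density of each
    block in a zones x zones grid over the normalized image."""
    step = len(img) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            block = [img[r][c]
                     for r in range(zr * step, (zr + 1) * step)
                     for c in range(zc * step, (zc + 1) * step)]
            feats.append(sum(block) / len(block))
    return feats


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to
    `query` (Euclidean distance). `train` is a list of (features, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def script_of(label):
    """Hypothetical "script-digit" label scheme: the script part is the
    separation result, the digit part the identification result."""
    return label.split("-")[0]


def accuracy(predicted, expected):
    """Percentage of predictions that match the expected labels."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)


# Synthetic stand-ins for scanned digits (NOT real glyphs):
# a vertical bar and a horizontal bar of arbitrary size.
def vbar(h, w):
    return [[1 if w // 3 <= c < 2 * w // 3 else 0 for c in range(w)]
            for _ in range(h)]


def hbar(h, w):
    return [[1 if h // 3 <= r < 2 * h // 3 else 0 for _ in range(w)]
            for r in range(h)]


if __name__ == "__main__":
    sizes = [(20, 12), (30, 18), (24, 24)]
    train = ([(zone_features(normalize(vbar(h, w))), "eng-1") for h, w in sizes]
             + [(zone_features(normalize(hbar(h, w))), "guj-1") for h, w in sizes])
    queries = [(vbar(40, 30), "eng-1"), (hbar(40, 30), "guj-1")]
    preds = [knn_predict(train, zone_features(normalize(img))) for img, _ in queries]
    print(preds, [script_of(p) for p in preds])
    print(accuracy(preds, [label for _, label in queries]))
```

Because each hypothetical label carries both script and digit, a single kNN vote yields both the separation and the identification result; the abstract does not say whether the paper uses one combined classifier or separate stages, so this is only one possible design. An odd k is chosen so that a two-class vote cannot tie.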