DocumentCode
604918
Title
An OCR for separation and identification of mixed English — Gujarati digits using kNN classifier
Author
Chaudhari, Shailesh A. ; Gulati, Ravi M.
Author_Institution
Veer Narmad South Gujarat Univ., Surat, India
fYear
2013
fDate
1-2 March 2013
Firstpage
190
Lastpage
193
Abstract
This paper addresses the script identification problem in bilingual printed document images. We propose an OCR system that separates and identifies mixed English-Gujarati digits. The system is first trained on standard data samples. For testing, data samples are collected from different printed sources such as newspapers, books, and magazines. Each pre-processed image of arbitrary size is normalized to a uniform size. A statistical approach is used for feature extraction, and a kNN classifier is used for classification. The model achieves an average accuracy of 99.26% for Gujarati digits and 99.20% for English digits, for an overall accuracy of 99.23%.
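The pipeline outlined in the abstract (size normalization, statistical feature extraction, kNN classification) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the normalized image size, the specific statistical features, the distance metric, or the value of k. The function names, the 16x16 target size, the zone-density features, the Euclidean distance, and the joint (script, digit) label are all hypothetical choices made for the example.

```python
from collections import Counter
import math


def normalize(img, size=16):
    """Resample a binary image (list of rows of 0/1) to size x size.

    Nearest-neighbour sampling; the 16x16 target is an assumption.
    """
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def zone_features(img, zones=4):
    """Foreground-pixel density in each cell of a zones x zones grid.

    Zone density stands in for the unspecified 'statistical approach'.
    Assumes a square image whose side is divisible by `zones`.
    """
    step = len(img) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            total = sum(img[r][c]
                        for r in range(zr * step, (zr + 1) * step)
                        for c in range(zc * step, (zc + 1) * step))
            feats.append(total / (step * step))
    return feats


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to `query`.

    `train` is a list of (feature_vector, label) pairs. A label here is a
    hypothetical (script, digit) tuple, so one prediction yields both the
    script and the digit identity.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With labels such as `("English", "3")` or `("Gujarati", "3")`, reading the first element of the predicted tuple separates the scripts, while the full tuple identifies the digit. Whether the paper classifies script and digit jointly or in two stages is not stated in this record.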
Keywords
document image processing; natural language processing; optical character recognition; pattern classification; statistical analysis; OCR system; bilingual printed document images; kNN classifier; mixed English Gujarati digits; script identification problem; standard data samples; statistical approach; uniform sized image; Accuracy; Character recognition; Feature extraction; Image recognition; Optical character recognition software; Support vector machine classification; Normalization; Pre-processing; Vector
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems and Signal Processing (ISSP), 2013 International Conference on
Conference_Location
Gujarat
Print_ISBN
978-1-4799-0316-0
Type
conf
DOI
10.1109/ISSP.2013.6526900
Filename
6526900