DocumentCode :
1994148
Title :
A bilingual OCR for Hindi-Telugu documents and its applications
Author :
Jawahar, C.V. ; Pavan Kumar, M.N.S.S.K. ; Kiran, S S Ravi
Author_Institution :
Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
fYear :
2003
fDate :
3-6 Aug. 2003
Firstpage :
408
Abstract :
This paper describes the process of recognizing characters in printed documents containing Hindi and Telugu text. Hindi and Telugu are among the most widely used languages in India. The bilingual recognizer is based on Principal Component Analysis followed by support vector classification, and attains an overall accuracy of approximately 96.7%. Extensive experimentation is carried out on an independent test set of approximately 200,000 characters. Applications based on this OCR are sketched.
Keywords :
character sets; document image processing; linguistics; natural language interfaces; optical character recognition; Hindi text; Hindi-Telugu documents; Indian languages; Indian scripts; OCR engine; Principal Component Analysis; SVM PCA based OCR; Telugu text; bilingual OCR; bilingual recognizer; character recognition; document images; support vector classification; Application software; Character recognition; Focusing; Heart; Image recognition; Information technology; Natural languages; Optical character recognition software; Principal component analysis; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
Type :
conf
DOI :
10.1109/ICDAR.2003.1227699
Filename :
1227699