Regional Information Center for Science and Technology

DocumentCode :

1585114

Title :

An OCR system for Telugu

Author :

Negi, Atul ; Bhagvati, Chakravarthy ; Krishna, B.

Author_Institution :

Dept. of Comput. & Inf. Sci., Hyderabad Univ., India

fYear :

2001

fDate :

2001

Firstpage :

1110

Lastpage :

1114

Abstract :

Telugu is a language spoken by more than 100 million people in South India. It has a complex orthography with a large number of distinct character shapes (estimated to be of the order of 10,000), composed of simple and compound characters formed from 16 vowels (called achchus) and 36 consonants (called hallus). We present an efficient and practical approach to Telugu OCR that limits the number of templates to be recognized to just 370, avoiding the need to design a classifier for thousands of shapes or to perform very complex glyph segmentation. A compositional approach using connected components and fringe distance template matching achieved a raw OCR accuracy of about 92% in testing. Several experiments across varying fonts and resolutions showed the approach to be satisfactory.
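The abstract names two building blocks, connected-component extraction and fringe distance template matching, without giving their details. The Python sketch below shows one plausible way they fit together; it is not the authors' implementation. It assumes binary images stored as lists of 0/1 rows, 8-connectivity for components, a hypothetical normalize step that crops each component and rescales it (aspect ratio kept, top-left anchored) onto a fixed grid, and a symmetric formulation of fringe distance in which each image's ink pixels are charged their city-block distance to the nearest ink pixel of the other image. All function names are illustrative.

```python
from collections import deque

# Assumed representation: a binary image is a list of rows of 0/1 ints (1 = ink).
Image = list[list[int]]


def connected_components(img: Image) -> list[list[tuple[int, int]]]:
    """Return each 8-connected ink region as a list of (row, col) pixels."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def normalize(pixels: list[tuple[int, int]], size: int = 16) -> Image:
    """Hypothetical step: crop to the bounding box and resample (nearest
    neighbour) onto a size x size grid, keeping the aspect ratio and
    anchoring the glyph at the top-left corner."""
    ink = set(pixels)
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    side = max(max(y for y, _ in pixels) - top, max(x for _, x in pixels) - left) + 1
    return [[1 if (top + r * side // size, left + c * side // size) in ink else 0
             for c in range(size)] for r in range(size)]


def fringe_map(img: Image) -> list[list[int]]:
    """City-block distance from every pixel to the nearest ink pixel."""
    rows, cols = len(img), len(img[0])
    far = rows + cols  # exceeds any city-block distance inside the grid
    dist = [[0 if img[r][c] else far for c in range(cols)] for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if img[r][c])
    while queue:  # multi-source BFS outward from all ink pixels at once
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and dist[ny][nx] > dist[y][x] + 1:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def fringe_distance(a: Image, b: Image) -> int:
    """Assumed symmetric form: each image's ink pixels are charged their
    fringe-map value in the other image. Both images must share one shape."""
    fa, fb = fringe_map(a), fringe_map(b)
    a_to_b = sum(fb[r][c] for r, row in enumerate(a) for c, v in enumerate(row) if v)
    b_to_a = sum(fa[r][c] for r, row in enumerate(b) for c, v in enumerate(row) if v)
    return a_to_b + b_to_a


def classify(glyph: Image, templates: dict[str, Image]) -> str:
    """Return the label of the nearest template under the fringe distance."""
    return min(templates, key=lambda name: fringe_distance(glyph, templates[name]))
```

For example, with two hypothetical 4x4 templates, a vertical bar and a horizontal bar, calling classify(normalize(component, 4), templates) on each component returned by connected_components yields the label of the closer template. In a full system the template dictionary would hold the 370 templates described in the abstract, and recognized components would then be composed into characters; that composition step is not modeled here.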

Keywords :

character sets; document image processing; image matching; optical character recognition; OCR; Telugu language; character shapes; complex glyph segmentation; complex orthography; compositional approach; connected components; experiments; fonts; fringe distance template matching; scanned documents; vowels; Character recognition; Information technology; Natural languages; Neural networks; Optical character recognition software; Performance analysis; Shape; Speech recognition; Testing; Text recognition

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on

Conference_Location :

Seattle, WA

Print_ISBN :

0-7695-1263-1

Type :

conf

DOI :

10.1109/ICDAR.2001.953958

Filename :

953958

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1585114