Word-wise Sinhala Tamil and English script identification using Gaussian kernel SVM

Author

Chanda, Sukalpa ; Pal, Srikanta ; Pal, Umapada

Author_Institution

Indian Stat. Inst., Kolkata, India

fYear

2008

fDate

8-11 Dec. 2008

Firstpage

1

Lastpage

4

Abstract

Many documents in Sri Lanka contain Sinhala, Tamil and English text on a single page. When developing OCR for such a page, it is better to first identify the different scripts present and then feed each identified portion to the respective OCR module. In this paper, an SVM-based technique is proposed for word-wise identification of Sinhala, Tamil and English scripts from a single document page. Structural features, topological features and features based on the water reservoir principle are the main features used for this purpose. The experiments gave encouraging results.

Keywords

Gaussian processes; document image processing; feature extraction; optical character recognition; support vector machines; text analysis; Gaussian kernel SVM; OCR module; structural feature; topological feature; water reservoir principle-based feature; word-wise English script identification; word-wise Sinhala Tamil script identification; word-wise Sinhala script identification; Feeds; Kernel; Neural networks; Optical character recognition software; Reservoirs; Structural shapes; Support vector machine classification; Water resources; Water storage

fLanguage

English

Publisher

IEEE

Conference_Titel

Pattern Recognition, 2008. ICPR 2008. 19th International Conference on

Conference_Location

Tampa, FL

ISSN

1051-4651

Print_ISBN

978-1-4244-2174-9

Electronic_ISBN

1051-4651

Type

conf

DOI

10.1109/ICPR.2008.4761823

Filename

4761823