DocumentCode :
2489554
Title :
Word-wise Sinhala Tamil and English script identification using Gaussian kernel SVM
Author :
Chanda, Sukalpa ; Pal, Srikanta ; Pal, Umapada
Author_Institution :
Indian Stat. Inst., Kolkata, India
fYear :
2008
fDate :
8-11 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
Many documents in Sri Lanka contain Sinhala, Tamil and English text on a single page. For OCR development of such a document page, it is better to identify the different scripts present in the page and then feed each identified portion to the respective OCR module. In this paper, an SVM-based technique is proposed for word-wise identification of Sinhala, Tamil and English scripts from a single document page. Structural features, topological features and water reservoir principle-based features are the main features used for this purpose. The experiments gave encouraging results.
Keywords :
Gaussian processes; document image processing; feature extraction; optical character recognition; support vector machines; text analysis; Gaussian kernel SVM; OCR module; structural feature; topological feature; water reservoir principle-based feature; word-wise English script identification; word-wise Sinhala Tamil script identification; word-wise Sinhala script identification; Feeds; Kernel; Neural networks; Optical character recognition software; Reservoirs; Structural shapes; Support vector machine classification; Support vector machines; Water resources; Water storage
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
ISSN :
1051-4651
Print_ISBN :
978-1-4244-2174-9
Electronic_ISBN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2008.4761823
Filename :
4761823