DocumentCode :
1634408
Title :
Two-stage Approach for Word-wise Script Identification
Author :
Chanda, Sukalpa ; Pal, Srikanta ; Franke, Katrin ; Pal, Umapada
Author_Institution :
Dept. of Comput. Sci. & Media Technol., Gjovik Univ. Coll., Gjovik, Norway
fYear :
2009
Firstpage :
926
Lastpage :
930
Abstract :
A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition accuracy and processing speed. The first stage identifies scripts at high speed, but with lower accuracy on noisy data. The more advanced second stage processes only those samples that yield low recognition confidence in the first stage. In both stages a rough character segmentation is performed and features are computed on the segmented character components. The first stage uses a 64-dimensional chain-code-histogram feature, while the second stage uses 400-dimensional gradient features. The final classification of a word into a particular script is done by majority voting over the recognized character components of the word. Extensive experiments with various confidence scores were conducted and are reported here. The overall recognition accuracy and speed are remarkable. Correct classification of 98.51% on 11,123 test words is achieved, even when the recognition confidence is as high as 95% at both stages.
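The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each classifier returns a (label, confidence) pair, that the second stage is invoked per character component rather than per word, and that the confidence threshold is 0.95. The names `classify_word`, `stage1_stub` and `stage2_stub` are hypothetical, and the stub classifiers simply return canned values in place of trained chain-code and gradient-feature classifiers.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Assumed interface: a classifier maps one character component to
# (script_label, confidence in [0, 1]). Not taken from the paper.
Classifier = Callable[[dict], Tuple[str, float]]


def classify_word(
    components: Sequence[dict],
    stage1: Classifier,
    stage2: Classifier,
    threshold: float = 0.95,
) -> str:
    """Label a word by majority vote over its character components.

    Each component goes through the fast first stage; only components whose
    first-stage confidence is below `threshold` are sent to the slower
    second stage (assumption: fallback is per component, not per word).
    """
    if not components:
        raise ValueError("a word needs at least one character component")

    votes = []
    for component in components:
        label, confidence = stage1(component)
        if confidence < threshold:
            # Low confidence: defer to the more accurate second stage.
            label, _ = stage2(component)
        votes.append(label)

    # Ties resolve to the label that was seen first among the tied ones.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained classifiers: every component
# carries the canned output of each stage.
def stage1_stub(component: dict) -> Tuple[str, float]:
    return component["stage1"]


def stage2_stub(component: dict) -> Tuple[str, float]:
    return component["stage2"]


word = [
    {"stage1": ("Bengali", 0.99), "stage2": ("Bengali", 0.99)},
    {"stage1": ("Devnagari", 0.60), "stage2": ("Bengali", 0.97)},
    {"stage1": ("Devnagari", 0.98), "stage2": ("Devnagari", 0.98)},
]
result = classify_word(word, stage1_stub, stage2_stub)
```

In this toy word only the second component falls below the threshold, so only that component incurs the cost of the second stage. Raising the threshold sends more components to the second stage, which trades speed for accuracy.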
Keywords :
document image processing; image classification; image segmentation; natural languages; optical character recognition; statistical analysis; Bengali script; Devnagari script; English script; chain-code-histogram feature; noisy data; recognition accuracy; rough character segmentation; word classification; word-wise script identification; Automatic testing; Character recognition; Computer science; Educational institutions; Information analysis; Information security; Optical character recognition software; Text analysis; Voting
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.239
Filename :
5277552