DocumentCode
2016831
Title
Page-level script identification from multi-script handwritten documents
Author
Singh, Pawan Kumar ; Dalal, Santu Kumar ; Sarkar, Ram ; Nasipuri, Mita
Author_Institution
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
fYear
2015
fDate
7-8 Feb. 2015
Firstpage
1
Lastpage
6
Abstract
Script identification has long been a precursor to many Optical Character Recognition (OCR) processes in multi-lingual document environments. It has numerous applications in document image analysis, such as document sorting, indexing, retrieval, and translation. In this paper, we develop a page-level script identification technique for handwritten documents using texture features. The texture features are extracted from the document pages based on the Gray Level Co-occurrence Matrix (GLCM). The proposed technique is evaluated on four scripts, namely Bangla, Devnagari, Telugu, and Roman, using multiple classifiers. Based on their identification accuracies, the Multi Layer Perceptron (MLP) classifier is observed to perform best. The experimental results demonstrate the effectiveness of the GLCM features in identifying handwritten scripts. Experiments are conducted on a total of 120 document pages, and the overall accuracy of the system is found to be 91.48%. Though the system is evaluated on a limited dataset, the result can be considered satisfactory given the complexities of the scripts.
Keywords
document image processing; feature extraction; identification; image classification; image texture; matrix algebra; multilayer perceptrons; optical character recognition; GLCM; MLP classifier; OCR; document image analysis; gray level cooccurrence matrix; multilayer perceptron; multiscript handwritten document; optical character recognition; page-level script identification; texture feature extraction; Accuracy; Feature extraction; Image analysis; Optical character recognition software; Optical imaging; Symmetric matrices; Text analysis; Gray Level Cooccurrence Matrix; Handwritten Indian scripts; Optical Character Recognition; Page-level script identification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer, Communication, Control and Information Technology (C3IT), 2015 Third International Conference on
Conference_Location
Hooghly
Print_ISBN
978-1-4799-4446-0
Type
conf
DOI
10.1109/C3IT.2015.7060113
Filename
7060113