Title :
Page-level handwritten script identification using modified log-Gabor filter based features
Author :
Singh, Pawan Kumar ; Chatterjee, Iman ; Sarkar, Ram
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
Abstract :
Automatic script identification, an important research problem over the last few decades, poses many challenges in any multi-script environment. Because India is a multilingual country, text documents containing more than one language are common here. To digitize such multilingual documents with an Optical Character Recognition (OCR) engine, the scripts in which they are written must first be recognized. This paper proposes a page-level script identification technique for eight popular handwritten scripts: Bangla, Devanagari, Gurumukhi, Oriya, Tamil, Telugu, Urdu, and Roman. First, texture features based on modified log-Gabor filters are extracted from each document page. The proposed model is then evaluated using multiple classifiers; based on their identification accuracies, Simple Logistic is found to perform best. The results of the experiment show the usefulness of modified log-Gabor filter based features for recognizing handwritten Indic scripts. A total of 240 document pages are used in the experiment, which yields 95.57% accuracy in identifying the scripts of the documents. Although the proposed method is assessed on a limited dataset, the outcome can be considered reasonably acceptable given the intricacies of the scripts.
Keywords :
Gabor filters; document image processing; feature extraction; handwritten character recognition; image classification; optical character recognition; text analysis; Bangla; Devanagari; Gurumukhi; OCR engine; Oriya; Roman; Tamil; Telugu; Urdu; automatic script identification; handwritten Indic script recognition; modified log-Gabor filter based texture features; multilingual documents; multiple classifiers; multiscript environment; optical character recognition; page-level handwritten script identification technique; simple logistic; text documents; Feature extraction; Gabor filters; Information filters; Logistics; Optical character recognition software; Visualization; Handwritten Indic scripts; Modified log-Gabor filter; Optical Character Recognition; Page-level script identification;
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ReTIS.2015.7232882