• DocumentCode
    1820962
  • Title

    Transform based approach for Indic script identification from handwritten document images

  • Author

    Obaidullah, Sk Md ; Karim, Rownaqul ; Shaikh, Sujal ; Halder, Chayan ; Das, Nibaran ; Roy, Kaushik

  • Author_Institution
    Aliah Univ., Kolkata, India
  • fYear
    2015
  • fDate
    26-28 March 2015
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    In a multi-script country like India, script identification from document images is an essential step before choosing an appropriate script-specific OCR (Optical Character Recognizer). Handwritten script identification is more challenging than its printed counterpart because of variations across writers, time, content, etc. Document image processing researchers are making increasing efforts to develop standard techniques for Indic script identification, but most of the existing work considers printed script document images. In this paper, a simple, robust and segmentation-free technique based on different image transform methods and statistical features is proposed to identify any one of four popular Indic scripts, namely Bangla, Roman, Devanagari and Oriya. A dataset of 101 handwritten document images comprising more than 11000 words and 1300 lines, with an almost equal distribution of each script, was built; the images were collected from writers of varying age, sex and educational qualification. In experiments, an average accuracy of 88.1% is obtained for the four-script combination using an MLP (Multilayer Perceptron) classifier with five-fold cross-validation. The average tri-script and bi-script accuracies are 89.7% and 94.3% respectively. These results are encouraging considering the inherent complexities of handwritten Indic scripts.
  • Keywords
    document image processing; handwritten character recognition; image classification; image segmentation; multilayer perceptrons; natural language processing; optical character recognition; transforms; Bangla; Devanagari; India script identification; Indic script identification; MLP classifier; OCR; Oriya; Roman; bi-scripts accuracy; document image processing researcher; handwritten document image; handwritten script identification; image transform method; multilayer perceptron classifier; multiscript country; optical character recognizer; printed script document image; segmentation free technique; statistical feature; transform based approach; tri-scripts accuracy; Discrete cosine transforms; Encoding; Euclidean distance; Handwriting recognition; Image recognition; Image segmentation; Optical imaging; Handwritten Script Identification; Image Transform; MLP Classifier; OCR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, Communication and Networking (ICSCN), 2015 3rd International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4673-6822-3
  • Type

    conf

  • DOI
    10.1109/ICSCN.2015.7219852
  • Filename
    7219852