  • DocumentCode
    117630
  • Title
    Structural feature based approach for script identification from printed Indian document
  • Author
    Obaidullah, Sk Md ; Mondal, Aniruddha ; Roy, Kaushik
  • Author_Institution
    Dept. of Comput. Sc. & Eng., Aliah Univ., Kolkata, India
  • fYear
    2014
  • fDate
    20-21 Feb. 2014
  • Firstpage
    120
  • Lastpage
    124
  • Abstract
    Script identification is a complex real-life problem in automating the processing of printed or handwritten documents. The task becomes more challenging in a multi-script, multilingual country like India. For the development of OCR for a particular language, the script needs to be identified first, which makes a script identification system a pressing need. To date, no such work is available that considers all 13 official Indian scripts. In this paper, we present a scheme for script identification from printed documents for 10 official Indian scripts, namely Bangla, Devnagari, Roman, Oriya, Urdu, Gujarati, Telugu, Kannada, Malayalam and Kashmiri. A total of 459 document pages are considered, and a 62-dimensional feature set is computed. Finally, using a simple logistic classifier with 5-fold cross-validation, an average identification rate of 98.9% is obtained.
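    The evaluation protocol named in the abstract (5-fold cross-validation reporting an average identification rate) can be sketched as below. This is a hedged illustration under stated assumptions, not the authors' implementation: the names `k_fold_rate` and `nearest_centroid` are hypothetical, per-fold rates are simply averaged, and a nearest-centroid classifier stands in for the paper's simple logistic classifier only to keep the sketch dependency-free. The feature vectors would play the role of the 62-dimensional features, whose definitions are not given here.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a feature vector paired with its script label.
Sample = Tuple[Sequence[float], str]
Predictor = Callable[[Sequence[float]], str]
Trainer = Callable[[List[Sample]], Predictor]


def k_fold_rate(samples: List[Sample], train: Trainer, k: int = 5, seed: int = 0) -> float:
    """Average identification rate (%) over k folds (assumed protocol).

    Each fold is held out once for testing while the classifier is trained
    on the remaining k-1 folds; the k per-fold accuracies are then averaged.
    """
    if len(samples) < k:
        raise ValueError("need at least k samples so that no fold is empty")
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # strided split of the shuffled data into k folds
    rates = []
    for i, test in enumerate(folds):
        train_set = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train(train_set)
        correct = sum(predict(x) == y for x, y in test)
        rates.append(100.0 * correct / len(test))
    return mean(rates)


def nearest_centroid(train_set: List[Sample]) -> Predictor:
    """Stand-in classifier (NOT the paper's simple logistic model)."""
    sums, counts = {}, {}
    for x, y in train_set:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: Sequence[float]) -> str:
        # Assign the label whose class mean is closest in squared Euclidean distance.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict
```

    Passing the classifier as a `train` callback keeps the fold logic separate from the model, so a logistic classifier could be plugged in without changing `k_fold_rate`.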
  • Keywords
    document image processing; handwriting recognition; natural language processing; optical character recognition; Bangla; Devnagari; Gujarati; India; Indian scripts; Kannada; Kashmiri; Malayalam; Oriya; Roman; Telegu; Urdu; document pages; handwritten document processing; multiscript-lingual country; optical character recognition; printed Indian document; printed document processing; script identification; script identification system; structural feature based approach; Computers; Databases; Educational institutions; Feature extraction; Logistics; Optical character recognition software; Signal processing; Feature Set; OCR; Printed Script Identification; Simple Logistic Classifier;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Integrated Networks (SPIN), 2014 International Conference on
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-2865-1
  • Type
    conf
  • DOI
    10.1109/SPIN.2014.6776933
  • Filename
    6776933