• DocumentCode
    2352919
  • Title
    Creation of data resources and design of an evaluation test bed for Devanagari script recognition
  • Author
    Setlur, Srirangaraj ; Kompalli, Suryaprakash ; Ramanaprasad, Vemulapati ; Govindaraju, Venugopal
  • Author_Institution
    Center of Excellence for Document Analysis and Recognition, University at Buffalo, NY, USA
  • fYear
    2003
  • fDate
    10-11 March 2003
  • Firstpage
    55
  • Lastpage
    61
  • Abstract
    The Indian subcontinent has a large number of languages, dialects, and scripts, with Devanagari being the primary and most widely used script. To date, much of the Devanagari optical character recognition (OCR) research has been restricted to a handful of groups. As a result, techniques have not yet been widely disseminated or evaluated independently, and automated evaluation tools are not available for lack of a standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources. Ground-truthed data for words and characters, on-line dictionaries, corpora of text documents, and reliable, standardized statistical analyses and evaluation tools are currently lacking. The creation of such data resources will undoubtedly provide a much-needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation-sponsored project under the International Digital Libraries program to create data resources that will facilitate development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.
  • Keywords
    information resources; natural languages; optical character recognition; Devanagari OCR; Devanagari script recognition; International Digital Libraries; National Science Foundation; OCR research; data resource creation; data resources; evaluation tools; ground truthed data; on-line dictionaries; standardized test bed; statistical analysis; text documents; Character recognition; Dictionaries; Natural languages; Optical character recognition software; Shape; Software libraries; Statistical analysis; Testing; Text analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Research Issues in Data Engineering: Multi-lingual Information Management, 2003. RIDE-MLIM 2003. Proceedings. 13th International Workshop on
  • ISSN
    1066-1395
  • Print_ISBN
    0-7803-7868-7
  • Type
    conf
  • DOI
    10.1109/RIDE.2003.1249846
  • Filename
    1249846