• DocumentCode
    2198359
  • Title
    Creation of a Huge Annotated Database for Tamil and Kannada OHR
  • Author
    Nethravathi, B. ; Archana, C.P. ; Shashikiran, K. ; Ramakrishnan, A.G. ; Kumar, Vijay
  • Author_Institution
    Dept. of Electr. Eng., Indian Inst. of Sci. (IISc), Bangalore, India
  • fYear
    2010
  • fDate
    16-18 Nov. 2010
  • Firstpage
    415
  • Lastpage
    420
  • Abstract
    This paper describes the efforts at the MILE lab, IISc, to create a 100,000-word database each in Kannada and Tamil for the design and development of online handwriting recognition (OHR). The data has been collected from over 600 users in order to capture variations in writing style. We describe features of the scripts and how the number of symbols was reduced so that recognition models can be trained effectively on the data. The list of words includes all the characters, Kannada and Indo-Arabic numerals, punctuation marks and other symbols. A semi-automated tool is used to annotate the data from the stroke level to the word level. It segments each word into stroke groups and also acts as a validation mechanism for segmentation. The tool displays the strokes, stroke groups and aksharas of a word and hence can be used to study the various styles of writing and delayed strokes, and to assign quality tags to the words. The tool is currently being used for annotating Tamil and Kannada data. The output is stored in a standard XML format.
  • Keywords
    XML; handwritten character recognition; image segmentation; natural language processing; very large databases; Indo-Arabic numerals; Kannada OHR; Kannada numeral; Tamil OHR; XML format; huge annotated database creation; online handwritten recognition; semi-automated data annotation tool; Annotation; Kannada handwriting; OHR database; Online character database; Tamil handwriting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-8353-2
  • Type
    conf
  • DOI
    10.1109/ICFHR.2010.71
  • Filename
    5693599