• DocumentCode
    2145202
  • Title

    A Weighted Finite-State Transducer (WFST)-Based Language Model for Online Indic Script Handwriting Recognition

  • Author

    Chowdhury, Suhan ; Garain, Utpal ; Chattopadhyay, Tanushyam

  • Author_Institution
    Comput. Vision & Pattern Recognition (CVPR) Unit, Indian Stat. Inst., Kolkata, India
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    599
  • Lastpage
    602
  • Abstract
    Though the design of classifiers for Indic script handwriting recognition has received considerable research attention, the use of language models has so far received little. This paper develops a weighted finite-state transducer (WFST) based language model to improve current recognition accuracy. Both the recognition hypothesis (i.e. the segmentation lattice) and the lexicon are modeled as WFSTs. Concatenation of these two FSTs accepts the valid word(s) present in the recognition lattice. A third FST, called the error FST, is introduced to retrieve certain words that were missed by the previous concatenation operation. The proposed model has been tested on online Bangla handwriting recognition, though the underlying principle applies equally to the recognition of offline or printed words. Experiments on a part of the ISI-Bangla handwriting database show that while the present classifiers (without any language model) recognize about 73% of words, use of the recognition and lexicon FSTs improves this result by about 9 percentage points, giving an average word-level accuracy of 82%. Introduction of the error FST further improves this accuracy to 93%. This marked improvement in word recognition accuracy from an FST-based language model is a significant finding for research in handwriting recognition in general and Indic script handwriting recognition in particular.
  • Keywords
    finite state machines; handwriting recognition; natural language processing; transducers; word processing; ISI-Bangla handwriting database; WFST; error FST; online Bangla handwriting recognition; online Indic script handwriting recognition; printed word recognition; recognition accuracy; recognition hypothesis; recognition lattice; third FST; weighted finite-state transducer-based language model; Accuracy; Character recognition; Handwriting recognition; Hidden Markov models; Optical character recognition software; Transducers; Vocabulary; Finite State Transducer (FST); Handwriting recognition; Indic scripts; Language model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.126
  • Filename
    6065381