• DocumentCode
    1398019
  • Title
    HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts
  • Author
    Bharath, A.; Madhvanath, Sriganesh
  • Author_Institution
    Hewlett-Packard Labs., Bangalore, India
  • Volume
    34
  • Issue
    4
  • fYear
    2012
  • fDate
    4/1/2012
  • Firstpage
    670
  • Lastpage
    682
  • Abstract
    Research on recognizing online handwritten words in Indic scripts is in its early stages when compared to Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts: Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data driven and script independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMM): lexicon driven and lexicon free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms the lexicon-free one as well as their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13 percent for Devanagari when the two recognizers are combined, and 91.8 percent for Tamil using the lexicon-driven technique.
  • Keywords
    handwritten character recognition; hidden Markov models; image representation; natural language processing; HMM based lexicon driven word recognition; Latin scripts; Tamil word samples; bag-of-symbols representation; handwritten Devanagari word; hidden Markov models; lexicon free word recognition; online handwritten Indic scripts; oriental scripts; phonetic representation; symbol writing orders; Character recognition; Feature extraction; Handwriting recognition; Hidden Markov models; Ink; Shape; Writing; Devanagari; Online handwriting recognition; Tamil; bag of symbols; lexicon driven; lexicon free; symbol order variation; word recognition; Algorithms; Automatic Data Processing; Databases, Factual; Handwriting; India; Internet; Markov Chains; Pattern Recognition, Automated; Reading
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2011.234
  • Filename
    6104057
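
The abstract describes a Bag-of-Symbols representation that ignores symbol order and allows rapid pruning of the lexicon. A minimal sketch of that general idea follows; it is not the paper's implementation. The symbol names, the `LEXICON` contents, the `prune_lexicon` helper, and the Dice-style similarity score are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical lexicon: each word maps to its symbols in standard writing order.
# The symbol names are placeholders, not actual Devanagari or Tamil symbols.
LEXICON = {
    "word_a": ["s1", "s2", "s3"],
    "word_b": ["s3", "s1", "s2", "s4"],
    "word_c": ["s5", "s6"],
}


def bag_overlap(observed, candidate):
    """Size of the multiset intersection of two symbol bags."""
    return sum((observed & candidate).values())


def prune_lexicon(observed_symbols, lexicon, keep=2):
    """Return the `keep` lexicon words whose symbol bags best match the input.

    Symbols are compared as multisets, so the order in which they were
    written does not affect the score (assumed Dice-style similarity).
    """
    observed = Counter(observed_symbols)
    scored = []
    for word, symbols in lexicon.items():
        candidate = Counter(symbols)
        total = sum(observed.values()) + sum(candidate.values())
        score = 2 * bag_overlap(observed, candidate) / total if total else 0.0
        scored.append((score, word))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:keep]]


# Symbols recognized in a nonstandard writing order still match "word_a" best.
shortlist = prune_lexicon(["s2", "s1", "s3"], LEXICON, keep=2)
```

In a pipeline of this kind, the shortlist could then be passed to a slower, order-sensitive recognizer, which is one plausible way to combine the two techniques the abstract mentions.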