• DocumentCode
    2029463
  • Title
    An empirical study of statistical language models for contextual post-processing of Chinese script recognition
  • Author
    Li, Yuan-Xiang ; Tan, Chew Lim
  • Author_Institution
    Sch. of Comput., National Univ. of Singapore, Singapore
  • fYear
    2004
  • fDate
    26-29 Oct. 2004
  • Firstpage
    257
  • Lastpage
    262
  • Abstract
    Statistical language models (LMs) are crucial for improving the accuracy of offline Chinese script recognition. In this paper, we investigate how several LMs influence the contextual post-processing performance of Chinese script recognition. We first introduce seven LMs: three conventional LMs (character-based bigram, character-based trigram, word-based bigram), two class-based bigram LMs, and two hybrid bigram LMs that combine word-based and class-based bigrams. We then investigate how LM perplexity is affected by training corpus size, smoothing method, and count cutoffs. Next, we demonstrate the influence of these LMs on post-processing performance in terms of recognition accuracy, memory requirements, and processing speed. Finally, we propose how to select a suitable LM for real recognition tasks.
  • Keywords
    character recognition; context-sensitive languages; natural languages; statistical analysis; Chinese script recognition; character-based bigram; character-based trigram; contextual post-processing; statistical language models; word-based bigram; Character recognition; Context modeling; Handwriting recognition; Image recognition; Natural languages; Pattern recognition; Proposals; Shape; Smoothing methods; Writing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on
  • ISSN
    1550-5235
  • Print_ISBN
    0-7695-2187-8
  • Type
    conf
  • DOI
    10.1109/IWFHR.2004.15
  • Filename
    1363920