• DocumentCode
    3122352
  • Title
    Distant BI-Gram model, collocation, and their applications in post-processing for Chinese character recognition
  • Author
    Xu, Rui-Feng ; Lu, Qin ; Yeung, Daniel S. ; Wang, Xi-Zhao
  • Author_Institution
    Department of Computing, Hong Kong Polytechnic University, China
  • Volume
    4
  • fYear
    2002
  • fDate
    4-5 Nov. 2002
  • Firstpage
    2251
  • Abstract
    In this paper, we present a distant BI-Gram model, which extends the regular BI-Gram model with distance information and weight parameters in order to describe long-distance restrictions within Chinese sentences. The extraction of the statistical information and weight parameters of this language model is discussed. Based on this work, word combination strength and spread are employed to extract recurrent word combinations, i.e., collocations. The distant BI-Gram model and the collocations are applied to a statistics-based post-processing system to improve the recognition performance for Chinese characters. The experimental results show that, by employing these two language models, the post-processing system achieves a greater improvement in recognition performance.
  • Keywords
    character recognition; natural languages; probability; statistical analysis; Chinese character recognition; Chinese sentence; collocations; distant BI-Gram model; long-distance restrictions; natural language processing; probability; recurrent word; statistical information; weight parameters; Application software; Character recognition; Cybernetics; Data mining; Databases; Handwriting recognition; Machine learning; Natural languages; Probability; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
  • Print_ISBN
    0-7803-7508-4
  • Type
    conf
  • DOI
    10.1109/ICMLC.2002.1175440
  • Filename
    1175440