• DocumentCode
    3317728
  • Title
    POC-NLW Template Based Tagging Method for Chinese Word Segmentation
  • Author
    Chen, Bo ; He, Hui ; Xu, Weiran ; Guo, Jun
  • Author_Institution
    Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
  • Volume
    2
  • fYear
    2006
  • fDate
    3-6 Nov. 2006
  • Firstpage
    1423
  • Lastpage
    1428
  • Abstract
    In Chinese word segmentation, disambiguation and unknown word identification have become the two key issues. In this paper, a system based on a two-stage strategy is constructed to deal with these problems. First, an n-gram based model performs the basic segmentation and, to some extent, disambiguation. Then, in the second stage, a language tagging template named POC-NLW is adopted to carry out a character sequence tagging procedure based on a hidden Markov model, which refines the results of the first stage and identifies unknown words. Several detailed experiments were conducted on the SIGHAN Bakeoff 2005 corpus. Experimental results show that the method achieves high accuracy on word segmentation, as well as on unknown word identification, with appreciable processing efficiency. The method is characterized by good interoperability and extensibility across different kinds of unknown words, and is therefore applicable to practical Chinese information processing applications.
  • Keywords
    hidden Markov models; natural language processing; text analysis; Chinese information processing; Chinese word segmentation; POC-NLW template based tagging; character sequence tagging; hidden Markov model; n-gram based model; unknown word identification; word disambiguation; Dictionaries; Helium; Hidden Markov models; Information processing; Natural languages; Statistics; Tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security, 2006 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    1-4244-0605-6
  • Electronic_ISBN
    1-4244-0605-6
  • Type
    conf
  • DOI
    10.1109/ICCIAS.2006.295295
  • Filename
    4076201