• DocumentCode
    442052
  • Title
    Chinese POS tagging based on bilexical co-occurrences
  • Author
    Cao, Hai-Long ; Zhao, Tie-jun ; Li, Sheng ; Sun, Jun ; Zhang, Chun-xiang
  • Author_Institution
    MOE-MS Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., China
  • Volume
    6
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    3766
  • Abstract
    Chinese part-of-speech (POS) tagging is the basis of Chinese information processing. This paper proposes a method based on bilexical co-occurrences to tag Chinese text. The standard hidden Markov model (HMM) makes two assumptions. First, the transition between states (parts of speech) is independent of the observation (word) sequence. Second, the generation of a new observation is independent of other observations. Chinese text does not satisfy these assumptions. Building on the HMM, the proposed model also considers the effect of context words on the part-of-speech decision, which improves the model's discriminative ability. Deleted interpolation is used to mitigate the data sparseness problem. We evaluate the proposed model on the PFR China Daily corpus. The tagging accuracy is 99.09% in the closed test and 96.37% in the open test.
  • Keywords
    hidden Markov models; natural languages; speech processing; text analysis; Chinese POS tagging; Chinese information processing; PFR China Daily corpus; bilexical co-occurrences; speech tagging; Entropy; Information processing; Laboratories; Natural language processing; Statistical analysis; Sun; Tagging; Testing
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1527595
  • Filename
    1527595
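The abstract does not give the model's equations. The following is therefore only a hypothetical sketch of the general idea, not the authors' method. It assumes a first-order HMM tagger whose emission probability is conditioned on the previous word, P(w_i | t_i, w_{i-1}). That estimate is interpolated with P(w_i | t_i), and the weights are estimated by deleted interpolation in the style of Brants's TnT tagger. Several elements are illustrative assumptions: the names `train`, `emission`, `transition`, `viterbi` and `BOS`, the add-alpha smoothing of tag transitions, and the probability floor for unseen words.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical sentence-start symbol, used for both words and tags


def train(corpus):
    """corpus: list of sentences, each a list of (word, tag) pairs."""
    trans, tag_prev, tag_count = Counter(), Counter(), Counter()
    tw, tpw, tpww = Counter(), Counter(), Counter()
    for sent in corpus:
        prev_w, prev_t = BOS, BOS
        for w, t in sent:
            trans[(prev_t, t)] += 1      # c(t_prev, t)
            tag_prev[prev_t] += 1        # c(t_prev) as a transition source
            tag_count[t] += 1            # c(t)
            tw[(t, w)] += 1              # c(t, w)
            tpw[(t, prev_w)] += 1        # c(t, w_prev)
            tpww[(t, prev_w, w)] += 1    # c(t, w_prev, w): bilexical event
            prev_w, prev_t = w, t
    # Deleted interpolation: remove each event once from the counts and credit
    # its frequency to the estimate that still predicts it better
    # (ties go to the lower-order estimate P(w | t)).
    lam_bi = lam_uni = 0.0
    for (t, pw, w), c in tpww.items():
        d2, d1 = tpw[(t, pw)] - 1, tag_count[t] - 1
        r2 = (c - 1) / d2 if d2 > 0 else 0.0
        r1 = (tw[(t, w)] - 1) / d1 if d1 > 0 else 0.0
        if r2 > r1:
            lam_bi += c
        else:
            lam_uni += c
    total = lam_bi + lam_uni
    lambdas = (lam_bi / total, lam_uni / total) if total else (0.5, 0.5)
    return dict(trans=trans, tag_prev=tag_prev, tag_count=tag_count, tw=tw,
                tpw=tpw, tpww=tpww, lambdas=lambdas, tags=sorted(tag_count))


def emission(m, t, pw, w, floor=1e-8):
    """Interpolated P(w | t, w_prev); the floor keeps unseen words scorable."""
    l2, l1 = m["lambdas"]
    p1 = m["tw"][(t, w)] / m["tag_count"][t]
    d2 = m["tpw"][(t, pw)]
    p2 = m["tpww"][(t, pw, w)] / d2 if d2 else 0.0
    return max(l2 * p2 + l1 * p1, floor)


def transition(m, pt, t, alpha=0.1):
    """Add-alpha smoothed P(t | t_prev)."""
    n = len(m["tags"])
    return (m["trans"][(pt, t)] + alpha) / (m["tag_prev"][pt] + alpha * n)


def viterbi(m, words):
    """Most probable tag sequence; the previous word is observed, so it only
    changes the emission term and ordinary Viterbi over tags still applies."""
    best = {BOS: (0.0, [])}  # tag -> (log-prob, best path ending in that tag)
    prev_w = BOS
    for w in words:
        new = {}
        for t in m["tags"]:
            e = math.log(emission(m, t, prev_w, w))
            new[t] = max((s + math.log(transition(m, pt, t)) + e, p + [t])
                         for pt, (s, p) in best.items())
        best, prev_w = new, w
    return max(best.values())[1]
```

As a usage example, `viterbi(train(corpus), ["他", "去", "上海"])` tags a sentence with a model trained on a small list of `(word, tag)` sentences. On very small corpora, deleted interpolation may give the bilexical estimate zero weight. That is expected, because few word pairs recur often enough to outperform P(w | t) once each event is held out.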