DocumentCode :
442052
Title :
Chinese POS tagging based on bilexical co-occurrences
Author :
Cao, Hai-Long ; Zhao, Tie-jun ; Li, Sheng ; Sun, Jun ; Zhang, Chun-xiang
Author_Institution :
MOE-MS Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., China
Volume :
6
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
3766
Abstract :
Chinese part-of-speech (POS) tagging is a foundation of Chinese information processing. This paper proposes a method for tagging Chinese text based on bilexical co-occurrences. The standard hidden Markov model (HMM) assumes that the transition between states (parts of speech) is independent of the observation (word) sequence and that each new observation is generated independently of the other observations. Chinese text does not satisfy these assumptions. The proposed model therefore extends the HMM so that the words in the context also influence the part-of-speech decision, which improves the model's discriminative ability. Deleted interpolation is used to mitigate the data sparseness problem. We evaluate the proposed model on the PFR China Daily corpus. The tagging accuracy is 99.09% on the closed test and 96.37% on the open test.
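The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a word-emission probability conditioned on the previous word, P(w | t, w_prev), smoothed by interpolation with the standard HMM emission P(w | t), with the weight `lam` fitted by EM on held-out data. This is a simplified, single-split variant of deleted interpolation. All function names and the `<s>` sentence-start symbol are hypothetical.

```python
from collections import Counter

def train_counts(tagged):
    """Count tag, (tag, word), (prev_word, tag) and (prev_word, tag, word) events."""
    tag_c, emit_c, ctx_c, bi_c = Counter(), Counter(), Counter(), Counter()
    for sent in tagged:
        prev = "<s>"  # assumed sentence-start symbol
        for word, tag in sent:
            tag_c[tag] += 1
            emit_c[(tag, word)] += 1
            ctx_c[(prev, tag)] += 1
            bi_c[(prev, tag, word)] += 1
            prev = word
    return tag_c, emit_c, ctx_c, bi_c

def components(counts, prev, tag, word):
    """Return maximum-likelihood estimates (P(w | t, w_prev), P(w | t))."""
    tag_c, emit_c, ctx_c, bi_c = counts
    p_uni = emit_c[(tag, word)] / tag_c[tag] if tag_c[tag] else 0.0
    p_bi = bi_c[(prev, tag, word)] / ctx_c[(prev, tag)] if ctx_c[(prev, tag)] else 0.0
    return p_bi, p_uni

def estimate_lambda(counts, heldout, iters=20):
    """Fit the interpolation weight by EM on held-out tagged sentences."""
    lam = 0.5
    for _ in range(iters):
        resp, n = 0.0, 0
        for sent in heldout:
            prev = "<s>"
            for word, tag in sent:
                p_bi, p_uni = components(counts, prev, tag, word)
                mix = lam * p_bi + (1 - lam) * p_uni
                if mix > 0:  # skip events neither estimate can explain
                    resp += lam * p_bi / mix
                    n += 1
                prev = word
        if n == 0:
            break
        lam = resp / n
    return lam

def emission(counts, lam, prev, tag, word):
    """Interpolated emission: lam * P(w | t, w_prev) + (1 - lam) * P(w | t)."""
    p_bi, p_uni = components(counts, prev, tag, word)
    return lam * p_bi + (1 - lam) * p_uni
```

When a (previous word, tag) context was never seen in training, the bilexical term is zero and the estimate falls back to the scaled standard emission, which is the sparseness problem that the interpolation is meant to soften.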
Keywords :
hidden Markov models; natural languages; speech processing; text analysis; Chinese POS tagging; Chinese information processing; PFR China Daily corpus; bilexical co-occurrences; hidden Markov model; speech tagging; Entropy; Information processing; Laboratories; Natural language processing; Statistical analysis; Sun; Tagging; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527595
Filename :
1527595