DocumentCode
442052
Title
Chinese POS tagging based on bilexical co-occurrences
Author
Cao, Hai-Long ; Zhao, Tie-jun ; Li, Sheng ; Sun, Jun ; Zhang, Chun-xiang
Author_Institution
MOE-MS Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., China
Volume
6
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
3766
Abstract
Chinese part-of-speech (POS) tagging is the basis of Chinese information processing. This paper proposes a method based on bilexical co-occurrences for tagging Chinese text. The standard hidden Markov model assumes that the transition between states (parts of speech) is independent of the observation (word) sequence and that the generation of each observation is independent of the other observations. Chinese text does not satisfy these assumptions. The proposed model extends the hidden Markov model by also considering the effect of context words on the part-of-speech decision, which improves the model's discriminative ability. Deleted interpolation is used to mitigate the data sparseness problem. The proposed model is evaluated on the PFR China Daily corpus. The tagging accuracy is 99.09% on the closed test and 96.37% on the open test.
Keywords
hidden Markov models; natural languages; speech processing; text analysis; Chinese POS tagging; Chinese information processing; PFR China Daily corpus; bilexical co-occurrences; hidden Markov model; speech tagging; Entropy; Information processing; Laboratories; Natural language processing; Statistical analysis; Sun; Tagging; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527595
Filename
1527595