DocumentCode :
424251
Title :
Chinese unknown word identification as known word tagging
Author :
Fu, Guo-Hong ; Luke, Kang-Kwong
Author_Institution :
Dept. of Linguistics, Hong Kong Univ., China
Volume :
4
fYear :
2004
fDate :
26-29 Aug. 2004
Firstpage :
2612
Abstract :
This work presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). Chinese unknown word identification is cast as a tagging task over a sequence of known words by introducing word-formation patterns and part-of-speech tags. Based on the lexicalized HMMs, a statistical tagger is developed that assigns each known word a tag indicating both its pattern in forming a word and the part-of-speech of the formed word. Experimental results on the Peking University corpus indicate that the lexicalization technique and the introduction of part-of-speech information both help unknown word identification. An experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
Keywords :
character recognition; computational linguistics; hidden Markov models; natural languages; word processing; Chinese unknown word identification; known word tagging; lexicalization technique; lexicalized HMM; lexicalized hidden Markov model; statistical tagger; word-formation patterns; Context modeling; Dictionaries; Hidden Markov models; Machine learning; System testing; Tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
Type :
conf
DOI :
10.1109/ICMLC.2004.1382245
Filename :
1382245