DocumentCode :
2347842
Title :
A morphology-based Chinese word segmentation method
Author :
Lin, Xiaojun ; Zhao, Liang ; Zhang, Meng ; Wu, Xihong
Author_Institution :
Key Lab. of Machine Perception & Intell., Speech & Hearing Res. Center, Peking Univ., Beijing, China
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
5
Abstract :
This paper proposes a novel Chinese word segmentation method that uses morphological information. The method introduces morphology into a statistical model to capture structural relationships within words, improving the ability of conventional Conditional Random Fields (CRFs) models to represent structural information. First, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method, and the resulting structure-related tags are integrated into the CRFs model. Second, a joint CRFs model is trained to generate both morphology tags and word boundaries. Experiments on several SIGHAN Bakeoff corpora show that morphological information significantly improves Chinese word segmentation performance, especially for out-of-vocabulary words.
Keywords :
computational linguistics; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; SIGHAN bakeoff corpus; conditional random fields; morphology-based Chinese word segmentation method; statistical model; Morphology; Testing; Training; Chinese word segmentation; Morphology; conditional random fields; words out of vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587786
Filename :
5587786