DocumentCode :
1909541
Title :
Improving Chinese Chunking with Enriched Statistical and Morphological Knowledge
Author :
Yao, Limin ; Li, Mu ; Huang, Changning
Author_Institution :
Tsinghua Univ., Beijing
fYear :
2007
fDate :
Aug. 30 2007-Sept. 1 2007
Firstpage :
149
Lastpage :
156
Abstract :
In this paper, we address the issue of improving a Chinese chunking system with rich lexicalized information. To tackle the data sparseness problem that arises when only a limited amount of labeled training data is available, we propose a method that incorporates two kinds of knowledge into a state-of-the-art CRF-based chunking model: statistical information based on the distributional similarity between words, obtained from a large unlabeled corpus, and morphological knowledge. Evaluations are performed on the latest release of the Chinese Treebank, and experimental results show that, when trained on the same amount of data, our method outperforms chunking models whose features are based on words and automatically assigned POS tags.
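The abstract does not give the actual feature templates. The Python sketch below is an illustration only. It shows one way distributional similarity from unlabeled text and simple morphological cues could be turned into per-token CRF features. It first builds context-count vectors from a segmented unlabeled corpus. It then finds each word's most similar word by cosine similarity and adds that word, plus the word's first and last characters, as extra features. Several parts are assumptions rather than the authors' method: all function names (`context_vectors`, `nearest_neighbors`, `token_features`, `sentence_features`), the context window, the single-nearest-neighbour feature, and the choice of first/last characters as morphological cues.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(corpus, window=1):
    """Map each word to counts of the words seen within `window` positions."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbors(vecs):
    """For every word, the single most distributionally similar other word."""
    table = {}
    for w, vec in vecs.items():
        best, best_sim = None, 0.0
        for other, other_vec in vecs.items():
            if other != w:
                sim = cosine(vec, other_vec)
                if sim > best_sim:
                    best, best_sim = other, sim
        table[w] = best
    return table


def token_features(words, tags, i, neighbors):
    """Feature dict for token i: word/POS context, similar word, affix chars."""
    w = words[i]
    feats = {
        "w0": w,
        "p0": tags[i],
        "w-1": words[i - 1] if i > 0 else "<BOS>",
        "w+1": words[i + 1] if i + 1 < len(words) else "<EOS>",
        # Morphological cues (assumed): the word's first and last characters
        "first_char": w[0],
        "last_char": w[-1],
    }
    # Distributional cue (assumed): nearest neighbour learned from unlabeled text
    sim = neighbors.get(w)
    if sim is not None:
        feats["sim0"] = sim
    return feats


def sentence_features(words, tags, neighbors):
    """One feature dict per token, the input shape many CRF toolkits accept."""
    return [token_features(words, tags, i, neighbors) for i in range(len(words))]


# Toy usage: a tiny segmented "unlabeled corpus" and one POS-tagged sentence.
unlabeled = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"],
             ["他", "吃", "苹果"], ["他", "吃", "香蕉"]]
neighbors = nearest_neighbors(context_vectors(unlabeled))
feats = sentence_features(["他", "吃", "苹果"], ["PN", "VV", "NN"], neighbors)
print(feats[2]["sim0"])  # 香蕉
```

The idea behind the design is that a word that is rare in the labeled data can still share a feature value (its similar word) with better-attested words. This is one plausible way to ease data sparseness. A real system would use a much larger corpus and would likely use a richer similarity representation than a single neighbour.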
Keywords :
natural language processing; random processes; statistical analysis; Chinese Treebank; Chinese chunking; conditional random field model; data sparseness problem; morphological knowledge; statistical knowledge; Asia; Chromium; Data mining; Lead; Natural languages; Tagging; Training data; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
Type :
conf
DOI :
10.1109/NLPKE.2007.4368026
Filename :
4368026