Title :
Improving Chinese Chunking with Enriched Statistical and Morphological Knowledge
Author :
Yao, Limin ; Li, Mu ; Huang, Changning
Author_Institution :
Tsinghua Univ., Beijing
Date :
Aug. 30-Sept. 1, 2007
Abstract :
In this paper, we address the problem of improving a Chinese chunking system with rich lexicalized information. We propose a method that incorporates two kinds of knowledge into a state-of-the-art CRF-based chunking model: statistical information based on distributional similarity between words, obtained from a large unlabeled corpus, and morphological knowledge. The goal is to alleviate the data sparseness problem that arises when only a limited amount of labeled training data is available. Evaluations are performed on the latest release of the Chinese Treebank, and experimental results show that our method outperforms chunking models based on word features and automatically assigned POS tags when trained on the same amount of data.
Keywords :
natural language processing; random processes; statistical analysis; Chinese Treebank; Chinese chunking; conditional random field model; data sparseness problem; morphological knowledge; statistical knowledge; Asia; Chromium; Data mining; Lead; Natural languages; Tagging; Training data; Tree data structures;
Conference_Title :
International Conference on Natural Language Processing and Knowledge Engineering, 2007 (NLP-KE 2007)
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368026