Title :
Integrate statistical model and lexical knowledge for Chinese multiword chunking
Author :
Zhou, Qiang ; Yu, Hang
Author_Institution :
Centre for Speech & Language Technol., Tsinghua Univ., Beijing
Abstract :
Multiword chunking (MWC) is a shallow parsing technique that recognizes the external constituent tag and the internal relation tag of each chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and run several feature engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.
Keywords :
knowledge based systems; natural language processing; Chinese multiword chunking; intrachunk relations; large-scale lexical relationship knowledge; relation tagging scheme; rule-based parser; shallow parsing technique; Design engineering; Information science; Labeling; Laboratories; Large-scale systems; Natural languages; Speech; Tagging; Technological innovation; Testing; Multiword chunking; Outside lexical knowledge base; Partial parsing; Relation tagging scheme;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4515-8
Electronic_ISBN :
978-1-4244-2780-2
DOI :
10.1109/NLPKE.2008.4906765