DocumentCode :
2735814
Title :
Automatic Identification of Chinese Multiword Chunk Based on CRF
Author :
Li, Ru ; Zhong, Lijun ; Li, Shuanghong ; Zhang, Zezheng
Author_Institution :
Sch. of Comput. & Inf. Technol., Shanxi Univ., Taiyuan, China
Volume :
3
fYear :
2010
fDate :
Aug. 31 2010-Sept. 3 2010
Firstpage :
174
Lastpage :
177
Abstract :
Automatic identification of Chinese multiword chunks is a newly emerging technique in the NLP field. As a new strategy, it can effectively improve the performance of syntactic parsing. This work follows the standard description system for Chinese multiword chunks and constructs two tag sequence models based on the CRF model, named "the syntactic mark tagging list model" and "the sequence mark tagging list model". The training corpus is "the Chinese multiword chunk bank", provided by Tsinghua University. In the experiments, selecting appropriate features and introducing some important rules yields better results, and the resulting system for identifying Chinese multiword chunks runs well in a restricted domain. It thus provides a bridge between syntax and semantic content.
Keywords :
grammars; natural language processing; random processes; word processing; CRF model; Chinese multiword chunk; NLP field; Tsinghua University; conditional random field; sequence mark tagging list model; syntactic mark tagging list model; syntactic parsing; tag sequence model; Periodic structures; Semantics; Syntactics; Tagging; Topology; Training; Vocabulary; CRF; Multiword; chunk parsing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-8482-9
Electronic_ISBN :
978-0-7695-4191-4
Type :
conf
DOI :
10.1109/WI-IAT.2010.158
Filename :
5614356