Regional Information Center for Science and Technology - Automatic Identification of Chinese Multiword Chunk Based on CRF

DocumentCode :

2735814

Title :

Automatic Identification of Chinese Multiword Chunk Based on CRF

Author :

Li, Ru ; Zhong, Lijun ; Li, Shuanghong ; Zhang, Zezheng

Author_Institution :

Sch. of Comput. & Inf. Technol., Shanxi Univ., Taiyuan, China

Volume :

fYear :

2010

fDate :

Aug. 31 2010-Sept. 3 2010

Firstpage :

174

Lastpage :

177

Abstract :

Automatic identification of Chinese multiword chunks is a newly emerging technique in the NLP field. As a new strategy, it can effectively improve the performance of syntactic parsing. This work follows the standard description system for Chinese multiword chunks and constructs two CRF-based tag sequence models, named "the syntactic mark tagging list model" and "the sequence mark tagging list model". The training corpus is "the Chinese multiword chunk bank", provided by Tsinghua University. In the experiments, selecting appropriate features and introducing some important rules yields better results, and the resulting system for identifying Chinese multiword chunks runs well in a restricted domain. It thus provides a bridge between syntax and semantic content.
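
The abstract casts chunk identification as a tag sequence problem but does not define the tag sets of its two models. As a rough illustration only, the sketch below uses the common BIO encoding (an assumption, not necessarily the paper's scheme) to turn chunk spans into per-token tags and back. The function names, the example sentence, and the `NP` label are hypothetical; a CRF would be trained to predict such tags from token features, and that step is not shown here.

```python
from typing import List, Tuple

# A chunk is (start index, end index exclusive, label). Labels are hypothetical.
Chunk = Tuple[int, int, str]


def chunks_to_bio(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunk spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode a BIO tag sequence back into chunk spans."""
    chunks: List[Chunk] = []
    start, label = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            chunks.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # A stray I- tag with no open chunk is treated as a chunk start.
            start, label = i, tag[2:]
    return chunks


# Hypothetical segmented sentence: "we / study / natural / language / processing"
tokens = ["我们", "学习", "自然", "语言", "处理"]
gold = [(2, 5, "NP")]  # tokens 2..4 form one multiword chunk
tags = chunks_to_bio(len(tokens), gold)
```

With this encoding, a multiword chunk is simply a `B-` tag followed by one or more `I-` tags, so `bio_to_chunks(tags)` recovers the original spans from whatever tag sequence a tagger predicts.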

Keywords :

grammars; natural language processing; random processes; word processing; CRF model; Chinese multiword chunk; NLP field; Tsinghua University; conditional random field; sequence mark tagging list model; syntactic mark tagging list model; syntactic parsing; tag sequence model; Periodic structures; Semantics; Syntactics; Tagging; Topology; Training; Vocabulary; CRF; Multiword; chunk parsing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on

Conference_Location :

Toronto, ON

Print_ISBN :

978-1-4244-8482-9

Electronic_ISBN :

978-0-7695-4191-4

Type :

conf

DOI :

10.1109/WI-IAT.2010.158

Filename :

5614356

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2735814