DocumentCode :
3141605
Title :
Parsing-based automatic Chinese term extraction
Author :
Zhang, Meng ; Lin, Xiaojun ; Dai, Xu ; Wu, Xihong
Author_Institution :
Key Lab. of Machine Perception, Peking Univ., Beijing, China
fYear :
2011
fDate :
27-29 Nov. 2011
Firstpage :
122
Lastpage :
125
Abstract :
Term extraction is the task of automatically extracting domain-specific terms from a given corpus. Previous work on term extraction focuses only on termhood measurement rather than on the identification of nested candidates. Unlike previous methods, which identify nested candidates using either surface lexical information, such as word-form characteristics, or grammatical analysis expressed as part-of-speech (POS) sequence patterns, this paper proposes a parsing-based approach that extracts noun phrases as nested candidates and can therefore fully exploit syntactic structure information. Experiments show that the proposed approach matches the conventional POS sequence pattern approach in candidate recall while producing fewer impossible candidates. Combined with C-value as the termhood measure, the proposed approach yields consistent improvements in the ranked list of terms.
Keywords :
grammars; natural languages; C-value; grammatical analysis; nested candidates; noun phrase extraction; parsing-based automatic Chinese term extraction; part-of-speech sequence patterns; surface lexical information; syntactic structure information; termhood measurement; word form characteristics; Artificial neural networks; Data mining; IP networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
Type :
conf
DOI :
10.1109/NLPKE.2011.6138179
Filename :
6138179