Title :
Parsing-based automatic Chinese term extraction
Author :
Zhang, Meng ; Lin, Xiaojun ; Dai, Xu ; Wu, Xihong
Author_Institution :
Key Lab. of Machine Perception, Peking Univ., Beijing, China
Abstract :
Term extraction aims to automatically extract domain-specific terms from a given corpus. Previous work on term extraction focuses only on termhood measurement rather than on the identification of nested candidates. Unlike previous methods, which identify nested candidates using surface lexical information, such as word-form characteristics, or grammatical analysis expressed as part-of-speech (POS) sequence patterns, this paper proposes a parsing-based approach that extracts noun phrases as nested candidates and can therefore fully exploit syntactic structure information. Experiments show that the proposed approach performs as well as the conventional POS sequence pattern approach in candidate recall, while producing fewer impossible candidates. Combined with C-value as the termhood measure, the proposed approach obtains consistent improvements in the ranked list of terms.
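For readers unfamiliar with C-value, the following is a minimal sketch of the commonly cited formulation of the measure, not the authors' implementation. It assumes candidates are tuples of words with corpus frequencies already counted. The function names `is_nested` and `c_value` and the toy frequencies are hypothetical, and the abstract does not state which C-value variant the paper uses.

```python
import math


def is_nested(short, long):
    """Return True if word tuple `short` occurs contiguously inside the longer tuple `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq):
    """Score each candidate with the commonly cited C-value formula.

    freq maps a candidate (tuple of words) to its corpus frequency.
    For a candidate a:
      not nested: log2(|a|) * f(a)
      nested:     log2(|a|) * (f(a) - mean frequency of the longer candidates containing a)
    Note that log2(|a|) is 0 for single-word candidates in this variant.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain a as a contiguous sub-sequence.
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            adjusted = f_a - sum(freq[b] for b in containers) / len(containers)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy, made-up frequencies for illustration only.
freq = {
    ("term", "extraction"): 5,
    ("automatic", "term", "extraction"): 2,
    ("noun", "phrase"): 4,
}
scores = c_value(freq)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, "term extraction" is nested in "automatic term extraction", so its frequency is discounted by the frequency of the longer candidate before being weighted by length.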
Keywords :
grammars; natural languages; C-value; grammatical analysis; nested candidates; noun phrase extraction; parsing-based automatic Chinese term extraction; part-of-speech sequence patterns; surface lexical information; syntactic structure information; termhood measurement; word form characteristics; Artificial neural networks; Data mining; IP networks
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
DOI :
10.1109/NLPKE.2011.6138179