Title :
Discovering Chinese Compound Term Using Termhood and Unithood Measures
Author :
Kang, Jingjing ; Liu, Tao ; Hu, He ; Du, Xiaoyong
Author_Institution :
Key Labs. of Data Eng. & Knowledge Eng., Minist. of Educ., China
Abstract :
Domain terms play a crucial role in many research areas, which has led to rising demand for automatic domain term extraction. In this paper, we present a two-level evaluation approach based on termhood and unithood to extract Chinese domain compound terms automatically, taking both character-level and word-level information into account. To achieve this, we incorporate semantic features by using word segmentation to recognize single-word terms, then leverage an improved C-value and heuristic methods such as word formation pattern and word formation power to evaluate candidates at both levels. Validating our approach against several existing dictionaries shows a significant improvement in compound term detection. Experiments on a legal corpus show that our method outperforms the compared methods.
Keywords :
information retrieval; natural language processing; text analysis; Chinese compound term; automatic domain terms extraction; dictionaries; termhood measures; unithood measures; word segmentation; Arrays; Compounds; Dictionaries; Feature extraction; Filtering; Pragmatics; Semantics; CCT C-value; Chinese Word Segmentation; Compound Term; Domain Term Extraction;
Conference_Title :
Chinagrid Conference (ChinaGrid), 2011 Sixth Annual
Conference_Location :
Liaoning
Print_ISBN :
978-1-4577-0885-5
DOI :
10.1109/ChinaGrid.2011.41