• DocumentCode
    1936540
  • Title
    Discovering Chinese Compound Term Using Termhood and Unithood Measures
  • Author
    Kang, Jingjing ; Liu, Tao ; Hu, He ; Du, Xiaoyong
  • Author_Institution
    Key Labs. of Data Eng. & Knowledge Eng., Minist. of Educ., China
  • fYear
    2011
  • fDate
    22-23 Aug. 2011
  • Firstpage
    60
  • Lastpage
    67
  • Abstract
    Domain terms play a crucial role in many research areas, which has led to rising demand for automatic domain term extraction. In this paper, we present a two-level evaluation approach based on termhood and unithood to extract Chinese domain compound terms automatically, taking both character-level and word-level information into account. To achieve this, we incorporate semantic features by using word segmentation to recognize single-word terms, then leverage an improved C-value and heuristic methods such as word formation pattern and word formation power to evaluate candidates at both levels. Validating our approach against several existing dictionaries shows a significant improvement in compound term detection. Experiments on a legal corpus show that our method outperforms the compared methods.
  • Keywords
    information retrieval; natural language processing; text analysis; Chinese compound term; automatic domain terms extraction; dictionaries; termhood measures; unithood measures; word segmentation; Arrays; Compounds; Dictionaries; Feature extraction; Filtering; Pragmatics; Semantics; CCT C-value; Chinese Word Segmentation; Compound Term; Domain Term Extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinagrid Conference (ChinaGrid), 2011 Sixth Annual
  • Conference_Location
    Liaoning
  • Print_ISBN
    978-1-4577-0885-5
  • Type
    conf
  • DOI
    10.1109/ChinaGrid.2011.41
  • Filename
    6051734