• DocumentCode
    1909541
  • Title

    Improving Chinese Chunking with Enriched Statistical and Morphological Knowledge

  • Author

    Yao, Limin ; Li, Mu ; Huang, Changning

  • Author_Institution
    Tsinghua Univ., Beijing
  • fYear
    2007
  • fDate
    Aug. 30 2007-Sept. 1 2007
  • Firstpage
    149
  • Lastpage
    156
  • Abstract
    In this paper, we address the issue of improving a Chinese chunking system with rich lexicalized information. We propose a method that incorporates two kinds of knowledge into a state-of-the-art CRF-based chunking model: statistical information based on distributional similarity between words, obtained from a large unlabeled corpus, and morphological knowledge. The method is intended to tackle the data sparseness problem that arises from a limited amount of labeled training data. Evaluations are performed on the latest release of the Chinese Treebank, and experimental results show that our method outperforms chunking models based on features over words and automatically assigned POS tags when using the same amount of training data.
  • Keywords
    natural language processing; random processes; statistical analysis; Chinese Treebank; Chinese chunking; conditional random field model; data sparseness problem; morphological knowledge; statistical knowledge; Asia; Chromium; Data mining; Lead; Natural languages; Tagging; Training data; Tree data structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1611-0
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2007.4368026
  • Filename
    4368026