• DocumentCode
    3317512
  • Title
    Hierarchical iterative and self-supervised method for concept-word acquisition from large-scale Chinese corpora
  • Author
    Tian, Guogang; Cao, Cungen
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    312
  • Lastpage
    317
  • Abstract
    This paper proposes a hierarchical iterative and self-supervised method (HISS) for acquiring concept words from a large-scale, unsegmented Chinese corpus. The method has two levels of iteration: the EM-CLS algorithm and the Viterbi-C/S algorithm form the inner iteration, which generates concept words, while concept word validation and concept word generation together form the outer iteration. Through multiple iterations, HISS integrates concept word generation and validation into a unified acquisition process. During acquisition, the HISS method can cope with the problems of over-segmentation, over-combination, and data sparseness. Experimental results show that the HISS method is effective for concept word acquisition and simultaneously increases both the precision and the recall of concept word acquisition.
  • Keywords
    expectation-maximisation algorithm; knowledge acquisition; natural languages; text analysis; EM-CLS algorithm; Viterbi-C/S algorithm; concept word acquisition; hierarchical iterative and self-supervised method; large-scale Chinese corpora; Context modeling; Data mining; Information processing; Iterative algorithms; Iterative methods; Knowledge acquisition; Laboratories; Large-scale systems; Mutual information; Paper technology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598754
  • Filename
    1598754
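The record does not describe the internals of EM-CLS or Viterbi-C/S, so the following is only a loose illustration of the two-level idea in the Abstract, not the authors' method. It assumes a simple unigram word model over unsegmented text: the inner loop alternates Viterbi segmentation with re-estimating word counts from the single best segmentation (often called hard or Viterbi EM), and the outer loop applies a validation step to multi-character candidates. The function names `viterbi_segment` and `acquire_words`, all parameters, and the frequency-threshold validation rule are hypothetical placeholders.

```python
import math
from collections import Counter


def viterbi_segment(text, logp, max_len):
    """Return the most probable split of `text` under a unigram model `logp`."""
    n = len(text)
    # best[i] = (log-prob of best segmentation of text[:i], start of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in logp and best[j][0] > -math.inf:
                score = best[j][0] + logp[word]
                if score > best[i][0]:
                    best[i] = (score, j)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Backtrack from the end of the string to recover the words.
    words, i = [], n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


def acquire_words(corpus, max_len=3, min_count=2, inner_iters=5, outer_iters=3):
    """Hypothetical two-level loop: hard-EM segmentation inside, validation outside."""
    chars = {ch for line in corpus for ch in line}
    # Initial candidates: every substring up to max_len characters long.
    counts = Counter(
        line[i:i + k]
        for line in corpus
        for i in range(len(line))
        for k in range(1, max_len + 1)
        if i + k <= len(line)
    )
    vocab = Counter({w: c for w, c in counts.items() if len(w) == 1 or c >= min_count})
    rejected, accepted = set(), set()
    for _ in range(outer_iters):
        # Inner iteration: re-estimate unigram probabilities from Viterbi segmentations.
        for _ in range(inner_iters):
            total = sum(vocab.values())
            logp = {w: math.log(c / total) for w, c in vocab.items()}
            new_counts = Counter()
            for line in corpus:
                new_counts.update(viterbi_segment(line, logp, max_len))
            # Add-one for single characters keeps every line segmentable.
            new_counts.update(chars)
            vocab = new_counts
        # Outer iteration: placeholder validation rule (frequency threshold only).
        accepted = {w for w, c in vocab.items() if len(w) > 1 and c >= min_count}
        rejected |= {w for w in vocab if len(w) > 1 and w not in accepted}
        vocab = Counter({w: c for w, c in vocab.items() if w not in rejected})
    return accepted
```

For example, `acquire_words(["abab", "abab", "cd"], max_len=2, min_count=2)` returns `{"ab"}`. Rejected candidates are removed before the next inner pass, which forces the affected text to be re-segmented; a realistic validator would presumably rely on stronger evidence than raw frequency.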