• DocumentCode
    441572
  • Title
    Chinese text chunking using lexicalized HMMs
  • Author
    Fu, Guo-Hong; Xu, Rui-Feng; Luke, Kang-Kwong; Lu, Qin
  • Author_Institution
    Dept. of Linguistics, Hong Kong Univ., China
  • Volume
    1
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    7
  • Abstract
    This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger to assign each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system.
  • Keywords
    hidden Markov models; linguistics; text analysis; Chinese text chunking; POS; PolyU Shallow Treebank; base phrase recognition; base phrase structure; chunk boundary; chunk type; chunk-internal cues; lexicalization technique; lexicalized hidden Markov models (HMMs); part-of-speech information; tagging task; word boundary; Computational efficiency; Dictionaries; Entropy; Machine learning; Natural languages; Support vector machines; Tagging; Text mining; Text recognition; Text chunking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1526911
  • Filename
    1526911