• DocumentCode
    476193
  • Title
    A morpheme-based lexical chunking system for Chinese
  • Author
    Fu, Guo-hong ; Kit, Chun-yu ; Webster, Jonathan J.
  • Author_Institution
    School of Computer Science and Technology, Heilongjiang University, Harbin
  • Volume
    5
  • fYear
    2008
  • fDate
    12-15 July 2008
  • Firstpage
    2455
  • Lastpage
    2460
  • Abstract
    Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper, we formalize the two processes as a single chunking task on a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
  • Keywords
    hidden Markov models; natural language processing; Chinese lexical analysis; morpheme-based lexical chunking system; part-of-speech tagging; statistical analysis; word segmentation; word-internal morphological feature; Computer science; Cybernetics; Information analysis; Information retrieval; Machine learning; Morphology; Natural languages; Tagging; Lexical chunking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics, 2008 International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2095-7
  • Electronic_ISBN
    978-1-4244-2096-4
  • Type
    conf
  • DOI
    10.1109/ICMLC.2008.4620820
  • Filename
    4620820