• DocumentCode
    424251
  • Title
    Chinese unknown word identification as known word tagging
  • Author
    Fu, Guo-Hong; Luke, Kang-Kwong
  • Author_Institution
    Dept. of Linguistics, Hong Kong Univ., China
  • Volume
    4
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    2612
  • Abstract
    This work presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). Chinese unknown word identification is recast as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech information. Based on the lexicalized HMMs, a statistical tagger is then developed to assign each known word a tag that indicates both its pattern in forming a word and the part of speech of the word formed. Experimental results on the Peking University corpus indicate that both lexicalization and the introduction of part-of-speech information help unknown word identification. Experiments on the SIGHAN-PK open test data also show that our system can achieve state-of-the-art performance.
  • Keywords
    character recognition; computational linguistics; hidden Markov models; natural languages; word processing; Chinese unknown word identification; known word tagging; lexicalization technique; lexicalized HMM; lexicalized hidden Markov model; statistical tagger; word-formation patterns; Context modeling; Dictionaries; Hidden Markov models; Machine learning; System testing; Tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1382245
  • Filename
    1382245