• DocumentCode
    3334251
  • Title

    Hybrid Approach for Khmer Unknown Word POS Guessing

  • Author

    Nou, Chenda ; Kameyama, Wataru

  • Author_Institution
    Waseda Univ., Honjo
  • fYear
    2007
  • fDate
    13-15 Aug. 2007
  • Firstpage
    215
  • Lastpage
    220
  • Abstract
    Because new words are created every day and no lexicon is large enough to cover them all, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The hybrid approach, which combines a rule-based model with a trigram model, uses both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on the training and test data, respectively.
  • Keywords
    knowledge based systems; natural language processing; lexicon; part-of-speech tagging; rule-based model; trigram model; unknown word POS guessing; Context modeling; Data mining; Decision trees; Entropy; Machine learning; Natural languages; Neural networks; Predictive models; Tagging; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    1-4244-1500-4
  • Electronic_ISBN
    1-4244-1500-4
  • Type

    conf

  • DOI
    10.1109/IRI.2007.4296623
  • Filename
    4296623