Title :
Hybrid Approach for Khmer Unknown Word POS Guessing
Author :
Nou, Chenda ; Kameyama, Wataru
Author_Institution :
Waseda Univ., Honjo
Abstract :
Because new words are created every day and no lexicon is large enough to cover them all, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The approach combines a rule-based model with a trigram model, using both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on training and test data, respectively.
Keywords :
knowledge based systems; natural language processing; lexicon; part-of-speech tagging; rule-based model; trigram model; unknown word POS guessing; Context modeling; Data mining; Decision trees; Entropy; Machine learning; Natural languages; Neural networks; Predictive models; Tagging; Testing
Conference_Title :
Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
1-4244-1500-4
Electronic_ISBN :
1-4244-1500-4
DOI :
10.1109/IRI.2007.4296623