DocumentCode :
3334251
Title :
Hybrid Approach for Khmer Unknown Word POS Guessing
Author :
Nou, Chenda ; Kameyama, Wataru
Author_Institution :
Waseda Univ., Honjo
fYear :
2007
fDate :
13-15 Aug. 2007
Firstpage :
215
Lastpage :
220
Abstract :
New words are created every day, and the lexicon is not large enough to cover all of them, so unknown words become a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The approach combines a rule-based model with a trigram model, using both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% on training data and 78.2% on test data.
Keywords :
knowledge based systems; natural language processing; lexicon; part-of-speech tagging; rule-based model; trigram model; unknown word POS guessing; Context modeling; Data mining; Decision trees; Entropy; Machine learning; Natural languages; Neural networks; Predictive models; Tagging; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
1-4244-1500-4
Electronic_ISBN :
1-4244-1500-4
Type :
conf
DOI :
10.1109/IRI.2007.4296623
Filename :
4296623