DocumentCode :
166388
Title :
Hybrid part of speech tagger for Malayalam
Author :
Francis, Matt ; Nair, K. N. Ramachandran
fYear :
2014
fDate :
24-27 Sept. 2014
Firstpage :
1744
Lastpage :
1750
Abstract :
The process of assigning a part of speech to every word in a given sentence according to its context is called part-of-speech tagging. Part-of-speech (POS) tagging plays an important role in natural language processing (NLP), with applications such as speech recognition, speech synthesis, natural language parsing, information retrieval, multi-word term extraction, word sense disambiguation and machine translation. This paper proposes an efficient and accurate POS tagging technique for the Malayalam language using a hybrid approach. We propose a Conditional Random Fields (CRF) based method integrated with a rule-based method. We use an SVM-based method for accuracy comparison. Both the tagged and untagged corpora used for training and testing the system are in Unicode format. The tagset developed by IIIT Hyderabad for Indian languages is used. The system is tested on selected books of the Bible and performs with an accuracy of 94%.
Keywords :
grammars; knowledge based systems; natural language processing; statistical distributions; support vector machines; CRF; Malayalam language; NLP; POS tagging; SVM; conditional random fields; hybrid part of speech tagging; information retrieval; natural language parsing; natural language processing; rule-based method; word sense disambiguation; Accuracy; Compounds; Hidden Markov models; Probabilistic logic; Speech; Speech processing; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
Type :
conf
DOI :
10.1109/ICACCI.2014.6968565
Filename :
6968565