DocumentCode
166388
Title
Hybrid part of speech tagger for Malayalam
Author
Francis, Matt ; Nair, K. N. Ramachandran
fYear
2014
fDate
24-27 Sept. 2014
Firstpage
1744
Lastpage
1750
Abstract
The process of assigning a part of speech to every word in a given sentence according to its context is called part of speech tagging. Part of speech tagging (POS tagging) plays an important role in natural language processing (NLP), including applications such as speech recognition, speech synthesis, natural language parsing, information retrieval, multi-word term extraction, word sense disambiguation and machine translation. This paper proposes an efficient and accurate POS tagging technique for the Malayalam language using a hybrid approach. We propose a Conditional Random Fields (CRF) based method integrated with a rule-based method. We use an SVM-based method to compare the accuracy. The corpus, both tagged and untagged, used for training and testing the system is in Unicode format. The tagset developed by IIIT Hyderabad for Indian languages is used. The system is tested on selected books of the Bible and performs with an accuracy of 94%.
Keywords
grammars; knowledge based systems; natural language processing; statistical distributions; support vector machines; CRF; Malayalam language; NLP; POS tagging; SVM; conditional random fields; hybrid part of speech tagging; information retrieval; natural language parsing; rule-based method; word sense disambiguation; Accuracy; Compounds; Hidden Markov models; Probabilistic logic; Speech; Speech processing; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location
New Delhi
Print_ISBN
978-1-4799-3078-4
Type
conf
DOI
10.1109/ICACCI.2014.6968565
Filename
6968565