Title :
Development of a POS Tagger for Malayalam - An Experience
Author :
Manju, K. ; Soumya, S. ; Idicula, Sumam Mary
Author_Institution :
Dept. of Comput. Sci., Cochin Univ. of Sci. & Technol., Cochin, India
Abstract :
A part-of-speech tagger for Malayalam that uses a stochastic approach is proposed. The tagger makes use of word frequencies and bigram statistics from a corpus. Because no annotated corpus is available for Malayalam, a morphological analyzer is used to generate the tagged corpus. Although the experiments were performed on a very small corpus, the results show that the statistical approach works well for a highly agglutinative language like Malayalam.
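The abstract does not spell out the algorithm, but the keywords list HMM and Viterbi, so the idea can be illustrated with a minimal bigram HMM tagger decoded by Viterbi. This is a sketch under stated assumptions, not the authors' implementation: the function names (`train`, `log_prob`, `viterbi`), the add-one smoothing, and the toy English tokens standing in for morphologically analyzed Malayalam words are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag

def train(tagged_sentences):
    """Count word-given-tag frequencies and tag-bigram frequencies."""
    emit = defaultdict(Counter)   # emit[tag][word]
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, emit, trans):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for unknown words.
    n_words = len({w for counts in emit.values() for w in counts}) + 1

    # best[i][tag] = (best log score of a path ending in tag, previous tag)
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)

    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy corpus: placeholder tokens, not data from the paper.
corpus = [
    [("dog", "N"), ("runs", "V")],
    [("cat", "N"), ("sleeps", "V")],
    [("dog", "N"), ("sleeps", "V")],
]
emit, trans = train(corpus)
result = viterbi(["cat", "runs"], emit, trans)
print(result)
```

On this toy corpus the decoder tags the unseen pair "cat runs" as noun followed by verb, because the bigram counts favor a verb after a noun even though that exact word sequence never occurs in training.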
Keywords :
mathematical morphology; natural language processing; speech processing; statistical analysis; stochastic processes; Malayalam; agglutinative language; bigram statistics; morphological analyzer; parts of speech tagger; stochastic approach; word frequencies; Communications technology; Computer science; Frequency; Hidden Markov models; Natural languages; Speech processing; Statistics; Stochastic processes; Tagging; Viterbi algorithm; Dravidian Language; HMM; Morphemes; Tagset; Viterbi;
Conference_Title :
Advances in Recent Technologies in Communication and Computing, 2009. ARTCom '09. International Conference on
Conference_Location :
Kottayam, Kerala
Print_ISBN :
978-1-4244-5104-3
Electronic_ISBN :
978-0-7695-3845-7
DOI :
10.1109/ARTCom.2009.98