Title :
Text chunker for Malayalam using Memory-Based Learning
Author :
Rekha Raj C. T.; Reghu Raj P. C.
Author_Institution :
Department of Computer Science and Engineering, Government Engineering College, Sreekrishnapuram, Kerala, India 678633
Abstract :
Text chunking divides a text into syntactically correlated groups of words. Given the words and their morphosyntactic classes, a chunker decides which words can be grouped into chunks. Malayalam is a free-word-order language with relatively unrestricted phrase structure, which makes chunking quite challenging. This paper presents a text chunker for Malayalam built with the Memory-Based Learning (MBL) approach. MBL is a machine learning methodology based on the idea that directly reusing stored examples through analogical reasoning is better suited to language processing problems than applying rules extracted from those examples. The chunker was trained with the Memory-Based Tagger (MBT) tool, using words and their part-of-speech (POS) tags as features, and achieved an accuracy of 97.14%.
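The memory-based idea in the abstract (store every training example, then label a new word by analogy with the most similar stored examples) can be illustrated with a minimal sketch. This is not the authors' MBT configuration. It assumes IOB-style chunk tags (`B-NP`, `I-NP`, `B-VP`), a feature vector made of the previous, current, and next POS tags plus the current word, a weighted overlap distance with hand-picked weights standing in for learned feature weights, and placeholder tokens (`w1`, `w2`, ...) with hypothetical POS tags instead of real Malayalam data.

```python
from collections import Counter

PAD = "_"  # boundary symbol for positions outside the sentence

def features(words, tags, i):
    """Hypothetical feature vector for token i: left POS, own POS, right POS, own word."""
    left = tags[i - 1] if i > 0 else PAD
    right = tags[i + 1] if i + 1 < len(tags) else PAD
    return (left, tags[i], right, words[i])

def build_memory(corpus):
    """Store every training token as (feature vector, chunk tag); no rules are extracted."""
    memory = []
    for words, tags, chunks in corpus:
        for i in range(len(words)):
            memory.append((features(words, tags, i), chunks[i]))
    return memory

def overlap_distance(a, b, weights):
    """Weighted overlap metric: sum of the weights of mismatching features."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def classify(memory, feats, weights):
    """1-nearest-neighbour: majority tag among all stored examples at the minimum distance."""
    scored = [(overlap_distance(feats, f, weights), label) for f, label in memory]
    best = min(d for d, _ in scored)
    votes = Counter(label for d, label in scored if d == best)
    return votes.most_common(1)[0][0]

def chunk(memory, words, tags, weights=(1.0, 2.0, 1.0, 0.5)):
    """Assign a chunk tag to each token; the weights are illustrative assumptions."""
    return [classify(memory, features(words, tags, i), weights) for i in range(len(words))]

# Toy training data (placeholder words, hypothetical POS tags and IOB chunk tags).
TRAIN = [
    (["w1", "w2", "w3"], ["JJ", "NN", "VM"], ["B-NP", "I-NP", "B-VP"]),
    (["w4", "w5"], ["NN", "VM"], ["B-NP", "B-VP"]),
]

memory = build_memory(TRAIN)
print(chunk(memory, ["w6", "w7", "w8"], ["JJ", "NN", "VM"]))  # ['B-NP', 'I-NP', 'B-VP']
```

Because the test words are unseen, the decision rests on the POS context, which is why the POS features carry larger weights than the word feature in this sketch.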
Keywords :
"Measurement","Compounds","Tagging","Training","Speech","Context","Hidden Markov models"
Conference_Title :
Control Communication & Computing India (ICCC), 2015 International Conference on
DOI :
10.1109/ICCC.2015.7432966