DocumentCode :
3778774
Title :
Design of rule based lemmatizer for Kannada inflectional words
Author :
R.J. Prathibha;M.C. Padma
Author_Institution :
Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India
fYear :
2015
Firstpage :
264
Lastpage :
269
Abstract :
The lemmatizer and the stemmer are two basic modules in most natural language processing applications. Stemming is the process of stripping the affixes present in an inflectional word to obtain its stem. The stem extracted by a stemmer need not be a valid root or a linguistically meaningful word. A lemmatizer removes the affixes present in an inflectional word by applying linguistic rules and returns the base form, or dictionary form, of the word, which is known as the lemma. The extracted lemma is a valid root and a linguistically meaningful word; hence, a lemmatizer requires more linguistic knowledge than a stemmer. In linguistics, the objective of a lemmatizer is to group together the different inflected forms of a word so that they can be analyzed as a single term. In this context, a lemmatizer for Kannada inflectional words is needed. In this paper, we propose the design of a rule-based lemmatizer that applies a set of linguistic rules to extract a proper and meaningful root from a Kannada inflectional word. The proposed module is tested on different types of data sets created specifically for this work, and the accuracy obtained on these data sets is above 85%.
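The stemmer/lemmatizer contrast described above can be illustrated with a small sketch. This is not the authors' rule set: the suffix lists, the two-word root lexicon, and the function names (`naive_stem`, `lemmatize`, `group_by_lemma`) are hypothetical, chosen only for illustration. Words are written in a simple romanization, with "L" for the retroflex lateral (e.g. "maragaLu" for "trees"). The naive stemmer strips the longest matching suffix with no dictionary check, so it can return a non-word. The lemmatizer, by contrast, accepts a candidate only if it is found in a root lexicon.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical suffix list for a naive stemmer (romanized Kannada).
# The stemmer strips the longest matching suffix and never checks a dictionary.
STEM_SUFFIXES = ["gaLu", "alli", "kke", "ge"]

# Hypothetical lemmatizer rules as (suffix, replacement) pairs. The locative
# rules include the linking consonant ("d", "y") so that stripping restores the root.
LEMMA_RULES = [("dalli", ""), ("yalli", ""), ("gaLu", ""), ("kke", ""), ("ge", "")]

# Hypothetical root lexicon; a real system would need a full Kannada dictionary.
ROOT_LEXICON = {"mara", "mane"}  # "tree", "house"


def naive_stem(word: str) -> str:
    """Strip the longest matching suffix; the result may not be a valid root."""
    for suffix in sorted(STEM_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> Optional[str]:
    """Return a lemma only if some rule yields a root present in the lexicon."""
    if word in ROOT_LEXICON:
        return word
    # Try longer suffixes first so that "dalli" wins over shorter matches.
    for suffix, replacement in sorted(LEMMA_RULES, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in ROOT_LEXICON:  # linguistic validity check
                return candidate
    return None  # no rule produced a known root


def group_by_lemma(words):
    """Group inflected forms under their common lemma; unknown words are skipped."""
    groups = defaultdict(list)
    for w in words:
        lemma = lemmatize(w)
        if lemma is not None:
            groups[lemma].append(w)
    return dict(groups)


if __name__ == "__main__":
    print(naive_stem("maradalli"))  # marad  (not a valid root)
    print(lemmatize("maradalli"))   # mara
    print(group_by_lemma(["maragaLu", "maradalli", "marakke", "maneyalli", "manege"]))
    # {'mara': ['maragaLu', 'maradalli', 'marakke'], 'mane': ['maneyalli', 'manege']}
```

The lexicon lookup is the design choice that separates the two modules in this sketch. Without it, the rule that strips "alli" leaves the invalid stem "marad". With it, only candidates that are real roots are accepted, at the cost of maintaining the dictionary and the linguistic rules.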
Keywords :
"Pragmatics","Natural language processing","Computer science","Information retrieval","Algorithm design and analysis","Computers"
Publisher :
IEEE
Conference_Title :
Emerging Research in Electronics, Computer Science and Technology (ICERECT), 2015 International Conference on
Type :
conf
DOI :
10.1109/ERECT.2015.7499024
Filename :
7499024