DocumentCode
121655
Title
Dhiya: A stemmer for morphological level analysis of Gujarati language
Author
Sheth, Jikitsha ; Patel, B.
Author_Institution
SRIMCA, Uka Tarsadia Univ., Gopal Vidyanagar, India
fYear
2014
fDate
7-8 Feb. 2014
Firstpage
151
Lastpage
154
Abstract
To understand a language, analysis has to be done at the word, sentence, context and discourse levels. Morphological analysis underlies all of these, as it is the first step in understanding a given sentence. One task performed at the morphological level is stemming, that is, identifying the stem of a given word. Stemming is important not only in the Natural Language Processing domain but equally in the Information Retrieval domain. In this paper, the authors propose DHIYA, a stemmer for the Gujarati language that is based on Gujarati morphology. To develop the stemmer, the inflections that appear most frequently in Gujarati text were identified, and the rule set was created from them. The EMILLE corpus is used for training and for evaluating the stemmer's performance. The stemmer achieves an accuracy of 92.41%.
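The abstract describes a stemmer driven by a rule set built from the most frequent Gujarati inflections and scored by accuracy against a corpus. The abstract does not list DHIYA's actual rules, so the following is only a minimal sketch of a generic longest-match suffix-stripping stemmer plus an accuracy calculation. The suffix list, `MIN_STEM_LEN`, and the gold-standard dictionary are hypothetical illustrations, not the paper's rule set or data.

```python
"""Minimal rule-based suffix-stripping stemmer sketch.

ASSUMPTION: the suffix list below is a small illustrative sample of common
Gujarati inflectional endings; it is NOT the DHIYA rule set from the paper.
"""

# Hypothetical rule set: inflectional suffixes to strip (illustrative only).
SUFFIXES = [
    "માંથી",  # compound ablative-locative, "from within"
    "માં",    # locative, "in"
    "થી",    # ablative/instrumental, "from, by"
    "નું",    # genitive (neuter)
    "નો",    # genitive (masculine)
    "ની",    # genitive (feminine)
    "ના",    # genitive (plural/oblique)
    "ને",    # dative/accusative
    "ઓ",     # plural marker
]

# Hypothetical guard against over-stripping, counted in Unicode code points
# (a crude proxy, since one written Gujarati syllable can span several code points).
MIN_STEM_LEN = 2

# Try longer suffixes first so that "માંથી" wins over "થી".
_ORDERED = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching suffix if the remaining stem is long enough."""
    for suffix in _ORDERED:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def accuracy(gold: dict[str, str]) -> float:
    """Percentage of words whose predicted stem equals the gold-standard stem."""
    correct = sum(1 for word, expected in gold.items() if stem(word) == expected)
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    print(stem("ઘરમાંથી"))  # → ઘર ("from inside the house" → "house")
    print(stem("રામનો"))    # → રામ (genitive "Ram's" → "Ram")
```

Ordering rules by suffix length is one common way to resolve overlapping rules; a real rule set derived from corpus frequencies, as the abstract describes, may use a different priority scheme or strip suffixes iteratively.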
Keywords
information retrieval; natural language processing; text analysis; word processing; DHIYA; EMILLE corpus; Gujarati language morphology; Gujarati text; context level; discourse level; information retrieval domain; morphological level analysis; sentence level; stemmer performance evaluation; stemming; training; word level; Computers; Gold; Hidden Markov models; Quantum cascade lasers; Gujarati; Indian languages; Morphemes; Stemmer;
fLanguage
English
Publisher
IEEE
Conference_Titel
Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014 International Conference on
Conference_Location
Ghaziabad
Type
conf
DOI
10.1109/ICICICT.2014.6781269
Filename
6781269