Title :
Dhiya: A stemmer for morphological level analysis of Gujarati language
Author :
Sheth, Jikitsha ; Patel, B.
Author_Institution :
SRIMCA, Uka Tarsadia Univ., Gopal Vidyanagar, India
Abstract :
To understand a language, analysis must be carried out at the word, sentence, context, and discourse levels. Morphological analysis lies at the base of these, as it is the first step in understanding a given sentence. One task performed at the morphological level is stemming, that is, identifying the stem of a given word. Stemming matters not only in Natural Language Processing but equally in Information Retrieval. In this paper, the authors propose DHIYA, a stemmer for the Gujarati language based on Gujarati morphology. To develop the stemmer, the inflections that appear most frequently in Gujarati text were identified, and a rule set was created from them. The EMILLE corpus is used for training and for evaluating the stemmer's performance. The stemmer achieves an accuracy of 92.41%.
Keywords :
information retrieval; natural language processing; text analysis; word processing; DHIYA; EMILLE corpus; Gujarati language morphology; Gujarati text; context level; discourse level; information retrieval domain; morphological level analysis; sentence level; stemmer performance evaluation; stemming; training; word level; Computers; Gold; Hidden Markov models; Quantum cascade lasers; Gujarati; Indian languages; Morphemes; Stemmer;
Conference_Title :
Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014 International Conference on
Conference_Location :
Ghaziabad
DOI :
10.1109/ICICICT.2014.6781269
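Illustration :
The abstract describes a rule set built from frequent inflections but does not list the rules. The following is only a minimal sketch of how rule-based suffix stripping of that kind might look. The suffix list is an assumption (a handful of common Gujarati case markers and postpositions chosen for illustration, not DHIYA's actual rule set), and the names `ILLUSTRATIVE_SUFFIXES`, `MIN_STEM_LENGTH`, `stem`, and `accuracy` are hypothetical.

```python
# Illustrative only: a few common Gujarati case markers / postpositions.
# This is NOT the rule set from the paper, which the abstract does not list.
ILLUSTRATIVE_SUFFIXES = ["માં", "થી", "ને", "નો", "ની", "નું", "ના"]

# Assumed guard so that very short words are not stripped down to nothing.
MIN_STEM_LENGTH = 2


def stem(word: str) -> str:
    """Strip the longest matching suffix, if the remaining stem is long enough."""
    # Try longer suffixes first so a short suffix never pre-empts a longer one.
    for suffix in sorted(ILLUSTRATIVE_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)]
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (word, expected_stem) pairs for which stem() is correct."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return correct / len(pairs)
```

Under these assumptions, `stem("ઘરમાં")` strips the locative marker and returns `"ઘર"`. An accuracy figure such as the one reported in the abstract would be computed against a hand-annotated list of word and stem pairs. How the authors built and scored theirs is not stated here.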