DocumentCode :
121655
Title :
Dhiya: A stemmer for morphological level analysis of Gujarati language
Author :
Sheth, Jikitsha ; Patel, B.
Author_Institution :
SRIMCA, Uka Tarsadia Univ., Gopal Vidyanagar, India
fYear :
2014
fDate :
7-8 Feb. 2014
Firstpage :
151
Lastpage :
154
Abstract :
Understanding a language requires analysis at the word, sentence, context, and discourse levels. Morphological analysis underlies all of these, as it is the first step toward understanding a given sentence. One task performed at the morphological level is stemming, which identifies the stem of a given word. Stemming matters not only in the Natural Language Processing domain but equally in the Information Retrieval domain. In this paper, the authors propose DHIYA, a stemmer for the Gujarati language that is based on Gujarati morphology. To develop the stemmer, the inflections that appear most frequently in Gujarati text were identified, and a rule set was created from them. The EMILLE corpus is used for training and for evaluating the stemmer's performance. The stemmer achieves an accuracy of 92.41%.
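The rule-based approach described in the abstract can be illustrated with a minimal suffix-stripping sketch. Everything in it is an assumption made for illustration: the suffix list holds a few common Gujarati case and plural markers, not the rule set from the paper, and the names `SUFFIXES`, `stem`, `accuracy`, and `min_stem_len` are hypothetical.

```python
# Illustrative suffix list only; this is NOT the rule set from the paper.
SUFFIXES = ["માં", "ને", "થી", "નો", "ની", "નું", "ના", "ઓ"]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    # Try longer suffixes first so a short suffix does not pre-empt a longer one.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No rule applies: the word is returned unchanged.
    return word


def accuracy(pairs):
    """Percentage of (word, expected_stem) pairs that stem() gets right."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return 100.0 * correct / len(pairs)
```

Under these assumptions, `stem("ઘરમાં")` yields `"ઘર"` and `stem("છોકરાઓ")` yields `"છોકરા"`. An accuracy figure such as the one reported would come from running a function like `accuracy` over a hand-annotated list of word and stem pairs.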
Keywords :
information retrieval; natural language processing; text analysis; word processing; DHIYA; EMILLE corpus; Gujarati language morphology; Gujarati text; context level; discourse level; information retrieval domain; morphological level analysis; sentence level; stemmer performance evaluation; stemming; training; word level; Computers; Gold; Hidden Markov models; Quantum cascade lasers; Gujarati; Indian languages; Morphemes; Stemmer;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014 International Conference on
Conference_Location :
Ghaziabad
Type :
conf
DOI :
10.1109/ICICICT.2014.6781269
Filename :
6781269