Dhiya: A stemmer for morphological level analysis of Gujarati language

Author

Sheth, Jikitsha; Patel, B.

Author_Institution

SRIMCA, Uka Tarsadia Univ., Gopal Vidyanagar, India

fYear

2014

fDate

7-8 Feb. 2014

Firstpage

151

Lastpage

154

Abstract

To understand a language, analysis must be carried out at the word, sentence, context, and discourse levels. Morphological analysis underlies all of these, as it is the first step in understanding a given sentence. One task performed at the morphological level is stemming, which identifies the stem of a given word. Stemming matters not only in Natural Language Processing but equally in Information Retrieval. In this paper, the authors propose DHIYA, a stemmer for the Gujarati language that is based on Gujarati morphology. To develop the stemmer, the inflections that appear most frequently in Gujarati text were identified, and a rule set was created from them. The EMILLE corpus is used for training and for evaluating the stemmer's performance. The accuracy of the stemmer is 92.41%.
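The rule-based approach summarized in the abstract can be illustrated with a minimal sketch of longest-match suffix stripping, together with an accuracy measure against hand-labelled stems. This is not the DHIYA rule set, which the abstract does not list: the suffix list, the minimum stem length, and the function names `stem` and `accuracy` are all assumptions chosen for illustration.

```python
# Illustrative suffix-stripping stemmer for Gujarati.
# ASSUMPTION: this short suffix list is hypothetical and is NOT the
# rule set from the paper; it only shows the general technique.
SUFFIXES = ["માં", "થી", "ને", "નો", "ની", "નું", "ના", "ઓ"]

# ASSUMPTION: a minimum stem length (in code points) to limit over-stemming.
MIN_STEM_LEN = 2


def stem(word, suffixes=SUFFIXES, min_stem_len=MIN_STEM_LEN):
    """Strip the longest matching suffix, or return the word unchanged."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= min_stem_len:
            return word[:remaining]
    return word


def accuracy(pairs):
    """Fraction of (word, gold_stem) pairs for which stem(word) == gold_stem."""
    correct = sum(1 for word, gold in pairs if stem(word) == gold)
    return correct / len(pairs)
```

Trying longer suffixes first keeps a short suffix from matching when a longer one also applies. The accuracy function mirrors the kind of evaluation the abstract reports, although the paper's exact evaluation procedure on the EMILLE corpus is not described in this record.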

Keywords

information retrieval; natural language processing; text analysis; word processing; DHIYA; EMILLE corpus; Gujarati language morphology; Gujarati text; context level; discourse level; information retrieval domain; morphological level analysis; sentence level; stemmer performance evaluation; stemming; training; word level; Computers; Gold; Hidden Markov models; Quantum cascade lasers; Gujarati; Indian languages; Morphemes; Stemmer

fLanguage

English

Publisher

IEEE

Conference_Titel

Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014 International Conference on

Conference_Location

Ghaziabad

Type

conf

DOI

10.1109/ICICICT.2014.6781269

Filename

6781269