• DocumentCode
    121655
  • Title
    Dhiya: A stemmer for morphological level analysis of Gujarati language
  • Author
    Sheth, Jikitsha ; Patel, B.
  • Author_Institution
    SRIMCA, Uka Tarsadia Univ., Gopal Vidyanagar, India
  • fYear
    2014
  • fDate
    7-8 Feb. 2014
  • Firstpage
    151
  • Lastpage
    154
  • Abstract
    Understanding a language requires analysis at the word, sentence, context, and discourse levels. Morphological analysis underlies all of these, as it is the first step in understanding a given sentence. One task performed at the morphological level is stemming, which identifies the stem of a given word. Stemming matters not only in Natural Language Processing but equally in Information Retrieval. In this paper, the authors propose DHIYA, a stemmer for the Gujarati language based on Gujarati morphology. To develop the stemmer, the inflections that appear most frequently in Gujarati text were identified, and a rule set was created from them. The EMILLE corpus is used for training and for evaluating the stemmer's performance. The stemmer achieves an accuracy of 92.41%.
  • Keywords
    information retrieval; natural language processing; text analysis; word processing; DHIYA; EMILLE corpus; Gujarati language morphology; Gujarati text; context level; discourse level; information retrieval domain; morphological level analysis; sentence level; stemmer performance evaluation; stemming; training; word level; Computers; Gold; Hidden Markov models; Quantum cascade lasers; Gujarati; Indian languages; Morphemes; Stemmer;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014 International Conference on
  • Conference_Location
    Ghaziabad
  • Type
    conf
  • DOI
    10.1109/ICICICT.2014.6781269
  • Filename
    6781269
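
The approach summarized in the abstract (a rule set built from frequent inflections, scored by accuracy against known stems) can be illustrated with a minimal Python sketch. This is not DHIYA's rule set or implementation, neither of which appears in this record. The suffix list (a handful of common Gujarati case and plural markers), the `min_stem_len` guard, and the function names `stem` and `accuracy` are all assumptions chosen for illustration.

```python
# Illustrative suffix list only. These are common Gujarati case and plural
# markers picked for the example; they are NOT the rule set from the paper.
SUFFIXES = ["માં", "નું", "થી", "ને", "નો", "ની", "ના", "ઓ"]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from `word`.

    A suffix is removed only if at least `min_stem_len` code points remain,
    which is a hypothetical guard against over-stemming short words.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def accuracy(pairs, stemmer=stem):
    """Fraction of (word, gold_stem) pairs for which the stemmer is correct."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if stemmer(word) == gold)
    return correct / len(pairs)
```

For example, `stem("ઘરમાં")` strips the locative marker and returns `"ઘર"`, while `stem("ઘર")` leaves the word unchanged. A purely suffix-stripping rule cannot recover stems whose final vowel changes under inflection, which is one reason a real rule set would need more than a flat suffix list.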