• DocumentCode
    238745
  • Title
    Assamese Word Sense Disambiguation using Supervised Learning
  • Author
    Borah, Pranjal Protim ; Talukdar, Gitimoni ; Baruah, Arup
  • Author_Institution
    Dept. of Comput. Sci. & Eng. & IT, Assam Don Bosco Univ., Guwahati, India
  • fYear
    2014
  • fDate
    27-29 Nov. 2014
  • Firstpage
    946
  • Lastpage
    950
  • Abstract
    Word sense disambiguation (WSD) is the task of identifying the correct sense of a word in its context. It is an important pre-processing step in information extraction, machine translation, question answering and many other natural language processing tasks. Word sense ambiguity arises when a word has more than one possible sense. Finding the correct sense requires detailed knowledge about words. This knowledge is often derived from sources such as the words appearing in the context of the target word, part-of-speech information of the neighbouring words, syntactic relations and local collocations. Our main aim in this paper is to develop an automatic system for WSD in Assamese using a Naive Bayes classifier. To the best of our knowledge, this is the first work on developing an automatic WSD system for the Assamese language. Assamese, the main language of most people in the North-Eastern part of India, is a morphologically very rich language. WSD in Assamese is a challenging task because a word combined with a suffix or a sequence of suffixes can take on an entirely different sense. WSD often makes use of lexical resources such as WordNet, lexicons, and annotated or unannotated corpora for disambiguation.
  • Keywords
    learning (artificial intelligence); natural language processing; pattern classification; Assamese language; Assamese word sense disambiguation; India; WSD; WordNet; disambiguation process; information extraction; lexicon; machine translation; naive Bayes classifier; natural language processing task; question answering; supervised learning; Context; Dictionaries; Ink; Natural language processing; Semantics; Speech; Training; Lexicon; Local collocations; Polysemic word; Unigram co-occurrence; Wordnet
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Contemporary Computing and Informatics (IC3I), 2014 International Conference on
  • Conference_Location
    Mysore
  • Type
    conf
  • DOI
    10.1109/IC3I.2014.7019726
  • Filename
    7019726
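
The abstract names a Naive Bayes classifier over context features as the disambiguation method but gives no implementation detail. The following is a minimal sketch of that general technique, not the authors' system: it assumes bag-of-words context features only (no part-of-speech, syntactic, or collocation features) and add-one smoothing. The class name `NaiveBayesWSD`, the sense labels, and the tiny English training set are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes sense classifier (illustrative sketch)."""

    def __init__(self):
        self.sense_counts = Counter()            # number of examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()                       # all context words seen in training

    def train(self, examples):
        """examples: iterable of (context_words, sense_label) pairs."""
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for word in context:
                self.word_counts[sense][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        """Return the sense maximising log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            denom = sum(self.word_counts[sense].values()) + vocab_size
            for word in context:
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data for an ambiguous word with two senses.
TRAIN = [
    (["deposit", "money", "account"], "finance"),
    (["loan", "interest", "money"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["muddy", "river", "shore"], "river"),
]

clf = NaiveBayesWSD()
clf.train(TRAIN)
print(clf.predict(["money", "loan"]))
print(clf.predict(["water", "shore"]))
```

For a morphologically rich language such as Assamese, the context words would presumably need stemming or suffix handling before counting; that step is outside this sketch.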