  • DocumentCode
    238749
  • Title
    Supervised named entity recognition in Assamese language
  • Author
    Talukdar, Gitimoni ; Borah, Pranjal Protim ; Baruah, Arup
  • Author_Institution
    Dept. of Comput. Sci. & Eng. & IT, Assam Don Bosco Univ., Guwahati, India
  • fYear
    2014
  • fDate
    27-29 Nov. 2014
  • Firstpage
    187
  • Lastpage
    191
  • Abstract
    Nouns play an important role in every natural language. Proper nouns, a subcategory of nouns, represent the names of persons, locations, organizations, etc. The task of recognizing the proper nouns in a text and categorizing them into classes such as person, location, organization, and other is called Named Entity Recognition (NER). It is an essential step in many natural language processing applications and makes information extraction easier. NER in most Indian languages has been performed using rule-based, supervised, and unsupervised approaches. The target language of this work is Assamese, which is spoken by most people in the North-Eastern part of India, particularly in Assam. In Assamese, NER has been performed using rule-based and suffix-stripping-based approaches. Supervised learning techniques are more useful and can be adapted to new domains more easily than rule-based approaches. This paper reports the first work on Assamese NER using a machine learning technique: Assamese NER is performed using a Naïve Bayes classifier. Because feature extraction plays the most important role in the performance of any machine learning technique, this work aims to describe a few important features for Assamese NER and to report the performance of the system using these features.
  • Keywords
    Bayes methods; feature extraction; learning (artificial intelligence); natural language processing; pattern classification; Assam; Assamese NER; Assamese language; India; feature extraction; machine learning technique; naive Bayes classifier; natural language processing; supervised named entity recognition; Compounds; Computer science; Context; Educational institutions; Informatics; Organizations; Training; Corpus; Morphology; Naïve Bayes Classifier; Named Entity Recognition; Suffix stripping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Contemporary Computing and Informatics (IC3I), 2014 International Conference on
  • Conference_Location
    Mysore
  • Type
    conf
  • DOI
    10.1109/IC3I.2014.7019728
  • Filename
    7019728