• DocumentCode
    2278615
  • Title
    Using Hidden Markov Model to improve the accuracy of Punjabi POS tagger
  • Author
    Sharma, Sanjeev Kumar; Lehal, Gurpreet Singh
  • Author_Institution
    Dept. of CSE, BIS Coll. of Eng. & Technol., Moga, India
  • Volume
    2
  • fYear
    2011
  • fDate
    10-12 June 2011
  • Firstpage
    697
  • Lastpage
    701
  • Abstract
    POS tagging is the process of assigning the correct tag to each word of a sentence. The accuracy of all NLP applications, such as grammar checkers, phrase chunkers and machine translation systems, depends on the accuracy of the POS tagger. We attempted to improve the accuracy of the existing Punjabi POS tagger, which fails to resolve tag ambiguity in compound and complex sentences. A bi-gram Hidden Markov Model (HMM) has been used to solve the part-of-speech tagging problem. An annotated corpus of 20,000 words was used for training, and the HMM parameters were estimated with the maximum likelihood method. The HMM approach has been implemented using the Viterbi algorithm. A module has been developed that takes the output of the existing POS tagger as input and assigns the correct tag to words that have more than one candidate tag. The module was tested on a corpus of 26,479 words, and an accuracy of 90.11% was measured by manual evaluation.
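    The abstract describes three steps: maximum-likelihood estimation of bi-gram HMM parameters from an annotated corpus, Viterbi decoding, and deciding only among the tags the existing tagger leaves open for each word. The Python sketch below illustrates that pipeline under stated assumptions. The toy corpus, the three-tag set (DT, NN, VB), the English placeholder tokens, and the names `train_bigram_hmm` and `viterbi` are hypothetical and not taken from the paper; it also assumes the existing tagger's output can be represented as a list of candidate tags per word. No smoothing is applied, so unseen events get probability zero.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train_bigram_hmm(tagged_sentences):
    """Maximum-likelihood (relative-frequency) estimates, no smoothing:
    P(tag | prev_tag) and P(word | tag)."""
    trans, history = Counter(), Counter()
    emit, tag_total = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            history[prev] += 1
            emit[(tag, word)] += 1
            tag_total[tag] += 1
            prev = tag
    trans_p = {(p, t): c / history[p] for (p, t), c in trans.items()}
    emit_p = {(t, w): c / tag_total[t] for (t, w), c in emit.items()}
    return trans_p, emit_p


def log_prob(p):
    # log(0) is undefined; map unseen events to -infinity
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, candidates, trans_p, emit_p):
    """Bi-gram Viterbi decoding restricted to candidates[i], the tags the
    (hypothetical) upstream tagger allows for words[i]."""
    if not words:
        return []
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{
        t: (log_prob(trans_p.get((START, t), 0.0))
            + log_prob(emit_p.get((t, words[0]), 0.0)), None)
        for t in candidates[0]
    }]
    for i in range(1, len(words)):
        column = {}
        for t in candidates[i]:
            e = log_prob(emit_p.get((t, words[i]), 0.0))
            # Best predecessor: previous score + transition + emission
            column[t] = max(
                (score + log_prob(trans_p.get((p, t), 0.0)) + e, p)
                for p, (score, _) in best[i - 1].items()
            )
        best.append(column)
    # Backtrack from the highest-scoring final tag
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy annotated corpus (hypothetical data, English placeholder tokens)
TOY_CORPUS = [
    [("dogs", "NN"), ("run", "VB")],
    [("a", "DT"), ("run", "NN"), ("helps", "VB")],
    [("a", "DT"), ("dog", "NN"), ("helps", "VB")],
]

trans_p, emit_p = train_bigram_hmm(TOY_CORPUS)
# "run" is ambiguous (NN or VB); the other words have one candidate tag
print(viterbi(["a", "run", "helps"], [["DT"], ["NN", "VB"], ["VB"]], trans_p, emit_p))
# → ['DT', 'NN', 'VB']
```

    Because unambiguous words carry a single candidate, only words with several possible tags are actually decided by the HMM in this sketch. A real system would likely also need smoothing for words or tag pairs that never occur in the training corpus.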
  • Keywords
    hidden Markov models; maximum likelihood estimation; natural language processing; Punjabi POS tagger; Viterbi algorithm; bi-gram hidden Markov model; grammar checker task; machine translation task; maximum likelihood method; part-of-speech tagger; phrase chunker task; Accuracy; Probability; Speech; Tagging; Training; HMM; POS; Punjabi
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Automation Engineering (CSAE), 2011 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-8727-1
  • Type
    conf
  • DOI
    10.1109/CSAE.2011.5952600
  • Filename
    5952600