• DocumentCode
    1954072
  • Title
    Urdu Noun Phrase Chunking - Hybrid Approach
  • Author
    Siddiq, Shahid ; Hussain, Sarmad ; Ali, Aasim ; Malik, Kamran ; Ali, Wajid
  • Author_Institution
    FAST, NUCES, Lahore, Pakistan
  • fYear
    2010
  • fDate
    28-30 Dec. 2010
  • Firstpage
    69
  • Lastpage
    72
  • Abstract
    This work uses chunking to mark the noun phrases in Urdu sentences. The approach is hybrid: it combines a statistical method with hand-crafted rules. The statistical model is a hidden Markov model (HMM) used with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 for testing. Several experiments with different combinations of input, output, and rule-application patterns are conducted to maximize accuracy. An overall accuracy of 97.52% is achieved using the TnT Tagger. The most successful input sequence is the one that merges the POS annotation with the IOB annotation.
  • Keywords
    natural language processing; word processing; POS annotation; Urdu sentence; hand crafted rule; noun phrase chunking; statistical method; Accuracy; Computational modeling; Hidden Markov models; Probabilistic logic; Tagging; Testing; Training; Hybrid Approach; Noun Phrase Chunking; Part of Speech; Precision; Recall
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2010 International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-9063-9
  • Type
    conf
  • DOI
    10.1109/IALP.2010.71
  • Filename
    5681546
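
The abstract mentions IOB chunk annotation and an input sequence that merges POS annotation with IOB annotation. The following is a minimal sketch of what such a merge could look like, not the authors' implementation. The record does not give the label format, so the `_` separator, the bare `B`/`I`/`O` labels, the POS tag names, and the helper names `merge_tags`, `split_tag`, and `np_spans` are all assumptions made for illustration.

```python
from typing import List, Tuple


def merge_tags(pos_tags: List[str], iob_tags: List[str], sep: str = "_") -> List[str]:
    """Combine each POS tag with its IOB chunk tag into one label (assumed format)."""
    if len(pos_tags) != len(iob_tags):
        raise ValueError("POS and IOB sequences must have the same length")
    return [f"{pos}{sep}{iob}" for pos, iob in zip(pos_tags, iob_tags)]


def split_tag(merged: str, sep: str = "_") -> Tuple[str, str]:
    """Recover (POS, IOB) from a merged label by splitting on the last separator."""
    pos, _, iob = merged.rpartition(sep)
    return pos, iob


def np_spans(iob_tags: List[str]) -> List[Tuple[int, int]]:
    """Return noun phrase spans as (start, end) token indices, end exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(iob_tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk begins; close any chunk that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            # An outside tag closes the open chunk.
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(iob_tags)))
    return spans


# Hypothetical four-token sentence; the tag names are illustrative only.
pos = ["ADJ", "NN", "VB", "NN"]
iob = ["B", "I", "O", "B"]
merged = merge_tags(pos, iob)
spans = np_spans([split_tag(label)[1] for label in merged])
```

In this sketch a tagger would be trained on the merged labels, and the chunk boundaries would be read back from the IOB half of each predicted label.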