DocumentCode
1954072
Title
Urdu Noun Phrase Chunking - Hybrid Approach
Author
Siddiq, Shahid ; Hussain, Sarmad ; Ali, Aasim ; Malik, Kamran ; Ali, Wajid
Author_Institution
FAST, NUCES, Lahore, Pakistan
fYear
2010
fDate
28-30 Dec. 2010
Firstpage
69
Lastpage
72
Abstract
In this work, chunking is used to mark the noun phrases in Urdu sentences. The approach is hybrid, combining a statistical method with hand-crafted rules. The statistical model is a hidden Markov model (HMM) used with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 word tokens for testing. Several experiments are conducted with different combinations of input, output and rule application patterns to achieve high accuracy. An overall accuracy of 97.52% is achieved using the TnT tagger. The input sequence that proved successful in this regard merges the POS annotation with the IOB annotation.
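The abstract mentions IOB chunk annotation and an input sequence that merges POS tags with IOB labels. The sketch below is only an illustration of that idea, not the authors' implementation: the tag names (`ADJ`, `NN`, `P`, `VB`), the `_` separator, the span format and all function names are assumptions made for the example, and no Urdu data or TnT call is involved.

```python
from typing import List, Tuple


def spans_to_iob(n_tokens: int, np_spans: List[Tuple[int, int]]) -> List[str]:
    """Turn noun-phrase spans (start, end-exclusive) into IOB labels.

    B marks the first token of a noun phrase, I a token inside one,
    and O a token outside any noun phrase.
    """
    labels = ["O"] * n_tokens
    for start, end in np_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def merge_pos_iob(pos_tags: List[str], iob: List[str], sep: str = "_") -> List[str]:
    """Merge each POS tag with its IOB label into one symbol (assumed format)."""
    if len(pos_tags) != len(iob):
        raise ValueError("POS and IOB sequences must have the same length")
    return [f"{pos}{sep}{label}" for pos, label in zip(pos_tags, iob)]


def split_merged(merged: List[str], sep: str = "_") -> Tuple[List[str], List[str]]:
    """Recover the separate POS and IOB sequences from merged symbols."""
    pairs = [symbol.rsplit(sep, 1) for symbol in merged]
    return [p for p, _ in pairs], [c for _, c in pairs]


def token_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical five-token sentence with two noun phrases: tokens 0-1 and token 3.
pos_tags = ["ADJ", "NN", "P", "NN", "VB"]
iob = spans_to_iob(len(pos_tags), [(0, 2), (3, 4)])
merged = merge_pos_iob(pos_tags, iob)
print(iob)     # ['B', 'I', 'O', 'B', 'O']
print(merged)  # ['ADJ_B', 'NN_I', 'P_O', 'NN_B', 'VB_O']
```

A sequence of merged symbols like this could serve as the label sequence for an HMM tagger, with `split_merged` recovering the chunk labels afterwards. How the paper actually encodes the merge is not stated in the abstract.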
Keywords
natural language processing; word processing; POS annotation; Urdu sentence; hand crafted rule; noun phrase chunking; statistical method; Accuracy; Computational modeling; Hidden Markov models; Probabilistic logic; Tagging; Testing; Training; Hybrid Approach; Noun Phrase Chunking; Part of Speech; Precision; Recall
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location
Harbin
Print_ISBN
978-1-4244-9063-9
Type
conf
DOI
10.1109/IALP.2010.71
Filename
5681546