DocumentCode
530273
Title
Urdu noun phrase chunking: HMM based approach
Author
Ali, Wajid ; Malik, M. Kamran ; Hussain, Sarmad ; Siddiq, Shahid ; Ali, Aasim
Author_Institution
Dept. of Comput. Sci., Nat. Univ. of Comput. & Emerging Sci. (NUCES), Lahore, Pakistan
Volume
2
fYear
2010
fDate
17-19 Sept. 2010
Abstract
Extracting noun phrases (NPs) from text is useful for many natural language processing applications, such as named entity recognition, indexing, searching, and parsing. We present a statistical noun phrase chunker for Urdu. A 100,000-word Urdu corpus is manually tagged with NP chunk tags and used to develop the chunker. Initially, an approach based on a standard HMM is developed for automatic NP chunking. In Urdu phrases, the case marker (CM) indicates the end of a noun phrase and is appended at its end. Thus, scanning the sentence in reverse order may allow phrase endings to be predicted better, so the technique is enhanced by changing the scanning direction. It is further enhanced by merging chunk and POS tags to improve accuracy. The results of all experiments are reported; the maximum overall accuracy of 97.61% is achieved using the HMM-based approach with the extended tagset and right-to-left (RTL) scanning.
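The combination described in the abstract (an HMM, merged POS and chunk tags, and right-to-left scanning) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the observations are taken to be POS tags, the hidden states are merged labels of the form POS_chunk, the chunk tags are B/I/O, probabilities use add-one smoothing, and the three training sentences are invented toy data. The function names (train, viterbi) are hypothetical.

```python
from collections import Counter, defaultdict
import math


def merge(pos, chunk):
    # Extended tagset (assumed form): one hidden state per POS/chunk pair.
    return f"{pos}_{chunk}"


def train(sentences, rtl=True):
    """Count transitions and emissions; sentences are lists of (pos, chunk)."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        seq = list(reversed(sent)) if rtl else list(sent)  # RTL scanning
        prev = "<s>"
        for pos, chunk in seq:
            state = merge(pos, chunk)
            trans[prev][state] += 1
            emit[state][pos] += 1
            prev = state
    return trans, emit


def log_prob(counter, key, size, alpha=1.0):
    # Add-alpha smoothed log probability (a choice made for this sketch).
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * size))


def viterbi(pos_tags, trans, emit, rtl=True):
    """Return one chunk tag per input POS tag, in original sentence order."""
    if not pos_tags:
        return []
    obs = list(reversed(pos_tags)) if rtl else list(pos_tags)
    states = sorted(emit)
    n_states = len(states)
    n_obs = len({p for c in emit.values() for p in c})
    # best[i][s] = (log score of best path ending in s at step i, backpointer)
    best = [{s: (log_prob(trans["<s>"], s, n_states)
                 + log_prob(emit[s], obs[0], n_obs), None) for s in states}]
    for i in range(1, len(obs)):
        column = {}
        for s in states:
            e = log_prob(emit[s], obs[i], n_obs)
            column[s] = max(
                (best[i - 1][p][0] + log_prob(trans[p], s, n_states) + e, p)
                for p in states
            )
        best.append(column)
    # Backtrace from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()          # now in scan order
    if rtl:
        path.reverse()      # back to original sentence order
    return [s.rsplit("_", 1)[1] for s in path]


# Invented toy data: (POS tag, chunk tag) pairs, with CM closing a noun phrase.
toy_corpus = [
    [("ADJ", "B"), ("NN", "I"), ("CM", "O"), ("NN", "B"), ("VB", "O")],
    [("NN", "B"), ("CM", "O"), ("ADJ", "B"), ("NN", "I"), ("VB", "O")],
    [("NN", "B"), ("NN", "I"), ("CM", "O"), ("VB", "O")],
]
trans, emit = train(toy_corpus, rtl=True)
chunks = viterbi(["ADJ", "NN", "CM", "VB"], trans, emit, rtl=True)
print(chunks)
```

Passing rtl=False to both train and viterbi gives the left-to-right baseline, so the two scanning directions can be compared on the same data. The sketch says nothing about the accuracy figures reported in the abstract.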
Keywords
cognition; hidden Markov models; natural language processing; NP chunk tags; POS tags; Urdu noun phrase chunking; automatic NP chunking; case marker; chunk merging; noun phrase chunker; noun phrase extraction; phrase endings; scanning direction; standard HMM model; Cardiology; Hidden Markov models; Random access memory; Testing; HMM based chunking; NP chunking; Statistical Chunking; Urdu Noun Phrase; chunking
fLanguage
English
Publisher
IEEE
Conference_Titel
Educational and Information Technology (ICEIT), 2010 International Conference on
Conference_Location
Chongqing
Print_ISBN
978-1-4244-8033-3
Electronic_ISBN
978-1-4244-8035-7
Type
conf
DOI
10.1109/ICEIT.2010.5607623
Filename
5607623