DocumentCode
3326416
Title
Development of Pashto Treebank
Author
Ali, Raian ; Khan, Muhammad Asad
Author_Institution
Quaid-e-Azam Coll. of Commerce, Univ. of Peshawar, Peshawar, Pakistan
fYear
2011
fDate
11-13 July 2011
Firstpage
257
Lastpage
262
Abstract
This paper describes the development of a Pashto Treebank encoded in Extensible Markup Language (XML). A chart parser has been developed that uses the chart parsing algorithm to build parse trees for Pashto sentences. The parser produces parsed text in one of three forms: a reduced graph, a parse tree, or XML code. As input, the parser requires a context-free grammar (CFG) of the Pashto language and tagged input text. The system was tested on real-world text taken from Pashto novels and web sites and tagged manually. Of the eighty-seven (87) sentences parsed, fifty-four (54) were correctly parsed with a single parse tree and the remaining 33 were parsed with multiple trees, giving a reported accuracy of 62.06%.
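The pipeline outlined in the abstract (CFG plus tagged text in, parse trees and XML out) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy grammar, the tag names, the placeholder tokens `w1`..`w4`, and the XML element layout are hypothetical and are not taken from the paper, and the CYK-style bottom-up chart shown is one common chart-based technique that may differ from the authors' algorithm. The last lines recompute the reported ratio: 54/87 is approximately 0.6207, which the abstract gives as 62.06%, apparently truncated rather than rounded.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's Pashto CFG).
# Binary rules map a pair of child labels to the possible parent labels.
BINARY = {
    ("NP", "VP"): ["S"],   # S  -> NP VP
    ("NP", "V"): ["VP"],   # VP -> NP V   (object before verb)
    ("NP", "NP"): ["NP"],  # NP -> NP NP  (source of ambiguity)
}
# Lexical rules map an assumed POS tag to the labels it can start.
LEXICAL = {"N": ["NP"], "V": ["V"]}


def parse(tagged):
    """Return every S tree spanning the whole tagged sentence."""
    n = len(tagged)
    # chart[i, j][label] holds all trees with that label over tokens i..j-1.
    chart = defaultdict(lambda: defaultdict(list))
    for i, (word, tag) in enumerate(tagged):
        for label in LEXICAL.get(tag, []):
            chart[i, i + 1][label].append((label, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for l_label, l_trees in list(chart[i, k].items()):
                    for r_label, r_trees in list(chart[k, j].items()):
                        for parent in BINARY.get((l_label, r_label), []):
                            for lt in l_trees:
                                for rt in r_trees:
                                    chart[i, j][parent].append((parent, lt, rt))
    return chart[0, n]["S"]


def to_xml(tree):
    """Convert a (label, word) leaf or (label, left, right) node to XML."""
    label, *children = tree
    node = ET.Element(label)
    if len(children) == 1 and isinstance(children[0], str):
        node.text = children[0]
    else:
        for child in children:
            node.append(to_xml(child))
    return node


# Placeholder tokens stand in for manually tagged Pashto words.
unambiguous = parse([("w1", "N"), ("w2", "N"), ("w3", "V")])
ambiguous = parse([("w1", "N"), ("w2", "N"), ("w3", "N"), ("w4", "V")])
xml_text = ET.tostring(to_xml(unambiguous[0]), encoding="unicode")
print(len(unambiguous), len(ambiguous))  # 1 2
print(xml_text)  # <S><NP>w1</NP><VP><NP>w2</NP><V>w3</V></VP></S>

# Share of sentences with exactly one parse tree, using the abstract's counts.
accuracy = 54 / 87
print(f"{accuracy:.4f}")  # 0.6207
```

In this sketch a sentence counts as correctly parsed when `parse` returns exactly one tree, mirroring the abstract's split between single-tree and multiple-tree results.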
Keywords
Web sites; XML; computational linguistics; context-free grammars; natural language processing; text analysis; Pashto language; Pashto novels; Pashto sentences; Pashto treebank; XML code; chart parsing algorithm; context free grammar; extensible markup language code; reduced graph; tagged input text; Argon; Humans; Nickel; Testing; Corpus; Parser; Parsing; Pashto; Treebank;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Networks and Information Technology (ICCNIT), 2011 International Conference on
Conference_Location
Abbottabad
ISSN
2223-6317
Print_ISBN
978-1-61284-940-9
Type
conf
DOI
10.1109/ICCNIT.2011.6020939
Filename
6020939