• DocumentCode
    3326416
  • Title

    Development of Pashto Treebank

  • Author

    Ali, Raian ; Khan, Muhammad Asad ; Khan, Muhammad Asad

  • Author_Institution
    Quaid-e-Azam Coll. of Commerce, Univ. of Peshawar, Peshawar, Pakistan
  • fYear
    2011
  • fDate
    11-13 July 2011
  • Firstpage
    257
  • Lastpage
    262
  • Abstract
    This paper describes the development of a Pashto Treebank encoded as Extensible Markup Language (XML) code. A chart parser was developed that uses the chart parsing algorithm to build parse trees for Pashto sentences. As input, the parser needs a Context Free Grammar (CFG) of the Pashto language and tagged input text. Its output is the parsed text, which can be obtained in one of three forms: a reduced graph, a parse tree, or XML code. The system was tested on real-world text taken from Pashto novels and web sites and tagged manually. Of the eighty-seven (87) sentences parsed, fifty-four (54) were correctly parsed with a single parse tree and the remaining 33 were parsed with multiple trees, giving a reported accuracy of 62.06%.
  • Keywords
    Web sites; XML; computational linguistics; context-free grammars; natural language processing; text analysis; Pashto language; Pashto novels; Pashto sentences; Pashto treebank; XML code; chart parsing algorithm; context free grammar; extensible markup language code; reduced graph; tagged input text; Argon; Humans; Nickel; Testing; Corpus; Parser; Parsing; Pashto; Treebank
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Networks and Information Technology (ICCNIT), 2011 International Conference on
  • Conference_Location
    Abbottabad
  • ISSN
    2223-6317
  • Print_ISBN
    978-1-61284-940-9
  • Type

    conf

  • DOI
    10.1109/ICCNIT.2011.6020939
  • Filename
    6020939
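
The pipeline summarised in the abstract (tagged input text plus a CFG, chart parsing, XML output) can be illustrated with a minimal sketch. This is not the authors' parser: the grammar rules, the tag names (PRP, ADJ, NN, VB), the placeholder words, and the XML element layout are all assumptions made for illustration, and the chart is filled bottom-up over binary rules only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy grammar (binary rules only): (left, right) -> parent labels.
# These rules are illustrative and are NOT the Pashto CFG used in the paper.
RULES = {
    ("PRP", "VP"): ["S"],
    ("NP", "VB"): ["VP"],
    ("ADJ", "NN"): ["NP"],
    ("NN", "NN"): ["NP"],
    ("NP", "NN"): ["NP"],
    ("NN", "NP"): ["NP"],
}


def chart_parse(tagged, start="S"):
    """Return every parse tree rooted in `start` for a list of (word, tag) pairs."""
    n = len(tagged)
    # chart[(i, j)][label] holds all trees with that label spanning tokens i..j-1.
    chart = defaultdict(lambda: defaultdict(list))
    for i, (word, tag) in enumerate(tagged):
        chart[(i, i + 1)][tag].append((tag, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for l_label, l_trees in list(chart[(i, k)].items()):
                    for r_label, r_trees in list(chart[(k, j)].items()):
                        for parent in RULES.get((l_label, r_label), []):
                            for lt in l_trees:
                                for rt in r_trees:
                                    chart[(i, j)][parent].append((parent, lt, rt))
    return chart[(0, n)][start]


def to_xml(tree):
    """Convert a tree tuple into an ElementTree element (assumed XML layout)."""
    node = ET.Element(tree[0])
    if isinstance(tree[1], str):  # leaf: (tag, word)
        node.text = tree[1]
    else:  # internal node: (label, left_subtree, right_subtree)
        for child in tree[1:]:
            node.append(to_xml(child))
    return node


if __name__ == "__main__":
    # Placeholder words w1..w4 stand in for manually tagged Pashto tokens.
    sentence = [("w1", "PRP"), ("w2", "ADJ"), ("w3", "NN"), ("w4", "VB")]
    trees = chart_parse(sentence)
    print(len(trees), "tree(s)")
    for tree in trees:
        print(ET.tostring(to_xml(tree), encoding="unicode"))
```

Under this sketch, a sentence for which `chart_parse` returns exactly one tree corresponds to the abstract's "single parse tree" case, while a result with two or more trees corresponds to the "multiple trees" case.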