DocumentCode :
2306611
Title :
Building a foundation of HPSG-based treebank on Bangla language
Author :
Mahmud, Altaf ; Khan, Mumit
Author_Institution :
CRBLP, BRAC Univ., Dhaka
fYear :
2007
fDate :
27-29 Dec. 2007
Firstpage :
1
Lastpage :
6
Abstract :
The importance of a large annotated corpus for NLP researchers is now widely known. In this paper, we describe the initial phase of developing a linguistically annotated corpus for the non-configurational 'Bangla' language. Since the formalism differs from those posited for configurational languages, several features have been added for constraint-based parsing through an HPSG-based formalism. We propose an outline of a semi-automated process that applies both a case-marking approach and some morphological analysis to constrain the parsing of a relatively free word order language, in order to create a linguistically rich, highly lexicalized annotated corpus.
Keywords :
context-free grammars; context-free languages; natural language processing; tree data structures; Bangla language; HPSG-based treebank; NLP; free word order language; head-driven phrase structure grammar formalism; lexicalized annotated corpus; natural language processing; Bidirectional control; Books; Data mining; Information retrieval; Natural language processing; Natural languages; Pattern matching; Speech analysis; Standards development; Stochastic processes; hpsg; non-configurational; parsing; treebank; treebanking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology, 2007. ICCIT 2007. 10th International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4244-1550-2
Electronic_ISBN :
978-1-4244-1551-9
Type :
conf
DOI :
10.1109/ICCITECHN.2007.4579375
Filename :
4579375