DocumentCode :
3508003
Title :
Developing the first balanced corpus for Bangla language
Author :
Salam, Khan Md Anwarus ; Yamada, Setsuo ; Nishino, Tetsuro
Author_Institution :
Univ. of Electro-Commun., Tokyo, Japan
fYear :
2012
fDate :
18-19 May 2012
Firstpage :
1081
Lastpage :
1084
Abstract :
This paper proposes a development process for the first Bangladeshi National Corpus. The study specifies the domains needed to create a balanced Bangla corpus according to a set of selection criteria. It focuses on three independent selection criteria: domain, time, and medium. The paper also explains the domain classifications and the weight percentage assigned to each domain, and identifies prospective sources of information for preparing the corpus.
Keywords :
natural language processing; pattern classification; Bangla language; domain classifications; domain selection criteria; first Bangladeshi national corpus; medium selection criteria; time selection criteria; weight percentage; Blogs; Business; Electronic mail; Encyclopedias; History; Patents; Writing; Bangla Language Processing; corpus development
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Informatics, Electronics & Vision (ICIEV), 2012 International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4673-1153-3
Type :
conf
DOI :
10.1109/ICIEV.2012.6317356
Filename :
6317356