Title :
Developing the first balanced corpus for Bangla language
Author :
Salam, Khan Md Anwarus ; Yamada, Setsuo ; Nishino, Tetsuro
Author_Institution :
Univ. of Electro-Commun., Tokyo, Japan
Abstract :
This paper proposes a development process for the first Bangladeshi National Corpus. The study specifies the domains needed to create a balanced Bangla corpus, based on three independent selection criteria: domain, time, and medium. The paper also explains the domain classifications and the weight percentage assigned to each domain, and identifies prospective sources of information for preparing the corpus.
Keywords :
natural language processing; pattern classification; Bangla language; domain classifications; domain selection criteria; first Bangladeshi national corpus; medium selection criteria; time selection criteria; weight percentage; Blogs; Business; Electronic mail; Encyclopedias; History; Patents; Writing; Bangla Language Processing; corpus development
Conference_Title :
Informatics, Electronics & Vision (ICIEV), 2012 International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4673-1153-3
DOI :
10.1109/ICIEV.2012.6317356