DocumentCode
591475
Title
Statistical analysis of Hindi BTEC speech database
Author
Arora, Samarth ; Arora, Kavita ; Aggarwal, Shubhashis Sengupta
Author_Institution
CDAC, Noida, India
fYear
2012
fDate
9-12 Dec. 2012
Firstpage
157
Lastpage
162
Abstract
The BTEC (Basic Travel Expression Corpus), developed by NICT, Japan, offers wide coverage of basic Japanese travel expressions with English counterparts and is intended as basic data for developing high-quality speech translation systems. The English counterpart of this corpus has been manually translated into Hindi for use in developing an English-Hindi speech translation system. In this paper, we present a statistical analysis of the translated Hindi BTEC corpus and describe the translation methodology adopted in developing it. The statistical evaluations provide information on the distribution of sentences, words, and phonemes and on their growth behavior, which gives direction for future enhancement of the corpus.
Keywords
language translation; natural language processing; speech processing; statistical analysis; word processing; Basic Travel Expression Corpus; English-Hindi speech translation system quality; Hindi BTEC speech database corpus translation; Japanese travel expressions; NICT; growth behavior; phoneme distribution information; sentence distribution information; statistical analysis; word distribution information; Sampling methods; Shape; Sociology; Speech; Statistical analysis; Vocabulary; Corpus statistics; Hindi BTEC; Speech Corpus;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
Conference_Location
Macau
Print_ISBN
978-1-4673-2811-1
Electronic_ISBN
978-1-4673-2812-8
Type
conf
DOI
10.1109/ICSDA.2012.6422480
Filename
6422480