Statistical analysis of Hindi BTEC speech database

Author

Arora, Samarth ; Arora, Kavita ; Aggarwal, Shubhashis Sengupta

Author_Institution

CDAC, Noida, India

fYear

2012

fDate

9-12 Dec. 2012

Firstpage

157

Lastpage

162

Abstract

The Basic Travel Expression Corpus (BTEC), developed by NICT, Japan, offers wide coverage of basic Japanese travel expressions with English counterparts and is intended as base data for developing high-quality speech translation systems. The English side of this corpus has been manually translated into Hindi for use in developing an English-Hindi speech translation system. In this paper, we present a statistical analysis of the translated Hindi BTEC corpus and describe the translation methodology adopted in its development. The statistical evaluations provide information on the distribution of sentences, words, and phonemes and on their growth behavior, which gives direction for future enhancement of the corpus.

Keywords

language translation; natural language processing; speech processing; statistical analysis; word processing; Basic Travel Expression Corpus; English-Hindi speech translation system quality; Hindi BTEC speech database corpus translation; Japanese travel expressions; NICT; growth behavior; phoneme distribution information; sentence distribution information; word distribution information; Sampling methods; Shape; Sociology; Speech; Vocabulary; Corpus statistics; Hindi BTEC; Speech Corpus

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on

Conference_Location

Macau

Print_ISBN

978-1-4673-2811-1

Electronic_ISBN

978-1-4673-2812-8

Type

conf

DOI

10.1109/ICSDA.2012.6422480

Filename

6422480