Developing Bengali Speech Corpus for Phone Recognizer Using Optimum Text Selection Technique

Author

Mandal, Sandipan ; Das, Biswajit ; Mitra, Pabitra ; Basu, Anupam

Author_Institution

Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India

fYear

2011

fDate

15-17 Nov. 2011

Firstpage

268

Lastpage

271

Abstract

A speech corpus plays a key role in the construction of automatic speech recognition (ASR), text-to-speech (TTS) synthesis, and phone recognition (PR) systems. PR and ASR systems are quite similar in functionality; the difference is that a PR system converts the speech signal to phone text (a phone being the smallest discrete segment of sound in uttered speech), whereas an ASR system converts it to word text. A speech corpus for a PR system usually consists of a text corpus, recorded data corresponding to the text corpus, a phonetic representation of the text corpus, and a pronunciation dictionary. Selecting optimum text with a balanced phone distribution from the available text is an important task in developing a high-quality PR system. In this paper, we describe our text selection technique and discuss the performance of the phone recognition system.

Keywords

speech recognition; speech synthesis; text analysis; ASR system; Bengali speech corpus; PR system; automatic speech recognition system; balanced phone distribution; phone recognition system; phone recognizer; phone text; phonetic representation; pronunciation dictionary; speech signal; text corpus; text selection technique; text-to-speech synthesis system; Accuracy; Computational modeling; Dictionaries; Hidden Markov models; Speech; Speech recognition; Text recognition; GMM; HMM; MFCC; phoneme; sphinx3; sphinxtrain

fLanguage

English

Publisher

IEEE

Conference_Titel

Asian Language Processing (IALP), 2011 International Conference on

Conference_Location

Penang

Print_ISBN

978-1-4577-1733-8

Type

conf

DOI

10.1109/IALP.2011.16

Filename

6121518