Title :
Bengali speech corpus for continuous automatic speech recognition system
Author :
Das, Biswajit ; Mandal, Sandipan ; Mitra, Pabitra
Author_Institution :
Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India
Abstract :
This paper presents the development of a Bengali speech corpus for speaker-independent continuous speech recognition. A speech corpus is the backbone of an automatic speech recognition (ASR) system. Speech corpora can be classified into several classes; for example, a corpus may be language dependent or age dependent. We have developed a speech corpus for two age groups: a younger group aged 20 to 40 years and an older group aged 60 to 80 years. We have created phone-labeled and triphone-labeled speech corpora. Initially, the speech samples are aligned using a statistical modeling technique. The statistically labeled files are then refined by manual correction. The Hidden Markov Model Toolkit (HTK) has been used for aligning the speech data. We have measured phoneme recognition and continuous word recognition performance to check the quality of the speech corpus.
Keywords :
hidden Markov models; natural languages; speech recognition; statistical analysis; Bengali speech corpus development; age dependent; continuous word recognition performance; hidden Markov model toolkit; language dependent; older group; phoneme recognition; speaker independent continuous automatic speech recognition system; statistical modeling technique; triphone labeled speech corpora; younger group; Bengali speech corpus; HTK; SPHINX; Speech labeling; Speech recognition;
Conference_Title :
Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
Conference_Location :
Hsinchu
Print_ISBN :
978-1-4577-0930-2
DOI :
10.1109/ICSDA.2011.6085979