Exploring deep neural networks and deep autoencoders in reverberant speech recognition

Author

Mimura, Masato ; Sakai, Shin'ichi ; Kawahara, Toshio

Author_Institution

Acad. Center for Comput. & Media Studies, Kyoto Univ., Sakyo-ku, Kyoto, Japan

fYear

2014

fDate

12-14 May 2014

Firstpage

197

Lastpage

201

Abstract

We propose an approach to reverberant speech recognition that adopts deep learning in the front end as well as the back end of the system. At the front end, we adopt a deep autoencoder (DAE) to enhance the speech feature parameters, and speech recognition is performed using DNN-HMM acoustic models at the back end. The system was evaluated on simulated and real reverberant speech data sets. On average, the DNN-HMM system trained on the multi-condition training data outperformed the MLLR-adapted GMM-HMM system trained on the same data. The feature enhancement with the DAE contributed to the improvement of recognition accuracy, especially in more adverse conditions. We also performed an unsupervised adaptation of the DNN-HMM models to the test data enhanced by the DAE and achieved improvements in word accuracy in all reverberation conditions of the test data.
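
The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy dimensions (FEAT_DIM, CONTEXT, N_STATES), the context-window splicing, the sigmoid hidden layers, and the random untrained weights. HMM decoding, training, MLLR, and unsupervised adaptation are omitted.

```python
import math
import random

def splice(frames, context):
    """Stack each frame with `context` neighbours on both sides (edge frames are repeated)."""
    n = len(frames)
    out = []
    for t in range(n):
        window = []
        for k in range(-context, context + 1):
            window.extend(frames[min(max(t + k, 0), n - 1)])
        out.append(window)
    return out

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a real system would learn these from data."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def build(sizes, rng):
    """Create one (weights, biases) pair per consecutive pair of layer sizes."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def affine(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def identity(v):
    return v

def forward(layers, x, output):
    """Sigmoid hidden layers followed by the given output function."""
    for layer in layers[:-1]:
        x = sigmoid(affine(layer, x))
    return output(affine(layers[-1], x))

rng = random.Random(0)
FEAT_DIM, CONTEXT, N_STATES = 4, 1, 3        # hypothetical toy sizes
SPLICED = FEAT_DIM * (2 * CONTEXT + 1)

# Front end: maps a window of reverberant frames to one enhanced frame.
dae = build([SPLICED, 8, FEAT_DIM], rng)
# Back end: maps a window of enhanced frames to HMM state posteriors.
dnn = build([SPLICED, 8, N_STATES], rng)

reverberant = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]
enhanced = [forward(dae, x, identity) for x in splice(reverberant, CONTEXT)]
posteriors = [forward(dnn, x, softmax) for x in splice(enhanced, CONTEXT)]
```

In a full recognizer, the per-frame posteriors would be passed to an HMM decoder; the sketch stops at the point where the enhanced features feed the acoustic model.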

Keywords

Gaussian processes; acoustic signal processing; encoding; hidden Markov models; mixture models; neural nets; reverberation; speech recognition; unsupervised learning; DAE; DNN-HMM acoustic models; DNN-HMM system training; Gaussian mixture model; data testing; deep autoencoders; deep learning; deep-neural networks; hidden Markov models; multicondition training data; real reverberant speech data sets; recognition accuracy improvement; reverberant speech recognition; reverberation conditions; simulated reverberant speech data sets; speech feature parameter enhancement; system back end; system front end; unsupervised DNN-HMM models; word accuracy improvement; Accuracy; Hidden Markov models; Microphones; Neural networks; Speech; Speech recognition; Training; Deep Autoencoder (DAE); Deep Neural Networks (DNN); reverberant speech recognition;

fLanguage

English

Publisher

ieee

Conference_Titel

Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014 4th Joint Workshop on

Conference_Location

Villers-lès-Nancy

Type

conf

DOI

10.1109/HSCMA.2014.6843279

Filename

6843279