Title :
Deep neural network acoustic modeling for native and non-native Mandarin speech recognition
Author :
Xin Chen ; Jian Cheng
Author_Institution :
Knowledge Technol., Pearson, Menlo Park, CA, USA
Abstract :
Recently, Context-Dependent Deep Neural Network Hidden Markov Models (CD-DNN-HMMs) have proved highly successful at improving speech recognition accuracy over traditional Gaussian Mixture Model (GMM)-HMMs. In this paper, we applied CD-DNN-HMMs to an automatic Spoken Chinese Test (SCT) and improved speech recognition performance significantly. We experimented on a large amount of native and non-native data collected for the SCT and evaluated the effect of two network activation functions for acoustic modeling: sigmoid and Rectified Linear Unit (ReLU). We investigated accent adaptation on a Linear Input Network (LIN)-ReLU DNN structure to enhance non-native speech recognition accuracy. Furthermore, feature-space Discriminative Linear Regression (fDLR)-based accent adaptation was applied at the bottom of the system and evaluated. In our task, we observed that applying prior probabilities was critical, that pre-training did not improve performance, and that ReLU DNNs and sigmoid DNNs performed similarly. Overall, CD-DNN-HMM acoustic models yielded a 20% relative improvement on our non-native task and a 47% relative improvement on the native task over traditional GMM-HMM models. Accent adaptation contributed an additional 2.7% relative gain on the non-native task.
Keywords :
hidden Markov models; neural nets; regression analysis; speech recognition; CD-DNN-HMM; GMM-HMM; Gaussian mixture model; LIN; ReLU DNN network structure; SCT; acoustic modeling; automatic spoken Chinese test; context-dependent deep neural network hidden Markov models; deep neural network acoustic modeling; fDLR; feature-space discriminative linear regression based accent adaptation; linear input network; native Mandarin speech recognition; nonnative Mandarin speech recognition; rectified linear unit; sigmoid DNN; speech recognition accuracy; Acoustics; Adaptation models; Artificial neural networks; Hidden Markov models; Speech recognition; Training; Mandarin; accent adaptation; nonnative speech recognition; spoken language assessment
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936617