Regional Information Center for Science and Technology - Deep neural network acoustic modeling for native and non-native Mandarin speech recognition

DocumentCode :

134225

Title :

Deep neural network acoustic modeling for native and non-native Mandarin speech recognition

Author :

Xin Chen ; Jian Cheng

Author_Institution :

Knowledge Technol., Pearson, Menlo Park, CA, USA

fYear :

2014

fDate :

12-14 Sept. 2014

Firstpage :

Lastpage :

Abstract :

Recently, Context-Dependent Deep Neural Network Hidden Markov Models (CD-DNN-HMMs) have proved highly successful at improving speech recognition accuracy over traditional Gaussian Mixture Model (GMM)-HMMs. In this paper, we applied CD-DNN-HMMs to an automatic Spoken Chinese Test (SCT) and significantly improved speech recognition performance. We experimented on a large amount of native and non-native data collected for SCT and evaluated the effect of two network activation functions for acoustic modeling: sigmoid and the Rectified Linear Unit (ReLU). To improve non-native speech recognition accuracy, we investigated accent adaptation on a Linear Input Network (LIN)-ReLU DNN structure. We also applied and evaluated accent adaptation based on feature-space Discriminative Linear Regression (fDLR) at the bottom (input side) of the system. In our task, we observed that applying prior probabilities was critical, that pre-training did not improve performance, and that ReLU DNNs and sigmoid DNNs performed similarly. Overall, CD-DNN-HMM acoustic models achieved a 20% relative improvement on the non-native task and a 47% relative improvement on the native task over traditional GMM-HMM models. Accent adaptation contributed an additional 2.7% relative gain on the non-native task.
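For readers unfamiliar with the hybrid setup, the following minimal Python sketch illustrates three generic operations the abstract alludes to. It is not the authors' implementation, and every function name is hypothetical. First, it assumes that "applying prior probabilities" refers to the standard hybrid DNN-HMM step of dividing the DNN's state posteriors by state priors to obtain scaled likelihoods for the HMM decoder. Second, it assumes an LIN/fDLR-style adaptation is a trainable linear transform of the input features, initialized to the identity, placed in front of an otherwise frozen network. Real systems differ in details, such as whether the transform is shared across spliced context frames. Third, it assumes the reported gains are relative reductions of an error rate.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers sketching generic hybrid DNN-HMM operations;
# not the method or code used in the paper.


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over per-state (senone) logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def estimate_priors(alignment: Sequence[int], num_states: int,
                    floor: float = 1e-8) -> List[float]:
    """Estimate state priors P(s) as relative frame counts from an alignment.

    States never seen in the alignment are floored to avoid log(0).
    """
    counts = [0] * num_states
    for s in alignment:
        counts[s] += 1
    total = len(alignment)
    return [max(c / total, floor) for c in counts]


def scaled_log_likelihoods(logits: Sequence[float],
                           priors: Sequence[float]) -> List[float]:
    """Turn DNN posteriors into scaled log-likelihoods for HMM decoding.

    Bayes' rule: p(x|s) = P(s|x) p(x) / P(s). Since p(x) is the same for
    every state, log P(s|x) - log P(s) can stand in for log p(x|s).
    """
    return [lp - math.log(p) for lp, p in zip(log_softmax(logits), priors)]


def identity_lin(dim: int) -> Tuple[List[List[float]], List[float]]:
    """Initialize an input-layer linear transform (W, b) to the identity."""
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    return W, [0.0] * dim


def apply_lin(W: Sequence[Sequence[float]], b: Sequence[float],
              x: Sequence[float]) -> List[float]:
    """Compute y = W x + b on a feature vector before it enters the DNN."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error reduction, e.g. 0.2 means a 20% relative improvement."""
    return (baseline_error - new_error) / baseline_error
```

Starting the input transform at the identity means the adapted system initially behaves exactly like the unadapted one. Only the transform's parameters would then be updated on accented data, which keeps the number of adapted parameters small.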

Keywords :

hidden Markov models; neural nets; regression analysis; speech recognition; CD-DNN-HMM; GMM-HMM; Gaussian mixture model; LIN; ReLU DNN network structure; SCT; acoustic modeling; automatic spoken Chinese test; context-dependent deep neural network hidden Markov models; deep neural network acoustic modeling; fDLR; feature-space discriminative linear regression based accent adaptation; linear input network; native Mandarin speech recognition; nonnative Mandarin speech recognition; rectified linear unit; sigmoid DNN; speech recognition accuracy; Acoustics; Adaptation models; Artificial neural networks; Hidden Markov models; Speech recognition; Training; Mandarin; accent adaptation; deep neural network acoustic modeling; nonnative speech recognition; rectified linear unit; spoken language assessment;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Conference_Location :

Singapore

Type :

conf

DOI :

10.1109/ISCSLP.2014.6936617

Filename :

6936617

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=134225