DocumentCode :
3484629
Title :
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
Author :
Seide, Frank ; Li, Gang ; Chen, Xie ; Yu, Dong
Author_Institution :
Microsoft Res. Asia, Beijing, China
fYear :
2011
fDate :
11-15 Dec. 2011
Firstpage :
24
Lastpage :
29
Abstract :
We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. We recently showed that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
Keywords :
Gaussian processes; hidden Markov models; neural nets; speaker recognition; speech synthesis; Gaussian-mixture HMM; context-dependent deep neural networks; conversational speech transcription; hidden Markov model; speaker-independent transcription; speech-to-text transcription; tied triphone states; word error rate; Accuracy; Adaptation models; Feature extraction; Hidden Markov models; Training; Transforms; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
Type :
conf
DOI :
10.1109/ASRU.2011.6163899
Filename :
6163899