Regional Information Center for Science and Technology - Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

DocumentCode :

3484629

Title :

Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

Author :

Seide, Frank ; Li, Gang ; Chen, Xie ; Yu, Dong

Author_Institution :

Microsoft Res. Asia, Beijing, China

fYear :

2011

fDate :

11-15 Dec. 2011

Firstpage :

Lastpage :

Abstract :

We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we showed that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
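
The "one third" figure in the abstract is a relative reduction, not an absolute one. As a minimal sketch of that arithmetic, the snippet below recomputes it from the two word error rates quoted above. The function name `relative_wer_reduction` and the variable names are illustrative assumptions and do not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Word error rates quoted in the abstract, in percent.
gmm_hmm_wer = 27.4     # discriminatively trained Gaussian-mixture HMMs with HLDA features
cd_dnn_hmm_wer = 18.5  # CD-DNN-HMMs

reduction = relative_wer_reduction(gmm_hmm_wer, cd_dnn_hmm_wer)
print(f"Absolute reduction: {gmm_hmm_wer - cd_dnn_hmm_wer:.1f} points")
print(f"Relative reduction: {reduction:.1%}")
```

The absolute drop is 8.9 percentage points, which is roughly 32.5% of the 27.4% baseline, consistent with "as much as one third".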

Keywords :

Gaussian processes; hidden Markov models; neural nets; speaker recognition; speech synthesis; Gaussian-mixture HMM; context-dependent deep neural networks; conversational speech transcription; hidden Markov model; speaker-independent transcription; speech-to-text transcription; tied triphone states; word error rate; Accuracy; Adaptation models; Feature extraction; Hidden Markov models; Training; Transforms; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on

Conference_Location :

Waikoloa, HI

Print_ISBN :

978-1-4673-0365-1

Electronic_ISBN :

978-1-4673-0366-8

Type :

conf

DOI :

10.1109/ASRU.2011.6163899

Filename :

6163899

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3484629