Title :
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
Author :
Jinyu Li ; Dong Yu ; Jui-Ting Huang ; Yifan Gong
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Abstract :
The context-dependent deep neural network hidden Markov model (CD-DNN-HMM) is a recently proposed acoustic model that has significantly outperformed Gaussian mixture model (GMM)-HMM systems in many large vocabulary speech recognition (LVSR) tasks. In this paper, we present our strategy for using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework. We show that DNNs provide the flexibility to use arbitrary features. By using Mel-scale log filter-bank features, we not only achieve higher recognition accuracy than with MFCCs but can also formulate the mixed-bandwidth training problem as a missing-feature problem, in which several feature dimensions have no value when narrowband speech is presented. This treatment makes training CD-DNN-HMMs with mixed-bandwidth data easy, since no bandwidth extension is needed. Our experiments on voice search data indicate that the proposed solution not only provides higher recognition accuracy for wideband speech but also allows the same CD-DNN-HMM to recognize mixed-bandwidth speech. By exploiting mixed-bandwidth training data, the CD-DNN-HMM outperforms an fMPE+BMMI-trained GMM-HMM, which cannot benefit from narrowband data, by 18.4%.
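The missing-feature formulation described in the abstract can be illustrated with a minimal sketch. It assumes that narrowband frames supply only the lower filter-bank bins and that the remaining upper bins are filled with a constant placeholder. The bin counts (29 and 22), the fill value of 0.0, and the function names are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

# Hypothetical sizes, not taken from the abstract.
NUM_WIDEBAND_BINS = 29    # filters covering the full wideband range
NUM_NARROWBAND_BINS = 22  # filters that lie within the narrowband range


def pad_narrowband(features, num_wideband_bins=NUM_WIDEBAND_BINS, fill_value=0.0):
    """Extend narrowband log filter-bank frames to the wideband dimension.

    features: array of shape (num_frames, num_narrowband_bins).
    The upper bins, which narrowband audio cannot fill, are set to fill_value
    (an assumed placeholder for the "missing" dimensions).
    """
    features = np.asarray(features, dtype=np.float32)
    num_frames, num_bins = features.shape
    if num_bins > num_wideband_bins:
        raise ValueError("narrowband features have more bins than the wideband layout")
    padded = np.full((num_frames, num_wideband_bins), fill_value, dtype=np.float32)
    padded[:, :num_bins] = features
    return padded


def mix_training_data(wideband, narrowband):
    """Stack wideband and padded narrowband frames into one training matrix."""
    wideband = np.asarray(wideband, dtype=np.float32)
    return np.vstack([wideband, pad_narrowband(narrowband, wideband.shape[1])])
```

Because every frame ends up with the same dimensionality, a single network input layer can consume both kinds of data without a separate bandwidth-extension step.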
Keywords :
channel bank filters; hidden Markov models; neural nets; speech recognition; vocabulary; CD-DNN-HMM; LVSR; acoustic model; context-dependent deep neural network hidden Markov model; feature dimensions; large vocabulary speech recognition; mel scale log filter bank features; mixed-bandwidth training data; narrowband speech; voice search data; wideband speech recognition; Filter banks; Narrowband; Speech; Training; Training data; Wideband; deep neural network; log filter bank; mixed-bandwidth
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424210