DocumentCode :
1686035
Title :
Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription
Author :
Hang Su ; Gang Li ; Dong Yu ; Frank Seide
Author_Institution :
Microsoft Res. Asia, Beijing, China
fYear :
2013
Firstpage :
6664
Lastpage :
6668
Abstract :
We investigate backpropagation-based sequence training of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, for conversational speech transcription. Theoretically, sequence training integrates with backpropagation in a straightforward manner. In practice, however, we find that reasonable results require heuristics that point to a problem with lattice sparseness: the model must be adjusted to the updated numerator lattices by additional iterations of frame-based cross-entropy (CE) training; and to avoid distortions from “runaway” models, we can either add artificial silence arcs to the denominator lattices, or smooth the sequence objective with the frame-based one (F-smoothing). With the 309h Switchboard training set, the MMI objective achieves a relative word-error rate reduction of 11-15% over CE for matched test sets, and 10-17% for mismatched ones. This includes gains of 4-7% from realigned CE iterations. The BMMI and sMBR objectives gain less. With 2000h of data, gains are 2-9% after realigned CE iterations. Using GPGPUs, MMI is about 70% slower than CE training.
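The F-smoothing idea mentioned in the abstract can be illustrated with a minimal sketch: the sequence-level objective (e.g. MMI) is interpolated with the frame-based CE objective to stabilize training. The function names and the interpolation weight `h` below are illustrative assumptions, not the paper's notation.

```python
# Minimal sketch of F-smoothing (assumed formulation): interpolate the
# sequence objective with the frame-based CE objective. A weight of
# h = 1.0 corresponds to pure sequence training; smaller h mixes in
# more of the frame-based criterion to counter "runaway" models.

def f_smoothed_objective(seq_obj, ce_obj, h=0.9):
    """Interpolated training objective; `h` is an assumed hyperparameter."""
    return h * seq_obj + (1.0 - h) * ce_obj

def f_smoothed_gradient(seq_grad, ce_grad, h=0.9):
    """The same interpolation applied to the per-frame error signals
    that error back-propagation feeds into the network."""
    return [h * s + (1.0 - h) * c for s, c in zip(seq_grad, ce_grad)]
```

In this sketch the interpolation is applied identically to objective values and to gradients, which holds because the combined objective is a fixed linear combination of the two criteria.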
Keywords :
backpropagation; hidden Markov models; iterative methods; neural nets; smoothing methods; speech processing; BMMI; CD-DNN-HMM; CE iterations; CE training; F-smoothing; GPGPU; MMI objective; artificial silence arcs; back-propagation based sequence training; context-dependent deep network; context-dependent deep-neural-network HMM; conversational speech transcription; denominator lattices; error backpropagation; frame-based cross-entropy training; lattice sparseness; matched test sets; numerator lattices; runaway models; sMBR objectives; sequence objective; switchboard training set; word-error rate reduction; Biological neural networks; Hidden Markov models; Lattices; Speech; Speech recognition; Switches; Training;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638951
Filename :
6638951