Title :
Variable-activation and variable-input deep neural network for robust speech recognition
Author :
Rui Zhao ; Jinyu Li ; Yifan Gong
Author_Institution :
Microsoft Search Technol. Center Asia, Beijing, China
Abstract :
In a previous study, we proposed the variable-component deep neural network (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). VCDNN models the components of the DNN as a set of polynomial functions of environment variables, more specifically the signal-to-noise ratio (SNR). We refined VCDNN on two types of DNN components: (1) the weighting matrix and bias, and (2) the output of each layer. These two methods are called variable-parameter DNN (VPDNN) and variable-output DNN (VODNN), respectively. Although both methods achieved good gains over the standard DNN, they doubled the number of parameters even with only the first-order environment variable. In this study, we propose two new types of VCDNN, namely variable-activation DNN (VADNN) and variable-input DNN (VIDNN). The environment variable is applied to the hidden-layer activation function in VADNN and directly to the input in VIDNN. Both DNNs add only a negligible number of parameters compared to the standard DNN. Experimental results on the Aurora4 task show that both methods are effective, and that VIDNN outperforms all other variants of VCDNN, with a 7.69% relative word error reduction from the standard DNN and the smallest increase in the number of parameters.
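The two proposed variants can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that VIDNN appends a scalar SNR value to the input feature vector and that VADNN makes the slope and bias of a sigmoid polynomial functions of SNR. All function names, the coefficient layout, and the layer sizes are hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def vidnn_input(features, snr):
    """VIDNN-style input (assumed form): append the environment variable to the features."""
    return list(features) + [snr]


def vadnn_activation(z, snr, slope_coeffs, bias_coeffs):
    """VADNN-style activation for one hidden unit (assumed form).

    The sigmoid's slope and bias are polynomials in snr:
    slope = sum_j slope_coeffs[j] * snr**j, and likewise for bias.
    """
    slope = sum(c * snr ** j for j, c in enumerate(slope_coeffs))
    bias = sum(c * snr ** j for j, c in enumerate(bias_coeffs))
    return sigmoid(slope * z + bias)


def layer_params(n_in, n_out):
    """Parameter count of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


def vidnn_extra_params(n_in, n_out):
    """Extra first-layer parameters when one scalar is appended to the input."""
    return layer_params(n_in + 1, n_out) - layer_params(n_in, n_out)
```

With a zeroth-order polynomial (slope 1, bias 0), `vadnn_activation` reduces to the ordinary sigmoid, and `vidnn_extra_params` shows that appending one input dimension adds only one weight per first-layer unit. Under these assumptions, that is consistent with the abstract's claim of a negligible parameter increase, in contrast to the doubling reported for VPDNN and VODNN.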
Keywords :
hidden Markov models; matrix algebra; neural nets; polynomials; speech recognition; Aurora4 task; CD-DNN-HMM; SNR; VADNN; VCDNN; VIDNN; VODNN; context-dependent deep neural network hidden Markov model; first-order environment variable; hidden layer activation function; polynomial functions; robust speech recognition; signal-to-noise ratio; variable activation DNN; variable input DNN; variable-component deep neural network; variable-output DNN; variable-parameter DNN; weighting matrix; Filter banks; Hidden Markov models; Noise measurement; Signal to noise ratio; Speech; Standards; Vectors; deep neural network; robust speech recognition; variable activation; variable component; variable input;
Conference_Title :
Spoken Language Technology Workshop (SLT), 2014 IEEE
DOI :
10.1109/SLT.2014.7078632