Title :
Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise
Author :
Wöllmer, Martin ; Zhang, Zixing ; Weninger, Felix ; Schuller, Björn ; Rigoll, Gerhard
Author_Institution :
BMW Group, Munich, Germany
Abstract :
The recognition of spontaneous speech in highly variable noise is known to be a challenge, especially at low signal-to-noise ratios (SNR). In this paper, we investigate the effect of applying bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks for speech feature enhancement in noisy conditions. BLSTM networks tend to prevail over conventional neural network architectures whenever the recognition or regression task relies on an intelligent exploitation of temporal context information. We show that BLSTM networks are well-suited for mapping from noisy to clean speech features and that the obtained recognition performance gain is partly complementary to improvements via additional techniques such as speech enhancement by non-negative matrix factorization and probabilistic feature generation by Bottleneck-BLSTM networks. Compared to simple multi-condition training or feature enhancement via standard recurrent neural networks, our BLSTM-based feature enhancement approach leads to remarkable gains in word accuracy in a highly challenging task of recognizing spontaneous speech at SNR levels between -6 and 9 dB.
Keywords :
matrix decomposition; recurrent neural nets; regression analysis; speech enhancement; speech recognition; Bottleneck-BLSTM networks; bidirectional LSTM networks; bidirectional long short-term memory; highly nonstationary noise; highly variable noise; multicondition training; neural network architectures; noisy conditions; nonnegative matrix factorization; probabilistic feature generation; recognition performance gain; recognition task; recurrent neural networks; regression task; signal-to-noise ratios; speech feature enhancement; spontaneous speech; temporal context information; word accuracy; Feature extraction; Noise; Noise measurement; Speech; Speech enhancement; Speech recognition; Training; Long Short-Term Memory; feature enhancement; non-negative matrix factorization
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638983