DocumentCode :
178889
Title :
Single-channel speech separation with memory-enhanced recurrent neural networks
Author :
Weninger, Felix ; Eyben, Florian ; Schuller, Björn
Author_Institution :
Machine Intell. & Signal Process. Group, Tech. Univ. München, München, Germany
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
3709
Lastpage :
3713
Abstract :
In this paper, we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude domain soft mask is constructed from these features. Extensive tests are run on 73k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationary sources and convolutive noise from the Aachen Room Impulse Response database. The proposed method is shown to provide superior noise reduction at low signal-to-noise ratios while creating very few artifacts at higher signal-to-noise ratios, thereby outperforming unsupervised magnitude domain spectral subtraction by a large margin in terms of source-distortion ratio.
Keywords :
acoustic convolution; learning (artificial intelligence); recurrent neural nets; reverberation; speech enhancement; Aachen Room Impulse Response database; audio-visual interest corpus; clean speech prediction; convolutive noise; long-term memory recurrent neural networks; magnitude domain soft mask; memory-enhanced recurrent neural network training; noise reduction; noise speech feature prediction; noisy utterances; nonstationary sources; real noise recordings; reverberated utterances; short-term memory recurrent neural networks; signal-to-noise ratios; single-channel speech separation; source-distortion ratio; speech enhancement; spontaneous-emotionally colored speech; stationary sources; Estimation; Noise; Noise measurement; Recurrent neural networks; Speech; Speech enhancement; Training; Long Short-Term Memory; Speech enhancement; recurrent neural networks; speech separation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854294
Filename :
6854294