DocumentCode :
3429439
Title :
Joint acoustic and spectral modeling for speech dereverberation using non-negative representations
Author :
Mohammadiha, Nasser ; Smaragdis, Paris ; Doclo, Simon
Author_Institution :
Dept. of Med. Phys. & Acoust., Univ. of Oldenburg, Oldenburg, Germany
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4410
Lastpage :
4414
Abstract :
This paper proposes a single-channel speech dereverberation method enhancing the spectrum of the reverberant speech signal. The proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). To utilize the speech spectral structure, we propose to model the speech spectrum using non-negative matrix factorization, which is directly used in the N-CTF model resulting in a new cost function. We derive new estimators for the parameters by minimizing the obtained cost function. Additionally, to investigate the effect of the speech temporal dynamics for dereverberation, we use a frame stacking method and derive optimal estimators. Experiments are performed for two measured RIRs and the performance of the proposed method is compared to the performance of a state-of-the-art dereverberation method enhancing the speech spectrum. Experimental results show that the proposed method improved instrumental speech quality measures, where using speech temporal dynamics was found to be beneficial in severe reverberation conditions.
Keywords :
matrix decomposition; reverberation; speech processing; transfer functions; transient response; N-CTF; RIR; convolutive transfer function; cost function; frame stacking method; instrumental speech quality measures; magnitude spectrograms; non-negative approximation; non-negative matrix factorization; reverberant speech signal; room impulse response; single-channel speech dereverberation method; speech spectral structure; speech spectrum; speech temporal dynamics; Acoustics; Cost function; Dictionaries; Spectrogram; Speech; Speech enhancement; Non-negative convolutive transfer function; dictionary-based processing; non-negative matrix factorization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178804
Filename :
7178804