Joint acoustic and spectral modeling for speech dereverberation using non-negative representations

Author

Mohammadiha, Nasser ; Smaragdis, Paris ; Doclo, Simon

Author_Institution

Dept. of Med. Phys. & Acoust., Univ. of Oldenburg, Oldenburg, Germany

fYear

2015

fDate

19-24 April 2015

Firstpage

4410

Lastpage

4414

Abstract

This paper proposes a single-channel speech dereverberation method that enhances the spectrum of the reverberant speech signal. The proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). To exploit the spectral structure of speech, we model the speech spectrum using non-negative matrix factorization (NMF) and use this model directly in the N-CTF model, which results in a new cost function. We derive new estimators for the parameters by minimizing this cost function. Additionally, to investigate the effect of speech temporal dynamics on dereverberation, we use a frame stacking method and derive the corresponding optimal estimators. Experiments are performed for two measured RIRs, and the proposed method is compared with a state-of-the-art dereverberation method that enhances the speech spectrum. The results show that the proposed method improves instrumental speech quality measures and that using speech temporal dynamics is beneficial in severe reverberation conditions.
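
The combination of models described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no equations, so the forward model (a per-frequency non-negative convolution over frames), the factorization S = W A, the choice of the generalized Kullback-Leibler divergence as cost, and all names (`nmf_spectrogram`, `nctf_predict`, `generalized_kl`, `W`, `A`, `H`) are assumptions made for illustration. The estimators and the frame stacking step are not reproduced here.

```python
import math

def nmf_spectrogram(W, A):
    """Assumed NMF speech model: S = W A, with a K x R dictionary W
    and R x T activations A (all entries non-negative)."""
    K, R, T = len(W), len(A), len(A[0])
    return [[sum(W[k][r] * A[r][t] for r in range(R)) for t in range(T)]
            for k in range(K)]

def nctf_predict(H, S):
    """Assumed N-CTF forward model, applied independently per frequency bin k:
    X_hat[k][t] = sum over tau of H[k][tau] * S[k][t - tau],
    where H[k] is the non-negative RIR magnitude envelope over L frames."""
    K, T, L = len(S), len(S[0]), len(H[0])
    return [[sum(H[k][tau] * S[k][t - tau] for tau in range(min(L, t + 1)))
             for t in range(T)]
            for k in range(K)]

def generalized_kl(X, X_hat, eps=1e-12):
    """Generalized KL divergence between two non-negative spectrograms
    (an assumed choice of cost; eps guards against log(0))."""
    total = 0.0
    for row, row_hat in zip(X, X_hat):
        for x, y in zip(row, row_hat):
            total += x * math.log((x + eps) / (y + eps)) - x + y
    return total

# Toy example: 2 frequency bins, 2 dictionary atoms, 3 frames, RIR length 2.
W = [[1.0, 0.0], [0.0, 2.0]]
A = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
H = [[1.0, 0.5], [1.0, 0.25]]

S = nmf_spectrogram(W, A)      # clean-speech magnitude model
X_hat = nctf_predict(H, S)     # modeled reverberant magnitudes
cost = generalized_kl(X_hat, X_hat)  # zero when the model fits exactly
```

In a dereverberation setting, the observed reverberant spectrogram would take the place of the first argument of `generalized_kl`, and `W`, `A`, and `H` would be estimated by minimizing that cost under non-negativity constraints.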

Keywords

matrix decomposition; reverberation; speech processing; transfer functions; transient response; N-CTF; RIR; convolutive transfer function; cost function; frame stacking method; instrumental speech quality measures; magnitude spectrograms; non-negative approximation; non-negative matrix factorization; reverberant speech signal; room impulse response; single-channel speech dereverberation method; speech spectral structure; speech spectrum; speech temporal dynamics; Acoustics; Dictionaries; Spectrogram; Speech; Speech enhancement; Non-negative convolutive transfer function; dictionary-based processing

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178804

Filename

7178804