• DocumentCode
    3429439
  • Title
    Joint acoustic and spectral modeling for speech dereverberation using non-negative representations
  • Author
    Mohammadiha, Nasser; Smaragdis, Paris; Doclo, Simon
  • Author_Institution
    Dept. of Med. Phys. & Acoust., Univ. of Oldenburg, Oldenburg, Germany
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4410
  • Lastpage
    4414
  • Abstract
    This paper proposes a single-channel speech dereverberation method that enhances the spectrum of the reverberant speech signal. The proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). To exploit the spectral structure of speech, we propose to model the speech spectrum using non-negative matrix factorization (NMF), which is incorporated directly into the N-CTF model, resulting in a new cost function. We derive new estimators for the model parameters by minimizing this cost function. Additionally, to investigate the effect of speech temporal dynamics on dereverberation, we use a frame stacking method and derive the corresponding optimal estimators. Experiments are performed using two measured RIRs, and the proposed method is compared with a state-of-the-art dereverberation method that enhances the speech spectrum. Experimental results show that the proposed method improves instrumental speech quality measures, and that exploiting speech temporal dynamics is beneficial in severe reverberation conditions.
  • Keywords
    matrix decomposition; reverberation; speech processing; transfer functions; transient response; N-CTF; RIR; convolutive transfer function; cost function; frame stacking method; instrumental speech quality measures; magnitude spectrograms; non-negative approximation; non-negative matrix factorization; reverberant speech signal; room impulse response; single-channel speech dereverberation method; speech spectral structure; speech spectrum; speech temporal dynamics; Acoustics; Cost function; Dictionaries; Spectrogram; Speech; Speech enhancement; Non-negative convolutive transfer function; dictionary-based processing; non-negative matrix factorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178804
  • Filename
    7178804
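The abstract combines two non-negative models: an N-CTF model, in which each frequency bin of the reverberant magnitude spectrogram is a convolution over frames of the speech magnitude spectrogram with an RIR magnitude envelope, and an NMF model of the speech spectrogram. The following is a minimal sketch of how these pieces fit together, not the paper's estimators. It assumes the generalized Kullback-Leibler divergence as the cost, Lee-Seung-style multiplicative updates, and a fixed, pre-trained speech dictionary `W`. It also omits frame stacking. The function names (`nctf_forward`, `kl_div`, `dereverb_nctf_nmf`) and parameters (`L`, `n_iter`) are hypothetical.

```python
import numpy as np


def nctf_forward(H, S):
    """Assumed N-CTF forward model:
    Xhat[k, t] = sum_l H[k, l] * S[k, t - l], truncated to the T frames of S.
    H: (K, L) non-negative RIR magnitudes per frequency bin and frame lag.
    S: (K, T) non-negative speech magnitude spectrogram."""
    K, L = H.shape
    T = S.shape[1]
    Xhat = np.zeros((K, T))
    for l in range(min(L, T)):
        # Lag l delays the speech spectrogram by l frames.
        Xhat[:, l:] += H[:, [l]] * S[:, :T - l]
    return Xhat


def kl_div(X, Xhat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(X || Xhat) (assumed cost)."""
    return float(np.sum(X * np.log((X + eps) / (Xhat + eps)) - X + Xhat))


def dereverb_nctf_nmf(X, W, L, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical joint estimator of speech S = W @ A (W fixed) and RIR
    magnitudes H. Each multiplicative update is designed not to increase
    kl_div(X, nctf_forward(H, W @ A)), since Xhat is linear in H and in A."""
    K, T = X.shape
    if L > T:
        raise ValueError("RIR length L must not exceed the number of frames T")
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(K, L))
    A = rng.uniform(0.1, 1.0, size=(W.shape[1], T))
    costs = [kl_div(X, nctf_forward(H, W @ A))]
    for _ in range(n_iter):
        # RIR update (speech fixed).
        S = W @ A
        ratio = X / (nctf_forward(H, S) + eps)
        for l in range(L):
            num = np.sum(ratio[:, l:] * S[:, :T - l], axis=1)
            den = np.sum(S[:, :T - l], axis=1) + eps
            H[:, l] *= num / den
        # NMF activation update (RIR and dictionary fixed).
        S = W @ A
        ratio = X / (nctf_forward(H, S) + eps)
        P = np.zeros((K, T))  # P[k, t] = sum_l H[k, l] * ratio[k, t + l]
        Q = np.zeros((K, T))  # Q[k, t] = sum_l H[k, l] over lags with t + l < T
        for l in range(L):
            P[:, :T - l] += H[:, [l]] * ratio[:, l:]
            Q[:, :T - l] += H[:, [l]]
        A *= (W.T @ P) / (W.T @ Q + eps)
        costs.append(kl_div(X, nctf_forward(H, W @ A)))
    return W @ A, H, costs
```

A usage example on synthetic data: build `X = nctf_forward(H_true, W @ A_true)` from an exponentially decaying `H_true`, then call `S_hat, H_hat, costs = dereverb_nctf_nmf(X, W, L)`. `S_hat` is the estimated dereverberated magnitude spectrogram. Note that the factorization is only identifiable up to scaling between `H` and the activations `A`, so a practical system would add a normalization or prior, which this sketch leaves out.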