Title :
Target speaker separation in a multisource environment using speaker-dependent postfilter and noise estimation
Author :
Mowlaee, Pejman ; Saeidi, Rahim
Author_Institution :
Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
Abstract :
In this paper, we present a novel system for enhancing target speech corrupted by non-stationary, real-life noise. The proposed system consists of a spatial beamformer, based on a time delay of arrival estimated with GCC-PHAT, followed by three postfilters applied sequentially: a Wiener filter, a minimum mean square error (MMSE) estimator of the log-amplitude, and a model-driven postfilter (MDP) that relies on speech signal statistics captured by a Gaussian mixture model of the target speaker. The beamformer accounts for directional interference, the MMSE speech enhancement suppresses the stationary background noise, and the MDP helps suppress the non-stationary sources in the binaural mixture. In our evaluation, multiple objective quality metrics are used to report the speech enhancement and separation performance, averaged over the CHiME development set. The proposed system performs better than standard state-of-the-art techniques and shows performance comparable to that of other systems submitted to the CHiME challenge. More precisely, it successfully suppresses the non-stationary interfering sources at different SNR levels, as supported by its relatively high signal-to-interference ratio scores.
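The abstract names GCC-PHAT as the time-delay estimator that drives the beamformer. As a rough illustration of that one step only, the following is a minimal sketch of GCC-PHAT delay estimation between two channels. It is not the authors' implementation: the function names `fft` and `gcc_phat_delay`, the `max_lag` parameter, and the choice of a pure standard-library FFT are all assumptions made here for self-containment.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is returned unscaled (caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate, in samples, how much `sig` lags `ref` (hypothetical helper)."""
    # Zero-pad to a power of two long enough for a linear cross-correlation.
    n = 1
    while n < len(sig) + len(ref):
        n *= 2
    S = fft(list(sig) + [0.0] * (n - len(sig)))
    R = fft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-spectrum with PHAT weighting: keep the phase, discard the magnitude.
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    weighted = [c / (abs(c) + 1e-12) for c in cross]
    # Back to the lag domain; the peak location is the delay estimate.
    cc = [v.real / n for v in fft(weighted, inverse=True)]
    # Search lags in [-max_lag, max_lag]; negative lags wrap to the end of cc.
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cc[lag % n])
```

A positive return value means `sig` arrives later than `ref`. In a two-microphone setup such as the binaural mixture mentioned above, an estimate of this kind could be used to time-align the channels before summing them, although the abstract does not specify the beamformer design.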
Keywords :
Gaussian processes; Wiener filters; array signal processing; least mean squares methods; speaker recognition; speech enhancement; CHiME challenge; CHiME development set; GCC-PHAT-estimated time-delay of arrival; MDP; MMSE speech enhancement; Wiener filter; log-amplitude; minimum mean square error estimator; model-driven postfilter; multisource environment; noise estimation; nonstationary interfering sources; nonstationary real-life noise scenario; postfilters; signal-to-interference-ratio; spatial beamformer; speaker-dependent postfilter; speech signal statistics; target speaker Gaussian mixture model; target speaker separation; target speech; Estimation; Noise measurement; Signal to noise ratio; Speech; Speech enhancement; Multisource noise; non-stationary noise; speech enhancement; speech quality;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639071