Regional Information Center for Science and Technology - A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction

DocumentCode :

61131

Title :

A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction

Author :

Souden, Mehrez ; Araki, Shoko ; Kinoshita, Keisuke ; Nakatani, Tomohiro ; Sawada, Hiroshi

Author_Institution :

NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan

Volume :

Issue :

fYear :

2013

fDate :

Sept. 2013

Firstpage :

1913

Lastpage :

1928

Abstract :

We propose a new framework for joint multichannel speech source separation and acoustic noise reduction. In this framework, we start by formulating the minimum-mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outline the importance of the estimation of the activities of the speakers. The latter is accurately achieved by introducing a latent variable that takes N+1 possible discrete states for a mixture of N speech signals plus additive noise. Each state characterizes the dominance of one of the N+1 signals. We determine the posterior probability of this latent variable, and show how it plays a twofold role in the MMSE-based speech enhancement. First, it allows the extraction of the second-order statistics of the noise and each of the speech signals from the noisy data. These statistics are needed to formulate the multichannel Wiener-based filters (including the minimum variance distortionless response). Second, it weighs the outputs of these linear filters to shape the spectral contents of the signals' estimates following the associated target speakers' activities. We use the spatial and spectral cues contained in the multichannel recordings of the sound mixtures to compute the posterior probability of this latent variable. The spatial cue is acquired by using the normalized observation vector whose distribution is well approximated by a Gaussian-mixture-like model, while the spectral cue can be captured by using a pre-trained Gaussian mixture model for the log-spectra of speech. The parameters of the investigated models and the speakers' activities (posterior probabilities of the different states of the latent variable) are estimated via expectation maximization. Experimental results including comparisons with the well-known independent component analysis and masking are provided to demonstrate the efficiency of the proposed framework.
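
The twofold role of the posterior described in the abstract can be illustrated with a minimal NumPy sketch for a single frequency bin. This is not the authors' implementation: the function name `posterior_weighted_mwf`, its parameters, the choice of a reference microphone, and the covariance estimator (a time average of the outer products weighted by the posterior) are all assumptions made for illustration. The posteriors `P` are taken as given here, whereas the paper estimates them via expectation maximization from spatial and spectral cues.

```python
import numpy as np

def posterior_weighted_mwf(Y, P, ref=0, eps=1e-10):
    """Illustrative sketch (hypothetical helper, not from the paper).

    Y : (M, T) complex STFT coefficients of M microphones at one frequency bin.
    P : (K, T) posterior probabilities of the K = N + 1 dominance states;
        each column is assumed to sum to 1.
    Returns the (K, T) posterior-weighted estimates at the reference
    microphone and the (M, K) matrix of filters.
    """
    M, T = Y.shape

    # Covariance of the observed mixture, lightly regularized for inversion.
    Phi_y = (Y @ Y.conj().T) / T + eps * np.eye(M)

    # Role 1: second-order statistics of each signal, extracted from the
    # noisy data by weighting each frame with the state posterior.
    # Phi[k] = (1/T) * sum_t P[k, t] * y_t y_t^H   (assumed estimator)
    Phi = np.einsum('kt,mt,nt->kmn', P, Y, Y.conj()) / T

    # Multichannel Wiener-type filter per state: w_k = Phi_y^{-1} Phi_k e_ref.
    W = np.linalg.solve(Phi_y, Phi[:, :, ref].T)  # columns are w_k

    # Linear filter outputs z_k(t) = w_k^H y(t).
    Z = W.conj().T @ Y

    # Role 2: weigh each filter output by the activity of its source.
    return P * Z, W
```

With this estimator the per-state covariances add up to the mixture covariance, so the filters sum (up to the regularization term) to the selection vector of the reference microphone; the posterior weighting is what then shapes each estimate according to the activity of its source.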

Keywords :

Gaussian distribution; Wiener filters; blind source separation; expectation-maximisation algorithm; higher order statistics; least mean squares methods; probability; signal denoising; speech enhancement; Gaussian-mixture-like model; MMSE-based speech enhancement; N speech signals plus additive noise mixture; N+1 possible discrete states; acoustic noise reduction; background noise; blind source separation; expectation maximization estimation; independent component analysis; joint multichannel speech source separation; latent variable; linear filters; log-spectra; minimum variance distortionless response; minimum-mean-square error based solution; multichannel MMSE-based framework; multichannel Wiener-based filters; multichannel recordings; multiple simultaneous speakers; normalized observation vector; posterior probability; pretrained Gaussian mixture model; second order statistics; signal estimates; sound mixtures; speech signals; Blind source separation; Wiener filter; microphone arrays; minimum variance distortionless response; minimum-mean-square error; noise reduction;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2013.2263137

Filename :

6516079

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=61131