Regional Information Center for Science and Technology - Model-Based Expectation-Maximization Source Separation and Localization

DocumentCode :

1295090

Title :

Model-Based Expectation-Maximization Source Separation and Localization

Author :

Mandel, Michael I. ; Weiss, Ron J. ; Ellis, Daniel P. W.

Author_Institution :

Dept. of Electr. Eng., Columbia Univ., New York, NY, USA

Volume :

Issue :

Year :

2010

Firstpage :

382

Lastpage :

394

Abstract :

This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single-source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
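
The idea of clustering spectrogram points by interaural cues and reading the posteriors as soft masks can be illustrated with a much simpler toy than the model in the paper. The sketch below is an assumption-laden simplification, not the MESSL algorithm: it computes the interaural level difference (ILD) and interaural phase difference (IPD) at each time-frequency point, then fits a one-dimensional Gaussian mixture to the ILD values only, with no IPD term and no mixture over delays. The function names `interaural_params` and `em_soft_masks` are hypothetical and chosen for illustration.

```python
import numpy as np

def interaural_params(left, right, eps=1e-12):
    """Return ILD (dB) and IPD (radians) for each time-frequency point.

    `left` and `right` are complex spectrograms of equal shape.
    """
    ratio = (left + eps) / (right + eps)
    ild = 20.0 * np.log10(np.abs(ratio))
    ipd = np.angle(ratio)
    return ild, ipd

def em_soft_masks(ild, n_sources=2, n_iter=50):
    """Toy EM: fit a 1-D Gaussian mixture to ILD values.

    Returns the per-source posteriors reshaped as soft masks, plus the
    fitted component means. This is a simplification for illustration.
    """
    x = ild.ravel()
    # Initialise the means at evenly spaced quantiles of the data.
    mu = np.quantile(x, (np.arange(n_sources) + 0.5) / n_sources)
    var = np.full(n_sources, x.var() + 1e-6)
    pi = np.full(n_sources, 1.0 / n_sources)
    for _ in range(n_iter):
        # E-step: posterior probability of each source at each point,
        # computed in the log domain for numerical stability.
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = post.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    masks = post.T.reshape((n_sources,) + ild.shape)
    return masks, mu
```

Multiplying one channel's spectrogram by `masks[k]` and inverting the transform would give a rough estimate of source `k`. A faithful implementation would additionally model the IPD under candidate delays per source, as the abstract describes.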

Keywords :

acoustic signal processing; expectation-maximisation algorithm; source separation; interaural phase; maximum-likelihood parameters; model-based expectation-maximization source separation; multiple sound sources; probabilistic model; probabilistic spectrogram masks; reverberant two-channel recording; spectrogram points; speech quality; maximum-likelihood estimation; speech enhancement; time–frequency masking; underdetermined source separation

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2029711

Filename :

5200357

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1295090