Regional Information Center for Science and Technology (RICEST) - Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL

DocumentCode :

239659

Title :

Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL

Author :

Zohny, Zeinab ; Chambers, Jonathon

Author_Institution :

Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK

fYear :

2014

fDate :

20-23 Aug. 2014

Firstpage :

Lastpage :

Abstract :

The state-of-the-art model-based expectation maximization source separation and localization (MESSL) algorithm successfully separates multiple sound sources from only two-channel reverberant mixtures. Since MESSL achieves under-determined convolutive blind source separation by essentially clustering spectrogram points based on their interaural spatial cues, its performance degrades substantially when the speech sources are in close proximity. In this paper, we therefore enhance its performance by integrating robust clustering based on the Student's t-distribution. Compared with the Gaussian distribution originally used in MESSL for parametric modelling, this heavy-tailed distribution can potentially better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. The Student's t-distribution is used to model both the interaural phase difference (IPD) and the interaural level difference (ILD) in order to better represent the uncertainties introduced by noise, reverberation, and the statistical non-stationarity of speech signals. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ).
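
The heavy-tail argument in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the degrees of freedom nu = 3, and the test point are assumptions chosen for illustration, and the weight formula is the standard E-step weight used in EM fitting of a univariate Student's t-distribution rather than the authors' specific MESSL update.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


def t_em_weight(x, mu=0.0, sigma=1.0, nu=3.0):
    """Standard EM weight for a Student's t component (illustrative only).

    Points far from the mean receive a small weight, so they pull the
    estimated mean and scale less than they would under a Gaussian.
    """
    z = (x - mu) / sigma
    return (nu + 1.0) / (nu + z * z)


if __name__ == "__main__":
    outlier = 6.0  # hypothetical cue value six scale units from the mean
    print("Gaussian log-density at outlier:", gaussian_logpdf(outlier))
    print("Student's t log-density at outlier:", student_t_logpdf(outlier))
    print("EM weight at the mean:", t_em_weight(0.0))
    print("EM weight at the outlier:", t_em_weight(outlier))
```

Under these assumptions the Student's t model assigns the outlying point a much higher log-density than the Gaussian does, and the EM weight shrinks as the point moves away from the mean, which is the sense in which the clustering is called robust.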

Keywords :

Gaussian distribution; audio databases; blind source separation; pattern clustering; speech processing; Gaussian distribution; ILD; IPD; MESSL; PESQ; SDR; TIMIT database; close proximity; clustering spectrogram points; convolutive blind source separation; heavy tailed distribution; interaural level difference; interaural level modeling; interaural phase difference; interaural spatial cues; parametric modelling; perceptual evaluation of speech quality; phase cues; robust clustering; signal to distortion ratio; speech signals; speech sources; student t-distribution; Complexity theory; Gaussian distribution; Microphones; Robustness; Source separation; Spectrogram; Speech; Blind source separation; clustering; expectation-maximization; mixture models; time-frequency masking;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Digital Signal Processing (DSP), 2014 19th International Conference on

Conference_Location :

Hong Kong

Type :

conf

DOI :

10.1109/ICDSP.2014.6900777

Filename :

6900777

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=239659