DocumentCode :
239659
Title :
Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL
Author :
Zohny, Zeinab ; Chambers, Jonathon
Author_Institution :
Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK
fYear :
2014
fDate :
20-23 Aug. 2014
Firstpage :
59
Lastpage :
62
Abstract :
The state-of-the-art model-based expectation maximization source separation and localization (MESSL) algorithm successfully separates multiple sound sources from only two-channel reverberant mixtures. MESSL achieves under-determined convolutive blind source separation essentially by clustering spectrogram points according to their interaural spatial cues, so its performance degrades substantially when the speech sources are in close proximity. In this paper, we therefore enhance its performance by integrating robust clustering based on the Student's t-distribution. Compared with the Gaussian distribution originally used in MESSL for parametric modelling, this heavy-tailed distribution can potentially better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. The Student's t-distribution is used to model both the interaural phase difference (IPD) and the interaural level difference (ILD), in order to better represent the uncertainties introduced by noise and reverberation as well as the statistical non-stationarity of speech signals. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal to distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ).
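The robustness argument in the abstract (a heavy-tailed model down-weights outliers that would pull a Gaussian fit off target) can be illustrated with a minimal one-dimensional sketch. This is a generic illustration, not the authors' MESSL implementation: it assumes a single cluster, a fixed degrees-of-freedom parameter `nu`, and hypothetical ILD-like values in dB. The function names are illustrative only. The EM update uses the standard latent-weight form of the Student's t, in which each point receives a weight (nu + 1) / (nu + z^2), so distant points contribute little to the location and scale estimates.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * z * z

def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of a 1-D Student's t with location mu, scale sigma, dof nu."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma ** 2)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def fit_student_t(xs, nu=3.0, iters=200):
    """EM estimate of location/scale for a 1-D Student's t (nu held fixed, an assumption)."""
    n = len(xs)
    mu = sum(xs) / n                               # start from the Gaussian (sample) mean
    var = sum((x - mu) ** 2 for x in xs) / n
    for _ in range(iters):
        # E-step: expected latent precision weight of each point;
        # points far from mu (large squared distance) get small weights.
        w = [(nu + 1) / (nu + (x - mu) ** 2 / var) for x in xs]
        # M-step: weighted mean and weighted variance.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
    return mu, math.sqrt(var)

# Hypothetical ILD observations (dB) for one source, with one outlier from reverberation/noise.
ild = [2.0, 2.5, 1.8, 2.2, 2.1, 1.9, 15.0]
gauss_mu = sum(ild) / len(ild)
t_mu, t_sigma = fit_student_t(ild)
print(f"Gaussian mean: {gauss_mu:.2f} dB, Student's t location: {t_mu:.2f} dB")
```

With these made-up values, the Gaussian mean is dragged toward the outlier (about 3.9 dB), while the Student's t location stays near the bulk of the data (around 2.1 dB). In a mixture model such as MESSL, the same effect would apply per source cluster, which is the intuition behind the more accurate probabilistic masks claimed above.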
Keywords :
Gaussian distribution; audio databases; blind source separation; pattern clustering; speech processing; ILD; IPD; MESSL; PESQ; SDR; TIMIT database; close proximity; clustering spectrogram points; convolutive blind source separation; heavy tailed distribution; interaural level difference; interaural level modeling; interaural phase difference; interaural spatial cues; parametric modelling; perceptual evaluation of speech quality; phase cues; robust clustering; signal to distortion ratio; speech signals; speech sources; student t-distribution; Complexity theory; Microphones; Robustness; Source separation; Spectrogram; Speech; Blind source separation; clustering; expectation-maximization; mixture models; time-frequency masking;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Signal Processing (DSP), 2014 19th International Conference on
Conference_Location :
Hong Kong
Type :
conf
DOI :
10.1109/ICDSP.2014.6900777
Filename :
6900777