DocumentCode :
730614
Title :
Variational EM for clustering interaural phase cues in MESSL for blind source separation of speech
Author :
Zohny, Zeinab ; Naqvi, Syed Mohsen ; Chambers, Jonathon A.
Author_Institution :
Adv. Signal Process. Group, Loughborough Univ., Loughborough, UK
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
3966
Lastpage :
3970
Abstract :
The model-based expectation maximization source separation and localization (MESSL) technique is a probabilistic time-frequency masking algorithm that achieves underdetermined blind source separation of speech sources. Using only two-channel recordings, MESSL clusters spectrogram points based on their interaural spatial cues. Gaussian mixture models (GMMs) are assumed for the interaural cues, and their parameters are determined by maximum likelihood estimation (MLE) within the expectation maximization (EM) framework. However, singularities and over-fitting are major drawbacks of MLE. In this paper, we investigate variational Bayesian (VB) inference for clustering spectrogram points based specifically on their interaural phase difference (IPD) cues. Variational inference overcomes the difficulties associated with likelihood optimization and improves separation, especially when the sources are in close proximity. Simulation studies based on speech mixtures formed from the TIMIT database confirm the advantage of the proposed approach in terms of signal-to-distortion ratio (SDR).
Keywords :
Bayes methods; Gaussian processes; audio databases; audio signal processing; blind source separation; expectation-maximisation algorithm; inference mechanisms; mixture models; optimisation; speech processing; EM framework; GMM; Gaussian mixture models; IPD cues; MESSL cluster spectrogram; MESSL technique; MLE; SDR; TIMIT database; VB inference; interaural phase cues; interaural spatial cues; likelihood optimization; maximum likelihood estimation; model-based expectation maximization source separation and localization; probabilistic time-frequency masking algorithm; signal to distortion ratio; speech mixtures; speech sources; two-channel recordings; variational Bayesian inference; Nickel; Spectrogram; Speech; expectation-maximization; time-frequency masking
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178715
Filename :
7178715