Title :
Audio-visual convolutive blind source separation
Author :
Qingju Liu ; Wenwu Wang ; Jackson, P.
Author_Institution :
Centre for Vision, Speech & Signal Processing, Univ. of Surrey, Guildford, UK
Abstract :
We present a novel method for separating speech from audio mixtures using audio-visual coherence. It consists of two stages: in the off-line training stage, we use a Gaussian mixture model to statistically characterise the audio-visual coherence with features obtained from the training set; at the separation stage, likelihood maximization is performed on the independent component analysis (ICA)-separated spectral components. To address the permutation and scaling indeterminacies of frequency-domain blind source separation (BSS), a new sorting and rescaling scheme using the bimodal coherence is proposed. We tested our algorithm on the XM2VTS database, and the results show that it can address the permutation problem with high accuracy and mitigate the scaling problem effectively.
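The sorting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture model over a joint two-dimensional feature (one audio value and one visual value per frame), and it resolves the permutation in a single frequency bin by exhaustive search over output orderings. The names `gmm_log_likelihood` and `best_permutation`, the feature layout, and the toy numbers are all hypothetical. The rescaling step is not covered.

```python
import itertools
import math


def gmm_log_likelihood(gmm, x):
    """Log-density of vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples (assumed layout).
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def best_permutation(gmm, audio, visual):
    """Pick the output ordering that maximises audio-visual likelihood.

    audio[k][t]:  audio feature of ICA output k at frame t (one frequency bin).
    visual[j][t]: visual feature of speaker j at frame t.
    Returns a tuple perm where perm[j] is the ICA output assigned to speaker j.
    """
    n = len(visual)

    def score(perm):
        return sum(
            gmm_log_likelihood(gmm, [audio[perm[j]][t], visual[j][t]])
            for j in range(n)
            for t in range(len(visual[j]))
        )

    return max(itertools.permutations(range(len(audio)), n), key=score)


# Toy coherence model (hypothetical): audio and visual features move together.
gmm = [
    (0.5, [1.0, 1.0], [0.1, 0.1]),
    (0.5, [-1.0, -1.0], [0.1, 0.1]),
]
visual = [[1.0, 1.0, -1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
# ICA outputs arrive in swapped order in this bin.
audio = [[-1.0, 1.0, -1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]

perm = best_permutation(gmm, audio, visual)
print(perm)
```

In this toy case the search assigns output 1 to speaker 0 and output 0 to speaker 1, undoing the swap. Exhaustive search grows factorially with the number of sources, so it is only practical for the small source counts typical of such experiments.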
Keywords :
Gaussian processes; blind source separation; frequency-domain analysis; independent component analysis; optimisation; Gaussian mixture model; ICA-separated spectral component; XM2VTS database; audio mixture; audio-visual coherence; audio-visual convolutive blind source separation; bimodal coherence; frequency-domain BSS; independent component analysis-separated spectral component; likelihood maximization; off-line training processing; rescaling scheme; speech separation
Conference_Title :
Sensor Signal Processing for Defence (SSPD 2010)
Conference_Location :
London
DOI :
10.1049/ic.2010.0225