Title :
Video assisted speech source separation
Author :
Wang, Wenwu ; Cosker, Darren ; Hicks, Yulia ; Sanei, Saeid ; Chambers, Jonathon
Author_Institution :
Cardiff Sch. of Eng., Cardiff Univ., UK
Abstract :
We investigate the problem of integrating the complementary audio and visual modalities for speech separation. Rather than using the independence criteria adopted in most blind source separation (BSS) systems, we use visual features from a video signal as additional information to optimize the unmixing matrix. We achieve this by using a statistical model characterizing the nonlinear coherence between audio and visual features as a separation criterion for both instantaneous and convolutive mixtures. We acquire the model by applying the Bayesian framework to the fused feature observations based on a training corpus. We point out several key existing challenges to the success of the system. Experimental results verify the proposed approach, which outperforms the audio-only separation system in a noisy environment, and also provides a solution to the permutation problem.
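The idea of using audio-visual coherence, instead of statistical independence, to choose the unmixing matrix can be illustrated with a toy sketch. The example below is not the authors' method: it replaces the trained Bayesian coherence model with a plain correlation between a visual feature and the short-time power of the separated signal, and it handles only a two-channel instantaneous mixture. The synthetic sources, the mixing matrix `A`, the stand-in visual feature (the first source's envelope), and the helper names `smooth_power`, `av_guided_unmix`, and `make_toy_data` are all assumptions made for illustration.

```python
import numpy as np


def smooth_power(y, win=140):
    """Short-time power: moving average of y**2 (a crude stand-in audio feature)."""
    return np.convolve(y ** 2, np.ones(win) / win, mode="same")


def av_guided_unmix(x, visual, n_angles=720):
    """Grid-search one unmixing row w = [cos t, sin t] for a 2-channel mixture.

    The score is the correlation between the output's short-time power and
    the visual feature. This is a simple substitute for a learned
    audio-visual coherence model, assumed here for illustration only.
    """
    best_w, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        score = np.corrcoef(smooth_power(w @ x), visual)[0, 1]
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


def make_toy_data(n=8400):
    """Two amplitude-modulated tones mixed instantaneously (hypothetical data)."""
    t = np.arange(n)
    env1 = 0.5 * (1 + np.sin(2 * np.pi * 4 * t / n))        # target "speaker" envelope
    env2 = 0.5 * (1 + np.sin(2 * np.pi * 6 * t / n + 1.0))  # interferer envelope
    s = np.vstack([env1 * np.sin(2 * np.pi * t / 20),
                   env2 * np.sin(2 * np.pi * t / 14)])
    A = np.array([[1.0, 0.6],
                  [0.5, 1.0]])                               # assumed mixing matrix
    visual = env1                                            # stand-in for a lip feature
    return s, A @ s, visual


s, x, visual = make_toy_data()
w, score = av_guided_unmix(x, visual)
y = w @ x  # estimate of the source associated with the visual stream
print("unmixing row:", w, "audio-visual score:", score)
print("|corr(y, s1)|:", abs(np.corrcoef(y, s[0])[0, 1]))
```

Because the score is tied to one speaker's visual stream, the recovered output is associated with that speaker rather than with an arbitrary source, which is the intuition behind using video to resolve the permutation ambiguity. Scale and sign remain ambiguous in this sketch.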
Keywords :
Bayes methods; blind source separation; feature extraction; matrix algebra; optimisation; sensor fusion; speech processing; statistical analysis; video signal processing; Bayesian framework; audio features; convolutive mixtures; feature fusion; instantaneous mixtures; nonlinear coherence; speech separation; statistical model; unmixing matrix optimization; video assisted speech source separation; video signal; visual features; Bayesian methods; Coherence; Computer science; Frequency domain analysis; Humans; Signal processing algorithms; Source separation; Speech enhancement; Working environment noise
Conference_Title :
Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
Print_ISBN :
0-7803-8874-7
DOI :
10.1109/ICASSP.2005.1416331