Title :
Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection
Author :
Qingju Liu ; Andrew J. Aubrey ; Wenwu Wang
Author_Institution :
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
Abstract :
The visual modality, deemed complementary to the audio modality, has recently been exploited to improve the performance of blind source separation (BSS) of speech mixtures, especially in adverse environments where audio-only methods deteriorate. In this paper, we present an enhancement method for audio-domain BSS that integrates voice activity information obtained via a visual voice activity detection (VAD) algorithm. Mimicking aspects of human hearing, binaural speech mixtures are considered in our two-stage system. First, in the off-line training stage, a speaker-independent voice activity detector is trained on the visual stimuli using the AdaBoost algorithm. In the on-line separation stage, interaural phase difference (IPD) and interaural level difference (ILD) cues are statistically analyzed to probabilistically assign each time-frequency (TF) point of the audio mixtures to the source signals. The voice activity cues detected by the visual VAD are then integrated to reduce the residual interference. Detection of the interference residual proceeds gradually, using two layers of boundaries in the correlation and energy-ratio map. We have tested our algorithm on speech mixtures generated with room impulse responses at different reverberation times and noise levels. Simulation results show that the proposed method improves target speech extraction in noisy and reverberant environments, in terms of signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ).
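The IPD and ILD cues mentioned above are standard binaural features computed per time-frequency point from the left- and right-channel STFTs. The following is a minimal, hedged sketch of how such cues could be extracted; it is not the authors' implementation, and the function name, STFT parameters, and windowing choices are illustrative assumptions only.

```python
import numpy as np

def ipd_ild_cues(left, right, n_fft=1024, hop=256, eps=1e-12):
    """Illustrative extraction of interaural phase difference (IPD) and
    interaural level difference (ILD) cues from a binaural signal pair.

    Returns two (n_frames, n_bins) arrays: IPD in radians, wrapped to
    [-pi, pi], and ILD in dB. Not the paper's exact method; a sketch.
    """
    def stft(x):
        # Simple Hann-windowed framed FFT (illustrative STFT).
        win = np.hanning(n_fft)
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop:i * hop + n_fft] * win
                           for i in range(n_frames)])
        return np.fft.rfft(frames, axis=1)

    L, R = stft(np.asarray(left, float)), stft(np.asarray(right, float))
    # Phase difference per TF point via the cross-spectrum angle.
    ipd = np.angle(L * np.conj(R))
    # Level difference per TF point in dB; eps guards against log(0).
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    return ipd, ild
```

For example, a right channel that is an attenuated copy of the left yields zero IPD and a constant positive ILD at the dominant frequency bin; in the paper's setting, the statistics of these cues over TF points are what drive the probabilistic source assignment.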
Keywords :
blind source separation; feature extraction; learning (artificial intelligence); signal detection; speech processing; statistical analysis; AdaBoost algorithm; BSS; ILD cue; IPD cue; PESQ; SIR; VAD algorithm; audio modality; audio-domain methods; binaural speech mixtures; blind source separation; correlation-energy ratio map; human hearing; interaural level difference; interaural phase difference; interference reduction; interference residual; noise levels; perceptual evaluation of speech quality; reverberant speech separation; reverberation times; room impulse response; signal-to-interference ratio; speaker-independent voice activity detector; speech extraction; speech mixtures; statistical analysis; visual modality; visual voice activity detection; voice activity information; Feature extraction; Hidden Markov models; Interference; Noise measurement; Speech; Training; Visualization; AdaBoost; binaural; blind source separation; interference removal; visual voice activity detection
Journal_Title :
IEEE Transactions on Multimedia
DOI :
10.1109/TMM.2014.2322824