Title :
Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures
Author :
Rivet, Bertrand ; Girin, Laurent ; Jutten, Christian ; Schwartz, Jean Luc
Date :
29 Sept.-1 Oct. 2004
Abstract :
Looking at the speaker's face seems useful for hearing a speech signal better and for extracting it from competing sources before identification. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated by statistical tools, into audio blind source separation (BSS) algorithms in the difficult case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency-by-frequency separation is performed by an audio BSS algorithm, and the audiovisual information is used to solve the standard source permutation problem at the output of the separation stage, for each frequency. The proposed method is shown to be efficient in the case of 2 × 2 convolutive mixtures.
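The two facts the abstract relies on can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the random sources and filters, the function name mix_convolutive, and the use of the exact inverse mixing matrix in place of a real BSS stage are all assumptions made for illustration. The sketch shows (1) that a 2 × 2 convolutive mixture reduces to one instantaneous linear mixture per frequency bin, and (2) that separating each bin independently leaves the output order undetermined from bin to bin, which is the permutation problem the audiovisual information is meant to resolve.

```python
import numpy as np

def mix_convolutive(sources, filters):
    """Hypothetical helper: x_i[t] = sum_j (h_ij * s_j)[t], full linear convolution."""
    n_src, n_samp = sources.shape
    n_mic, _, n_taps = filters.shape
    out = np.zeros((n_mic, n_samp + n_taps - 1))
    for i in range(n_mic):
        for j in range(n_src):
            out[i] += np.convolve(filters[i, j], sources[j])
    return out

# Assumed toy data: random signals stand in for speech, random FIR filters for the room.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 256))
filters = rng.standard_normal((2, 2, 8))
mixtures = mix_convolutive(sources, filters)

# With an FFT at least as long as the linear convolution, the model is exact per bin.
n_fft = mixtures.shape[1]
S = np.fft.rfft(sources, n=n_fft, axis=-1)    # shape (2, F)
H = np.fft.rfft(filters, n=n_fft, axis=-1)    # shape (2, 2, F)
X = np.fft.rfft(mixtures, n=n_fft, axis=-1)   # shape (2, F)

# (1) Per-bin instantaneous mixture: X(f) = H(f) S(f).
X_model = np.einsum('ijf,jf->if', H, S)
max_model_error = np.max(np.abs(X - X_model))

# (2) Permutation ambiguity: an "oracle" separator W(f) = H(f)^-1 recovers the
# sources, but so does any row-swapped version, independently in each bin.
H_f = np.moveaxis(H, -1, 0)                   # shape (F, 2, 2)
W = np.linalg.inv(H_f)
swap = np.array([[0.0, 1.0], [1.0, 0.0]])
W_swapped = W.copy()
W_swapped[::2] = swap @ W[::2]                # swap the outputs in every other bin
Y = np.einsum('fij,jf->if', W_swapped, X)     # shape (2, F)

# Output 0 now holds source 1 in even bins and source 0 in odd bins.
even_bins_swapped = np.allclose(Y[0, ::2], S[1, ::2], atol=1e-6)
odd_bins_kept = np.allclose(Y[0, 1::2], S[0, 1::2], atol=1e-6)
```

In this toy setting each bin is perfectly separated, yet neither output channel contains a single source across all frequencies, so the time-domain outputs would still be mixtures. A practical system would use a short-time transform, where the per-bin model holds only approximately, and would need some additional cue to align the bins; the abstract proposes audiovisual coherence for that role.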
Keywords :
audio signal processing; blind source separation; convolution; speech processing; transforms; audio blind source separation algorithm; audiovisual coherence; audiovisual speech processing; convolutive speech mixture; frequency transform; speech signal; Blind source separation; Filters; Frequency; Iterative closest point algorithm; Oral communication; Robustness; Signal processing; Source separation; Speech enhancement; Speech processing;
Conference_Title :
Multimedia Signal Processing, 2004 IEEE 6th Workshop on
Print_ISBN :
0-7803-8578-0
DOI :
10.1109/MMSP.2004.1436412