Title :
Audio-visual speaker recognition for video broadcast news: some fusion techniques
Author :
Maison, Benoit ; Neti, Chalapathy ; Senior, Andrew
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions, due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data suggest that the combination can achieve significant improvements in acoustically degraded conditions.
Keywords :
acoustic signal processing; audio signal processing; speaker recognition; video signal processing; acoustically degraded conditions; audio-based speaker identification; audio-visual speaker recognition; mismatched conditions; relative independent decision weights; video broadcast news; Broadcasting; Degradation; Face detection; Face recognition; Fuses; Loudspeakers; Multimedia communication; Speaker recognition; Telephony; Testing
Conference_Title :
Multimedia Signal Processing, 1999 IEEE 3rd Workshop on
Conference_Location :
Copenhagen
Print_ISBN :
0-7803-5610-1
DOI :
10.1109/MMSP.1999.793814