Title :
Audio-visual speaker recognition using time-varying stream reliability prediction
Author :
Chaudhari, Upendra V. ; Ramaswamy, Ganesh N. ; Potamianos, Gerasimos ; Neti, Chalapathy
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
We examine a time-varying, context-dependent information fusion methodology for multi-stream authentication based on audio and video data collected simultaneously during a user's interaction with a system. Scores obtained from the two data streams are combined based on the relative local richness, as compared to the training data or derived model, and on the stability of each stream. The results show that the proposed technique outperforms the use of video or audio data alone as well as the use of fused data streams (via concatenation). Of particular note is that the performance improvements are achieved for clean, high-quality speech, whereas previous efforts focused on degraded speech conditions.
Keywords :
audio signal processing; audio user interfaces; audio-visual systems; biometrics (access control); gesture recognition; speaker recognition; speech-based user interfaces; video signal processing; audio data; audio-visual speaker recognition; context dependent methodology; fused data streams; information fusion methodology; multi-stream authentication; relative local richness; reliability prediction; time-varying stream; training data; user interaction; video data; Authentication; Degradation; Robustness; Speaker recognition; Speech recognition; Stability; Statistics; Streaming media; Time varying systems; Training data;
Conference_Title :
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Proceedings
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1200070