DocumentCode
1873836
Title
Joint audio-video processing for biometric speaker identification
Author
Kanak, A. ; Erzin, E. ; Yemez, Z. ; Tekalp, A.M.
Author_Institution
Coll. of Eng., Koc Univ., Istanbul, Turkey
Volume
3
fYear
2003
fDate
6-9 July 2003
Abstract
In this paper, we present a bimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations within the speech and video signals of a speaker, but also the cross-correlation between the two modalities. Lip images extracted from each video frame are projected onto an eigenspace. The resulting eigenlip coefficients are interpolated to match the rate of the speech signal and fused with the mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate system performance.
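The rate-matching and fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which interpolation method or frame rates were used, so the linear interpolation, the rates in the usage note, and the names `interpolate_track` and `fuse_features` are assumptions made purely for illustration, not the authors' implementation.

```python
from bisect import bisect_right


def interpolate_track(times, values, query_times):
    """Linearly interpolate one coefficient track onto new time stamps.

    Linear interpolation is an assumption; the abstract only says the
    eigenlip coefficients are "interpolated".
    """
    out = []
    for t in query_times:
        if t <= times[0]:
            out.append(values[0])       # clamp before the first video frame
        elif t >= times[-1]:
            out.append(values[-1])      # clamp after the last video frame
        else:
            j = bisect_right(times, t)  # times[j-1] <= t < times[j]
            w = (t - times[j - 1]) / (times[j] - times[j - 1])
            out.append((1 - w) * values[j - 1] + w * values[j])
    return out


def fuse_features(mfcc, eigenlip, audio_rate, video_rate):
    """Upsample eigenlip vectors to the audio frame rate and append them
    to the MFCC vector of each audio frame (simple concatenation)."""
    video_times = [i / video_rate for i in range(len(eigenlip))]
    audio_times = [i / audio_rate for i in range(len(mfcc))]
    n_lip = len(eigenlip[0])
    # Interpolate each eigenlip coefficient independently over time.
    tracks = [
        interpolate_track(video_times, [frame[k] for frame in eigenlip], audio_times)
        for k in range(n_lip)
    ]
    return [
        list(mfcc[i]) + [tracks[k][i] for k in range(n_lip)]
        for i in range(len(mfcc))
    ]
```

As a usage example with hypothetical rates, five two-dimensional MFCC frames at 100 frames per second and three one-dimensional eigenlip frames at 50 frames per second yield five joint vectors of dimension three via `fuse_features(mfcc, eigenlip, 100, 50)`. Such joint vectors would then serve as the observations for HMM training and testing.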
Keywords
audio signal processing; biometrics (access control); correlation methods; eigenvalues and eigenfunctions; hidden Markov models; speaker recognition; video signal processing; biometric speaker identification; cross-correlation; eigenlip coefficients; eigenspace; hidden Markov model based identification system; joint audio-video processing; mel frequency cepstral coefficients; recognition performance; spatial correlations; speech signals; temporal correlations; video signals; Biometrics; Educational institutions; Graphics; Hidden Markov models; Laboratories; Multimedia systems; Robustness; Signal processing; Speech; Streaming media;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN
0-7803-7965-9
Type
conf
DOI
10.1109/ICME.2003.1221373
Filename
1221373