DocumentCode :
2235703
Title :
Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition
Author :
Huang, Jing ; Marcheret, Etienne ; Visweswariah, Karthik
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY
fYear :
2005
fDate :
6-6 July 2005
Firstpage :
338
Lastpage :
341
Abstract :
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we focus on studying the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature space transforms. Unlike the common speaker adaptation techniques of MAP or MLLR, fMLLR does not change the audio or visual HMM parameters, but simply applies a single transform to the testing features. We also address the problem of fast and robust on-line fMLLR adaptation using feature space maximum a posteriori linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. On average, for a 20-speaker, 1-hour independent test set, the multi-stream fMLLR achieves a 31% relative gain in the clean audio condition and a 59% relative gain in the noisy audio condition (approximately 7 dB), as compared to the baseline multi-stream system.
Keywords :
audio databases; feature extraction; hidden Markov models; maximum likelihood estimation; regression analysis; speaker recognition; video databases; IBM infrared headset; audio-visual database; fMLLR; feature space maximum likelihood linear regression; hidden Markov model; multistream HMM; speaker adaptation technique; speech recognition; Audio databases; Hidden Markov models; Linear regression; Maximum likelihood linear regression; Robustness; Spatial databases; Speech recognition; Streaming media; Testing; Visual databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
0-7803-9331-7
Type :
conf
DOI :
10.1109/ICME.2005.1521429
Filename :
1521429