Title :
Automatic speech recognition using audio visual cues
Author :
Yashwanth, H. ; Mahendrakar, Harish ; David, Sumam
Author_Institution :
Dept. of Electron. & Commun., Nat. Inst. of Technol., Karnataka, India
Abstract :
Automatic speech recognition (ASR) systems have gained popularity because many multimedia applications require robust speech recognition algorithms. Using both audio and visual information in speaker-independent continuous speech recognition improves performance over systems that use audio information alone. Recognition rates increase markedly when visual data supplements the available audio data, because video information is less susceptible to ambient noise than audio information. This paper presents a robust audio-video speech recognition (AVSR) system that uses a coupled hidden Markov model (CHMM) to fuse the audio and video modalities. The application records the input data and recognizes the isolated words in the input file over a wide range of signal-to-noise ratios (SNR). At an SNR of 5 dB, the experimental results show an increase in recognition rate for the AVSR system of about 10% over audio-only ASR and 20% over video-only ASR.
Keywords :
audio-visual systems; hidden Markov models; image sequences; multimedia systems; speech recognition; video signal processing; AVSR system; CHMM; audio-video speech recognition; audio-visual information; automatic speaker-independent continuous speech recognition; coupled hidden Markov model; data visualisation; isolated words recognition; multimedia application; Application software; Automatic speech recognition; Background noise; Computer interfaces; Hidden Markov models; Humans; Noise robustness; Signal to noise ratio; Speech recognition; Telephony;
Conference_Title :
Proceedings of the IEEE INDICON 2004. First India Annual Conference, 2004
Print_ISBN :
0-7803-8909-3
DOI :
10.1109/INDICO.2004.1497730