Regional Information Center for Science and Technology - A probabilistic principal component analysis based hidden Markov model for audio-visual speech recognition

DocumentCode :

2088161

Title :

A probabilistic principal component analysis based hidden Markov model for audio-visual speech recognition

Author :

Ma, Zhanyu ; Leijon, Arne

Author_Institution :

Sound and Image Processing Lab., Royal Institute of Technology (KTH), Stockholm

fYear :

2008

fDate :

26-29 Oct. 2008

Firstpage :

2170

Lastpage :

2173

Abstract :

Lipreading is one of the more effective methods proposed for improving the performance of speech recognition systems, especially in acoustically noisy environments. This paper proposes a simple audio-visual speech recognition (AVSR) system that can improve the robustness and accuracy of audio-only speech recognition by integrating synchronous audio and visual information. We propose a hidden Markov model (HMM) based on probabilistic principal component analysis (PCA) for visual-only speech recognition and for the visual modality of audio-visual speech recognition. The probabilistic PCA based HMM directly uses images that contain only the speaker's mouth region, without pre-processing (mouth corner detection, contour marking, etc.), and takes the probabilistic PCA model as the observation probability density function (PDF). We then integrate the information from the two modalities (audio and visual) to obtain a multi-stream hidden Markov model (MSHMM). We found that, without extracting specialized features before processing, probabilistic PCA can capture the principal components during training and describe the visual part of the material. The experiments also verify that integrating the audio and visual information helps to improve recognition accuracy even at a low acoustic signal-to-noise ratio (SNR).
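
The two ideas named in the abstract, a probabilistic PCA observation density and a multi-stream combination of audio and visual scores, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard closed-form maximum-likelihood probabilistic PCA fit (Tipping and Bishop) and the common multi-stream formulation in which per-stream log-likelihoods are combined with weights that sum to one. The function names, the `audio_weight` parameter, and the flattened-image input layout are all hypothetical, and the abstract does not state how the stream weights were chosen.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood probabilistic PCA fit.

    X: (n_samples, d) array, e.g. flattened mouth-region images (assumed layout).
    q: number of principal components to keep, with q < d.
    Returns (mu, W, sigma2) for the model x ~ N(mu, W W^T + sigma2 * I).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)   # (d, d) sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Noise variance is the average of the discarded eigenvalues. With fewer
    # samples than pixels this can be near zero and needs a floor in practice.
    sigma2 = eigvals[q:].mean()
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_logpdf(x, mu, W, sigma2):
    """Log-density of one observation x under the fitted PPCA Gaussian."""
    d = mu.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)              # model covariance
    diff = np.asarray(x, dtype=float) - mu
    _, logdet = np.linalg.slogdet(C)
    maha = diff @ np.linalg.solve(C, diff)        # Mahalanobis term
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def multistream_loglik(log_b_audio, log_b_video, audio_weight):
    """Weighted sum of per-stream log-likelihoods (assumed combination rule)."""
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video
```

In an HMM, `ppca_logpdf` would be evaluated once per state with that state's own `(mu, W, sigma2)`, and `multistream_loglik` would replace the single-stream observation score inside the usual forward or Viterbi recursion. Lowering `audio_weight` as the acoustic SNR drops is one common way to lean on the visual stream in noise.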

Keywords :

hidden Markov models; principal component analysis; speech recognition; acoustic noisy environments; audio speech recognition; audio-visual speech recognition system; hidden Markov model; low acoustic signal-to-noisy ratio; probabilistic principal component analysis; speech recognition systems; visual modality; visual-only speech recognition; Acoustic noise; Data mining; Feature extraction; Hidden Markov models; Mouth; Principal component analysis; Probability density function; Robustness; Speech recognition; Working environment noise; audio-visual speech recognition; multi-stream hidden Markov model; probabilistic PCA;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signals, Systems and Computers, 2008 42nd Asilomar Conference on

Conference_Location :

Pacific Grove, CA

ISSN :

1058-6393

Print_ISBN :

978-1-4244-2940-0

Electronic_ISBN :

1058-6393

Type :

conf

DOI :

10.1109/ACSSC.2008.5074819

Filename :

5074819

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2088161