Regional Information Center for Science and Technology - Robust bi-modal speech recognition based on state synchronous modeling and stream weight optimization

DocumentCode :

542218

Title :

Robust bi-modal speech recognition based on state synchronous modeling and stream weight optimization

Author :

Nakamura, Satoshi ; Kumatani, Ken'ichi ; Tamura, Satoshi

Author_Institution :

ATR Spoken Language Translation Research Laboratories, Japan

Volume :

Year :

2002

Date :

13-17 May 2002

Abstract :

Demand has grown recently for Automatic Speech Recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes a method to effectively integrate audio and visual information in audio-visual (bi-modal) ASR systems. Such integration requires modeling the synchronization between the audio and visual information. To address the time lag and correlation problems in individual features between speech and lip movements, we introduce a type of integrated HMM modeling of audio-visual information based on a family of HMM composition. The proposed model can represent state synchronicity not only within a phoneme but also between phonemes. Furthermore, we propose a rapid stream weight optimization based on the GPD algorithm for noisy bi-modal speech recognition. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech. At SNR = 0 dB, the proposed method attained 16% higher performance than product HMMs without the synchronicity re-estimation.

Keywords :

Accuracy; Force; Gold; Hidden Markov models; Optimization; Robustness;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location :

Orlando, FL, USA

ISSN :

1520-6149

Print_ISBN :

0-7803-7402-9

Type :

conf

DOI :

10.1109/ICASSP.2002.5743716

Filename :

5743716

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=542218