Title :
Multi-modal speech recognition using correlativity between modalities
Author :
Sato, Yuki ; Hamada, Nozomu
Author_Institution :
Dept. of Syst. Design Eng., Keio Univ., Yokohama, Japan
Abstract :
In recent years, to achieve speech recognition that is robust against noise, audio-visual speech recognition (AVSR) systems utilizing not only audio but also visual information from the lips have been studied. This paper proposes a method for determining the weight, called the stream exponent, that represents the reliability ratio of the audio and visual features. The method focuses on the correlation between the audio and visual modalities to estimate the optimal stream exponent. Furthermore, we modify the stream exponent using the periodicity of speech, such as pitch, to handle abrupt noise. An audio-visual database comprising a specific speaker's lip image sequences and audio recordings was constructed; the utterances consist of Japanese counting numbers and sound-alike words. Using this database, we built the AVSR system and performed an evaluation experiment. The obtained results verify the effectiveness of the proposed method under a variety of noisy environments.
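Note :
The sketch below illustrates the stream-exponent idea the abstract describes: per-stream log-likelihoods are combined with weights summing to one, a standard multi-stream formulation in AVSR. The abstract does not give the authors' exact estimator, so the correlation-based weight and the pitch-based attenuation below are illustrative assumptions, and all function names are hypothetical.

import numpy as np

def stream_exponent_from_correlation(audio_feats, visual_feats):
    """Assumed estimator: set the audio weight lambda_a in [0, 1] to the
    mean absolute correlation between synchronized audio and visual
    feature tracks, on the premise that low audio-visual correlation
    signals corrupted audio. Inputs: arrays of shape (frames, dims),
    frame-synchronized."""
    corrs = []
    for a_dim in audio_feats.T:
        for v_dim in visual_feats.T:
            c = np.corrcoef(a_dim, v_dim)[0, 1]
            if not np.isnan(c):  # skip constant (zero-variance) tracks
                corrs.append(abs(c))
    return float(np.mean(corrs)) if corrs else 0.5

def attenuate_on_aperiodicity(lambda_a, voiced):
    """Assumed pitch-based modification: if no speech periodicity is
    detected where voicing is expected, treat it as abrupt noise and
    shrink the audio weight."""
    return lambda_a if voiced else 0.5 * lambda_a

def combined_log_likelihood(log_p_audio, log_p_visual, lambda_a):
    """Standard multi-stream combination: stream exponents weight the
    per-stream log-likelihoods, with lambda_a + lambda_v = 1."""
    return lambda_a * log_p_audio + (1.0 - lambda_a) * log_p_visual

# Usage with synthetic features (13-dim audio, 6-dim visual, 120 frames):
rng = np.random.default_rng(0)
audio = rng.standard_normal((120, 13))
visual = rng.standard_normal((120, 6))
lam = stream_exponent_from_correlation(audio, visual)
lam = attenuate_on_aperiodicity(lam, voiced=True)
score = combined_log_likelihood(-42.0, -37.5, lam)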
Keywords :
audio-visual systems; image sequences; speech recognition; AVSR; audio modality; audio sequences; audio-visual system; visual modality; computational modeling; noise; audio-visual speech recognition; real environment noise; stream exponent;
Conference_Titel :
Intelligent Signal Processing and Communication Systems (ISPACS), 2010 International Symposium on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-7369-4
DOI :
10.1109/ISPACS.2010.5704657