Regional Information Center for Science and Technology - A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition

DocumentCode :

2491628

Title :

A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition

Author :

Terry, Louis ; Katsaggelos, Aggelos K.

Author_Institution :

Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL

fYear :

2008

fDate :

8-11 Dec. 2008

Firstpage :

Lastpage :

Abstract :

This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AV-ASR) system. That system models the audio and visual components of speech as being composed of the same sub-word units when, in fact, this is not psycholinguistically true. We extend the system to model the audio and visual streams as being composed of separate, yet related, sub-word units. We also introduce a novel stream weighting structure incorporated into the model itself. In doing so, our system makes improvements in word error rate (WER) and overall recognition accuracy in a large vocabulary continuous speech recognition (LVCSR) task. The "best" performing proposed system attains a WER of 66.71%, whereas the "best" baseline system performs at a WER of 64.30%. The proposed system also improves accuracy to 45.95% from 39.40%.
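
The abstract reports its results as word error rate (WER). As a minimal sketch of the conventional definition only, WER is the word-level edit distance between the reference transcript and the recognized transcript, divided by the number of reference words. This record does not describe the authors' scoring tool, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.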

Keywords :

belief networks; speech recognition; audio-visual automatic speech recognition; large vocabulary continuous speech recognition; phone-viseme dynamic Bayesian network; stream weighting structure; word error rate; Automatic speech recognition; Bayesian methods; Defense industry; Error analysis; Hidden Markov models; Modems; Psychology; Speech recognition; Streaming media; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition, 2008. ICPR 2008. 19th International Conference on

Conference_Location :

Tampa, FL

ISSN :

1051-4651

Print_ISBN :

978-1-4244-2174-9

Electronic_ISBN :

1051-4651

Type :

conf

DOI :

10.1109/ICPR.2008.4761927

Filename :

4761927

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2491628