Direct identification vs. correlated models to process acoustic and articulatory informations in automatic speech recognition

Author

André-Obrecht, Regine ; Jacob, Bruno

Author_Institution

IRIT, Univ. Paul Sabatier, Toulouse, France

Volume

2

fYear

1997

fDate

21-24 Apr 1997

Firstpage

999

Abstract

Our work deals with the classical problem of merging heterogeneous and asynchronous parameters. It is well known that lip reading improves speech recognition scores, especially in noisy conditions, so we study the modeling of acoustic and labial parameters more closely and propose two automatic speech recognition systems. In the first, direct identification is performed with a classical HMM approach, and no correlation between visual and acoustic parameters is assumed. In the second, two correlated models, a master HMM and a slave HMM, process the labial observations and the acoustic observations respectively. To assess each approach, we use a segmental pre-processing method. Our task is the recognition of spelled French letters in clear and noisy (cocktail party) environments. Whatever the approach and conditions, introducing labial features improves performance, but the difference between the two models is not large enough to favor either one.

Keywords

acoustic signal processing; correlation methods; feature extraction; hidden Markov models; image processing; natural languages; noise; speech processing; speech recognition; HMM; acoustic information processing; acoustic parameters modeling; articulatory information processing; asynchronous parameters; automatic speech recognition; clear environment; cocktail party environment; correlated models; direct identification; heterogenous parameters; labial features; labial parameters modeling; linguistic decoder; lip reading; master HMM; noise conditions; noisy environment; segmental preprocessing method; slave HMM; speech recognition score; spelled French letters; visual parameters; Acoustic noise; Automatic speech recognition; Decoding; Hidden Markov models; Jacobian matrices; Lips; Master-slave; Optical noise; Speech recognition; Working environment noise;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location

Munich

ISSN

1520-6149

Print_ISBN

0-8186-7919-0

Type

conf

DOI

10.1109/ICASSP.1997.596108

Filename

596108