Joint distributional modeling with cross-correlation based features

Author

Bilmes, Jeff A.

Author_Institution

Int. Comput. Sci. Inst., Berkeley, CA, USA

fYear

1997

fDate

14-17 Dec 1997

Firstpage

148

Lastpage

155

Abstract

In maximum-likelihood-based speech recognition systems, it is important to accurately estimate the joint distribution of feature vectors given a particular acoustic model. We propose that by modeling the joint distribution of time-localized feature vectors and statistics relating those time-localized feature vectors to the relevant acoustic context, we can estimate information contained in the feature vector joint distribution without the accompanying theoretical or computational difficulties. We introduce the modcrossgram (MCG), a computational way of estimating short-time spectro-temporal correlation-based statistics that are informative about the feature vector joint distribution. Using the standard hybrid ANN/HMM architecture, we compare an MCG-based speech recognition system with a more traditional one on an isolated-word speech database. We show that, in the presence of noise, the MCG-based system achieves a significant reduction in word error rate over the standard system.

Keywords

hidden Markov models; maximum likelihood detection; neural nets; speech recognition; statistical analysis; MCG based speech recognition system; acoustic context; acoustic model; cross correlation based features; feature vector joint distribution; feature vectors; isolated word speech database; joint distributional modeling; maximum likelihood based speech recognition systems; modcrossgram; short time spectro temporal correlation based statistics; standard hybrid ANN/HMM architecture; time localized feature vectors; word error rate; Computer architecture; Context modeling; Distributed computing; Error analysis; Hidden Markov models; Maximum likelihood estimation; Noise reduction; Spatial databases; Speech recognition; Statistical distributions

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding, 1997. Proceedings., 1997 IEEE Workshop on

Conference_Location

Santa Barbara, CA

Print_ISBN

0-7803-3698-4

Type

conf

DOI

10.1109/ASRU.1997.658999

Filename

658999