DocumentCode
1094152
Title
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
Author
Davis, Steven B. ; Mermelstein, Paul
Author_Institution
Signal Technology, Inc., Santa Barbara, CA
Volume
28
Issue
4
fYear
1980
fDate
8/1/1980 12:00:00 AM
Firstpage
357
Lastpage
366
Abstract
Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.
Keywords
Acoustic measurements; Acoustic testing; Band pass filters; Cepstrum; Filtering; Laboratories; Loudspeakers; Nonlinear filters; Speech analysis; Speech recognition
fLanguage
English
Journal_Title
Acoustics, Speech and Signal Processing, IEEE Transactions on
Publisher
IEEE
ISSN
0096-3518
Type
jour
DOI
10.1109/TASSP.1980.1163420
Filename
1163420