DocumentCode
893885
Title
Orthogonal transformations of stacked feature vectors applied to HMM speech recognition
Author
Flaherty, M.J. ; Roe, D.B.
Author_Institution
Telecom Australia Res. Labs., Clayton, Vic., Australia
Volume
140
Issue
2
fYear
1993
fDate
4/1/1993 12:00:00 AM
Firstpage
121
Lastpage
126
Abstract
The authors report improvements in speech recognition accuracy by using more sophisticated time analysis as part of the feature selection process. The recognition methodology utilises hidden Markov modelling with continuous density functions. The authors propose using, as speech features, linear transformations of the vector consisting of successive time samples of the cepstrum. Taylor series, the Legendre polynomial transform and the discrete cosine transform share several properties with principal components analysis. These transforms are expected to improve speech recognition accuracy by incorporating higher-order time derivatives (such as the second time derivative) of spectral information while at the same time producing an essentially diagonal covariance. In an experimental evaluation of these ideas, accuracy in speaker-independent recognition of the E-set of the alphabet improved from 55%, with no time varying information, to 68% with first-order time varying information, and 74%, by including second-order time varying information.
Keywords
feature extraction; hidden Markov models; spectral analysis; speech recognition; transforms; E-set; HMM speech recognition; Legendre polynomial transform; Taylor series; alphabet; cepstrum; continuous density functions; diagonal covariance; discrete cosine transform; feature selection process; first-order time varying information; hidden Markov modelling; higher-order time derivatives; linear transformations; orthogonal transformations; principal components analysis; second-order time varying information; speaker-independent recognition; spectral information; speech features; speech recognition accuracy; stacked feature vectors; successive time samples; time analysis
fLanguage
English
Journal_Title
Communications, Speech and Vision, IEE Proceedings I
Publisher
iet
ISSN
0956-3776
Type
jour
Filename
212654