Title :
Signal modeling for high-performance robust isolated word recognition
Author :
Karnjanadecha, Montri ; Zahorian, Stephen A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Old Dominion Univ., Norfolk, VA, USA
Date :
9/1/2001
Abstract :
This paper describes speech signal modeling techniques that are well suited to high-performance, robust isolated word recognition. We present new techniques for incorporating spectral/temporal information as a function of the temporal position within each word. In particular, spectral/temporal parameters are computed using both variable-length blocks and variable spacing between blocks. We tested features computed with these methods using an alphabet recognition task based on the ISOLET database. The hidden Markov model toolkit (HTK) was used to implement the isolated word recognizer with whole-word HMM models. The best accuracy achieved for speaker-independent alphabet recognition, using 50 features, was 97.9%, which represents a new benchmark for this task. We also tested these methods with deliberate signal degradation using additive Gaussian noise and telephone band limiting and found that recognition accuracy degrades gracefully and to a smaller degree than for control cases based on MFCC coefficients and delta cepstra terms.
Keywords :
Gaussian noise; bandlimited communication; cepstral analysis; hidden Markov models; parameter estimation; signal representation; speech recognition; telephony; DCT; ISOLET database; MFCC coefficients; additive Gaussian noise; alphabet recognition task; delta cepstra terms; hidden Markov model toolkit; high-performance robust isolated word recognition; linear frequency cepstrum coefficients; linear prediction cepstrum coefficients; linear prediction coefficients; mel frequency cepstrum coefficients; reflection coefficients; signal degradation; signal representations; speaker independent alphabet recognition; spectral/temporal information; spectral/temporal parameters; speech signal modeling; telephone band limiting; temporal position; variable length blocks; whole word HMM models; Additive noise; Benchmark testing; Degradation; Gaussian noise; Hidden Markov models; Mel frequency cepstral coefficient; Robustness; Spatial databases; Speech recognition; Telephony;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on