Title :
Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization
Author :
Sheikhzadeh, Hamid ; Deng, Li
Author_Institution :
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
Abstract :
The authors describe a novel approach to speech recognition that directly models the statistical characteristics of speech waveforms. This approach removes the need for speech preprocessors, which conventionally convert speech waveforms into frame-based data for a subsequent modeling process. Central to their method is the representation of the speech waveform as the output of a time-varying filter excited by a Gaussian source whose power varies over time. To formulate a speech recognition algorithm based on this representation, the time variation in the characteristics of the filter and of the excitation source is described compactly and parametrically by a Markov chain. They analyze in detail the comparative roles played by filter modeling and by source modeling in speech recognition performance. Based on this analysis, they propose and evaluate a normalization procedure intended to remove the sensitivity of recognition accuracy to speech power variations, which are often uncontrollable. The effectiveness of the proposed speech-waveform modeling approach is demonstrated in a speaker-dependent, discrete-utterance speech recognition task involving 18 highly confusable stop consonant-vowel syllables. The high accuracy obtained indicates that the proposed time-domain waveform modeling technique is promising for speech recognition.
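The representation summarized in the abstract (a filter and a Gaussian excitation whose parameters switch according to a Markov chain) can be illustrated with a minimal sketch. It is not the authors' implementation. The two states, the AR(2) filter order, all numeric values, and the unit-average-power scaling in `normalize_power` are hypothetical choices made for illustration only; the record does not specify the actual normalization procedure.

```python
import math
import random

# Hypothetical two-state model: each state holds AR(2) filter coefficients
# and the variance (power) of its Gaussian excitation. Values are illustrative.
STATES = [
    {"ar": (1.2, -0.5), "var": 1.0},
    {"ar": (0.3, -0.8), "var": 0.25},
]
TRANS = [[0.95, 0.05], [0.05, 0.95]]  # Markov chain transition probabilities
INIT = [0.5, 0.5]                     # initial state probabilities


def simulate(n, seed=0):
    """Draw n waveform samples from the state-switching AR filter."""
    rng = random.Random(seed)
    state = rng.choices(range(len(STATES)), weights=INIT)[0]
    x = []
    for t in range(n):
        a1, a2 = STATES[state]["ar"]
        prev1 = x[t - 1] if t >= 1 else 0.0
        prev2 = x[t - 2] if t >= 2 else 0.0
        e = rng.gauss(0.0, math.sqrt(STATES[state]["var"]))
        x.append(a1 * prev1 + a2 * prev2 + e)
        state = rng.choices(range(len(STATES)), weights=TRANS[state])[0]
    return x


def _log_gauss(residual, var):
    """Log density of a zero-mean Gaussian with the given variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + residual * residual / var)


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(x):
    """Forward algorithm on raw samples; emission = Gaussian on the AR residual."""
    k = len(STATES)
    alpha = None
    for t, sample in enumerate(x):
        prev1 = x[t - 1] if t >= 1 else 0.0
        prev2 = x[t - 2] if t >= 2 else 0.0
        emit = []
        for s in STATES:
            a1, a2 = s["ar"]
            emit.append(_log_gauss(sample - a1 * prev1 - a2 * prev2, s["var"]))
        if alpha is None:
            alpha = [math.log(INIT[j]) + emit[j] for j in range(k)]
        else:
            alpha = [
                _logsumexp([alpha[i] + math.log(TRANS[i][j]) for i in range(k)])
                + emit[j]
                for j in range(k)
            ]
    return _logsumexp(alpha)


def normalize_power(x):
    """Scale the waveform to unit average power (one simple normalization choice)."""
    power = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(power) for v in x]
```

In this sketch, rescaling a waveform changes its log-likelihood under fixed excitation variances, while applying `normalize_power` first makes the score independent of the input gain. That is the kind of sensitivity to power variation the abstract refers to, shown here under the stated assumptions.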
Keywords :
filtering and prediction theory; hidden Markov models; sensitivity analysis; speech analysis and processing; speech recognition; statistical analysis; stochastic processes; time series; time-domain analysis; time-varying networks; waveform analysis; Markov chain; autoregressive hidden Markov model; confusable stop consonant-vowel syllables; discrete utterance speech recognition; excitation source; filter modeling; normalization procedure; power normalisation; source modeling; speaker dependent recognition; speech power variations; speech recognition accuracy; speech recognition algorithm; speech waveform modeling; speech waveforms; statistical characteristics modelling; time varying Gaussian source; time-domain waveform modeling; time-varying filter output; waveform based speech recognition; Filters; Hidden Markov models; Performance analysis; Power system modeling; Production systems; Signal processing; Speech analysis; Speech processing; Speech recognition; Time domain analysis;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on