DocumentCode :
417211
Title :
A structured speech model with continuous hidden dynamics and prediction-residual training for tracking vocal tract resonances
Author :
Deng, Li ; Lee, Leo J. ; Attias, Hagai ; Acero, Alex
Author_Institution :
Microsoft Res., Redmond, WA, USA
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
A novel approach is developed for efficient and accurate tracking, in fluent speech, of vocal tract resonances, which are the natural frequencies of the resonator extending from the larynx to the lips. The tracking algorithm is based on a version of the structured speech model that consists of continuous-valued hidden dynamics and a piecewise-linearized prediction function mapping resonance frequencies and bandwidths to LPC cepstra. We present details of the piecewise linearization design process and an adaptive training technique for the parameters that characterize the prediction residuals. We describe and evaluate an iterative tracking algorithm that embeds both the prediction-residual training and the piecewise linearization design in an adaptive Kalman filtering framework. Experiments on tracking vocal tract resonances in Switchboard speech data demonstrate high tracking accuracy, as well as the effectiveness of the embedded residual training. Our approach differs from traditional formant trackers in that it provides meaningful results even during consonantal closures, when the supra-laryngeal source may produce no spectral prominences in the speech acoustics.
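The abstract does not give the model equations, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It tracks a single resonance frequency with a fixed bandwidth. The prediction function uses the standard cepstrum of an all-pole model, in which a resonance with frequency f and bandwidth b contributes (2/n) exp(-pi n b / fs) cos(2 pi n f / fs) to the n-th cepstral coefficient. The hidden dynamics are assumed to be a first-order target-directed recursion, the linearization is a central-difference derivative at the predicted state (a swap-in for the paper's piecewise linearization), and the residual mean `h` and variances `d` are adapted by a hypothetical exponential-forgetting rule. All function and parameter names (`vtr_to_cepstrum`, `ekf_step`, `update_residual`, `phi`, `target`, `rho`) are illustrative.

```python
import math


def vtr_to_cepstrum(freqs, bws, fs, n_ceps):
    """Cepstrum c_1..c_N of an all-pole model, one conjugate pole pair per resonance.

    A resonance at f Hz with bandwidth b Hz has poles r*exp(+/-j*theta), where
    r = exp(-pi*b/fs) and theta = 2*pi*f/fs; it adds (2/n)*r**n*cos(n*theta) to c_n.
    """
    return [
        sum((2.0 / n) * math.exp(-math.pi * n * b / fs)
            * math.cos(2.0 * math.pi * n * f / fs)
            for f, b in zip(freqs, bws))
        for n in range(1, n_ceps + 1)
    ]


def ekf_step(x_prev, p_prev, obs, bw, fs, phi, target, q, h, d, eps=1.0):
    """One linearized Kalman step for a single resonance frequency x (Hz).

    ASSUMED model (hypothetical, not taken from the paper):
      x_t = phi * x_{t-1} + (1 - phi) * target + w,  w ~ N(0, q)
      o_t = C(x_t) + h + v,                          v ~ N(0, diag(d))
    where C is vtr_to_cepstrum with the bandwidth held fixed at bw.
    """
    # Predict with the assumed target-directed first-order dynamics.
    x_pred = phi * x_prev + (1.0 - phi) * target
    p_pred = phi * phi * p_prev + q

    # Linearize C around x_pred by central differences (a stand-in for the
    # paper's piecewise linearization).
    n = len(obs)
    c_pred = vtr_to_cepstrum([x_pred], [bw], fs, n)
    c_plus = vtr_to_cepstrum([x_pred + eps], [bw], fs, n)
    c_minus = vtr_to_cepstrum([x_pred - eps], [bw], fs, n)
    jac = [(a - b) / (2.0 * eps) for a, b in zip(c_plus, c_minus)]

    # Innovation: observed minus predicted cepstra, corrected by residual mean h.
    innov = [o - c - hk for o, c, hk in zip(obs, c_pred, h)]

    # Scalar state with diagonal residual covariance: information-form update,
    # equivalent to the usual Kalman gain K = P H^T (H P H^T + R)^-1.
    info = 1.0 / p_pred + sum(jk * jk / dk for jk, dk in zip(jac, d))
    p_post = 1.0 / info
    x_post = x_pred + p_post * sum(jk * ek / dk for jk, ek, dk in zip(jac, innov, d))
    return x_post, p_post, innov


def update_residual(h, d, obs, x, bw, fs, rho=0.98, d_floor=1e-6):
    """Adapt residual mean h and variances d by exponential forgetting (hypothetical rule)."""
    pred = vtr_to_cepstrum([x], [bw], fs, len(obs))
    resid = [o - c for o, c in zip(obs, pred)]
    h_new = [rho * hk + (1.0 - rho) * rk for hk, rk in zip(h, resid)]
    d_new = [max(d_floor, rho * dk + (1.0 - rho) * (rk - hk) ** 2)
             for dk, rk, hk in zip(d, resid, h_new)]
    return h_new, d_new


if __name__ == "__main__":
    fs, bw, n = 8000.0, 100.0, 12
    obs = vtr_to_cepstrum([600.0], [bw], fs, n)  # noise-free cepstra of a 600 Hz resonance
    x, p = 560.0, 1e4                            # deliberately offset starting guess
    h, d = [0.0] * n, [1e-4] * n
    for _ in range(20):
        x, p, _ = ekf_step(x, p, obs, bw, fs, phi=1.0, target=0.0, q=100.0, h=h, d=d)
    print(round(x, 1))
```

With `phi=1.0` the assumed dynamics reduce to a random walk, so repeated steps pull the estimate from 560 Hz toward the 600 Hz resonance that generated the observations. A fuller tracker in the spirit of the abstract would handle several resonances and their bandwidths jointly and would interleave filtering with re-estimation of `h` and `d`; this sketch keeps them separate for clarity.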
Keywords :
acoustic resonance; adaptive Kalman filters; filtering theory; iterative methods; learning (artificial intelligence); linear predictive coding; piecewise linear techniques; speech; speech processing; tracking filters; LPC cepstra; Switchboard speech data; adaptive Kalman filter; adaptive training; continuous hidden dynamics; fluent speech; formant trackers; iterative tracking algorithm; natural frequencies; piecewise linearization design process; prediction-residual training; speech acoustics; structured speech model; vocal tract resonance tracking; Bandwidth; Iterative algorithms; Larynx; Linear predictive coding; Lips; Predictive models; Process design; Resonance; Resonant frequency; Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326046
Filename :
1326046