DocumentCode
835694
Title
Tracking vocal tract resonances using a quantized nonlinear function embedded in a temporal constraint
Author
Deng, Li ; Acero, Alex ; Bazzi, Issam
Volume
14
Issue
2
fYear
2006
fDate
3/1/2006 12:00:00 AM
Firstpage
425
Lastpage
434
Abstract
This paper presents a new technique for high-accuracy tracking of vocal-tract resonances (which coincide with formants for nonnasalized vowels) in natural speech. The technique is based on a discretized nonlinear prediction function, which is embedded in a temporal constraint on the quantized input values over adjacent time frames as the prior knowledge for their temporal behavior. The nonlinear prediction is constructed, based on its analytical form derived in detail in this paper, as a parameter-free, discrete mapping function that approximates the "forward" relationship from the resonance frequencies and bandwidths to the Linear Predictive Coding (LPC) cepstra of real speech. Discretization of the function permits the "inversion" of the function via a search operation. We further introduce the nonlinear-prediction residual, characterized by a multivariate Gaussian vector with trainable mean vectors and covariance matrices, to account for the errors due to the functional approximation. We develop and describe an expectation-maximization (EM)-based algorithm for training the parameters of the residual, and a dynamic programming-based algorithm for resonance tracking. Details of the algorithm implementation for computation speedup are provided. Experimental results are presented which demonstrate the effectiveness of our new paradigm for tracking vocal-tract resonances. In particular, we show the effectiveness of training the prediction-residual parameters in obtaining high-accuracy resonance estimates, especially during consonantal closure.
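The abstract's main steps (a quantized forward mapping from resonances to LPC cepstra, inversion by search, and a temporal constraint enforced by dynamic programming) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly cited all-pole closed form c_n = (2/n) Σ_k exp(-π n b_k / f_s) cos(2π n f_k / f_s), tracks a single resonance with a fixed bandwidth, and replaces the trained Gaussian residual with a plain squared error. The names `resonance_to_cepstrum`, `track_resonance`, and `smooth_weight`, and the constants `FS`, `N_CEP`, and `BANDWIDTH`, are hypothetical.

```python
import math

FS = 8000.0        # sampling rate in Hz (assumed for illustration)
N_CEP = 12         # number of cepstral coefficients (assumed)
BANDWIDTH = 100.0  # fixed resonance bandwidth in Hz (assumed)


def resonance_to_cepstrum(freq, bw=BANDWIDTH, n_cep=N_CEP, fs=FS):
    """Forward mapping from one resonance (freq, bw) to LPC cepstra,
    using the assumed closed form for an all-pole model."""
    return [
        (2.0 / n) * math.exp(-math.pi * n * bw / fs)
        * math.cos(2.0 * math.pi * n * freq / fs)
        for n in range(1, n_cep + 1)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance; stands in for the Gaussian residual score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def track_resonance(observed, grid, smooth_weight=0.1):
    """Dynamic-programming search over quantized frequencies.

    observed: list of cepstral vectors, one per frame.
    grid: quantized candidate frequencies in Hz.
    smooth_weight: penalty on squared frame-to-frame jumps (in kHz).
    """
    # Discretizing the forward function gives a codebook to search over.
    codebook = [resonance_to_cepstrum(f) for f in grid]
    cost = [sq_dist(observed[0], c) for c in codebook]
    backptrs = []
    for frame in observed[1:]:
        new_cost, ptr = [], []
        for j, fj in enumerate(grid):
            # Best predecessor under the temporal continuity penalty.
            def step(i):
                return cost[i] + smooth_weight * ((fj - grid[i]) / 1000.0) ** 2
            best_i = min(range(len(grid)), key=step)
            new_cost.append(step(best_i) + sq_dist(frame, codebook[j]))
            ptr.append(best_i)
        cost = new_cost
        backptrs.append(ptr)
    # Backtrace from the cheapest final state.
    j = min(range(len(grid)), key=lambda i: cost[i])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [grid[j] for j in path]
```

Calling `track_resonance` on cepstra synthesized from a frequency track that lies on the grid should return that track when `smooth_weight` is small. A fuller treatment along the lines of the abstract would search jointly over several resonances and their bandwidths, and would score frames with the EM-trained residual mean and covariance rather than a raw squared error.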
Keywords
Gaussian processes; covariance matrices; data compression; dynamic programming; expectation-maximisation algorithm; linear predictive coding; nonlinear functions; quantisation (signal); speech coding; discrete mapping function; expectation-maximization algorithm; multivariate Gaussian vector; nonlinear prediction; nonnasalized vowels; quantized nonlinear function; temporal constraint; trainable mean vectors; vocal tract resonances tracking; Bandwidth; Cepstral analysis; Covariance matrix; Natural languages; Resonance; Resonant frequency; Speech analysis; Vectors; Continuity constraint; expectation–maximization (EM) optimization; formant; greedy search; linear predictive coding (LPC) cepstrum; prediction residual; quantization; vocal-tract resonance (VTR)
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TSA.2005.855841
Filename
1597248
Link To Document