Joint estimation of vocal tract and nasal tract area functions from speech waveforms via auto-regression moving-average modeling and a pole assignment method

Author

Shang-Hsuan Peng ; Chao-Wen Li ; Yi-Wen Liu

Author_Institution

Dept. Electr. Eng., Nat. Tsing Hua Univ., Hsinchu, Taiwan

fYear

2015

fDate

19-24 April 2015

Firstpage

4644

Lastpage

4648

Abstract

Nasal resonance is used in certain languages to differentiate word meanings. The joint filtering effect of the vocal tract and the nasal tract can be modeled by the auto-regression moving-average (ARMA) approach. However, unlike all-pole (i.e., AR) modeling, it has been difficult to derive the equivalent vocal-tract area function directly from an ARMA model because the relation between model coefficients and vocal-tract geometry is nonlinear. In this paper, we propose a method to decompose an ARMA model approximately into α/C(z) + β/D(z); in our context, 1/C(z) and 1/D(z) represent the filtering effects of the oral and the nasal tract, respectively. Once the decomposition is performed, equivalent oral-tract and nasal-tract area functions can be obtained by converting C(z) and D(z) to their respective lattice representations. The proposed method was applied to non-nasalized and nasalized vowels produced by three speakers, and it was found that the ratio r = β/α tends to be higher in nasalized vowels than in their non-nasalized counterparts. The vocal-tract area function estimated by the present approach was also fairly stable for sustained vowels.
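
The two algebraic steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pole assignment method, whose details are not given in this record. It only shows (a) that a parallel sum α/C(z) + β/D(z) is an ARMA model with numerator αD(z) + βC(z) and denominator C(z)D(z), and (b) the standard step-down recursion that converts an all-pole polynomial to lattice (reflection) coefficients. The function names, the example coefficients, and the area-ratio sign convention A[i+1]/A[i] = (1 - k)/(1 + k) are assumptions made for illustration; the paper may use a different convention or tube orientation.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def combine_parallel(alpha, c, beta, d):
    """Return (num, den) of alpha/C(z) + beta/D(z) = (alpha*D + beta*C) / (C*D)."""
    n = max(len(c), len(d))
    c_pad = list(c) + [0.0] * (n - len(c))
    d_pad = list(d) + [0.0] * (n - len(d))
    num = [alpha * dj + beta * cj for cj, dj in zip(c_pad, d_pad)]
    den = poly_mul(c, d)
    return num, den


def poly_to_reflection(a):
    """Step-down recursion: A(z) = 1 + a1 z^-1 + ... + ap z^-p -> [k_1, ..., k_p]."""
    cur = [coef / a[0] for coef in a]  # normalize so the leading coefficient is 1
    p = len(cur) - 1
    k = [0.0] * p
    for m in range(p, 0, -1):
        km = cur[m]  # the last coefficient of the order-m polynomial is k_m
        if abs(km) >= 1.0:
            raise ValueError("polynomial is not minimum phase; lattice is unstable")
        k[m - 1] = km
        denom = 1.0 - km * km
        # Recover the order-(m-1) polynomial from the order-m polynomial.
        cur = [(cur[i] - km * cur[m - i]) / denom for i in range(m)]
    return k


def reflection_to_areas(k, first_area=1.0):
    """Relative tube areas, ASSUMING the convention A[i+1]/A[i] = (1 - k)/(1 + k)."""
    areas = [first_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas


# Hypothetical example coefficients (not taken from the paper).
C = [1.0, 0.6, 0.2]   # stands in for the oral-tract polynomial C(z)
D = [1.0, -0.5]       # stands in for the nasal-tract polynomial D(z)
num, den = combine_parallel(1.0, C, 0.5, D)
oral_k = poly_to_reflection(C)
oral_areas = reflection_to_areas(oral_k)
```

Only relative areas are recoverable this way, so `first_area` acts as an arbitrary scale factor.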

Keywords

approximation theory; autoregressive moving average processes; computational geometry; filtering theory; lattice theory; pole assignment; speaker recognition; speech coding; ARMA model; autoregression moving-average modeling; equivalent oral-tract area functions; joint filtering effect; joint nasal tract area function estimation; joint vocal tract area function estimation; lattice representation; model coefficients; nasal resonance; nasalized vowels; nonnasalized vowels; pole assignment method; speech waveforms; vocal-tract geometry; word meaning differentiation; Acoustics; Atmospheric modeling; Electron tubes; Estimation; Lattices; Speech; Transfer functions; ARMA modeling; nasalization; vocal-tract area function

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178851

Filename

7178851