Improvement of Probabilistic Acoustic Tube model for speech decomposition

Author

Yang Zhang; Zhijian Ou; Mark Hasegawa-Johnson

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Illinois, Urbana, IL, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

7929

Lastpage

7933

Abstract

Current model-based speech analysis tends to be incomplete: only some of the parameters of interest (e.g., only the pitch or only the vocal tract) are modeled, while the rest, which may be equally important, are disregarded. The drawback is that, without joint modeling of correlated parameters, the analysis of speech parameters may be inaccurate or even incorrect. Motivated by this, we previously proposed a model called PAT (Probabilistic Acoustic Tube), in which pitch, vocal tract and energy are jointly modeled. This paper proposes an improved version of the PAT model, named PAT2, in which both the signal modeling and the probabilistic modeling are substantially revised. Compared to related work, PAT2 is much more comprehensive, incorporating mixed excitation, glottal wave and phase modeling. Experimental results show its ability to decompose speech into the desired parameters and its potential for speech synthesis.

Keywords

probability; speech processing; PAT model; PAT2; probabilistic acoustic tube model; probabilistic modeling; signal modeling; speech analysis; speech decomposition; speech parameters; vocal tract; Discrete Fourier transforms; Image reconstruction; Mel frequency cepstral coefficient; Probabilistic logic; Speech; Speech processing; Probabilistic generative model; model-based speech processing; speech modeling;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6855144

Filename

6855144