Flexible voice morphing based on linear combination of multi-speakers' vocal tract area functions

Author

Nambu, Yoshiki ; Mikawa, Masahiko ; Tanaka, Kazuyo

Author_Institution

Grad. Sch. of Libr., Inf. & Media Studies, Univ. of Tsukuba, Tsukuba, Japan

fYear

2010

fDate

23-27 Aug. 2010

Firstpage

790

Lastpage

794

Abstract

This paper presents a flexible voice morphing method based on conversion using a linear combination of multi-speakers' vocal tract area functions, in which phonological identity is maintained in terms of the overall interpolated area. In this system, the characteristics of vocal tract resonances are separated from those of glottal source waves using AR-HMM analysis of speech. The vocal tract resonances and the glottal source wave characteristics are then morphed independently. For the morphing of vocal tract resonances, log vocal tract area functions, which are derived from AR coefficients, are normalized and then processed by a statistical mapping technique. For glottal source waves, statistical mapping is conducted in the cepstrum domain. Morphed speech is re-synthesized by driving an AR filter with the converted glottal source waves, which are themselves re-synthesized using the cepstrum-domain conversion. With the proposed morphing system, the continuity of formants and the perceptual differences between a conventional method and the proposed method are confirmed.
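
The vocal tract part of the abstract (AR coefficients, log area functions, linear combination across speakers) can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the AR-HMM analysis, the normalization, and the statistical mapping, and simply takes a weighted sum of log areas for one frame. The function names, the area-ratio sign convention (1 - k)/(1 + k), and the reference area of 1 at the first section are all assumptions made for illustration.

```python
import math

def ar_to_reflection(a):
    """Step-down recursion: AR coefficients a_1..a_p of
    A(z) = 1 + sum a_k z^-k  ->  reflection coefficients k_1..k_p."""
    a = list(a)
    p = len(a)
    k = [0.0] * p
    for i in range(p, 0, -1):
        ki = a[i - 1]
        if abs(ki) >= 1.0:
            raise ValueError("AR filter is not stable")
        k[i - 1] = ki
        denom = 1.0 - ki * ki
        # Reduce the order-i polynomial to order i-1.
        a = [(a[j] - ki * a[i - 2 - j]) / denom for j in range(i - 1)]
    return k

def reflection_to_ar(k):
    """Step-up recursion: reflection coefficients -> AR coefficients."""
    a = []
    for i, ki in enumerate(k, start=1):
        a = [a[j] + ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a

def reflection_to_log_area(k):
    """Cumulative log area per tube section, relative to a reference
    area of 1 (assumed sign convention: ratio = (1 - k) / (1 + k))."""
    log_area, acc = [], 0.0
    for ki in k:
        acc += math.log((1.0 - ki) / (1.0 + ki))
        log_area.append(acc)
    return log_area

def log_area_to_reflection(log_area):
    """Inverse of reflection_to_log_area."""
    k, prev = [], 0.0
    for g in log_area:
        ratio = math.exp(g - prev)  # always positive, so |k| < 1
        k.append((1.0 - ratio) / (1.0 + ratio))
        prev = g
    return k

def morph_ar(ar_sets, weights):
    """Weighted linear combination of several speakers' log area
    functions for one frame, returned as AR coefficients."""
    log_areas = [reflection_to_log_area(ar_to_reflection(a)) for a in ar_sets]
    order = len(log_areas[0])
    mixed = [sum(w * la[i] for w, la in zip(weights, log_areas))
             for i in range(order)]
    return reflection_to_ar(log_area_to_reflection(mixed))

# Hypothetical example: two stable order-3 filters, mixed 70/30.
speaker_a = reflection_to_ar([0.5, -0.3, 0.2])
speaker_b = reflection_to_ar([-0.4, 0.6, -0.1])
morphed = morph_ar([speaker_a, speaker_b], [0.7, 0.3])
```

One reason to combine speakers in the log-area domain rather than on AR coefficients directly: every finite log area maps back to a reflection coefficient with magnitude below 1, so the morphed AR filter stays stable for any choice of weights.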

Keywords

cepstral analysis; hidden Markov models; speaker recognition; speech processing; AR filter; AR-HMM speech analysis; cepstrum domain conversion; flexible voice morphing; glottal source wave characteristics; glottal source waves; linear combination; multispeaker vocal tract area functions; phonological identity; statistical mapping; vocal tract resonances; Analytical models; Cepstrum; Estimation; Hidden Markov models; Interpolation; Speech; Training;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Conference, 2010 18th European

Conference_Location

Aalborg

ISSN

2219-5491

Type

conf

Filename

7096532