Regional Information Center for Science and Technology - Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression

DocumentCode :

36276

Title :

Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression

Author :

Ling, Zhen-Hua ; Richmond, Korin ; Yamagishi, Junichi

Author_Institution :

iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China

Volume :

Issue :

fYear :

2013

fDate :

Jan. 2013

Firstpage :

207

Lastpage :

219

Abstract :

In previous work, we proposed a method to control the characteristics of synthetic speech flexibly by integrating articulatory features into a hidden Markov model (HMM) based parametric speech synthesizer. In this method, a unified acoustic-articulatory model is trained, and context-dependent linear transforms are used to model the dependency between the two feature streams. In this paper, we go significantly further and propose a feature-space-switched multiple regression HMM to improve the performance of articulatory control. A multiple regression HMM (MRHMM) is adopted to model the distribution of acoustic features, with articulatory features used as exogenous “explanatory” variables. A separate Gaussian mixture model (GMM) is introduced to model the articulatory space, and articulatory-to-acoustic regression matrices are trained for each component of this GMM, instead of for the context-dependent states in the HMM. Furthermore, we propose a task-specific context feature tailoring method to ensure compatibility between state context features and articulatory features that are manipulated at synthesis time. The proposed method is evaluated on two tasks, using a speech database with acoustic waveforms and articulatory movements recorded in parallel by electromagnetic articulography (EMA). In a vowel identity modification task, the new method performs better than our previous approach when reconstructing target vowels by varying articulatory inputs. In a second task, vowel creation, the new method proves highly effective at producing a new vowel from appropriate articulatory representations; the resulting vowel is shown to sound highly natural even though the training data contain no acoustic samples of it.
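The core idea in the abstract, regression matrices tied to components of an articulatory-space GMM rather than to HMM states, can be illustrated with a small NumPy sketch. This is only one possible reading of the abstract, not the authors' implementation: the function names, the diagonal covariances, the soft (posterior-weighted) switching, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior of each diagonal-covariance GMM component given articulatory vector x."""
    diff = x - means                                   # shape (K, D)
    log_pdf = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2.0 * np.pi * variances), axis=1)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max()                       # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()

def switched_acoustic_mean(x, state_mean, regressions, weights, means, variances):
    """Acoustic mean = HMM state mean + posterior-weighted regression on [x, 1].

    regressions has shape (K, P, D + 1): one articulatory-to-acoustic matrix
    per GMM component (hypothetical layout, assumed for this sketch).
    """
    xi = np.append(x, 1.0)                             # articulatory vector with bias term
    post = gmm_posteriors(x, weights, means, variances)
    offsets = regressions @ xi                         # shape (K, P)
    return state_mean + post @ offsets

# Toy setup: K = 2 articulatory regions, D = 2 articulatory dims, P = 3 acoustic dims.
weights = np.array([0.5, 0.5])
means = np.array([[-5.0, 0.0], [5.0, 0.0]])
variances = np.ones((2, 2))
regressions = np.zeros((2, 3, 3))
regressions[0, :, -1] = 1.0                            # component 0 shifts the mean by +1
regressions[1, :, -1] = -1.0                           # component 1 shifts the mean by -1
state_mean = np.zeros(3)

x = np.array([-5.0, 0.0])                              # sits on component 0's mean
acoustic_mean = switched_acoustic_mean(x, state_mean, regressions,
                                       weights, means, variances)
```

In this toy setup, moving the articulatory input from one region of the space to the other changes which regression matrix dominates, which is the sense in which the regression is "switched" by the feature space rather than by the HMM state context.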

Keywords :

Gaussian processes; audio databases; hidden Markov models; regression analysis; speech synthesis; EMA; GMM; Gaussian mixture model; HMM-based parametric speech synthesis; acoustic waveforms; articulatory control; articulatory space; articulatory-to-acoustic regression matrices; context-dependent linear transforms; electromagnetic articulography; exogenous explanatory variables; feature-space-switched multiple regression; hidden Markov model; speech database; state context features; task-specific context feature tailoring method; training data; unified acoustic-articulatory model; vowel identity modification task; Acoustics; Context; Speech; Transforms; Articulatory features; multiple-regression hidden Markov model

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2215600

Filename :

6289354

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=36276