Regional Information Center for Science and Technology - Factored Maximum Penalized Likelihood Kernel Regression for HMM-Based Style-Adaptive Speech Synthesis

DocumentCode :

1759317

Title :

Factored Maximum Penalized Likelihood Kernel Regression for HMM-Based Style-Adaptive Speech Synthesis

Author :

June Sig Sung ; Doo Hwa Hong ; Nam Soo Kim

Author_Institution :

Sch. of Electr. Eng. & INMC, Seoul Nat. Univ., Seoul, South Korea

Volume :

Issue :

fYear :

2014

fDate :

April 2014

Firstpage :

251

Lastpage :

261

Abstract :

Speech synthesized from the same text should sound different depending on the speaking style. Current speech synthesis techniques based on the hidden Markov model (HMM) usually focus on a fixed speaking style, and changing the speaking style requires multiple parameter sets trained on different speaking styles. A promising alternative is to adapt the base model to the intended speaking style. In our previous work, we proposed factored maximum likelihood linear regression (FMLLR) adaptation, in which each MLLR parameter is defined as a function of a control vector. We presented a method to train the FMLLR parameters based on the general framework of the expectation-maximization (EM) algorithm. In this paper, we introduce a novel technique called factored maximum penalized likelihood kernel regression (FMLKR) for HMM-based style-adaptive speech synthesis. In FMLKR, nonlinear regression between the mean vector of the base model and the corresponding mean vectors of the adaptation data is performed using a kernel method based on the FMLLR framework. In a series of experiments on artificial generation of singing voice and expressive speech, we evaluate the performance of the FMLLR and FMLKR techniques with various matrix structures and compare them with other approaches to parameter adaptation in HMM-based speech synthesis.

Keywords :

expectation-maximisation algorithm; hidden Markov models; matrix algebra; regression analysis; speech synthesis; vectors; EM algorithm; FMLKR techniques; FMLLR adaptation; HMM-based style-adaptive speech synthesis; artificial generation; base model; expectation-maximization algorithm; expressive speech; factored maximum penalized likelihood kernel regression; general framework; hidden Markov model; matrix structures; mean vectors; nonlinear regression; parameter adaptation; singing voice; speaking style; Adaptation models; Covariance matrices; Hidden Markov models; Kernel; Speech; Speech synthesis; Vectors; Expressive speech synthesis; FMLKR; MLLR; MRHSMM; factored MLLR; kernel method; parameter adaptation; singing voice;

fLanguage :

English

Journal_Title :

Selected Topics in Signal Processing, IEEE Journal of

Publisher :

IEEE

ISSN :

1932-4553

Type :

jour

DOI :

10.1109/JSTSP.2014.2305131

Filename :

6734665

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1759317