Regional Information Center for Science and Technology (RICeST) - Duration refinement for hybrid speech synthesis system using random forest

DocumentCode :

3703406

Title :

Duration refinement for hybrid speech synthesis system using random forest

Author :

Ran Zhang;Xiaoyan Lou;Qinghua Wu

Author_Institution :

Samsung Telecom R&D Center, Beijing, China

fYear :

2015

Firstpage :

792

Lastpage :

796

Abstract :

The hybrid speech synthesis system, which combines the hidden Markov model with the unit selection method, has recently been widely used and researched in both industry and academia because of its naturalness and expressiveness. However, the target duration, which controls the duration of the selected candidate, is still predicted by the state-based duration model, whose performance is far from satisfactory. As a result, the synthetic speech sounds somewhat bland and even tedious. In this paper, we replace the state-based duration model with a Random Forest (RF). Experiments on an English database show that the new model yields more accurate predictions than the baseline state-based duration model. The average improvements in phone RMSE are 4.265 ms and 14.6% in English speech synthesis. Perceptual experiments on the same database further confirm that the proposed model performs better than the baseline model.
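
The abstract reports its gain as a reduction in phone-level RMSE, given both in milliseconds and as a percentage. The sketch below illustrates how such figures could be computed, reading them as an absolute and a relative reduction (an assumption; the abstract does not state how the average is taken). The function names `rmse` and `rmse_improvement` and all duration values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted_ms, reference_ms):
    """Root-mean-square error between predicted and reference phone durations (ms)."""
    if len(predicted_ms) != len(reference_ms) or not reference_ms:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted_ms, reference_ms)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_improvement(baseline_rmse, new_rmse):
    """Return (absolute reduction in ms, relative reduction in percent)."""
    absolute = baseline_rmse - new_rmse
    relative = 100.0 * absolute / baseline_rmse
    return absolute, relative


# Hypothetical toy durations in ms; NOT data from the paper.
reference = [80.0, 120.0, 60.0, 100.0]
baseline_pred = [90.0, 110.0, 70.0, 90.0]  # every phone off by 10 ms
new_pred = [85.0, 115.0, 65.0, 95.0]       # every phone off by 5 ms

base = rmse(baseline_pred, reference)
new = rmse(new_pred, reference)
abs_gain, rel_gain = rmse_improvement(base, new)
print(f"baseline {base:.3f} ms, new {new:.3f} ms, "
      f"gain {abs_gain:.3f} ms ({rel_gain:.1f}%)")
```

Reporting both numbers is useful because an absolute gain in milliseconds is only interpretable relative to the baseline error it was measured against.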

Keywords :

"Hidden Markov models","Predictive models","Speech","Training","Speech synthesis","Databases","Acoustics"

Publisher :

IEEE

Conference_Title :

Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on

Electronic_ISBN :

2156-8111

Type :

conf

DOI :

10.1109/ACII.2015.7344663

Filename :

7344663

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3703406