Title :
Duration refinement for hybrid speech synthesis system using random forest
Author :
Ran Zhang;Xiaoyan Lou;Qinghua Wu
Author_Institution :
Samsung Telecom R&D Center, Beijing, China
Abstract :
Hybrid speech synthesis systems, which combine hidden Markov models (HMMs) with unit selection, have recently been widely used and studied in both industry and academia because of their naturalness and expressiveness. However, the target duration, which controls the duration of the selected candidate units, is still predicted by a state-based duration model whose performance is far from satisfactory. As a result, the synthetic speech sounds somewhat bland and even tedious. In this paper, we replace the state-based duration model with a Random Forest (RF). Experiments on an English database show that the new model predicts durations more accurately than the baseline state-based duration model: the phone-level RMSE is reduced by 4.265 ms on average, a relative improvement of 14.6%. Perceptual experiments on the same database further confirm that the proposed model outperforms the baseline.
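The abstract does not say which features, tree settings, or toolkit the RF duration model uses. The following is only a minimal, stdlib-only sketch of random-forest regression of phone durations, assuming hypothetical phone-context features (stress, position in syllable, phone class, plus an irrelevant noise feature) and synthetic data. The baseline here is a stand-in that predicts the training-mean duration; it is NOT the HMM state-based duration model from the paper. The sketch also shows one reading of the reported figures: an absolute RMSE reduction in ms and a relative reduction in percent of the baseline RMSE.

```python
import math
import random


def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return math.fsum(values) / len(values)


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference durations (ms)."""
    return math.sqrt(avg((p - r) ** 2 for p, r in zip(pred, ref)))


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = avg(values)
    return math.fsum((v - m) ** 2 for v in values)


def grow_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Regression tree: greedy SSE-minimising splits over a random feature subset."""
    if depth >= max_depth or len(y) <= min_leaf:
        return avg(y)  # leaf: mean duration of the samples that reach it
    best = None  # (cost, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):
        # Exclude the largest value so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # every sampled feature is constant at this node
        return avg(y)
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    args = (depth + 1, max_depth, min_leaf, n_feats, rng)
    return (f, t,
            grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx], *args),
            grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx], *args))


def predict_tree(node, x):
    """Walk internal nodes (feature, threshold, left, right) down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForestDuration:
    """Hypothetical RF duration model: bagged regression trees, averaged."""

    def __init__(self, n_trees=20, max_depth=6, min_leaf=3, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        n_feats = max(1, d // 3)  # common regression default: about d/3 features per split
        for _ in range(self.n_trees):
            idx = self.rng.choices(range(n), k=n)  # bootstrap sample with replacement
            self.trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                        0, self.max_depth, self.min_leaf, n_feats, self.rng))
        return self

    def predict(self, X):
        return [avg(predict_tree(t, x) for t in self.trees) for x in X]


def synthetic_phones(n, rng):
    """Synthetic data with hypothetical features [stressed, position, phone class, noise]."""
    X, y = [], []
    for _ in range(n):
        stressed, pos, cls, noise = (rng.randint(0, 1), rng.randint(0, 2),
                                     rng.randint(0, 4), rng.randint(0, 9))
        X.append([stressed, pos, cls, noise])
        y.append(60 + 40 * stressed + 15 * pos + 4 * cls + rng.gauss(0, 5))
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = synthetic_phones(300, rng)
    X_test, y_test = synthetic_phones(100, rng)
    forest = RandomForestDuration(n_trees=20, seed=1).fit(X_train, y_train)
    rf_rmse = rmse(forest.predict(X_test), y_test)
    # Stand-in baseline (NOT the state-based model): always predict the training mean.
    base_rmse = rmse([avg(y_train)] * len(y_test), y_test)
    print(f"baseline RMSE {base_rmse:.2f} ms, RF RMSE {rf_rmse:.2f} ms")
    print(f"absolute gain {base_rmse - rf_rmse:.2f} ms, "
          f"relative gain {100 * (base_rmse - rf_rmse) / base_rmse:.1f}%")
```

Averaging bootstrapped trees that each see only a random subset of features per split is what distinguishes a random forest from a single regression tree; the numbers this toy prints say nothing about the paper's English database.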
Keywords :
"Hidden Markov models","Predictive models","Speech","Training","Speech synthesis","Databases","Acoustics"
Conference_Titel :
Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Electronic_ISBN :
2156-8111
DOI :
10.1109/ACII.2015.7344663