Regional Information Center for Science and Technology - Learning continuous representation of text for phone duration modeling in statistical parametric speech synthesis

DocumentCode :

3744832

Title :

Learning continuous representation of text for phone duration modeling in statistical parametric speech synthesis

Author :

Sai Krishna Rallabandi; Sai Sirisha Rallabandi; Padmini Bandi; Suryakanth V Gangashetty

Author_Institution :

International Institute of Information Technology - Hyderabad, India

Year :

2015

Firstpage :

111

Lastpage :

115

Abstract :

In this paper, we investigate a continuous-representation approach to the feature vector derived from input text for predicting phone durations in a Text-to-Speech (TTS) system. We pose duration prediction as a data-driven statistical transformation from the input text onto the feature space. We first present a method that maps both the categorical and the numeric features typically used into a continuous numeric representation, and then model it as a form of matrix factorization to improve the representation. The proposed system is evaluated using Root Mean Squared Error (RMSE) as the objective measure and Mean Opinion Score (MOS) as the subjective measure. We find that the system performs on par with state-of-the-art duration modeling systems both subjectively and objectively.
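The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes one-hot encoding for the categorical features, uses a truncated SVD as one possible form of matrix factorization, and works on made-up toy data. The function names `one_hot`, `low_rank_representation`, and `rmse` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def one_hot(values):
    """Map categorical labels to a one-hot matrix (one row per sample)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out


def low_rank_representation(features, rank):
    """Truncated SVD (an assumed stand-in for the paper's factorization):
    keep the top `rank` singular directions of the feature matrix."""
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :rank] * s[:rank]


def rmse(predicted, reference):
    """Root Mean Squared Error between predicted and reference durations."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


# Toy data, invented for illustration: a phone identity (categorical)
# and the phone's position in the word (numeric).
phones = ["a", "t", "a", "s"]
position_in_word = np.array([[0.0], [1.0], [2.0], [0.0]])

# Categorical and numeric features combined into one continuous matrix.
features = np.hstack([one_hot(phones), position_in_word])

# Lower-dimensional continuous representation of each phone's context.
embedding = low_rank_representation(features, rank=2)
```

A duration model would then be trained on `embedding`, and `rmse` would compare its predicted durations against the reference durations of held-out phones.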

Keywords :

"Predictive models","Context","Training","Matrix decomposition","Adaptation models","Pragmatics","Symmetric matrices"

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type :

conf

DOI :

10.1109/ASRU.2015.7404782

Filename :

7404782

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3744832