DocumentCode :
134226
Title :
Fusion of magnitude and phase-based features for objective evaluation of TTS voice
Author :
Sailor, Hardik B. ; Patil, Hemant A.
Author_Institution :
Dhirubhai Ambani Inst. of Inf. & Commun. Technol. (DA-IICT), Gandhinagar, India
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
521
Lastpage :
525
Abstract :
This paper analyzes distance-based objective measures, the commonly used class of objective measures, for the evaluation of Text-to-Speech (TTS) systems. We discuss several aspects of evaluating the quality of synthesized speech, including the limitations and issues of subjective evaluation and the importance of objective measures. The traditional objective measure based on Dynamic Time Warping (DTW) distance is used in this work. We use magnitude-based and phase-based features, as well as auditory features, to check how effectively objective measures predict the quality of a TTS voice. In particular, neither Mel Frequency Cepstral Coefficients (MFCC) nor the phase-based Modified Group Delay-based Cepstral Coefficients (MGDCC) alone correlate well with subjective scores. However, feature-level fusion of MFCC and MGDCC gives better correlation than all other feature sets. With this fusion, we obtained correlation coefficients of -0.3 and -0.32 for the Blizzard Challenge 2010 and 2011 databases, respectively. The results also show the significance of phase-based features for objective measures when they are used along with magnitude-based features. The experimental results show that distance-based measures still do not work well on the Blizzard Challenge databases, and that more general objective measures are needed for measuring the quality of a TTS voice.
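The pipeline described in the abstract (feature-level fusion, DTW distance between natural and synthesized speech, correlation with subjective scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean frame cost, the length normalisation, and the use of Pearson correlation are all assumptions made for illustration, and the feature matrices are taken as already extracted (one row per frame).

```python
import numpy as np

def fuse(mag_feats, phase_feats):
    """Feature-level fusion (assumed here to be per-frame concatenation).

    Both inputs are (frames, dims) arrays with the same number of frames,
    e.g. MFCC and MGDCC matrices computed on the same frame grid.
    """
    return np.concatenate([mag_feats, phase_feats], axis=1)

def dtw_distance(ref, syn):
    """DTW distance between two (frames, dims) feature matrices.

    Assumptions: Euclidean local cost, the standard three-way step pattern,
    and normalisation by the summed sequence lengths.
    """
    n, m = len(ref), len(syn)
    # Local cost between every reference frame and every synthesized frame.
    cost = np.linalg.norm(ref[:, None, :] - syn[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)

def pearson(distances, subjective_scores):
    """Pearson correlation between objective distances and subjective scores."""
    return float(np.corrcoef(distances, subjective_scores)[0, 1])
```

In such a setup, one distance per TTS system (for example, averaged over utterances) would be correlated with that system's subjective score. A negative coefficient is the expected direction if larger distances from natural speech go with lower subjective scores.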
Keywords :
cepstral analysis; correlation methods; sensor fusion; speech synthesis; Blizzard Challenge databases; DTW distance; MFCC features; MGDCC; Mel frequency cepstral coefficient features; TTS voice quality measurement; TTS voice quality prediction; auditory features; correlation coefficient value; distance-based measures; distance-based objective measures; dynamic time warping distance; magnitude-based feature fusion; objective evaluation; phase-based feature fusion; phase-based modified group delay-based cepstral coefficients; speech quality evaluation; subjective evaluation; subjective scores; text-to-speech systems; Correlation; Correlation coefficient; Databases; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Correlation Coefficient; Dynamic Time Warping (DTW); Modified Group Delay; Objective Measures;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936618
Filename :
6936618