Title :
Automatic detection of unnatural word-level segments in unit-selection speech synthesis
Author :
Wang, William Yang ; Georgila, Kallirroi
Abstract :
We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features, namely target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency-Inverse Document Frequency (Delta TF-IDF), and we report comparative results for the individual feature types and their combinations. We also compare three modeling methods: Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
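Of the listed features, Delta TF-IDF is the least standard in speech synthesis. It comes from text classification. The sketch below shows the usual Delta TF-IDF weighting for two labeled classes, P and N: V(t, d) = C(t, d) * log2((|P| * N_t) / (|N| * P_t)). Here C(t, d) is the count of term t in document d, |P| and |N| are the class sizes, and P_t and N_t are the numbers of documents in each class that contain t. The following choices are assumptions for illustration only and may differ from how the paper builds this feature: treating utterances as "documents", words as "terms", and natural versus unnatural segments as the two classes. The function name `delta_tfidf` and the add-one `smoothing` parameter, which prevents division by zero for unseen terms, are also hypothetical.

```python
import math
from collections import Counter


def delta_tfidf(doc_tokens, pos_docs, neg_docs, smoothing=1.0):
    """Hypothetical sketch of Delta TF-IDF weights for one document.

    V(t, d) = C(t, d) * log2((|P| * N_t) / (|N| * P_t))

    pos_docs / neg_docs: lists of token lists for the two classes (P and N).
    A positive weight means t is relatively more frequent in class N;
    a negative weight means it is relatively more frequent in class P.
    `smoothing` is an assumed add-k correction so unseen terms are defined.
    """
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    # Document frequency per class: count each term once per document.
    pos_df = Counter(t for doc in pos_docs for t in set(doc))
    neg_df = Counter(t for doc in neg_docs for t in set(doc))
    counts = Counter(doc_tokens)  # C(t, d): raw term counts in this document
    weights = {}
    for term, c in counts.items():
        p_t = pos_df[term] + smoothing
        n_t = neg_df[term] + smoothing
        weights[term] = c * math.log2((n_pos * n_t) / (n_neg * p_t))
    return weights


if __name__ == "__main__":
    pos = [["a", "b"], ["a"]]   # toy class P documents
    neg = [["b"], ["b", "c"]]   # toy class N documents
    print(delta_tfidf(["a", "b", "b"], pos, neg))
```

With balanced classes (|P| = |N|), the formula reduces to C(t, d) * log2(N_t / P_t). A term that is equally common in both classes therefore receives a weight near zero, and only class-discriminative words contribute strongly.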
Keywords :
speech synthesis; support vector machines; CRF; SVM; TF-IDF; automatic detection; comprehensive error analysis; conditional random fields; delta term frequency inverse document frequency; language models; prosodic cues; random forests; unit-selection speech synthesis; unnatural word-level segments; Acoustics; Feature extraction; Humans; Speech; Speech synthesis; Testing; Training;
Conference_Title :
2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
DOI :
10.1109/ASRU.2011.6163946