Title :
Selecting non-uniform units from a very large corpus for concatenative speech synthesizer
Author :
Chu, Min ; Peng, Hu ; Yang, Hong-yun ; Chang, Eric
Author_Institution :
Microsoft Res. China, Beijing, China
Abstract :
This paper proposes a two-module text-to-speech (TTS) system structure that bypasses the prosody model used to predict numerical prosodic parameters for synthetic speech. Instead, the many instances of each basic unit in a large speech corpus are classified into categories by a classification and regression tree (CART), in which the expectation of the weighted sum of squared regression errors of the prosodic features is used as the splitting criterion. Better prosody is achieved by keeping the variation in prosodic features small among instances belonging to the same class. A multi-tier non-uniform unit selection method is presented. It makes the best decision on unit selection by minimizing the concatenation cost of a whole utterance. Because the largest available and suitable units are selected for concatenation, distortion caused by mismatches at concatenation points is minimized. According to an informal listening test, the synthesized speech is very natural and fluent.
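The splitting criterion named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature tuples, the weights, and the function names `node_error` and `split_error` are hypothetical, and the "expectation" is assumed here to mean the child-node errors averaged by each child's share of the instances.

```python
from statistics import fmean

def node_error(instances, weights):
    """Weighted sum, over prosodic features, of the mean squared
    deviation from the node mean (hypothetical formulation)."""
    if not instances:
        return 0.0
    total = 0.0
    for j, w in enumerate(weights):
        values = [x[j] for x in instances]
        mu = fmean(values)
        total += w * fmean([(v - mu) ** 2 for v in values])
    return total

def split_error(left, right, weights):
    """Expected error after a split: each child's error weighted by
    the fraction of instances that fall into that child."""
    n = len(left) + len(right)
    return (len(left) * node_error(left, weights)
            + len(right) * node_error(right, weights)) / n

# Hypothetical instances: (pitch-like feature, duration-like feature)
weights = (1.0, 0.5)
left = [(1.0, 10.0), (3.0, 10.0)]
right = [(5.0, 20.0)]
parent = left + right
# A candidate split is attractive when it lowers the expected error.
improves = split_error(left, right, weights) < node_error(parent, weights)
```

Under these assumptions, a tree builder would evaluate `split_error` for every candidate question at a node and keep the question giving the lowest value, so that instances left in the same leaf have similar prosodic features.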
Keywords :
speech intelligibility; speech synthesis; trees (mathematics); CART; Mandarin speech corpus; classification and regression tree; concatenative speech synthesizer; large speech corpus; nonuniform units selection; numerical prosodic parameters prediction; text to speech system; two-module TTS structure; weighted sum of square regression error; Concatenated codes; Numerical models; Predictive models; Signal processing; Signal synthesis; Speech analysis; Speech processing; Speech synthesis; Synthesizers; Timbre
Conference_Title :
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location :
Salt Lake City, UT
Print_ISBN :
0-7803-7041-4
DOI :
10.1109/ICASSP.2001.941032