Title :
Scaling of waveform segments along the time axis for concatenative speech synthesis
Author :
Nishizawa, Nobuyuki ; Kawai, Hisashi
Author_Institution :
ATR Spoken Language Translation Research Laboratories, Kyoto, Japan
Abstract :
Waveform scaling along the time axis is introduced as a pitch and duration conversion method for concatenative speech synthesis. The method changes F0, duration, and spectrum together, but it causes no degradation of naturalness when the scaling ratio is close to 1. In corpus-based concatenative speech synthesis, where many segment candidates with various F0 values or durations are available, excessive scaling may be unnecessary. Experimental results indicated that scaling reduced the differences in F0 and duration between the target and the selected segment. However, they also showed that the conventional cost function used in segment selection cannot represent the degradation of naturalness caused by spectral distortion, and that the scaling range free of degradation may be too narrow for the pitch conversion required in our synthesizer. These problems should be addressed by wider-range scaling with a new cost function that also accounts for the degradation.
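The coupling between duration and F0 described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: linear interpolation stands in for the sampling frequency converter named in the keywords, and the names `scale_time_axis`, `estimate_f0`, and `SAMPLE_RATE`, as well as the 200 Hz test tone and the ratio 1.05, are assumptions made only for this illustration. Stretching a segment by a ratio r while keeping the playback rate fixed multiplies its duration by r and divides its F0 by r, and the spectral envelope is shifted by the same factor.

```python
import math

def scale_time_axis(samples, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) a waveform along the time
    axis by linear interpolation, with the playback sampling rate unchanged.
    Linear interpolation is an assumption of this sketch, not the paper's method."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_out = max(1, round(len(samples) * ratio))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i / ratio                  # fractional position in the input
        k = min(int(pos), last)          # index of the sample at or before pos
        frac = pos - k
        nxt = samples[min(k + 1, last)]  # clamp at the end of the segment
        out.append((1.0 - frac) * samples[k] + frac * nxt)
    return out

def estimate_f0(samples, sample_rate):
    """Rough F0 estimate from upward zero crossings; adequate for a pure tone only."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)

SAMPLE_RATE = 16000  # Hz, hypothetical value for the illustration
# One second of a 200 Hz sine as a stand-in for a voiced segment.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
scaled = scale_time_axis(tone, 1.05)

f0_original = estimate_f0(tone, SAMPLE_RATE)
f0_scaled = estimate_f0(scaled, SAMPLE_RATE)
duration_ratio = len(scaled) / len(tone)
```

With a ratio of 1.05, `duration_ratio` comes out at 1.05 and `f0_scaled` falls to roughly 200 / 1.05 Hz. Because both quantities move together, a target that needs a higher F0 and a longer duration at the same time cannot be met by scaling alone, which is one reason the choice of candidate segment matters.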
Keywords :
hidden Markov models; speech synthesis; waveform analysis; HMM; concatenative speech synthesis; corpus-based speech synthesis; duration conversion method; hidden Markov model; naturalness degradation cost function; noninteger-ratio sampling frequency converter; pitch conversion method; sampling frequency conversion; synthetic speech quality; text to speech synthesis; time axis; waveform scaling; waveform segment scaling; Acoustic signal processing; Cost function; Degradation; Laboratories; Natural languages; Robustness; Signal generators; Signal synthesis; Speech synthesis; Synthesizers;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1326077