Scaling of waveform segments along the time axis for concatenative speech synthesis

Author

Nishizawa, Nobuyuki ; Kawai, Hisashi

Author_Institution

ATR Spoken Language Translation Research Laboratories, Kyoto, Japan

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

Waveform scaling along the time axis is introduced as a pitch and duration conversion method for concatenative speech synthesis. The method affects F₀, duration, and spectrum together, although it causes no degradation of naturalness when the scaling ratio is close to 1. In corpus-based concatenative speech synthesis, where many segment candidates with various F₀ values and durations are available, excessive scaling may be unnecessary. Experimental results indicated that the differences in F₀ and duration between the target and the selected segment became smaller. However, they also showed that the conventional cost function used in segment selection cannot represent the degradation of naturalness caused by spectral distortion, and that the scaling range that avoids degradation may be too narrow for the pitch conversion required in our synthesizer. These problems should be addressed by scaling over a wider range with a new cost function that also accounts for the degradation.
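
As an illustration of why time-axis scaling changes F₀ and duration together, the following is a minimal sketch and not the authors' implementation. It assumes linear interpolation in place of a proper band-limited sampling frequency converter, and the names `scale_segment` and `estimate_f0`, the 16 kHz sampling rate, and the 200 Hz test tone are all hypothetical choices made for the example.

```python
import math


def scale_segment(samples, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) a waveform along the time axis.

    Assumption: linear interpolation stands in for a band-limited
    noninteger-ratio sampling frequency converter.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    last = len(samples) - 1
    n_out = max(1, round(len(samples) * ratio))
    out = []
    for i in range(n_out):
        pos = i / ratio                  # read position in the original segment
        k = min(int(pos), last)
        frac = pos - k
        nxt = samples[min(k + 1, last)]  # clamp at the segment end
        out.append((1.0 - frac) * samples[k] + frac * nxt)
    return out


def estimate_f0(samples, fs):
    """Rough F0 estimate from upward zero crossings; only meaningful for a clean tone."""
    times = []
    for n in range(len(samples) - 1):
        a, b = samples[n], samples[n + 1]
        if a < 0 <= b:
            times.append(n + a / (a - b))  # linearly interpolated crossing instant
    return fs * (len(times) - 1) / (times[-1] - times[0])


fs = 16000      # hypothetical sampling rate in Hz
f0 = 200.0      # hypothetical test tone in Hz
segment = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1600)]  # 100 ms

ratio = 1.05    # scaling ratio close to 1
scaled = scale_segment(segment, ratio)

print("duration (s):", len(segment) / fs, "->", len(scaled) / fs)
print("F0 (Hz):", estimate_f0(segment, fs), "->", estimate_f0(scaled, fs))
```

Played back at the original sampling rate, the scaled segment is longer by the factor `ratio` and its F₀ is lower by the same factor, so the two quantities cannot be adjusted independently. The spectral envelope shifts in the same way, which is the spectral distortion the abstract refers to.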

Keywords

hidden Markov models; speech synthesis; waveform analysis; HMM; concatenative speech synthesis; corpus-based speech synthesis; duration conversion method; hidden Markov model; naturalness degradation cost function; noninteger-ratio sampling frequency converter; pitch conversion method; sampling frequency conversion; synthetic speech quality; text to speech synthesis; time axis; waveform scaling; waveform segment scaling; Acoustic signal processing; Cost function; Degradation; Laboratories; Natural languages; Robustness; Signal generators; Signal synthesis; Speech synthesis; Synthesizers

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326077

Filename

1326077