Title :
Study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis
Author :
Kawai, Hisashi ; Tsuzaki, Minoru
Author_Institution :
Spoken Language Translation Res. Labs., Adv. Telecommun. Res. Inst. Int., Kyoto, Japan
Abstract :
This paper studies voice quality variation in a large-scale single-speaker corpus used in recent corpus-based speech synthesis. First, a perceptual experiment is conducted to obtain scores for the voice quality difference in stimuli made by concatenating phrases collected from separate recording sessions. Second, acoustic measures are evaluated on how well they classify high- and low-scoring stimuli. Results show that band-limited power in the 8-16 kHz range performs best, closely followed by MFCC distance in the 0-4 kHz range, and that spectral tilts are almost irrelevant. However, the performance is not satisfactory for practical use: the equal error rate is 25%.
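The equal error rate quoted in the abstract is the operating point at which an acoustic measure misses as many high-scoring stimuli as it falsely flags low-scoring ones. The following is a minimal sketch of how such a figure could be computed from per-stimulus values. The function name `equal_error_rate`, the threshold sweep, and the labeling convention (1 = high perceptual score) are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return (eer, threshold), assuming larger scores should indicate label 1.

    Hypothetical helper: sweeps every observed score as a threshold and keeps
    the one where the miss rate and the false-alarm rate are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]  # e.g. stimuli with a high perceived difference
    neg = scores[labels == 0]  # e.g. stimuli with a low perceived difference

    best_gap, best_eer, best_thr = np.inf, None, None
    for thr in np.unique(scores):
        miss = np.mean(pos < thr)          # positives rejected at this threshold
        false_alarm = np.mean(neg >= thr)  # negatives accepted at this threshold
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap = gap
            best_eer = (miss + false_alarm) / 2
            best_thr = thr
    return best_eer, best_thr

# Toy values only, unrelated to the paper's data.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
eer, thr = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {thr:.2f}")
```

With a finite set of stimuli the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold. Interpolating between adjacent thresholds is a common alternative.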
Keywords :
speech processing; speech synthesis; 0 to 4 kHz; 8 to 16 kHz; MFCC distance; acoustic measures; band-limited power; concatenated phrases; perceptual experiment; single speaker speech corpus; spectral tilts; time-dependent voice quality variation; Acoustic measurements; Acoustic signal detection; Conducting materials; Degradation; Error analysis; Large-scale systems; Loudspeakers; Mel frequency cepstral coefficient; Natural languages
Conference_Title :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224362