Regional Information Center for Science and Technology (RICEST) - Study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis

DocumentCode :

1936806

Title :

Study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis

Author :

Kawai, Hisashi ; Tsuzaki, Minoru

Author_Institution :

Spoken Language Translation Res. Labs., Adv. Telecommun. Res. Inst. Int., Kyoto, Japan

fYear :

2002

fDate :

11-13 Sept. 2002

Firstpage :

Lastpage :

Abstract :

The paper studies voice quality variation in a large-scale single-speaker corpus of the kind used in recent corpus-based speech synthesis. First, a perceptual experiment is conducted to obtain scores for the voice quality difference in stimuli made by concatenating phrases collected from separate recording sessions. Second, acoustic measures are evaluated on how well they classify high- and low-scoring stimuli. Results show that band-limited power in the 8-16 kHz range performs best, closely followed by MFCC distance in the 0-4 kHz range, and that spectral tilts are almost irrelevant. However, the performance is not satisfactory for practical use (the equal error rate is 25%).
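The equal error rate quoted in the abstract is the operating point at which the miss rate and the false alarm rate of a threshold classifier coincide. The following is a minimal sketch of that computation, not the authors' procedure: the function name `equal_error_rate`, the convention that larger measure values indicate a larger voice quality difference, and the toy numbers are all assumptions made for illustration and are not data from the paper.

```python
def equal_error_rate(scores_pos, scores_neg):
    """Return (eer, threshold) for a one-dimensional acoustic measure.

    scores_pos: measure values for stimuli judged to differ (hypothetical).
    scores_neg: measure values for stimuli judged not to differ (hypothetical).
    Assumes larger values suggest a larger voice quality difference.
    """
    best = None
    # Try every observed value as a decision threshold.
    for t in sorted(set(scores_pos) | set(scores_neg)):
        # Miss rate: differing stimuli that fall below the threshold.
        fnr = sum(s < t for s in scores_pos) / len(scores_pos)
        # False alarm rate: non-differing stimuli at or above the threshold.
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        gap = abs(fnr - fpr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (fnr + fpr) / 2, t)
    return best[1], best[2]


# Toy values only, chosen for illustration.
pos = [0.9, 0.8, 0.7, 0.3]
neg = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(pos, neg)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With real data the two error rates rarely cross exactly at an observed value, so the sketch reports the average of the two rates at the closest threshold; interpolating between thresholds is a common alternative.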

Keywords :

speech processing; speech synthesis; 0 to 4 kHz; 8 to 16 kHz; MFCC distance; acoustic measures; band-limited power; concatenated phrases; perceptual experiment; single speaker speech corpus; spectral tilts; time-dependent voice quality variation; Acoustic measurements; Acoustic signal detection; Conducting materials; Degradation; Error analysis; Large-scale systems; Loudspeakers; Mel frequency cepstral coefficient; Natural languages

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 2002 IEEE Workshop on Speech Synthesis

Print_ISBN :

0-7803-7395-2

Type :

conf

DOI :

10.1109/WSS.2002.1224362

Filename :

1224362

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1936806