Regional Information Center for Science and Technology (RICeST) - Spectral modification for concatenative speech synthesis

DocumentCode :

352325

Title :

Spectral modification for concatenative speech synthesis

Author :

Wouters, Johan ; Macon, Michael W.

Author_Institution :

Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Beaverton, OR, USA

Volume :

fYear :

2000

fDate :

2000

Abstract :

Concatenative synthesis can produce high-quality speech but is limited to the allophonic variations and voice types that were captured in the database. It would be desirable to modify speech units to remove formant discontinuities and to create new speaking styles, such as hypo- or hyper-articulated speech. Unfortunately, manipulating the spectral structure often leads to degraded speech quality. We investigate two speech modification strategies, one based on inverse filtering and the other on sinusoidal modeling, and we explain their merits and shortcomings for changing the spectral envelope in speech. We then propose a method which uses sinusoidal modeling and represents the complex sinusoidal amplitudes by an all-pole model. The all-pole model approximates the sinusoidal spectrum well, both in the amplitude and in the phase domain. We use the sinusoidal+all-pole model to control the spectral envelope in recorded speech. High-quality modified speech is generated from the model using sinusoidal synthesis. A perceptual test was conducted, which shows that the model was effective at changing vowel identities and was preferable to residual-excited LPC.

Keywords :

filtering theory; poles and zeros; spectral analysis; speech synthesis; all-pole model; allophonic variations; amplitude domain; complex sinusoidal amplitudes; concatenative speech synthesis; formant discontinuities; high-quality speech; hyper-articulated speech; hypo-articulated speech; inverse filtering; perceptual test; phase domain; sinusoidal modeling; sinusoidal spectrum; speaking styles; spectral envelope; spectral modification; speech modification strategies; voice types; vowel identities; Bandwidth; Databases; Degradation; Filtering; Filters; Frequency; Linear predictive coding; Natural languages; Speech synthesis; Testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location :

Istanbul

ISSN :

1520-6149

Print_ISBN :

0-7803-6293-4

Type :

conf

DOI :

10.1109/ICASSP.2000.859116

Filename :

859116

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=352325