Synthesis of unseen context and spectral and pitch contour smoothing in concatenated text to speech synthesis

Author

Low, Phuay Hui ; Vaseghi, Saeed

Author_Institution

Department of Electronic and Computer Engineering, Brunel University, London, UB8 3PH, UK

Volume

1

fYear

2002

fDate

13-17 May 2002

Abstract

The availability and perceptual clarity of speech units, and how these units are put together during synthesis, have always been the cornerstones of any high-quality concatenative text-to-speech synthesis (TTS) system. The speech units are usually obtained from different sentences and contexts in a speaker-dependent speech database. One of the problems with speech units obtained this way is the occurrence of unseen contexts. Here, unseen contexts denote phonological sequences that are not acoustically represented in the selection pool during synthesis. Unseen units are expected in any concatenative TTS system because it is difficult to obtain an acoustic representation of all possible contexts that could occur in speech. This paper proposes a pitch-synchronous overlap and merge method to synthesise the acoustic representation of unseen contexts from existing similar units found in the inventory. It also gives a brief description of spectral and pitch contour smoothing across concatenated units.

Keywords

Artificial neural networks; Speech; Testing; Transforms

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location

Orlando, FL, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.2002.5743756

Filename

5743756