Vocal tract modelling with recurrent neural networks

Author

Burrows, T.L. ; Niranjan, M.

Author_Institution

Dept. of Eng., Cambridge Univ., UK

Volume

5

fYear

1995

fDate

9-12 May 1995

Firstpage

3315

Abstract

The speech production system is modelled using true glottal excitation as the source and a recurrent neural network to represent the vocal tract. The hidden nodes have multiple delays of one and two samples, making the network equivalent to a parallel formant synthesiser in the linear regions of the hidden node sigmoids. An ARX model identification is carried out to initialise the neural network parameters. These parameters are re-estimated in an analysis-by-synthesis framework to minimise the synthesis (output) error. Unlike other analysis-by-synthesis speech production models such as CELP, the source and filter in this approach are decoupled, enabling manipulation of the source time-scale to achieve high quality pitch changes.
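
The equivalence stated in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network equations, so the sketch assumes each hidden node feeds back only its own output at delays of one and two samples, uses tanh as the sigmoid, and sums the hidden nodes linearly at the output. The function names, formant frequencies, bandwidths, sampling rate and weights are all hypothetical values chosen for illustration. Under these assumptions, a small input keeps every node in the linear region of tanh, and the network behaves like a parallel bank of two-pole resonators, i.e. a parallel formant synthesiser.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Feedback weights of a two-pole resonator (standard formant filter).

    Returns (a1, a2) for y[n] = x[n] + a1*y[n-1] + a2*y[n-2].
    """
    r = math.exp(-math.pi * bw_hz / fs_hz)          # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    a2 = -r * r
    return a1, a2


def make_nodes(formants, fs_hz, w_in=1.0, w_out=1.0):
    """Build hidden-node parameters (w_in, a1, a2, w_out), one per formant.

    `formants` is a list of (centre frequency, bandwidth) pairs in Hz.
    """
    return [(w_in, *resonator_coeffs(f, b, fs_hz), w_out) for f, b in formants]


def run_network(u, nodes, activation=math.tanh):
    """Run the assumed recurrent network over the input sequence u.

    Each hidden node sees the current input plus its own output delayed
    by one and two samples; the network output is a weighted sum.
    """
    h1 = [0.0] * len(nodes)   # hidden outputs at n-1
    h2 = [0.0] * len(nodes)   # hidden outputs at n-2
    out = []
    for x in u:
        y = 0.0
        for k, (w_in, a1, a2, w_out) in enumerate(nodes):
            h = activation(w_in * x + a1 * h1[k] + a2 * h2[k])
            h2[k], h1[k] = h1[k], h
            y += w_out * h
        out.append(y)
    return out


# Illustrative (assumed) formants for a vowel-like sound at 8 kHz sampling.
FS_HZ = 8000.0
NODES = make_nodes([(500.0, 80.0), (1500.0, 120.0), (2500.0, 160.0)], FS_HZ)

# A small impulse keeps the nodes in the linear region of tanh.
impulse = [1e-3] + [0.0] * 199
linear = run_network(impulse, NODES, activation=lambda v: v)
network = run_network(impulse, NODES)
max_dev = max(abs(a - b) for a, b in zip(linear, network))
print("max deviation from linear parallel synthesiser:", max_dev)
```

Replacing the activation with the identity gives the purely linear parallel synthesiser, so the printed deviation measures how far the sigmoid network departs from it. Driving the same network with a large input pushes the nodes into saturation and the two outputs diverge, which is the nonlinear regime the abstract contrasts with the linear one.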

Keywords

IIR filters; delays; digital filters; error analysis; parameter estimation; recurrent neural nets; speech processing; speech synthesis; ARX model identification; analysis-by-synthesis framework; filter; hidden node sigmoids; linear regions; multiple delays; parallel formant synthesiser; pitch changes; recurrent neural networks; source time-scale; speech production; synthesis error; true glottal excitation; vocal tract modelling; Acoustic distortion; Network synthesis; Neural networks; Nonlinear distortion; Nonlinear filters; Production systems; Recurrent neural networks; Speech analysis; Speech synthesis; Vocoders

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on

Conference_Location

Detroit, MI

ISSN

1520-6149

Print_ISBN

0-7803-2431-5

Type

conf

DOI

10.1109/ICASSP.1995.479694

Filename

479694