Fast speaker adaptation of artificial neural networks for automatic speech recognition

Author

Dupont, Stéphane ; Cheboub, Leila

Author_Institution

TCTS-MULTITEL, Faculté Polytechnique de Mons, Belgium

Volume

3

fYear

2000

fDate

2000

Firstpage

1795

Abstract

This paper presents a fast speaker adaptation technique for automatic speech recognition systems that use artificial neural networks (ANNs) to estimate hidden Markov model (HMM) state probabilities. Speaker-adapted ANNs are first obtained from the training data using affine transformations in the feature space. As in the “eigenvoice” approach, principal component analysis (PCA) is then applied to these transformation matrices. The first few eigenvectors span a low-dimensional space that captures most of the inter-speaker variability of the training set. During operation, these eigenvectors can be used to constrain the optimization of the transformation matrices for new speakers. This optimization is performed by steepest descent, with gradients obtained by backpropagation through the speaker-independent ANN. The experiments use state-of-the-art hybrid HMM/ANN systems trained on the Phonebook database. Supervised adaptation experiments with different amounts of data show that the new technique outperforms standard linear regression in the feature space: with only 20 words of adaptation data, results show a 15% relative decrease in the word error rate.
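
The adaptation procedure summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pca_basis`, `loss_and_grad`, `adapt`), the flattening of each affine transform into a vector `[A | b]`, the cross-entropy criterion, the step size, and the single softmax layer `(W, c)` standing in for the speaker-independent ANN are all assumptions made for illustration.

```python
import numpy as np

def pca_basis(thetas, k):
    """PCA over flattened per-speaker transforms.
    thetas: (num_speakers, d*d + d), each row a flattened [A | b] (assumed layout).
    Returns the mean transform and the top-k principal directions (as rows)."""
    mean = thetas.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centred data.
    _, _, vt = np.linalg.svd(thetas - mean, full_matrices=False)
    return mean, vt[:k]

def unpack(theta, d):
    """Split a flattened transform back into matrix A (d x d) and bias b (d)."""
    return theta[:d * d].reshape(d, d), theta[d * d:]

def loss_and_grad(theta, x, y, W, c):
    """Cross-entropy of a fixed softmax layer (stand-in for the SI ANN)
    applied to adapted features A x + b, plus its gradient w.r.t. theta."""
    n, d = x.shape
    A, b = unpack(theta, d)
    z = x @ A.T + b                      # adapted features, shape (n, d)
    logits = z @ W.T + c                 # shape (n, num_classes)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(n), y]).mean()
    dlogits = p.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dz = dlogits @ W                     # backpropagate to the network input
    dA = dz.T @ x                        # gradient w.r.t. A, shape (d, d)
    db = dz.sum(axis=0)                  # gradient w.r.t. b
    return loss, np.concatenate([dA.ravel(), db])

def adapt(x, y, W, c, mean, basis, steps=200, lr=0.01):
    """Steepest descent on the k eigen-coordinates w only; the transform is
    constrained to theta = mean + w @ basis. The network (W, c) stays fixed."""
    w = np.zeros(basis.shape[0])
    for _ in range(steps):
        _, g = loss_and_grad(mean + w @ basis, x, y, W, c)
        w -= lr * (basis @ g)            # chain rule: project gradient on basis
    return w
```

Because only the `k` coordinates `w` are estimated rather than all `d*d + d` transform parameters, far fewer adaptation frames are needed to obtain a stable estimate, which is the motivation for the eigenvector constraint.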

Keywords

backpropagation; eigenvalues and eigenfunctions; estimation theory; hidden Markov models; neural nets; optimisation; principal component analysis; probability; speech recognition; HMMs state probability estimation; Phonebook database; affine transformations; artificial neural networks; automatic speech recognition; eigenvectors; eigenvoice; fast speaker adaptation; gradients; inter-speaker variability; optimization; performance; principal components analysis; speaker independent ANN; steepest descent; training data; training set; transformation matrices; word error rate; Constraint optimization; Linear regression; Spatial databases; State estimation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location

Istanbul

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.862102

Filename

862102